The data lake is conceived of as the first place an organisation’s data flows to. It is the repository for all data collected from the organisation’s operations, where it will reside in a more or less raw format. From the lake, data goes downstream to the data warehouse, which implies something more processed, packaged and ready for consumption.

The term downstream is apt because the data lake is seen, like a real lake, as something into which all data sources flow, and those sources are potentially many, varied and unprocessed. There may be some metadata tagging to make data elements searchable, but access to data in the lake is intended for specialists such as data scientists and those who develop touchpoints downstream of it.

The data lake contains multiple stores of data (unstructured, semi-structured and structured) in formats that the vast majority of employees cannot easily access or read. The data warehouse, by contrast, is made up of structured data in databases to which applications and employees are given access. A data mart or hub may go further still, holding data that individual departments can consume even more easily.

So, a data lake holds large quantities of data in its original form, and its sources will include all data from an organisation or one of its divisions. Unlike querying the data warehouse or mart, interrogating the data lake requires a schema-on-read approach: structure is imposed on the data when it is read, not when it is stored.
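To make schema-on-read a little more concrete, here is a minimal Python sketch. It assumes the lake stores raw records as JSON lines; the record contents, the field names and the `read_with_schema` helper are all hypothetical, invented for illustration. Nothing is validated when records land in the lake; instead each consumer supplies its own schema at query time, and records that do not fit are tolerated rather than rejected.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

# Hypothetical raw lake contents: records are kept exactly as the sources
# emitted them, with no schema enforced on write. Shapes and types vary.
RAW_LAKE = [
    '{"order_id": "1001", "amount": "19.99", "region": "EU"}',
    '{"order_id": 1002, "amount": 5, "channel": "web"}',
    '{"sensor": "t-7", "reading": 21.4}',
    'not valid json at all',
]

# A schema here is just a mapping from field name to a type conversion.
Schema = Dict[str, Callable[[Any], Any]]


def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[Dict[str, Optional[Any]]]:
    """Apply `schema` at read time (schema-on-read).

    Unparseable lines are skipped, fields missing from a record become None,
    fields not named in the schema are ignored, and values that cannot be
    converted to the requested type also become None.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # raw data may contain anything; skip what cannot be parsed
        if not isinstance(record, dict):
            continue
        row: Dict[str, Optional[Any]] = {}
        for field, cast in schema.items():
            value = record.get(field)
            try:
                row[field] = None if value is None else cast(value)
            except (TypeError, ValueError):
                row[field] = None
        yield row


# Two consumers read the same raw data through different schemas.
orders_schema: Schema = {"order_id": int, "amount": float, "region": str}
sensor_schema: Schema = {"sensor": str, "reading": float}

orders = [r for r in read_with_schema(RAW_LAKE, orders_schema) if r["order_id"] is not None]
sensors = [r for r in read_with_schema(RAW_LAKE, sensor_schema) if r["sensor"] is not None]

print(orders)
print(sensors)
```

The same raw lines serve both an orders analysis and a sensor analysis, because the structure lives in the reader rather than in storage. In a warehouse the schema is instead enforced when data is loaded (schema-on-write), so mismatched or malformed records like those above would be cleaned or rejected before anyone queried them.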