Dr. Jens Dittrich, Professor of Information Systems at Saarland University.
Last year, Software AG formed a Scientific Advisory Board to provide it with a scientific perspective on current and emerging trends in technology. In this series of blogs, we highlight each Scientific Advisory Board member's area of expertise and some of their views for the future.
Today we feature Dr. Jens Dittrich.
"With the advent of Big Data, it has become more and more difficult to manage and analyze data through traditional databases or NoSQL systems.
Enter data lakes, which collect all of a company's data in raw format in a central place, without initially enforcing a schema or performing any other data-cleaning or data-import operations. In a data lake, those operations are performed only as a second step. Thus, full flexibility for data alignment and analytics is preserved.
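This "schema-on-read" idea can be sketched in a few lines. Below is a minimal, illustrative example (the feed and field names are hypothetical, not from the post): raw records land in the lake untouched, and structure and cleaning are applied only when the data is read.

```python
import csv
import io

# Hypothetical raw feed stored in the lake as-is: no schema enforced at write time.
raw_events = "2024-01-05,click,widget-a\n2024-01-05,view,widget-b\nmalformed line\n"

def read_with_schema(raw):
    """Schema-on-read: structure and cleaning happen only at consumption time."""
    rows = []
    for record in csv.reader(io.StringIO(raw)):
        if len(record) != 3:  # malformed records are filtered here, not at ingest
            continue
        date, action, item = record
        rows.append({"date": date, "action": action, "item": item})
    return rows

print(read_with_schema(raw_events))
```

Because nothing is rejected at write time, the same raw feed can later be re-read with a different schema if the analysis changes.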
Business data has traditionally been kept in highly structured relational databases and in specialized analytical systems such as data warehouses. Data lakes, in contrast, are typically implemented on a distributed file system (such as HDFS) in which all data belonging to a company is collected.
All further analysis, be it structured queries, data mining, traditional machine learning or deep learning, is then done in a structure-as-you-go fashion: the raw data in the data lake is distilled, cleaned, and enriched step by step into crisp, clear information, using an appropriate combination of workflows and tools.
So, in contrast to relational database systems, which “own” the data, in a data lake the data is not necessarily owned by any specific tool or system. The idea of a data lake is very similar to the concept of a “dataspace”, where data from different sources gets integrated over time.
Data lakes are a good match for the typical exploratory workflow of data scientists, who rarely rely on managing data in relational database systems. Dive into a data lake and see how much is going on beneath the surface."