Building an analytical data lake with Apache Spark and Apache Hudi - Part 1

Using Apache Spark and Apache Hudi to build and manage data lakes on DFS and Cloud storage.

Most modern data lakes are built on some sort of distributed file system (DFS), such as HDFS, or on cloud-based storage like AWS S3. These systems follow a “write-once-read-many” access model for files, which works well for large volumes of data, think hundreds of gigabytes...