Partner Integration: Delta Lake + StarRocks
What is Delta Lake?
Delta Lake is an open-source storage layer that runs on top of existing data lake storage and provides ACID transactions, scalable metadata handling, and unified streaming and batch data processing. It was originally built for Apache Spark and is designed as the storage layer of a lakehouse, so the same tables can serve both batch and streaming workloads.
What is StarRocks?
StarRocks is a next-generation, blazing-fast massively parallel processing (MPP) database designed to make real-time analytics easy for enterprises. It is built to power sub-second queries at scale. StarRocks can query data stored in Delta Lake in place through an external catalog.
StarRocks + Delta Lake = The Modern Open Data Lake
Ali Ghodsi, the CEO of Databricks, talking about StarRocks' support of Delta Lake
There is a mistake in the video starting at the 93-second mark. StarRocks supports all the major open table formats: Apache Hudi, Apache Iceberg, Apache Hive, and Delta Lake, among others.
Technical Benefits
- No lock-in at the query layer. You can swap out the query layer when it no longer meets your technical or financial requirements.
- Get all the capabilities of an OLAP database, such as JOINs and materialized views, on the data within Delta Lake. You can also JOIN across Delta Lake, Apache Iceberg, Apache Hudi, and Apache Hive tables.
- Many database tools work out of the box because StarRocks supports the MySQL wire protocol.
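To make the last two points concrete, here is a minimal Python sketch of what registering a Delta Lake catalog and running a cross-catalog JOIN might look like from a client. It assumes a Hive metastore holds the Delta Lake table metadata. The catalog names (`delta_catalog`, `iceberg_catalog`), the databases, tables, and columns, and the metastore URI are all hypothetical placeholders. The catalog properties follow the pattern in the StarRocks Delta Lake catalog documentation, so check them against the version you run.

```python
from typing import Any, Iterable, List


def create_catalog_sql(catalog: str, metastore_uri: str) -> str:
    """Build a CREATE EXTERNAL CATALOG statement for a Delta Lake catalog.

    Assumes a Hive metastore; other metastores need different properties.
    """
    # Basic guards, since the values are interpolated into SQL text.
    if not catalog.isidentifier():
        raise ValueError(f"unsafe catalog name: {catalog!r}")
    if '"' in metastore_uri:
        raise ValueError("metastore URI must not contain double quotes")
    return (
        f"CREATE EXTERNAL CATALOG {catalog}\n"
        "PROPERTIES (\n"
        '    "type" = "deltalake",\n'
        '    "hive.metastore.type" = "hive",\n'
        f'    "hive.metastore.uris" = "{metastore_uri}"\n'
        ")"
    )


# Hypothetical cross-catalog JOIN: orders live in Delta Lake, customers in
# Iceberg. Tables are addressed as catalog.database.table.
JOIN_SQL = """
SELECT c.name, SUM(o.amount) AS total_amount
FROM delta_catalog.sales.orders AS o
JOIN iceberg_catalog.crm.customers AS c
  ON o.customer_id = c.customer_id
GROUP BY c.name
""".strip()


def run(conn: Any, statements: Iterable[str]) -> List[tuple]:
    """Execute statements on a DB-API 2.0 connection.

    Returns the rows of the last statement, or [] if it produced no result set.
    """
    cur = conn.cursor()
    try:
        rows: List[tuple] = []
        for stmt in statements:
            cur.execute(stmt)
            # description is None for statements without a result set (DDL).
            rows = list(cur.fetchall()) if cur.description is not None else []
        return rows
    finally:
        cur.close()
```

Because StarRocks speaks the MySQL wire protocol, any MySQL DB-API driver should be usable as `conn`. For example, with the third-party `pymysql` package installed, `pymysql.connect(host="<fe-host>", port=9030, user="root", password="")` followed by `run(conn, [create_catalog_sql("delta_catalog", "thrift://<metastore-host>:9083"), JOIN_SQL])`. Port 9030 is assumed here as the frontend's default query port, and the JOIN assumes `iceberg_catalog` has already been created in a similar way.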
Resources
Documentation: StarRocks Delta Lake External Catalog