Partner Integration: Apache Hive + StarRocks
What is Apache Hive?
Apache Hive is an open-source data warehouse system built on top of Apache Hadoop for querying and analyzing data. Hive provides a SQL-like interface for querying data stored in the various databases and file systems that integrate with Hadoop.
What is StarRocks?
StarRocks is a next-generation, massively parallel processing (MPP) database designed to make real-time analytics easy for enterprises. It is built to power sub-second queries at scale. StarRocks can read data stored in Apache Hive.
StarRocks + Apache Hive = The Modern Open Data Lake
Technical Benefits
- Our performance tests have shown that StarRocks can approach local-disk performance when querying data in Apache Hive.
- No lock-in on the query layer. You can change the query layer when it no longer meets your technical or financial requirements.
- Get the capabilities of an OLAP database, such as JOINs and materialized views, on the data within Apache Hive. You can also JOIN across Apache Iceberg, Apache Hudi, and Apache Hive tables.
- Many database tools work out of the box because StarRocks supports the MySQL wire protocol.
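To make the catalog and cross-table JOIN points above more concrete, here is a minimal Python sketch that only builds the SQL text you might send to StarRocks. It assumes a Hive metastore reachable over Thrift and uses the `CREATE EXTERNAL CATALOG` statement with the `type` and `hive.metastore.uris` properties. The catalog names (`hive_catalog`, `iceberg_catalog`), the database and table names, the join key, and the metastore address are all hypothetical placeholders, and the helper functions are illustrative rather than part of any StarRocks library.

```python
"""Sketch: build StarRocks SQL for a Hive catalog and a cross-catalog JOIN.

All identifiers and the metastore URI below are hypothetical placeholders.
"""


def create_hive_catalog_sql(catalog: str, metastore_uri: str) -> str:
    """Return a CREATE EXTERNAL CATALOG statement for a Hive metastore."""
    return (
        f"CREATE EXTERNAL CATALOG {catalog}\n"
        "PROPERTIES (\n"
        '    "type" = "hive",\n'
        f'    "hive.metastore.uris" = "{metastore_uri}"\n'
        ")"
    )


def cross_catalog_join_sql(left_table: str, right_table: str, key: str) -> str:
    """Return a JOIN between two fully qualified catalog.database.table names."""
    return (
        "SELECT l.*, r.*\n"
        f"FROM {left_table} AS l\n"
        f"JOIN {right_table} AS r ON l.{key} = r.{key}"
    )


if __name__ == "__main__":
    # Hypothetical metastore address; replace with your own.
    print(create_hive_catalog_sql(
        "hive_catalog", "thrift://metastore.example.internal:9083"))
    print()
    # Hypothetical tables: one in a Hive catalog, one in an Iceberg catalog.
    print(cross_catalog_join_sql(
        "hive_catalog.sales.orders",
        "iceberg_catalog.crm.customers",
        "customer_id"))
```

Because StarRocks speaks the MySQL wire protocol, statements like these can typically be submitted from any MySQL-compatible client or driver pointed at the StarRocks frontend. Check the Hive catalog documentation for the full set of properties your storage and authentication setup requires.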
Resources
Documentation: StarRocks Hive External Catalog