StarRocks vs. ClickHouse: The Quest for Analytical Database Performance
In late 2022, ClickHouse released its open source performance benchmark project, ClickBench. This benchmarking tool quickly generated a lot of attention and discussion. Data warehouse vendors and data infrastructure engineers rushed to the site to check out who ranked as the 'fastest' analytical database.
We applaud ClickHouse for not only providing such a helpful tool, but also for supporting a culture of healthy competition in the community by vetting and accepting results submitted by other groups. The team behind StarRocks had the pleasure of collaborating with the ClickHouse team to discuss test results, and this experience was nothing short of enlightening.
A Tale of Two Databases: Who Is the Fastest Analytical Database?
If you've been keeping an eye on ClickBench's latest results, you've probably come across some of the entertaining discussions related to the products competing for the number one spot on the chart. ClickHouse could have easily turned ClickBench into a vendor-biased marketing tool, but to their immense credit, the ClickHouse team displayed great sportsmanship by accepting results from other projects in the space. This included StarRocks, which ran up near the top of the chart immediately. In fact, StarRocks briefly held the number one spot on its first day on ClickBench.
We could end the story there, but ClickHouse turned around with some impressive results from their next release that put them back on top. Just like any great athlete, ClickHouse wasn't going down without a fight.
StarRocks in the top spot on ClickBench (February 27, 2023).
But the StarRocks community is not so easily discouraged. Only a few months later, StarRocks reclaimed the top spot with its latest release.
This race is reminiscent of famous sports rivalries: Ronaldo and Messi, Federer and Nadal, and even the Lakers and Celtics. Okay, maybe that's a little dramatic, but the StarRocks community continues to enjoy its intense but friendly competition with ClickHouse.
While ClickBench is an excellent indicator of performance for certain scenarios, and StarRocks and ClickHouse are basically neck and neck in that race, we believe there is much more to a great analytical database than what is covered by ClickBench alone.
Going Beyond ClickBench: Properly Evaluating High-Performance Analytical Databases
Sticking with the sports analogy, ranking in ClickBench is just one event (query performance) within a larger sport (analytical databases), like the 100-meter freestyle in swimming. To be seen as the best, StarRocks needs to compete and win in many events, not just one. Because real-world analytical workloads vary drastically from customer to customer, we need to support all of these scenarios well.
One great athlete comes to mind here: Michael Phelps. He won Olympic gold not in a single event but across freestyle, butterfly, individual medley, and relay races.
Could StarRocks and ClickHouse be the greatest rivalry since Messi vs. Ronaldo? Probably not, but it's fun to think about.
That's the level of success the StarRocks community aspires to. We believe there are scenarios not covered in ClickBench that matter in real life, so we have also been publishing test results against other important benchmarks such as TPC-H and the Star Schema Benchmark (SSB).
Factors we believe should be highlighted for proper evaluation are:
- Query performance on joined tables without de-normalization - This is a critical feature for simplifying the analytics data pipeline and improving timeliness. ClickBench focuses only on query performance against de-normalized tables.
- Scalability to handle growing data volumes - Modern analytics architectures need to be distributed and scalable. ClickBench is great for single-node configurations, but how easy is it to add or remove a server from the distributed platform? This is important to know.
- High concurrency queries - More and more use cases require support for large numbers of concurrent queries. ClickBench tests only a single query session, so we need to investigate performance under hundreds or thousands of concurrent queries.
- Ingestion speed - This is another area ClickBench doesn't cover. While processing queries is critical, it is also important to handle high-speed data ingestion in real time.
This isn't an exhaustive list of factors that should be tested, but adopting them would make ClickBench more valuable for evaluation purposes.
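To make the high-concurrency factor concrete, here is a minimal sketch of how such a test could be structured. It is not part of ClickBench or any StarRocks tooling. The `run_query` function is a hypothetical stand-in that only sleeps; in a real test it would be replaced with a call through the database's client library. The names `benchmark` and `timed`, and the choice of median and 95th-percentile latency as metrics, are assumptions made for illustration.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(sql: str) -> None:
    """Hypothetical stand-in for a real client call; sleeps to simulate work."""
    time.sleep(0.01)


def timed(query_fn, sql: str) -> float:
    """Return the wall-clock latency of one query, in seconds."""
    start = time.perf_counter()
    query_fn(sql)
    return time.perf_counter() - start


def benchmark(query_fn, sql: str, sessions: int, queries_per_session: int) -> dict:
    """Issue the same query with up to `sessions` requests in flight at once."""
    total = sessions * queries_per_session
    wall_start = time.perf_counter()
    # One worker thread per simulated session; the pool hands out queries
    # until all of them have completed.
    with ThreadPoolExecutor(max_workers=sessions) as pool:
        latencies = list(pool.map(lambda _: timed(query_fn, sql), range(total)))
    wall = time.perf_counter() - wall_start
    latencies.sort()
    return {
        "sessions": sessions,
        "queries": total,
        "qps": total / wall,
        "median_s": statistics.median(latencies),
        "p95_s": latencies[min(total - 1, int(0.95 * total))],
    }


if __name__ == "__main__":
    # Compare a single session against progressively higher concurrency.
    for n in (1, 10, 100):
        print(benchmark(run_query, "SELECT 1", sessions=n, queries_per_session=5))
```

Running the same query set at 1, 10, and 100 sessions and comparing throughput and tail latency shows how a system behaves as concurrency grows, which a single-session benchmark cannot reveal.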
Driving Greater Analytical Database Performance Through Open Source
The success of ClickBench is a testament to the power of open source, both in how it brings together developer communities like StarRocks to take on new challenges, and in how it fosters open competition and innovation between projects. On a similar note, earlier this month, the StarRocks project was donated to the Linux Foundation, and we are sure the project will grow even faster in its new home.
Hats off to ClickHouse. It's an honor to compete with them. It pushes StarRocks to be a better project.
What do you think of StarRocks' latest achievement on ClickBench? Join the StarRocks Slack and share your thoughts.