
About the Author (Kim Byung Ju)

The author currently works at CloudShift and has a strong passion for open-source and cloud-native data technologies. Having supported numerous clients with data consulting and implementation projects across various industries, the author also actively runs the StarRocks community.


 

Haezoom's data engineering team leads Korea's renewable energy shift through Virtual Power Plant operations, specializing in solar energy. Haezoom supports over 6,000 power stations and 2.3 million users, managing a vast ecosystem of solar data for accurate generation forecasting and energy market optimization.

As Haezoom's workloads grew, Apache Druid became costly and inefficient, limiting query flexibility and scalability. Partnering with CloudShift, Haezoom migrated to StarRocks, achieving 1.74x higher throughput, 44% faster queries, 4x gains for complex analytics, and 30% lower infrastructure costs.

 

About CloudShift

CloudShift was founded in 2020 and has been providing expert consulting services in Korea across various industries for the past five years. With deep expertise in modern data platform technologies, CloudShift supports comprehensive cloud and on-premises implementations. 

Through their strategic partnership with CelerData, they’re delivering cutting-edge data solutions to their clients.

 

Original Architecture With Apache Druid

To handle large-scale IoT data and deliver real-time analytics, Haezoom had been leveraging Apache Druid as its core time-series database. While Druid enabled fast queries and real-time bidding, the rapid growth of Haezoom's platform revealed architectural limitations that called for a transformation.

Figure: Haezoom's original Apache Druid architecture

Resource Inefficiency

While ingesting various data types through its Confluent Kafka + Druid stack, Haezoom discovered that not all data fit well into streaming pipelines. Some data sources delivered updates only hourly, yet Druid's peon processes consumed computing resources continuously throughout the day, regardless of actual data ingestion frequency. This mismatch between data arrival patterns and resource allocation led to significant waste.

 

High Infrastructure Costs

Running Apache Druid on AWS EKS with high availability requirements—including standby master pods, Zookeeper ensemble, and deep storage—resulted in considerable infrastructure overhead. The complexity of maintaining these components for reliability came at a steep financial cost that increasingly strained Haezoom’s operational budget.

 

Query Performance Bottlenecks

Certain analytical queries placed excessive load on Haezoom’s Druid cluster, requiring far more resources than typical operations. This forced them to scale up the entire cluster size to handle peak loads, even though most queries consumed minimal resources. This all-or-nothing scaling approach proved both costly and inefficient.

 

Limited Data Source Integration

Building a unified data platform became increasingly challenging as Haezoom needed to integrate multiple data sources such as S3, PostgreSQL, and Kafka. Druid's architecture constraints made it difficult to create a seamless data ecosystem that could efficiently handle diverse data sources and formats.

 

Complex Query Requirements

As the requirements of Haezoom's AI team grew more sophisticated, the need to minimize JOIN and WITH clauses in Druid queries became a clear limitation. Fully preprocessing all data before ingestion to avoid these operations incurred significant computational cost and development overhead, making the approach unsustainable for their evolving analytical needs.
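
To make the pain point concrete, here is a minimal sketch of the kind of CTE-plus-JOIN analysis involved, written against a hypothetical schema: the generation_readings and stations tables and every column name are illustrative assumptions, not Haezoom's actual model. With Druid, logic like this had to be pushed into upstream preprocessing; an engine with full JOIN and WITH support can run it directly over the raw tables.

```python
# Sketch only: a CTE + JOIN query of the kind that previously had to be
# denormalized upstream. Schema, host, and credentials are placeholders.
import pymysql

# StarRocks speaks the MySQL wire protocol; the FE query port is 9030 by default.
conn = pymysql.connect(host="starrocks-fe.example.internal", port=9030,
                       user="analyst", password="...", database="solar",
                       autocommit=True)

QUERY = """
WITH hourly_output AS (                          -- aggregate raw readings per station and hour
    SELECT station_id,
           date_trunc('hour', reading_ts) AS hour_ts,
           SUM(generated_kwh)             AS kwh
    FROM generation_readings
    WHERE reading_ts >= '2024-01-01'
    GROUP BY station_id, date_trunc('hour', reading_ts)
)
SELECT s.region,
       h.hour_ts,
       SUM(h.kwh)                      AS region_kwh,
       SUM(h.kwh) / SUM(s.capacity_kw) AS hourly_capacity_factor  -- join brings in station metadata
FROM hourly_output h
JOIN stations s ON s.station_id = h.station_id
GROUP BY s.region, h.hour_ts
ORDER BY s.region, h.hour_ts
"""

with conn.cursor() as cur:
    cur.execute(QUERY)
    for region, hour_ts, region_kwh, capacity_factor in cur.fetchmany(10):
        print(region, hour_ts, region_kwh, capacity_factor)

conn.close()
```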

 

Why Haezoom Chose StarRocks Over Druid: A Feature-Based Evaluation

These limitations across cost, scalability, and query flexibility prompted Haezoom to seek an alternative—and that search ultimately led them to StarRocks. As Haezoom explored StarRocks, they found that its architecture directly addressed these pain points—offering higher efficiency, lower costs, and more flexible query capabilities out of the box.

| Apache Druid Feature Requirements | Feasibility for Migration | Expected Benefits with StarRocks | Implementation Strategy |
|---|---|---|---|
| Real-time Ingestion (Kafka) | Low feasibility if only performing real-time ingestion | Increased efficiency for cost-effective real-time ingestion | Resource-efficient periodic ingestion using Routine Load |
| Real-time Ingestion (Kinesis) | Low (StarRocks doesn't support native ingestion) | | Requires transition to Flink or StarRocks Pipe |
| Batch Ingestion | High: supports various batch ingestion methods | Various ingestion strategies available with INSERT statement support | Loading |
| Real-time Query (Denormalized) | Low feasibility for high real-time requirements, but StarRocks supports all necessary functions | | |
| SQL API | | Additional complex ANSI SQL support | |
| Join Queries | High: Druid has MSQ as a new feature, but it's inefficient due to MiddleManager usage | Stable execution of join queries | |
| Long-term Report Queries | High: Druid has MSQ as a new feature, but it's inefficient due to MiddleManager usage | Stable execution of join queries | |
| Data Lake Export | High: Druid only supports CSV export via the MSQ feature | Support for various export formats | Supports various features, including External Catalog |
| Data Lake Federation | High: Druid has released an early version in this area, but has architectural inefficiencies | Support for various federation environments | Supports various features, including External Catalog |
| BI Connection | High: Druid supports the Avatica JDBC driver | StarRocks can use the MySQL connector driver | |
| (Imply) Pivot: Real-time BI | Requires in-depth analysis: if business importance is high, alternatives should be considered | Metabase can be used, but requires a paid license for certain features | |

Haezoom partnered with CloudShift to migrate its data infrastructure to StarRocks, leveraging CloudShift's expertise in modern data platforms to ensure a seamless transition.

 

Migrating From Apache Druid to StarRocks

Much of Haezoom's data resides in Confluent (Kafka) and S3, so for the data migration Haezoom decided to load directly from those sources. For any data that must be moved out of Druid itself, they are considering exporting it via MSQ to S3 and then ingesting it into StarRocks from there.
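
As a rough sketch of that S3 path, the snippet below loads a Parquet export into a StarRocks table with INSERT ... SELECT over the FILES() table function, assuming a recent StarRocks release that provides it; the bucket, table, region, and credentials are placeholders rather than Haezoom's actual configuration.

```python
# Sketch only: ingest an MSQ export (Parquet files on S3) into StarRocks.
# Bucket, table names, region, and credentials below are placeholders.
import pymysql

conn = pymysql.connect(host="starrocks-fe.example.internal", port=9030,
                       user="etl", password="...", database="solar",
                       autocommit=True)

LOAD_SQL = """
INSERT INTO generation_readings
SELECT *
FROM FILES(
    "path"              = "s3://example-bucket/druid-export/generation/*.parquet",
    "format"            = "parquet",
    "aws.s3.region"     = "ap-northeast-2",
    "aws.s3.access_key" = "<access-key>",
    "aws.s3.secret_key" = "<secret-key>"
)
"""

with conn.cursor() as cur:
    cur.execute(LOAD_SQL)   # synchronous INSERT; returns once the load finishes

conn.close()
```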

  • Deploy StarRocks on EKS

    • Determine optimal cluster size and recommended parameters based on current workload patterns and expected growth

    • Configure appropriate node types, storage classes, and resource allocation for FE (Frontend) and BE (Backend) components

    • Provide current status, including Druid cluster specifications, data volume metrics, and query patterns for accurate resource provisioning.

  • Ingest Data from Kafka via Routine Load and Stream Load (see the ingestion sketch after this list)

    • Implement efficient data ingestion pipelines using StarRocks' native Kafka connector to minimize resource usage and latency

    • Configure appropriate parallelism, batch sizes, and commit intervals for optimal throughput

    • Share comprehensive topic and table name mappings, including partition strategies and data retention policies

  • Test and Transition SQL API Compatibility

    • Validate existing SQL syntax compatibility by running current Druid SQL queries against StarRocks

    • Conduct comprehensive performance benchmarking, comparing query execution times, resource utilization, and concurrency handling

    • Document any SQL syntax modifications required and create migration scripts for seamless transition

  • Connect BI Tools

    • Integrate with Metabase using StarRocks' MySQL protocol compatibility for dashboard migration

    • Enable Python client connectivity through SQLAlchemy and PyMySQL for Haezoom’s data science workflows

    • Test and validate all existing visualizations and ensure consistent data accuracy

  • Parallel Operation and Druid Decommissioning

    • Run both systems in parallel for up to two weeks to ensure data consistency and system stability

    • Implement data validation checks and monitoring to compare outputs between systems

    • Execute phased cutover plan before safely shutting down Druid cluster and reclaiming resources
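
Below is a minimal sketch of the ingestion step referenced above: connecting to the StarRocks FE over the MySQL protocol with PyMySQL and submitting a Routine Load job that continuously pulls a Kafka topic into a table. The host, topic, table, and tuning values are illustrative assumptions, not Haezoom's production settings.

```python
# Sketch: create a Routine Load job that continuously consumes a Kafka topic
# into a StarRocks table. All names, brokers, and tuning values are placeholders.
import pymysql

# StarRocks exposes the MySQL wire protocol on the FE (default query port 9030),
# which is also how BI tools and Python clients connect.
conn = pymysql.connect(host="starrocks-fe.example.internal", port=9030,
                       user="etl", password="...", database="solar",
                       autocommit=True)

# desired_concurrent_number controls parallel consume tasks;
# max_batch_interval (seconds) controls how often tasks are scheduled.
ROUTINE_LOAD_SQL = """
CREATE ROUTINE LOAD solar.load_generation_readings ON generation_readings
PROPERTIES (
    "format"                    = "json",
    "desired_concurrent_number" = "3",
    "max_batch_interval"        = "30"
)
FROM KAFKA (
    "kafka_broker_list"              = "broker-1:9092,broker-2:9092",
    "kafka_topic"                    = "generation-readings",
    "property.group.id"              = "starrocks_generation_readings",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
)
"""

with conn.cursor() as cur:
    cur.execute(ROUTINE_LOAD_SQL)
    # Check the job state, consumption lag, and error counters.
    cur.execute("SHOW ROUTINE LOAD FOR solar.load_generation_readings")
    print(cur.fetchall())

conn.close()
```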

 

Results from the Migration (vs Apache Druid)

Resource Efficiency

By leveraging StarRocks' Routine Load feature, Haezoom resolved the resource inefficiency described earlier: data is now consumed on demand based on actual arrival patterns, eliminating the continuous resource consumption of Druid's peon processes that ran regardless of ingestion frequency.


 

1.74x Higher Throughput

Haezoom’s new architecture now handles 1.74 times more requests than the previous Druid setup, enabling Haezoom to support its rapidly growing user base of 2.3 million users.

 

44.3% Faster Average Response Time

Query performance improved dramatically with average response times decreasing by 44.3%, providing near-instantaneous insights for Haezoom’s solar energy forecasting models.

 

4x Performance Gain for Complex Queries

For specific complex analytical queries that previously bottlenecked Haezoom’s system, they achieved up to 4x performance improvements, enabling their AI team to run sophisticated models without preprocessing overhead.

 

30% Infrastructure Cost Reduction

Through optimized resource utilization and elimination of unnecessary standby components, Haezoom reduced its overall data infrastructure costs by 30%.

 

Future Plans

  1. Performance Optimization: Fine-tuning query execution plans and resource allocation strategies to maximize throughput and minimize latency for Haezoom’s time-series solar energy data.

  2. Data Lake Integration: Building a comprehensive Data Lake architecture using Apache Iceberg and implementing External Catalogs to seamlessly integrate diverse data sources, including S3, PostgreSQL, and streaming platforms (see the catalog sketch after this list).

  3. Self-Service Analytics: Establishing a self-service analytics environment that empowers Haezoom’s data analysts and scientists to independently explore, query, and derive insights without engineering bottlenecks.
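
As a sketch of the External Catalog step in the Data Lake plan above, the snippet below registers an Iceberg catalog backed by S3 and a JDBC catalog for PostgreSQL, then queries the lake data in place. Metastore URIs, credentials, driver locations, and object names are placeholder assumptions, not Haezoom's actual environment.

```python
# Sketch: register external catalogs so StarRocks can query data in place.
# All endpoints, credentials, and object names are placeholders.
import pymysql

conn = pymysql.connect(host="starrocks-fe.example.internal", port=9030,
                       user="admin", password="...", autocommit=True)

# Iceberg tables on S3, tracked by a Hive metastore.
ICEBERG_CATALOG_SQL = """
CREATE EXTERNAL CATALOG iceberg_lake
PROPERTIES (
    "type"                 = "iceberg",
    "iceberg.catalog.type" = "hive",
    "hive.metastore.uris"  = "thrift://metastore.example.internal:9083",
    "aws.s3.region"        = "ap-northeast-2",
    "aws.s3.access_key"    = "<access-key>",
    "aws.s3.secret_key"    = "<secret-key>"
)
"""

# Operational PostgreSQL exposed through a JDBC catalog.
POSTGRES_CATALOG_SQL = """
CREATE EXTERNAL CATALOG pg_operational
PROPERTIES (
    "type"         = "jdbc",
    "user"         = "readonly",
    "password"     = "<password>",
    "jdbc_uri"     = "jdbc:postgresql://pg.example.internal:5432/operations",
    "driver_url"   = "https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.3/postgresql-42.7.3.jar",
    "driver_class" = "org.postgresql.Driver"
)
"""

with conn.cursor() as cur:
    cur.execute(ICEBERG_CATALOG_SQL)
    cur.execute(POSTGRES_CATALOG_SQL)
    # Query lake data in place, without ingesting it into StarRocks first.
    cur.execute("SELECT COUNT(*) FROM iceberg_lake.analytics.generation_history")
    print(cur.fetchone())

conn.close()
```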

 

Curious to learn more about how StarRocks handles complex JOINs and other analytics challenges? Join the StarRocks Slack community to connect with us and explore further!