StarRocks 4.0 Now Available
StarRocks 4.0 is here! Unveiled at the StarRocks Global Summit, version 4.0 marks the next major milestone in the StarRocks journey. In just five years, the open-source project has reshaped the analytics landscape with a relentless focus on speed, simplicity, and scale.
- 1.0 made on-the-fly joins possible without denormalization
- 2.0 brought mutable data to real-time analysis
- 3.0 re-architected with a shared-data architecture for elasticity and the cloud
Now 4.0 is more open, faster, and better governed. This release combines open data lake flexibility with even greater performance and unified governance built for enterprises.
StarRocks 4.0 at a Glance
- ~60% faster performance YOY with deep JOIN, aggregation, and spill optimizations.
- First-class Apache Iceberg support delivers faster Iceberg metadata parsing, hidden partition read/write support, a compaction API, and optimized file writes.
- JSON as a first-class type ensures 3–15× faster queries without flattening.
- Real-time at lower cost: file bundling, metadata caching, and smarter compaction together reduce cloud API calls by up to 90%.
- Lakehouse governance with catalog-centric access control on Apache Iceberg.
- Expanded workloads through Decimal256 for high-precision, large-scale aggregations; multi-statement transactions for financial and multi-stage pipelines; ASOF JOIN for time-series and AI use cases.
- Operational improvements include node blacklisting, case-insensitive names, and global connection IDs.
Extreme Performance, Evolved Further
Performance is in StarRocks’ DNA. Version 4.0 takes it to the next level, delivering higher performance and greater consistency across a wider range of workloads.
Core operator optimizations
Over the last year, we have continuously improved StarRocks' performance for both internal tables and external catalog-managed tables. JOIN, aggregation, count distinct, and spill handling have all been deeply tuned, delivering 1.6× faster performance on a TPC-DS 1TB benchmark year over year.
This performance can be attributed to core improvements in the operators that define query execution:
JOIN operations just got significantly faster. Hash joins and merge joins can handle complex multi-table queries with less memory overhead and better parallelization. The engine automatically selects the most efficient join strategy for each query, requiring no manual tuning and working seamlessly across both internal and external catalogs.
Aggregations use less CPU. COUNT DISTINCT, GROUP BY, and similar operations now use global dictionaries and optimized hash tables. The new partition-wise spillable aggregate operators reduce overhead in high-cardinality scenarios, and string-heavy aggregations that previously dominated CPU time now run as lightweight integer operations through dictionary encoding.
Spill handling doesn't kill throughput. When queries exceed memory limits, partition-wise spill mechanisms prevent out-of-memory errors (OOMs) without sacrificing speed. This approach minimizes disk I/O and keeps queries performing reliably under memory pressure.
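For illustration, spilling can also be steered per session. The sketch below assumes the existing spill session variables (enable_spill, spill_mode) and a hypothetical clickstream table; variable names and defaults may differ by version.

```sql
-- Assumed session variables for spill control; check your version's docs.
SET enable_spill = true;
SET spill_mode = 'auto';

-- A high-cardinality aggregation over a hypothetical table; with spilling
-- enabled, partitions that exceed the memory limit spill to disk instead of
-- failing with an OOM error.
SELECT user_id, COUNT(DISTINCT session_id) AS sessions
FROM clickstream
GROUP BY user_id;
```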
First-class JSON Support
JSON querying and data manipulation now offer performance similar to native columnar storage, powered by the new Flat JSON V2 engine. Queries run 3–15× faster than in version 3.5, with no flattening or pipeline changes required, making JSON practical for real-time applications such as logs, clickstreams, and user profiles.
Flat JSON V2 addresses the limitations of earlier JSON handling by optimizing the execution layer:
- Zone map indexes skip irrelevant data blocks before scanning starts
- Global dictionaries convert string operations into integer comparisons, with no character-by-character matching
- Late materialization only decodes rows that survive filtering, eliminating wasted CPU on filtered-out data
Simply load JSON and query it like structured data; the flexibility of JSON no longer comes with a performance penalty.
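A minimal sketch, assuming a hypothetical events table with a JSON attrs column; the native JSON column type and get_json_string are existing StarRocks features, while the table, columns, and paths are illustrative.

```sql
-- Hypothetical table: raw events keep their payload as a native JSON column.
CREATE TABLE events (
    event_time DATETIME,
    user_id    BIGINT,
    attrs      JSON        -- clickstream payload stored as-is, no flattening
)
DUPLICATE KEY (event_time, user_id);

-- Query JSON paths directly; Flat JSON V2 applies zone maps, global
-- dictionaries, and late materialization under the hood.
SELECT get_json_string(attrs, '$.page') AS page,
       COUNT(*) AS views
FROM events
WHERE get_json_string(attrs, '$.device') = 'mobile'
GROUP BY get_json_string(attrs, '$.page');
```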
Predictable Performance with SQL Plan Manager (Preview)
Fast, on average, isn't enough. Production systems need consistent latency. However, the query optimizer can be fragile, as data updates may cause the optimizer to select unexpected execution plans.
SQL Plan Manager lets you bind a query to a known, reliable plan. Even as data evolves or statistics update, the query uses the plan you validated. This prevents plan degradation without modifying SQL or adding query hints, so your plans stay locked even when data volumes or cluster states change.
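As a sketch of the workflow only: the statements below assume the baseline-plan DDL described in the SQL Plan Manager preview, and the tables and join hint are illustrative; consult the feature documentation for the exact syntax in your version.

```sql
-- Bind a query shape to a validated plan (here, one with an explicit join
-- hint); matching queries reuse this plan even after statistics change.
CREATE BASELINE PLAN
FOR SELECT o.user_id, SUM(o.amount)
    FROM orders o JOIN users u ON o.user_id = u.id
    GROUP BY o.user_id
USING SELECT o.user_id, SUM(o.amount)
      FROM orders o JOIN [BROADCAST] users u ON o.user_id = u.id
      GROUP BY o.user_id;

-- List and drop baselines as plans are revalidated (statement names may vary).
SHOW BASELINE;
```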
Apache Iceberg as a First-Class Citizen
Since 2.0, StarRocks has supported open formats like Apache Iceberg, but raw data lakes can be messy: tiny files, fragmented partitions, and stale metadata slow queries down.
With the 4.0 release, StarRocks brings warehouse-grade discipline to the lakehouse, so querying formats like Apache Iceberg feels closer than ever to working with a native StarRocks table.
File/Storage: Write Once, Query Immediately
StarRocks 4.0 delivers comprehensive enhancements to file writing and management, improving not only ingestion performance but also ensuring that written data is immediately optimized for querying:
- Apache Iceberg feature support:
  - Full support for creating and writing to Iceberg hidden partition tables (see the sketch after this list).
  - Supports defining sort keys at table creation.
- Write performance improvements:
  - Global shuffle avoids generating small files when the number of partitions is high.
  - Spillable writes avoid OOMs during large-scale ingestion.
  - Local sort generates files better suited for query performance.
- File compaction: A new Compaction API lets users merge files on demand, ensuring query efficiency over time.
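A sketch under assumptions: it presumes an Iceberg catalog named iceberg_catalog is already configured and that hidden partitions are declared with transform-style PARTITION BY syntax; the table, columns, and staging source are illustrative.

```sql
-- Create an Iceberg table with a hidden partition derived from the timestamp;
-- readers and writers never reference a partition column explicitly.
CREATE TABLE iceberg_catalog.sales_db.orders (
    order_id BIGINT,
    user_id  BIGINT,
    amount   DECIMAL(18, 2),
    order_ts DATETIME
)
PARTITION BY (day(order_ts));

-- Writes land in the correct hidden partitions automatically; global shuffle,
-- spillable writes, and local sort apply during ingestion.
INSERT INTO iceberg_catalog.sales_db.orders
SELECT order_id, user_id, amount, order_ts
FROM staging_orders;
```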
Query: Stable and Accelerated
Even with compaction, Iceberg tables are often large and fragmented, with incomplete or outdated statistics. Traditional query engines struggle when stats are stale or missing.
StarRocks 4.0 handles this by delivering:
- Optimizer improvements generate cost-efficient plans even when table statistics are missing or stale, so queries no longer degrade after partition changes.
- Faster statistics collection that's lighter-weight and less intrusive, gathering what's needed without scanning entire tables unnecessarily.
- Smarter metadata refresh keeps catalog information current without constant polling or expensive full scans.
- Optimized metadata parsing avoids redundant work: COUNT/MIN/MAX queries skip data file scans entirely when metadata already contains the answer, turning what used to be full table scans into metadata-only operations, as the example below shows.
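For example, reusing the hypothetical Iceberg table from the earlier sketch, an aggregate like the one below can be answered from manifest-level statistics alone when they already contain the result.

```sql
-- Answered from Iceberg metadata (row counts and column min/max) when
-- available, without scanning any data files.
SELECT COUNT(*)      AS order_count,
       MIN(order_ts) AS first_order,
       MAX(order_ts) AS last_order
FROM iceberg_catalog.sales_db.orders;
```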
Governance: Catalog-Centric Access Control
StarRocks 4.0 introduces JWT-based Session Catalog integration for the Iceberg REST Catalog and supports temporary credential vending across AWS, GCP, and Azure.
This means user identities are passed end-to-end for unified authorization at the catalog level, and storage credentials no longer need to be hardcoded or repeatedly configured.
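A minimal sketch of defining an Iceberg REST catalog: the basic type and URI properties follow the existing external-catalog syntax, while the JWT and credential-vending keys are left as a placeholder comment because their exact names are documented with the feature.

```sql
CREATE EXTERNAL CATALOG iceberg_rest
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "iceberg.catalog.uri"  = "https://rest-catalog.example.com/api"
    -- plus the JWT / vended-credential properties described in the docs
);
```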
Real-Time Analytics, Without the Overhead
Object storage like S3 may be cheap, but frequent file writes and metadata lookups quickly drive up API costs. StarRocks 4.0 delivers end-to-end optimizations that make real-time analytics fast, stable, and cost-efficient.
- File bundling: Small writes are automatically packed into larger files to reduce object count and eliminate write amplification.
- Metadata caching: StarRocks now serves metadata from backend memory whenever possible, minimizing S3 API calls.
- Smarter compaction: Keeps data organized for queries without over-consuming resources. Compared to version 3.3, cloud API calls are reduced by 70%–90% without compromising data freshness or latency, and in some cases queries run even faster than before.
Expand into More Demanding Workloads
Some workloads demand more than just speed. StarRocks 4.0 adds new capabilities purpose-built for finance, payments, Web3, IoT, and other high-stakes environments:
High-precision Decimal256: Support for 256-bit decimal arithmetic ensures your aggregations don't lose precision even at large scale. It's ideal for currency settlement, trade reconciliation, and risk modeling, where rounding errors aren't acceptable.
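A brief sketch, assuming that declaring a DECIMAL precision beyond 38 digits maps onto the 256-bit type; the table and columns are hypothetical.

```sql
-- Hypothetical settlement table with a high-precision amount column.
CREATE TABLE settlements (
    account_id BIGINT,
    amount     DECIMAL(50, 18)   -- 50 significant digits, 18 after the point
);

-- Aggregations keep full precision even across very large datasets.
SELECT account_id, SUM(amount) AS net_position
FROM settlements
GROUP BY account_id;
```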
Multi-statement transactions: New support for BEGIN, COMMIT, and ROLLBACK allows atomic operations across multiple tables, simplifying complex data pipelines and ensuring analytical consistency.
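For example, with hypothetical payments and ledger tables, a batch can be written atomically so downstream queries never see a partial load:

```sql
BEGIN;
INSERT INTO payments SELECT * FROM staging_payments WHERE batch_id = 42;
INSERT INTO ledger   SELECT * FROM staging_ledger   WHERE batch_id = 42;
COMMIT;
-- On failure, ROLLBACK discards every statement in the transaction.
```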
ASOF JOIN for time-series: StarRocks now supports ASOF JOIN based on timestamps or sequence IDs. Whether aligning quotes with trades or synchronizing IoT data streams from multiple sources, ASOF JOIN delivers high-performance processing.
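A sketch with hypothetical trades and quotes tables; the ON clause pairs an equality key with a time inequality, following the common ASOF JOIN form for the timestamp and sequence-ID cases the release describes.

```sql
-- For each trade, pick the latest quote at or before the trade timestamp.
SELECT t.symbol,
       t.trade_ts,
       t.price AS trade_price,
       q.bid,
       q.ask
FROM trades t
ASOF JOIN quotes q
    ON t.symbol = q.symbol
   AND t.trade_ts >= q.quote_ts;
```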
Operational Improvements and Ease of Use
Running StarRocks should be as smooth as querying it. 4.0 adds practical upgrades that simplify operations at scale:
- Node blacklisting: Automatically (or manually) isolate unstable compute/backend nodes so the scheduler avoids them, improving reliability during faults or maintenance.
- Case-insensitive identifiers: An optional setting treats database/table/view names case-insensitively, easing migrations and reducing tool-chain mismatches.
- Global connection IDs: A cluster-wide ID per session propagated through logs and profiles, making distributed debugging and tracing straightforward.
Conclusion
StarRocks 4.0 builds on the core strengths of StarRocks, namely performance, scalability, and governance, while extending them into a new era of openness, intelligence, and enterprise readiness.
From seamless Iceberg integration to first-class JSON support and real-time analytics at lower cost, this release empowers teams to analyze more data, faster, with greater control than ever before.
Join the Release Webinar to see a deep dive into StarRocks 4.0 and explore how these capabilities can power your low-latency, high-concurrency workloads. [Register here]