StarRocks 4.1: Built for Production, Designed to Simplify

StarRocks is recognized for high-performance analytics, offering fast queries, real-time ingestion, and a unified architecture for both data lakehouse and data warehouse workloads. However, speed alone does not ensure production stability. Teams operating StarRocks at scale, especially for customer-facing, multi-tenant analytics, frequently encounter recurring operational challenges:
- Static distribution at creation: users must define partitioning, bucketing, and key settings upfront, often without sufficient insight into future access patterns.
- Data skew amplifies silently: a single large tenant can funnel all traffic into a small number of tablets, creating cascading instability that is difficult to diagnose.
- Remediation is costly: correcting distribution errors often requires data re-ingestion, schema redesign, or both.
- Expert knowledge is required: reasoning about colocation, bucket counts, and distribution keys demands deep system knowledge that most teams lack.

These are not rare occurrences. They are the most frequently reported issues from long-term user deployments and proofs of concept (PoCs). Small mistakes at table creation often escalate into significant operational problems as workloads increase.
StarRocks 4.1 directly addresses these challenges. Key features include:
- Automatic multi-tenant data management: opt-in via enable_range_distribution. Once enabled, the system dynamically splits tablets along sort-key ranges as the data shape evolves, with no schema changes, SQL changes, or data re-ingestion. Tablet merge ships in a subsequent release.
- Large-capacity tablets: tablets can safely grow to approximately 30 GB in shared-data mode, significantly reducing tablet count and FE metadata pressure.
- Fast Schema Evolution v2 in shared-data: second-level DDL operations that are independent of tablet or partition count.
- Cache observability and latency control: end-to-end cache metrics at cluster, table, partition, and query levels, integrated into audit logs and SQL profiles.
- Iceberg v2/v3 enhancements: deeper compatibility, improved partition evolution, and incremental materialized views on Iceberg tables.
- Additional capabilities: recursive CTEs, improved skew join v2, expanded window functions, inverted index on shared-data architecture, and more.
Note: For a complete list of features, please refer to the release notes: https://docs.starrocks.io/releasenotes/release-4.1/
Automatic Multi-Tenant Data Management
StarRocks 4.1 eliminates the need to predict data distribution at table creation by introducing a system that adapts dynamically as workloads evolve. This is the release’s most significant usability and operability improvement.
Under the hood, 4.1 introduces tablet-level automatic splitting in shared-data mode. Tablets are no longer static units; when a tablet exceeds a size threshold, the system automatically splits it into smaller tablets. Range-based organization derived from sort keys determines split boundaries and preserves query correctness.
By design, this mechanism operates entirely at the metadata layer. It does not require changes to the table schema, user SQL, or data re-ingestion.
In practice, table creation is significantly simpler. Teams no longer need to over-engineer DISTRIBUTED BY configurations or anticipate tenant growth patterns far in advance. The system automatically corrects imbalances as they arise, removing the need for manual intervention.
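As a rough illustration, here is a minimal sketch of what table creation can look like under this model. The table name is hypothetical, and whether enable_range_distribution is set as a table property (as shown) or as a session or cluster setting may differ from the actual release surface:
```sql
-- Hedged sketch: enable_range_distribution is the opt-in flag named above;
-- its exact placement (table property vs. session setting) is an assumption.
CREATE TABLE tenant_events (
    tenant_id  BIGINT,
    event_time DATETIME,
    payload    VARCHAR(1024)
)
DUPLICATE KEY (tenant_id, event_time)   -- sort key; drives range split boundaries
PROPERTIES (
    "enable_range_distribution" = "true"
);
-- Note: no DISTRIBUTED BY clause to over-engineer; tablets split
-- automatically along (tenant_id, event_time) ranges as data grows.
```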
In benchmark testing on a 200 GB dataset with 32 concurrent threads, adaptive range-based distribution delivered 1.86× the throughput of static hash distribution, and P99 latency dropped from 36.6 seconds to 11.5 seconds. For a detailed breakdown of the benchmark, see the separate feature deep-dive blog post.
Large-Capacity Tablets Eliminate Single-Bucket Hotspots
Many severe production issues stem from small tablets combined with rigid distribution rules. When partitions have single buckets or skewed keys, traffic concentrates on a few tablets, creating hotspots that lead to query timeouts and ingestion failures.
StarRocks 4.1 introduces the initial phase of large-capacity tablet support:
- Approximately 30 GB tablet capacity: tablets in shared-data mode can safely grow to about 30 GB, a significant increase over previous limits.
- Reduced tablet count per table: fewer tablets lower FE metadata pressure and reduce scheduling overhead across the cluster.
- Validated ingestion and compaction paths: this phase prioritizes stability and correctness, validating behavior under production conditions before further capacity expansion.
Large-capacity tablets represent a foundational change in how StarRocks manages data distribution. They are designed to reduce sensitivity to bucket configuration and mitigate the single-bucket hotspot pattern that has caused production incidents.
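For operators who want to verify the effect, tablet layout can be inspected with the existing SHOW TABLET statement; the table name below is hypothetical, and the columns reported in shared-data mode may vary:
```sql
-- Inspect a table's tablets to confirm the reduced tablet count
-- (existing StarRocks statement; shared-data output columns may vary).
SHOW TABLET FROM sales_db.tenant_events;
```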
Fast Schema Evolution Without the Operational Tax
Production systems must continuously evolve schemas; new columns, changed types, and adjusted partitioning are routine operations. However, when DDL operations scale linearly with table size, tablet count, or partition count, even simple changes become operationally prohibitive.
Fast Schema Evolution v2 in shared-data builds on Fast Schema Evolution (introduced in StarRocks 3.2) to deliver second-level schema evolution for supported operations. DDL performance is decoupled from tablet-level metadata rewrites, bringing shared-data DDL behavior closer to the speed teams expect from shared-nothing architectures.
This feature does not provide instant re-sharding or zero-cost distribution changes. Instead, it focuses on reducing the cost of safe, iterative schema corrections, allowing teams to adapt data models without scheduling maintenance windows.
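As a concrete example, this is the kind of routine change the feature covers, using standard StarRocks DDL on a hypothetical table; per the description above, the operation completes in seconds regardless of tablet or partition count:
```sql
-- Standard StarRocks DDL; with Fast Schema Evolution v2 in shared-data,
-- adding a column is decoupled from tablet-level metadata rewrites and
-- completes in seconds regardless of tablet or partition count.
ALTER TABLE tenant_events ADD COLUMN region VARCHAR(64) DEFAULT "unknown";
```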
Cache Observability Makes Latency Predictable
In shared-data architectures, query latency is driven not just by execution efficiency but by data access paths and cache behavior. Without strong observability, cache effects amplify tail latency, introduce variance across replicas, and make production behavior difficult to reason about, especially under concurrency.
StarRocks 4.1 delivers a comprehensive set of cache and latency improvements:
- Per-query and cluster-level cache visibility: cache hit ratio surfaced in audit logs (per query), Prometheus BE metrics (cluster-wide), and SQL profile I/O counters.
- Audit log and SQL profile integration: cache metrics are embedded directly in existing diagnostic workflows, not siloed in a separate monitoring system.
- Cache replication foundations: groundwork for controlled cache warmup across compute replicas, reducing cold-start latency after scaling events.
- Reduced cache amplification: less cache churn during compaction, migration, and failure recovery scenarios.
Together, these changes give operators the visibility needed to diagnose cache-driven latency variance, rather than absorbing it as background noise. The goal is to make latency behavior predictable and diagnosable in production, rather than focusing solely on peak benchmark results.
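A minimal diagnostic workflow sketch, built on profile statements that predate 4.1 (enable_profile, SHOW PROFILELIST, ANALYZE PROFILE); the exact names of the cache counters inside the profile are not specified here and may vary:
```sql
-- Enable per-query profiling for this session.
SET enable_profile = true;

-- Run the query whose latency is under investigation (hypothetical table).
SELECT tenant_id, count(*) FROM tenant_events GROUP BY tenant_id;

-- List recent query profiles, then fetch one by its query ID; per 4.1,
-- cache hit counters surface in the profile's I/O section (exact counter
-- names may vary).
SHOW PROFILELIST;
ANALYZE PROFILE FROM '<query_id>';
```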
Deeper Iceberg Integration
StarRocks 4.1 introduces a set of substantial lake analytics enhancements, each addressing gaps that previously required external tools or workarounds.
- Native SQL DELETE on Iceberg tables: StarRocks can now execute DELETE as a distributed SQL operation directly on Iceberg tables, producing standard V2 position delete files with atomic snapshot commits (see the sketch after this list). Teams no longer need Spark jobs or custom scripts to correct data, enforce retention policies, or propagate CDC deletes.
- Iceberg Variant type support: Iceberg V3 introduces Variant, a structured binary format that balances schema flexibility and performance, replacing the tradeoff between slow JSON strings and rigid STRUCT columns. StarRocks 4.1 integrates Variant directly into its vectorized execution engine, resolving field access through offset-based binary lookup and eliminating the repeated parsing overhead that caused JSON-as-STRING performance to degrade under concurrency.
- Incremental Materialized Views on Iceberg: Traditional MV refresh scales with total table size; as tables grow, refresh slows, and acceleration goes stale. StarRocks 4.1 shifts from partition-level recomputation to version-range delta processing, so refresh cost tracks data change volume rather than data history. On a 100 GB benchmark, subsequent incremental refreshes ran 7–30x faster than partition-based refresh (also sketched below).
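Two of these capabilities in sketch form, using a hypothetical Iceberg catalog and table. The DELETE statement is standard SQL; the materialized view uses StarRocks' existing async MV syntax, with the version-range delta refresh applied by 4.1 automatically rather than through new keywords, as far as the description above indicates:
```sql
-- Native DELETE on an Iceberg table: runs as a distributed SQL operation
-- and writes standard V2 position delete files in an atomic snapshot commit.
-- (iceberg_catalog.sales.orders is a hypothetical name.)
DELETE FROM iceberg_catalog.sales.orders
WHERE order_date < '2023-01-01';  -- e.g., enforcing a retention policy

-- Materialized view over the same Iceberg table (existing async MV syntax);
-- in 4.1, refreshes process version-range deltas, so cost tracks change
-- volume rather than total table size.
CREATE MATERIALIZED VIEW orders_daily
REFRESH ASYNC
AS
SELECT order_date, count(*) AS order_cnt, sum(amount) AS revenue
FROM iceberg_catalog.sales.orders
GROUP BY order_date;
```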
Additional Capabilities Close Long-Standing Gaps
In addition to major architectural changes, version 4.1 includes targeted improvements that reduce daily friction for analytics development:
- Inverted index on shared-data (Beta): extends indexing capabilities to shared-data deployments with the built-in parser; additional analyzers are planned for subsequent releases.
- Improved skew join v2: enables statistics-based skew detection with histogram support and NULL-skew awareness, improving join performance on skewed data distributions.
- Recursive CTE support: enables hierarchical queries and graph traversal patterns directly in SQL (see the sketch after this list).
- Expanded window functions: includes support for ARRAY types in window expressions.
- USING clause for FULL OUTER JOIN: a small but meaningful ergonomic improvement for join-heavy queries.
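For instance, an org-hierarchy walk with a recursive CTE, assuming the standard WITH RECURSIVE syntax and a hypothetical employees table:
```sql
-- Hierarchical traversal with a recursive CTE (hypothetical employees table).
WITH RECURSIVE org_chart AS (
    -- Anchor member: top-level managers.
    SELECT id, name, manager_id, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: direct reports of the previous level.
    SELECT e.id, e.name, e.manager_id, o.depth + 1
    FROM employees e
    JOIN org_chart o ON e.manager_id = o.id
)
SELECT id, name, depth FROM org_chart ORDER BY depth, id;
```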
While individually smaller in scope, these enhancements significantly improve the daily developer experience and address capability gaps reported by customers and field teams.
Getting Started with StarRocks 4.1
StarRocks 4.1 is a release focused on making large-scale analytics systems safer to operate, easier to evolve, and more predictable in production. It addresses the most acute operational risks faced by customer-facing workloads — not by adding complexity, but by removing the decisions and configurations that caused problems in the first place. To get started, download StarRocks 4.1 or try CelerData’s managed offering to experience these improvements firsthand.
💬 Join the StarRocks Community
Have questions about StarRocks? Join the StarRocks community on Slack — it’s where the team and thousands of users discuss real-world deployments, troubleshooting, and best practices.