Announcing StarRocks Version 2.4
Publish date: Dec 7, 2022 11:26:10 AM
With the release of version 2.4, StarRocks has taken another big step in its mission to become the next-generation analytics platform, one that unifies all analytics query capabilities.
Thanks to the hard work of our community of contributors, StarRocks is now better than ever at delivering cost-efficient, industry-leading query performance and at helping its enterprise users extract insights from their massive stores of data to make better decisions faster.
New Features in Version 2.4
StarRocks version 2.4 (SR 2.4) introduces several new features that further enhance query performance while reducing data engineering costs.
Here's what's new:
Stateless Compute Nodes: In SR 2.4, we've introduced a new node type: Compute Node (CN), a stateless node that doesn't store data locally. CN represents a first step for StarRocks towards separating compute and storage resources in order to cut analytics costs in the cloud.
Compute Node Kubernetes Operator: Compute Nodes can also be deployed and managed using the StarRocks CN Kubernetes Operator. This lets customers manage StarRocks clusters with an Infrastructure-as-Code approach, which reduces DevOps effort and makes deployments more consistent and repeatable.
Table Formats Support: Starting with SR 2.4, StarRocks supports the commonly used lakehouse table formats Hudi and Iceberg. This feature enables StarRocks users to query the lakehouse without the need to create external tables, reducing application development work.
Multi-Table Materialized Views (MTMV): In SR 2.4, materialized views can now be created on top of multiple base tables and refreshed asynchronously. MTMV significantly simplifies data modeling in high-concurrency scenarios.
Enhanced Primary Key Model: StarRocks' Primary Key Model was specifically designed for real-time analytics. In SR 2.4, it has been further enhanced with the ability to flush VARCHAR-type primary key indexes to disk.
SR 2.4 is available now. Start enjoying these powerful new features today by visiting the StarRocks download page.
Learn More About These New Features
With every new update, there are always new questions. We've put together a more detailed overview of StarRocks' new features below to provide some additional insight into why these features were added and how they work.
Elastic Data Lake Analytics
The biggest savings opportunity lies in the time when the system is idle. In scenarios dominated by ad-hoc queries, such as data lake analytics, computing resources often stay allocated even when no workload is running, which wastes a great deal of compute. An architecture that scales easily is therefore valuable. Recognizing this, version 2.4 begins StarRocks' shift toward stateless elasticity. Compute Nodes and the StarRocks Operator are the two features supporting this transition.
In version 2.4, StarRocks introduces a new type of node: the Compute Node (CN). Like frontend (FE) and backend (BE) nodes, Compute Nodes can be scaled horizontally. Unlike BE nodes, however, CNs are a stateless compute service that can be deployed on Kubernetes. Because CNs don't store data, users do not need to redistribute data when scaling CNs out or in. CNs also offload part of the compute workload from BE nodes and can run computation (such as SCAN operations) directly on data in the data lake, so that data does not have to be ingested first.
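To make the scaling argument concrete, here is a toy Python model, not StarRocks code. It assumes a hypothetical stateful placement rule (tablet `t` is stored on node `t % num_nodes`) and counts how many tablets would have to be copied when the cluster grows. For stateless compute nodes, it simply spreads one query's scan ranges round-robin over whichever nodes currently exist, so there is nothing to move. All function names and the placement rule are illustrative assumptions.

```python
def stateful_placement(tablet_ids, num_nodes):
    # Hypothetical placement rule for stateful nodes: tablet t lives on node t % num_nodes.
    return {t: t % num_nodes for t in tablet_ids}


def tablets_to_move(tablet_ids, old_nodes, new_nodes):
    """Count tablets whose home node changes when a stateful cluster is resized.

    Each such tablet means physically copying data between machines.
    """
    before = stateful_placement(tablet_ids, old_nodes)
    after = stateful_placement(tablet_ids, new_nodes)
    return sum(1 for t in tablet_ids if before[t] != after[t])


def assign_scan_ranges(scan_ranges, compute_nodes):
    """Stateless nodes: spread this query's work round-robin over the current nodes.

    Nothing is stored on a compute node, so adding or removing one only
    changes how the next query is assigned; no data moves.
    """
    plan = {node: [] for node in compute_nodes}
    for i, scan_range in enumerate(scan_ranges):
        plan[compute_nodes[i % len(compute_nodes)]].append(scan_range)
    return plan


tablets = list(range(12))
print(tablets_to_move(tablets, old_nodes=3, new_nodes=4))  # 9
print(assign_scan_ranges([f"range-{i}" for i in range(5)], ["cn-1", "cn-2", "cn-3", "cn-4"]))
```

Real systems use smarter placement than a plain modulo and move fewer tablets, but a stateful design always has to copy some data on resize, while the stateless model only changes the next query plan.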
Operators are software extensions to Kubernetes that let users manage applications and their components through custom resources. CNs can be deployed in a Kubernetes environment using the StarRocks Operator, which simplifies operating and managing the cluster and enables auto-scaling of compute resources, reducing resource consumption.
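Auto-scaling on Kubernetes is commonly driven by the HorizontalPodAutoscaler, whose core rule is `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, skipped when the ratio is within a tolerance (0.1 by default) and clamped to minimum and maximum replica counts. The sketch below applies that rule to CPU utilization for a group of CNs. That CN auto-scaling uses exactly this rule and metric is an assumption; the function and parameter names are hypothetical.

```python
import math


def desired_cn_replicas(current_replicas, current_cpu_percent, target_cpu_percent,
                        min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count per the Kubernetes HPA scaling rule (illustrative sketch).

    Utilizations are average CPU percentages across the CN pods, e.g. 90 and 60.
    """
    ratio = current_cpu_percent / target_cpu_percent
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count unchanged.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below or above the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


print(desired_cn_replicas(4, 90, 60))  # 6: busy cluster scales out
print(desired_cn_replicas(5, 0, 60))   # 1: idle cluster shrinks to the minimum
```

The second call is the case the section opens with: when no workload is running, the CN group shrinks to its floor instead of holding idle compute.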
Support for Hudi and Iceberg Table Formats
In version 2.4, StarRocks adds Hudi and Iceberg catalogs to its external catalog system. Now, users can directly query Hudi and Iceberg data without creating external tables. This solution offers an integrated lakehouse analytics experience.
Multi-Table Materialized View
To accelerate multi-table JOIN queries, users often have to pre-compute results, either upstream or in StarRocks, and store the intermediate results on disk. This adds more dependencies to the data pipeline. Users also have to deal with the complexity of maintaining and scheduling those intermediate tables.
In version 2.4, you can create materialized views based on multiple base tables to accelerate multi-table JOIN queries. SR 2.4 also supports asynchronous refreshing of multi-table materialized views: StarRocks periodically refreshes them using INSERT OVERWRITE, with no external scheduling required. This feature eliminates the need for denormalization, facilitates data modeling and queries, and further simplifies the entire data preprocessing pipeline.
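The refresh model can be mimicked outside StarRocks to show what it buys. The sketch below uses Python's built-in `sqlite3`, not StarRocks, with hypothetical `orders` and `customers` tables. The view's result is stored in an ordinary table; each refresh recomputes the JOIN and replaces the whole content in one transaction, which is the effect INSERT OVERWRITE has. Queries then read the precomputed table instead of re-running the join. The periodic trigger is left out, since in StarRocks the engine schedules refreshes itself.

```python
import sqlite3

# Hypothetical base tables plus storage for the "materialized view".
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
CREATE TABLE mv_sales_by_region (region TEXT PRIMARY KEY, total REAL);
"""

# The view definition: a JOIN over two base tables plus an aggregation.
MV_QUERY = """
SELECT c.region, SUM(o.amount)
FROM orders o JOIN customers c ON o.customer_id = c.id
GROUP BY c.region
"""


def refresh_mv(conn):
    """Recompute the view and replace its contents atomically (INSERT OVERWRITE-like)."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM mv_sales_by_region")
        conn.execute("INSERT INTO mv_sales_by_region " + MV_QUERY)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "east"), (2, "west")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5)])
refresh_mv(conn)

# New data lands in a base table; the view stays stale until the next refresh.
conn.execute("INSERT INTO orders VALUES (4, 2, 2.5)")
refresh_mv(conn)
print(conn.execute("SELECT region, total FROM mv_sales_by_region ORDER BY region").fetchall())
# [('east', 15.0), ('west', 10.0)]
```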
Enhanced Primary Key Model
Prior to version 2.4, the Primary Key Model kept its indexes in memory only, which could sometimes cause out-of-memory (OOM) errors. Starting from version 2.4, the Primary Key Model supports flushing VARCHAR-type primary key indexes to disk. Disk-persistent primary key indexes support the same data types as in-memory primary key indexes. This reduces memory usage and helps prevent OOM errors.
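As a rough illustration of why persisting the index bounds memory, here is a toy two-tier primary key index in Python: a small in-memory tier that is flushed to an on-disk tier once it exceeds a fixed number of entries, with lookups checking memory first and then disk. The class name, the capacity threshold, and the use of the standard-library `shelve` module as the disk tier are assumptions for illustration; StarRocks' persistent index is a native on-disk structure, not this.

```python
import os
import shelve
import tempfile


class TwoTierPrimaryIndex:
    """Toy primary key index (VARCHAR key -> row id) with a bounded memory tier."""

    def __init__(self, path, mem_capacity=1000):
        self.mem = {}                     # recently written keys
        self.mem_capacity = mem_capacity  # max entries held in memory
        self.disk = shelve.open(path)     # persistent tier; shelve keys must be str

    def upsert(self, key: str, row_id: int):
        self.mem[key] = row_id
        if len(self.mem) > self.mem_capacity:
            self.flush()

    def flush(self):
        # Newer in-memory entries overwrite older on-disk ones.
        for key, row_id in self.mem.items():
            self.disk[key] = row_id
        self.disk.sync()
        self.mem.clear()

    def get(self, key: str):
        if key in self.mem:
            return self.mem[key]
        return self.disk.get(key)

    def close(self):
        self.flush()
        self.disk.close()


index = TwoTierPrimaryIndex(os.path.join(tempfile.mkdtemp(), "pk_index"), mem_capacity=100)
for i in range(1000):
    index.upsert(f"user-{i:04d}", i)
print(len(index.mem) <= 100, index.get("user-0007"))  # True 7
index.close()
```

Memory holds at most `mem_capacity` entries after each write no matter how many keys exist, which is the property that keeps large VARCHAR key sets from exhausting RAM.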
Adaptive Multi-thread Scan of Tablets
In previous versions, each tablet could be scanned by only one thread, so users had to configure multiple tablets per disk to achieve optimal scan performance. In version 2.4, multiple threads can scan a single tablet, which significantly reduces the dependence of scan performance on the number of tablets. This also makes it easier for users to choose the number of buckets.
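The change can be pictured as splitting one tablet's rows into ranges and handing them to a thread pool, instead of dedicating a single thread to each tablet. The Python sketch below is conceptual: the "tablet" is just a list, the range size and function names are made up, and because of CPython's GIL this pure-Python loop will not actually run faster with threads. The point is the decomposition, whose merged result must equal a single-threaded scan.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_range(rows, start, end, predicate):
    """Scan rows[start:end] and return the sum of the values that pass the filter."""
    return sum(v for v in rows[start:end] if predicate(v))


def parallel_scan_tablet(rows, predicate, num_threads=4, morsel_size=1024):
    """Split ONE tablet into fixed-size row ranges and scan them concurrently.

    Parallelism is bounded by num_threads, not by how many tablets exist.
    """
    ranges = [(s, min(s + morsel_size, len(rows)))
              for s in range(0, len(rows), morsel_size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = pool.map(lambda r: scan_range(rows, r[0], r[1], predicate), ranges)
        return sum(partials)  # merge the per-range partial results


def is_even(v):
    return v % 2 == 0


tablet = list(range(100_000))  # one tablet's column values
print(parallel_scan_tablet(tablet, is_even) == scan_range(tablet, 0, len(tablet), is_even))  # True
```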
Support for Fully Qualified Domain Name (FQDN) Access
Before version 2.4, nodes in the cluster were identified only by their IP addresses, so if a node's IP address changed, the node could no longer be reached and users' queries were affected. In version 2.4, StarRocks adds support for identifying nodes by Fully Qualified Domain Name (FQDN). Because a node keeps its FQDN even when its IP address changes, users can better avoid losing access to StarRocks.
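The benefit of FQDN-based identity is that a node's name is decoupled from its current address, which is looked up when a connection is made. The Python sketch below models that idea with a hypothetical node registry and a pluggable resolver. By default it uses `socket.getaddrinfo`; the demo injects a dictionary to simulate a node that comes back with a new IP after a restart. The registry, host names, and port are illustrative assumptions, not StarRocks internals.

```python
import socket


def system_resolver(fqdn, port):
    """Resolve a host name to an IP address via the OS resolver (DNS, hosts file)."""
    return socket.getaddrinfo(fqdn, port, proto=socket.IPPROTO_TCP)[0][4][0]


class NodeRegistry:
    """Cluster membership keyed by FQDN rather than by IP address."""

    def __init__(self, resolver=system_resolver):
        self.resolver = resolver
        self.nodes = {}  # fqdn -> port

    def add_node(self, fqdn, port):
        self.nodes[fqdn] = port

    def address_of(self, fqdn):
        # Resolve at connection time, so an IP change is picked up automatically.
        port = self.nodes[fqdn]
        return self.resolver(fqdn, port), port


fake_dns = {"be-0.starrocks.example.com": "10.0.0.5"}
registry = NodeRegistry(resolver=lambda fqdn, port: fake_dns[fqdn])
registry.add_node("be-0.starrocks.example.com", 9060)
print(registry.address_of("be-0.starrocks.example.com"))  # ('10.0.0.5', 9060)
fake_dns["be-0.starrocks.example.com"] = "10.0.0.9"       # node restarted with a new IP
print(registry.address_of("be-0.starrocks.example.com"))  # ('10.0.0.9', 9060)
```

An IP-keyed registry would still point at 10.0.0.5 after the restart; the FQDN-keyed one follows the node without any manual change.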
Start Using StarRocks Version 2.4 Today
StarRocks version 2.4 represents a major milestone for the project, and it will be exciting to see what's next in the coming year. Fortunately, you don't have to wait to start using StarRocks version 2.4: download the latest version from the StarRocks download page.
Again, none of this would be possible without the dedication and contributions of the StarRocks community. If you've been enjoying StarRocks and would like to get more involved, be sure to join the community Slack channel. It's a great way to stay in touch, make suggestions, and help other members of the community make the most of StarRocks.