StarRocks Disaster Recovery on Kubernetes: A Practical Guide to Cluster Snapshot
Publish date: Dec 22, 2025 3:27:02 PM
With the release of StarRocks 3.5, a brand-new Cluster Snapshot recovery mechanism was introduced, further enhancing StarRocks' data protection and disaster recovery capabilities.
Cluster Snapshot offers an efficient, low-cost, and automated approach to data protection. It significantly improves system availability and fault tolerance while addressing gaps in backup and recovery under shared-data architecture. In the event of system failures, operational mistakes, or regional outages, snapshots enable recovery within minutes, minimizing data loss and reducing the risk of business disruption.
By capturing the complete cluster state and backing it up to object storage, Cluster Snapshot simplifies the traditionally complex disaster recovery process, making recovery faster and more convenient. This mechanism is particularly well-suited for mission-critical workloads in industries such as finance, retail, and SaaS, where system stability and data safety are critical.
Building on this foundation, this article draws on real-world StarRocks deployments in Kubernetes environments to explain disaster recovery mechanisms, recovery workflows, and snapshot strategies in shared-data architecture. The goal is to help users quickly restore system state during cluster failures or abnormal conditions, ensuring business continuity and data security.
Terminology
Cluster Snapshot
StarRocks 3.5 introduces support for cluster-level snapshots in the shared-data architecture, providing a complete snapshot of the entire cluster. A snapshot can be created automatically and captures the full cluster state at a specific point in time, including metadata such as catalogs, databases, tables, and users. The snapshot is stored in object storage and can be used for fast in-place or cross-region recovery.
Snapshots do not include external dependencies, such as external configuration files referenced by catalogs or locally stored UDF JAR files.
A Cluster Snapshot consists of two components: Metadata Snapshot and Data Snapshot.
Metadata Snapshot
The Metadata Snapshot is generated periodically by the Frontend (FE) through its checkpoint mechanism. The resulting image file contains the cluster's metadata, including databases, tables, users, and privilege information.

Data Snapshot
In a shared-data architecture, data is stored in object storage, which offers high durability and near-infinite capacity. As a result, the snapshot process does not require copying data files. Instead, the snapshot records the specific data versions referenced by the corresponding metadata snapshot. During recovery, metadata is mapped back to these preserved data versions.
Automated Cluster Snapshots
Once automated cluster snapshots are enabled, the system generates a full snapshot at a configured interval (10 minutes by default) and writes the FE metadata image file to the specified object storage path. Since the underlying data files already reside in object storage, no additional data movement occurs at the data layer. By default, the system retains only the most recent snapshot and the data versions it depends on.
Each data ingestion operation on a tablet produces a corresponding versioned rowset. For tablets that are still referenced by an active metadata snapshot, garbage collection tasks triggered by operations such as DROP TABLE, DROP PARTITION, DELETE, or Compaction are blocked until the next metadata snapshot is generated. This ensures that all data versions required by the current snapshot are preserved and not prematurely deleted.
Cluster Restore
When recovery is needed, users simply specify the snapshot's object storage path and the corresponding storage volume configuration in either the original cluster or a new cluster. The system then restores the cluster to the selected point in time by automatically loading metadata and cleaning up redundant data, ensuring both data consistency and system availability.
Design Goals
The StarRocks shared-data architecture supports automated cluster snapshots, enabling both local and remote cluster recovery through snapshots. Snapshots are stored in the same location as the data directory and preserve data from the most recent checkpoint version, allowing the cluster to be restored to the latest snapshot state.
Each time a new snapshot is generated, the system deletes the previous snapshot files, retaining only the latest version. As a result, there is always exactly one automated snapshot per cluster.
Automated snapshots are triggered by the system. Users can adjust snapshot-related parameters via SQL commands to control the snapshot generation interval.
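For example, automated snapshots are enabled with the ADMIN SET AUTOMATED CLUSTER SNAPSHOT statement used later in this article, and the interval is controlled by the FE configuration item automated_cluster_snapshot_interval_seconds. A minimal sketch follows; adjusting the interval at runtime via ADMIN SET FRONTEND CONFIG is an assumption you should verify against your StarRocks version:
-- enable automated cluster snapshots on the built-in storage volume
ADMIN SET AUTOMATED CLUSTER SNAPSHOT ON STORAGE VOLUME builtin_storage_volume;
-- assumption: change the snapshot interval (in seconds) at runtime; it can also be set in fe.conf
ADMIN SET FRONTEND CONFIG ("automated_cluster_snapshot_interval_seconds" = "600");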
Limitation: This feature is currently supported only in shared-data architectures.
Operational Guidance
For brevity, this article omits the detailed steps for managing automated cluster snapshots and viewing snapshot and task information. Please refer to the official documentation for complete instructions: https://docs.starrocks.io/docs/administration/cluster_snapshot/
Recovery Workflow in a Kubernetes Deployment Environment
In a Kubernetes deployment environment, the spec field disasterRecovery and the status field disasterRecoveryStatus are introduced so that the Operator can gracefully complete reconciliation in disaster recovery mode.
spec:
  disasterRecovery:
    generation: 1
    enabled: true
status:
  disasterRecoveryStatus:
    phase: todo/doing/done
    reason: ""
    observedGeneration: 1
    startTimestamp: xxx
    endTimestamp: yyy
The enabled field in the disasterRecovery configuration must be set to true. The field disasterRecoveryStatus.phase indicates the current stage of the disaster recovery process. The possible phases are todo, doing, and done. The update logic for disasterRecoveryStatus is as follows:
- When the Operator detects that the cluster is entering disaster recovery mode for the first time (that is, disasterRecoveryStatus is empty), or when observedGeneration is less than generation, the disaster recovery process enters the todo phase.
- After the updated StatefulSet is applied, the Operator updates disasterRecoveryStatus.phase to doing. The duration of this phase depends on how long the disaster recovery operation takes to complete.
- During this phase, the Operator periodically checks the status of the FE Pods. It first verifies that the Pods belong to the expected generation, and then confirms that they are in the Ready state.
- Once the FE Pods reach the Ready state, disasterRecoveryStatus.phase is updated to done.
- Finally, the Operator starts the cluster normally according to the StarRocksCluster configuration.
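The current phase can be read directly from the StarRocksCluster status. A minimal sketch, using the cluster name from the deployment later in this article:
# print the current disaster recovery phase (todo / doing / done)
kubectl get src kube-starrocks -o jsonpath='{.status.disasterRecoveryStatus.phase}'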
How to Perform Disaster Recovery on an Existing Cluster
It is recommended to enable disaster recovery on an empty StarRocks cluster. If you must use an existing StarRocks cluster, you need to clean up both FE metadata and CN data before enabling disaster recovery.
1. Clean Up Existing Metadata
- Manually scale the FE StatefulSet replicas down to 0.
- Delete the corresponding FE metadata PVCs.
2. Clean Up Existing Data
- Manually scale the CN StatefulSet replicas down to 0.
- Delete the corresponding CN cache storage PVCs (see the kubectl sketch below).
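A minimal kubectl sketch of this cleanup, assuming the StatefulSet and PVC names created by the kube-starrocks Helm chart used later in this article (adjust the names to your environment):
# 1. scale FE down and remove the FE metadata PVCs
kubectl scale statefulset kube-starrocks-fe --replicas=0
kubectl delete pvc fe-storage-meta-kube-starrocks-fe-0 fe-storage-meta-kube-starrocks-fe-1 fe-storage-meta-kube-starrocks-fe-2
# 2. scale CN down and remove the CN cache storage PVCs
kubectl scale statefulset kube-starrocks-cn --replicas=0
kubectl delete pvc cn-data-kube-starrocks-cn-0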
3. Start Disaster Recovery
After the cleanup is complete, enable disaster recovery and start the recovery process.
4. Exception Handling
4.1 CN Local Cache Cleanup
- During in-place recovery, CN nodes may still retain old local cache data. Since the ID generator is reset during recovery, the same file name may refer to different content, leading to cache corruption.
- Manual cache cleanup is required. This can be done by logging into the CN Pods and clearing the cache, or by forcibly deleting the PVs or PVCs so that the StatefulSet recreates fresh volumes for the CN nodes (see the sketch below).
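For example, a hedged sketch that forces the StatefulSet to recreate a CN volume, using the pod and PVC names from the deployment later in this article:
# mark the CN cache PVC for deletion; it is removed once the pod releases it
kubectl delete pvc cn-data-kube-starrocks-cn-0 --wait=false
# delete the pod; the StatefulSet recreates it together with a fresh PVC
kubectl delete pod kube-starrocks-cn-0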
4.2 FE In-Place Recovery
- In some cases, existing FE nodes may already form a high-availability (HA) cluster. After recovery, the fe-0 node restores successfully using the new metadata, while fe-1 and fe-2 may still retain old metadata and continue to form another HA cluster, potentially with its own leader.
- This can result in a split state where fe-0 becomes the leader of the recovered cluster, while fe-1 and fe-2 form a separate cluster.
- After recovery, manually inspect and clean the metadata directories of fe-1 and fe-2 to ensure the cluster can fully converge to a consistent state (see the sketch below).
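A hedged way to detect this split state is to compare each FE's view of the cluster, for example:
# run against fe-1 and fe-2 as well; every node should report the same single leader
kubectl exec -it kube-starrocks-fe-0 -- mysql -h 127.0.0.1 -P9030 -uroot -e "SHOW FRONTENDS\G"
# if fe-1/fe-2 still report a stale leader, clean their metadata (e.g. the fe-storage-meta PVCs) so they rejoin fe-0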
4.3 Re-running Disaster Recovery Using generation
The generation field can be used to trigger disaster recovery multiple times. If a previous recovery attempt does not produce the expected result, you can manually adjust the configuration and increment the generation value to re-run the recovery process, as in the sketch below.
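For example, in the Helm values used later in this article, incrementing the value re-triggers recovery (a minimal sketch):
starrocks:
  starrocksCluster:
    disasterRecovery:
      enabled: true
      generation: 2   # incremented from 1 to re-run disaster recovery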
4.4 Snapshot Selection After Failure
If an FE exits abnormally during snapshot creation, two snapshots may exist in object storage. In this case, recovery should be performed using the older snapshot to ensure consistency.
s3cmd ls s3://ydx-starrocks-cluster-bucket/data/5b0125af-7ff6-45df-9b60-c896797458ba/meta/image/
DIR s3://ydx-starrocks-cluster-bucket/data/5b0125af-7ff6-45df-9b60-c896797458ba/meta/image/automated_cluster_snapshot_1739446285344/
DIR s3://ydx-starrocks-cluster-bucket/data/5b0125af-7ff6-45df-9b60-c896797458ba/meta/image/automated_cluster_snapshot_1739446405766/
Disaster Recovery Practices in a Kubernetes Deployment
To make this guide easier to follow and reproduce, we use the kube-starrocks Helm chart to deploy the cluster. Please note the following prerequisites:
- You must use Operator v1.10.0 or later, along with the corresponding YAML manifests.
- In this document, xxx is used as a placeholder for sensitive information. Please replace it with appropriate values for your environment.
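If the starrocks-community Helm repository has not been added yet, register it first. A sketch; the repository URL assumes the chart repo published by the starrocks-kubernetes-operator project:
helm repo add starrocks-community https://starrocks.github.io/starrocks-kubernetes-operator
helm repo update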
1. Creating an Available Cluster
First, prepare the ./starrocks-values.yaml configuration file.
operator:
  starrocksOperator:
    image:
      repository: starrocks/operator
      tag: v1.10.0
    imagePullPolicy: IfNotPresent
    replicaCount: 1
    resources:
      requests:
        cpu: 1m
        memory: 20Mi
starrocks:
  starrocksCluster:
    enabledBe: false
    enabledCn: true
  starrocksCnSpec:
    config: |
      sys_log_level = INFO
      # ports for admin, web, heartbeat service
      thrift_port = 9060
      webserver_port = 8040
      heartbeat_service_port = 9050
      brpc_port = 8060
    image:
      repository: starrocks/cn-ubuntu
      tag: 3.4.1
    replicas: 3
    resources:
      limits:
        cpu: 8
        memory: 8Gi
      requests:
        cpu: 1m
        memory: 10Mi
    storageSpec:
      name: cn
      logStorageSize: 1Gi
      storageSize: 10Gi
  starrocksFESpec:
    feEnvVars:
      - name: LOG_CONSOLE
        value: "1"
    config: |
      LOG_DIR = ${STARROCKS_HOME}/log
      DATE = "$(date +%Y%m%d-%H%M%S)"
      JAVA_OPTS="-Dlog4j2.formatMsgNoLookups=true -Xmx8192m -XX:+UseG1GC -Xlog:gc*:${LOG_DIR}/fe.gc.log.$DATE:time -XX:ErrorFile=${LOG_DIR}/hs_err_pid%p.log -Djava.security.policy=${STARROCKS_HOME}/conf/udf_security.policy"
      http_port = 8030
      rpc_port = 9020
      query_port = 9030
      edit_log_port = 9010
      mysql_service_nio_enabled = true
      sys_log_level = INFO
      run_mode = shared_data
      cloud_native_meta_port = 6090
      enable_load_volume_from_conf = true
      cloud_native_storage_type = S3
      aws_s3_path = xxx
      aws_s3_region = xxx
      aws_s3_endpoint = xxx
      aws_s3_access_key = xxx
      aws_s3_secret_key = xxx
      # we add this configuration because we want to get cluster snapshot quickly
      automated_cluster_snapshot_interval_seconds = 60
    replicas: 3
    image:
      repository: starrocks/fe-ubuntu
      tag: 3.4.1
    resources:
      limits:
        cpu: 2
        memory: 4Gi
      requests:
        cpu: 1m
        memory: 20Mi
    storageSpec:
      logStorageSize: 1Gi
      name: fe-storage
      storageSize: 10Gi
Note: We set automated_cluster_snapshot_interval_seconds so that a snapshot is triggered every minute.
Next, use Helm to create the cluster.
helm install -f ./starrocks-values.yaml starrocks starrocks-community/kube-starrocks --version 1.10.0
# make sure the cluster has been successfully deployed
kubectl get pods
NAME READY STATUS RESTARTS AGE
kube-starrocks-cn-0 1/1 Running 0 23s
kube-starrocks-fe-0 1/1 Running 0 79s
kube-starrocks-fe-1 1/1 Running 0 79s
kube-starrocks-fe-2 1/1 Running 0 79s
2. Create Tables and Insert Data
Connect to the FE
# enter FE pod
kubectl exec -it kube-starrocks-fe-0 -- bash
# use mysql client to login
mysql -h 127.0.0.1 -P9030 -uroot
...
mysql>
Execute the following SQL statements:
# create database and table
CREATE DATABASE IF NOT EXISTS quickstart;
USE quickstart;
-- create table
CREATE TABLE source_wiki_edit
(
event_time DATETIME,
channel VARCHAR(32) DEFAULT '',
user VARCHAR(128) DEFAULT '',
is_anonymous TINYINT DEFAULT '0',
is_minor TINYINT DEFAULT '0',
is_new TINYINT DEFAULT '0',
is_robot TINYINT DEFAULT '0',
is_unpatrolled TINYINT DEFAULT '0',
delta INT DEFAULT '0',
added INT DEFAULT '0',
deleted INT DEFAULT '0'
)
DUPLICATE KEY(
event_time,
channel,
user,
is_anonymous,
is_minor,
is_new,
is_robot,
is_unpatrolled
)
PARTITION BY RANGE(event_time)(
PARTITION p06 VALUES LESS THAN ('2015-09-12 06:00:00'),
PARTITION p12 VALUES LESS THAN ('2015-09-12 12:00:00'),
PARTITION p18 VALUES LESS THAN ('2015-09-12 18:00:00'),
PARTITION p24 VALUES LESS THAN ('2015-09-13 00:00:00')
)
DISTRIBUTED BY HASH(user);
-- insert data
INSERT INTO source_wiki_edit
VALUES
("2015-09-12 00:00:00","#en.wikipedia","AustinFF",0,0,0,0,0,21,5,0),
("2015-09-12 00:00:00","#ca.wikipedia","helloSR",0,1,0,1,0,3,23,0),
("2015-09-12 08:00:00","#ca.wikipedia","helloSR",0,1,0,1,0,3,23,0);
-- select data
mysql> SELECT * FROM source_wiki_edit;
+---------------------+---------------+----------+--------------+----------+--------+----------+----------------+-------+-------+---------+
| event_time | channel | user | is_anonymous | is_minor | is_new | is_robot | is_unpatrolled | delta | added | deleted |
+---------------------+---------------+----------+--------------+----------+--------+----------+----------------+-------+-------+---------+
| 2015-09-12 00:00:00 | #ca.wikipedia | helloSR | 0 | 1 | 0 | 1 | 0 | 3 | 23 | 0 |
| 2015-09-12 00:00:00 | #en.wikipedia | AustinFF | 0 | 0 | 0 | 0 | 0 | 21 | 5 | 0 |
| 2015-09-12 08:00:00 | #ca.wikipedia | helloSR | 0 | 1 | 0 | 1 | 0 | 3 | 23 | 0 |
+---------------------+---------------+----------+--------------+----------+--------+----------+----------------+-------+-------+---------+
3 rows in set (0.34 sec)
Generate a Cluster Snapshot
Start the backup:
mysql> ADMIN SET AUTOMATED CLUSTER SNAPSHOT ON STORAGE VOLUME builtin_storage_volume;
Query OK, 0 rows affected (0.10 sec)
Wait for the backup to complete.
mysql> SELECT * FROM INFORMATION_SCHEMA.CLUSTER_SNAPSHOT_JOBS;
+------------------------------------------+--------+---------------------+---------------------+-------------+-------------+---------------+
| SNAPSHOT_NAME | JOB_ID | CREATED_TIME | FINISHED_TIME | STATE | DETAIL_INFO | ERROR_MESSAGE |
+------------------------------------------+--------+---------------------+---------------------+-------------+-------------+---------------+
| automated_cluster_snapshot_1739857978127 | 10136 | 2025-02-18 13:52:58 | 2025-02-18 13:54:17 | FINISHED | | |
| automated_cluster_snapshot_1739858117584 | 10137 | 2025-02-18 13:55:17 | NULL | SNAPSHOTING | | |
+------------------------------------------+--------+---------------------+---------------------+-------------+-------------+---------------+
2 rows in set (0.02 sec)
mysql> SELECT * FROM INFORMATION_SCHEMA.CLUSTER_SNAPSHOTS;
+------------------------------------------+---------------+---------------------+---------------+--------------------+------------+------------------------+---------------------------------------------------------------------------------------------------------------------------------+
| SNAPSHOT_NAME | SNAPSHOT_TYPE | CREATED_TIME | FE_JOURNAL_ID | STARMGR_JOURNAL_ID | PROPERTIES | STORAGE_VOLUME | STORAGE_PATH |
+------------------------------------------+---------------+---------------------+---------------+--------------------+------------+------------------------+---------------------------------------------------------------------------------------------------------------------------------+
| automated_cluster_snapshot_1739857978127 | AUTOMATED | 2025-02-18 13:52:58 | 253 | 114 | | builtin_storage_volume | s3://xxx/7351ce6a-f4a4-4937-a876-cb8801085aea/meta/image/automated_cluster_snapshot_1739857978127 |
+------------------------------------------+---------------+---------------------+---------------+--------------------+------------+------------------------+---------------------------------------------------------------------------------------------------------------------------------+
Note: Since we configured backups to run every minute, the backup path used below may differ from the one shown earlier. The final result can be verified in S3.
s3cmd ls s3://xxx/data/7351ce6a-f4a4-4937-a876-cb8801085aea/meta/image/
DIR s3://xxx/data/7351ce6a-f4a4-4937-a876-cb8801085aea/meta/image/automated_cluster_snapshot_1739858235830/
3. Delete the Created Cluster
# Uninstall the StarRocks cluster
helm uninstall starrocks
# Remove persisted data
kubectl get pvc | awk '{if (NR>1){print $1}}' | xargs kubectl delete pvc
persistentvolumeclaim "cn-data-kube-starrocks-cn-0" deleted
persistentvolumeclaim "cn-log-kube-starrocks-cn-0" deleted
persistentvolumeclaim "fe-storage-log-kube-starrocks-fe-0" deleted
persistentvolumeclaim "fe-storage-log-kube-starrocks-fe-1" deleted
persistentvolumeclaim "fe-storage-log-kube-starrocks-fe-2" deleted
persistentvolumeclaim "fe-storage-meta-kube-starrocks-fe-0" deleted
persistentvolumeclaim "fe-storage-meta-kube-starrocks-fe-1" deleted
persistentvolumeclaim "fe-storage-meta-kube-starrocks-fe-2" deleted
4. Create a New Cluster for Disaster Recovery
We will reuse the existing starrocks-values.yaml file, so make sure it has been saved properly. Next, prepare a new file named cluster_snapshot.yaml, which contains the configuration required for disaster recovery.
starrocks:
  starrocksCluster:
    disasterRecovery:
      enabled: true
      generation: 1
  starrocksFESpec:
    # mount the cluster_snapshot.yaml
    configMaps:
      - name: cluster-snapshot
        mountPath: /opt/starrocks/fe/conf/cluster_snapshot.yaml
        subPath: cluster_snapshot.yaml
  configMaps:
    - name: cluster-snapshot
      data:
        cluster_snapshot.yaml: |
          # information about the cluster snapshot to be downloaded and restored
          cluster_snapshot:
            cluster_snapshot_path: s3://xxx/data/7351ce6a-f4a4-4937-a876-cb8801085aea/meta/image/automated_cluster_snapshot_1739858235830
            storage_volume_name: builtin_storage_volume
          # Operator will add the other FE followers automatically
          # just leave it blank
          frontends: []
          # Operator will add the CN nodes automatically
          # just leave it blank
          compute_nodes: []
          # used for restoring a cloned snapshot
          storage_volumes:
            - name: builtin_storage_volume
              type: S3
              location: s3://xxx/data
              comment: my s3 volume
              properties:
                - key: aws.s3.region
                  value: xxx
                - key: aws.s3.endpoint
                  value: xxx
                - key: aws.s3.access_key
                  value: xxx
                - key: aws.s3.secret_key
                  value: xxx
The command used to deploy the cluster is different from the initial deployment. This time, we need to specify two YAML files, where cluster_snapshot.yaml contains the configuration specifically for disaster recovery.
helm install -f ./starrocks-values.yaml -f cluster_snapshot.yaml starrocks starrocks-community/kube-starrocks --version 1.10.0
The detailed disaster recovery process is as follows:
1. The Operator starts an FE Pod and enables disaster recovery.
# ignore the operator pod
kubectl get pods
NAME READY STATUS RESTARTS AGE
kube-starrocks-fe-0 1/1 Running 0 4m37s
At this point, when we check the status of the StarRocksCluster, the output appears as follows:
kubectl get src kube-starrocks -o yaml | less
status:
  phase: running
  disasterRecoveryStatus:
    observedGeneration: 1
    phase: doing
    reason: disaster recovery is in progress
    startTimestamp: "1739860263"
2. After disaster recovery is complete, the Operator automatically starts the remaining Pods.
kubectl get pods
NAME READY STATUS RESTARTS AGE
kube-starrocks-cn-0 1/1 Running 0 7m54s
kube-starrocks-fe-0 1/1 Running 0 7m1s
kube-starrocks-fe-1 1/1 Running 0 7m54s
kube-starrocks-fe-2 1/1 Running 0 7m54s
The cluster status appears as follows:
kubectl get src kube-starrocks -o yaml | less
status:
  phase: running
  disasterRecoveryStatus:
    observedGeneration: 1
    phase: done
    reason: disaster recovery is done
    startTimestamp: "1739860263"
    endTimestamp: "1739861262"
Verify Disaster Recovery Success
Check whether data recovery has been completed.
# enter the pod
kubectl exec -it kube-starrocks-fe-0 -- bash
# connect mysql
mysql -h 127.0.0.1 -P9030 -uroot
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 0
Server version: 8.0.33 branch-3.4-12d148f
Copyright (c) 2000, 2025, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
# get the data
mysql> USE quickstart;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SELECT * FROM source_wiki_edit;
+---------------------+---------------+----------+--------------+----------+--------+----------+----------------+-------+-------+---------+
| event_time | channel | user | is_anonymous | is_minor | is_new | is_robot | is_unpatrolled | delta | added | deleted |
+---------------------+---------------+----------+--------------+----------+--------+----------+----------------+-------+-------+---------+
| 2015-09-12 08:00:00 | #ca.wikipedia | helloSR | 0 | 1 | 0 | 1 | 0 | 3 | 23 | 0 |
| 2015-09-12 00:00:00 | #ca.wikipedia | helloSR | 0 | 1 | 0 | 1 | 0 | 3 | 23 | 0 |
| 2015-09-12 00:00:00 | #en.wikipedia | AustinFF | 0 | 0 | 0 | 0 | 0 | 21 | 5 | 0 |
+---------------------+---------------+----------+--------------+----------+--------+----------+----------------+-------+-------+---------+
3 rows in set (2.00 sec)
Want to dive deeper into the technical details or join the discussion? Join StarRocks’ Slack channel to learn more.