ScyllaDB University LIVE, FREE Virtual Training Event | March 21
Register for Free
ScyllaDB Documentation Logo Documentation
  • Server
  • Cloud
  • Tools
    • ScyllaDB Manager
    • ScyllaDB Monitoring Stack
    • ScyllaDB Operator
  • Drivers
    • CQL Drivers
    • DynamoDB Drivers
  • Resources
    • ScyllaDB University
    • Community Forum
    • Tutorials
Download
ScyllaDB Docs ScyllaDB Enterprise Scylla Architecture Raft Consensus Algorithm in ScyllaDB

Caution

You're viewing documentation for a previous version. Switch to the latest stable version.

Raft Consensus Algorithm in ScyllaDB¶

Introduction¶

ScyllaDB was originally designed, following Apache Cassandra, to use gossip for topology and schema updates and the Paxos consensus algorithm for strong data consistency (LWT). To achieve stronger consistency without performance penalty, ScyllaDB 5.x has turned to Raft - a consensus algorithm designed as an alternative to both gossip and Paxos.

Raft is a consensus algorithm that implements a distributed, consistent, replicated log across members (nodes). Raft implements consensus by first electing a distinguished leader, then giving the leader complete responsibility for managing the replicated log. The leader accepts log entries from clients, replicates them on other servers, and tells servers when it is safe to apply log entries to their state machines.

Raft uses a heartbeat mechanism to trigger a leader election. All servers start as followers and remain in the follower state as long as they receive valid RPCs (heartbeat) from a leader or candidate. A leader sends periodic heartbeats to all followers to maintain his authority (leadership). Suppose a follower receives no communication over a period called the election timeout. In that case, it assumes no viable leader and begins an election to choose a new leader.

Leader selection is described in detail in the Raft paper.

ScyllaDB 5.x may use Raft to maintain schema updates in every node (see below). Any schema update, like ALTER, CREATE or DROP TABLE, is first committed as an entry in the replicated Raft log, and, once stored on most replicas, applied to all nodes in the same order, even in the face of a node or network failures.

Following ScyllaDB 5.x releases will use Raft to guarantee consistent topology updates similarly.

Quorum Requirement¶

Raft requires at least a quorum of nodes in a cluster to be available. If multiple nodes fail and the quorum is lost, the cluster is unavailable for schema updates. See Handling Failures for information on how to handle failures.

Upgrade Considerations for SyllaDB 5.0 and Later¶

Note that when you have a two-DC cluster with the same number of nodes in each DC, the cluster will lose the quorum if one of the DCs is down. We recommend configuring three DCs per cluster to ensure that the cluster remains available and operational when one DC is down.

Enabling Raft¶

Enabling Raft in ScyllaDB 5.0 and 5.1¶

Warning

In ScyllaDB 5.0 and 5.1, Raft is an experimental feature.

It is not possible to enable Raft in an existing cluster in ScyllaDB 5.0 and 5.1. In order to have a Raft-enabled cluster in these versions, you must create a new cluster with Raft enabled from the start.

Warning

Do not use Raft in production clusters in ScyllaDB 5.0 and 5.1. Such clusters won’t be able to correctly upgrade to ScyllaDB 5.2.

Use Raft only for testing and experimentation in clusters which can be thrown away.

Warning

Once enabled, Raft cannot be disabled on your cluster. The cluster nodes will fail to restart if you remove the Raft feature.

When creating a new cluster, add raft to the list of experimental features in your scylla.yaml file:

experimental_features:
 - raft

Verifying that Raft is enabled¶

You can verify that Raft is enabled on your cluster in one of the following ways:

  • Retrieve the list of supported features by running:

    cqlsh> SELECT supported_features FROM system.local;
    

    With Raft enabled, the list of supported features in the output includes SUPPORTS_RAFT_CLUSTER_MANAGEMENT. For example:

    supported_features
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    CDC,CDC_GENERATIONS_V2,COMPUTED_COLUMNS,CORRECT_COUNTER_ORDER,CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX,CORRECT_NON_COMPOUND_RANGE_TOMBSTONES,CORRECT_STATIC_COMPACT_IN_MC,COUNTERS,DIGEST_FOR_NULL_VALUES,DIGEST_INSENSITIVE_TO_EXPIRY,DIGEST_MULTIPARTITION_READ,HINTED_HANDOFF_SEPARATE_CONNECTION,INDEXES,LARGE_PARTITIONS,LA_SSTABLE_FORMAT,LWT,MATERIALIZED_VIEWS,MC_SSTABLE_FORMAT,MD_SSTABLE_FORMAT,ME_SSTABLE_FORMAT,NONFROZEN_UDTS,PARALLELIZED_AGGREGATION,PER_TABLE_CACHING,PER_TABLE_PARTITIONERS,RANGE_SCAN_DATA_VARIANT,RANGE_TOMBSTONES,ROLES,ROW_LEVEL_REPAIR,SCHEMA_TABLES_V3,SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT,STREAM_WITH_RPC_STREAM,SUPPORTS_RAFT_CLUSTER_MANAGEMENT,TOMBSTONE_GC_OPTIONS,TRUNCATION_TABLE,UDA,UNBOUNDED_RANGE_TOMBSTONES,VIEW_VIRTUAL_COLUMNS,WRITE_FAILURE_REPLY,XXHASH
    
  • Retrieve the list of experimental features by running:

    cqlsh> SELECT value FROM system.config WHERE name = 'experimental_features'
    

    With Raft enabled, the list of experimental features in the output includes raft.

Safe Schema Changes with Raft¶

In ScyllaDB, schema is based on Data Definition Language (DDL). In earlier ScyllaDB versions, schema changes were tracked via the gossip protocol, which might lead to schema conflicts if the updates are happening concurrently.

Implementing Raft eliminates schema conflicts and allows full automation of DDL changes under any conditions, as long as a quorum of nodes in the cluster is available. The following examples illustrate how Raft provides the solution to problems with schema changes.

  • A network partition may lead to a split-brain case, where each subset of nodes has a different version of the schema.

    With Raft, after a network split, the majority of the cluster can continue performing schema changes, while the minority needs to wait until it can rejoin the majority. Data manipulation statements on the minority can continue unaffected, provided the quorum requirement is satisfied.

  • Two or more conflicting schema updates are happening at the same time. For example, two different columns with the same definition are simultaneously added to the cluster. There is no effective way to resolve the conflict - the cluster will employ the schema with the most recent timestamp, but changes related to the shadowed table will be lost.

    With Raft, concurrent schema changes are safe.

In summary, Raft makes schema changes safe, but it requires that a quorum of nodes in the cluster is available.

Handling Failures¶

Raft requires a quorum of nodes in a cluster to be available. If one or more nodes are down, but the quorum is live, reads, writes, and schema udpates proceed unaffected. When the node that was down is up again, it first contacts the cluster to fetch the latest schema and then starts serving queries.

The following examples show the recovery actions depending on the number of nodes and DCs in your cluster.

Examples¶

Cluster A: 1 datacenter, 3 nodes¶

Failure

Consequence

Action to take

1 node

Schema updates are possible and safe.

Try restarting the node. If the node is dead, replace it with a new node.

2 nodes

Cluster is not fully opetarional. The data is available for reads and writes, but schema changes are impossible.

Restart at least 1 of the 2 nodes that are down to regain quorum. If you can’t recover at least 1 of the 2 nodes, contact ScyllaDB support for assistance.

1 datacenter

Cluster is not fully opetarional. The data is available for reads and writes, but schema changes are impossible.

When the DC comes back online, restart the nodes. If the DC does not come back online and nodes are lost, restore the latest cluster backup into a new cluster. You can contact ScyllaDB support for assistance.

Cluster B: 2 datacenters, 6 nodes (3 nodes per DC)¶

Failure

Consequence

Action to take

1-2 nodes

Schema updates are possible and safe.

Try restarting the node(s). If the node is dead, replace it with a new node.

3 nodes

Cluster is not fully opetarional. The data is available for reads and writes, but schema changes are impossible.

Restart 1 of the 3 nodes that are down to regain quorum. If you can’t recover at least 1 of the 3 failed nodes, contact ScyllaDB support for assistance.

1DC

Cluster is not fully opetarional. The data is available for reads and writes, but schema changes are impossible.

When the DCs come back online, restart the nodes. If the DC fails to come back online and the nodes are lost, restore the latest cluster backup into a new cluster. You can contact ScyllaDB support for assistance.

Cluster C: 3 datacenter, 9 nodes (3 nodes per DC)¶

Failure

Consequence

Action to take

1-4 nodes

Schema updates are possible and safe.

Try restarting the nodes. If the nodes are dead, replace them with new nodes.

1 DC

Schema updates are possible and safe.

When the DC comes back online, try restarting the nodes in the cluster. If the nodes are dead, add 3 new nodes in a new region.

2 DCs

Cluster is not fully opetarional. The data is available for reads and writes, but schema changes are impossible.

When the DCs come back online, restart the nodes. If at least one DC fails to come back online and the nodes are lost, restore the latest cluster backup into a new cluster. You can contact ScyllaDB support for assistance.

Learn More About Raft¶

  • The Raft Consensus Algorithm

  • Achieving NoSQL Database Consistency with Raft in ScyllaDB - A tech talk by Konstantin Osipov

  • Making Schema Changes Safe with Raft - A Scylla Summit talk by Konstantin Osipov (register for access)

  • The Future of Consensus in ScyllaDB 5.0 and Beyond - A Scylla Summit talk by Tomasz Grabiec (register for access)

Was this page helpful?

PREVIOUS
Choose a Compaction Strategy
NEXT
Troubleshooting Scylla
  • Create an issue

On this page

  • Raft Consensus Algorithm in ScyllaDB
    • Introduction
    • Quorum Requirement
      • Upgrade Considerations for SyllaDB 5.0 and Later
    • Enabling Raft
      • Enabling Raft in ScyllaDB 5.0 and 5.1
      • Verifying that Raft is enabled
    • Safe Schema Changes with Raft
    • Handling Failures
      • Examples
    • Learn More About Raft
ScyllaDB Enterprise
  • 2022.2
    • 2024.2
    • 2024.1
    • 2023.1
    • 2022.2
  • Getting Started
    • Install Scylla
      • ScyllaDB Web Installer for Linux
      • Scylla Unified Installer (relocatable executable)
      • Air-gapped Server Installation
      • What is in each RPM
      • Scylla Housekeeping and how to disable it
      • Scylla Developer Mode
      • Scylla Configuration Reference
    • Configure Scylla
    • ScyllaDB Requirements
      • System Requirements
      • OS Support by Platform and Version
      • Scylla in a Shared Environment
    • Migrate to ScyllaDB
      • Migration Process from Cassandra to Scylla
      • Scylla and Apache Cassandra Compatibility
      • Migration Tools Overview
    • Integration Solutions
      • Integrate Scylla with Spark
      • Integrate Scylla with KairosDB
      • Integrate Scylla with Presto
      • Integrate Scylla with Elasticsearch
      • Integrate Scylla with Kubernetes
      • Integrate Scylla with the JanusGraph Graph Data System
      • Integrate Scylla with DataDog
      • Integrate Scylla with Kafka
      • Integrate Scylla with IOTA Chronicle
      • Integrate Scylla with Spring
      • Shard-Aware Kafka Connector for Scylla
      • Install Scylla with Ansible
      • Integrate Scylla with Databricks
    • Tutorials
  • Scylla for Administrators
    • Administration Guide
    • Procedures
      • Cluster Management
      • Backup & Restore
      • Change Configuration
      • Maintenance
      • Best Practices
      • Benchmarking Scylla
      • Migrate from Cassandra to Scylla
      • Disable Housekeeping
    • Security
      • Scylla Security Checklist
      • Enable Authentication
      • Enable and Disable Authentication Without Downtime
      • Generate a cqlshrc File
      • Reset Authenticator Password
      • Enable Authorization
      • Grant Authorization CQL Reference
      • Role Based Access Control (RBAC)
      • Scylla Auditing Guide
      • Encryption: Data in Transit Client to Node
      • Encryption: Data in Transit Node to Node
      • Generating a self-signed Certificate Chain Using openssl
      • Encryption at Rest
      • LDAP Authentication
      • LDAP Authorization (Role Management)
    • Admin Tools
      • Nodetool Reference
      • CQLSh
      • REST
      • Tracing
      • Scylla SStable
      • Scylla Types
      • SSTableLoader
      • cassandra-stress
      • SSTabledump
      • SSTable2json
      • SSTable Index
      • Scylla Logs
      • Seastar Perftune
      • Virtual Tables
    • ScyllaDB Monitoring Stack
    • ScyllaDB Operator
    • ScyllaDB Manager
    • Upgrade Procedures
      • Scylla Enterprise
      • Scylla Open Source to Scylla Enterprise
      • Scylla AMI
    • System Configuration
      • System Configuration Guide
      • scylla.yaml
      • Scylla Snitches
    • Benchmarking Scylla
  • Scylla for Developers
    • Learn To Use Scylla
      • Scylla University
      • Course catalog
      • Scylla Essentials
      • Basic Data Modeling
      • Advanced Data Modeling
      • MMS - Learn by Example
      • Care-Pet an IoT Use Case and Example
    • Scylla Alternator
    • Scylla Features
      • Scylla Open Source Features
      • Scylla Enterprise Features
    • Scylla Drivers
      • Scylla CQL Drivers
      • Scylla DynamoDB Drivers
  • CQL Reference
    • CQLSh: the CQL shell
    • Appendices
    • Compaction
    • Consistency Levels
    • Consistency Level Calculator
    • Data Definition
    • Data Manipulation
    • Data Types
    • Definitions
    • Global Secondary Indexes
    • Additional Information
    • Expiring Data with Time to Live (TTL)
    • Additional Information
    • Functions
    • JSON Support
    • Materialized Views
    • Non-Reserved CQL Keywords
    • Reserved CQL Keywords
    • ScyllaDB CQL Extensions
  • Scylla Architecture
    • Scylla Ring Architecture
    • Scylla Fault Tolerance
    • Consistency Level Console Demo
    • Scylla Anti-Entropy
      • Scylla Hinted Handoff
      • Scylla Read Repair
      • Scylla Repair
    • SSTable
      • Scylla SSTable - 2.x
      • ScyllaDB SSTable - 3.x
    • Compaction Strategies
    • Raft Consensus Algorithm in ScyllaDB
  • Troubleshooting Scylla
    • Errors and Support
      • Report a Scylla problem
      • Error Messages
      • Change Log Level
    • Scylla Startup
      • Ownership Problems
      • Scylla will not Start
      • Scylla Python Script broken
    • Cluster and Node
      • Failed Decommission Problem
      • Cluster Timeouts
      • Node Joined With No Data
      • SocketTimeoutException
      • NullPointerException
    • Data Modeling
      • Scylla Large Partitions Table
      • Scylla Large Rows and Cells Table
      • Large Partitions Hunting
    • Data Storage and SSTables
      • Space Utilization Increasing
      • Disk Space is not Reclaimed
      • SSTable Corruption Problem
      • Pointless Compactions
      • Limiting Compaction
    • CQL
      • Time Range Query Fails
      • COPY FROM Fails
      • CQL Connection Table
      • Reverse queries fail
    • Scylla Monitor and Manager
      • Manager and Monitoring integration
      • Manager lists healthy nodes as down
  • Knowledge Base
    • Upgrading from experimental CDC
    • Compaction
    • Counting all rows in a table is slow
    • CQL Query Does Not Display Entire Result Set
    • When CQLSh query returns partial results with followed by “More”
    • Run Scylla and supporting services as a custom user:group
    • Decoding Stack Traces
    • Snapshots and Disk Utilization
    • DPDK mode
    • Debug your database with Flame Graphs
    • How to Change gc_grace_seconds for a Table
    • Gossip in Scylla
    • Increase Permission Cache to Avoid Non-paged Queries
    • How does Scylla LWT Differ from Apache Cassandra ?
    • Map CPUs to Scylla Shards
    • Scylla Memory Usage
    • NTP Configuration for Scylla
    • Updating the Mode in perftune.yaml After a ScyllaDB Upgrade
    • POSIX networking for Scylla
    • Scylla consistency quiz for administrators
    • Recreate RAID devices
    • How to Safely Increase the Replication Factor
    • Scylla and Spark integration
    • Increase Scylla resource limits over systemd
    • Scylla Seed Nodes
    • How to Set up a Swap Space
    • Scylla Snapshots
    • Scylla payload sent duplicated static columns
    • Stopping a local repair
    • System Limits
    • How to flush old tombstones from a table
    • Time to Live (TTL) and Compaction
    • Scylla Nodes are Unresponsive
    • Update a Primary Key
    • Using the perf utility with Scylla
    • Configure Scylla Networking with Multiple NIC/IP Combinations
  • ScyllaDB University
  • Scylla FAQ
  • Contribute to ScyllaDB
  • Glossary
  • Alternator: DynamoDB API in Scylla
    • Getting Started With ScyllaDB Alternator
    • Scylla Alternator for DynamoDB users
Docs Tutorials University Contact Us About Us
© 2025, ScyllaDB. All rights reserved. | Terms of Service | Privacy Policy | ScyllaDB, and ScyllaDB Cloud, are registered trademarks of ScyllaDB, Inc.
Last updated on 12 May 2025.
Powered by Sphinx 7.4.7 & ScyllaDB Theme 1.8.6