Was this page helpful?
This article explains how to enable consistent topology changes when you upgrade from version 2024.1 to 2024.2.
ScyllaDB Open Source 2024.2 introduces consistent topology changes based on Raft. Clusters created with version 2024.2 use consistent topology changes right from the start. However, consistent topology changes are not automatically enabled in clusters upgraded from version 2024.1. In such clusters, you need to enable consistent topology changes manually by following the procedure described in this article.
Before you start, you must check that the cluster meets the prerequisites and ensure that some administrative procedures will not be run while the procedure is in progress.
Make sure that all nodes in the cluster are upgraded to ScyllaDB Enterprise 2024.2.
Verify that schema on raft is enabled.
Make sure that all nodes enabled SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
cluster feature.
One way to verify it is to look for the following message in the log:
features - Feature SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES is enabled
Alternatively, it can be verified programmatically by checking whether the value
column under the enabled_features
key contains the name of the feature in
the system.scylla_local
table. One way to do it is with the following bash script:
until cqlsh -e "select value from system.scylla_local where key = 'enabled_features'" | grep "SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES"
do
echo "Upgrade didn't finish yet on the local node, waiting 10 seconds before checking again..."
sleep 10
done
echo "Upgrade completed on the local node"
Make sure that all nodes are alive for the duration of the procedure.
Make sure that administrative operations will not be running while the procedure is in progress. In particular, you must abstain from:
Cluster management procedures (adding, replacing, removing, decommissioning nodes, etc.).
Running nodetool repair.
Running nodetool checkAndRepairCdcStreams.
Any modifications of authentication and authorization settings.
Any change of authorization via CQL API.
Schema changes.
Warning
Before proceeding, make sure that all the prerequisites are met and no forbidden administrative operations will run during the procedure. Failing to do so may put the cluster in an inconsistent state.
Issue a POST HTTP request to the /storage_service/raft_topology/upgrade
endpoint to any of the nodes in the cluster.
For example, you can do it with curl
:
curl -X POST "http://127.0.0.1:10000/storage_service/raft_topology/upgrade"
Wait until all nodes report that the procedure is complete. You can check whether a node finished the procedure in one of two ways:
By sending a HTTP GET
request on the /storage_service/raft_topology/upgrade
endpoint. For example, you can do it with curl
:
curl -X GET "http://127.0.0.1:10000/storage_service/raft_topology/upgrade"
It will return a JSON string that will be equal to done
after the procedure is complete on that node.
By querying the upgrade_state
column in the system.topology
table.
You can use cqlsh
to get the value of the column:
cqlsh -e "select upgrade_state from system.topology"
The upgrade_state
column should be set to done
after the procedure
is complete on that node:
After the procedure is complete on all nodes, wait at least one minute before issuing any topology changes in order to avoid data loss from writes that were started before the procedure.
If the procedure gets stuck at some point, first check the status of your cluster:
If there are some nodes that are not alive, try to restart them.
If all nodes are alive, ensure that the network is healthy and every node can reach all other nodes.
If all nodes are alive and the network is healthy, perform a rolling restart of the cluster.
If none of the above solves the issue, perform the Raft recovery procedure. During recovery, the cluster will switch back to the gossip-based topology management mechanism.
After exiting recovery, you should retry enabling consistent topology updates using the procedure described in this document.