ScyllaDB Docs ScyllaDB Enterprise ScyllaDB for Administrators Procedures ScyllaDB Best Practices Best Practices for Running ScyllaDB on Docker

Caution

You're viewing documentation for a previous version. Switch to the latest stable version.

Best Practices for Running ScyllaDB on Docker¶

This is an article on how to use the ScyllaDB Docker image to start up a ScyllaDB node, access nodetool and cqlsh utilities, start a cluster of ScyllaDB nodes, configure data volume for storage, configure resource limits of the Docker container, use additional command line flags and overwrite scylla.yaml settings. Finally, there is an additional section with some basic usage of ScyllaDB within Docker.

See also the image description on Docker Hub or our original blog.

Please note that these instructions assume that you have configured Docker so that you can run it as a regular user. Usually, this is done by adding the user to a Docker group. See your platform-specific Docker installation documentation on how to do that (see, for example, instructions for Fedora and Ubuntu). If you have not configured a Docker group, you need to prefix the Docker commands with sudo to have sufficient permissions to run them.

NOTE: You should allocate a minimum of 1.5 GB of RAM per container.

Basic Operations¶

Starting a Single ScyllaDB Node¶

To start a single ScyllaDB node instance in a Docker container, run:

docker run --name some-scylla -d scylladb/scylla

If you’re on macOS and plan to start a multi-node cluster (3 nodes or more), start ScyllaDB with –reactor-backend=epoll to override the default linux-aio reactor backend:

docker run --name some-scylla -d scylladb/scylla --reactor-backend=epoll

The docker run command starts a new Docker instance in the background named some-scylla that runs the ScyllaDB server:

docker ps

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                          NAMES
616ee646cb9d        scylladb/scylla     "/docker-entrypoint.p"   4 seconds ago       Up 4 seconds        7000-7001/tcp, 9042/tcp, 9160/tcp, 10000/tcp   some-scylla

As seen from the docker ps output, the image exposes ports 7000-7001 (Inter-node RPC), 9042 (CQL), and 10000 (REST API).

Viewing ScyllaDB Server Logs¶

To access ScyllaDB server logs, you can use the docker logs command:

docker logs some-scylla  | tail

INFO  2016-11-09 10:27:48,191 [shard 6] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO  2016-11-09 10:27:48,191 [shard 4] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO  2016-11-09 10:27:48,191 [shard 3] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO  2016-11-09 10:27:48,191 [shard 1] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy

Checking Server Status with Nodetool¶

The Docker image also has ScyllaDB’s utilities installed. Nodetool is a command line tool for querying and managing a ScyllaDB cluster. The simplest nodetool command is nodetool status, which displays information about the cluster state:

docker exec -it some-scylla nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.17.0.2  125 KB     256     100.0%            c1906b2b-ce0c-4890-a9d4-8c360f111ad0  rack1

Using cqlsh¶

The cqlsh tool (CQL Shell) is an interactive Cassandra Query Language (CQL) shell for querying and manipulating data in the ScyllaDB cluster.

To start an interactive session, run the following command:

docker exec -it some-scylla cqlsh
Connected to Test Cluster at 172.17.0.2:9042.
[cqlsh 5.0.1 | Cassandra 2.1.8 | CQL spec 3.2.1 | Native protocol v3]
Use HELP for help.

and then run CQL queries against the cluster:

cqlsh> SELECT cluster_name FROM system.local;

cluster_name
--------------
Test Cluster

(1 rows)

Starting a Cluster¶

With a single some-scylla instance running, joining new nodes to form a cluster is easy:

docker run --name some-scylla2 -d scylladb/scylla --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)"

If you’re on macOS, ensure to add the –reactor-backend=epoll option when adding new nodes:

docker run --name some-scylla2 -d scylladb/scylla --reactor-backend=epoll --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)"

To query when the node is up and running (and view the status of the entire cluster) use the nodetool status command:

docker exec -it some-scylla nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.17.0.3  177.48 KB  256     100.0%            097caff5-892d-412f-af78-11d572795d6f  rack1
UN  172.17.0.2  125 KB     256     100.0%            c1906b2b-ce0c-4890-a9d4-8c360f111ad0  rack1

Restarting ScyllaDB from within the Running Node¶

The Docker image uses supervisord to manage ScyllaDB processes. You can restart ScyllaDB in a Docker container using:

docker exec -it some-scylla supervisorctl restart scylla

Configuring a Data Volume for Storage¶

The default filesystem in Docker is inadequate for anything else than just testing out ScyllaDB, but you can use Docker volumes for improving storage performance.

To use data volumes, ensure first that it’s on a ScyllaDB-supported filesystem like XFS, then create a ScyllaDB data directory /var/lib/scylla on the host. This will be used by ScyllaDB container to store all data:

sudo mkdir -p /var/lib/scylla/data /var/lib/scylla/commitlog

Then launch ScyllaDB instances using Docker’s --volume command line option to mount the created host directory as a data volume in the container and disable ScyllaDB’s developer mode to run I/O tuning before starting up the ScyllaDB node.

docker run --name some-scylla --volume /var/lib/scylla:/var/lib/scylla -d scylladb/scylla --developer-mode=0

Overriding scylla.yaml with a Master File¶

Sometimes, it’s not possible to adjust ScyllaDB-specific settings (including non-network properties, like cluster_name ) directly from the command line when ScyllaDB is running within Docker.

Instead, it may be necessary to incrementally override scylla.yaml settings by passing an external, master ScyllaDB.yaml file when starting the Docker container for the node.

To do this, you can use the --volume (-v) command as before to specify the overriding .yaml file:

NOTE: you can create a master_scylla.yaml in current host dir: just copy the file from https://github.com/scylladb/scylla/blob/master/conf/scylla.yaml.

On the host, create and edit master_scylla.yaml, for example. Uncomment and change the “cluster_name” parameter.
Start the ScyllaDB node, with the command to override scylla.yaml with master_scylla.yaml :

docker run --name some-scylla --volume ~/master_scylla.yaml:/etc/scylla/scylla.yaml -d scylladb/scylla

NOTE: You can start a Docker node with any other alternate parameter configured in scylla.yaml using this technique.

Finally, you can check that the setting was changed:

docker exec -it some-scylla nodetool describecluster

Cluster Information:
       Name: Doobie Snarf
       Snitch: org.apache.cassandra.locator.SimpleSnitch
       Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
       Schema versions:
               34259144-0f3f-305f-a777-2811e30e17b3: [172.17.0.2]

Getting Performance out of your Docker Container¶

By default, our Docker image defaults to a mode where ScyllaDB’s architectural optimizations are not enabled. With these command-line settings, you can introduce incremental changes that boost your ScyllaDB performance on Docker even more.

Configuring Resource Limits¶

ScyllaDB uses all CPUs and memory by default. To configure resource limits for your Docker container, you can use the --smp, --memory, and --cpuset command line options documented in the section “Network and command-line settings” below.

The recommended way to run multiple ScyllaDB instances on the same physical hardware is by statically partitioning all resources. For example, using the --cpuset option to assign cores 0 and 1 to one instance, and 2 and 3 to another.

In scenarios in which static partitioning is not desired - like mostly-idle cluster without hard latency requirements, the --overprovisioned command-line option is recommended. This enables certain optimizations for ScyllaDB to run efficiently in an overprovisioned environment.

NOTE: You should allocate a minimum of 1.5 GB of RAM per container.

Network and Command-Line Settings¶

The ScyllaDB image supports many command line options that are passed to the Docker run command. Keep in mind that these command-line settings override the corresponding settings in your scylla.yaml.

–seeds SEEDS¶

The --seeds command line option configures ScyllaDB’s seed nodes. If no --seeds option is specified, ScyllaDB uses its own IP address as the seed.

For example, to configure ScyllaDB to run with the seed node 192.168.0.100, run:

docker run --name some-scylla -d scylladb/scylla --seeds 192.168.0.100

See ScyllaDB Seed Nodes for details.

–listen-address ADDR¶

The --listen-address command line option configures the IP address the ScyllaDB instance listens for client connections.

For example, to configure ScyllaDB to use listen address 10.0.0.5:

docker run --name some-scylla -d scylladb/scylla --listen-address 10.0.0.5

–broadcast-address ADDR¶

The --broadcast-address command line option configures the IP address the ScyllaDB instance tells other ScyllaDB nodes in the cluster to connect to.

For example, to configure ScyllaDB to use broadcast address 10.0.0.5:

docker run --name some-scylla -d scylladb/scylla --broadcast-address 10.0.0.5

–broadcast-rpc-address ADDR¶

The --broadcast-rpc-address command line option configures the IP address the ScyllaDB instance tells clients to connect to.

For example, to configure ScyllaDB to use broadcast RPC address 10.0.0.5:

docker run --name some-scylla -d scylladb/scylla --broadcast-rpc-address 10.0.0.5

–smp COUNT¶

The --smp command line option restricts ScyllaDB to COUNT number of CPUs. The option does not, however, mandate a specific placement of CPUs. See the --cpuset command line option if you need ScyllaDB to run on specific CPUs.

For example, to restrict ScyllaDB to 2 CPUs:

docker run --name some-scylla -d scylladb/scylla --smp 2

–memory AMOUNT¶

The --memory command line option restricts ScyllaDB to use up to AMOUNT of memory. The AMOUNT value supports both M unit for megabytes and G unit for gigabytes.

For example, to restrict ScyllaDB to 4 GB of memory:

docker run --name some-scylla -d scylladb/scylla --memory 4G

**NOTE: You should allocate a minimum of 1.5 GB of RAM per container.**

–overprovisioned ENABLE¶

The --overprovisioned command line option enables or disables optimizations for running ScyllaDB in an overprovisioned environment. If no --overprovisioned option is specified, ScyllaDB defaults to running with optimizations enabled.

For example, to enable optimizations for running in an overprovisioned environment:

docker run --name some-scylla -d scylladb/scylla --overprovisioned 1

–cpuset CPUSET¶

The --cpuset command line option restricts ScyllaDB to run on only on CPUs specified by CPUSET. The CPUSET value is either a single CPU (e.g. --cpuset 1), a range (e.g. --cpuset 2-3), or a list (e.g. --cpuset 1,2,5), or a combination of the last two options (e.g. --cpuset 1-2,5).

For example, to restrict ScyllaDB to run on physical CPUs 0 to 2 and 4:

docker run --name some-scylla -d scylladb/scylla --cpuset 0-2,4

–developer-mode ENABLE¶

The --developer-mode command line option enables ScyllaDB’s developer mode, which relaxes checks for things like XFS and enables ScyllaDB to run on unsupported configurations (which usually results in suboptimal performance). If no --developer-mode command line option is defined, ScyllaDB defaults to running with developer mode enabled.

It is highly recommended to disable developer mode for production deployments to ensure ScyllaDB is able to run with maximum performance.

To disable developer mode:

docker run --name some-scylla -d scylladb/scylla --developer-mode 0

–experimental-features FEATURE¶

The --experimental-features command line option enables ScyllaDB’s experimental feature individually. If no feature flags are specified, ScyllaDB runs with only stable features enabled.

Running experimental features in production environments is not recommended.

For example, to enable the User Defined Functions (UDF) feature:

docker run --name some-scylla -d scylladb/scylla --experimental-features=udf

Other Useful Tips and Tricks¶

Checking the Current Version of ScyllaDB on the Node¶

docker exec -it some-scylla scylla --version

Using a Local CSV File to Import Data into ScyllaDB¶

First, download the file locally to the node:

sudo docker exec -it some-scylla.2.0.1 curl -o file.csv https://<url>.com/<path>/<path>/<file>.csv

Once you have the .csv downloaded, you can use the CQL COPY FROM command as explained here to load the data into ScyllaDB.

Such a copy command might look like this:

cqlsh:my_keyspace> COPY <table_name> FROM 'file.csv' WITH HEADER=true;

Searching for a setting in scylla.yaml¶

scylla.yaml can be found at /etc/scylla/scylla.yaml. In this case, you can search for a specific entry in the file. For example, if you wanted to determine if a setup was experimental and were to search for experimental in the file, you could try:

docker exec -it some-scylla grep -H 'experimental' /etc/scylla/scylla.yaml

Apache®, Apache Cassandra®, Cassandra®, the Apache feather logo and the Apache Cassandra® Eye logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Was this page helpful?