Scylla is Apache Cassandra compatible at the CQL binary protocol level, so any driver or application that uses CQL will work with Scylla, including a Databricks Spark cluster.
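A quick way to confirm this compatibility from a client machine is to connect with the open-source Python CQL driver. The following is a minimal sketch; <scylla_IP> is a placeholder for one of your Scylla nodes:

from cassandra.cluster import Cluster  # pip install scylla-driver (or cassandra-driver)

# Connect over the CQL port and run a plain CQL statement,
# exactly as you would against Apache Cassandra.
cluster = Cluster(["<scylla_IP>"], port=9042)
session = cluster.connect()
row = session.execute("SELECT release_version FROM system.local").one()
print(row.release_version)
cluster.shutdown()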
Although your requirements may be different, this example uses the following resources:
Scylla cluster
Databricks account
Before you begin
Verify that Scylla is installed and that you know the Scylla server IP address. Make sure you can connect on port 9042 (the CQL port):
curl <scylla_IP>:9042
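If curl is not available, a plain TCP check against the CQL port works as well. A sketch in Python, with <scylla_IP> again a placeholder:

# Open a TCP connection to the CQL port to confirm it is reachable.
import socket

with socket.create_connection(("<scylla_IP>", 9042), timeout=5):
    print("CQL port 9042 is reachable")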
Procedure
1. Create a new Databricks cluster with the following configuration:
Databricks runtime version:
Runtime: 9.1 LTS (Scala 2.12, Spark 3.1.2)
Spark config:
spark.sql.catalog.<your_catalog> com.datastax.spark.connector.datasource.CassandraCatalog
spark.sql.catalog.<your_catalog>.spark.cassandra.connection.host <your_host>
spark.cassandra.auth.username <your_username>
spark.cassandra.auth.password <your_password>
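The same settings can also be applied to an already-running Spark session, since Spark 3 resolves catalogs lazily. A sketch, reusing the placeholders from the config above:

# Session-level equivalent of the cluster-level Spark config (sketch).
spark.conf.set("spark.sql.catalog.<your_catalog>", "com.datastax.spark.connector.datasource.CassandraCatalog")
spark.conf.set("spark.sql.catalog.<your_catalog>.spark.cassandra.connection.host", "<your_host>")
spark.conf.set("spark.cassandra.auth.username", "<your_username>")
spark.conf.set("spark.cassandra.auth.password", "<your_password>")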
2. Once this is set up, install the connector library from Maven (Path: Libraries > Install New > Maven > Search Packages > Maven Central):
com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.1.0
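Once the library is installed and the cluster is running, you can confirm that the catalog is wired up from a notebook. For example (a sketch; <your_catalog> matches the name used in the Spark config):

# List the Scylla keyspaces exposed through the configured catalog.
spark.sql("SHOW NAMESPACES FROM <your_catalog>").show()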
Test case
Prepare test data [Scylla]:
CREATE KEYSPACE databriks WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor' : 3};
CREATE TABLE databriks.demo1 (pk text PRIMARY KEY, ck1 text, ck2 text);
INSERT INTO databriks.demo1 (pk, ck1, ck2) VALUES ('pk', 'ck1', 'ck2');
Create and run a new notebook [Databricks]:
df = spark.read.table("<your_catalog>.databriks.demo1")
display(df)
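Writes go through the same catalog. For example, a sketch that appends the DataFrame back into the table using the DataFrameWriterV2 API:

# Append the rows back to Scylla through the catalog (sketch).
df.writeTo("<your_catalog>.databriks.demo1").append()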