Apr 25, 2018

Get started with Cassandra Query Language (CQL) : Cassandra CRUD Operation using CQLSH Client

Prerequisite: In order to follow this tutorial and do hands-on, make sure you have installed Apache Cassandra and its up & running at least locally on one node. I have placed Apache Cassandra at
"/Users/n0r0082/opt/packages/cassandra/apache-cassandra-2.1.1".

The Apache Cassandra installation includes the cqlsh utilitya python-based command line client for executing Cassandra Query Language (CQL) commands.

Before going into details of database operations lets familiarise with Cassandra terminologies :
  • Cluster - It is collection of nodes(data centers) arranged in a ring architecture.
  • Keyspace - The keyspace is the outermost container for data in Cassandra. The main attributes to keyspace are the Replication Factor, the Replica Placement Strategy and the Column Families. Relational database schema is analogous to keyspace in Cassandra. 
  • Column Family - Column Families in Cassandra are like tables in Relational Databases. Each Column Family contains a collection of rows. The key gives the ability to access related data together,
  • Column- A column in Cassandra is a data structure which contains a column name, a value and a timestamp.
Lets start with outermost shell creation and later does CRUD Operations followed by table creation. We will execute CSQL in cqsh utility.

Start cqslh utility present in bin directory of Cassandra Installation which in turn gives cqlsh prompt.
  ~ cd /Users/n0r0082/opt/packages/cassandra/apache-cassandra-2.1.1/bin
  bin ./cqlsh 
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.1 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> 

Create a keyspace : Create a keyspace named as "OCPorc" with replication strategy attributes. On successful execution prompt does not return any status. We can see it using describe keyspaces.
cqlsh:my_status> CREATE KEYSPACE "OCProc"
             ... WITH REPLICATION = {
             ...   'class': 'SimpleStrategy', 'replication_factor': 1
             ... };

List keySpaces : Below describe command lists all keyspaces present in given cluster node.
cqlsh:my_status> describe keyspaces;

system_traces  my_status  "OCProc"  system

Selecting a keyspace : Since column families (like table in RDMS) are created in context of keyspace, we need to select keyspace we want to work with.
cqlsh:my_status> use "OCProc" ;
cqlsh:OCProc> 

Create Table(Column Family) in selected Keyspace: Create table name USERS with username as primary key and other associated properties like boolean_filter, grace_seconds, etc. In Cassandra terminology Table are termed as Column Family.  Instead of using "Create TABLE USERS ....." keyword, we can use "CREATE COLUMNFAMILY USERS ..... "
cqlsh:OCProc> CREATE TABLE "USERS" (
          ...   "username" text PRIMARY KEY,
          ...   "email" text,
          ...   "city" text,
          ...   "phone" varint,
          ...   "encrypted_password" blob
          ... ) WITH bloom_filter_fp_chance = 0.01
          ...    AND comment = 'Creating USERS Tabel to store users details'
          ...    AND dclocal_read_repair_chance = 0.1
          ...    AND default_time_to_live = 0
          ...    AND gc_grace_seconds = 864000
          ...    AND max_index_interval = 2048
          ...    AND memtable_flush_period_in_ms = 0
          ...    AND min_index_interval = 128
          ...    AND read_repair_chance = 0.0
          ...    AND compaction = {'min_threshold': '4', 'class':
          ...    'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
          ...    'max_threshold': '32'}
          ...    AND compression = {'sstable_compression':
          ...    'org.apache.cassandra.io.compress.LZ4Compressor'}
          ...    AND speculative_retry = '99.0PERCENTILE';
cqlsh:OCProc>  describe tables;

"USERS"

Insert data in USERS Table(
Column Family) : Below we insert 3 rows in USERS table.  On successful execution of INSERT statement we do not see any response in the shell whatsoever; it should simply provide us with a new command prompt.
cqlsh:OCProc> INSERT INTO "USERS"
          ... ("username", "email", "city", "phone", "encrypted_password")
          ... VALUES (
          ...   'zytham',
          ...   'zytham@gmail.com',
          ...   'Patna',
          ...   9999888800,
          ...   0x9792977ed729792e403da53024c6069a9158b8c4
          ... );
cqlsh:OCProc> INSERT INTO "USERS"
          ... ("username", "email", "city", "phone", "encrypted_password")
          ... VALUES(
          ...   'ranjan',
          ...   'ranjan@gmail.com',
          ...   'Bangalore',
          ...    678998800,
          ...   0x8914977ed729792e403da53024c6069a9158b8c4
          ... );
cqlsh:OCProc> 
cqlsh:OCProc> INSERT INTO "USERS"
          ... ("username", "email", "city", "phone", "encrypted_password")
          ... VALUES(
          ...   'mishra',
          ...   'zytham@gmail.com',
          ...   'Torento',
          ...    00980099766,
          ...   0x891497745729792e403da53024c6069a9158b8c4
          ... );
Note: We can insert columns values partially. i.e: Some fields can be empty, like below command insert one row but phone field is empty.
cqlsh:OCProc> INSERT INTO "USERS"
          ... ("username", "email", "city", "encrypted_password")
          ... VALUES (
          ...   'Rahul',
          ...   'rahul@gmail.com',
          ...   'Mumbai',
          ...   0x9792977ed729792e403da53024c6069a9158b8c4
          ... );

Select rows from USERS table(Column Family): SELECT command is same as used in RDBMS SQL. it returns all 4 records, notice phone is null in row with username "Rahul"
cqlsh:OCProc> SELECT * FROM "USERS"
          ... ;

 username | city      | email            | encrypted_password                         | phone
----------+-----------+------------------+--------------------------------------------+------------
   zytham |     Patna | zytham@gmail.com | 0x9792977ed729792e403da53024c6069a9158b8c4 | 9999888800
   ranjan | Bangalore | ranjan@gmail.com | 0x8914977ed729792e403da53024c6069a9158b8c4 |  678998800
    Rahul |    Mumbai |  rahul@gmail.com | 0x9792977ed729792e403da53024c6069a9158b8c4 |       null
   mishra |   Torento | zytham@gmail.com | 0x891497745729792e403da53024c6069a9158b8c4 |  980099766

(4 rows)

Other variations of SELECT Query:
  • List specific columns in result-set
  • List all rows whose primary key provided 
  • Pagination query using LIMIT 

Below query shows all above use case. When we execute IN query with non-primary key it throw error as "not supported".
cqlsh:OCProc> SELECT "username", "email" FROM "USERS" WHERE "city" = 'Patna' ;

 username | email
----------+------------------
   zytham | zytham@gmail.com

(1 rows)
cqlsh:OCProc> SELECT * FROM "USERS" WHERE "username" IN ('zytham', 'mishra');

 username | city    | email            | encrypted_password                         | phone
----------+---------+------------------+--------------------------------------------+------------
   zytham |   Patna | zytham@gmail.com | 0x9792977ed729792e403da53024c6069a9158b8c4 | 9999888800
   mishra | Torento | zytham@gmail.com | 0x891497745729792e403da53024c6069a9158b8c4 |  980099766

(2 rows)
cqlsh:OCProc> SELECT * FROM "USERS" WHERE "city" IN ('Patna', 'Torento');
code=2200 [Invalid query] message="IN predicates on non-primary-key columns (city) is not yet supported"
cqlsh:OCProc> SELECT * FROM "USERS" LIMIT 2;

 username | city      | email            | encrypted_password                         | phone
----------+-----------+------------------+--------------------------------------------+------------
   zytham |     Patna | zytham@gmail.com | 0x9792977ed729792e403da53024c6069a9158b8c4 | 9999888800
   ranjan | Bangalore | ranjan@gmail.com | 0x8914977ed729792e403da53024c6069a9158b8c4 |  678998800

(2 rows)

Delete columns data : Execute Delete Query and we can notice difference in terms of email value in first row.
cqlsh:OCProc> SELECT * FROM "USERS";

 username | city      | email            | encrypted_password                         | phone
----------+-----------+------------------+--------------------------------------------+------------
   zytham |     Patna | zytham@gmail.com | 0x9792977ed729792e403da53024c6069a9158b8c4 | 9999888800
   ranjan | Bangalore | ranjan@gmail.com | 0x8914977ed729792e403da53024c6069a9158b8c4 |  678998800
    Rahul |    Mumbai |  rahul@gmail.com | 0x9792977ed729792e403da53024c6069a9158b8c4 |       null
   mishra |   Torento | zytham@gmail.com | 0x891497745729792e403da53024c6069a9158b8c4 |  980099766
cqlsh:OCProc> DELETE "email" FROM "USERS" WHERE "username"= 'zytham';
cqlsh:OCProc> SELECT * FROM "USERS";

 username | city      | email            | encrypted_password                         | phone
----------+-----------+------------------+--------------------------------------------+------------
   zytham |     Patna |             null | 0x9792977ed729792e403da53024c6069a9158b8c4 | 9999888800
   ranjan | Bangalore | ranjan@gmail.com | 0x8914977ed729792e403da53024c6069a9158b8c4 |  678998800
    Rahul |    Mumbai |  rahul@gmail.com | 0x9792977ed729792e403da53024c6069a9158b8c4 |       null
   mishra |   Torento | zytham@gmail.com | 0x891497745729792e403da53024c6069a9158b8c4 |  980099766

(4 rows)

Note
: In Delete command we cannot use non-primary key. Below query executions fails because where clause using "City", non-primary key
cqlsh:OCProc> DELETE "email" FROM "USERS" WHERE "city"= 'Patna';
code=2200 [Invalid query] message="Non PRIMARY KEY city found in where clause"

Update table row:
cqlsh:OCProc> UPDATE "USERS" SET "city"='Delhi' WHERE "username" = 'zytham' ;
cqlsh:OCProc> SELECT * FROM "USERS";

 username | city      | email            | encrypted_password                         | phone
----------+-----------+------------------+--------------------------------------------+------------
   zytham |     Delhi |             null | 0x9792977ed729792e403da53024c6069a9158b8c4 | 9999888800
   ranjan | Bangalore | ranjan@gmail.com | 0x8914977ed729792e403da53024c6069a9158b8c4 |  678998800
    Rahul |    Mumbai |  rahul@gmail.com | 0x9792977ed729792e403da53024c6069a9158b8c4 |       null
   mishra |   Torento | zytham@gmail.com | 0x891497745729792e403da53024c6069a9158b8c4 |  980099766

(4 rows)

Note
: Update command also expect primary key in WHERE clause ,below command execution failed because of non-primary key column "city"
cqlsh:OCProc> UPDATE "USERS" SET "city"='Delhi' WHERE "city" = 'Patna' ;
code=2200 [Invalid query] message="Non PRIMARY KEY city found in where clause"


Note
:  Using Secondary index we can avoid primary key requirement with WHERE clause.
Create Index on USERS : Below command create an index based on column name city
cqlsh:OCProc> CREATE INDEX emp_city_idx ON "USERS" (city);

It is recommended by Datastax to avoid Secondary Index. Secondary indexes are tricky to use and can impact performance greatly. The index table is stored on each node in a cluster, so a query involving a secondary index can rapidly become a performance nightmare if multiple nodes are accessed.
==== ***** =====



Location: Bengaluru, Karnataka, India