Sep 20, 2018

Internals of Cassandra data storage: Visualise Cassandra node handshaking and how data is persisted on disk (Simulate flush and compaction using nodetool and sstabledump)

In the previous post we set up a multi-node Cassandra cluster on Google Compute Engine. In this post we analyse Cassandra startup logs and visualise how handshaking happens when a new node joins the cluster. Later we will see how data is persisted to the file system after flushing, and after deleting a record we will look into how tombstone information is stored and how the deleted record is maintained in Cassandra.

Cassandra node running status and Handshake visualisation 

On starting Cassandra on any node we are interested in three things (a grep sketch to extract these lines from the log follows the list).
  • On which node the messaging service starts (its IP address and interface), the IP address of the node where Cassandra started, and its final status
  • Which (starting) node handshakes with which (already running) nodes
  • Which nodes' topology is updated on starting Cassandra on the given node
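To pull exactly these lines out of a long startup log, a simple grep over the node's system log works. A minimal sketch, assuming the default log file $CASSANDRA_HOME/logs/system.log:
nikhilranjan234@instance-1:~$ grep -E 'Starting Messaging Service|Handshaking version|Updating topology|state jump' /opt/apache-cassandra-3.11.3/logs/system.log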
Start Cassandra on Instance-1 and capture the log: IP address of Instance-1 is 10.128.0.2
INFO  [main] 2018-09-20 16:15:57,812 StorageService.java:618 - Cassandra version: 3.11.3
INFO  [main] 2018-09-20 16:15:57,813 StorageService.java:619 - Thrift API version: 20.1.0
INFO  [main] 2018-09-20 16:15:57,813 StorageService.java:620 - CQL supported versions: 3.4.4 (default: 3.4.4)
INFO  [main] 2018-09-20 16:15:57,813 StorageService.java:622 - Native protocol supported versions: 3/v3, 4/v4, 5/v5-beta (default: 4/v4)
INFO  [main] 2018-09-20 16:15:57,893 IndexSummaryManager.java:85 - Initializing index summary manager with a memory pool size of 50 MB and a resize interval of 60 minutes
INFO  [main] 2018-09-20 16:15:57,914 MessagingService.java:761 - Starting Messaging Service on /10.128.0.2:7000 (eth0)
INFO  [main] 2018-09-20 16:16:03,012 OutboundTcpConnection.java:108 - OutboundTcpConnection using coalescing strategy DISABLED
INFO  [main] 2018-09-20 16:16:29,114 StorageService.java:704 - Loading persisted ring state
INFO  [main] 2018-09-20 16:16:29,137 StorageService.java:822 - Starting up server gossip
INFO  [main] 2018-09-20 16:16:29,250 TokenMetadata.java:479 - Updating topology for /10.128.0.2
INFO  [main] 2018-09-20 16:16:29,251 TokenMetadata.java:479 - Updating topology for /10.128.0.2
INFO  [main] 2018-09-20 16:16:29,370 StorageService.java:1446 - JOINING: Finish joining ring
INFO  [main] 2018-09-20 16:16:29,426 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='keyspace1', ColumnFamily='standard1')
INFO  [main] 2018-09-20 16:16:29,432 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='keyspace1', ColumnFamily='counter1')
INFO  [main] 2018-09-20 16:16:29,439 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='stockdb', ColumnFamily='user')
INFO  [main] 2018-09-20 16:16:29,519 StorageService.java:2289 - Node /10.128.0.2 state jump to NORMAL

Observation: From the above log we can see the messaging service started on /10.128.0.2:7000 (eth0), the node's IP address is /10.128.0.2, and after startup its state is NORMAL (last line of the log).
Since it is the very first node started in the data centre, only this node's topology is updated.

Start Cassandra on Instance-2 and capture the log: IP address of Instance-2 is 10.128.0.3
INFO  [main] 2018-09-20 16:18:26,317 QueryProcessor.java:163 - Preloaded 1 prepared statements
INFO  [main] 2018-09-20 16:18:26,318 StorageService.java:618 - Cassandra version: 3.11.3
INFO  [main] 2018-09-20 16:18:26,318 StorageService.java:619 - Thrift API version: 20.1.0
INFO  [main] 2018-09-20 16:18:26,318 StorageService.java:620 - CQL supported versions: 3.4.4 (default: 3.4.4)
INFO  [main] 2018-09-20 16:18:26,319 StorageService.java:622 - Native protocol supported versions: 3/v3, 4/v4, 5/v5-beta (default: 4/v4)
INFO  [main] 2018-09-20 16:18:26,403 IndexSummaryManager.java:85 - Initializing index summary manager with a memory pool size of 50 MB and a resize interval of 60 minutes
INFO  [main] 2018-09-20 16:18:26,423 MessagingService.java:761 - Starting Messaging Service on /10.128.0.3:7000 (eth0)
INFO  [main] 2018-09-20 16:18:26,545 OutboundTcpConnection.java:108 - OutboundTcpConnection using coalescing strategy DISABLED
INFO  [HANDSHAKE-/10.128.0.2] 2018-09-20 16:18:26,582 OutboundTcpConnection.java:561 - Handshaking version with /10.128.0.2
INFO  [main] 2018-09-20 16:18:27,582 StorageService.java:704 - Loading persisted ring state
INFO  [main] 2018-09-20 16:18:27,607 StorageService.java:822 - Starting up server gossip
INFO  [main] 2018-09-20 16:18:27,715 TokenMetadata.java:479 - Updating topology for /10.128.0.3
INFO  [main] 2018-09-20 16:18:27,716 TokenMetadata.java:479 - Updating topology for /10.128.0.3
INFO  [main] 2018-09-20 16:18:27,845 StorageService.java:1446 - JOINING: Finish joining ring
INFO  [main] 2018-09-20 16:18:27,896 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='keyspace1', ColumnFamily='standard1')
INFO  [main] 2018-09-20 16:18:27,900 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='keyspace1', ColumnFamily='counter1')
INFO  [main] 2018-09-20 16:18:27,901 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='stockdb', ColumnFamily='user')
INFO  [main] 2018-09-20 16:18:27,991 StorageService.java:2289 - Node /10.128.0.3 state jump to NORMAL
INFO  [main] 2018-09-20 16:18:28,033 AuthCache.java:172 - (Re)initializing CredentialsCache (validity period/update interval/max entries) (2000/2000/1000)
INFO  [main] 2018-09-20 16:18:28,043 Gossiper.java:1692 - Waiting for gossip to settle...
INFO  [GossipStage:1] 2018-09-20 16:18:28,731 Gossiper.java:1053 - Node /10.128.0.2 has restarted, now UP
INFO  [GossipStage:1] 2018-09-20 16:18:28,748 StorageService.java:2289 - Node /10.128.0.2 state jump to NORMAL
INFO  [GossipStage:1] 2018-09-20 16:18:28,775 TokenMetadata.java:479 - Updating topology for /10.128.0.2
INFO  [GossipStage:1] 2018-09-20 16:18:28,776 TokenMetadata.java:479 - Updating topology for /10.128.0.2
INFO  [RequestResponseStage-5] 2018-09-20 16:18:28,803 Gossiper.java:1019 - InetAddress /10.128.0.2 is now UP
INFO  [HANDSHAKE-/10.128.0.2] 2018-09-20 16:18:28,822 OutboundTcpConnection.java:561 - Handshaking version with /10.128.0.2

Observation: From the above log we can see the messaging service started on /10.128.0.3:7000 (eth0), the node's IP address is /10.128.0.3, and after startup its state is NORMAL.
Since Instance-1 is already running when Cassandra starts on this node (10.128.0.3), it handshakes with the already running node (10.128.0.2), and the topology of both Instance-1 (10.128.0.2) and Instance-2 (10.128.0.3) is updated.

Start Cassandra on Instance-3 and capture log: IP address of Instance-3 is 10.128.0.4
INFO  [ScheduledTasks:1] 2018-09-20 17:12:45,013 TokenMetadata.java:498 - Updating topology for all endpoints that have changed
INFO  [main] 2018-09-20 17:12:45,213 StorageService.java:600 - Populating token metadata from system tables
INFO  [main] 2018-09-20 17:12:45,370 StorageService.java:607 - Token metadata: Normal Tokens:
/10.128.0.2:[......... ]
/10.128.0.3:[..........]
/10.128.0.4:[..........]
INFO  [main] 2018-09-20 17:12:46,849 MessagingService.java:761 - Starting Messaging Service on /10.128.0.4:7000 (eth0)
INFO  [main] 2018-09-20 17:12:46,960 OutboundTcpConnection.java:108 - OutboundTcpConnection using coalescing strategy DISABLED
INFO  [HANDSHAKE-/10.128.0.2] 2018-09-20 17:12:47,004 OutboundTcpConnection.java:561 - Handshaking version with /10.128.0.2
INFO  [main] 2018-09-20 17:12:48,000 StorageService.java:704 - Loading persisted ring state
INFO  [main] 2018-09-20 17:12:48,026 StorageService.java:822 - Starting up server gossip
INFO  [main] 2018-09-20 17:12:48,125 TokenMetadata.java:479 - Updating topology for /10.128.0.4
INFO  [main] 2018-09-20 17:12:48,125 TokenMetadata.java:479 - Updating topology for /10.128.0.4
INFO  [main] 2018-09-20 17:12:48,228 StorageService.java:1446 - JOINING: Finish joining ring
INFO  [main] 2018-09-20 17:12:48,274 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='keyspace1', ColumnFamily='standard1')
INFO  [main] 2018-09-20 17:12:48,280 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='keyspace1', ColumnFamily='counter1')
INFO  [main] 2018-09-20 17:12:48,280 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='stockdb', ColumnFamily='user')
INFO  [main] 2018-09-20 17:12:48,361 StorageService.java:2289 - Node /10.128.0.4 state jump to NORMAL
INFO  [main] 2018-09-20 17:12:48,394 AuthCache.java:172 - (Re)initializing CredentialsCache (validity period/update interval/max entries) (2000/2000/1000)
INFO  [main] 2018-09-20 17:12:48,401 Gossiper.java:1692 - Waiting for gossip to settle...
INFO  [HANDSHAKE-/10.128.0.3] 2018-09-20 17:12:48,979 OutboundTcpConnection.java:561 - Handshaking version with /10.128.0.3
INFO  [HANDSHAKE-/10.128.0.3] 2018-09-20 17:12:48,993 OutboundTcpConnection.java:561 - Handshaking version with /10.128.0.3
INFO  [GossipStage:1] 2018-09-20 17:12:49,015 Gossiper.java:1053 - Node /10.128.0.2 has restarted, now UP
INFO  [GossipStage:1] 2018-09-20 17:12:49,030 StorageService.java:2289 - Node /10.128.0.2 state jump to NORMAL
INFO  [GossipStage:1] 2018-09-20 17:12:49,045 TokenMetadata.java:479 - Updating topology for /10.128.0.2
INFO  [GossipStage:1] 2018-09-20 17:12:49,047 TokenMetadata.java:479 - Updating topology for /10.128.0.2
INFO  [GossipStage:1] 2018-09-20 17:12:49,050 Gossiper.java:1053 - Node /10.128.0.3 has restarted, now UP
INFO  [GossipStage:1] 2018-09-20 17:12:49,063 StorageService.java:2289 - Node /10.128.0.3 state jump to NORMAL
INFO  [RequestResponseStage-3] 2018-09-20 17:12:49,073 Gossiper.java:1019 - InetAddress /10.128.0.3 is now UP
INFO  [RequestResponseStage-3] 2018-09-20 17:12:49,074 Gossiper.java:1019 - InetAddress /10.128.0.2 is now UP
INFO  [GossipStage:1] 2018-09-20 17:12:49,078 TokenMetadata.java:479 - Updating topology for /10.128.0.3
INFO  [GossipStage:1] 2018-09-20 17:12:49,079 TokenMetadata.java:479 - Updating topology for /10.128.0.3
INFO  [HANDSHAKE-/10.128.0.2] 2018-09-20 17:12:49,304 OutboundTcpConnection.java:561 - Handshaking version with /10.128.0.2

Observation: From the above log we can see the messaging service started on /10.128.0.4:7000 (eth0), the node's IP address is /10.128.0.4, and after startup its state is NORMAL.
Since Instance-1 and Instance-2 are already running when Cassandra starts on this node (10.128.0.4), it handshakes with the already running nodes (10.128.0.2 and 10.128.0.3), and the topology of all three, Instance-1 (10.128.0.2), Instance-2 (10.128.0.3) and Instance-3 (10.128.0.4), is updated.
The token ranges for all running nodes appear at the top of the log (elided above). When a new node joins, virtual node tokens are assigned to it, as sketched below.
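To see the actual token assignments behind this, nodetool exposes the ring view. A sketch (with 256 vnodes per node the full output is long, so we truncate it):
nikhilranjan234@instance-1:~$ nodetool ring | head -20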

Start Cassandra on Instance-4 and capture log: IP address of Instance-4 is 10.128.0.5
INFO  [HANDSHAKE-/10.128.0.2] 2018-09-20 16:08:40,370 OutboundTcpConnection.java:561 - Handshaking version with /10.128.0.2
INFO  [GossipStage:1] 2018-09-20 16:08:40,373 Gossiper.java:1053 - Node /10.128.0.4 has restarted, now UP
INFO  [main] 2018-09-20 16:08:40,399 StorageService.java:1446 - JOINING: Finish joining ring
INFO  [GossipStage:1] 2018-09-20 16:08:40,402 StorageService.java:2289 - Node /10.128.0.4 state jump to NORMAL
INFO  [HANDSHAKE-/10.128.0.4] 2018-09-20 16:08:40,403 OutboundTcpConnection.java:561 - Handshaking version with /10.128.0.4
INFO  [GossipStage:1] 2018-09-20 16:08:40,444 TokenMetadata.java:479 - Updating topology for /10.128.0.4
INFO  [GossipStage:1] 2018-09-20 16:08:40,447 TokenMetadata.java:479 - Updating topology for /10.128.0.4
INFO  [GossipStage:1] 2018-09-20 16:08:40,447 Gossiper.java:1053 - Node /10.128.0.2 has restarted, now UP
INFO  [RequestResponseStage-3] 2018-09-20 16:08:40,468 Gossiper.java:1019 - InetAddress /10.128.0.4 is now UP
INFO  [GossipStage:1] 2018-09-20 16:08:40,472 StorageService.java:2289 - Node /10.128.0.2 state jump to NORMAL
INFO  [RequestResponseStage-2] 2018-09-20 16:08:40,483 Gossiper.java:1019 - InetAddress /10.128.0.2 is now UP
INFO  [GossipStage:1] 2018-09-20 16:08:40,485 TokenMetadata.java:479 - Updating topology for /10.128.0.2
INFO  [GossipStage:1] 2018-09-20 16:08:40,486 TokenMetadata.java:479 - Updating topology for /10.128.0.2
INFO  [GossipStage:1] 2018-09-20 16:08:40,491 Gossiper.java:1053 - Node /10.128.0.3 has restarted, now UP
INFO  [GossipStage:1] 2018-09-20 16:08:40,518 StorageService.java:2289 - Node /10.128.0.3 state jump to NORMAL
INFO  [HANDSHAKE-/10.128.0.3] 2018-09-20 16:08:40,536 OutboundTcpConnection.java:561 - Handshaking version with /10.128.0.3
INFO  [main] 2018-09-20 16:08:40,540 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='keyspace1', ColumnFamily='standard1')
INFO  [GossipStage:1] 2018-09-20 16:08:40,537 TokenMetadata.java:479 - Updating topology for /10.128.0.3
INFO  [GossipStage:1] 2018-09-20 16:08:40,550 TokenMetadata.java:479 - Updating topology for /10.128.0.3
INFO  [main] 2018-09-20 16:08:40,551 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='keyspace1', ColumnFamily='counter1')
INFO  [main] 2018-09-20 16:08:40,556 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='stockdb', ColumnFamily='user')
INFO  [RequestResponseStage-1] 2018-09-20 16:08:40,599 Gossiper.java:1019 - InetAddress /10.128.0.3 is now UP
INFO  [main] 2018-09-20 16:08:40,643 StorageService.java:2289 - Node /10.128.0.5 state jump to NORMAL
INFO  [main] 2018-09-20 16:08:40,668 AuthCache.java:172 - (Re)initializing CredentialsCache (validity period/update interval/max entries) (2000/2000/1000)
INFO  [main] 2018-09-20 16:08:40,676 Gossiper.java:1692 - Waiting for gossip to settle...
INFO  [HANDSHAKE-/10.128.0.3] 2018-09-20 16:08:41,284 OutboundTcpConnection.java:561 - Handshaking version with /10.128.0.3
INFO  [HANDSHAKE-/10.128.0.4] 2018-09-20 16:08:41,297 OutboundTcpConnection.java:561 - Handshaking version with /10.128.0.4

Observation: From the above log we can see the node's state jumps to NORMAL for /10.128.0.5 after startup.
Since Instance-1, Instance-2 and Instance-3 are already running when Cassandra starts on this node (10.128.0.5), it handshakes with the already running nodes (10.128.0.2, 10.128.0.3 and 10.128.0.4), and the topology of all four, Instance-1 (10.128.0.2), Instance-2 (10.128.0.3), Instance-3 (10.128.0.4) and Instance-4 (10.128.0.5), is updated.


Cassandra Data storage visualisation
 

All four instances of Cassandra are up and running. Run the below command and validate the data centre (DC) configuration: two nodes in rack1 and two nodes in rack2.
nikhilranjan234@instance-1:~$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns    Host ID                               Rack
UN  10.128.0.2  38.99 MiB  256          ?       02b41029-cacc-47d8-91ca-44a579071529  r1
UN  10.128.0.3  44.05 MiB  256          ?       94b6296c-f1d2-4817-af32-8ae8e7ea07fc  r1
UN  10.128.0.4  61.11 MiB  256          ?       0ec021b0-0ae9-47fc-bd5b-894287d78a0b  r2
UN  10.128.0.5  85.07 MiB  256          ?       0828fce5-715c-4482-a909-e9e1fd40e26a  r2

Four instances of Cassandra are up and running ([rack-1: I1 & I2] [rack-2: I3 & I4]).

Keyspace and table (column family) creation: Run the cqlsh utility on one of the terminals.
nikhilranjan234@instance-1:~$ cqlsh `hostname -I` -u cassandra -p cassandra
Connected to wm-cluster at 10.128.0.2:9042.
[cqlsh 5.0.1 | Cassandra 3.11.3 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cassandra@cqlsh> 

Create a keyspace with replication factor 3: We have four instances of Cassandra running, so we can keep 3 copies of data on different instances in the given DC.
On successful keyspace creation the prompt does not give any success message; describe keyspaces and see that it has been created (a verification sketch follows the listing).
cassandra@cqlsh> CREATE KEYSPACE IF NOT EXISTS "OCProc" WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'dc1' : 3};
cassandra@cqlsh> describe keyspaces;
system_schema  system     "OCProc"            system_traces
system_auth    keyspace1  system_distributed  stockdb      
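We can also verify the replication settings of the new keyspace; DESCRIBE prints the full CREATE statement back, including the NetworkTopologyStrategy options:
cassandra@cqlsh> DESCRIBE KEYSPACE "OCProc";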

Select and use a keyspace: Since column families (like tables in an RDBMS) are created in the context of a keyspace, select the keyspace we want to work with.
cassandra@cqlsh> use "OCProc" ;
cassandra@cqlsh:OCProc>

Create a table (column family) in the selected keyspace:
cassandra@cqlsh:OCProc> CREATE TABLE user (
          ...            username text,
          ...            email text,
          ...            city text,
          ...            phone varint,
          ...            encrypted_password blob,
          ...            PRIMARY KEY (username, city)
          ...            ) WITH comment = 'Creating USERS table to store user details';
cassandra@cqlsh:OCProc> describe tables;
user

Insert data into the user table (column family) and select from the table to display data:
cassandra@cqlsh:OCProc> INSERT INTO  user 
          ...             ( username ,  email ,  city ,  phone ,  encrypted_password )
          ...             VALUES (
          ...               'zytham',
          ...               'zytham@gmail.com',
          ...               'Patna',
          ...               9999888800,
          ...               0x9792977ed729792e403da53024c6069a9158b8c4
          ...             );
cassandra@cqlsh:OCProc> INSERT INTO  user 
          ...             ( username ,  email ,  city ,  phone ,  encrypted_password )
          ...             VALUES(
          ...               'ranjan',
          ...               'ranjan@gmail.com',
          ...               'Bangalore',
          ...                678998800,
          ...               0x8914977ed729792e403da53024c6069a9158b8c4
          ...             );
cassandra@cqlsh:OCProc> INSERT INTO  user 
          ...             ( username ,  email ,  city ,  phone ,  encrypted_password )
          ...             VALUES(
          ...               'mishra',
          ...               'zytham@gmail.com',
          ...               'Torento',
          ...                00980099766,
          ...               0x891497745729792e403da53024c6069a9158b8c4
          ...             );
cassandra@cqlsh:OCProc> SELECT * FROM user;

 username | city      | email            | encrypted_password                         | phone
----------+-----------+------------------+--------------------------------------------+------------
   zytham |     Patna | zytham@gmail.com | 0x9792977ed729792e403da53024c6069a9158b8c4 | 9999888800
   ranjan | Bangalore | ranjan@gmail.com | 0x8914977ed729792e403da53024c6069a9158b8c4 |  678998800
   mishra |   Torento | zytham@gmail.com | 0x891497745729792e403da53024c6069a9158b8c4 |  980099766

Replication of data on Cassandra nodes: Which nodes store the data for username zytham?
The below command shows one copy is stored on Instance-1 (where we are running the query) and two copies are stored on Instance-3 and Instance-4 (rack2).
nikhilranjan234@instance-1:~$ nodetool getendpoints OCProc user zytham
10.128.0.2
10.128.0.4
10.128.0.5
Run it for other users too; it is not necessary that one copy is always stored on the instance where the query is being run (see the loop sketch below).
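A small shell sketch to check placement for all three partition keys in one go:
nikhilranjan234@instance-1:~$ for u in zytham ranjan mishra; do echo "-- $u"; nodetool getendpoints OCProc user "$u"; done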

Data directory and its storage : Default data storage directory is /opt/apache-cassandra-3.11.3/data/
Data storage hierarchy : <KEYSPACE>/<TABLE_UNIQUE_ID>/<GENERATION_DATA>
nikhilranjan234@instance-1:/$ cd /opt/apache-cassandra-3.11.3/data/data
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data$ ls -l
total 32
drwxr-xr-x  4 nikhilranjan234 nikhilranjan234 4096 Sep 18 10:22 keyspace1
drwxr-xr-x  3 nikhilranjan234 nikhilranjan234 4096 Sep 20 18:07 OCProc
drwxr-xr-x  3 nikhilranjan234 nikhilranjan234 4096 Sep 18 07:03 stockdb
drwxr-xr-x 26 nikhilranjan234 nikhilranjan234 4096 Sep 18 06:36 system
drwxr-xr-x  6 nikhilranjan234 nikhilranjan234 4096 Sep 18 06:36 system_auth
drwxr-xr-x  5 nikhilranjan234 nikhilranjan234 4096 Sep 18 06:36 system_distributed
drwxr-xr-x 12 nikhilranjan234 nikhilranjan234 4096 Sep 18 06:36 system_schema
drwxr-xr-x  4 nikhilranjan234 nikhilranjan234 4096 Sep 18 06:36 system_traces

The OCProc keyspace has been created as a directory. Run the below commands and find the persisted data.
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data$ cd OCProc/
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/OCProc$ cd user-f82692c0bcff11e8a9c80961902fe681/
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ ls
backups

Where is our data? No data is persisted yet:
Answer: the memtable holds all our user data; it is not yet persisted to disk (as an SSTable).
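Durability is still guaranteed before any flush: every write is also appended to the commit log, which a restarting node replays to rebuild its memtables. A quick sketch to see the commit log segments (the path matches the one sstabledump warns about later):
nikhilranjan234@instance-1:~$ ls -l /opt/apache-cassandra-3.11.3/data/commitlog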

How does data move from the memtable to disk?
Answer: When the memtable is flushed, its data is written to disk as SSTables. Let us use nodetool to force a flush and see what changes it brings to the directory "user-f82692c0bcff11e8a9c80961902fe681".
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ ls
backups
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ nodetool flush
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ ls -l
total 40
drwxr-xr-x 2 nikhilranjan234 nikhilranjan234 4096 Sep 20 18:07 backups
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   43 Sep 20 18:44 mc-1-big-CompressionInfo.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   83 Sep 20 18:44 mc-1-big-Data.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   10 Sep 20 18:44 mc-1-big-Digest.crc32
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   16 Sep 20 18:44 mc-1-big-Filter.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   10 Sep 20 18:44 mc-1-big-Index.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234 4772 Sep 20 18:44 mc-1-big-Statistics.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   62 Sep 20 18:44 mc-1-big-Summary.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   92 Sep 20 18:44 mc-1-big-TOC.txt

On flush, the memtable writes its data to disk as an SSTable. The file "mc-1-big-Data.db" contains our data, and the flush also creates the accompanying index, filter and statistics files.
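Cassandra 3.x also ships an sstableutil tool that lists all files belonging to a table's SSTables, which saves hunting through the data directories by hand. A sketch:
nikhilranjan234@instance-1:~$ sstableutil OCProc user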

View user data in "mc-1-big-Data.db": Using sstabledump on Instance-1 we can visualise the user data in the file "mc-1-big-Data.db".
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ sstabledump -d mc-1-big-Data.db
WARN  18:55:20,083 Small commitlog volume detected at /opt/apache-cassandra-3.11.3/data/commitlog; setting commitlog_total_space_in_mb to 2503.  You can override this in cassandra.yaml
WARN  18:55:20,100 Small cdc volume detected at /opt/apache-cassandra-3.11.3/data/cdc_raw; setting cdc_total_space_in_mb to 1251.  You can override this in cassandra.yaml
WARN  18:55:20,267 Only 7.951GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
[zytham]@0 Row[info=[ts=1537467165740069] ]: Patna | [email=zytham@gmail.com ts=1537467165740069], [encrypted_password=9792977ed729792e403da53024c6069a9158b8c4 ts=1537467165740069], [phone=9999888800 ts=1537467165740069]
Instance-1 stores only one record with username = zytham. We can see that each cell is stored with a timestamp along with the column value.

Where are the other user records? Apply a nodetool flush on Instance-3 and display the data stored in the first-generation data file "mc-1-big-Data.db". Below we can see one copy of each user record is stored on Instance-3.
nikhilranjan234@instance-3:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ sstabledump -d mc-1-big-Data.db
WARN  18:55:50,970 Small commitlog volume detected at /opt/apache-cassandra-3.11.3/data/commitlog; setting commitlog_total_space_in_mb to 2503.  You can override this in cassandra.yaml
WARN  18:55:50,977 Small cdc volume detected at /opt/apache-cassandra-3.11.3/data/cdc_raw; setting cdc_total_space_in_mb to 1251.  You can override this in cassandra.yaml
WARN  18:55:51,146 Only 7.946GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
[zytham]@0 Row[info=[ts=1537467165740069] ]: Patna | [email=zytham@gmail.com ts=1537467165740069], [encrypted_password=9792977ed729792e403da53024c6069a9158b8c4 ts=1537467165740069], [phone=9999888800 ts=1537467165740069]
[ranjan]@79 Row[info=[ts=1537467165763097] ]: Bangalore | [email=ranjan@gmail.com ts=1537467165763097], [encrypted_password=8914977ed729792e403da53024c6069a9158b8c4 ts=1537467165763097], [phone=678998800 ts=1537467165763097]
[mishra]@163 Row[info=[ts=1537467168375384] ]: Torento | [email=zytham@gmail.com ts=1537467168375384], [encrypted_password=891497745729792e403da53024c6069a9158b8c4 ts=1537467168375384], [phone=980099766 ts=1537467168375384]

Display the JSON form of the data: Execute the above command without the -d switch.
nikhilranjan234@instance-3:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ sstabledump  mc-1-big-Data.db
[
  {
    "partition" : {
      "key" : [ "zytham" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 78,
        "clustering" : [ "Patna" ],
        "liveness_info" : { "tstamp" : "2018-09-20T18:12:45.740069Z" },
        "cells" : [
          { "name" : "email", "value" : "zytham@gmail.com" },
          { "name" : "encrypted_password", "value" : "0x9792977ed729792e403da53024c6069a9158b8c4" },
          { "name" : "phone", "value" : 9999888800 }
        ]
      }
    ]
  },
  {
    "partition" : {
      "key" : [ "ranjan" ],
      "position" : 79
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 162,
        "clustering" : [ "Bangalore" ],
        "liveness_info" : { "tstamp" : "2018-09-20T18:12:45.763097Z" },
        "cells" : [
          { "name" : "email", "value" : "ranjan@gmail.com" },
          { "name" : "encrypted_password", "value" : "0x8914977ed729792e403da53024c6069a9158b8c4" },
          { "name" : "phone", "value" : 678998800 }
                  ]
      }
    ]
  },
  {
    "partition" : {
      "key" : [ "mishra" ],
      "position" : 163
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 245,
        "clustering" : [ "Torento" ],
        "liveness_info" : { "tstamp" : "2018-09-20T18:12:48.375384Z" },
        "cells" : [
          { "name" : "email", "value" : "zytham@gmail.com" },
          { "name" : "encrypted_password", "value" : "0x891497745729792e403da53024c6069a9158b8c4" },
          { "name" : "phone", "value" : 980099766 }
        ]
      }
    ]
  }
]

Delete a record from the user table: delete the row from user where username = 'mishra' and city = 'Torento';
nikhilranjan234@instance-3:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ cqlsh `hostname -I` -u cassandra -p cassandra
Connected to wm-cluster at 10.128.0.4:9042.
[cqlsh 5.0.1 | Cassandra 3.11.3 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cassandra@cqlsh> describe keyspaces
system_schema  system     "OCProc"            system_traces
system_auth    keyspace1  system_distributed  stockdb      
cassandra@cqlsh> use "OCProc";
cassandra@cqlsh:OCProc> delete from user where username = 'mishra' and city='Torento';
cassandra@cqlsh:OCProc> exit

See the data file again: no change. Where has the change happened till now?
Answer: in the memtable (the deletion is recorded there first as a tombstone).
nikhilranjan234@instance-3:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ sstabledump -d mc-1-big-Data.db 
WARN  19:13:18,553 Small commitlog volume detected at /opt/apache-cassandra-3.11.3/data/commitlog; setting commitlog_total_space_in_mb to 2503.  You can override this
 in cassandra.yaml
WARN  19:13:18,559 Small cdc volume detected at /opt/apache-cassandra-3.11.3/data/cdc_raw; setting cdc_total_space_in_mb to 1251.  You can override this in cassandra.
yaml
WARN  19:13:18,729 Only 7.945GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
[zytham]@0 Row[info=[ts=1537467165740069] ]: Patna | [email=zytham@gmail.com ts=1537467165740069], [encrypted_password=9792977ed729792e403da53024c6069a9158b8c4 ts=1537467165740069], [phone=9999888800 ts=1537467165740069]
[ranjan]@79 Row[info=[ts=1537467165763097] ]: Bangalore | [email=ranjan@gmail.com ts=1537467165763097], [encrypted_password=8914977ed729792e403da53024c6069a9158b8c4 ts=1537467165763097], [phone=678998800 ts=1537467165763097]
[mishra]@163 Row[info=[ts=1537467168375384] ]: Torento | [email=zytham@gmail.com ts=1537467168375384], [encrypted_password=891497745729792e403da53024c6069a9158b8c4 ts=1537467168375384], [phone=980099766 ts=1537467168375384]

Flush using nodetool: New-generation files are persisted alongside the old ones; the old files are not deleted yet.
nikhilranjan234@instance-3:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ nodetool flush
nikhilranjan234@instance-3:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ ls -l
total 76
drwxr-xr-x 2 nikhilranjan234 nikhilranjan234 4096 Sep 20 18:07 backups
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   43 Sep 20 18:54 mc-1-big-CompressionInfo.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234  173 Sep 20 18:54 mc-1-big-Data.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   10 Sep 20 18:54 mc-1-big-Digest.crc32
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   16 Sep 20 18:54 mc-1-big-Filter.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   31 Sep 20 18:54 mc-1-big-Index.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234 4786 Sep 20 18:54 mc-1-big-Statistics.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   62 Sep 20 18:54 mc-1-big-Summary.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   92 Sep 20 18:54 mc-1-big-TOC.txt
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   43 Sep 20 19:15 mc-2-big-CompressionInfo.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   42 Sep 20 19:15 mc-2-big-Data.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   10 Sep 20 19:15 mc-2-big-Digest.crc32
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   16 Sep 20 19:15 mc-2-big-Filter.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   10 Sep 20 19:15 mc-2-big-Index.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234 4637 Sep 20 19:15 mc-2-big-Statistics.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   62 Sep 20 19:15 mc-2-big-Summary.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   92 Sep 20 19:15 mc-2-big-TOC.txt

When will the old-generation files be deleted?
Answer: during compaction.

Apply compaction: It merges the old- and new-generation files into a new generation and deletes the old files.
nikhilranjan234@instance-3:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ nodetool compact
nikhilranjan234@instance-3:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ ls -l
total 40
drwxr-xr-x 2 nikhilranjan234 nikhilranjan234 4096 Sep 20 18:07 backups
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   51 Sep 20 19:17 mc-3-big-CompressionInfo.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234  171 Sep 20 19:17 mc-3-big-Data.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   10 Sep 20 19:17 mc-3-big-Digest.crc32
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   16 Sep 20 19:17 mc-3-big-Filter.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   31 Sep 20 19:17 mc-3-big-Index.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234 4805 Sep 20 19:17 mc-3-big-Statistics.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   62 Sep 20 19:17 mc-3-big-Summary.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   92 Sep 20 19:17 mc-3-big-TOC.txt

Deleted and compacted: Now visualise the merged data file. The deleted row has not been physically removed; it has been marked as deleted with a tombstone. Why? If a failed node rejoins the cluster after recovery and still has the old data, the tombstone helps update that node and remove stale data that has already been deleted.
nikhilranjan234@instance-3:/opt/apache-cassandra-3.11.3/data/data/OCProc/user-f82692c0bcff11e8a9c80961902fe681$ sstabledump -d mc-3-big-Data.db 
WARN  19:26:45,958 Small commitlog volume detected at /opt/apache-cassandra-3.11.3/data/commitlog; setting commitlog_total_space_in_mb to 2503.  You can override this
 in cassandra.yaml
WARN  19:26:45,968 Small cdc volume detected at /opt/apache-cassandra-3.11.3/data/cdc_raw; setting cdc_total_space_in_mb to 1251.  You can override this in cassandra.
yaml
WARN  19:26:46,151 Only 7.966GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
[zytham]@0 Row[info=[ts=1537467165740069] ]: Patna | [email=zytham@gmail.com ts=1537467165740069], [encrypted_password=9792977ed729792e403da53024c6069a9158b8c4 ts=1537467165740069], [phone=9999888800 ts=1537467165740069]
[ranjan]@79 Row[info=[ts=1537467165763097] ]: Bangalore | [email=ranjan@gmail.com ts=1537467165763097], [encrypted_password=8914977ed729792e403da53024c6069a9158b8c4 ts=1537467165763097], [phone=678998800 ts=1537467165763097]
[mishra]@163 Row[info=[ts=-9223372036854775808] del=deletedAt=1537470698672680, localDeletion=1537470698 ]: Torento | 
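How long this tombstone survives compaction is controlled by the table's gc_grace_seconds (default 864000 seconds, i.e. 10 days); only after that grace period can compaction drop the tombstone itself. A sketch to read the current value from the schema tables:
cassandra@cqlsh> SELECT gc_grace_seconds FROM system_schema.tables WHERE keyspace_name = 'OCProc' AND table_name = 'user';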

Sep 19, 2018


Cassandra cluster on Google Compute Engine (Infrastructure as a service) : Setup Multi node Cassandra Cluster

Google Compute Engine delivers virtual machines running in Google's innovative data centres and worldwide fiber network. Compute Engine's tooling and workflow support enable scaling from single instances to global, load-balanced cloud computing.
In this post we will use Google Compute Engine to form a 4-node cluster, with 2 nodes in rack1 and 2 nodes in rack2, and visualise how data is stored on multiple nodes when records are inserted with a replication factor.
  1. Go to the Google console and set up an account (credit card required; it is worth spending Rs. 52.00 for $300 of credit, valid for one year)
  2. Click on navigation menu -> Under compute section -> Compute Engine -> VM Instances
  3. Click on create instance and provide a valid name, say instance-1. Select region Iowa (cost depends on region, try some other and see cost variation)
  4. Click on Boot disk and change Boot disk type = SSD persistent disk. 
  5. Finally, click on the Create button to create the instance. It takes a reasonable amount of time.
    The below diagram summarises the above series of events.
Google Cloud Platform - VM Instance setup
Permission for the user to download and install software: I will use the /opt directory, so make the logged-in user its owner.
  1. Click on SSH -> Open in Browser Window. It opens a terminal logged in as your user (the user used to create the GCE account). 
  2. Execute the following command, changing the user name as per your account. Replace <nikhilranjan234> with the user shown in your prompt. 
    nikhilranjan234@instance-1:~$ sudo chown nikhilranjan234:nikhilranjan234 -R /opt
Download and set up the required software: Create a directory softwares under /opt and download the JDK and Cassandra tars. Execute the following commands to set up JDK 8 and Cassandra 3.x in the /opt directory.
  • nikhilranjan234@instance-1:/opt$ mkdir softwares
    nikhilranjan234@instance-1:/opt$ cd softwares/
    nikhilranjan234@instance-1:/opt/softwares$ wget http://mirrors.fibergrid.in/apache/cassandra/3.11.3/apache-cassandra-3.11.3-bin.tar.gz
    nikhilranjan234@instance-1:/opt/softwares$ wget http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz?AuthParam=1537246053_99cd7d915aac56ea51cecab8a761d8c4
    nikhilranjan234@instance-1:/opt/softwares$ mv jdk-8u181-linux-x64.tar.gz\?AuthParam\=1537246053_99cd7d915aac56ea51cecab8a761d8c4 jdk8.tar.gz
    nikhilranjan234@instance-1:/opt/softwares$ cd ..
    nikhilranjan234@instance-1:/opt$ tar -zxf ./softwares/apache-cassandra-3.11.3-bin.tar.gz
    nikhilranjan234@instance-1:/opt$ tar -zxf ./softwares/jdk8.tar.gz
    
Set CASSANDRA_HOME and  JAVA_HOME in .profile file:
  1. Open file in any editor.
    nikhilranjan234@instance-1:/opt$ vi ~/.profile
    
  2. Add following lines in this file.
    export JAVA_HOME=/opt/jdk1.8.0_181 
    export CASSANDRA_HOME=/opt/apache-cassandra-3.11.3 
    PATH="$JAVA_HOME/bin:$CASSANDRA_HOME/bin:$CASSANDRA_HOME/tools/bin:$PATH"
    
  3. Source .profile file.
    nikhilranjan234@instance-1:/opt$ source ~/.profile
    
  4. Execute the following commands and validate that the home directories have been set correctly
    nikhilranjan234@instance-1:/opt$ java -version
    java version "1.8.0_181"
    Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
    nikhilranjan234@instance-1:/opt$ echo $CASSANDRA_HOME/
    /opt/apache-cassandra-3.11.3/
Configuration file changes: cassandra.yaml and cassandra-rackdc.properties
  1. Go to the conf directory under $CASSANDRA_HOME and open cassandra.yaml. Change the following configs. Below, MOD_IP_ADDRESS is a placeholder which will be updated later based on the instance's IP address. 
  2. cluster_name: 'wm-cluster'
    authenticator: PasswordAuthenticator
    authorizer: CassandraAuthorizer
    seeds: "127.0.0.1"
    endpoint_snitch : GossipingPropertyFileSnitch
    listen_address  : MOD_IP_ADDRESS
    
  3. Open the cassandra-rackdc.properties file and change the following property. Here MOD_RACK is a placeholder which will be updated later based on the instance.
    rack=MOD_RACK
  4. From the VM Instances dashboard find the IP address and execute the following commands in the conf directory on the instance-1 terminal (a quick verification sketch follows the list).
    nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_IP_ADDRESS=10.128.0.2=g' cassandra.yaml
    nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_RACK=r1=g' cassandra-rackdc.properties
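A quick sketch to confirm the substitutions landed, grepping the two files we just edited:
    nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/conf$ grep -E 'cluster_name|listen_address' cassandra.yaml
    nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/conf$ grep '^rack' cassandra-rackdc.properties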
    
Start Cassandra on instance-1: Execute following command to start Cassandra on instance-1.
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/conf$ cassandra
...
...
INFO  [main] 2018-09-19 13:49:24,309 OutboundTcpConnection.java:108 - OutboundTcpConnection using coalescing strategy DISABLED
INFO  [main] 2018-09-19 13:49:50,394 StorageService.java:550 - Unable to gossip with any peers but continuing anyway since node is in its own seed list
.....
......
INFO  [main] 2018-09-19 13:49:50,704 StorageService.java:1446 - JOINING: Finish joining ring
INFO  [main] 2018-09-19 13:49:50,750 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='keyspace1', ColumnFamily='standard1')
INFO  [main] 2018-09-19 13:49:50,760 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='keyspace1', ColumnFamily='counter1')
INFO  [main] 2018-09-19 13:49:50,762 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='stockdb', ColumnFamily='user')
INFO  [main] 2018-09-19 13:49:50,848 StorageService.java:2289 - Node /10.128.0.2 state jump to NORMAL
INFO  [main] 2018-09-19 13:49:50,893 AuthCache.java:172 - (Re)initializing CredentialsCache (validity period/update interval/max entries) (2000/2000/1000)
INFO  [main] 2018-09-19 13:49:50,901 Gossiper.java:1692 - Waiting for gossip to settle...
INFO  [main] 2018-09-19 13:49:58,908 Gossiper.java:1723 - No gossip backlog; proceeding

Run the following command to validate the status of Cassandra in the given data centre.
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/conf$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.128.0.2  103.64 KiB  256          100.0%            a4425b47-0b19-4ede-aec4-4a78fce503cf  r1

Cassandra is running on Instance-1. Now we will use the Instance-1 config to create an image, which will be used to spin up the three other instances.

Create an image using the Instance-1 disk: Click on the navigation menu button in the top-left corner and go to the Images section.
  1. Click on Create Image and provide a suitable name say debian-cassandra.
  2. Select Source disk as Instance-1 and create Image. 

Create Instance-2: Create Instance-2 using the image created above. Change the Boot disk: go to the Custom Images tab and select debian-cassandra.
Update the configuration file for Instance-2: Start a terminal for Instance-2 and execute the following commands to change the listen_address and rack info. The IP address of the instance is visible on the instances dashboard.
nikhilranjan234@instance-2:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_IP_ADDRESS=10.128.0.3=g' cassandra.yaml
nikhilranjan234@instance-2:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_RACK=r1=g' cassandra-rackdc.properties

Note: We have added 2 instances (10.128.0.2, 10.128.0.3) in rack1 (r1). Now we can start Cassandra on Instance-2 and validate that it joined the data centre.

Similarly, create Instance-3 and Instance-4 from the image created and update their config files with the following commands, run from the conf directory.

Configuration change for Instance-3:
nikhilranjan234@instance-3:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_IP_ADDRESS=10.128.0.4=g' cassandra.yaml
nikhilranjan234@instance-3:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_RACK=r2=g' cassandra-rackdc.properties

Configuration change for Instance-4:
nikhilranjan234@instance-4:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_IP_ADDRESS=10.128.0.5=g' cassandra.yaml
nikhilranjan234@instance-4:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_RACK=r2=g' cassandra-rackdc.properties

Check data centre status : Validate four instances are up and running in two rack: r1 and r2.
nikhilranjan234@instance-1:~$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns    Host ID                               Rack
UN  10.128.0.2  64.92 MiB  256          ?       02b41029-cacc-47d8-91ca-44a579071529  r1
UN  10.128.0.3  44.11 MiB  256          ?       94b6296c-f1d2-4817-af32-8ae8e7ea07fc  r1
UN  10.128.0.4  60.95 MiB  256          ?       0ec021b0-0ae9-47fc-bd5b-894287d78a0b  r2
UN  10.128.0.5  84.92 MiB  256          ?       0828fce5-715c-4482-a909-e9e1fd40e26a  r2

Cassandra uses the system_auth and dse_security keyspaces for storing security authentication and authorization information. Execute the following command to set the replication factor (a repair sketch follows).
cassandra@cqlsh> ALTER KEYSPACE "system_auth" WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'dc1':3};
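After changing the replication factor of an existing keyspace, it is good practice to run a repair on each node so the authentication data is actually copied to the new replicas. A sketch:
nikhilranjan234@instance-1:~$ nodetool repair system_auth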

Create a KEYSPACE and insert records:
cassandra@cqlsh> CREATE KEYSPACE IF NOT EXISTS stockdb WITH replication = {'class':'NetworkTopologyStrategy', 'dc1' : 3};
Create Table user in KeySpace - stockdb
CREATE TABLE stockdb.user (
  user_id VARCHAR,
  location VARCHAR,
  display_name VARCHAR,
  first_name VARCHAR,
  last_name VARCHAR,
  PRIMARY KEY (user_id, location)
);
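In this primary key, user_id is the partition key (it alone decides which nodes store the row) and location is a clustering column (it orders rows within the partition). So all locations of one user live in a single partition and can be read with one query, for example:
SELECT * FROM stockdb.user WHERE user_id = 'u1';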
Insert Records in user table:
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u10','earth','Nemo','Nirmallya','Mukherjee');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth1','Kirk','William','Shatner');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u2','vulcan','Spock', 'Leonard', 'Nimoy');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth4','Scotty','James','Doohan');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth2','Bones', 'Leonard', 'McCoy');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u3','klingon','Worf','Michael','Dorn');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth5','Sulu','George','Takei');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth3','Uhura','Nichelle','Nichols');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u4','romulus','Alidar Jarok','James','Sloyan');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth7','Khan Noonien Singh','Ricardo','Montalban');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth6','Chekov','Walter','Koenig');

Flushing data using nodetool: Flushes all memtables from the node to SSTables on disk.
nikhilranjan234@instance-1:~$ nodetool flush

Replica nodes (using nodetool): Find which nodes store the partition with user_id = u1. Note that getendpoints takes a partition key, so the second command below simply shows where a partition whose key is 'earth1' would be placed; location is a clustering column and does not affect replica placement.
nikhilranjan234@instance-1:~$ nodetool getendpoints stockdb user u1
10.128.0.4
10.128.0.2
10.128.0.3
nikhilranjan234@instance-1:~$ nodetool getendpoints stockdb user earth1
10.128.0.2
10.128.0.5
10.128.0.3

Where the data is stored on disk, and the files where it is stored:
/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ ls -l
total 40
drwxr-xr-x 2 nikhilranjan234 nikhilranjan234 4096 Sep 18 07:03 backups
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   51 Sep 18 07:17 mc-1-big-CompressionInfo.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234  398 Sep 18 07:17 mc-1-big-Data.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234    8 Sep 18 07:17 mc-1-big-Digest.crc32
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   16 Sep 18 07:17 mc-1-big-Filter.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   26 Sep 18 07:17 mc-1-big-Index.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234 4802 Sep 18 07:17 mc-1-big-Statistics.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   51 Sep 18 07:17 mc-1-big-Summary.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   92 Sep 18 07:17 mc-1-big-TOC.txt

Display the data stored in the data file (using sstabledump):

nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ sstabledump -d mc-1-big-Data.db 
WARN  17:10:29,893 Small commitlog volume detected at /opt/apache-cassandra-3.11.3/data/commitlog; setting commitlog_total_space_in_mb to 2503.  You can override this in cassandra.yaml
WARN  17:10:29,908 Small cdc volume detected at /opt/apache-cassandra-3.11.3/data/cdc_raw; setting cdc_total_space_in_mb to 1251.  You can override this in cassandra.yaml
WARN  17:10:30,108 Only 7.935GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
[u3]@0 Row[info=[ts=1537254243305484] ]: klingon | [display_name=Worf ts=1537254243305484], [first_name=Michael ts=1537254243305484], [last_name=Dorn ts=1537254243305484]
[u4]@53 Row[info=[ts=1537254243333161] ]: romulus | [display_name=Alidar Jarok ts=1537254243333161], [first_name=James ts=1537254243333161], [last_name=Sloyan ts=1537254243333161]
[u1]@114 Row[info=[ts=1537254243277414] ]: earth1 | [display_name=Kirk ts=1537254243277414], [first_name=William ts=1537254243277414], [last_name=Shatner ts=1537254243277414]
[u1]@168 Row[info=[ts=1537254243298223] ]: earth2 | [display_name=Bones ts=1537254243298223], [first_name=Leonard ts=1537254243298223], [last_name=McCoy ts=1537254243298223]
[u1]@205 Row[info=[ts=1537254243320949] ]: earth3 | [display_name=Uhura ts=1537254243320949], [first_name=Nichelle ts=1537254243320949], [last_name=Nichols ts=1537254243320949]
[u1]@245 Row[info=[ts=1537254243291221] ]: earth4 | [display_name=Scotty ts=1537254243291221], [first_name=James ts=1537254243291221], [last_name=Doohan ts=1537254243291221]
[u1]@282 Row[info=[ts=-9223372036854775808] del=deletedAt=1537254836904633, localDeletion=1537254836 ]: earth5 | 
[u1]@300 Row[info=[ts=1537254244762854] ]: earth6 | [display_name=Chekov ts=1537254244762854], [first_name=Walter ts=1537254244762854], [last_name=Koenig ts=1537254244762854]
[u1]@338 Row[info=[ts=1537254243344904] ]: earth7 | [display_name=Khan Noonien Singh ts=1537254243344904], [first_name=Ricardo ts=1537254243344904], [last_name=Montalban ts=1537254243344904]
[u10]@393 Row[info=[ts=1537254243259442] ]: earth | [display_name=Nemo ts=1537254243259442], [first_name=Nirmallya ts=1537254243259442], [last_name=Mukherjee ts=1537254243259442]

Deletion and compaction: Delete a record from the table.
cassandra@cqlsh> delete from stockdb.user where user_id = 'u1' and location='earth2';

After deletion, the deleted record carries a tombstone tag. After a flush and compaction (nodetool compact), the row's data is purged and only the tombstone remains, as can be verified below (the tombstone itself is dropped only after gc_grace_seconds). After compaction a new set of files is created and the old ones are deleted.
Flush and compact
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ nodetool flush
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ nodetool compact
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ ls -l
total 40
drwxr-xr-x 2 nikhilranjan234 nikhilranjan234 4096 Sep 18 07:03 backups
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   51 Sep 19 17:39 mc-2-big-CompressionInfo.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234  402 Sep 19 17:39 mc-2-big-Data.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   10 Sep 19 17:39 mc-2-big-Digest.crc32
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   16 Sep 19 17:39 mc-2-big-Filter.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   26 Sep 19 17:39 mc-2-big-Index.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234 4841 Sep 19 17:39 mc-2-big-Statistics.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   51 Sep 19 17:39 mc-2-big-Summary.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   92 Sep 19 17:39 mc-2-big-TOC.txt
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ sstabledump -d ./mc-2-big-Data.db 
WARN  17:39:48,857 Small commitlog volume detected at /opt/apache-cassandra-3.11.3/data/commitlog; setting commitlog_total_space_in_mb to 2503.  You can override this in cassandra.yaml
WARN  17:39:48,866 Small cdc volume detected at /opt/apache-cassandra-3.11.3/data/cdc_raw; setting cdc_total_space_in_mb to 1251.  You can override this in cassandra.yaml
WARN  17:39:49,045 Only 7.960GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
[u3]@0 Row[info=[ts=1537254243305484] ]: klingon | [display_name=Worf ts=1537254243305484], [first_name=Michael ts=1537254243305484], [last_name=Dorn ts=1537254243305484]
[u4]@53 Row[info=[ts=1537254243333161] ]: romulus | [display_name=Alidar Jarok ts=1537254243333161], [first_name=James ts=1537254243333161], [last_name=Sloyan ts=1537254243333161]
[u1]@114 Row[info=[ts=-9223372036854775808] del=deletedAt=1537377916253721, localDeletion=1537377916 ]: ear | 
[u1]@149 Row[info=[ts=1537254243277414] ]: earth1 | [display_name=Kirk ts=1537254243277414], [first_name=William ts=1537254243277414], [last_name=Shatner ts=1537254243277414]
[u1]@187 Row[info=[ts=-9223372036854775808] del=deletedAt=1537377956120236, localDeletion=1537377956 ]: earth2 | 
[u1]@208 Row[info=[ts=1537254243320949] ]: earth3 | [display_name=Uhura ts=1537254243320949], [first_name=Nichelle ts=1537254243320949], [last_name=Nichols ts=1537254243320949]
[u1]@248 Row[info=[ts=1537254243291221] ]: earth4 | [display_name=Scotty ts=1537254243291221], [first_name=James ts=1537254243291221], [last_name=Doohan ts=1537254243291221]
[u1]@285 Row[info=[ts=-9223372036854775808] del=deletedAt=1537254836904633, localDeletion=1537254836 ]: earth5 | 
[u1]@303 Row[info=[ts=1537254244762854] ]: earth6 | [display_name=Chekov ts=1537254244762854], [first_name=Walter ts=1537254244762854], [last_name=Koenig ts=1537254244762854]
[u1]@341 Row[info=[ts=1537254243344904] ]: earth7 | [display_name=Khan Noonien Singh ts=1537254243344904], [first_name=Ricardo ts=1537254243344904], [last_name=Montalban ts=1537254243344904]
[u10]@396 Row[info=[ts=1537254243259442] ]: earth | [display_name=Nemo ts=1537254243259442], [first_name=Nirmallya ts=1537254243259442], [last_name=Mukherjee ts=1537254243259442]

Display JSON formatted data:
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ sstabledump ./mc-2-big-Data.db 

Sep 10, 2018

Elasticsearch Bulk API and Aggregation DSL (Elasticsearch and Kibana Dev tool)

In previous posts (1, 2, 3) we covered how to index and delete documents. Imagine we have huge archived data that needs to be brought into Elasticsearch; indexing documents one by one is not a viable or efficient solution. Here the Elasticsearch Bulk API comes to the rescue: the bulk API makes it possible to perform many index/delete operations in a single API call. This can greatly increase indexing speed.
In order to demonstrate the bulk API, download the data file. This file was generated using http://www.json-generator.com/, and this post explains how.

Index documents using the Bulk API: Using the "_bulk" API (endpoint) with the PUT command we can index multiple documents efficiently. I have copied one row from the file for reference; add all rows and execute the below command (a curl sketch follows it).
PUT /employees/memmber/_bulk
{"index" : {}}
{"name" :"Hunt Morrow","age":47,"gender":"male","email":"huntmorrow@concility.com","phone":"+91 (996) 574-3591","address":"750 Lott Avenue","city":"Martinsville","state":"Maine, 6952"}

Pagination of response documents: Using from and size we can fetch a specific set of documents in the GET response.
With the size parameter it returns only the specified number of documents; in the response it affects the hits array.
GET /employees/memmber/_search
{
  "size" : 5,
  "query": {
    "match_all": {}
  }
}
Use the from parameter, which indicates the starting position; "from" is like an offset.
GET /employees/memmber/_search
{
  "from" : 0,
  "size" : 5,
  "query": {
    "match_all": {}
  }
}
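For example, the second page of 5 documents starts at from = 5 (from = page_number * size):
GET /employees/memmber/_search
{
  "from" : 5,
  "size" : 5,
  "query": {
    "match_all": {}
  }
}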

Sorting response documents: Sort is defined at the per-field level, with the special field names _score to sort by score and _doc to sort by index order. The order option can have the following values: asc (sort in ascending order) and desc (sort in descending order).
GET /employees/memmber/_search
{
  "from" : 0,
  "size" : 5,
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}
Note: By default, fielddata is disabled on text fields. Try sorting on a text field like name in the above query and we will get an illegal_argument_exception with the reason "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.". For more details regarding the sort clause refer to this.
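With default dynamic mapping, each string field also gets a keyword sub-field, so we can sort on name via name.keyword instead of enabling fielddata. A sketch:
GET /employees/memmber/_search
{
  "size": 5,
  "query": {
    "match_all": {}
  },
  "sort": [
    { "name.keyword": { "order": "asc" } }
  ]
}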

Aggregation DSL(Search aggregation)

Count response documents: Using the "_count" endpoint, retrieve the count of documents matching the query criteria.
GET /employees/memmber/_count
{
  "query": {
    "match": {
      "gender": "male"
    }
  }
}

Aggregation based on a text field: Aggregations work on exact values, not analysed text, so we have to use the keyword version of the field because gender is of text type. Keyword fields are only searchable by their exact value. From Elasticsearch 5, the string field has been split into two new types: text, which should be used for full-text search, and keyword, which should be used for keyword search.
The below aggregation command uses gender.keyword, indicating it wants to match the exact value of gender (male and female). In the response we have aggregations -> buckets[] with two buckets, male and female.
GET /employees/memmber/_search
{
  "aggs": {
    "genders": {
      "terms": {"field":"gender.keyword"}
    }
  }
}

Aggregation (average) based on a numeric field: The average age of the male and female groups is computed. In this case age is not text, so we do not have to use the keyword version of the field. In the response, under aggregations -> buckets[], we have avg_age for male and female. The aggregation query consists of two parts: "genders" creates the buckets and "avg_age" creates the metric. From the response we can see that the buckets array is created based on key = "male"/"female" along with the metric data.
GET /employees/memmber/_search
{
  "aggs": {
    "genders": {
      "terms": {
        "field":"gender.keyword"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

Find min and max (aggregation using "min" and "max"):
GET /employees/memmber/_search
{
  "aggs": {
    "genders": {
      "terms": {
        "field":"gender.keyword"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        },
        "max_price" : {
          "max": {
            "field": "age"
          }
        },
        "min_price" : {
          "min": {
            "field": "age"
          }
        }
      }
    }
  }
}

Aggregation with a query: The aggregation is applied in the context of the query condition.
GET /employees/memmber/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 30,
        "lte": 60
      }
    }
  }, 
  "aggs": {
    "genders": {
      "terms": {
        "field":"gender.keyword"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        },
        "max_price" : {
          "max": {
            "field": "age"
          }
        },
        "min_price" : {
          "min": {
            "field": "age"
          }
        }
      }
    }
  }
}

Aggregation without retrieving documents: Using size = 0 we can return zero documents and do the aggregation as usual. In the response, the size of the hits array is zero.
GET /employees/memmber/_search
{
  "size": 0, 
  "query": {
    "range": {
      "age": {
        "gte": 30,
        "lte": 60
      }
    }
  }, 
  "aggs": {
    "genders": {
      "terms": {
        "field":"gender.keyword"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        },
        "max_price" : {
          "max": {
            "field": "age"
          }
        },
        "min_price" : {
          "min": {
            "field": "age"
          }
        }
      }
    }
  }
}

Aggregation using "stats"
: Elasticsearch provides a construct which gives complete statistics like min,max,avg, count,sum,etc using "stats" switch. Aggregation query consist of two parts : "genders" creates buckets and "stas_on_age" creates metric. From response we can see that buckets array created based on key = "male/female" along with metric data min, max, etc.
GET /employees/memmber/_search
{
  "size": 0, 
  "aggs": {
    "genders": {
      "terms": {
        "field":"gender.keyword"
      },
      "aggs": {
        "stats_on_age": {
          "stats": {
            "field": "age"
          }
        }
      }
    }
  }
}

Aggregation with a date range query: Download this card_sell_json file and index all docs using the following command.
PUT /sell/vehicle/_bulk
{"index" : {}}
{"company":"AUDI","price": 1508.81,"color":"blue","registered":"2016-07-27","model":"V4","condition":"Used","customer":{"name":"Juliet Reeves","email":"julietreeves@bunga.com","phone":"+1 (900) 479-2096","address":"677 Dunne Court, Buxton, Virgin Islands,3005","age":25}}

Get all cars registered between the date ranges and display the document counts.
GET /sell/vehicle/_search
{
  "size": 0, 
  "aggs": {
    "popular_cars": {
      "terms": {
        "field": "model.keyword"
      },
      "aggs": {
        "sold_date_range": {
          "range": {
            "field": "registered",
            "ranges": [
              {
                "from": "2014-04-21",
                "to": "2016-02-01"
              },
              {
                "from": "2017-11-22",
                "to": "2018-09-01"
              }
            ]
          }
        }
      }
    }
  }
}

Nested aggregation: Get all cars between the date ranges along with car details and document counts.
GET /sell/vehicle/_search
{
  "size": 0, 
  "aggs": {
    "popular_cars": {
      "terms": {
        "field": "model.keyword"
      },
      "aggs": {
        "sold_date_range": {
          "range": {
            "field": "registered",
            "ranges": [
              {
                "from": "2014-04-21",
                "to": "2016-02-01"
              },
              {
                "from": "2017-11-22",
                "to": "2018-09-01"
              }
            ]
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            },
            "make" :{
              "terms": {
                "field": "company.keyword"
              }
            }
          }
        }
      }
    }
  }
}