Sep 19, 2018


Cassandra cluster on Google Compute Engine (Infrastructure as a Service): Set up a multi-node Cassandra cluster

Google Compute Engine delivers virtual machines running in Google's innovative data centres and worldwide fiber network. Compute Engine's tooling and workflow support enable scaling from single instances to global, load-balanced cloud computing.
In this post we will use Google Compute Engine to form a 4-node cluster, with 2 nodes in rack r1 and 2 nodes in rack r2, and then look at how an inserted record is stored on multiple nodes according to the replication factor.
  1. Go to the Google Cloud console and set up an account (a credit card is required; it is worth spending Rs.52.00 for $300 of credit that is valid for one year).
  2. Click on the navigation menu -> under the Compute section -> Compute Engine -> VM Instances.
  3. Click on Create Instance and provide a valid name, say instance-1. Select region Iowa (cost depends on region; try some other regions to see how the cost varies).
  4. Click on Boot disk and change Boot disk type to SSD persistent disk.
  5. Finally, click on the Create button to create the instance. It takes a short while.
    The diagram below summarises the above steps.
Google Cloud Platform - VM Instance setup
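The same console steps can also be scripted with the gcloud CLI. Below is a minimal sketch, assuming the Google Cloud SDK is installed and authenticated. The zone us-central1-a (one of the Iowa zones) and the helper name build_create_cmd are my assumptions, not part of the original setup. The helper only prints the command so that it can be reviewed before running.

```shell
#!/bin/sh
# Build (but do not run) a gcloud command that mirrors the console steps above.
# Assumption: zone us-central1-a (Iowa region); machine type is left at the default.
build_create_cmd() {
  # $1 = instance name, $2 = zone (optional)
  printf 'gcloud compute instances create %s --zone=%s --boot-disk-type=pd-ssd\n' \
    "$1" "${2:-us-central1-a}"
}

# Print the command for review; pipe the output to `sh` to actually execute it.
build_create_cmd instance-1
```

Printing instead of executing keeps the sketch safe to try on a machine without a billing account.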
Permissions for downloading and installing software: I will use the /opt directory, so make the logged-in user its owner.
  1. Click on SSH -> Open in browser window. It opens a terminal for the logged-in user (the user used to create the GCE account).
  2. Execute the following command, replacing nikhilranjan234 with the user name shown in your prompt.
    nikhilranjan234@instance-1:~$ sudo chown nikhilranjan234:nikhilranjan234 -R /opt
Download and set up required software: Create a directory softwares under /opt and download the JDK and Cassandra tarballs. Execute the following commands to set up JDK 8 and Cassandra 3.x in the /opt directory.
  • nikhilranjan234@instance-1:/opt$ mkdir softwares
    nikhilranjan234@instance-1:/opt$ cd softwares/
    nikhilranjan234@instance-1:/opt/softwares$ wget http://mirrors.fibergrid.in/apache/cassandra/3.11.3/apache-cassandra-3.11.3-bin.tar.gz
    nikhilranjan234@instance-1:/opt/softwares$ wget http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz?AuthParam=1537246053_99cd7d915aac56ea51cecab8a761d8c4
    nikhilranjan234@instance-1:/opt/softwares$ mv jdk-8u181-linux-x64.tar.gz\?AuthParam\=1537246053_99cd7d915aac56ea51cecab8a761d8c4 jdk8.tar.gz
    nikhilranjan234@instance-1:/opt/softwares$ cd ..
    nikhilranjan234@instance-1:/opt$ tar -zxf ./softwares/apache-cassandra-3.11.3-bin.tar.gz
    nikhilranjan234@instance-1:/opt$ tar -zxf ./softwares/jdk8.tar.gz
    
Set CASSANDRA_HOME and JAVA_HOME in the .profile file:
  1. Open the file in any editor.
    nikhilranjan234@instance-1:/opt$ vi ~/.profile
    
  2. Add the following lines to this file.
    export JAVA_HOME=/opt/jdk1.8.0_181 
    export CASSANDRA_HOME=/opt/apache-cassandra-3.11.3 
    PATH="$JAVA_HOME/bin:$CASSANDRA_HOME/bin:$CASSANDRA_HOME/tools/bin:$PATH"
    
  3. Source the .profile file.
    nikhilranjan234@instance-1:/opt$ source ~/.profile
    
  4. Execute the following commands to validate that JAVA_HOME and CASSANDRA_HOME have been set correctly.
    nikhilranjan234@instance-1:/opt$ java -version
    java version "1.8.0_181"
    Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
    nikhilranjan234@instance-1:/opt$ echo $CASSANDRA_HOME/
    /opt/apache-cassandra-3.11.3/
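The manual check above can be wrapped in a small helper that fails loudly when a variable is missing or points to the wrong place. This is only a sketch; the function name check_home is hypothetical.

```shell
#!/bin/sh
# Report whether the named variable is set and points to an existing directory.
# check_home is a hypothetical helper name, not part of Cassandra or the JDK.
check_home() {
  # $1 = name of the variable to check, e.g. JAVA_HOME
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "$1 is not set"
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$1=$value is not a directory"
    return 1
  fi
  echo "$1=$value OK"
}

# Check both variables; report problems but keep going.
for v in JAVA_HOME CASSANDRA_HOME; do
  check_home "$v" || true
done
```

Running it right after sourcing .profile catches a mistyped path before Cassandra is started.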
Configuration file changes: cassandra.yaml and cassandra-rackdc.properties
  1. Go to the conf directory under $CASSANDRA_HOME and open cassandra.yaml. Change the following settings. MOD_IP_ADDRESS below is a placeholder which will be replaced later with the IP address of the instance.
    cluster_name: 'wm-cluster'
    authenticator: PasswordAuthenticator
    authorizer: CassandraAuthorizer
    seeds: "127.0.0.1"
    endpoint_snitch: GossipingPropertyFileSnitch
    listen_address: MOD_IP_ADDRESS
    Note: for the other instances to join this cluster, seeds must list an address they can reach, such as the internal IP of instance-1 (10.128.0.2 in this post); 127.0.0.1 is only enough for a single-node setup.
    
  2. Open the cassandra-rackdc.properties file and change the following property. Here MOD_RACK is a placeholder which will be replaced later with the rack of the instance.
    rack=MOD_RACK
  3. From the VM Instances dashboard find the IP address of instance-1 and execute the following commands in the conf directory on the instance-1 terminal.
    nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_IP_ADDRESS=10.128.0.2=g' cassandra.yaml
    nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_RACK=r1=g' cassandra-rackdc.properties
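To see what the two sed commands do without touching the real config, the substitution can be tried on scratch copies first. A minimal sketch, assuming GNU sed (as on the Debian image used here); apply_node_config is a hypothetical helper name and the scratch files contain only sample lines, not the full config.

```shell
#!/bin/sh
# Replace both placeholders for one node.
# $1 = yaml file, $2 = rackdc properties file, $3 = IP address, $4 = rack
apply_node_config() {
  sed -i "s=MOD_IP_ADDRESS=$3=g" "$1"   # '=' is used as the sed delimiter, as in the post
  sed -i "s=MOD_RACK=$4=g" "$2"
}

# Try it on scratch files that hold only the placeholder lines.
scratch=$(mktemp -d)
printf 'listen_address: MOD_IP_ADDRESS\n' > "$scratch/cassandra.yaml"
printf 'rack=MOD_RACK\n' > "$scratch/cassandra-rackdc.properties"
apply_node_config "$scratch/cassandra.yaml" "$scratch/cassandra-rackdc.properties" 10.128.0.2 r1
cat "$scratch/cassandra.yaml" "$scratch/cassandra-rackdc.properties"
rm -rf "$scratch"
```

Once a placeholder has been replaced it no longer exists in the file, so running the same command a second time with a different value changes nothing.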
    
Start Cassandra on instance-1: Execute the following command to start Cassandra on instance-1.
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/conf$ cassandra
...
...
INFO  [main] 2018-09-19 13:49:24,309 OutboundTcpConnection.java:108 - OutboundTcpConnection using coalescing strategy DISABLED
INFO  [main] 2018-09-19 13:49:50,394 StorageService.java:550 - Unable to gossip with any peers but continuing anyway since node is in its own seed list
.....
......
INFO  [main] 2018-09-19 13:49:50,704 StorageService.java:1446 - JOINING: Finish joining ring
INFO  [main] 2018-09-19 13:49:50,750 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='keyspace1', ColumnFamily='standard1')
INFO  [main] 2018-09-19 13:49:50,760 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='keyspace1', ColumnFamily='counter1')
INFO  [main] 2018-09-19 13:49:50,762 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='stockdb', ColumnFamily='user')
INFO  [main] 2018-09-19 13:49:50,848 StorageService.java:2289 - Node /10.128.0.2 state jump to NORMAL
INFO  [main] 2018-09-19 13:49:50,893 AuthCache.java:172 - (Re)initializing CredentialsCache (validity period/update interval/max entries) (2000/2000/1000)
INFO  [main] 2018-09-19 13:49:50,901 Gossiper.java:1692 - Waiting for gossip to settle...
INFO  [main] 2018-09-19 13:49:58,908 Gossiper.java:1723 - No gossip backlog; proceeding

Run the following command to check the status of the Cassandra nodes in the data centre.
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/conf$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.128.0.2  103.64 KiB  256          100.0%            a4425b47-0b19-4ede-aec4-4a78fce503cf  r1

Cassandra is running on instance-1. Now we will create an image from the instance-1 disk, which will be used to spin up the three other instances.

Create image from the instance-1 disk: Click on the navigation menu button in the top left corner and go to the Images section.
  1. Click on Create Image and provide a suitable name, say debian-cassandra.
  2. Select instance-1 as the source disk and create the image.

Create instance-2: Create instance-2 using the image created above. Click on Change under Boot disk, go to the Custom images tab and select debian-cassandra.
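The image and the cloned instances can also be created from the gcloud CLI instead of the console. The sketch below only prints the commands. It assumes the zone us-central1-a and that the boot disk carries the same name as the instance (the GCE default); both helper names are hypothetical. If gcloud refuses to image a disk that is attached to a running instance, stopping the instance first or looking at the --force flag of gcloud compute images create are the usual options.

```shell
#!/bin/sh
# Print the command that creates an image from an existing disk.
# $1 = image name, $2 = source disk name, $3 = zone of the disk
print_image_cmd() {
  printf 'gcloud compute images create %s --source-disk=%s --source-disk-zone=%s\n' \
    "$1" "$2" "$3"
}

# Print the command that creates a VM whose boot disk comes from a custom image.
# $1 = instance name, $2 = zone, $3 = image name
print_instance_cmd() {
  printf 'gcloud compute instances create %s --zone=%s --image=%s --boot-disk-type=pd-ssd\n' \
    "$1" "$2" "$3"
}

print_image_cmd debian-cassandra instance-1 us-central1-a
for n in 2 3 4; do
  print_instance_cmd "instance-$n" us-central1-a debian-cassandra
done
```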
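Update configuration files for instance-2: Open a terminal on instance-2 and execute the following commands to change listen_address and the rack. The IP address of the instance is visible on the VM Instances dashboard. If the image was taken after the placeholders had already been replaced on instance-1, the placeholder names no longer exist in the files; in that case use the old values (10.128.0.2 and r1) as the search pattern instead.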
nikhilranjan234@instance-2:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_IP_ADDRESS=10.128.0.3=g' cassandra.yaml
nikhilranjan234@instance-2:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_RACK=r1=g' cassandra-rackdc.properties

Note: We have added 2 instances (10.128.0.2, 10.128.0.3) to rack r1. Now we can start Cassandra on instance-2 and validate that it has joined the data centre.

Similarly, create instance-3 and instance-4 from the image and update their config files with the following commands, run from the conf directory.

Configuration change for Instance-3:
nikhilranjan234@instance-3:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_IP_ADDRESS=10.128.0.4=g' cassandra.yaml
nikhilranjan234@instance-3:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_RACK=r2=g' cassandra-rackdc.properties

Configuration change for Instance-4:
nikhilranjan234@instance-4:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_IP_ADDRESS=10.128.0.5=g' cassandra.yaml
nikhilranjan234@instance-4:/opt/apache-cassandra-3.11.3/conf$ sed -i 's=MOD_RACK=r2=g' cassandra-rackdc.properties
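Since the per-instance commands differ only in IP address and rack, they can be generated from one small table, which makes a copy-paste slip less likely. A sketch, with the table taken from the addresses used in this post; sed_cmds_for is a hypothetical helper name.

```shell
#!/bin/sh
# Node table from this post: instance name, internal IP, rack.
nodes='instance-1 10.128.0.2 r1
instance-2 10.128.0.3 r1
instance-3 10.128.0.4 r2
instance-4 10.128.0.5 r2'

# Print the two sed commands for the instance named in $1.
sed_cmds_for() {
  echo "$nodes" | while read -r name ip rack; do
    if [ "$name" = "$1" ]; then
      echo "sed -i 's=MOD_IP_ADDRESS=$ip=g' cassandra.yaml"
      echo "sed -i 's=MOD_RACK=$rack=g' cassandra-rackdc.properties"
    fi
  done
}

sed_cmds_for instance-3
```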

Check data centre status: Validate that all four instances are up and running in two racks, r1 and r2.
nikhilranjan234@instance-1:~$ nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns    Host ID                               Rack
UN  10.128.0.2  64.92 MiB  256          ?       02b41029-cacc-47d8-91ca-44a579071529  r1
UN  10.128.0.3  44.11 MiB  256          ?       94b6296c-f1d2-4817-af32-8ae8e7ea07fc  r1
UN  10.128.0.4  60.95 MiB  256          ?       0ec021b0-0ae9-47fc-bd5b-894287d78a0b  r2
UN  10.128.0.5  84.92 MiB  256          ?       0828fce5-715c-4482-a909-e9e1fd40e26a  r2
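Reading the table by eye is fine for four nodes, but the same check can be scripted. Below is a small sketch that counts the nodes in Up/Normal (UN) state per rack; it assumes the rack is the last column, as in the output above. The helper name count_un_per_rack is hypothetical and the sample input is copied from this post.

```shell
#!/bin/sh
# Count nodes in Up/Normal (UN) state per rack from `nodetool status` output on stdin.
# Assumes the rack name is the last whitespace-separated column.
count_un_per_rack() {
  awk '$1 == "UN" { n[$NF]++ } END { for (r in n) print r, n[r] }' | sort
}

# Sample rows from the output above.
# On a live node use: nodetool status | count_un_per_rack
count_un_per_rack <<'EOF'
UN  10.128.0.2  64.92 MiB  256          ?       02b41029-cacc-47d8-91ca-44a579071529  r1
UN  10.128.0.3  44.11 MiB  256          ?       94b6296c-f1d2-4817-af32-8ae8e7ea07fc  r1
UN  10.128.0.4  60.95 MiB  256          ?       0ec021b0-0ae9-47fc-bd5b-894287d78a0b  r2
UN  10.128.0.5  84.92 MiB  256          ?       0828fce5-715c-4482-a909-e9e1fd40e26a  r2
EOF
```

For the sample rows this reports two UN nodes in r1 and two in r2.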

Cassandra uses the system_auth keyspace for storing authentication and authorization information (DataStax Enterprise additionally uses dse_security). Execute the following command to set its replication factor.
cassandra@cqlsh> ALTER KEYSPACE "system_auth" WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'dc1':3};

Create keyspace and insert records:
cassandra@cqlsh> CREATE KEYSPACE IF NOT EXISTS stockdb WITH replication = {'class':'NetworkTopologyStrategy', 'dc1' : 3};
Create table user in keyspace stockdb:
CREATE TABLE stockdb.user (
  user_id VARCHAR,
  location VARCHAR,
  display_name VARCHAR,
  first_name VARCHAR,
  last_name VARCHAR,
  PRIMARY KEY (user_id, location)
);
Insert records into the user table:
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u10','earth','Nemo','Nirmallya','Mukherjee');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth1','Kirk','William','Shatner');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u2','vulcan','Spock', 'Leonard', 'Nimoy');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth4','Scotty','James','Doohan');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth2','Bones', 'Leonard', 'McCoy');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u3','klingon','Worf','Michael','Dorn');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth5','Sulu','George','Takei');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth3','Uhura','Nichelle','Nichols');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u4','romulus','Alidar Jarok','James','Sloyan');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth7','Khan Noonien Singh','Ricardo','Montalban');
 INSERT INTO stockdb.user(user_id,location,display_name,first_name,last_name) VALUES ('u1','earth6','Chekov','Walter','Koenig');

Flushing data using nodetool: Flushes all memtables from the node to SSTables on disk.
nikhilranjan234@instance-1:~$ nodetool flush

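Replica nodes (using nodetool): Find which nodes store the records with user_id = u1. nodetool getendpoints takes a partition key value (here user_id); location is a clustering column and does not affect placement, so the second command below only shows where a partition with key earth1 would be placed.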
nikhilranjan234@instance-1:~$ nodetool getendpoints stockdb user u1
10.128.0.4
10.128.0.2
10.128.0.3
nikhilranjan234@instance-1:~$ nodetool getendpoints stockdb user earth1
10.128.0.2
10.128.0.5
10.128.0.3
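With a replication factor of 3 on a 4-node cluster, each partition lives on three of the four nodes, and NetworkTopologyStrategy tries to spread replicas across racks. The sketch below maps the endpoints returned for u1 back to their racks to confirm that both racks hold a copy. The IP-to-rack table is taken from this post; rack_of and racks_covered are hypothetical helper names.

```shell
#!/bin/sh
# IP-to-rack table for the cluster built in this post.
rack_of() {
  case "$1" in
    10.128.0.2|10.128.0.3) echo r1 ;;
    10.128.0.4|10.128.0.5) echo r2 ;;
    *) echo unknown ;;
  esac
}

# Read endpoint IPs (one per line) on stdin and print the distinct racks they cover.
racks_covered() {
  while read -r ip; do
    if [ -n "$ip" ]; then rack_of "$ip"; fi
  done | sort -u | paste -sd ' ' -
}

# Endpoints reported above for user_id u1.
# On a live node use: nodetool getendpoints stockdb user u1 | racks_covered
printf '10.128.0.4\n10.128.0.2\n10.128.0.3\n' | racks_covered
```

For u1 the three replicas sit on two nodes of r1 and one node of r2, so losing a whole rack still leaves at least one copy.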

Where data is stored on disk: the SSTable files of the user table are in the following directory:
/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ ls -l
total 40
drwxr-xr-x 2 nikhilranjan234 nikhilranjan234 4096 Sep 18 07:03 backups
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   51 Sep 18 07:17 mc-1-big-CompressionInfo.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234  398 Sep 18 07:17 mc-1-big-Data.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234    8 Sep 18 07:17 mc-1-big-Digest.crc32
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   16 Sep 18 07:17 mc-1-big-Filter.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   26 Sep 18 07:17 mc-1-big-Index.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234 4802 Sep 18 07:17 mc-1-big-Statistics.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   51 Sep 18 07:17 mc-1-big-Summary.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   92 Sep 18 07:17 mc-1-big-TOC.txt

Display data stored in the Data file (using sstabledump):

nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ sstabledump -d mc-1-big-Data.db 
WARN  17:10:29,893 Small commitlog volume detected at /opt/apache-cassandra-3.11.3/data/commitlog; setting commitlog_total_space_in_mb to 2503.  You can override this in cassandra.yaml
WARN  17:10:29,908 Small cdc volume detected at /opt/apache-cassandra-3.11.3/data/cdc_raw; setting cdc_total_space_in_mb to 1251.  You can override this in cassandra.yaml
WARN  17:10:30,108 Only 7.935GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
[u3]@0 Row[info=[ts=1537254243305484] ]: klingon | [display_name=Worf ts=1537254243305484], [first_name=Michael ts=1537254243305484], [last_name=Dorn ts=1537254243305484]
[u4]@53 Row[info=[ts=1537254243333161] ]: romulus | [display_name=Alidar Jarok ts=1537254243333161], [first_name=James ts=1537254243333161], [last_name=Sloyan ts=1537254243333161]
[u1]@114 Row[info=[ts=1537254243277414] ]: earth1 | [display_name=Kirk ts=1537254243277414], [first_name=William ts=1537254243277414], [last_name=Shatner ts=1537254243277414]
[u1]@168 Row[info=[ts=1537254243298223] ]: earth2 | [display_name=Bones ts=1537254243298223], [first_name=Leonard ts=1537254243298223], [last_name=McCoy ts=1537254243298223]
[u1]@205 Row[info=[ts=1537254243320949] ]: earth3 | [display_name=Uhura ts=1537254243320949], [first_name=Nichelle ts=1537254243320949], [last_name=Nichols ts=1537254243320949]
[u1]@245 Row[info=[ts=1537254243291221] ]: earth4 | [display_name=Scotty ts=1537254243291221], [first_name=James ts=1537254243291221], [last_name=Doohan ts=1537254243291221]
[u1]@282 Row[info=[ts=-9223372036854775808] del=deletedAt=1537254836904633, localDeletion=1537254836 ]: earth5 | 
[u1]@300 Row[info=[ts=1537254244762854] ]: earth6 | [display_name=Chekov ts=1537254244762854], [first_name=Walter ts=1537254244762854], [last_name=Koenig ts=1537254244762854]
[u1]@338 Row[info=[ts=1537254243344904] ]: earth7 | [display_name=Khan Noonien Singh ts=1537254243344904], [first_name=Ricardo ts=1537254243344904], [last_name=Montalban ts=1537254243344904]
[u10]@393 Row[info=[ts=1537254243259442] ]: earth | [display_name=Nemo ts=1537254243259442], [first_name=Nirmallya ts=1537254243259442], [last_name=Mukherjee ts=1537254243259442]
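In the dump above, a deleted row shows up as a line carrying del=deletedAt=... and no cells. A small filter can pull those rows out of a longer dump. This is a sketch that relies on the sstabledump -d line layout shown in this post and on clustering values without spaces; list_tombstones is a hypothetical helper name.

```shell
#!/bin/sh
# Print "partition_key clustering_key" for every tombstoned row found on stdin.
# Relies on the `sstabledump -d` layout shown above:
#   [pk]@offset Row[info=[ts=...] del=deletedAt=..., localDeletion=... ]: clustering |
list_tombstones() {
  awk '/del=deletedAt=/ { print substr($1, 2, index($1, "]") - 2), $6 }'
}

# Two sample lines from the dump above (cells shortened on the second one).
# On a live node use: sstabledump -d mc-1-big-Data.db | list_tombstones
list_tombstones <<'EOF'
[u1]@282 Row[info=[ts=-9223372036854775808] del=deletedAt=1537254836904633, localDeletion=1537254836 ]: earth5 | 
[u1]@300 Row[info=[ts=1537254244762854] ]: earth6 | [display_name=Chekov ts=1537254244762854]
EOF
```

For the sample input only the earth5 row of partition u1 is reported.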

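Delete and compaction: Delete a record from the table.
cassandra@cqlsh> delete from stockdb.user where user_id = 'u1' and location='earth2';

After deletion, the record is marked with a tombstone rather than removed immediately. After a flush and a compaction (nodetool compact), a new set of SSTable files (mc-2-*) is created and the old set (mc-1-*) is deleted. In the dump below, the cells of the deleted row are gone but its tombstone is still present, because tombstones are purged only by compactions that run after gc_grace_seconds (10 days by default) has passed.
Flush and compact: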
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ nodetool flush
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ nodetool compact
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ ls -l
total 40
drwxr-xr-x 2 nikhilranjan234 nikhilranjan234 4096 Sep 18 07:03 backups
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   51 Sep 19 17:39 mc-2-big-CompressionInfo.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234  402 Sep 19 17:39 mc-2-big-Data.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   10 Sep 19 17:39 mc-2-big-Digest.crc32
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   16 Sep 19 17:39 mc-2-big-Filter.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   26 Sep 19 17:39 mc-2-big-Index.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234 4841 Sep 19 17:39 mc-2-big-Statistics.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   51 Sep 19 17:39 mc-2-big-Summary.db
-rw-r--r-- 1 nikhilranjan234 nikhilranjan234   92 Sep 19 17:39 mc-2-big-TOC.txt
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ sstabledump -d ./mc-2-big-Data.db 
WARN  17:39:48,857 Small commitlog volume detected at /opt/apache-cassandra-3.11.3/data/commitlog; setting commitlog_total_space_in_mb to 2503.  You can override this in cassandra.yaml
WARN  17:39:48,866 Small cdc volume detected at /opt/apache-cassandra-3.11.3/data/cdc_raw; setting cdc_total_space_in_mb to 1251.  You can override this in cassandra.yaml
WARN  17:39:49,045 Only 7.960GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
[u3]@0 Row[info=[ts=1537254243305484] ]: klingon | [display_name=Worf ts=1537254243305484], [first_name=Michael ts=1537254243305484], [last_name=Dorn ts=1537254243305484]
[u4]@53 Row[info=[ts=1537254243333161] ]: romulus | [display_name=Alidar Jarok ts=1537254243333161], [first_name=James ts=1537254243333161], [last_name=Sloyan ts=1537254243333161]
[u1]@114 Row[info=[ts=-9223372036854775808] del=deletedAt=1537377916253721, localDeletion=1537377916 ]: ear
 | 
[u1]@149 Row[info=[ts=1537254243277414] ]: earth1 | [display_name=Kirk ts=1537254243277414], [first_name=William ts=1537254243277414], [last_name=Shatner ts=1537254243277414]
[u1]@187 Row[info=[ts=-9223372036854775808] del=deletedAt=1537377956120236, localDeletion=1537377956 ]: earth2 | 
[u1]@208 Row[info=[ts=1537254243320949] ]: earth3 | [display_name=Uhura ts=1537254243320949], [first_name=Nichelle ts=1537254243320949], [last_name=Nichols ts=1537254243320949]
[u1]@248 Row[info=[ts=1537254243291221] ]: earth4 | [display_name=Scotty ts=1537254243291221], [first_name=James ts=1537254243291221], [last_name=Doohan ts=1537254243291221]
[u1]@285 Row[info=[ts=-9223372036854775808] del=deletedAt=1537254836904633, localDeletion=1537254836 ]: earth5 | 
[u1]@303 Row[info=[ts=1537254244762854] ]: earth6 | [display_name=Chekov ts=1537254244762854], [first_name=Walter ts=1537254244762854], [last_name=Koenig ts=1537254244762854]
[u1]@341 Row[info=[ts=1537254243344904] ]: earth7 | [display_name=Khan Noonien Singh ts=1537254243344904], [first_name=Ricardo ts=1537254243344904], [last_name=Montalban ts=1537254243344904]
[u10]@396 Row[info=[ts=1537254243259442] ]: earth | [display_name=Nemo ts=1537254243259442], [first_name=Nirmallya ts=1537254243259442], [last_name=Mukherjee ts=1537254243259442]
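The tombstones for earth2 and earth5 survive the compaction because a tombstone may only be dropped once gc_grace_seconds has elapsed since its localDeletion time. A quick calculation shows the earliest moment a compaction could purge the earth2 tombstone, assuming the table still has the default gc_grace_seconds of 864000 (10 days); purgeable_after is a hypothetical helper name.

```shell
#!/bin/sh
# Earliest epoch second at which a tombstone becomes eligible for purging.
# $1 = localDeletion value from sstabledump (epoch seconds)
# $2 = gc_grace_seconds of the table (defaults to Cassandra's default, 864000)
purgeable_after() {
  echo $(( $1 + ${2:-864000} ))
}

# localDeletion of the earth2 tombstone in the dump above.
purgeable_after 1537377956   # prints 1538241956
```

The result is an epoch timestamp ten days after the delete was issued; a compaction before that point keeps the tombstone, which matches the dump above.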

Display JSON formatted data: without the -d option, sstabledump prints the SSTable contents as JSON.
nikhilranjan234@instance-1:/opt/apache-cassandra-3.11.3/data/data/stockdb/user-ee050f60bb1011e8b38fdd33159348d4$ sstabledump ./mc-2-big-Data.db 

