Oct 2, 2018


Kafka Connect: Setup Kafka Connect Cluster (Docker Landoop fast-data-dev) and FileStream Source connector (Standalone and Distributed mode) Part-2

In the previous post we set up a Kafka Connect cluster in Docker (Landoop fast-data-dev) and created a FileStreamSource connector in standalone mode to transfer file content to a Kafka topic. In this post we will use the same Docker Kafka Connect setup to transfer file content in distributed mode.

Kafka Connect Source (Distributed mode) 

In this section we will configure and run a FileStreamSource connector in distributed mode using the Kafka Connect UI. The connector reads from a file and publishes its content to a Kafka topic, acting as the carrier between the two.

Step-1: Create Kafka topic "kafka-connect-distributed" with 3 partitions and replication factor 1. The created topic can be validated in the browser at http://127.0.0.1:3030/kafka-topics-ui/#/.
root@fast-data-dev kafka-connect $ kafka-topics --create --topic kafka-connect-distributed --partitions 3 --replication-factor 1 --zookeeper 127.0.0.1:2181
Created topic "kafka-connect-distributed".
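
Before moving on, the topic can also be verified from the command line. A quick sanity check with the same CLI, using the flags from the create command above:
root@fast-data-dev kafka-connect $ kafka-topics --describe --topic kafka-connect-distributed --zookeeper 127.0.0.1:2181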

Step-2: Go to the Kafka Connect UI at http://127.0.0.1:3030/kafka-connect-ui/#/cluster/fast-data-dev and create a File connector. Click NEW and select File from the list of available connectors.
FileStream connector creation from Kafka Connect UI

Step-3: Update the configs with the following and click CREATE. On success it creates a connector "file-stream-kafka-connect-distributed" and lists it in the connector panel on the left.
name=file-stream-kafka-connect-distributed
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=source-input.txt
topic=kafka-connect-distributed
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
FileStreamSource Connector configuration
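
The UI is only a front end for the Kafka Connect REST API, which this setup exposes on port 8083 (visible in the port mappings of the docker ps output in Step-4). As an alternative to the UI, the same connector can be created with curl; a minimal sketch posting the configuration above as JSON:
➜  Kafka-connect curl -X POST -H "Content-Type: application/json" http://127.0.0.1:8083/connectors -d '{
  "name": "file-stream-kafka-connect-distributed",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "source-input.txt",
    "topic": "kafka-connect-distributed",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true"
  }
}'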
Step-4: Since we are running the connector in distributed mode, the source-input.txt file must be created inside the Kafka Connect cluster, because the worker reads the file locally. Make sure Docker is up and running (remember: docker-compose up kafka-cluster).
  • Find the ID of the running Docker container.
➜  Kafka-connect docker ps
CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS              PORTS                                                                                                                                                  NAMES
2ebdc89d7caf        landoop/fast-data-dev:cp3.3.0   "/usr/local/bin/dumb…"   18 hours ago        Up 18 hours         0.0.0.0:2181->2181/tcp, 0.0.0.0:3030->3030/tcp, 0.0.0.0:8081-8083->8081-8083/tcp, 0.0.0.0:9092->9092/tcp, 0.0.0.0:9581-9585->9581-9585/tcp, 3031/tcp   kafkaconnect_kafka-cluster_1
  • Using the container ID from above (2ebdc89d7caf), log in to the container and execute the ls command to validate that we are inside Docker.
➜  Kafka-connect docker exec -it 2ebdc89d7caf bash
root@fast-data-dev / $ pwd
/
root@fast-data-dev / $ ls
bin	    dev			home   mnt   root  srv	usr
build.info  etc			lib    opt   run   sys	var
connectors  extra-connect-jars	media  proc  sbin  tmp
root@fast-data-dev / $ 
Step-5: Action time!! Create source-input.txt and write some messages in it. Once the file is saved, all messages are posted to the topic "kafka-connect-distributed".
root@fast-data-dev / $ vi source-input.txt 
root@fast-data-dev / $ cat source-input.txt 
Hello, From distributed mode
Messaege from docker 
Bye, Last message !!
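
As a side note, the file can also be appended without opening an interactive shell, straight from the host terminal, reusing the container ID found in Step-4 (the message text here is just an example):
➜  Kafka-connect docker exec 2ebdc89d7caf bash -c 'echo "One more message" >> /source-input.txt'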

Step-6: Visit the Kafka topic UI and check for the messages copied from the file to the topic by the FileStreamSource connector. Note that messages are stored in JSON format because the connector was created earlier with the config value.converter.schemas.enable=true.
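
The connector itself can also be checked through the REST API; a state of RUNNING for both the connector and its task confirms the file is being read:
➜  Kafka-connect curl http://127.0.0.1:8083/connectors/file-stream-kafka-connect-distributed/status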

Consume messages from the topic and validate that they are retrieved in JSON format

Add some new messages to source-input.txt. Open a new terminal and start consuming messages from the topic "kafka-connect-distributed".

Docker terminal
root@fast-data-dev / $ pwd
/
root@fast-data-dev / $ echo "MesageLive-1" >> source-input.txt 
root@fast-data-dev / $ echo "MesageLive-Dist-2" >> source-input.txt 
root@fast-data-dev / $ 

Topic consumer terminal 
➜  Kafka-connect docker run --rm -it --net=host landoop/fast-data-dev bash

root@fast-data-dev / $ kafka-console-consumer --topic kafka-connect-distributed --from-beginning --bootstrap-server 127.0.0.1:9092
{"schema":{"type":"string","optional":false},"payload":"Hello, From distributed mode"}
{"schema":{"type":"string","optional":false},"payload":"Messaege from docker "}
{"schema":{"type":"string","optional":false},"payload":"Bye, Last message !!"}
{"schema":{"type":"string","optional":false},"payload":"MesageLive-1"}
{"schema":{"type":"string","optional":false},"payload":"MesageLive-Dist-2"}

