Oct 2, 2018


Kafka Connect: Setup Kafka Connect Cluster (Docker Landoop fast-data-dev) and FileStream Source connector (Standalone and Distributed mode) Part-2

In the previous post we set up a Kafka Connect cluster in Docker (Landoop fast-data-dev) and created a FileStreamSource connector in standalone mode to transfer file content to a Kafka topic. In this post we will use the same Docker Kafka Connect setup to transfer file content in distributed mode.

Kafka Connect Source (Distributed mode) 

In this section we will configure and run the FileStreamSource connector in distributed mode using the Kafka Connect UI. The connector reads from a file and publishes its contents to a Kafka topic, acting as a simple carrier between the two.

Step-1: Create Kafka topic "kafka-connect-distributed" with 3 partitions and replication factor 1. The created topic can be validated from the browser at http://127.0.0.1:3030/kafka-topics-ui/#/.
root@fast-data-dev kafka-connect $ kafka-topics --create --topic kafka-connect-distributed --partitions 3 --replication-factor 1 --zookeeper 127.0.0.1:2181
Created topic "kafka-connect-distributed".

Step-2: Go to the Kafka Connect UI at http://127.0.0.1:3030/kafka-connect-ui/#/cluster/fast-data-dev and create a File connector. Click on New and select File from the available connectors.
FileStream connector creation from Kafka Connect UI

Step-3: Update the configs with the following and click CREATE. On success it creates a connector "file-stream-kafka-connect-distributed" and lists it in the connector panel on the left.
name=file-stream-kafka-connect-distributed
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=source-input.txt
topic=kafka-connect-distributed
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
FileStreamSource Connector configuration
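In distributed mode the Connect UI is just a front end for the Connect worker's REST API, so the same connector can also be created with a plain HTTP call. A minimal sketch of this (the file name file-stream-distributed.json is hypothetical, and the REST port 8083 is assumed from the fast-data-dev port mappings):

```shell
# Wrap the connector properties above in the JSON shape the Connect REST API expects
cat > file-stream-distributed.json <<'EOF'
{
  "name": "file-stream-kafka-connect-distributed",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "source-input.txt",
    "topic": "kafka-connect-distributed",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true"
  }
}
EOF

# Submit it to the Connect worker (requires the cluster to be up):
# curl -X POST -H "Content-Type: application/json" \
#      --data @file-stream-distributed.json http://127.0.0.1:8083/connectors
```

Either way, the worker stores the config in its internal config topic, which is what makes distributed mode survive worker restarts.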
Step-4: Since we are running the connector in distributed mode, we have to create the source-input.txt file inside the Kafka Connect cluster. Make sure Docker is up and running (remember: docker-compose up kafka-cluster).
  • Find the running Docker container ID.
➜  Kafka-connect docker ps
CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS              PORTS                                                                                                                                                  NAMES
2ebdc89d7caf        landoop/fast-data-dev:cp3.3.0   "/usr/local/bin/dumb…"   18 hours ago        Up 18 hours         0.0.0.0:2181->2181/tcp, 0.0.0.0:3030->3030/tcp, 0.0.0.0:8081-8083->8081-8083/tcp, 0.0.0.0:9092->9092/tcp, 0.0.0.0:9581-9585->9581-9585/tcp, 3031/tcp   kafkaconnect_kafka-cluster_1
  • Using the above container ID (2ebdc89d7caf), log in to the Docker container and execute the ls command to validate we are inside Docker.
➜  Kafka-connect docker exec -it 2ebdc89d7caf bash
root@fast-data-dev / $ pwd
/
root@fast-data-dev / $ ls
bin	    dev			home   mnt   root  srv	usr
build.info  etc			lib    opt   run   sys	var
connectors  extra-connect-jars	media  proc  sbin  tmp
root@fast-data-dev / $ 
Step-5: Action time!! Create source-input.txt and write some messages in it. Once the file is saved, all messages are posted to the topic "kafka-connect-distributed".
root@fast-data-dev / $ vi source-input.txt 
root@fast-data-dev / $ cat source-input.txt 
Hello, From distributed mode
Messaege from docker 
Bye, Last message !!
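If you prefer not to use vi inside the container, the same file can be created non-interactively with a heredoc. This is just a sketch producing the same three lines as above:

```shell
# Create source-input.txt in one shot; each line becomes one Kafka record
cat > source-input.txt <<'EOF'
Hello, From distributed mode
Messaege from docker
Bye, Last message !!
EOF

# Confirm the file has the three lines the connector will pick up
wc -l source-input.txt
```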

Step-6: Visit the Kafka topic UI and check for the messages copied from the file to the topic by the FileStreamSource connector. Note that messages are stored in JSON format because the connector was created earlier with the config value.converter.schemas.enable=true.

Consume messages from the topic and validate that they are retrieved in JSON format

Add some new messages to source-input.txt. Open a new terminal and start consuming messages from the topic "kafka-connect-distributed".

Docker terminal
root@fast-data-dev / $ pwd
/
root@fast-data-dev / $ echo "MesageLive-1" >> source-input.txt 
root@fast-data-dev / $ echo "MesageLive-Dist-2" >> source-input.txt 
root@fast-data-dev / $ 

Topic consumer terminal 
➜  Kafka-connect docker run --rm -it --net=host landoop/fast-data-dev bash

root@fast-data-dev / $ kafka-console-consumer --topic kafka-connect-distributed --from-beginning --bootstrap-server 127.0.0.1:9092
{"schema":{"type":"string","optional":false},"payload":"Hello, From distributed mode"}
{"schema":{"type":"string","optional":false},"payload":"Messaege from docker "}
{"schema":{"type":"string","optional":false},"payload":"Bye, Last message !!"}
{"schema":{"type":"string","optional":false},"payload":"MesageLive-1"}
{"schema":{"type":"string","optional":false},"payload":"MesageLive-Dist-2"}
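Because value.converter.schemas.enable=true, each record value above is a JSON envelope with a schema and a payload field. The original file line can be unwrapped from the envelope; a small sketch using one of the records shown above:

```shell
# One record value as produced by the JsonConverter with schemas enabled
record='{"schema":{"type":"string","optional":false},"payload":"Hello, From distributed mode"}'

# Extract the payload field (the original file line) with python3
payload=$(printf '%s' "$record" | python3 -c 'import json,sys; print(json.load(sys.stdin)["payload"])')
echo "$payload"
```

Setting value.converter.schemas.enable=false instead would publish only the payload, without the schema envelope.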


