Let the data flow! - Java Forum Stuttgart · © Materna GmbH 2018 Frank Pientka,...
© Materna GmbH 2018 www.materna.de
Data Streaming & Messaging
with Apache Kafka
Frank Pientka
Let the data flow!
Frank Pientka, Dipl.-Informatiker
+49 (231) 5599 8854
+49 (1570) 1128854
www.materna.de
Dipl.-Informatiker (TH Karlsruhe)
Married, two daughters
Principal Software Architect in Dortmund
Almost 30 years of IT experience
Projects, publications, and talks
More quality in software; networker, innovator
Who is Frank Pientka?
Agenda
The need for speed – fast data
Two worlds – message & data together
Why Kafka?
What is Kafka?
Cluster
Messaging
Clients
Connecting
Streaming
Confluent use cases, platform
Kafka steps
Summary
Big data - fast data
Three Vs of Big Data
Velocity, Volume, Variety
The data value chain
Data value over the age of data: the value of a single data item falls as it ages, while the value of aggregated data grows. Close the gap!
The lambda architecture for big data analysis
Data source → Data queuing → Batch layer (volume) with data storage and batch processing, in parallel with the Speed layer (velocity) doing real-time processing → Serving layer → Presentation
Kappa architecture for fast data analytics
Data source → Data queuing → Speed layer (velocity) with real-time processing → Serving layer → Presentation
Big data - fast data: The need for speed
Processing approaches, fastest to slowest: Stream, Mini-Batch, Query, Batch
At LinkedIn since 2011
Apache project since 2012
Confluent founded in 2014
Written in Java & Scala
Kafka 0.11 (Streaming), 2017
Kafka 1.1 released March 28, 2018
Kafka 1.1.1 and 2.0 planned for 2018
What is Kafka?
Messaging with Kafka
Producers write messages (key, value, timestamp) to topics (Topic A, Topic B) on a broker; consumers subscribe to those topics, optionally via intermediate topics and a state store.
Message format: CRC | attributes | key-length | key-content | message-length | message-content
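As a rough illustration of the length-prefixed layout above, here is a minimal sketch in Python. It is a conceptual stand-in, not the real wire format: the actual Kafka record also carries the CRC, attributes, a magic byte and, in newer versions, timestamps and headers, which are omitted here for brevity.

```python
import struct

def frame_message(key: bytes, value: bytes) -> bytes:
    """Frame a record as length-prefixed key and value fields,
    loosely mirroring the key-length/key-content and
    message-length/message-content layout on the slide."""
    return (struct.pack(">i", len(key)) + key
            + struct.pack(">i", len(value)) + value)

def parse_message(buf: bytes) -> tuple:
    """Inverse of frame_message: read each 4-byte big-endian
    length, then slice out the field it describes."""
    key_len = struct.unpack_from(">i", buf, 0)[0]
    key = buf[4:4 + key_len]
    off = 4 + key_len
    value_len = struct.unpack_from(">i", buf, off)[0]
    value = buf[off + 4:off + 4 + value_len]
    return key, value

raw = frame_message(b"order-42", b'{"qty": 3}')
print(parse_message(raw))  # (b'order-42', b'{"qty": 3}')
```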
Topics in 3 partitions with 3 replicas
The order of messages within a partition is guaranteed; the key determines which partition a message goes to.
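The per-key routing can be sketched like this. It is a conceptual stand-in: Kafka's default partitioner actually uses a murmur2 hash of the key bytes, while this sketch substitutes CRC32 so it is deterministic and self-contained.

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic hash of the key modulo the partition count;
    # Kafka's default partitioner uses murmur2, crc32 stands in here.
    return zlib.crc32(key) % num_partitions

# All messages with the same key land in the same partition,
# which is what makes per-key ordering possible.
for key in [b"customer-1", b"customer-2", b"customer-1"]:
    print(key, "->", partition_for(key))
```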
Distributed partitions (P0-P3) parallel processed by consumer groups (C1-C6)
Consumer groups split the partitions among their members for read parallelization
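A toy model of how a group splits partitions among its members. This sketch uses a simple round-robin assignment; Kafka's actual assignors (range, round-robin, sticky) differ in detail, but the invariant is the same: each partition is read by exactly one consumer in the group.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment of partitions to the consumers of
    one group: each partition goes to exactly one consumer, so
    reads are parallel but never duplicated within the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

print(assign_partitions(["P0", "P1", "P2", "P3"], ["C1", "C2"]))
# {'C1': ['P0', 'P2'], 'C2': ['P1', 'P3']}
```

With more consumers than partitions, the surplus consumers simply get no partitions and sit idle, which is why it rarely makes sense to run more consumers than partitions in one group.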
Consumer groups subscribed to a topic with parallel reads
Rebalancing redistributes partitions when consumers join or leave the group
Last committed offset, current client read offset, high watermark, log end offset
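A toy model of these four offsets, to make their relationship concrete. This is a conceptual sketch, not the broker's implementation: the log end offset is where the next record will be appended, the high watermark is the highest fully replicated offset consumers may read up to, and each consumer tracks its current position plus its last committed offset.

```python
class PartitionView:
    """Toy model of one partition's offsets as seen by a consumer."""
    def __init__(self):
        self.log = []          # appended records
        self.high_watermark = 0
        self.position = 0      # current client read offset
        self.committed = 0     # last committed offset

    @property
    def log_end_offset(self):
        return len(self.log)

    def append(self, record):
        self.log.append(record)

    def replicate(self):
        # Followers caught up: the high watermark advances.
        self.high_watermark = self.log_end_offset

    def poll(self):
        # Consumers only see records below the high watermark.
        if self.position < self.high_watermark:
            record = self.log[self.position]
            self.position += 1
            return record
        return None

    def commit(self):
        self.committed = self.position

p = PartitionView()
p.append("m0"); p.append("m1")
p.replicate()
p.append("m2")                       # not yet replicated, invisible
print(p.poll(), p.poll(), p.poll())  # m0 m1 None
p.commit()
print(p.committed, p.high_watermark, p.log_end_offset)  # 2 2 3
```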
Producer, consumer, offset, retention period
Messages are retained for a configurable period
Each consumer tracks its own position (offset)
Horizontal scaling
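The retention idea in a few lines of Python, as a conceptual sketch: consuming a message does not delete it, so several consumers can re-read the same log at their own offsets until retention expires. The seven-day figure is a common default, not a fixed property of Kafka.

```python
import time

RETENTION_SECONDS = 7 * 24 * 3600  # a common broker default is 7 days

def expire(log, now, retention=RETENTION_SECONDS):
    """Drop records older than the retention period.  Records are
    (timestamp, message) pairs; nothing else ever removes them,
    consumption in particular does not."""
    return [(ts, msg) for ts, msg in log if now - ts <= retention]

now = time.time()
log = [(now - 10 * 24 * 3600, "old"), (now - 3600, "fresh")]
print(expire(log, now))  # only the "fresh" record survives
```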
Topics as partitioned logs: producers write into a cluster with horizontal scalability
Log Compaction Basics
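The core of log compaction can be sketched in a few lines of Python; this is a conceptual model, not the broker's cleaner thread. Compaction keeps at least the last record per key and eventually drops tombstones (records whose value is null), so a compacted topic converges to a snapshot of the latest value for every key.

```python
def compact(log):
    """Keep only the LAST record per key; a record with value None
    is a 'tombstone' marking a delete, and is dropped as well once
    compaction has run.  Relative order of survivors is preserved."""
    latest = {}
    for key, value in log:       # later records overwrite earlier ones
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v is not None]

log = [("k1", "a"), ("k2", "b"), ("k1", "c"), ("k2", None)]
print(compact(log))  # [('k1', 'c')]
```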
Kafka Cluster single node multiple broker:
Zookeeper, Producer, Consumer groups
ZooKeeper (port 2181) coordinates the Kafka brokers (port 9092). A producer writes to the brokers; Consumer1 and Consumer2 form Group1, Consumer3 and Consumer4 form Group2, and the brokers update each group's consumed message offsets. Both queue and topic (streaming) topologies are supported.
Highly scalable, available and distributed
Benefits
• Cost
• Scalability (size and speed) for big/fast data
• Availability (distribution; backpressure?)
• Message ordering and retention
Get cluster topic info
Kafka consistency and failover with leader and follower replicas
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 3 --topic MultiBrokerTopic
(three brokers on ports 9092, 9093, 9094)
Kafka consistency and failover from broker 1 to 2
bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093,localhost:9094 --topic MultiBrokerTopic
(three brokers on ports 9092, 9093, 9094)
Kafka ecosystem
Kafka Connectors source & sink
Data source → Connect → Kafka → Connect → Data sink
Example connectors: Console, File, JDBC, ElasticSearch, HDFS, S3, DynamoDB
Kafka Connectors
CONNECTOR          TYPE
ElasticSearch      sink
HDFS               sink
Amazon S3          sink
Cassandra          sink
Oracle CDC         source
MongoDB            source
MQTT               source
JMS                sink
Couchbase          sink & source
DynamoDB           sink & source
IBM MQ             sink & source
JDBC               sink & source
Blockchain         source
Amazon Kinesis     sink
CoAP               source
Azure DocumentDB   sink
Splunk             sink & source
Solr               sink & source
Process of Kafka stream processing (API, KSQL)
Create a STREAM/TABLE from a Kafka topic with KSQL
Create KStream, KTable from Topic
KTable as changelog stream
Stream-table duality:
- Stream as table: a stream can be seen as the changelog of a table; aggregating the stream's records returns a table.
- Table as stream: a table can be seen as a snapshot of a stream, holding the latest value per key.
Example: given the records ("kafka", 1) and ("kafka", 2), the sum of values as a KStream is 3 (every record counts), while as a KTable it is 2 (the later record replaces the earlier one).
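The same example as a small Python sketch of the duality; this simulates the two views rather than calling the Kafka Streams API:

```python
def as_kstream_sum(records):
    """KStream view: every record is an independent event,
    so aggregation sums across all of them."""
    return sum(v for _, v in records)

def as_ktable_latest(records):
    """KTable view: records are updates to a changelog,
    so only the latest value per key counts."""
    table = {}
    for k, v in records:
        table[k] = v
    return table

records = [("kafka", 1), ("kafka", 2)]
print(as_kstream_sum(records))    # 3
print(as_ktable_latest(records))  # {'kafka': 2}
```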
Kafka Streams supports three kinds of joins
Operations on KStream & KTable
State stores (RocksDB or in-memory) are backed by internal compacted changelog topics
Tumbling window vs Hopping window
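The difference between the two window types can be sketched as follows; this is a conceptual model of window assignment, not the Kafka Streams windowing API:

```python
def tumbling_windows(ts, size):
    """Tumbling windows are fixed-size and non-overlapping:
    each timestamp falls into exactly one window [start, start+size)."""
    start = (ts // size) * size
    return [(start, start + size)]

def hopping_windows(ts, size, advance):
    """Hopping windows are fixed-size but advance by a smaller step,
    so they overlap and a timestamp can fall into several windows."""
    windows = []
    start = (ts // advance) * advance
    while start + size > ts:
        if start <= ts:
            windows.append((start, start + size))
        start -= advance
    return sorted(windows)

print(tumbling_windows(7, size=5))            # [(5, 10)]
print(hopping_windows(7, size=5, advance=2))  # [(4, 9), (6, 11)]
```

A tumbling window is just the special case of a hopping window where the advance equals the window size.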
State in Cluster – Stream processing
An event store reconstructs the original table from the changelog stream
Don't use log compaction with KStreams here: it breaks the event store!
Kappa architecture with Kafka Streams & Kafka Connect
Speed layer (core + streams) and serving layer (connect): a data source feeds an input topic; stream processing jobs n and n+1 write their results to output_table n and output_table n+1.
Publish & subscribe
Read and write streams of data like a messaging system
Process
Write scalable stream processing applications that react to events in real-time
Store
Store streams of data safely in a distributed, replicated, fault-tolerant cluster
Let's get our hands dirty
Create/List Topics
Create a topic
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --
partitions 1 --topic test
List all topics
> bin/kafka-topics.sh --list --zookeeper localhost:2181
Output: test
Producer
Send some Messages
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Now type at the console:
This is a message
This is another message
Consumer
Receive some Messages
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic
test --from-beginning
This is a message
This is another message
Cluster
> cp config/server.properties config/server-93.properties
broker.id=93
listeners=PLAINTEXT://:9093
log.dir=/tmp/kafka-logs-93
Now start another Kafka server and create a topic with replication factor 2 (= number of brokers):
bin/kafka-server-start.sh config/server-93.properties
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic MultiBrokerTopic
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic MultiBrokerTopic
bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093 --topic MultiBrokerTopic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093 --from-beginning --topic MultiBrokerTopic
Kill the leader: the broker switches from ID 93 to ID 0
Connect
connect-file-source.properties: file=test.txt, topic=connect-test
connect-file-sink.properties: file=test.sink.txt, topics=connect-test
echo -e "hello\nworld" > test.txt
> bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
more test.sink.txt
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"hello"}
{"schema":{"type":"string","optional":false},"payload":"world"}
Use Cases for Apache Kafka (Confluent)
Confluent Platform: open source & commercial
Summary: Kafka
Best of both worlds: distributed, highly scalable messaging & streaming
Extendable platform with lots of connectors, supported programming languages
Stream processing is a fast growing topic with promising solutions
Lack of standards
Only basic authorization and security mechanisms
Production challenges (e.g. monitoring, debugging, sizing in the cloud, containers)
Growing experience and best practices
Professional support
Managed cloud solutions
Further info
Tyler Akidau et al.: The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. VLDB 2015.
More questions?
Contact
Materna GmbH
Frank Pientka
Tel. +49 1570 1128854
E-Mail: [email protected]
www.materna.de