Let the data flow! - Java Forum Stuttgart · © Materna GmbH 2018 Frank Pientka,...


Page 1: Let the data flow! - Java Forum Stuttgart · © Materna GmbH 2018  Frank Pientka, Dipl.-Informatiker frank.pientka@materna.de +49 (231) 5599 8854 +49 (1570) 1128854

© Materna GmbH 2018 www.materna.de

Data Streaming & Messaging with Apache Kafka

Frank Pientka

Let the data flow!

Page 2:

Who is Frank Pientka?

• Dipl.-Informatiker (TH Karlsruhe)
• Married, two daughters
• Principal Software Architect in Dortmund
• Nearly 30 years of IT experience
• Projects, publications and talks
• Higher software quality, networker, innovator

Page 3:

Agenda

The need for speed – fast data

Two worlds – message & data together

Why Kafka?

What is Kafka?

Cluster

Messaging

Clients

Connecting

Streaming

Confluent use cases, platform

Kafka steps

Summary

Page 4:

Big data - fast data

Page 5:

Three Vs of Big Data

• Velocity
• Volume
• Variety

Page 6:

The data value chain

[Figure: data value plotted against age of data; labels: single data item, aggregate data value]

Close the gap!

Page 7:

The lambda architecture for big data analysis

[Figure labels: Data source; Data queuing; Batch layer (volume) with Data storage and Batch processing; Speed layer (velocity) with Real-time processing; Serving layer; Presentation]

Page 8:

Kappa architecture for fast data analytics

[Figure labels: Data source; Data queuing; Speed layer (velocity) with Real-time processing; Serving layer; Presentation]

Page 9:

Big data - fast data: The need for speed

[Figure labels: Stream, Mini-Batch, Query, Batch]

Page 10:

What is Kafka?

• LinkedIn, since 2011
• Apache, 2012
• Confluent, 2014
• Written in Java & Scala
• Kafka 0.11 (streaming), 2017
• Kafka 1.1: March 28, 2018
• Kafka 1.1.1 and 2.0 planned for 2018

Page 12:

Messaging with Kafka

[Figure labels: Producer; Consumer; Broker with Topic A, Topic B and Intermediate Topic; State Store; each record has Key, Value and Time]

Message format:

CRC | attributes | key-length | key-content | message-length | message-content
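The field layout on this slide can be made concrete with a small sketch. It is not Kafka's actual wire format, which contains further fields such as a magic byte. The field widths, the function names `encode_message` and `decode_message`, and the use of `zlib.crc32` are assumptions chosen only for illustration.

```python
import struct
import zlib


def encode_message(key: bytes, value: bytes, attributes: int = 0) -> bytes:
    """Pack one message as: CRC | attributes | key-length | key | message-length | message."""
    body = struct.pack(">b", attributes)
    body += struct.pack(">i", len(key)) + key
    body += struct.pack(">i", len(value)) + value
    crc = zlib.crc32(body)  # checksum over everything after the CRC field
    return struct.pack(">I", crc) + body


def decode_message(buf: bytes):
    """Verify the CRC and return (attributes, key, value)."""
    (crc,) = struct.unpack_from(">I", buf, 0)
    body = buf[4:]
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch: message is corrupt")
    (attributes,) = struct.unpack_from(">b", body, 0)
    (key_len,) = struct.unpack_from(">i", body, 1)
    key = body[5:5 + key_len]
    (value_len,) = struct.unpack_from(">i", body, 5 + key_len)
    value_start = 9 + key_len
    return attributes, key, body[value_start:value_start + value_len]


encoded = encode_message(b"kafka", b"hello")
decoded = decode_message(encoded)
```

The length prefixes let a reader skip from one field to the next without delimiters, and the CRC lets it reject a message that was damaged on disk or in transit.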

Page 13:

Topics in 3 partitions with 3 replicas

Message order is guaranteed only within a partition; messages with the same key are written to the same partition.
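A minimal sketch of why ordering is per key: the partition is derived from a hash of the key, so all records with one key land in one partition in send order. Kafka's default partitioner uses a murmur2 hash; `zlib.crc32` stands in for it here, and `partition_for`, `append_all` and the sample records are hypothetical names and data.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition (stand-in hash, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_all(records):
    """Append each (key, value) record to the log of its partition."""
    log = defaultdict(list)
    for key, value in records:
        log[partition_for(key)].append((key, value))
    return dict(log)


records = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]
partitions = append_all(records)
```

Records of different keys may end up in different partitions, and no ordering holds between partitions.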

Page 14:

Distributed partitions (P0-P3) processed in parallel by consumer groups (C1-C6)

The members of a consumer group split the partitions among themselves to parallelize reads.

Page 15:

Consumer groups subscribed to a topic with parallel reads

rebalancing
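The partition split and the rebalancing step can be sketched as follows. This is a simplified round-robin assignment, not one of Kafka's real assignors. The group membership (C1 and C2 in one group, C3 to C6 in another) and the function name `assign` are assumptions.

```python
def assign(partitions, consumers):
    """Distribute partitions round-robin over the members of one consumer group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

# Every group reads all partitions; inside a group each partition has one owner.
group_a = assign(partitions, ["C1", "C2"])
group_b = assign(partitions, ["C3", "C4", "C5", "C6"])

# Rebalancing: C2 leaves group A, so its partitions move to the remaining member.
group_a_after = assign(partitions, ["C1"])
```

With four partitions, a group gains nothing from a fifth consumer: the number of partitions caps the read parallelism of a group.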

Page 16:

Last committed offset, current client read position, high watermark, log end offset
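The four offsets named above are ordered: last committed offset ≤ current position ≤ high watermark ≤ log end offset. The sketch below models that relation for one partition; the class name `PartitionOffsets`, its methods and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PartitionOffsets:
    last_committed: int   # last offset the consumer has committed
    position: int         # next offset the consumer will read
    high_watermark: int   # upper bound that consumers may read up to
    log_end: int          # offset where the next produced record will be written

    def is_consistent(self) -> bool:
        return (self.last_committed <= self.position
                <= self.high_watermark <= self.log_end)

    def reprocessed_after_crash(self) -> int:
        """Records read but not yet committed; they are read again after a restart."""
        return self.position - self.last_committed

    def lag(self) -> int:
        """Readable records the consumer has not fetched yet."""
        return self.high_watermark - self.position


offsets = PartitionOffsets(last_committed=1, position=6, high_watermark=10, log_end=14)
```

Records between the high watermark and the log end offset are written on the leader but not yet visible to consumers.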

Page 17:

Producer, consumer, offset, retention period

• Messages are retained
• Each consumer tracks its own position
• Horizontal scaling

Page 18:

Writes to topics and partitioned logs in a cluster with horizontal scalability

[Figure label: Producers]

Page 19:

Log Compaction Basics

Page 20:

Log Compaction Basics
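Since the two log compaction slides survive only as titles, here is a minimal sketch of the idea: for each key only the most recent record is kept, and a record with a null value (a tombstone) removes the key. Real Kafka compacts only older log segments and keeps tombstones for a configurable time; the function name `compact` and the sample log are assumptions.

```python
def compact(log):
    """Keep the latest record per key, in offset order; drop keys whose latest value is None."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # a later record overrides an earlier one
    survivors = sorted(latest.items(), key=lambda item: item[1][0])
    return [(offset, key, value) for key, (offset, value) in survivors if value is not None]


log = [
    ("k1", "a"),   # offset 0, superseded by offset 2
    ("k2", "b"),   # offset 1, deleted by the tombstone at offset 4
    ("k1", "c"),   # offset 2
    ("k3", "d"),   # offset 3
    ("k2", None),  # offset 4, tombstone
]
compacted = compact(log)
```

Surviving records keep their original offsets, so compaction leaves gaps in the offset sequence but never reorders records.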

Page 21:

Kafka cluster on a single node with multiple brokers: ZooKeeper, producer, consumer groups

[Figure labels: Producer; Kafka Broker (port 9092); Zookeeper (port 2181); Consumer1 and Consumer2 (Group1); Consumer3 and Consumer4 (Group2); Streaming; Update consumed message offset; Queue topology; Topic topology; Get cluster topic infos]

Highly scalable, available and distributed

Benefits

• Costs
• Scalability (size and speed): big data / fast data
• Availability (distribution, backpressure?)
• Message ordering, retention

Page 22:

Kafka consistency and failover with leader and follower replicas

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 3 --topic MultiBrokerTopic

[Figure: brokers on ports 9092, 9093 and 9094]

Page 23:

Kafka consistency and failover from broker 1 to 2

bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093,localhost:9094 --topic MultiBrokerTopic

[Figure: brokers on ports 9092, 9093 and 9094]

Page 24:

Kafka ecosystem

Page 25:

Kafka Connectors source & sink

[Figure: data source → Connect → Kafka → Connect → data sink]

Connectors: Console, File, JDBC, Elasticsearch, HDFS, S3, DynamoDB

Page 26:

Kafka Connectors

CONNECTOR          TYPE
Elasticsearch      sink
HDFS               sink
Amazon S3          sink
Cassandra          sink
Oracle CDC         source
MongoDB            source
MQTT               source
JMS                sink
Couchbase          sink & source
DynamoDB           sink & source
IBM MQ             sink & source
JDBC               sink & source
Blockchain         source
Amazon Kinesis     sink
CoAP               source
Azure DocumentDB   sink
Splunk             sink & source
Solr               sink & source

Page 27:

Process of Kafka stream processing (API, KSQL)

Create a STREAM/TABLE from a Kafka topic with KSQL

Page 28:

Create KStream, KTable from a topic

KTable as changelog stream

Stream-table duality

- Stream as table: a stream is the changelog of a table; aggregating stream data returns a table
- Table as stream: a table is a snapshot of a stream, holding the latest value for each key

(key, value) records    Sum of values as KStream    Sum of values as KTable
("kafka", 1)
("kafka", 2)            3                           2
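The two sums in the table above follow from how the same records are interpreted. A minimal sketch in plain Python (not the Kafka Streams API; the function names are hypothetical):

```python
records = [("kafka", 1), ("kafka", 2)]


def sum_as_stream(records):
    """Record stream: every record is an independent fact, so all values count."""
    return sum(value for _, value in records)


def sum_as_table(records):
    """Changelog stream: a later record for the same key replaces the earlier one."""
    table = {}
    for key, value in records:
        table[key] = value  # upsert
    return sum(table.values())


stream_sum = sum_as_stream(records)
table_sum = sum_as_table(records)
```

Read as a KStream, both records count and the sum is 3; read as a KTable, the second record overwrites the first and the sum is 2.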

Page 29:

Kafka Streams supports three kinds of joins

Page 30:

Operations on KStream & KTable

State stores (RocksDB or in-memory) are backed by internal compacted changelog topics.

Tumbling window vs hopping window
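The difference between the two window types can be shown by computing which windows a record timestamp falls into. This is a sketch in plain Python with hypothetical function names, assuming windows aligned to time zero with an inclusive start and exclusive end.

```python
def tumbling_window(timestamp, size):
    """Tumbling windows do not overlap: each timestamp belongs to exactly one."""
    start = timestamp - timestamp % size
    return (start, start + size)


def hopping_windows(timestamp, size, advance):
    """Hopping windows start every `advance` units and may overlap."""
    windows = []
    start = timestamp - timestamp % advance  # latest window start <= timestamp
    while start > timestamp - size:
        if start >= 0:
            windows.append((start, start + size))
        start -= advance
    return sorted(windows)


one_window = tumbling_window(7, size=10)
two_windows = hopping_windows(7, size=10, advance=5)
```

A tumbling window is the special case of a hopping window whose advance equals its size.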

Page 31:

State in Cluster – Stream processing

Page 32:

Event store: reconstruct the original table from the changelog stream

Don't use log compaction with KStreams! It breaks the event store.
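Why compaction breaks an event store can be seen with a small example. It assumes events that carry deltas (account movements) rather than full state; the account names, amounts and function names are hypothetical.

```python
def replay(changelog):
    """Rebuild a table by applying every change event in order."""
    balances = {}
    for account, delta in changelog:
        balances[account] = balances.get(account, 0) + delta
    return balances


def compact(changelog):
    """Keep only the last event per key, as log compaction would."""
    last_index = {key: index for index, (key, _) in enumerate(changelog)}
    return [event for index, event in enumerate(changelog)
            if last_index[event[0]] == index]


events = [
    ("account-1", 100),
    ("account-2", 20),
    ("account-1", -30),
    ("account-1", 50),
]

from_full_log = replay(events)
from_compacted_log = replay(compact(events))
```

Replaying the full log yields a balance of 120 for account-1, while the compacted log retains only the last movement and yields 50. Compaction is safe only when each record holds the complete latest state for its key.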

Page 33:

Kappa architecture with Kafka Streams & Kafka Connect

[Figure: data source → input_topic → stream processing jobs n and n+1 in the speed layer (core + streams) → output_table n and output_table n+1 in the serving layer (connect)]

Page 34:

Publish & subscribe

Read and write streams of data like a messaging system

Process

Write scalable stream processing applications that react to events in real-time

Store

Store streams of data safely in a distributed, replicated, fault-tolerant cluster

Page 35:

Let's get our hands dirty

Page 36:

Create/List Topics

Create a topic

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

List all topics

> bin/kafka-topics.sh --list --zookeeper localhost:2181

Output: test

Page 37:

Producer

Send some Messages

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

Now type on console:

This is a message

This is another message

Page 38:

Consumer

Receive some Messages

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

This is a message

This is another message

Page 39:

Cluster

> cp config/server.properties config/server-93.properties

Set in config/server-93.properties:

broker.id=93
listeners=PLAINTEXT://:9093
log.dir=/tmp/kafka-logs-93

Now start another Kafka server and create a topic with replication factor 2 (= number of brokers):

bin/kafka-server-start.sh config/server-93.properties
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic MultiBrokerTopic
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic MultiBrokerTopic
bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093 --topic MultiBrokerTopic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093 --from-beginning --topic MultiBrokerTopic

Kill the leader: leadership switches from broker ID 93 to broker ID 0.
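The failover in the last step can be sketched as a simplified election: the new leader is the first assigned replica that is both in sync and alive. This is a model for illustration, not the controller's actual code. The function name `elect_leader` and the replica order `[93, 0]` are assumptions.

```python
def elect_leader(replicas, in_sync, alive):
    """Return the first assigned replica that is both in sync and alive, else None."""
    for broker_id in replicas:
        if broker_id in in_sync and broker_id in alive:
            return broker_id
    return None


replicas = [93, 0]  # assumed replica order for the single partition

leader_before = elect_leader(replicas, in_sync={93, 0}, alive={93, 0})
leader_after = elect_leader(replicas, in_sync={93, 0}, alive={0})  # broker 93 killed
```

If no in-sync replica is alive the function returns `None`, which corresponds to the partition being unavailable until a replica returns.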

Page 40:

Connect

connect-file-source.properties:
file=test.txt
topic=connect-test

connect-file-sink.properties:
file=test.sink.txt
topics=connect-test

echo -e "hello\nworld" > test.txt

> bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties

more test.sink.txt

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning

{"schema":{"type":"string","optional":false},"payload":"hello"}
{"schema":{"type":"string","optional":false},"payload":"world"}

Page 41:

Use Cases for Apache Kafka (Confluent)

Page 42:

Confluent Platform: open source & commercial

Page 43:

Summary: Kafka

• Best of both worlds: distributed, highly scalable messaging & streaming
• Extensible platform with many connectors and supported programming languages
• Stream processing is a fast-growing topic with promising solutions
• Lack of standards
• Basic authorization and security mechanisms
• Production challenges (e.g. monitoring, debugging, sizing in the cloud, containers etc.)
• Growing experience and best practices
• Professional support
• Managed cloud solutions

Page 44:

Further information

The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing, Tyler Akidau et al., VLDB 2015

Page 45:

More questions?

Page 46:

Contact

Materna GmbH
Frank Pientka
Tel. +49 1570 1128854
E-Mail: frank.pientka@materna.de
www.materna.de