Let the data flow! - Java Forum Stuttgart · © Materna GmbH 2018 Frank Pientka,...

Post on 21-May-2020


© Materna GmbH 2018 www.materna.de

Data Streaming & Messaging with Apache Kafka

Frank Pientka

Let the data flow!

Who is Frank Pientka?

Frank Pientka, Dipl.-Informatiker (TH Karlsruhe)

frank.pientka@materna.de

+49 (231) 5599 8854

+49 (1570) 1128854

www.materna.de

Married, two daughters

Principal Software Architect in Dortmund

Almost 30 years of IT experience

Projects, publications, and talks

Advocate for higher software quality, networker, innovator

Agenda

The need for speed – fast data

Two worlds – message & data together

Why Kafka?

What is Kafka?

Cluster

Messaging

Clients

Connecting

Streaming

Confluent use cases, platform

Kafka steps

Summary

Big data - fast data

Three Vs of Big Data

Velocity

Volume

Variety

The data value chain

[Figure: data value plotted against age of data; labels: single data item, aggregate data value]

Close the gap!

The lambda architecture for big data analysis

[Figure components: data source, data queuing, batch layer (volume) with data storage and batch processing, speed layer (velocity) with real-time processing, serving layer, presentation]

Kappa architecture for fast data analytics

[Figure components: data source, data queuing, speed layer (velocity) with real-time processing, serving layer, presentation]

Big data - fast data: The need for speed

Stream Mini-Batch

Query Batch

What is Kafka?

Open-sourced by LinkedIn in 2011

Apache top-level project since 2012

Confluent founded in 2014

Written in Java & Scala

Kafka 0.11 (streaming), 2017

Kafka 1.1 released March 28, 2018

Kafka 1.1.1 and 2.0 planned for 2018

Messaging with Kafka

[Figure components: producer, consumer, broker, Topic A, Topic B, intermediate topic, state store; records carry key, value, and time]

Message format:

CRC | attributes | key-length | key-content | message-length | message-content
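The field order on the slide can be illustrated with a small encoder and decoder. This is only a sketch of the layout as listed here: the field widths (4-byte CRC, 1-byte attributes, 4-byte lengths) and the function names are assumptions, and the result is not wire-compatible with any Kafka version (the real formats carry further fields, such as a magic byte).

```python
import struct
import zlib


def encode_message(key: bytes, value: bytes, attributes: int = 0) -> bytes:
    """Pack CRC | attributes | key-length | key | message-length | message."""
    body = struct.pack(">b", attributes)
    body += struct.pack(">i", len(key)) + key
    body += struct.pack(">i", len(value)) + value
    # The CRC covers everything that follows it.
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack(">I", crc) + body


def decode_message(data: bytes):
    """Verify the CRC and return (attributes, key, value)."""
    (crc,) = struct.unpack_from(">I", data, 0)
    body = data[4:]
    if zlib.crc32(body) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch: message is corrupt")
    (attributes,) = struct.unpack_from(">b", body, 0)
    (key_len,) = struct.unpack_from(">i", body, 1)
    key = body[5:5 + key_len]
    (value_len,) = struct.unpack_from(">i", body, 5 + key_len)
    start = 9 + key_len
    value = body[start:start + value_len]
    return attributes, key, value


raw = encode_message(b"k", b"hello")
decoded = decode_message(raw)
```

The length prefixes let a reader skip over a message without interpreting its content, and the leading CRC lets it detect corruption before it trusts the lengths.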

Topics in 3 partitions with 3 replicas

The order of messages is guaranteed within a partition; messages with the same key go to the same partition
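Why a key yields ordering can be shown with a toy partitioner. This is a minimal sketch, not Kafka's implementation: Kafka's default partitioner hashes the key bytes with murmur2, whereas the sketch uses CRC32 as a stand-in, and the names `partition_for` and `produce` are made up for illustration.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition; CRC32 stands in for Kafka's murmur2 hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(records, num_partitions=3):
    """Append each (key, value) record to the log of its key's partition."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


records = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]
partitions = produce(records)
user1_partition = partition_for("user-1", 3)
user1_events = [v for k, v in partitions[user1_partition] if k == "user-1"]
```

All events for `user-1` end up in one partition in the order they were produced. Events for different keys may sit in different partitions, so there is no ordering guarantee across keys.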

Distributed partitions (P0-P3) processed in parallel by consumer groups (C1-C6)

The consumers of a group split the partitions among themselves to parallelize reads

Consumer groups subscribed to a topic with parallel reads

rebalancing
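Partition assignment and rebalancing inside one consumer group can be sketched as follows. This is a simplified round-robin model under the assumption that every member subscribes to the same topic; Kafka ships several assignment strategies, and the function name `assign` is hypothetical.

```python
def assign(partitions, consumers):
    """Spread partitions round-robin over the members of one consumer group."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]

# Two consumers share the four partitions.
before = assign(partitions, ["C1", "C2"])

# C2 leaves the group: a rebalance hands its partitions to C1.
after = assign(partitions, ["C1"])

# More consumers than partitions: the surplus member stays idle.
oversized = assign(partitions, ["C1", "C2", "C3", "C4", "C5"])
```

Each partition is read by exactly one member of a group, so the partition count caps the read parallelism of that group.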

Last committed offset, current client read position, high watermark, log end offset
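The four offsets relate to each other in a fixed order, which a small model makes explicit. This is an illustrative sketch; the class and method names are assumptions and not part of any Kafka client API.

```python
from dataclasses import dataclass


@dataclass
class PartitionView:
    last_committed: int  # where the consumer resumes after a restart
    position: int        # next offset the client will read
    high_watermark: int  # consumers can only read below this offset
    log_end: int         # offset the next produced record will receive

    def is_consistent(self) -> bool:
        """The four offsets never overtake each other."""
        return (0 <= self.last_committed <= self.position
                <= self.high_watermark <= self.log_end)

    def lag(self) -> int:
        """Readable records this client has not read yet."""
        return self.high_watermark - self.position

    def uncommitted(self) -> int:
        """Records read but not committed; read again after a crash."""
        return self.position - self.last_committed


view = PartitionView(last_committed=4, position=6, high_watermark=10, log_end=14)
```

Records between the high watermark and the log end exist on the leader but are not yet visible to consumers, because not all in-sync replicas have them.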

Producer, consumer, offset, retention period

Messages are retained for the retention period, whether or not they have been consumed

Each consumer tracks its own position (offset)

Horizontal scaling

Topics and partitioned log writes in a cluster with horizontal scalability

Producers

Log Compaction Basics
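The idea behind log compaction can be sketched in a few lines: for every key, only the most recent record survives, and surviving records keep their original offsets. This is a simplified model under stated assumptions: a value of `None` stands for a delete marker (tombstone) and is dropped immediately, whereas a real broker keeps tombstones for a configurable time and never compacts the active log segment.

```python
def compact(log):
    """Keep the latest record per key; drop keys whose latest value is None."""
    latest_offset = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset  # later records overwrite earlier ones
    keep = set(latest_offset.values())
    return [
        (offset, key, value)
        for offset, (key, value) in enumerate(log)
        if offset in keep and value is not None
    ]


log = [
    ("k1", "a"),
    ("k2", "b"),
    ("k1", "c"),   # supersedes offset 0
    ("k3", "d"),
    ("k2", None),  # tombstone: deletes k2
]
compacted = compact(log)
```

After compaction the log still answers "what is the current value of each key?", but it no longer holds the full history of changes.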

Kafka cluster, single node with multiple brokers: ZooKeeper, producer, consumer groups

[Figure components: producer; Kafka brokers (port 9092); ZooKeeper (port 2181); Consumer1 and Consumer2 (Group1); Consumer3 and Consumer4 (Group2). Labels: streaming, update consumed message offset, get cluster topic info, queue topology, topic topology]

Highly scalable, available and distributed

Benefits

• Costs
• Scalability (size and speed) Big/FastData
• Availability (distribution, backpressure?)
• Message ordering, retention

Kafka consistency and failover with leader and follower replicas

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 3 --topic MultiBrokerTopic

Brokers on ports 9092, 9093, 9094

Kafka consistency and failover from broker 1 to 2

bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093,localhost:9094 --topic MultiBrokerTopic

Brokers on ports 9092, 9093, 9094
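Failover from a leader to a follower can be modelled in a few lines. This is a deliberately simplified sketch: the class `Partition` and its method are hypothetical, broker IDs 1 to 3 are assumed, and the real election is carried out by the cluster controller with further rules that are ignored here.

```python
class Partition:
    """One partition with a leader and a list of in-sync replicas (ISR)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)  # broker ids in preference order
        self.isr = list(replicas)       # replicas that are fully caught up
        self.leader = self.replicas[0]

    def broker_failed(self, broker_id):
        """Remove a failed broker from the ISR and elect a new leader if needed."""
        if broker_id in self.isr:
            self.isr.remove(broker_id)
        if self.leader == broker_id:
            # Only an in-sync replica may take over, so no acknowledged data is lost.
            self.leader = self.isr[0] if self.isr else None


partition = Partition([1, 2, 3])
partition.broker_failed(1)
```

With replication factor 3 the partition in this model stays available as long as one in-sync replica is left.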

Kafka ecosystem

Kafka Connectors source & sink

Data source -> Connect -> Kafka -> Connect -> Data sink

Examples: console, file, JDBC, Elasticsearch, HDFS, S3, DynamoDB

Kafka Connectors

CONNECTOR      TYPE            CONNECTOR         TYPE
Elasticsearch  sink            HDFS              sink
Amazon S3      sink            Cassandra         sink
Oracle CDC     source          MongoDB           source
MQTT           source          JMS               sink
Couchbase      sink & source   DynamoDB          sink & source
IBM MQ         sink & source   JDBC              sink & source
Blockchain     source          Amazon Kinesis    sink
CoAP           source          Azure DocumentDB  sink
Splunk         sink & source   Solr              sink & source

Process of Kafka stream processing (API, KSQL)

Create a STREAM/TABLE from a Kafka topic with KSQL

Create KStream, KTable from topic

KTable as changelog stream

Stream-table duality

- Stream as table: a stream can be viewed as the changelog of a table; aggregating stream data returns a table
- Table as stream: a table can be viewed as a snapshot of a stream at a point in time

(key, value) records          Sum of values as KStream   Sum of values as KTable
("kafka", 1), ("kafka", 2)    3                          2
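The two sums from the slide can be reproduced with plain Python. This is a conceptual sketch that does not use the Kafka Streams API; the function names are made up for illustration.

```python
records = [("kafka", 1), ("kafka", 2)]


def sum_as_stream(records):
    """Stream semantics: every record is an independent event (insert)."""
    return sum(value for _key, value in records)


def sum_as_table(records):
    """Table semantics: a record replaces the previous value of its key (upsert)."""
    table = {}
    for key, value in records:
        table[key] = value
    return sum(table.values())


stream_total = sum_as_stream(records)
table_total = sum_as_table(records)
```

Read as a stream, both records count, which gives 3. Read as a table, the second record overwrites the first for the key `kafka`, which gives 2.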

Kafka Streams supports three kinds of joins

Operations on KStream & KTable

Store type: RocksDB or in-memory

State stores are backed by internal compacted changelog topics

Tumbling window vs hopping window
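The difference between the two window types can be sketched as a mapping from a record timestamp to the windows that contain it. This is a minimal model, assuming windows aligned to time 0 and half-open intervals `[start, end)`; it is not the Kafka Streams implementation, and the function names are hypothetical.

```python
def tumbling_window(timestamp, size):
    """Tumbling windows do not overlap: each timestamp falls into exactly one."""
    start = timestamp - timestamp % size
    return (start, start + size)


def hopping_windows(timestamp, size, advance):
    """Hopping windows overlap when advance < size: a timestamp can fall into several."""
    windows = []
    start = timestamp - timestamp % advance  # latest window start <= timestamp
    while start > timestamp - size and start >= 0:
        windows.append((start, start + size))
        start -= advance
    return sorted(windows)


single = tumbling_window(7, size=10)
overlapping = hopping_windows(7, size=10, advance=5)
```

A tumbling window is the special case of a hopping window whose advance equals its size.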

State in Cluster – Stream processing

Event store: reconstruct the original table from the changelog stream

Don't use log compaction with KStreams! Compaction discards older records per key, which breaks the event store

Kappa architecture with Kafka Streams & Kafka Connect

[Figure components: data source; input_topic; speed layer (core + streams) with stream processing job n and job n+1; serving layer (connect) with output_table n and output_table n+1]


Publish & subscribe

Read and write streams of data like a messaging system

Process

Write scalable stream processing applications that react to events in real-time

Store

Store streams of data safely in a distributed, replicated, fault-tolerant cluster

Let's get our hands dirty

Create/List Topics

Create a topic

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

List all topics

> bin/kafka-topics.sh --list --zookeeper localhost:2181

Output: test


Producer

Send some Messages

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

Now type on console:

This is a message

This is another message

Consumer

Receive some Messages

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

This is a message

This is another message

Cluster

> cp config/server.properties config/server-93.properties

Edit config/server-93.properties:

broker.id=93

listeners=PLAINTEXT://:9093

log.dirs=/tmp/kafka-logs-93

Now start another Kafka server and create a topic with replication factor 2 (= number of brokers)

> bin/kafka-server-start.sh config/server-93.properties

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic MultiBrokerTopic

> bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic MultiBrokerTopic

> bin/kafka-console-producer.sh --broker-list localhost:9092,localhost:9093 --topic MultiBrokerTopic

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093 --from-beginning --topic MultiBrokerTopic

Kill the leader: leadership switches from broker ID 93 to broker ID 0

Connect

connect-file-source.properties:

file=test.txt

topic=connect-test

connect-file-sink.properties:

file=test.sink.txt

topics=connect-test

> echo -e "hello\nworld" > test.txt

> bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties

> more test.sink.txt

> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning

{"schema":{"type":"string","optional":false},"payload":"hello"}

{"schema":{"type":"string","optional":false},"payload":"world"}
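The console consumer output shows that each line of the source file arrives wrapped in a JSON envelope with a schema and a payload. A minimal sketch of unwrapping such records follows; the two input strings mirror the output above, and the helper name `payloads` is made up for illustration.

```python
import json

lines = [
    '{"schema":{"type":"string","optional":false},"payload":"hello"}',
    '{"schema":{"type":"string","optional":false},"payload":"world"}',
]


def payloads(lines):
    """Parse each JSON envelope and return the string payloads."""
    result = []
    for line in lines:
        envelope = json.loads(line)
        if envelope["schema"]["type"] != "string":
            raise ValueError("unexpected schema type: " + envelope["schema"]["type"])
        result.append(envelope["payload"])
    return result


original_lines = payloads(lines)
```

The payloads are the lines that were written to test.txt, which is also what the file sink writes to test.sink.txt.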

Use cases for Apache Kafka (Confluent)

Confluent Platform: open source & commercial

Summary: Kafka

Best of both worlds: distributed, highly scalable messaging & streaming

Extensible platform with many connectors and supported programming languages

Stream processing is a fast-growing topic with promising solutions

Lack of standards

Basic authorization and security mechanisms

Production challenges (e.g. monitoring, debugging, sizing in the cloud, containers etc.)

Growing experience and best practices

Professional support

Managed cloud solutions

Further information

Tyler Akidau et al.: The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. VLDB 2015

More questions?

Contact

Materna GmbH

Frank Pientka

Tel. +49 1570 1128854

E-Mail: frank.pientka@materna.de

www.materna.de