Unified Log Processing Architecture

54
BASEL BERN BRUGG LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN 2014 © Trivadis Einheitlicher Umgang mit Ereignisströmen Die Unified Log Processing Architecture Guido Schmutz Trivadis AG August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture 1

description

Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Dependent on the size and quantity of such events, this can quickly be in the range of Big Data. In this session an architecture with a central log structured storage is presented where anybody can store and subscribe for events. This can be implemented using frameworks such as Kafka, Storm, Samza and Spark Streaming.

Transcript of Unified Log Processing Architecture

Page 1: Unified Log Processing Architecture

2014 © Trivadis

BASEL BERN BRUGG LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

2014 © Trivadis

Einheitlicher Umgang mit Ereignisströmen Die Unified Log Processing Architecture

Guido Schmutz

Trivadis AG

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

1

Page 2: Unified Log Processing Architecture

2014 © Trivadis

Guido Schmutz

•  Working for Trivadis for more than 17 years

•  Oracle ACE Director for Fusion Middleware and SOA •  Co-Author of different books •  Consultant, Trainer Software Architect for Java, Oracle, SOA and

Big Data / Fast Data •  Member of Trivadis Architecture Board •  Technology Manager @ Trivadis

•  More than 25 years of software development experience

•  Contact: [email protected] •  Blog: http://guidoschmutz.wordpress.com •  Twitter: gschmutz

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

2

Page 3: Unified Log Processing Architecture

2014 © Trivadis

Agenda

1.  Introduction/Motivation

2.  How to Design Stream Processing Solutions

3.  Implementing the Enterprise Event Bus (Unified Log)

4.  Implementing Stream Processing

5.  Implementing Event Collecting

6.  Architectural Patterns

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

3

Page 4: Unified Log Processing Architecture

2014 © Trivadis

Big Data Definition (4 Vs)

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

+ Time to action ? – Big Data + Event Processing = Fast Data

Characteristics of Big Data: Its Volume, Velocity and Variety in combination

4

Page 5: Unified Log Processing Architecture

2014 © Trivadis

The world is changing …

The model of Generating/Consuming Data has changed ….

Old Model: few companies are generating data, all others are consuming data

New Model: all of us are generating data, and all of us are consuming data

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

5

Page 6: Unified Log Processing Architecture

2014 © Trivadis

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

6

Page 7: Unified Log Processing Architecture

2014 © Trivadis

Internet Of Things – Sensors are/will be everywhere

There are more devices tapping into the internet than people on earth

How do we prepare our systems/architecture for the future?

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

Source: Cisco Source: The Economist 7

Page 8: Unified Log Processing Architecture

2014 © Trivadis

The world is changing …

Explosion of specialized data systems that have become popular in the past years Specialized systems exists for

•  Search •  OLAP •  Simple Online Storage •  Batch Processing •  Graph Analysis •  Recommendation •  …

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

8

Page 9: Unified Log Processing Architecture

2014 © Trivadis

Event vs. Request Driven

August 2014 CAS Big Data - FH Bern | Stream- and Event-Processing | Introduction

9

Page 10: Unified Log Processing Architecture

2014 © Trivadis

What is Stream Processing?

•  Infrastructure for continuous data processing

•  Computational model can be as general as MapReduce but with the ability to produce low-latency results

•  Data collected continuously is naturally processed continuously

•  aka. Event Processing / Complex Event Processing (CEP)

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

10

Page 11: Unified Log Processing Architecture

2014 © Trivadis

Why Stream Processing?

Response latency

Stream Processing

Milliseconds to minutes

RPC

Synchronous Later. Possibly much later.

Page 12: Unified Log Processing Architecture

2014 © Trivadis

Agenda

1.  Introduction/Motivation

2.  How to Design Stream Processing Solutions

3.  Implementing the Enterprise Event Bus (Unified Log)

4.  Implementing Stream Processing

5.  Implementing Event Collecting

6.  Architectural Patterns

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

12

Page 13: Unified Log Processing Architecture

2014 © Trivadis

How to design a Stream Processing System?

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

13

Event Stream

event Collecting

event

Persist (Queue)

Event Stream

event Collecting

event

Processing

event Processing

result

result

Event Stream

event Collecting/ Processing

result

Page 14: Unified Log Processing Architecture

2014 © Trivadis

It usually starts very simple … just one data pipeline

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

14

Event Stream Processor event Collector

Page 15: Unified Log Processing Architecture

2014 © Trivadis

New Event Stream sources are added …

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

15

Event Stream Processor

2nd Event Stream

3rd Event Stream

nth Event Stream

event

event

event

event

Collector

2nd Collector

3rd Collector

Nth Collector

Page 16: Unified Log Processing Architecture

2014 © Trivadis

New Processors are interested in the events …

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

16

Event Stream Processor

2nd Event Stream

3rd Event Stream

nth Event Stream

2nd Processor event

event

event

event

Collector

2nd Collector

3rd Collector

Nth Collector

Page 17: Unified Log Processing Architecture

2014 © Trivadis

… and the solution becomes the problem

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

17

Event Stream Processor

2nd Event Stream

3rd Event Stream

nth Event Stream

2nd Processor

3rd Processor

Nth Processor

event

event

event

event

Collector

2nd Collector

3rd Collector

Nth Collector

Page 18: Unified Log Processing Architecture

2014 © Trivadis

… and the solution becomes the problem

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

18

Event Stream Processor

2nd Event Stream

3rd Event Stream

nth Event Stream

2nd Processor

3rd Processor

Nth Processor

event

event

event

event

Collector

2nd Collector

3rd Collector

Nth Collector

Page 19: Unified Log Processing Architecture

2014 © Trivadis

… and the solution becomes the problem

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

19

New Customers

Operational Logs

Click Stream

Meter Readings

event

event

event

event

CDC Collector

Log Collector

Click Stream Collector

Senor Collector

Hadoop/Data Warehouse

Recommendation System

Log Search

Fraud Detection

Page 20: Unified Log Processing Architecture

2014 © Trivadis

Decouple event streams from consumers

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

20

„Unified Log“

Remember Enterprise Service Bus (ESB) ?

Enterprise Event Bus Event Stream Processor Event Stream Source

New Customers

Operational Logs

Click Stream

Meter Readings

CDC Collector

Log Collector

Click Stream Collector

Senor Collector

Hadoop/Data Warehouse

Recommendation System

Log Search

Fraud Detection

What do we mean by Unified Log?

Where do we do transformations ?

Page 21: Unified Log Processing Architecture

2014 © Trivadis

Unified Log

That’s what most people think about logs 137.229.78.245 - - [02/Jul/2012:13:22:26 -0800] "GET /wp-admin/images/date-button.gif HTTP/1.1" 200 111 137.229.78.245 - - [02/Jul/2012:13:22:26 -0800] "GET /wp-includes/js/tinymce/langs/wp-langs-en.js?ver=349-20805 HTTP/1.1" 200 13593 137.229.78.245 - - [02/Jul/2012:13:22:26 -0800] "GET /wp-includes/js/tinymce/wp-tinymce.php?c=1&ver=349-20805 HTTP/1.1" 200 101114 137.229.78.245 - - [02/Jul/2012:13:22:28 -0800] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 30747 137.229.78.245 - - [02/Jul/2012:13:22:40 -0800] "POST /wp-admin/post.php HTTP/1.1" 302 - 137.229.78.245 - - [02/Jul/2012:13:22:40 -0800] "GET /wp-admin/post.php?post=387&action=edit&message=1 HTTP/1.1" 200 73160 137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "GET /wp-includes/css/editor.css?ver=3.4.1 HTTP/1.1" 304 - 137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "GET /wp-includes/js/tinymce/langs/wp-langs-en.js?ver=349-20805 HTTP/1.1" 304 - 137.229.78.245 - - [02/Jul/2012:13:22:41 -0800] "POST /wp-admin/admin-ajax.php HTTP/1.1" 200 30809

But this is what we mean here by Log

•  a structured log (records are numbered beginning with 0 based on order they are written)

•  aka. commit log or journal

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

21

0 1 2 3 4 5 6 7 8 9 10 11

1st record Next record written

Page 22: Unified Log Processing Architecture

2014 © Trivadis

Central Unified Log for (real-time) subscription

Take all the organization’s data and put it into a central log for subscription

Properties of the Unified Log:

•  Unified: “Enterprise”, single deployment

•  Append-Only: events are appended, no update in place => immutable

•  Ordered: each event has an offset, which is unique within a shard

•  Fast: should be able to handle thousands of messages / sec

•  Distributed: lives on a cluster of machines

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

22

0 1 2 3 4 5 6 7 8 9 10 11

reads

writes

Collector

Consumer System A (time = 6)

Consumer System B

(time = 10)

reads

Page 23: Unified Log Processing Architecture

2014 © Trivadis

Data Flow Graphs using Unified Log

•  Stream processing allows for computing feeds off of other feeds

•  Derived feeds are no different than original feeds they are computed off

•  Single deployment of “Unified Log” but logically different feeds

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

23

Meter Readings Collector

Enrich / Transform

Aggregate by Minute

Raw MeterReadings

Meter with Customer

Meter by Customer by Minute

Customer Aggregate by Minute

Meter by Minute

Page 24: Unified Log Processing Architecture

2014 © Trivadis

Agenda

1.  Introduction/Motivation

2.  How to Design Stream Processing Solutions

3.  Implementing the Enterprise Event Bus (Unified Log)

4.  Implementing Stream Processing

5.  Implementing Event Collecting

6.  Architectural Patterns

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

24

Page 25: Unified Log Processing Architecture

2014 © Trivadis

Apache Kafka - Overview

•  A distributed publish-subscribe messaging system

•  Designed for processing of real time activity stream data (logs, metrics collections, social media streams, …)

•  Initially developed at LinkedIn, now part of Apache

•  Does not follow JMS Standards and does not use JMS API

•  Kafka maintains feeds of messages in topics

30.04.2014 Event Processing in Action

25

Kafka Cluster

Consumer Consumer Consumer

Producer Producer Producer

0 1 2 3 4 5 6 7 8 9 1 0

1 1

1 2

0 1 2 3 4 5 6 7 8 9

0 1 2 3 4 5 6 7 8 9 1 0

1 1

1 2

Anatomy of a topic:

Partition 0

Partition 1

Partition 2

Writes

old new

Page 26: Unified Log Processing Architecture

2014 © Trivadis

Apache Kafka - Motivation

LinkedIn’s motivation for Kafka was: §  “A unified platform for handling all the real-time data feeds a large company

might have.”

Must haves §  High throughput to support high volume event feeds. §  Support real-time processing of these feeds to create new, derived feeds. §  Support large data backlogs to handle periodic ingestion from offline

systems. §  Support low-latency delivery to handle more traditional messaging use

cases. §  Guarantee fault-tolerance in the presence of machine failures.

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

26

Page 27: Unified Log Processing Architecture

2014 © Trivadis

Apache Kafka - Performance

Kafka at LinkedIn

Up to 2 million writes/sec on 3 cheap machines §  Using 3 producers on 3 different machines

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

27

10+ billion writes per day

172k messages per second

(average)

55+ billion messages per day

to real-time consumers

http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

Page 28: Unified Log Processing Architecture

2014 © Trivadis

Apache Kafka - Partition offsets

Offset: messages in the partitions are each assigned a unique (per partition) and sequential id called the offset

•  Consumers track their pointers via (offset, partition, topic) tuples

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

28

Consumer group C1

Page 29: Unified Log Processing Architecture

2014 © Trivadis

Apache Kafka – two Options for Log Cleanup

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

29

1.  Retaining a window of data •  Ideal for event data •  Window can be defined in time (days) or space (GBs) – defaults to 1 week

2.  Retain a complete log (log compaction) •  Ideal for keyed data •  Keep a space-efficient complete

log of changes •  Log compaction runs in the

background •  Ensures that always at least the

last known value for each message key within the log of data is retained

Page 30: Unified Log Processing Architecture

2014 © Trivadis

Unified Log Alternatives

•  Amazon Kinesis (http://aws.amazon.com/kinesis/)

•  Redis Pub/Sub (http://redis.io/topics/pubsub)

•  Kestrel (http://robey.github.io/kestrel/)

•  ZeroMQ (http://zeromq.org/)

•  RabbitMQ (http://www.rabbitmq.com/)

•  Oracle GoldenGate (http://bit.ly/g-gate)

•  JMS compliant Server •  Apache ActiveMQ (http://activemq.apache.org/) •  Weblogic JMS (

http://www.oracle.com/technetwork/middleware/weblogic/overview/index.html) •  IBM Websphere MQ (http://www-03.ibm.com/software/products/de/ibm-mq) •  …

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

30

Page 31: Unified Log Processing Architecture

2014 © Trivadis

Agenda

1.  Introduction/Motivation

2.  How to Design Stream Processing Solutions

3.  Implementing the Enterprise Event Bus (Unified Log)

4.  Implementing Stream Processing

5.  Implementing Event Collecting

6.  Architectural Patterns

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

31

Page 32: Unified Log Processing Architecture

2014 © Trivadis

Apache Storm

A platform for doing analysis on streams of data as they come in, so you can react to data as it happens. •  A highly distributed real-time computation system

•  Provides general primitives to do real-time computation

•  To simplify working with queues & workers

•  scalable and fault-tolerant

•  complementary to Hadoop

Originated at Backtype, acquired by Twitter in 2011

Open Sourced late 2011

Part of Apache Incubator since September 2013

August 2014 CAS Big Data - FH Bern | Stream- and Event-Processing | Processing Event Streams - Apache Storm

32

Page 33: Unified Log Processing Architecture

2014 © Trivadis

Apache Storm: Simplifying dealing with queue & workers

Scaling is painful – queue partitioning & worker deploy

Operational overhead – worker failures & queue backups

No guarantees on data processing

August 2014 CAS Big Data - FH Bern | Stream- and Event-Processing | Processing Event Streams - Apache Storm

33

Page 34: Unified Log Processing Architecture

2014 © Trivadis

Apache Storm – Core concepts

Tuple •  Core data structure in storm •  Immutable Set of Key/value pairs •  You can think of Storm tuples as events •  Values must be serializable

Stream •  Key abstraction of Storm •  an unbounded sequence of tuples that can be processed in parallel by Storm •  Each stream is given ID and bolts can produce and consume tuples from

these streams on the basis of their ID •  Each stream also has an associated schema of the tuples that will flow

through it

August 2014 CAS Big Data - FH Bern | Stream- and Event-Processing | Processing Event Streams - Apache Storm

34

T T T T T T T T

Page 35: Unified Log Processing Architecture

2014 © Trivadis

Apache Storm – Core concepts

Topology •  Wires data and functions via a DAG (directed acyclic graph) •  Executes on many machines similar to a MR job in Hadoop

Spout •  Source of data streams (tuples) •  can be run in “reliable” and “unreliable” mode

Bolt •  Consumes 1+ streams and potentially

produces new streams •  Complex operations often require multiple

steps and thus multiple bolts •  Calculate, Filter, Aggregate, Join, Talk to

database

August 2014 CAS Big Data - FH Bern | Stream- and Event-Processing | Processing Event Streams - Apache Storm

35

Spout

Spout

Bolt

Bolt

Bolt

Bolt

Source of Stream B

Subscribes: A Emits: C

Subscribes: A Emits: D

Subscribes: A & B Emits: -

Subscribes: C & D Emits: -

Page 36: Unified Log Processing Architecture

2014 © Trivadis

Apache Storm – Core concepts

Each Spout or Bolt are running N instances in parallel

August 2014 CAS Big Data - FH Bern | Stream- and Event-Processing | Processing Event Streams - Apache Storm

36

Split Text

Text Spout

WordCount

Split Text WordCount

Shuffle Fields

Shuffle grouping is random grouping

Fields grouping is grouped by value, such that equal value results in equal task

All grouping replicates to all tasks

Global grouping makes all tuples go to one task

None grouping makes bolt run in the same thread as bolt/spout it subscribes to

Direct grouping producer (task that emits) controls which consumer will receive

Local or Shuffle grouping

similar to the shuffle grouping but will shuffle tuples among bolt tasks running in the same worker process, if any. Falls back to shuffle grouping behavior.

Report Global

Page 37: Unified Log Processing Architecture

2014 © Trivadis

Apache Samza

•  http://samza.incubator.apache.org/

•  Similar to Apache Storm, Apache Samza is a distributed real time computation framework for processing streaming data.

•  Apache Samza is built on top of Apache Kafka and Apache Yarn. Samza uses Kafka as its messaging layer and Yarn for managing the cluster of nodes with Samza processes.

•  Samza is scalable, fault tolerant and provides guaranteed message processing.

•  Samza was originally developed at LinkedIn and was donated to ASF in 2013

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

37

Page 38: Unified Log Processing Architecture

2014 © Trivadis

Apache Samza – Core Concepts

Message •  Can be appended to a stream or read from a stream •  Can optionally have an associated key which is used for

partitioning Streams •  composed of immutable •  Streams can have any number of consumers Jobs •  Code that performs a logical transformation on a set of

input streams to append output messages to set of output streams

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

38

Page 39: Unified Log Processing Architecture

2014 © Trivadis

Apache Samza – Core Concepts

Partition •  stream is broken into 1 or more partitions •  partition is a totally ordered •  Message is appended to only one of the

partitions based on the key chosen by the writer Tasks •  Job is scaled by breaking it into multiple tasks •  Task is unit of parallelism of the job •  Each task consumed data from one partition •  Scheduler assigns each task to a machine •  Number of tasks in a job is determined by the number of

input partitions

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

39

Page 40: Unified Log Processing Architecture

2014 © Trivadis

Apache Samza – Core Concepts

Dataflows •  Multiple jobs can be composed into

a dataflow graph •  Composition is purely through the streams

the jobs take as input and output •  Jobs are otherwise totally decoupled •  Graphs are often acyclic, but they don’t have to be Containers •  Partitions and tasks are logical units of parallelism •  Containers are the unit of physical parallelism, essentially

a Unix process •  Each container runs one or more tasks •  Number of containers is specified by user at run time

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

40

Page 41: Unified Log Processing Architecture

2014 © Trivadis

Apache Samza – State Management

Samza manages snapshotting and restoration of a stream processor’s state

When processor is restarted, Samza restores its state to a consistent snapshot

Common use cases for stateful processing

•  Windowed aggregation

•  Table-table join

•  Stream-table join

•  Stream-Stream join

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

41

Page 42: Unified Log Processing Architecture

2014 © Trivadis

Apache Samza - Checkpointing

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

42

Page 43: Unified Log Processing Architecture

2014 © Trivadis

Stream Processing Alternatives

•  Apache S4 (http://incubator.apache.org/s4/)

•  Apache Spark Streaming (http://spark.apache.org/streaming/)

•  Google MillWheel (http://research.google.com/pubs/pub41378.html)

•  Akka Streams (http://akka.io)

•  Complex Event Processing §  Esper (http://esper.codehaus.org/) §  WSO2 Complex Event Processor (http://wso2.com/products/complex-event-processor/) §  Oracle Event Processing (

http://www.oracle.com/technetwork/middleware/complex-event-processing/overview/index.html)

§  TIBCO BusinessEvents & TIBCO StreamBase (http://www.tibco.com/products/event-processing/complex-event-processing)

§  IBM InfoSphere (http://www-01.ibm.com/software/data/infosphere/) §  Microsoft StreamInsight (http://msdn.microsoft.com/de-ch/sqlserver/ee476990.aspx) §  …

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

43

Page 44: Unified Log Processing Architecture

2014 © Trivadis

Agenda

1.  Introduction/Motivation

2.  How to Design Stream Processing Solutions

3.  Implementing the Enterprise Event Bus (Unified Log)

4.  Implementing Stream Processing

5.  Implementing Event Collecting

6.  Architectural Patterns

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

44

Page 45: Unified Log Processing Architecture

2014 © Trivadis

Apache Flume

A distributed data collection service that gets flows of data (like logs) from their source and aggregates them to where they have to be processed

Sources: files, syslog, avro, …

Sinks: HDFS files, HBase, Kafka, …

Goals •  Reliability •  Scalability •  Extensibility •  Manageability

April 2014 CAS Big Data - FH Bern | Hadoop Ecosystem | Moving Data In and Out of Hadoop

45

Page 46: Unified Log Processing Architecture

2014 © Trivadis

Apache Flume Components

Component Function Agent The JVM running Flume, one per machine

Runs many sources and sinks

Source Produces data in the form of events Runs in a separate thread.

Sink Receives events from a channel Runs in a separate thread.

Channel Connects sources to sinks (like a queue) Implements the reliability semantics.

Event A single datum; a log record, an avro object, etc.

April 2014 CAS Big Data - FH Bern | Hadoop Ecosystem | Moving Data into Hadoop

46

Page 47: Unified Log Processing Architecture

2014 © Trivadis

Event Collecting Alternatives

•  Logstash (http://logstash.net/)

•  Fluentd (http://www.fluentd.org/)

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

47

Page 48: Unified Log Processing Architecture

2014 © Trivadis

Agenda

1.  Introduction/Motivation

2.  How to Design Stream Processing Solutions

3.  Implementing the Enterprise Event Bus (Unified Log)

4.  Implementing Stream Processing

5.  Implementing Event Collecting

6.  Architectural Patterns

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

48

Page 49: Unified Log Processing Architecture

2014 © Trivadis

Architectural Pattern: Standalone Event Stream Processing

30.04.2014 Event Processing in Action

49 49

Event Processing (ESP / CEP)

State Store / Event Store En

terp

rise

Even

t Bus

(In

gres

s)

Even

t Cl

oud

Internet of Things

Social Media Streams

Ente

rpris

e Ev

ent B

us

49

Analytical Applications

DB

Ente

rpris

e Se

rvic

e Bu

s

Business Rule Management

System Rules

Event Processing

Result Store

Page 50: Unified Log Processing Architecture

2014 © Trivadis

Hadoop Big Data Infrastructure

Architectural Pattern: Event Stream Processing as part of Lambda Architecture

30.04.2014 Event Processing in Action

50 50

Event Processing (ESP / CEP)

State Store / Event Store

Ente

rpris

e Ev

ent B

us

(Ingr

ess)

Even

t Cl

oud

Internet of Things

Social Media Streams

Ente

rpris

e Ev

ent B

us

50

Analytical Applications

DB

Ente

rpris

e Se

rvic

e Bu

s

Event Processing

Map/Reduce HDFS Result

Store

Result Store

Page 51: Unified Log Processing Architecture

2014 © Trivadis

Hadoop Big Data Infrastructure

Architectural Pattern: Event Stream Processing as part of Kappa Architecture

30.04.2014 Event Processing in Action

51 51

Event Processing (ESP / CEP)

State Store / Event Store

Ente

rpris

e Ev

ent B

us

(Ingr

ess)

Even

t Cl

oud

Internet of Things

Social Media Streams

51

Analytical Applications

DB Ente

rpris

e Se

rvic

e Bu

s

Event Processing

Replay HDFS

Result Store

Page 52: Unified Log Processing Architecture

2014 © Trivadis

Even

t Cl

oud

Event Processing in modern architecture

30.04.2014 Event Processing in Action

52

Enterprise Applications

WS

REST

JMS

RDMBS

Loca

l ESB

External Cloud Service Providers

Ente

rpris

e Se

rvic

e Bu

s (E

SB)

EJB

Event Processing (ESP / CEP)

State Store / Event Store

BPM and SOA Platform

Event

REST

Business Logic/Rules

NoSQL

Analytical Applications

Data Analytics

Internet of Things

Event Processing

52

Mobile Apps

DB

Rich (Web) Client Apps

DB

Social Media Streams

Ente

rpris

e Ev

ent B

us

(Ingr

ess)

Ente

rpris

e Ev

ent B

us

Visualization

Biz Logic Rules

WS

Event

Business RuleManagement

System Rules

Processes ACM

HumanWF

Page 53: Unified Log Processing Architecture

2014 © Trivadis

Weitere Informationen...

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

53

INFOBOX – Lesen und Löschen •  Folie wenn auf weitere Informationen

verwiesen werden soll, also z.B. Bücher, Websiten, etc.

Page 54: Unified Log Processing Architecture

2014 © Trivadis

BASEL BERN BRUGG LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

Fragen und Antworten...

2013 © Trivadis

August 2014 Einheitlicher Umgang mit Ereignisströmen - Unified Log Processing Architecture

INFOBOX – Lesen und Löschen •  Die Schlussfolie steht in zwei Varianten

zur Verfügung, einmal für die Kontaktdaten eines Referenten, einmal in der Variante für zwei oder mehr Referenten

•  Name, Titel und Location jeweils untereinander in eine Zeile (Shift+Return)

•  Die Idee ist das diese Folie als letzte Folie (auch für Fragen und Antworten) am Ende der Präsentation lange stehen bleibt, somit haben die Zuhörer die Möglichkeit die Kontaktdaten aufzuschreiben J