Page 1:

The Spark Jumps Over:

Apache Spark in a Raspberry Pi Cluster

Page 2:

Lecturers

Burkhard Hoppenstedt, atr Software GmbH & Universität [email protected]

Nicolas Kuhaupt, Universität [email protected]

Page 3:

Outline

• Spark
  • Architecture
  • History
  • Code snippets
• Cluster
  • Setup
  • Configuration
• Results
  • Algorithm scaling
  • Cluster performance

Page 4:

Spark History

2002  MapReduce research (Google)
2004  MapReduce paper
2006  Hadoop sorts 1.8 TB in 47.9 h
2008  Conference: Hadoop Summit
2010  Spark paper
2012  Apache Hadoop 1.0
2014  Apache Spark becomes a top-level project
2014  Spark sorts 100 TB in 23 min, Hadoop in 72 min

Page 5:

Powered By

eBay Inc.

Spark

Amazon

IBM

Yahoo!

Page 6:

File formats

Format            Structured
Text files        No
JSON              Semi
CSV               Yes
SequenceFile      Yes
Protocol buffers  Yes
Object files      Yes

Page 7:

File formats: SequenceFile

SequenceFile header

• 3 bytes (SEQ) + 1 byte version
• Key class name
• Value class name
• isCompressed
• isBlockCompressed
• Class name
• Metadata
• Sync marker
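The first entry of the header layout can be checked with a few lines of Python. This is only a minimal sketch: the function name `read_magic_and_version` is hypothetical, and the remaining header fields (class names, compression flags, metadata, sync marker) use Hadoop's own encodings and are deliberately not parsed here.

```python
def read_magic_and_version(data: bytes) -> int:
    """Check the 3-byte magic 'SEQ' and return the 1-byte version that follows."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes for magic and version")
    if data[:3] != b"SEQ":
        raise ValueError("not a SequenceFile: magic bytes are missing")
    # The fourth byte is the format version as an unsigned integer.
    return data[3]


if __name__ == "__main__":
    # Hypothetical in-memory example; a real file would be opened in binary mode.
    print(read_magic_and_version(b"SEQ\x06" + b"rest of header"))
```

For a file on disk, the same function could be fed the first four bytes read in binary mode.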

Page 8:

File formats: Protocol buffers

// person.proto: message definition (proto2 syntax)
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}

// Java: build a Person and write it to the file named by args[0]
Person john = Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("[email protected]")
    .build();
FileOutputStream output = new FileOutputStream(args[0]);
john.writeTo(output);
output.close();

// C++: read the Person back from the file named by argv[1]
Person john;
std::fstream input(argv[1], std::ios::in | std::ios::binary);
john.ParseFromIstream(&input);
int id = john.id();
std::string name = john.name();
std::string email = john.email();

Page 9:

File formats: Object files

Java serialization

• Not standardized with Hadoop output
• Slow
• Quick handling of arbitrary objects

Page 10:

Word count

# Python example (sc is an existing SparkContext)
rdd = sc.textFile("…path…")
words = rdd.flatMap(lambda x: x.split(" "))
result = words.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y)
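To see what the three transformations compute without a cluster, the pipeline can be emulated on a local list of strings. This is a sketch in plain Python, not Spark code: the function name `word_count` and the sample lines are assumptions for illustration.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    """Emulate flatMap -> map -> reduceByKey on a local list of strings."""
    # flatMap: split every line and flatten the pieces into one stream of words
    words = chain.from_iterable(line.split(" ") for line in lines)
    # map: pair every word with the count 1
    pairs = ((word, 1) for word in words)
    # reduceByKey: add up the counts that share the same word
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


if __name__ == "__main__":
    print(word_count(["to be or", "not to be"]))
```

In Spark the same steps run per partition, and only the reduceByKey step moves data between nodes.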

Page 11:

Amazon JSON

{'asin': '0078764343', 'description': 'Brand new sealed!', 'price': 37.98, 'imUrl': 'http://ecx.images-amazon.com/images/I/513h6dPbwLL._SY300_.jpg', 'related': {'also_bought': ['B000TI836G', 'B003Q53VZC', 'B00EFFW0HC', 'B003VWGBC0', 'B003O6G5TW', 'B0037LTTRO', 'B002I098JE', 'B008OQTS0U', 'B005EVEODY', 'B008B3AVNE', 'B000PE0HBS', 'B00354NAYG', 'B0050SYPV2', 'B00503E8S2', 'B0050SY77E', 'B0022TNO7S', 'B0056WJA30', 'B0023CBY4E', 'B002SRSQ72', 'B005EZ5GQY', 'B004XACA60', 'B00273Z9WM', 'B004HX1QFY', 'B002I0K50U'], 'bought_together': ['B002I098JE'], 'buy_after_viewing': ['B0050SY5BM', 'B000TI836G', 'B0037LTTRO', 'B002I098JE']}, 'salesRank': {'Video Games': 28655}, 'categories': [['Video Games', 'Xbox 360', 'Games']]}
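The record above uses single quotes, so a strict JSON parser rejects it. One way to read such lines is sketched below, assuming each line is a valid Python literal; the helper names `parse_record` and `also_bought` are hypothetical, and `RAW` is an abbreviated copy of the record.

```python
import ast
import json

# Abbreviated copy of the record above (most fields and IDs omitted).
RAW = (
    "{'asin': '0078764343', 'price': 37.98, "
    "'related': {'also_bought': ['B000TI836G', 'B003Q53VZC'], "
    "'bought_together': ['B002I098JE']}, "
    "'salesRank': {'Video Games': 28655}}"
)


def parse_record(line):
    """Try strict JSON first, then fall back to a Python literal."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # literal_eval accepts only literals, so it does not execute code
        return ast.literal_eval(line)


def also_bought(record):
    """Return the 'also_bought' product IDs, or an empty list if absent."""
    return record.get("related", {}).get("also_bought", [])


if __name__ == "__main__":
    print(also_bought(parse_record(RAW)))
```

The `also_bought` lists are the kind of item sets that a frequent-pattern algorithm takes as transactions.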

Page 12:

FP Growth

Customer           Also bought
John Doe           Banana, Unicorn, Sword
Mickey Mouse       Unicorn, Steam Boat
Donald Trump       Unicorn, Sword
John Cena          Banana, Unicorn, Steam Boat
Stephen Hawking    Banana, Sword
Nyan Cat           Unicorn, Sword
Cpt. Jack Sparrow  Banana, Sword
Steve Jobs         Banana, Unicorn, Sword, Razor
Mr. Nobody         Banana, Unicorn, Sword

Banana: 6 | Unicorn: 7 | Sword: 7 | Steam Boat: 2 | Razor: 1

Page 13:

FP Growth - items re-sorted by frequency

Customer           Also bought
John Doe           Unicorn, Sword, Banana
Mickey Mouse       Unicorn, Steam Boat
Donald Trump       Unicorn, Sword
John Cena          Unicorn, Banana, Steam Boat
Stephen Hawking    Sword, Banana
Nyan Cat           Unicorn, Sword
Cpt. Jack Sparrow  Sword, Banana
Steve Jobs         Unicorn, Sword, Banana, Razor
Mr. Nobody         Unicorn, Sword, Banana

Banana: 6 | Unicorn: 7 | Sword: 7 | Steam Boat: 2 | Razor: 1
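The counting and re-sorting step can be sketched in Python as follows. The names `TRANSACTIONS`, `item_counts`, and `resort` are hypothetical; ties (Unicorn and Sword both occur 7 times) are broken by first appearance, which is an assumption that happens to reproduce the order shown on the slide.

```python
from collections import Counter

# Purchases in their original order, one list per customer.
TRANSACTIONS = [
    ["Banana", "Unicorn", "Sword"],           # John Doe
    ["Unicorn", "Steam Boat"],                # Mickey Mouse
    ["Unicorn", "Sword"],                     # Donald Trump
    ["Banana", "Unicorn", "Steam Boat"],      # John Cena
    ["Banana", "Sword"],                      # Stephen Hawking
    ["Unicorn", "Sword"],                     # Nyan Cat
    ["Banana", "Sword"],                      # Cpt. Jack Sparrow
    ["Banana", "Unicorn", "Sword", "Razor"],  # Steve Jobs
    ["Banana", "Unicorn", "Sword"],           # Mr. Nobody
]


def item_counts(transactions):
    """Count in how many transactions each item occurs."""
    return Counter(item for t in transactions for item in t)


def resort(transactions):
    """Order the items of every transaction by descending global frequency."""
    # most_common() keeps first-seen order for equal counts (Python 3.7+)
    rank = {item: i for i, (item, _) in enumerate(item_counts(transactions).most_common())}
    return [sorted(t, key=rank.__getitem__) for t in transactions]


if __name__ == "__main__":
    print(dict(item_counts(TRANSACTIONS)))
    for row in resort(TRANSACTIONS):
        print(", ".join(row))
```

A shared item order is what lets transactions with common frequent items share a prefix in the FP-tree.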

Page 14:

FP-tree before the first insertion (empty root):

{ }

Page 15:

FP-tree after inserting John Doe (Unicorn, Sword, Banana):

{ }
  Unicorn:1
    Sword:1
      Banana:1

Page 16:

FP-tree after inserting Mickey Mouse (Unicorn, Steam Boat):

{ }
  Unicorn:2
    Sword:1
      Banana:1
    SteamBoat:1

Page 17:

FP-tree after inserting Donald Trump (Unicorn, Sword):

{ }
  Unicorn:3
    Sword:2
      Banana:1
    SteamBoat:1

Page 18:

FP-tree after inserting John Cena (Unicorn, Banana, Steam Boat):

{ }
  Unicorn:4
    Sword:2
      Banana:1
    SteamBoat:1
    Banana:1
      SteamBoat:1

Page 19:

FP-tree after inserting Stephen Hawking (Sword, Banana):

{ }
  Unicorn:4
    Sword:2
      Banana:1
    SteamBoat:1
    Banana:1
      SteamBoat:1
  Sword:1
    Banana:1

Page 20:

FP-tree after inserting Nyan Cat (Unicorn, Sword):

{ }
  Unicorn:5
    Sword:3
      Banana:1
    SteamBoat:1
    Banana:1
      SteamBoat:1
  Sword:1
    Banana:1

Page 21:

FP-tree after inserting Cpt. Jack Sparrow (Sword, Banana):

{ }
  Unicorn:5
    Sword:3
      Banana:1
    SteamBoat:1
    Banana:1
      SteamBoat:1
  Sword:2
    Banana:2

Page 22:

FP-tree after inserting Steve Jobs (Unicorn, Sword, Banana, Razor):

{ }
  Unicorn:6
    Sword:4
      Banana:2
        Razor:1
    SteamBoat:1
    Banana:1
      SteamBoat:1
  Sword:2
    Banana:2

Page 23:

FP-tree after inserting Mr. Nobody (Unicorn, Sword, Banana):

{ }
  Unicorn:7
    Sword:5
      Banana:3
        Razor:1
    SteamBoat:1
    Banana:1
      SteamBoat:1
  Sword:2
    Banana:2
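The insertion procedure walked through on the preceding slides can be sketched as a small prefix tree in Python. The names `Node`, `build_fp_tree`, and `render` are hypothetical, and the header table with node links that a full FP-Growth implementation maintains is left out to keep the sketch short.

```python
# Transactions with items already sorted by descending frequency.
ORDERED = [
    ["Unicorn", "Sword", "Banana"],           # John Doe
    ["Unicorn", "Steam Boat"],                # Mickey Mouse
    ["Unicorn", "Sword"],                     # Donald Trump
    ["Unicorn", "Banana", "Steam Boat"],      # John Cena
    ["Sword", "Banana"],                      # Stephen Hawking
    ["Unicorn", "Sword"],                     # Nyan Cat
    ["Sword", "Banana"],                      # Cpt. Jack Sparrow
    ["Unicorn", "Sword", "Banana", "Razor"],  # Steve Jobs
    ["Unicorn", "Sword", "Banana"],           # Mr. Nobody
]


class Node:
    """One FP-tree node: an item, its count, and its children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions):
    """Insert each transaction as a path from the root, sharing common prefixes."""
    root = Node()
    for transaction in transactions:
        node = root
        for item in transaction:
            if item not in node.children:
                node.children[item] = Node(item)
            node = node.children[item]
            node.count += 1  # one more transaction passes through this node
    return root


def render(node, depth=0):
    """Return the tree as indented text lines, children in insertion order."""
    label = "{ }" if node.item is None else f"{node.item}:{node.count}"
    lines = ["  " * depth + label]
    for child in node.children.values():
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(build_fp_tree(ORDERED))))
```

Passing only the first few transactions, for example `build_fp_tree(ORDERED[:4])`, gives the intermediate trees.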

Page 24:

Completed FP-tree:

{ }
  Unicorn:7
    Sword:5
      Banana:3
        Razor:1
    SteamBoat:1
    Banana:1
      SteamBoat:1
  Sword:2
    Banana:2

Page 25:

Frequent Patterns

Banana:

Page 26:

Frequent Patterns

Banana: {Unicorn, Sword, Banana: 3}

Page 27:

Frequent Patterns

Banana: {Unicorn, Sword, Banana: 3}, {Unicorn, Banana: 1}

Page 28:

Frequent Patterns

Banana: {Unicorn, Sword, Banana: 3}, {Unicorn, Banana: 1}, {Sword, Banana: 2}

Page 29:

Frequent Patterns

Banana: {Unicorn, Sword, Banana: 3}, {Unicorn, Banana: 1}, {Sword, Banana: 2}

Page 30:

Frequent Patterns

Banana: {Unicorn, Sword, Banana: 3}, {Unicorn, Banana: 1}, {Sword, Banana: 2}, {Unicorn, Banana: 4}

Page 31:

Frequent Patterns

Banana: {Unicorn, Sword, Banana: 3}, {Unicorn, Banana: 1}, {Sword, Banana: 2}, {Unicorn, Banana: 4}

Sword: {Unicorn, Sword: 5}

Steam Boat: {Unicorn, Steam Boat: 1}, {Unicorn, Banana, Steam Boat: 1}, {Unicorn, Steam Boat: 2}

Razor: {Unicorn, Sword, Banana, Razor: 1}

Page 32:

Frequent Patterns (all candidates)

Banana: {Unicorn, Sword, Banana: 3}, {Unicorn, Banana: 1}, {Sword, Banana: 2}, {Unicorn, Banana: 4}

Sword: {Unicorn, Sword: 5}

Steam Boat: {Unicorn, Steam Boat: 1}, {Unicorn, Banana, Steam Boat: 1}, {Unicorn, Steam Boat: 2}

Razor: {Unicorn, Sword, Banana, Razor: 1}

min support: 2

Frequent Patterns (after applying min support)

Banana: {Unicorn, Sword, Banana: 3}, {Sword, Banana: 2}, {Unicorn, Banana: 4}

Sword: {Unicorn, Sword: 5}

Steam Boat: {Unicorn, Steam Boat: 2}

Razor: (none)
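The per-item pattern lists can be cross-checked with a short Python sketch. `conditional_pattern_base` counts the prefixes that precede an item, which correspond to its prefix paths in the FP-tree; `frequent_itemsets` is a brute-force support count and not FP-Growth itself. Both names are hypothetical. Note that the brute-force count reports the support over all branches, so {Sword, Banana} comes out as 5, while the value 2 in the list above is the count of the single path without Unicorn.

```python
from collections import Counter
from itertools import combinations

# Transactions with items already sorted by descending frequency.
ORDERED = [
    ["Unicorn", "Sword", "Banana"],           # John Doe
    ["Unicorn", "Steam Boat"],                # Mickey Mouse
    ["Unicorn", "Sword"],                     # Donald Trump
    ["Unicorn", "Banana", "Steam Boat"],      # John Cena
    ["Sword", "Banana"],                      # Stephen Hawking
    ["Unicorn", "Sword"],                     # Nyan Cat
    ["Sword", "Banana"],                      # Cpt. Jack Sparrow
    ["Unicorn", "Sword", "Banana", "Razor"],  # Steve Jobs
    ["Unicorn", "Sword", "Banana"],           # Mr. Nobody
]


def conditional_pattern_base(item, transactions=ORDERED):
    """Count the item prefixes that lead to `item` (its prefix paths)."""
    base = Counter()
    for t in transactions:
        if item in t:
            base[tuple(t[: t.index(item)])] += 1
    return base


def frequent_itemsets(transactions=ORDERED, min_support=2):
    """Brute force: count every itemset of size >= 2 and keep the frequent ones."""
    supports = Counter()
    for t in transactions:
        for size in range(2, len(t) + 1):
            # items share one global order, so equal sets yield equal tuples
            for combo in combinations(t, size):
                supports[combo] += 1
    return {s: n for s, n in supports.items() if n >= min_support}


if __name__ == "__main__":
    print(dict(conditional_pattern_base("Banana")))
    for itemset, support in frequent_itemsets().items():
        print(itemset, support)
```

Brute force is only feasible for toy data like this; FP-Growth avoids enumerating all combinations by mining the tree recursively.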

Page 33:

[Figure: Scaling of FP Growth. Horizontal axis: 10% to 90%; vertical axis: 0 to 200000.]

Page 34:

[Figure: Scaling of KMeans. Horizontal axis: 10% to 100%; vertical axis: 0 to 90000.]

Page 35:

Customer           Money spent (€)  Time (h)
John Doe           2                4
Mickey Mouse       4                5
Donald Trump       10               10
John Cena          1                10
Stephen Hawking    0                2
Nyan Cat           2                3
Cpt. Jack Sparrow  2                6
Steve Jobs         5                8
Mr. Nobody         7                2

Page 36:
Page 37:
Page 38:
Page 39:

"John Cena and Donald Trump shop similarly to Steve Jobs"
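A grouping like the one quoted above can be reproduced with a small k-means sketch in plain Python (Lloyd's algorithm). The slides do not state the number of clusters or the starting centroids, so k = 2 and the two seed points (John Doe and Steve Jobs) are assumptions chosen for illustration; other seeds can lead to a different grouping. The names `CUSTOMERS` and `kmeans` are hypothetical.

```python
import math

# (money spent in €, time in h) per customer, taken from the table.
CUSTOMERS = {
    "John Doe": (2, 4),
    "Mickey Mouse": (4, 5),
    "Donald Trump": (10, 10),
    "John Cena": (1, 10),
    "Stephen Hawking": (0, 2),
    "Nyan Cat": (2, 3),
    "Cpt. Jack Sparrow": (2, 6),
    "Steve Jobs": (5, 8),
    "Mr. Nobody": (7, 2),
}


def kmeans(points, initial_centroids, max_iterations=100):
    """Alternate nearest-centroid assignment and centroid update until stable."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed its cluster
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ran empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids


if __name__ == "__main__":
    names = list(CUSTOMERS)
    seeds = [CUSTOMERS["John Doe"], CUSTOMERS["Steve Jobs"]]  # assumed seeds
    labels, _ = kmeans(list(CUSTOMERS.values()), seeds)
    for name, label in zip(names, labels):
        print(label, name)
```

With these seeds, Donald Trump, John Cena, and Steve Jobs end up in one cluster and the remaining customers in the other.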

Page 40:

Conclusion

Page 41:

Sources

• SequenceFile header: http://hadooptutorial.info/hadoop-sequence-files-example/ [30.01.2017]

• Protocol Buffers: https://developers.google.com/protocol-buffers/ [30.01.2017]

• Raspberry Pi 3: https://www.raspberrypi.org/products/raspberry-pi-3-model-b/ [31.01.2017]