Ariel Rabkin Princeton University
Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area
Work done with Matvey Arye, Siddhartha Sen, Vivek S. Pai, and Michael J. Freedman
Today’s Analytics Architectures
• Backhaul is inefficient and inflexible
Systems shown: MillWheel (Google), Storm
Tomorrow’s Architecture: JetStream
• Backhaul is inefficient and inflexible
• Goal: optimize use of WAN links by exposing them to the streaming system.
Backhaul is Intrinsically Inefficient
[Figure: bandwidth over time (two days), comparing available bandwidth with the bandwidth needed for backhaul]
Buyer’s remorse: wasted bandwidth
Analyst’s remorse: system overload or missing data
Stream Processing Basics
Some operators in JetStream:
• Filtering (count > 100)
• Sampling (drop 90% of data)
• Image compression
• Quantiles (95th percentile)
• Query stored data
[Figure: input data passing through stream operators at Site A, Site B, and Site C]
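As a rough illustration of two of the operators named on this slide, the sketch below chains a filtering step and a sampling step over a list of records. The record shape `(url, count)` and the function names `filter_op` and `sample_op` are assumptions made for this example; they are not JetStream's API.

```python
import random
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, int]  # hypothetical record shape: (url, count)

def filter_op(records: Iterable[Record], min_count: int) -> Iterator[Record]:
    """Filtering operator: pass only records whose count exceeds min_count."""
    for url, count in records:
        if count > min_count:
            yield (url, count)

def sample_op(records: Iterable[Record], keep_fraction: float,
              rng: random.Random) -> Iterator[Record]:
    """Sampling operator: keep each record independently with probability keep_fraction."""
    for record in records:
        if rng.random() < keep_fraction:
            yield record

records = [("/a", 250), ("/b", 40), ("/c", 101), ("/d", 100)]
filtered = list(filter_op(records, min_count=100))       # count > 100
sampled = list(sample_op(records, keep_fraction=0.1,     # drop about 90% of data
                         rng=random.Random(0)))
```

Filtering is deterministic, while sampling only bounds the expected output size, which is one reason the two behave differently as degradation steps.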
The JetStream System
What: Streaming with aggregation and degradation as first-class primitives
Where: Storage and processing at edge
Why: Maximize goodput using aggregation and degradation
How: Data cubes and feedback control
An Example Query
How popular is every URL?
[Figure: request streams arriving at CDN sites]
Mechanism 1: Storage with Aggregation
Every minute, compute request counts by URL.
[Figure: requests at each CDN site feed into local aggregation and storage]
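A minimal sketch of the local aggregation step, assuming each request is a `(unix_timestamp, url)` pair. The function name `counts_by_minute` and the sample requests are hypothetical and only illustrate the "every minute, count by URL" idea.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def counts_by_minute(requests: Iterable[Tuple[float, str]]) -> Dict[int, Counter]:
    """Group (unix_timestamp, url) requests into per-minute URL counts."""
    buckets: Dict[int, Counter] = {}
    for timestamp, url in requests:
        minute = int(timestamp // 60)  # index of the one-minute window
        buckets.setdefault(minute, Counter())[url] += 1
    return buckets

# Hypothetical requests seen at one CDN site.
requests = [(0.5, "/a"), (12.0, "/a"), (59.9, "/b"), (60.0, "/a"), (125.0, "/b")]
local = counts_by_minute(requests)
```

Only the per-minute counts, not the raw requests, would then need to leave the site.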
Mechanism 2: Adaptive Degradation
Every minute, compute request counts by URL.
[Figure: requests at each CDN site feed into local aggregation and storage, followed by adjustable filtering]
Requirements for Storage Abstraction
• Update-able (locally and incrementally): Stored Data += Data
• Merge-able (without accuracy penalty): Data + Data = Merged Representation
• Data size is reducible (with predictable accuracy cost)
The Data Cube Model
Cube: A multidimensional array, indexed by a set of dimensions, whose cells hold aggregates.

Counts by URL       12:00  12:01  12:02
www.mysite.com/a    3      5      0
www.mysite.com/b    0      2      0
www.yoursite.com    5      4      …
www.her-site.com    8      12     …

Cubes have an aggregation function, Agg, that combines two cells into one.

Aggregation used for:
• Updates
• Roll-ups
• Merging cubes
• Summarizing cubes
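The sketch below shows how a single aggregation function can serve both updates and merging. The `Cube` class, its method names, and the counts fed into it are assumptions made for illustration; the slide does not specify JetStream's actual interface.

```python
import operator
from typing import Callable, Dict, Tuple

Cell = Tuple[str, str]  # (url, minute), the two dimensions used on the slide

class Cube:
    """Toy cube: a sparse map from dimension values to one aggregate per cell."""

    def __init__(self, agg: Callable[[int, int], int]):
        self.agg = agg
        self.cells: Dict[Cell, int] = {}

    def update(self, key: Cell, value: int) -> None:
        """Fold one new value into a cell using the aggregation function."""
        if key in self.cells:
            self.cells[key] = self.agg(self.cells[key], value)
        else:
            self.cells[key] = value

    def merge(self, other: "Cube") -> None:
        """Merging cubes reuses the same aggregation function, cell by cell."""
        for key, value in other.cells.items():
            self.update(key, value)

# Hypothetical counts observed at two sites.
site_a = Cube(agg=operator.add)
site_a.update(("www.mysite.com/a", "12:00"), 1)
site_a.update(("www.mysite.com/a", "12:00"), 2)
site_b = Cube(agg=operator.add)
site_b.update(("www.mysite.com/a", "12:00"), 4)
site_b.update(("www.mysite.com/b", "12:01"), 2)
site_a.merge(site_b)
```

Because counts are combined with addition, merging the two sites gives the same result as counting all requests in one place.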
Cubes can be “Rolled Up”
Rolling up the time dimension of the cube shown on the previous slide:

Counts by URL       *
www.mysite.com/a    8
www.mysite.com/b    2
www.yoursite.com    9
www.her-site.com    20

Rolling up the URL dimension:

Counts by URL       12:00  12:01  12:02
*                   16     23     …
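A small sketch of the roll-up, using only the cells whose values are legible on the slide (cells shown as "…" are left out rather than guessed). The function name `roll_up` and the dictionary layout are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Known cells from the slide, keyed by (url, minute); elided cells are omitted.
cube: Dict[Tuple[str, str], int] = {
    ("www.mysite.com/a", "12:00"): 3,
    ("www.mysite.com/a", "12:01"): 5,
    ("www.mysite.com/a", "12:02"): 0,
    ("www.mysite.com/b", "12:00"): 0,
    ("www.mysite.com/b", "12:01"): 2,
    ("www.mysite.com/b", "12:02"): 0,
    ("www.yoursite.com", "12:00"): 5,
    ("www.yoursite.com", "12:01"): 4,
    ("www.her-site.com", "12:00"): 8,
    ("www.her-site.com", "12:01"): 12,
}

def roll_up(cells: Dict[Tuple[str, str], int], keep_axis: int) -> Dict[str, int]:
    """Sum out every dimension except keep_axis (0 = URL, 1 = minute)."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in cells.items():
        totals[key[keep_axis]] += value
    return dict(totals)

by_url = roll_up(cube, keep_axis=0)     # time rolled up
by_minute = roll_up(cube, keep_axis=1)  # URL rolled up
```

The per-URL totals and the 12:00 and 12:01 column totals agree with the rolled-up tables on the slide; the 12:02 total here covers only the two cells that are known.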
Cubes Unify Storage and Aggregation
[Figure labels: stored data, updates, update sent downstream, standing query, one-off query, feedback control]
Degradation: The Big Picture
[Figure: local data, dataflow operators, summarized or approximated data, network, dataflow operators]
• Level of degradation auto-tuned to match bandwidth.
• Challenge: Supporting mergeability and flexible policies
Mergeability Imposes Constraints
• Insight: Degradation may be discontinuous

Window boundaries by aggregation period:
Every 5               01-05  06-10  11-15  16-20  21-25  26-30
Every 6               01-06  07-12  13-18  19-24  25-30
Every 10              01-10  11-20  21-30
Every 30 ??           01-30
Every 5 (offset by 1) 02-06  07-11  12-16  17-21  22-26  27-31  ??????
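One way to read the window layouts on this slide: finer windows can be merged into coarser ones only when no fine window straddles a coarse boundary. The check below is a sketch of that reading, not JetStream's logic; the helper names `windows` and `nests_in` and the 30-unit horizon are assumptions.

```python
from typing import List, Tuple

def windows(period: int, horizon: int, offset: int = 0) -> List[Tuple[int, int]]:
    """Inclusive 1-based (start, end) windows of the given period."""
    out = []
    start = 1 + offset
    while start <= horizon:
        out.append((start, start + period - 1))
        start += period
    return out

def nests_in(fine: List[Tuple[int, int]], coarse_period: int) -> bool:
    """True if every fine window lies inside a single coarse window."""
    return all((s - 1) // coarse_period == (e - 1) // coarse_period
               for s, e in fine)

every_5 = windows(5, horizon=30)
every_6 = windows(6, horizon=30)
shifted_5 = windows(5, horizon=30, offset=1)  # 02-06, 07-11, ...

ok_5_in_10 = nests_in(every_5, 10)      # 01-05 and 06-10 both fit in 01-10
ok_6_in_10 = nests_in(every_6, 10)      # 07-12 straddles 10/11
ok_6_in_30 = nests_in(every_6, 30)
ok_shift_in_10 = nests_in(shifted_5, 10)  # 07-11 straddles 10/11
```

Under this reading, only some aggregation periods are usable at a given site, so the set of available degradation levels has gaps.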
There Are Many Ways to Degrade Data
• Can coarsen a dimension
• Can drop low-rank values
[Figure: savings from aggregation (1 to 256) versus aggregation time period (5 s, minute, 5 min, hour, day), plotted for domains]
Coarsening Does Not Always Help
[Figure: savings from aggregation (1 to 256) versus aggregation time period (5 s, minute, 5 min, hour, day), plotted for domains and URLs]
Degradations Have Trade-offs
Name                       Fixed BW savings           Fixed accuracy cost   Parameter
Dim. coarsening            Usually no                 Yes                   Dimension scale
Drop values (locally)      Yes                        No                    Cut-off
Drop values (globally)     No, multi-round protocol   Yes                   Cut-off
Audiovisual downsampling   Yes                        Yes                   Sample rate
Histogram coarsening       Yes                        Yes                   Number of buckets
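To make the "drop values (locally)" row concrete, the sketch below keeps only the highest-count entries at one site. The function name `drop_low_rank` is hypothetical, and the counts are the per-URL totals from the roll-up slide, reused here purely as sample input.

```python
from collections import Counter
from typing import Dict

def drop_low_rank(counts: Dict[str, int], cutoff: int) -> Dict[str, int]:
    """Keep only the `cutoff` highest-count entries; the rest are dropped."""
    return dict(Counter(counts).most_common(cutoff))

counts = {
    "www.mysite.com/a": 8,
    "www.mysite.com/b": 2,
    "www.yoursite.com": 9,
    "www.her-site.com": 20,
}
kept = drop_low_rank(counts, cutoff=2)
# How much of the total count was discarded depends on the data, not the cut-off.
dropped_total = sum(counts.values()) - sum(kept.values())
```

The output size is fixed by the cut-off, which corresponds to the fixed bandwidth savings in the table, while the amount of discarded count depends on how the data is distributed, which corresponds to the accuracy cost not being fixed.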
A Simple Idea that Does Not Work
• We have sensors that report congestion…
• Have operators read the sensor and adjust themselves?
[Figure: incoming data passes through a coarsening operator, and the sampled data goes to the network; sensor reading: "Sending 4x too much"]
Example policy: Increase aggregation period up to 10 sec. If insufficient, use sampling.
Challenge: Composite Policies
• Chaos if two operators are simultaneously responding to the same sensor
[Figure: incoming data passes through a coarsening operator and a sampling operator to the network; sensor reading: "Sending 4x too much"]
Interfacing with Operators
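Operator to controller: "Shrinking data by 50%. Possible levels: [0%, 50%, 75%, 95%, …]"
Controller to operator: "Go to level 75%"
[Figure: incoming data passes through a coarsening operator and a sampling operator to the network; a controller receives the sensor reading "Sending 4x too much" and directs the operators]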
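The exchange on this slide can be sketched as a controller that picks one level from those an operator advertises. The function `choose_level`, its parameters, and the rates used are assumptions for illustration; only the four levels legible on the slide are used, and the real controller may weigh other inputs.

```python
from typing import Sequence

def choose_level(raw_rate: int, available_bw: int, levels_pct: Sequence[int]) -> int:
    """Return the smallest degradation level (percent of data removed) whose
    output fits in available_bw, or the largest level if none fits."""
    for level in sorted(levels_pct):
        # Integer form of: raw_rate * (100 - level) / 100 <= available_bw
        if raw_rate * (100 - level) <= available_bw * 100:
            return level
    return max(levels_pct)

levels = [0, 50, 75, 95]  # levels advertised by the operator
# Hypothetical rates: undegraded output is 4x what the link can carry.
level = choose_level(raw_rate=400, available_bw=100, levels_pct=levels)
```

Keeping the decision in one controller, rather than in each operator, avoids two operators reacting to the same congestion signal at once.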
Experimental Setup
80 nodes on VICCI testbed at three sites (Seattle, Atlanta, and Germany)
Policy: Drop data if insufficient BW
[Figure label: Princeton]
Without Degradation
[Figure: bandwidth (Mbits/sec, 0 to 800) versus experiment time (0 to 140 minutes), with the bandwidth drop marked]
[Figure: median, 95th percentile, and maximum latency (sec, 0 to 1000) versus elapsed time (0 to 140 minutes)]
Degradation Keeps Latency Bounded
[Figure: bandwidth (Mbits/sec, 0 to 400) versus experiment time (0 to 90 minutes), under bandwidth shaping]
[Figure: median and 95th percentile latency (sec, 0 to 20) versus elapsed time (0 to 90 minutes)]
Showing maximum latencies
[Figure: median, 95th percentile, and maximum latency (sec, 0 to 40) versus elapsed time (0 to 90 minutes)]
Programming Ease
Scenario                     Lines of code
Slow requests                5
Requests by URL              5
Bandwidth by node            15
Bad referrers                16
Latency and size quantiles   25
Success by domain            30
Top 10 domains by period     40
Big Requests                 97
Conclusions and Future Work
• Useful to embed aggregation and degradation abstractions in streaming systems.
• Aggregation can be unified with storage.
• System must accommodate degradation semantics.
• Open questions:
  • How to guide users to the right degradation policy?
  • How to embed abstractions in a higher-level language?