Sparse Matrices for High-Performance Graph Analyticsgilbert/talks/GilbertCLSAC17oct2013.pdf · 3...
Transcript of Sparse Matrices for High-Performance Graph Analyticsgilbert/talks/GilbertCLSAC17oct2013.pdf · 3...
1
Sparse Matrices for High-Performance Graph Analytics
John R. Gilbert University of California, Santa Barbara Chesapeake Large-Scale Analytics Conference October 17, 2013
Support: Intel, Microsoft, DOE Office of Science, NSF
2
Thanks …
Aydin Buluc (LBL), Kevin Deweese (UCSB), Erika Duriakova (Dublin), Armando Fox (UCB),
Shoaib Kamil (MIT), Jeremy Kepner (MIT), Adam Lugowski (UCSB), Tim Mattson (Intel),
Lenny Oliker (LBL), Carey Priebe (JHU), Steve Reinhardt (YarcData), Lijie Ren (Google), Eric Robinson (Lincoln), Viral Shah (UIDAI),
Veronika Strnadova (UCSB), Yun Teng (UCSB), Joshua Vogelstein (Duke), Drew Waranis (UCSB),
Sam Williams (LBL)
3
Outline
• Motivation
• Sparse matrices for graph algorithms
• CombBLAS: sparse arrays and graphs on parallel machines
• KDT: attributed semantic graphs in a high-level language
• Standards for graph algorithm primitives
4
A few biological graph analysis problems
• Connective abnormalities in schizophrenia [van den Heuvel et al.] – Problem: understand disease from anatomical brain imaging – Tools: betweenness centrality, shortest path length – Results: global statistics on connection graph correlate w/ diagnosis
• Genomics for biofuels [Strnadova et al.] – Problem: scale to millions of markers times thousands of individuals – Tools: min spanning tree, customized clustering – Results: using much more data leads to much better genomic maps
• Alignment and matching of brain scans [Vogelstein et al.] – Problem: match corresponding functional regions across individuals – Tools: graph partitioning, clustering, and more. . . – Results: in progress
5
Computers
Continuous physical modeling
Linear algebra
Discrete structure analysis
Graph theory
Computers
The middleware of scientific computing
6
Top 500 List (June 2013)
= x P A L U
Top500 Benchmark: Solve a large system of linear equations
by Gaussian elimination
7
Graph 500 List (June 2013)
Graph500 Benchmark:
Breadth-first search in a large
power-law graph
1 2
3
4 7
6
5
8
Floating-Point vs. Graphs, June 2013
= x P A L U 1 2
3
4 7
6
5
33.8 Peta / 15.3 Tera is about 2200.
33.8 Petaflops 15.3 Terateps
9
Floating-Point vs. Graphs, June 2013
= x P A L U 1 2
3
4 7
6
5
Jun 2013: 33.8 Peta / 15.3 Tera ~ 2,200 Nov 2010: 2.5 Peta / 6.6 Giga ~ 380,000
15.3 Terateps 33.8 Petaflops
10
• By analogy to numerical scientific computing. . .
• What should the combinatorial BLAS look like?
The challenge of the software stack
C = A*B
y = A*x
µ = xT y
Basic Linear Algebra Subroutines (BLAS): Ops/Sec vs. Matrix Size
11
Outline
• Motivation
• Sparse matrices for graph algorithms
• CombBLAS: sparse arrays and graphs on parallel machines
• KDT: attributed semantic graphs in a high-level language
• Standards for graph algorithm primitives
12
Multiple-source breadth-first search
X
1 2
3
4 7
6
5
AT
13
Multiple-source breadth-first search
• Sparse array representation => space efficient • Sparse matrix-matrix multiplication => work efficient
• Three possible levels of parallelism: searches, vertices, edges
X AT ATX
à
1 2
3
4 7
6
5
Graph contrac+on via sparse triple product
5
6
3
1 2
4
A1
A3 A2
A1
A2 A3
Contract
1 5 2 3 4 6 1
5
2 3 4
6
1 1 0 00 00 0 1 10 00 0 0 01 1
1 1 01 0 10 1 01 11 1
0 0 1
x x =
1 5 2 3 4 6 1 2 3
Subgraph extrac+on via sparse triple product
5
6
3
1 2
4
Extract 3
12
1 5 2 3 4 6 1
5
2 3 4
6
1 1 1 00 00 0 1 11 00 0 0 01 1
1 1 01 0 11 1 01 11 1
0 0 1
x x =
1 5 2 3 4 6 1 2 3
16
Betweenness centrality [Robinson 2008]
• Slide on BC in CombBLAS, or at least array based.
17
Graph algorithms in the language of linear algebra
• Kepner et al. study [2006]: fundamental graph algorithms including min spanning tree, shortest paths, independent set, max flow, clustering, …
• SSCA#2 / centrality [2008]
• Basic breadth-first search / Graph500 [2010]
• Beamer et al. [2013] direction-optimizing breadth-first search, using CombBLAS
18
Identification of Primitives
Sparse matrix-matrix multiplication (SpGEMM)
Element-wise operations
×
Matrices over various semirings: (+ . x), (min . +), (or . and), …
Sparse matrix-dense vector multiplication Sparse matrix indexing
×
.*
Sparse array-based primitives
The case for sparse matrix graph primi+ves
Many irregular applications contain coarse-grained parallelism that can be exploited
by abstractions at the proper level.
Tradi-onal graph computa-ons Data driven,
unpredictable communica+on.
Irregular and unstructured, poor locality of reference
Fine grained data accesses, dominated by latency
The case for sparse matrix graph primi+ves
Many irregular applications contain coarse-grained parallelism that can be exploited
by abstractions at the proper level.
Tradi-onal graph computa-ons
Graphs in the language of linear algebra
Data driven, unpredictable communica+on.
Fixed communica+on paHerns
Irregular and unstructured, poor locality of reference
Opera+ons on matrix blocks exploit memory hierarchy
Fine grained data accesses, dominated by latency
Coarse grained parallelism, bandwidth limited
The case for sparse matrix graph primi+ves
21
Matrices over semirings
• E.g. matrix multiplication C = AB (or matrix/vector):
Ci,j = Ai,1×B1,j + Ai,2×B2,j + · · · + Ai,n×Bn,j
• Replace scalar operations × and + by ⊗ : associative, distributes over ⊕
⊕ : associative, commutative
• Then Ci,j = Ai,1⊗B1,j ⊕ Ai,2⊗B2,j ⊕ · · · ⊕ Ai,n⊗Bn,j
• Examples: ×.+ ; and.or ; +.min ; . . .
• Same data reference pattern and control flow
22
Examples of semirings in graph algorithms
(R, +, x) Real Field
Standard numerical linear algebra
({0,1}, |, &) Boolean Semiring
Graph traversal
(R U {∞}, min, +) Tropical Semiring
Shortest paths
(R U {∞}, min, x) Select subgraph, or contract nodes to form quotient graph
(edge/vertex attributes, vertex data aggregation, edge data processing)
Schema for user-specified computation at vertices and edges
23
Outline
• Motivation
• Sparse matrices for graph algorithms
• CombBLAS: sparse arrays and graphs on parallel machines
• KDT: attributed semantic graphs in a high-level language
• Standards for graph algorithm primitives
24
Combinatorial BLAS: Functions
or run-time type information), all elementwise operationsbetween two sparse matrices may have a single functionprototype in the future.
On the other hand, the data access patterns of matrix–matrix and matrix–vector multiplications are independentof the underlying semiring. As a result, the sparsematrix–matrix multiplication routine (SpGEMM) and thesparse matrix–vector multiplication routine (SpMV) eachhave a single function prototype that accepts a parameterrepresenting the semiring, which defines user-specifiedaddition and multiplication operations for SpGEMM andSpMV.
(2) If an operation can be efficiently implemented bycomposing a few simpler operations, then we do notprovide a special function for that operator.
For example, making a non-zero matrix A column sto-chastic can be efficiently implemented by first callingREDUCE on A to get a dense row vector v that contains thesums of columns, then obtaining the multiplicative inverseof each entry in v by calling the APPLY function with theunary function object that performs f !vi" # 1=vi for everyvi it is applied to, and finally calling SCALE(V) onA to effec-tively divide each non-zero entry in a column by its sum.Consequently, we do not provide a special function to makea matrix column stochastic.
On the other hand, a commonly occurring operation is tozero out some of the non-zeros of a sparse matrix. Thisoften comes up in graph traversals, where Xk represents thekth frontier, the set of vertices that are discovered during
the kth iteration. After the frontier expansion ATXk ,previously discovered vertices can be pruned by performingan elementwise multiplication with a matrixY that includesa zero for every vertex that has been discovered before, andnon-zeros elsewhere. However, this approach might not bework-efficient as Y will often be dense, especially in theearly stages of the graph traversal.
Consequently, we provide a generalized functionSPEWISEX that performs the elementwise multiplicationof sparse matrices op!A" and op!B". It also accepts twoauxiliary parameters, notA and notB, that are used to negatethe sparsity structure of A and B. If notA is true, thenop!A"!i; j" # 0 for every non-zero A!i; j" 6# 0, andop!A"!i; j" # 1 for every zero A!i; j" # 0. The role ofnotB is identical. Direct support for the logical NOT oper-ations is crucial to avoid the explicit construction of thedense not!B" object.
(3) To avoid expensive object creation and copying, manyfunctions also have in-place versions. For operationsthat can be implemented in place, we deny access toany other variants only if those increase the runningtime.
Table 2. Summary of the current API for the Combinatorial BLAS
Function Applies to Parameters Returns Matlab Phrasing
Sparse Matrix A, B: sparse matricesSPGEMM (as friend) trA: transpose A if true Sparse Matrix C # A $ B
trB: transpose B if true
SPMV Sparse Matrix A: sparse matrices(as friend) x: sparse or dense vector(s) Sparse or Dense y # A $ x
trA: transpose A if true Vector(s)
Sparse Matrices A, B: sparse matricesSPEWISEX (as friend) notA: negate A if true Sparse Matrix C # A $ B
notB: negate B if true
Any Matrix dim: dimension to reduceREDUCE (as method) binop: reduction operator Dense Vector sum(A)
Sparse Matrix p: row indices vectorSPREF (as method) q: column indices vector Sparse Matrix B # A(p, q)
Sparse Matrix p: row indices vectorSPASGN (as method) q: column indices vector none A(p, q) # B
B: matrix to assign
Any Matrix rhs: any object Check guidingSCALE (as method) (except a sparse matrix) none principles 3 and 4
Any Vector rhs: any vector none noneSCALE (as method)
Any Object unop: unary operatorAPPLY (as method) (applied to non-zeros) None none
Buluc and Gilbert 5
at UNIV CALIFORNIA BERKELEY LIB on June 6, 2011hpc.sagepub.comDownloaded from
Combinatorial BLAS: Distributed-‐memory reference implementa+on
CommGrid
DCSC CSC CSB Triples
SpMat SpDistMat DenseDistMat
DistMat
Enforces interface only
Combinatorial BLAS func7ons and operators
DenseDistVec SpDistVec
FullyDistVec ... HAS A
Polymorphism
Matrix/vector distribu+ons, interleaved on each other.
5
8
€
x1,1
€
x1,2
€
x1,3
€
x2,1
€
x2,2
€
x2,3
€
x3,1
€
x3,2
€
x3,3
€
A1,1
€
A1,2
€
A1,3
€
A2,1
€
A2,2
€
A2,3
€
A3,1
€
A3,2
€
A3,3
€
n pr€
n pc
2D layout for sparse matrices & vectors
-‐ 2D matrix layout wins over 1D with large core counts and with limited bandwidth/compute -‐ 2D vector layout some+mes important for load balance
Default distribu+on in Combinatorial BLAS. Scalable with increasing number of processes
Parallel sparse matrix-‐matrix mul+plica+on algorithm
x =
€
Cij
100K 25K
5K
25K
100K
5K
A B C
2D algorithm: Sparse SUMMA (based on dense SUMMA) General implementation that handles rectangular matrices
Cij += HyperSparseGEMM(Arecv, Brecv)
1D vs. 2D scaling for sparse matrix-‐matrix mul+plica+on 16 BULUC AND GILBERT
0
5
10
15
20
25
30
35
9 36 64 121 150 180 256
Seco
nds
Number of Cores
1.6X
2.1X2.2X
3.1X
66XSpSUMMAEpetraExt
(a) R-MAT � R-MAT product (scale 21).
0
10
20
30
40
50
60
70
4 9 16 36 64 121
Seco
nds
Number of Cores
3.9X
32X
65XSpSUMMAEpetraExt
(b) Multiplication of an R-MAT matrix of scale23 with the restriction operator of order 8.
Fig. 4.10: Comparison of SpGEMM implementation of Trilinos’ EpetraExt packagewith our Sparse SUMMA implementation. The data labels on the plots show thespeedup of our implementation over EpetraExt.
rithms.Finally, with the number of cores per node increasing due to multicore scaling, the
contention on the network interface card increases. Without hierarchical parallelismthat exploits the faster on-chip network, the flat MPI parallelism will be unscalablebecause more processes will be competing for the same network link. Therefore,designing hierarchically SpGEMM and SpRef algorithms is an important future di-rection.
REFERENCES
[1] Franklin, Nersc’s Cray XT4 System. http://www.nersc.gov/users/computational-systems/franklin/.
[2] Mark Adams and James W. Demmel. Parallel multigrid solver for 3d unstructured finiteelement problems. In Supercomputing ’99: Proceedings of the 1999 ACM/IEEE conferenceon Supercomputing, page 27, New York, NY, USA, 1999. ACM.
[3] Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. Minimizing communication innumerical linear algebra. SIAM. J. Matrix Anal. & Appl, 32:pp. 866–901, 2011.
[4] William L. Briggs, Van Emden Henson, and Steve F. McCormick. A multigrid tutorial: secondedition. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.
[5] Aydın Buluc and John R. Gilbert. New ideas in sparse matrix-matrix multiplication. InJ. Kepner and J. Gilbert, editors, Graph Algorithms in the Language of Linear Algebra.SIAM, Philadelphia. 2011.
[6] Aydın Buluc and John R. Gilbert. Challenges and advances in parallel sparse matrix-matrixmultiplication. In ICPP’08: Proc. of the Intl. Conf. on Parallel Processing, pages 503–510,Portland, Oregon, USA, 2008. IEEE Computer Society.
[7] Aydın Buluc and John R. Gilbert. On the representation and multiplication of hypersparsematrices. In IPDPS’08: Proceedings of the 2008 IEEE International Symposium on Par-allel&Distributed Processing, pages 1–11. IEEE Computer Society, 2008.
[8] Aydın Buluc and John R. Gilbert. Highly parallel sparse matrix-matrix multiplication. Tech-nical Report UCSB-CS-2010-10, Computer Science Department, University of California,Santa Barbara, 2010. http://arxiv.org/abs/1006.2183.
[9] Aydın Buluc and John R. Gilbert. The Combinatorial BLAS: Design, implementation, andapplications. The International Journal of High Performance Computing Applications,online first, 2011.
• 1-D data layout • 2-D data layout (Combinatorial BLAS)
Almost linear scaling until bandwidth costs starts to dominate
Scaling to more processors…
Scaling proportional to √p afterwards
0.125
0.25
0.5
1
2
4
8
16
4 9 16 36 64 121 256
529 1024
2025 4096
Seco
nds
Number of Cores
Slope = -0.854
Slope = -0.471
Scale-21Compute bound
Bandwidth-bound
Tcomm =O(n p )
Tcomp =O(n)
30
Combinatorial BLAS users (Sep 2013)
• IBM (T.J. Watson, Zurich, & Tokyo) • Microsoft • Intel • Cray • Stanford • UC Berkeley • Carnegie-Mellon • Georgia Tech • Ohio State • Columbia • U Minnesota
• King Fahd U • Tokyo Inst of Technology
• Chinese Academy of Sciences
• U Ghent (Belgium)
• Bilkent U (Turkey)
• U Canterbury (New Zealand)
• Purdue • Indiana U • Mississippi State • UC Merced
31
Outline
• Motivation
• Sparse matrices for graph algorithms
• CombBLAS: sparse arrays and graphs on parallel machines
• KDT: attributed semantic graphs in a high-level language
• Standards for graph algorithm primitives
Parallel graph analysis soVware
Discrete structure analysis
Graph theory
Computers
Parallel graph analysis soVware
Discrete structure analysis
Graph theory
Computers Communica+on Support (MPI, GASNet, etc)
Threading Support (OpenMP, Cilk, etc))
Distributed Combinatorial BLAS
Shared-‐address space Combinatorial BLAS
HPC scien+sts and engineers
Graph algorithm developers
Knowledge Discovery Toolbox (KDT)
• KDT is higher level (graph abstrac+ons) • Combinatorial BLAS is for performance
Domain scien+sts
• (Seman+c) directed graphs – constructors, I/O – basic graph metrics (e.g., degree()) – vectors
• Clustering / components • Centrality / authority: betweenness centrality, PageRank • Hypergraphs and sparse matrices • Graph primi+ves (e.g., bfsTree()) • SpMV / SpGEMM on semirings
Domain expert vs. graph expert
Markov Clustering
Input Graph
Largest Component
Graph of Clusters
Markov Clustering
Input Graph
Largest Component
Graph of Clusters
Domain expert vs. graph expert
• (Seman+c) directed graphs – constructors, I/O – basic graph metrics (e.g., degree()) – vectors
• Clustering / components • Centrality / authority: betweenness centrality, PageRank • Hypergraphs and sparse matrices • Graph primi+ves (e.g., bfsTree()) • SpMV / SpGEMM on semirings
comp = bigG.connComp() giantComp = comp.hist().argmax() G = bigG.subgraph(comp==giantComp) clusters = G.cluster(‘Markov’) clusNedge = G.nedge(clusters) smallG = G.contract(clusters) # visualize
Domain expert vs. graph expert
Markov Clustering
Input Graph
Largest Component
Graph of Clusters
• (Seman+c) directed graphs – constructors, I/O – basic graph metrics (e.g., degree()) – vectors
• Clustering / components • Centrality / authority: betweenness centrality, PageRank • Hypergraphs and sparse matrices • Graph primi+ves (e.g., bfsTree()) • SpMV / SpGEMM on semirings
[…] L = G.toSpParMat() d = L.sum(kdt.SpParMat.Column) L = -L L.setDiag(d) M = kdt.SpParMat.eye(G.nvert()) – mu*L pos = kdt.ParVec.rand(G.nvert()) for i in range(nsteps): pos = M.SpMV(pos)
comp = bigG.connComp() giantComp = comp.hist().argmax() G = bigG.subgraph(comp==giantComp) clusters = G.cluster(‘Markov’) clusNedge = G.nedge(clusters) smallG = G.contract(clusters) # visualize
• Aimed at domain experts who know their problem well but don’t know how to program a supercomputer
• Easy-‐to-‐use Python interface • Runs on a laptop as well as a cluster with 10,000 processors
• Open source soVware (New BSD license) • V0.3 release April 2013
A general graph library with opera+ons based on linear
algebraic primi+ves
Knowledge Discovery Toolbox hHp://kdt.sourceforge.net/
38
PageRank
0*1.2!/3%4!56(!%7608.*,6%98556%%
<'()b'*@$&'0&$'$
7)+2)?$%&$%6-3+2'*2$
%=$32.)+$%6-3+2'*2$
7)+4>)&$,%*@$23$%28$
e'>.$7)+2)?$c1)/-'()d$732)&$/0$&-,%}*($
%2&$<'()b'*@$&>3+)$)7)*,0$'63*($%2&$3;2$
)5()&$c,%*@&d8$#.%&$/+3'5>'&2$c'*$G-KPd$%&$
=3,,31)5$/0$'$*3+6',%N'43*$&2)-$
cF3,f%&)d8$b)-)'2$;*4,$>3*7)+()*>)8$$
$
<'()b'*@$%&$2.)$&2'43*'+0$5%&2+%/;43*$3=$'$
K'+@37$F.'%*$2.'2$&%6;,'2)&$'$~+'*536$
&;+=)+Ä8$
{$<R%$s$µ$R%$X$6)&&'()&$
/)21))*$*)%(./3+&$
!"#$"%&'
(%)*+,'-./'#,//*0,/'+1'-+/'2,-03415/'
M';&&%'*$/),%)=$-+3-'('43*$cM'H<d$%&$'*$
%2)+'47)$',(3+%2.6$=3+$&3,7%*($2.)$,%*)'+$
&0&2)6$3=$)W;'43*&$Y?$^$/D$1.)+)$Y$%&$
&066)2+%>$-3&%47)$5)T*%2)8$$
M'H<$'&&;6)&$)'>.$7'+%'/,)$=3,,31&$'$
*3+6',$5%&2+%/;43*8$92$%2)+'47),0$>',>;,'2)&$
2.)$-+)>%&%3*$<$'*5$6)'*$7',;)$µ$3=$)'>.$7'+%'/,)|$2.)$>3*7)+()5$6)'*:7',;)$7)>23+$
'--+3?%6'2)&$2.)$'>2;',$&3,;43*8$
Belief Propagation
Markov Clustering
K'+@37$F,;&2)+%*($cKFId$T*5&$>,;&2)+&$/0$
-3&2;,'4*($2.'2$'$+'*536$1',@$2.'2$7%&%2&$
'$5)*&)$>,;&2)+$1%,,$-+3/'/,0$7%&%2$6'*0$3=$
%2&$7)+4>)&$/)=3+)$,)'7%*(8$
$
f)$;&)$'$K'+@37$>.'%*$=3+$2.)$+'*536$
1',@8$#.%&$-+3>)&&$%&$+)%*=3+>)5$/0$'55%*($
'*$%*U'43*$&2)-$2.'2$;&)&$2.)$Z'5'6'+5$
-+35;>2$'*5$+)&>',%*(8$
6:8#!%0*1.2!/3%;<=,%>8,%?*,#!,%%
Betweenness Centrality
H)21))**)&&$F)*2+',%20$&'0&$2.'2$'$7)+2)?$
%&$%6-3+2'*2$%=$%2$'--)'+&$3*$6'*0$
&.3+2)&2$-'2.&$/)21))*$32.)+$7)+4>)&8$
Y*$)?'>2$>36-;2'43*$+)W;%+)&$'$HQG$=3+$
)7)+0$7)+2)?8$Y$(335$'--+3?%6'43*$>'*$
/)$'>.%)7)5$/0$&'6-,%*($&2'+4*($7)+4>)&8$
$
6:8#!%0*1.2!/3%@581A6*%B*00C6,6%
PageRank
0*1.2!/3%4!56(!%7608.*,6%98556%%
<'()b'*@$&'0&$'$
7)+2)?$%&$%6-3+2'*2$
%=$32.)+$%6-3+2'*2$
7)+4>)&$,%*@$23$%28$
e'>.$7)+2)?$c1)/-'()d$732)&$/0$&-,%}*($
%2&$<'()b'*@$&>3+)$)7)*,0$'63*($%2&$3;2$
)5()&$c,%*@&d8$#.%&$/+3'5>'&2$c'*$G-KPd$%&$
=3,,31)5$/0$'$*3+6',%N'43*$&2)-$
cF3,f%&)d8$b)-)'2$;*4,$>3*7)+()*>)8$$
$
<'()b'*@$%&$2.)$&2'43*'+0$5%&2+%/;43*$3=$'$
K'+@37$F.'%*$2.'2$&%6;,'2)&$'$~+'*536$
&;+=)+Ä8$
{$<R%$s$µ$R%$X$6)&&'()&$
/)21))*$*)%(./3+&$
!"#$"%&'
(%)*+,'-./'#,//*0,/'+1'-+/'2,-03415/'
M';&&%'*$/),%)=$-+3-'('43*$cM'H<d$%&$'*$
%2)+'47)$',(3+%2.6$=3+$&3,7%*($2.)$,%*)'+$
&0&2)6$3=$)W;'43*&$Y?$^$/D$1.)+)$Y$%&$
&066)2+%>$-3&%47)$5)T*%2)8$$
M'H<$'&&;6)&$)'>.$7'+%'/,)$=3,,31&$'$
*3+6',$5%&2+%/;43*8$92$%2)+'47),0$>',>;,'2)&$
2.)$-+)>%&%3*$<$'*5$6)'*$7',;)$µ$3=$)'>.$7'+%'/,)|$2.)$>3*7)+()5$6)'*:7',;)$7)>23+$
'--+3?%6'2)&$2.)$'>2;',$&3,;43*8$
Belief Propagation
Markov Clustering
K'+@37$F,;&2)+%*($cKFId$T*5&$>,;&2)+&$/0$
-3&2;,'4*($2.'2$'$+'*536$1',@$2.'2$7%&%2&$
'$5)*&)$>,;&2)+$1%,,$-+3/'/,0$7%&%2$6'*0$3=$
%2&$7)+4>)&$/)=3+)$,)'7%*(8$
$
f)$;&)$'$K'+@37$>.'%*$=3+$2.)$+'*536$
1',@8$#.%&$-+3>)&&$%&$+)%*=3+>)5$/0$'55%*($
'*$%*U'43*$&2)-$2.'2$;&)&$2.)$Z'5'6'+5$
-+35;>2$'*5$+)&>',%*(8$
6:8#!%0*1.2!/3%;<=,%>8,%?*,#!,%%
Betweenness Centrality
H)21))**)&&$F)*2+',%20$&'0&$2.'2$'$7)+2)?$
%&$%6-3+2'*2$%=$%2$'--)'+&$3*$6'*0$
&.3+2)&2$-'2.&$/)21))*$32.)+$7)+4>)&8$
Y*$)?'>2$>36-;2'43*$+)W;%+)&$'$HQG$=3+$
)7)+0$7)+2)?8$Y$(335$'--+3?%6'43*$>'*$
/)$'>.%)7)5$/0$&'6-,%*($&2'+4*($7)+4>)&8$
$
6:8#!%0*1.2!/3%@581A6*%B*00C6,6%
PageRank
0*1.2!/3%4!56(!%7608.*,6%98556%%
<'()b'*@$&'0&$'$
7)+2)?$%&$%6-3+2'*2$
%=$32.)+$%6-3+2'*2$
7)+4>)&$,%*@$23$%28$
e'>.$7)+2)?$c1)/-'()d$732)&$/0$&-,%}*($
%2&$<'()b'*@$&>3+)$)7)*,0$'63*($%2&$3;2$
)5()&$c,%*@&d8$#.%&$/+3'5>'&2$c'*$G-KPd$%&$
=3,,31)5$/0$'$*3+6',%N'43*$&2)-$
cF3,f%&)d8$b)-)'2$;*4,$>3*7)+()*>)8$$
$
<'()b'*@$%&$2.)$&2'43*'+0$5%&2+%/;43*$3=$'$
K'+@37$F.'%*$2.'2$&%6;,'2)&$'$~+'*536$
&;+=)+Ä8$
{$<R%$s$µ$R%$X$6)&&'()&$
/)21))*$*)%(./3+&$
!"#$"%&'
(%)*+,'-./'#,//*0,/'+1'-+/'2,-03415/'
M';&&%'*$/),%)=$-+3-'('43*$cM'H<d$%&$'*$
%2)+'47)$',(3+%2.6$=3+$&3,7%*($2.)$,%*)'+$
&0&2)6$3=$)W;'43*&$Y?$^$/D$1.)+)$Y$%&$
&066)2+%>$-3&%47)$5)T*%2)8$$
M'H<$'&&;6)&$)'>.$7'+%'/,)$=3,,31&$'$
*3+6',$5%&2+%/;43*8$92$%2)+'47),0$>',>;,'2)&$
2.)$-+)>%&%3*$<$'*5$6)'*$7',;)$µ$3=$)'>.$7'+%'/,)|$2.)$>3*7)+()5$6)'*:7',;)$7)>23+$
'--+3?%6'2)&$2.)$'>2;',$&3,;43*8$
Belief Propagation
Markov Clustering
K'+@37$F,;&2)+%*($cKFId$T*5&$>,;&2)+&$/0$
-3&2;,'4*($2.'2$'$+'*536$1',@$2.'2$7%&%2&$
'$5)*&)$>,;&2)+$1%,,$-+3/'/,0$7%&%2$6'*0$3=$
%2&$7)+4>)&$/)=3+)$,)'7%*(8$
$
f)$;&)$'$K'+@37$>.'%*$=3+$2.)$+'*536$
1',@8$#.%&$-+3>)&&$%&$+)%*=3+>)5$/0$'55%*($
'*$%*U'43*$&2)-$2.'2$;&)&$2.)$Z'5'6'+5$
-+35;>2$'*5$+)&>',%*(8$
6:8#!%0*1.2!/3%;<=,%>8,%?*,#!,%%
Betweenness Centrality
H)21))**)&&$F)*2+',%20$&'0&$2.'2$'$7)+2)?$
%&$%6-3+2'*2$%=$%2$'--)'+&$3*$6'*0$
&.3+2)&2$-'2.&$/)21))*$32.)+$7)+4>)&8$
Y*$)?'>2$>36-;2'43*$+)W;%+)&$'$HQG$=3+$
)7)+0$7)+2)?8$Y$(335$'--+3?%6'43*$>'*$
/)$'>.%)7)5$/0$&'6-,%*($&2'+4*($7)+4>)&8$
$
6:8#!%0*1.2!/3%@581A6*%B*00C6,6%
PageRank
0*1.2!/3%4!56(!%7608.*,6%98556%%
<'()b'*@$&'0&$'$
7)+2)?$%&$%6-3+2'*2$
%=$32.)+$%6-3+2'*2$
7)+4>)&$,%*@$23$%28$
e'>.$7)+2)?$c1)/-'()d$732)&$/0$&-,%}*($
%2&$<'()b'*@$&>3+)$)7)*,0$'63*($%2&$3;2$
)5()&$c,%*@&d8$#.%&$/+3'5>'&2$c'*$G-KPd$%&$
=3,,31)5$/0$'$*3+6',%N'43*$&2)-$
cF3,f%&)d8$b)-)'2$;*4,$>3*7)+()*>)8$$
$
<'()b'*@$%&$2.)$&2'43*'+0$5%&2+%/;43*$3=$'$
K'+@37$F.'%*$2.'2$&%6;,'2)&$'$~+'*536$
&;+=)+Ä8$
{$<R%$s$µ$R%$X$6)&&'()&$
/)21))*$*)%(./3+&$
!"#$"%&'
(%)*+,'-./'#,//*0,/'+1'-+/'2,-03415/'
M';&&%'*$/),%)=$-+3-'('43*$cM'H<d$%&$'*$
%2)+'47)$',(3+%2.6$=3+$&3,7%*($2.)$,%*)'+$
&0&2)6$3=$)W;'43*&$Y?$^$/D$1.)+)$Y$%&$
&066)2+%>$-3&%47)$5)T*%2)8$$
M'H<$'&&;6)&$)'>.$7'+%'/,)$=3,,31&$'$
*3+6',$5%&2+%/;43*8$92$%2)+'47),0$>',>;,'2)&$
2.)$-+)>%&%3*$<$'*5$6)'*$7',;)$µ$3=$)'>.$7'+%'/,)|$2.)$>3*7)+()5$6)'*:7',;)$7)>23+$
'--+3?%6'2)&$2.)$'>2;',$&3,;43*8$
Belief Propagation
Markov Clustering
K'+@37$F,;&2)+%*($cKFId$T*5&$>,;&2)+&$/0$
-3&2;,'4*($2.'2$'$+'*536$1',@$2.'2$7%&%2&$
'$5)*&)$>,;&2)+$1%,,$-+3/'/,0$7%&%2$6'*0$3=$
%2&$7)+4>)&$/)=3+)$,)'7%*(8$
$
f)$;&)$'$K'+@37$>.'%*$=3+$2.)$+'*536$
1',@8$#.%&$-+3>)&&$%&$+)%*=3+>)5$/0$'55%*($
'*$%*U'43*$&2)-$2.'2$;&)&$2.)$Z'5'6'+5$
-+35;>2$'*5$+)&>',%*(8$
6:8#!%0*1.2!/3%;<=,%>8,%?*,#!,%%
Betweenness Centrality
H)21))**)&&$F)*2+',%20$&'0&$2.'2$'$7)+2)?$
%&$%6-3+2'*2$%=$%2$'--)'+&$3*$6'*0$
&.3+2)&2$-'2.&$/)21))*$32.)+$7)+4>)&8$
Y*$)?'>2$>36-;2'43*$+)W;%+)&$'$HQG$=3+$
)7)+0$7)+2)?8$Y$(335$'--+3?%6'43*$>'*$
/)$'>.%)7)5$/0$&'6-,%*($&2'+4*($7)+4>)&8$
$
6:8#!%0*1.2!/3%@581A6*%B*00C6,6%
A few KDT applications
Example: • Vertex types: Person, Phone,
Camera, Gene, Pathway • Edge types: PhoneCall, TextMessage,
CoLoca+on, SequenceSimilarity • Edge aHributes: Time, Dura+on
• Calculate centrality just for emails among engineers sent between given start and end +mes
AHributed seman+c graphs and filters
def onlyEngineers (self): return self.position == Engineer def timedEmail (self, sTime, eTime): return ((self.type == email) and (self.Time > sTime) and (self.Time < eTime)) G.addVFilter(onlyEngineers) G.addEFilter(timedEmail(start, end)) # rank via centrality based on recent email transactions among engineers bc = G.rank(’approxBC’)
KDT$Algorithm$
CombBLAS$Primi4ve$
Filter$(Py)$
Python'
C++'
Semiring$(Py)$KDT$Algorithm$
CombBLAS$Primi4ve$ Filter$(C++)$
Semiring$(C++)$
Standard$KDT$ KDT+SEJITS$
SEJITS$$$$Transla4on$
Filter$(Py)$
Semiring$(Py)$
SEJITS for filter/semiring accelera+on
KDT$Algorithm$
CombBLAS$Primi4ve$
Filter$(Py)$
Python'
C++'
Semiring$(Py)$KDT$Algorithm$
CombBLAS$Primi4ve$ Filter$(C++)$
Semiring$(C++)$
Standard$KDT$ KDT+SEJITS$
SEJITS$$$$Transla4on$
Filter$(Py)$
Semiring$(Py)$
SEJITS for filter/semiring accelera+on
Embedded DSL: Python for the whole applica+on • Introspect, translate Python to equivalent C++ code • Call compiled/op+mized C++ instead of Python
Filtered BFS with SEJITS
!"#$%!"$!%&"!!%#"!!%'"!!%("!!%&)"!!%*#"!!%)'"!!%
&#&% #$)% $+)% &!#'% #!#$%
!"#$%&'
(%)*
"%
+,*-".%/0%!12%3./4"55"5%
,-.% /012./3,-.% 456789:/%
Time (in seconds) for a single BFS itera+on on scale 25 RMAT (33M ver+ces, 500M edges) with 10% of elements passing filter. Machine is NERSC’s Hopper.
43
Outline
• Motivation
• Sparse matrices for graph algorithms
• CombBLAS: sparse arrays and graphs on parallel machines
• KDT: attributed semantic graphs in a high-level language
• Standards for graph algorithm primitives
44
History of BLAS
• Separation of concerns: • Experts in mapping algorithms onto hardware tuned BLAS to specific platforms.
• Experts in linear algebra built software on top of the BLAS to obtain high performance “for free”.
• Today every computer, phone, etc. comes with /usr/lib/libblas!
The Basic Linear Algebra Subroutines had a revolutionary impact
on computational linear algebra.
BLAS 1 vector ops Lawson, Hanson, Kincaid, Krogh, 1979
LINPACK
BLAS 2 matrix-vector ops Dongarra, Du Croz, Hammarling, Hanson, 1988
LINPACK on vector machines
BLAS 3 matrix-matrix ops Dongarra, Du Croz, Hammarling, Hanson, 1990
LAPACK on cache based machines
45
• No, it is not reasonable to define a universal set of graph algorithm building blocks: – Huge diversity in matching algorithms to hardware platforms.
– No consensus on data structures and linguistic primitives. – Lots of graph algorithms remain to be discovered.
– Early standardization can inhibit innovation.
• Yes, it is reasonable to define a common set of graph algorithm building blocks … for Graphs as Linear Algebra: – Representing graphs in the language of linear algebra is a mature
field.
– Algorithms, high level interfaces, and implementations vary. – But the core primitives are well established.
Can we define and standardize the “Graph BLAS”?
47
Conclusion
• It helps to look at things from two directions.
• Sparse arrays and matrices yield useful primitives and algorithms for high-performance graph computation.
• Graphs in the language of linear algebra are sufficiently mature to support a standard set of BLAS.