Sparse Matrices for High-Performance Graph Analytics
Transcript of gilbert/talks/GilbertCLSAC17oct2013.pdf

Page 1:


Sparse Matrices for High-Performance Graph Analytics

John R. Gilbert University of California, Santa Barbara Chesapeake Large-Scale Analytics Conference October 17, 2013

Support: Intel, Microsoft, DOE Office of Science, NSF

Page 2:

Thanks …

Aydin Buluc (LBL), Kevin Deweese (UCSB), Erika Duriakova (Dublin), Armando Fox (UCB), Shoaib Kamil (MIT), Jeremy Kepner (MIT), Adam Lugowski (UCSB), Tim Mattson (Intel), Lenny Oliker (LBL), Carey Priebe (JHU), Steve Reinhardt (YarcData), Lijie Ren (Google), Eric Robinson (Lincoln), Viral Shah (UIDAI), Veronika Strnadova (UCSB), Yun Teng (UCSB), Joshua Vogelstein (Duke), Drew Waranis (UCSB), Sam Williams (LBL)

Page 3:

Outline

•  Motivation

•  Sparse matrices for graph algorithms

•  CombBLAS: sparse arrays and graphs on parallel machines

•  KDT: attributed semantic graphs in a high-level language

•  Standards for graph algorithm primitives

Page 4:

A few biological graph analysis problems

•  Connective abnormalities in schizophrenia [van den Heuvel et al.]
   –  Problem: understand disease from anatomical brain imaging
   –  Tools: betweenness centrality, shortest path length
   –  Results: global statistics on connection graph correlate w/ diagnosis

•  Genomics for biofuels [Strnadova et al.]
   –  Problem: scale to millions of markers times thousands of individuals
   –  Tools: min spanning tree, customized clustering
   –  Results: using much more data leads to much better genomic maps

•  Alignment and matching of brain scans [Vogelstein et al.]
   –  Problem: match corresponding functional regions across individuals
   –  Tools: graph partitioning, clustering, and more. . .
   –  Results: in progress

Page 5:

The middleware of scientific computing

[Diagram: continuous physical modeling → linear algebra → computers; discrete structure analysis → graph theory → computers]

Page 6:

Top 500 List (June 2013)

Top500 Benchmark: Solve a large system of linear equations by Gaussian elimination (PA = LU)

Page 7:

Graph 500 List (June 2013)

Graph500 Benchmark: Breadth-first search in a large power-law graph

[Figure: example graph on vertices 1-7]

Page 8:

Floating-Point vs. Graphs, June 2013

[Figures: PA = LU factorization; example graph on vertices 1-7]

33.8 Petaflops (Top500) vs. 15.3 Terateps (Graph500): a ratio of about 2,200.

Page 9:

Floating-Point vs. Graphs, June 2013

[Figures: PA = LU factorization; example graph on vertices 1-7]

Jun 2013: 33.8 Petaflops / 15.3 Terateps ≈ 2,200
Nov 2010: 2.5 Petaflops / 6.6 Gigateps ≈ 380,000

Page 10:

The challenge of the software stack

•  By analogy to numerical scientific computing. . .

•  What should the combinatorial BLAS look like?

Basic Linear Algebra Subroutines (BLAS): C = A*B,  y = A*x,  µ = xᵀy
[Plot: Ops/Sec vs. Matrix Size]

Page 11:

Outline

•  Motivation

•  Sparse matrices for graph algorithms

•  CombBLAS: sparse arrays and graphs on parallel machines

•  KDT: attributed semantic graphs in a high-level language

•  Standards for graph algorithm primitives

Page 12:

Multiple-source breadth-first search

[Figure: the transposed adjacency matrix Aᵀ times a block of frontier vectors X; example graph on vertices 1-7]

Page 13:

Multiple-source breadth-first search

•  Sparse array representation => space efficient
•  Sparse matrix-matrix multiplication => work efficient

•  Three possible levels of parallelism: searches, vertices, edges

[Figure: AᵀX → next frontiers, one column per search; example graph on vertices 1-7]
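As a concrete (if tiny) illustration of the slide's picture, here is a scipy-based sketch, ours rather than CombBLAS code: each column of X carries the frontier of one search, a single sparse product AᵀX advances every search at once, and already-discovered vertices are masked out elementwise.

```python
import numpy as np
import scipy.sparse as sp

# Directed graph: A[i, j] = 1 for an edge i -> j.
edges = [(0, 1), (0, 3), (1, 2), (3, 4), (4, 5), (5, 6), (2, 6)]
r, c = zip(*edges)
A = sp.csr_matrix((np.ones(len(edges)), (r, c)), shape=(7, 7))

# One column of X per search; start searches at vertices 0 and 3.
X = sp.csr_matrix(([1.0, 1.0], ([0, 3], [0, 1])), shape=(7, 2))

level = np.full((7, 2), -1)
level[0, 0] = level[3, 1] = 0
frontier, d = X, 0
while frontier.nnz:
    d += 1
    nxt = (A.T @ frontier) != 0                       # one SpGEMM expands all searches
    nxt = sp.csr_matrix(nxt.toarray() & (level < 0))  # prune seen (dense here for brevity)
    level[nxt.nonzero()] = d
    frontier = nxt.astype(float)
print(level)   # BFS level of each vertex in each search (-1: unreached)
```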

Page 14:

Graph contraction via sparse triple product

[Figure: a 6-vertex graph contracted to 3 nodes A1, A2, A3; the 0-1 indicator matrix S times A times Sᵀ (vertices ordered 1 5 2 3 4 6) yields the contracted adjacency matrix]
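A minimal scipy sketch of that triple product (our illustration; the vertex grouping is chosen arbitrarily): with S[k, v] = 1 when vertex v belongs to group k, the product S A Sᵀ is the adjacency matrix of the contracted graph.

```python
import numpy as np
import scipy.sparse as sp

# A 6-vertex graph and an assignment of each vertex to one of 3 groups.
A = sp.csr_matrix(np.array([[0,1,0,0,1,0],
                            [1,0,1,0,0,0],
                            [0,1,0,1,0,0],
                            [0,0,1,0,0,1],
                            [1,0,0,0,0,1],
                            [0,0,0,1,1,0]], dtype=float))
group = np.array([0, 0, 1, 1, 2, 2])             # vertex -> contracted node

# Indicator matrix: S[k, v] = 1 iff vertex v is in group k.
S = sp.csr_matrix((np.ones(6), (group, np.arange(6))), shape=(3, 6))

C = S @ A @ S.T                                  # sparse triple product
print(C.toarray())   # C[k, l] sums the adjacency entries between groups k and l
```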

Page 15:

Subgraph extraction via sparse triple product

[Figure: extracting the subgraph induced by three vertices of a 6-vertex graph; the 0-1 selection matrix P times A times Pᵀ yields the induced subgraph's adjacency matrix]
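The same triple-product pattern with selection matrices performs the extraction (again a scipy illustration, not the CombBLAS API):

```python
import numpy as np
import scipy.sparse as sp

A = sp.random(6, 6, density=0.5, format='csr', random_state=0)
keep = np.array([1, 2, 3])                       # vertices to extract

# P has one row per kept vertex: P[r, keep[r]] = 1.
m = len(keep)
P = sp.csr_matrix((np.ones(m), (np.arange(m), keep)), shape=(m, 6))

B = P @ A @ P.T            # induced subgraph; equals A[np.ix_(keep, keep)]
print(np.allclose(B.toarray(), A.toarray()[np.ix_(keep, keep)]))   # True
```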

Page 16:

Betweenness centrality [Robinson 2008]

•  Betweenness centrality can be expressed in array-based form using CombBLAS primitives, as sketched below.
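A hedged serial rendering of that array-based formulation (a scipy sketch of Brandes' algorithm, ours, not the CombBLAS code): the forward sweep counts shortest paths with sparse matrix-vector products, the backward sweep accumulates dependencies with the same primitive, and the approximation comes from sampling the starting vertices.

```python
import numpy as np
import scipy.sparse as sp

def approx_bc(A, sources):
    """Approximate betweenness centrality over sampled source vertices.
    A is an n-by-n 0/1 adjacency matrix (CSR); edge i -> j is A[i, j]."""
    n = A.shape[0]
    bc = np.zeros(n)
    for s in sources:
        sigma = np.zeros(n); sigma[s] = 1.0      # shortest-path counts
        depth = np.full(n, -1); depth[s] = 0
        fronts = [np.zeros(n, bool)]; fronts[0][s] = True
        d = 0
        while fronts[-1].any():                  # forward sweep (BFS)
            counts = A.T @ (sigma * fronts[-1])  # SpMV over (+, *)
            nxt = (counts > 0) & (depth == -1)
            sigma[nxt] = counts[nxt]
            d += 1; depth[nxt] = d
            fronts.append(nxt)
        delta = np.zeros(n)                      # backward dependency sweep
        for w, v in zip(reversed(fronts[1:]), reversed(fronts[:-1])):
            coeff = np.where(w, (1.0 + delta) / np.maximum(sigma, 1e-300), 0.0)
            delta += sigma * (A @ coeff) * v     # SpMV over (+, *)
        bc += delta
        bc[s] -= delta[s]                        # a source gets no credit
    return bc

A = sp.csr_matrix(np.array([[0,1,1,0],[0,0,0,1],[0,0,0,1],[0,0,0,0]], dtype=float))
print(approx_bc(A, sources=range(4)))   # vertices 1 and 2 share the middle
```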

Page 17:

Graph algorithms in the language of linear algebra

•  Kepner et al. study [2006]: fundamental graph algorithms including min spanning tree, shortest paths, independent set, max flow, clustering, …

•  SSCA#2 / centrality [2008]

•  Basic breadth-first search / Graph500 [2010]

•  Beamer et al. [2013] direction-optimizing breadth-first search, using CombBLAS

Page 18:

Identification of Primitives

Sparse array-based primitives:
•  Sparse matrix-matrix multiplication (SpGEMM)
•  Sparse matrix-dense vector multiplication (SpMV)
•  Sparse matrix indexing
•  Element-wise operations (.*)

Matrices over various semirings: (+ . x), (min . +), (or . and), …

Page 19:

The case for sparse matrix graph primitives

Many irregular applications contain coarse-grained parallelism that can be exploited by abstractions at the proper level.

Traditional graph computations:
•  Data driven, unpredictable communication
•  Irregular and unstructured, poor locality of reference
•  Fine grained data accesses, dominated by latency

Page 20:

The case for sparse matrix graph primitives

Many irregular applications contain coarse-grained parallelism that can be exploited by abstractions at the proper level.

Traditional graph computations vs. graphs in the language of linear algebra:
•  Data driven, unpredictable communication → fixed communication patterns
•  Irregular and unstructured, poor locality of reference → operations on matrix blocks exploit memory hierarchy
•  Fine grained data accesses, dominated by latency → coarse grained parallelism, bandwidth limited

Page 21:

Matrices over semirings

•  E.g. matrix multiplication C = AB (or matrix/vector):
   C(i,j) = A(i,1)×B(1,j) + A(i,2)×B(2,j) + ··· + A(i,n)×B(n,j)

•  Replace the scalar operations × and + by
   ⊗ : associative, distributes over ⊕
   ⊕ : associative, commutative

•  Then C(i,j) = A(i,1)⊗B(1,j) ⊕ A(i,2)⊗B(2,j) ⊕ ··· ⊕ A(i,n)⊗B(n,j)

•  Examples: ×.+ ; and.or ; +.min ; . . .

•  Same data reference pattern and control flow (sketched below)
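To make the point concrete, here is a minimal, unoptimized Python sketch (our illustration, not a CombBLAS interface): the same SpMV loop serves any semiring, because only the two scalar operations change.

```python
import numpy as np
import scipy.sparse as sp

def semiring_spmv(A, x, add, mul, zero):
    """y = A x where scalar (+, *) are replaced by (add, mul); `zero`
    is the semiring's additive identity (the value of absent entries)."""
    y = np.full(A.shape[0], zero)
    Ac = A.tocoo()
    for i, j, v in zip(Ac.row, Ac.col, Ac.data):
        y[i] = add(y[i], mul(v, x[j]))
    return y

# (min, +) tropical semiring: one round of shortest-path relaxation.
# Stored entries are edge weights; absent entries act as +inf ("no edge").
W = sp.csr_matrix(([3.0, 1.0, 7.0], ([0, 1, 2], [1, 2, 0])), shape=(3, 3))
dist = np.array([0.0, np.inf, np.inf])          # distances from vertex 0
step = semiring_spmv(W.T, dist, add=min, mul=lambda a, b: a + b, zero=np.inf)
dist = np.minimum(dist, step)                   # keep the better of old/new
print(dist)   # after one relaxation: [0., 3., inf]; repeat to propagate
```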

Page 22:

Examples of semirings in graph algorithms

•  (R, +, ×), the real field: standard numerical linear algebra

•  ({0,1}, |, &), the boolean semiring: graph traversal

•  (R ∪ {∞}, min, +), the tropical semiring: shortest paths

•  (R ∪ {∞}, min, ×): select subgraph, or contract nodes to form quotient graph

•  (edge/vertex attributes, vertex data aggregation, edge data processing): schema for user-specified computation at vertices and edges

Page 23:

Outline

•  Motivation

•  Sparse matrices for graph algorithms

•  CombBLAS: sparse arrays and graphs on parallel machines

•  KDT: attributed semantic graphs in a high-level language

•  Standards for graph algorithm primitives

Page 24:

Combinatorial BLAS: Functions

[Text excerpt from Buluç & Gilbert, "The Combinatorial BLAS: design, implementation, and applications":]

… or run-time type information), all elementwise operations between two sparse matrices may have a single function prototype in the future.

On the other hand, the data access patterns of matrix-matrix and matrix-vector multiplications are independent of the underlying semiring. As a result, the sparse matrix-matrix multiplication routine (SpGEMM) and the sparse matrix-vector multiplication routine (SpMV) each have a single function prototype that accepts a parameter representing the semiring, which defines user-specified addition and multiplication operations for SpGEMM and SpMV.

(2) If an operation can be efficiently implemented by composing a few simpler operations, then we do not provide a special function for that operator.

For example, making a non-zero matrix A column stochastic can be efficiently implemented by first calling REDUCE on A to get a dense row vector v that contains the sums of columns, then obtaining the multiplicative inverse of each entry in v by calling the APPLY function with the unary function object that performs f(vᵢ) = 1/vᵢ for every vᵢ it is applied to, and finally calling SCALE(v) on A to effectively divide each non-zero entry in a column by its sum. Consequently, we do not provide a special function to make a matrix column stochastic.
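In scipy terms (an illustration of the composition, not the CombBLAS calls themselves), the three steps look like this:

```python
import numpy as np
import scipy.sparse as sp

A = sp.random(5, 5, density=0.5, format='csc', random_state=0)

v = np.asarray(A.sum(axis=0)).ravel()    # REDUCE: dense row vector of column sums
inv = np.divide(1.0, v, out=np.zeros_like(v), where=(v != 0))  # APPLY: f(v_i) = 1/v_i
A_stochastic = A @ sp.diags(inv)         # SCALE: divide each column by its sum

# Columns with nonzeros now sum to 1.
print(np.asarray(A_stochastic.sum(axis=0)).ravel())
```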

On the other hand, a commonly occurring operation is to zero out some of the non-zeros of a sparse matrix. This often comes up in graph traversals, where Xₖ represents the kth frontier, the set of vertices that are discovered during the kth iteration. After the frontier expansion AᵀXₖ, previously discovered vertices can be pruned by performing an elementwise multiplication with a matrix Y that includes a zero for every vertex that has been discovered before, and non-zeros elsewhere. However, this approach might not be work-efficient, as Y will often be dense, especially in the early stages of the graph traversal.

Consequently, we provide a generalized function SPEWISEX that performs the elementwise multiplication of sparse matrices op(A) and op(B). It also accepts two auxiliary parameters, notA and notB, that are used to negate the sparsity structure of A and B. If notA is true, then op(A)(i,j) = 0 for every non-zero A(i,j) ≠ 0, and op(A)(i,j) = 1 for every zero A(i,j) = 0. The role of notB is identical. Direct support for the logical NOT operations is crucial to avoid the explicit construction of the dense not(B) object.

(3) To avoid expensive object creation and copying, many functions also have in-place versions. For operations that can be implemented in place, we deny access to any other variants only if those increase the running time.
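A small scipy sketch of the pattern SPEWISEX enables (illustrative only): prune a frontier against the discovered vertices while touching only the nonzeros of the discovered vector, never a dense complement.

```python
import numpy as np
import scipy.sparse as sp

n = 7
frontier = sp.csr_matrix(([1.0, 1.0], ([2, 5], [0, 0])), shape=(n, 1))
discovered = sp.csr_matrix(([1.0], ([2], [0])), shape=(n, 1))

# Semantics of SpEwiseX(frontier, discovered, notB=True): keep frontier
# entries wherever `discovered` is structurally zero. Only the nonzeros
# of `discovered` are visited; not(discovered) is never materialized.
keep = np.ones(n)
keep[discovered.nonzero()[0]] = 0.0
pruned = sp.diags(keep) @ frontier
print(pruned.nonzero()[0])   # vertex 5 survives; vertex 2 was already seen
```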

Table 2. Summary of the current API for the Combinatorial BLAS

•  SPGEMM (sparse matrices, as friend). Parameters: A, B: sparse matrices; trA: transpose A if true; trB: transpose B if true. Returns: sparse matrix. Matlab phrasing: C = A * B
•  SPMV (sparse matrix, as friend). Parameters: A: sparse matrix; x: sparse or dense vector(s); trA: transpose A if true. Returns: sparse or dense vector(s). Matlab phrasing: y = A * x
•  SPEWISEX (sparse matrices, as friend). Parameters: A, B: sparse matrices; notA: negate A if true; notB: negate B if true. Returns: sparse matrix. Matlab phrasing: C = A .* B
•  REDUCE (any matrix, as method). Parameters: dim: dimension to reduce; binop: reduction operator. Returns: dense vector. Matlab phrasing: sum(A)
•  SPREF (sparse matrix, as method). Parameters: p: row indices vector; q: column indices vector. Returns: sparse matrix. Matlab phrasing: B = A(p, q)
•  SPASGN (sparse matrix, as method). Parameters: p: row indices vector; q: column indices vector; B: matrix to assign. Returns: none. Matlab phrasing: A(p, q) = B
•  SCALE (any matrix, as method). Parameters: rhs: any object (except a sparse matrix). Returns: none. Matlab phrasing: check guiding principles 3 and 4
•  SCALE (any vector, as method). Parameters: rhs: any vector. Returns: none
•  APPLY (any object, as method). Parameters: unop: unary operator (applied to non-zeros). Returns: none


Page 25:

Combinatorial BLAS: Distributed-memory reference implementation

[Class diagram: DistMat (enforces interface only) with SpDistMat and DenseDistMat beneath it; SpMat with concrete storage formats Triples, DCSC, CSC, CSB; CommGrid; vector classes SpDistVec, DenseDistVec, FullyDistVec, …; HAS-A relationships and polymorphism connect these to the Combinatorial BLAS functions and operators]

Page 26:

Matrix/vector distributions, interleaved on each other.

2D layout for sparse matrices & vectors

[Figure: the matrix is partitioned into a pr × pc grid of blocks A(1,1) … A(3,3), each n/pr × n/pc; vector pieces x(1,1) … x(3,3) are interleaved on the same process grid]

•  2D matrix layout wins over 1D with large core counts and with limited bandwidth/compute
•  2D vector layout sometimes important for load balance

Default distribution in Combinatorial BLAS. Scalable with increasing number of processes.
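One way to make the block layout concrete (an illustrative indexing convention, not the exact CombBLAS one):

```python
import math

def owner(i, j, n, pr, pc):
    """Grid coordinates of the process owning entry (i, j) of an n-by-n
    matrix split into a pr-by-pc grid of roughly equal blocks."""
    return (i // math.ceil(n / pr), j // math.ceil(n / pc))

# 9x9 matrix on a 3x3 process grid: entry (5, 7) lives on process (1, 2).
print(owner(5, 7, 9, 3, 3))
```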

Page 27:

Parallel sparse matrix-matrix multiplication algorithm

[Figure: blocked C = A × B with rectangular operands (dimensions 100K and 5K, blocks of 25K); the highlighted block is Cij]

2D algorithm: Sparse SUMMA (based on dense SUMMA)
General implementation that handles rectangular matrices:

    Cij += HyperSparseGEMM(Arecv, Brecv)
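A serial rehearsal of the stage structure (our sketch, with square blocks for brevity; the real code runs one (i, j) pair per process and replaces the inner loops with broadcasts plus HyperSparseGEMM):

```python
import numpy as np
import scipy.sparse as sp

def sparse_summa(A, B, grid):
    """C = A @ B computed in SUMMA stages on a grid x grid block mesh."""
    blk = A.shape[0] // grid
    sub = lambda M, i, j: M[i*blk:(i+1)*blk, j*blk:(j+1)*blk]
    C = [[sp.csr_matrix((blk, blk)) for _ in range(grid)] for _ in range(grid)]
    for k in range(grid):                    # stage k: column k of A blocks and
        for i in range(grid):                # row k of B blocks are broadcast
            for j in range(grid):
                Arecv, Brecv = sub(A, i, k), sub(B, k, j)
                C[i][j] = C[i][j] + Arecv @ Brecv   # Cij += local sparse product
    return sp.bmat(C, format='csr')

A = sp.random(8, 8, density=0.3, format='csr', random_state=1)
B = sp.random(8, 8, density=0.3, format='csr', random_state=2)
print(np.allclose(sparse_summa(A, B, 2).toarray(), (A @ B).toarray()))  # True
```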

Page 28:

1D vs. 2D scaling for sparse matrix-matrix multiplication

[Fig. 4.10 (Buluç & Gilbert): Comparison of the SpGEMM implementation of Trilinos' EpetraExt package with our Sparse SUMMA implementation. The data labels on the plots show the speedup of our implementation over EpetraExt. (a) R-MAT × R-MAT product (scale 21), 9 to 256 cores: speedups from 1.6X up to 66X. (b) Multiplication of an R-MAT matrix of scale 23 with the restriction operator of order 8, 4 to 121 cores: speedups from 3.9X up to 65X.]

Finally, with the number of cores per node increasing due to multicore scaling, the contention on the network interface card increases. Without hierarchical parallelism that exploits the faster on-chip network, flat MPI parallelism will be unscalable, because more processes will be competing for the same network link. Therefore, designing hierarchical SpGEMM and SpRef algorithms is an important future direction.


Plot series:
•  1-D data layout
•  2-D data layout (Combinatorial BLAS)

Page 29:

Scaling to more processors…

Almost linear scaling until bandwidth costs start to dominate; scaling proportional to √p afterwards.

[Plot: seconds per multiplication (log scale, 0.125 to 16) vs. number of cores (4 to 4096) for the scale-21 problem; the compute-bound region has log-log slope -0.854, the bandwidth-bound region has slope -0.471]

T_comp = O(n/p) (compute bound),  T_comm = O(n/√p) (bandwidth bound)
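Reading the two slopes against the annotated costs (our gloss on the plot, not text from the slide): if the time per multiplication is modeled as T(p) ≈ a·n/p + b·n/√p, the compute term alone gives a log-log slope of -1, close to the measured -0.854, while the bandwidth term alone gives -1/2, close to the measured -0.471; the knee in the curve is where the second term overtakes the first.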

Page 30:

Combinatorial BLAS users (Sep 2013)

•  IBM (T.J. Watson, Zurich, & Tokyo)
•  Microsoft
•  Intel
•  Cray
•  Stanford
•  UC Berkeley
•  Carnegie-Mellon
•  Georgia Tech
•  Ohio State
•  Columbia
•  U Minnesota
•  King Fahd U
•  Tokyo Inst of Technology
•  Chinese Academy of Sciences
•  U Ghent (Belgium)
•  Bilkent U (Turkey)
•  U Canterbury (New Zealand)
•  Purdue
•  Indiana U
•  Mississippi State
•  UC Merced

Page 31:

Outline

•  Motivation

•  Sparse matrices for graph algorithms

•  CombBLAS: sparse arrays and graphs on parallel machines

•  KDT: attributed semantic graphs in a high-level language

•  Standards for graph algorithm primitives

Page 32:

Parallel graph analysis software

[Diagram: discrete structure analysis, built on graph theory, built on computers]

Page 33:

Parallel graph analysis software

[Software stack diagram: domain scientists work in the Knowledge Discovery Toolbox (KDT); graph algorithm developers work in the distributed and shared-address-space Combinatorial BLAS; HPC scientists and engineers supply communication support (MPI, GASNet, etc.) and threading support (OpenMP, Cilk, etc.)]

•  KDT is higher level (graph abstractions)
•  Combinatorial BLAS is for performance

Page 34:

Domain expert vs. graph expert

•  (Semantic) directed graphs
   –  constructors, I/O
   –  basic graph metrics (e.g., degree())
   –  vectors
•  Clustering / components
•  Centrality / authority: betweenness centrality, PageRank
•  Hypergraphs and sparse matrices
•  Graph primitives (e.g., bfsTree())
•  SpMV / SpGEMM on semirings

[Figure: input graph → largest component → Markov clustering → graph of clusters]

Page 35:

Domain expert vs. graph expert

The domain expert drives the pipeline from the figure (largest component, Markov clustering, contraction) in a few lines of KDT:

comp = bigG.connComp()
giantComp = comp.hist().argmax()
G = bigG.subgraph(comp==giantComp)
clusters = G.cluster('Markov')
clusNedge = G.nedge(clusters)
smallG = G.contract(clusters)
# visualize

Page 36:

Domain expert vs. graph expert

The graph expert can drop to the sparse-matrix level underneath the same interface:

[…]
L = G.toSpParMat()
d = L.sum(kdt.SpParMat.Column)
L = -L
L.setDiag(d)
M = kdt.SpParMat.eye(G.nvert()) - mu*L
pos = kdt.ParVec.rand(G.nvert())
for i in range(nsteps):
    pos = M.SpMV(pos)

Page 37:

Knowledge Discovery Toolbox
http://kdt.sourceforge.net/

A general graph library with operations based on linear algebraic primitives

•  Aimed at domain experts who know their problem well but don't know how to program a supercomputer
•  Easy-to-use Python interface
•  Runs on a laptop as well as a cluster with 10,000 processors
•  Open source software (New BSD license)
•  V0.3 release April 2013

Page 38:

A few KDT applications

PageRank
PageRank says a vertex is important if other important vertices link to it. Each vertex (webpage) votes by splitting its PageRank score evenly among its out edges (links). This broadcast (an SpMV) is followed by a normalization step (ColWise). Repeat until convergence. PageRank is the stationary distribution of a Markov chain that simulates a "random surfer".

Belief Propagation
Gaussian belief propagation (GaBP) is an iterative algorithm for solving the linear system of equations Ax = b, where A is symmetric positive definite. GaBP assumes each variable follows a normal distribution. It iteratively calculates the precision P and mean value µ of each variable; the converged mean-value vector approximates the actual solution. [Figure: each vertex updates its Pᵢ and µᵢ by exchanging messages with its neighbors]

Markov Clustering
Markov Clustering (MCL) finds clusters by postulating that a random walk that visits a dense cluster will probably visit many of its vertices before leaving. We use a Markov chain for the random walk. This process is reinforced by adding an inflation step that uses the Hadamard product and rescaling.

Betweenness Centrality
Betweenness Centrality says that a vertex is important if it appears on many shortest paths between other vertices. An exact computation requires a BFS for every vertex. A good approximation can be achieved by sampling starting vertices.
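The PageRank recipe above maps directly onto one SpMV plus a normalization per iteration; a minimal NumPy/SciPy sketch (ours, not KDT's implementation):

```python
import numpy as np
import scipy.sparse as sp

def pagerank(A, alpha=0.85, tol=1e-9):
    """Power iteration: broadcast scores along out-edges (SpMV),
    then normalize; repeat until convergence."""
    n = A.shape[0]
    out = np.asarray(A.sum(axis=1)).ravel()
    out[out == 0] = 1.0                        # guard against sink vertices
    M = sp.diags(1.0 / out) @ A                # split each score over out-edges
    x = np.full(n, 1.0 / n)
    while True:
        x_new = alpha * (M.T @ x) + (1.0 - alpha) / n
        x_new /= x_new.sum()                   # normalization step
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new

A = sp.csr_matrix(np.array([[0,1,1],[0,0,1],[1,0,0]], dtype=float))
print(pagerank(A))   # vertex 2, linked by both others, scores highest
```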

Page 39:

Attributed semantic graphs and filters

Example:
•  Vertex types: Person, Phone, Camera, Gene, Pathway
•  Edge types: PhoneCall, TextMessage, CoLocation, SequenceSimilarity
•  Edge attributes: Time, Duration

•  Calculate centrality just for emails among engineers sent between given start and end times

def onlyEngineers(self):
    return self.position == Engineer

def timedEmail(self, sTime, eTime):
    return ((self.type == email) and
            (self.Time > sTime) and (self.Time < eTime))

G.addVFilter(onlyEngineers)
G.addEFilter(timedEmail(start, end))

# rank via centrality based on recent email
# transactions among engineers
bc = G.rank('approxBC')

Page 40:

SEJITS for filter/semiring acceleration

[Diagram. Standard KDT: the KDT algorithm and its filter and semiring are Python; the CombBLAS primitive is C++ and must call back into Python for every filter/semiring application. KDT+SEJITS: SEJITS translation converts the Python filter and semiring into C++, so the CombBLAS primitive calls compiled C++ versions instead]

Page 41:

SEJITS for filter/semiring acceleration

[Same diagram as the previous slide]

Embedded DSL: Python for the whole application
•  Introspect, translate Python to equivalent C++ code
•  Call compiled/optimized C++ instead of Python

Page 42:

Filtered BFS with SEJITS

[Plot: time per BFS iteration in seconds (log scale, 0.25 to 64) vs. number of MPI processes (121, 256, 576, 1024, 2025) for KDT, SEJITS+KDT, and CombBLAS]

Time (in seconds) for a single BFS iteration on scale 25 RMAT (33M vertices, 500M edges) with 10% of elements passing filter. Machine is NERSC's Hopper.

Page 43:

Outline

•  Motivation

•  Sparse matrices for graph algorithms

•  CombBLAS: sparse arrays and graphs on parallel machines

•  KDT: attributed semantic graphs in a high-level language

•  Standards for graph algorithm primitives

Page 44:

History of BLAS

The Basic Linear Algebra Subroutines had a revolutionary impact on computational linear algebra.

•  Separation of concerns:
   –  Experts in mapping algorithms onto hardware tuned BLAS to specific platforms.
   –  Experts in linear algebra built software on top of the BLAS to obtain high performance "for free".
•  Today every computer, phone, etc. comes with /usr/lib/libblas!

•  BLAS 1 (vector ops): Lawson, Hanson, Kincaid, Krogh, 1979 → LINPACK
•  BLAS 2 (matrix-vector ops): Dongarra, Du Croz, Hammarling, Hanson, 1988 → LINPACK on vector machines
•  BLAS 3 (matrix-matrix ops): Dongarra, Du Croz, Hammarling, Duff, 1990 → LAPACK on cache based machines

Page 45:

Can we define and standardize the "Graph BLAS"?

•  No, it is not reasonable to define a universal set of graph algorithm building blocks:
   –  Huge diversity in matching algorithms to hardware platforms.
   –  No consensus on data structures and linguistic primitives.
   –  Lots of graph algorithms remain to be discovered.
   –  Early standardization can inhibit innovation.

•  Yes, it is reasonable to define a common set of graph algorithm building blocks … for graphs as linear algebra:
   –  Representing graphs in the language of linear algebra is a mature field.
   –  Algorithms, high level interfaces, and implementations vary.
   –  But the core primitives are well established.

Page 46:

Page 47:

Conclusion

•  It helps to look at things from two directions.

•  Sparse arrays and matrices yield useful primitives and algorithms for high-performance graph computation.

•  Graphs in the language of linear algebra are sufficiently mature to support a standard set of BLAS.