Sparse Matrices for High-Performance Graph Analyticsgilbert/talks/GilbertCLSAC17oct2013.pdf · 3...

1

Sparse Matrices for High-Performance Graph Analytics

John R. Gilbert University of California, Santa Barbara Chesapeake Large-Scale Analytics Conference October 17, 2013

Support: Intel, Microsoft, DOE Office of Science, NSF

2

Thanks …

Aydin Buluc (LBL), Kevin Deweese (UCSB), Erika Duriakova (Dublin), Armando Fox (UCB),

Shoaib Kamil (MIT), Jeremy Kepner (MIT), Adam Lugowski (UCSB), Tim Mattson (Intel),

Lenny Oliker (LBL), Carey Priebe (JHU), Steve Reinhardt (YarcData), Lijie Ren (Google), Eric Robinson (Lincoln), Viral Shah (UIDAI),

Veronika Strnadova (UCSB), Yun Teng (UCSB), Joshua Vogelstein (Duke), Drew Waranis (UCSB),

Sam Williams (LBL)

3

Outline

•  Motivation

•  Sparse matrices for graph algorithms

•  CombBLAS: sparse arrays and graphs on parallel machines

•  KDT: attributed semantic graphs in a high-level language

•  Standards for graph algorithm primitives

4

A few biological graph analysis problems

•  Connective abnormalities in schizophrenia [van den Heuvel et al.] –  Problem: understand disease from anatomical brain imaging –  Tools: betweenness centrality, shortest path length –  Results: global statistics on connection graph correlate w/ diagnosis

•  Genomics for biofuels [Strnadova et al.] –  Problem: scale to millions of markers times thousands of individuals –  Tools: min spanning tree, customized clustering –  Results: using much more data leads to much better genomic maps

•  Alignment and matching of brain scans [Vogelstein et al.] –  Problem: match corresponding functional regions across individuals –  Tools: graph partitioning, clustering, and more. . . –  Results: in progress

5

Computers

Continuous physical modeling

Linear algebra

Discrete structure analysis

Graph theory

Computers

The middleware of scientific computing

6

Top 500 List (June 2013)

= x P A L U

Top500 Benchmark: Solve a large system of linear equations

by Gaussian elimination

7

Graph 500 List (June 2013)

Graph500 Benchmark:

Breadth-first search in a large

power-law graph

1 2

3

4 7

6

5

8

Floating-Point vs. Graphs, June 2013

= x P A L U 1 2

3

4 7

6

5

33.8 Peta / 15.3 Tera is about 2200.

33.8 Petaflops 15.3 Terateps

9

Floating-Point vs. Graphs, June 2013

= x P A L U 1 2

3

4 7

6

5

Jun 2013: 33.8 Peta / 15.3 Tera ~ 2,200 Nov 2010: 2.5 Peta / 6.6 Giga ~ 380,000

15.3 Terateps 33.8 Petaflops

10

•  By analogy to numerical scientific computing. . .

•  What should the combinatorial BLAS look like?

The challenge of the software stack

C = A*B

y = A*x

µ = xT y

Basic Linear Algebra Subroutines (BLAS): Ops/Sec vs. Matrix Size

11

Outline

•  Motivation





12

Multiple-source breadth-first search

X

1 2

3

4 7

6

5

AT

13

Multiple-source breadth-first search

•  Sparse array representation => space efficient •  Sparse matrix-matrix multiplication => work efficient

•  Three possible levels of parallelism: searches, vertices, edges

X AT ATX

à

1 2

3

4 7

6

5

Graph contrac+on via sparse triple product

5

6

3

1 2

4

A1

A3 A2

A1

A2 A3

Contract

1 5 2 3 4 6 1

5

2 3 4

6

1 1 0 00 00 0 1 10 00 0 0 01 1

1 1 01 0 10 1 01 11 1

0 0 1

x x =

1 5 2 3 4 6 1 2 3

Subgraph extrac+on via sparse triple product

5

6

3

1 2

4

Extract 3

12

1 5 2 3 4 6 1

5

2 3 4

6

1 1 1 00 00 0 1 11 00 0 0 01 1

1 1 01 0 11 1 01 11 1

0 0 1

x x =

1 5 2 3 4 6 1 2 3

16

Betweenness centrality [Robinson 2008]

•  Slide on BC in CombBLAS, or at least array based.

17

Graph algorithms in the language of linear algebra

•  Kepner et al. study [2006]: fundamental graph algorithms including min spanning tree, shortest paths, independent set, max flow, clustering, …

•  SSCA#2 / centrality [2008]

•  Basic breadth-first search / Graph500 [2010]

•  Beamer et al. [2013] direction-optimizing breadth-first search, using CombBLAS

18

Identification of Primitives

Sparse matrix-matrix multiplication (SpGEMM)

Element-wise operations

×

Matrices over various semirings: (+ . x), (min . +), (or . and), …

Sparse matrix-dense vector multiplication Sparse matrix indexing

×

.*

Sparse array-based primitives

The case for sparse matrix graph primi+ves

Many irregular applications contain coarse-grained parallelism that can be exploited

by abstractions at the proper level.

Tradi-onal graph computa-ons Data driven,

unpredictable communica+on.

Irregular and unstructured, poor locality of reference

Fine grained data accesses, dominated by latency


Many irregular applications contain coarse-grained parallelism that can be exploited

by abstractions at the proper level.

Tradi-onal graph computa-ons

Graphs in the language of linear algebra

Data driven, unpredictable communica+on.

Fixed communica+on paHerns

Irregular and unstructured, poor locality of reference

Opera+ons on matrix blocks exploit memory hierarchy

Fine grained data accesses, dominated by latency

Coarse grained parallelism, bandwidth limited


21

Matrices over semirings

•  E.g. matrix multiplication C = AB (or matrix/vector):

Ci,j = Ai,1×B1,j + Ai,2×B2,j + · · · + Ai,n×Bn,j

•  Replace scalar operations × and + by ⊗ : associative, distributes over ⊕

⊕ : associative, commutative

•  Then Ci,j = Ai,1⊗B1,j ⊕ Ai,2⊗B2,j ⊕ · · · ⊕ Ai,n⊗Bn,j

•  Examples: ×.+ ; and.or ; +.min ; . . .

•  Same data reference pattern and control flow

22

Examples of semirings in graph algorithms

(R, +, x) Real Field

Standard numerical linear algebra

({0,1}, |, &) Boolean Semiring

Graph traversal

(R U {∞}, min, +) Tropical Semiring

Shortest paths

(R U {∞}, min, x) Select subgraph, or contract nodes to form quotient graph

(edge/vertex attributes, vertex data aggregation, edge data processing)

Schema for user-specified computation at vertices and edges

23

Outline

•  Motivation





24

Combinatorial BLAS: Functions

or run-time type information), all elementwise operationsbetween two sparse matrices may have a single functionprototype in the future.

On the other hand, the data access patterns of matrix–matrix and matrix–vector multiplications are independentof the underlying semiring. As a result, the sparsematrix–matrix multiplication routine (SpGEMM) and thesparse matrix–vector multiplication routine (SpMV) eachhave a single function prototype that accepts a parameterrepresenting the semiring, which defines user-specifiedaddition and multiplication operations for SpGEMM andSpMV.

(2) If an operation can be efficiently implemented bycomposing a few simpler operations, then we do notprovide a special function for that operator.

For example, making a non-zero matrix A column sto-chastic can be efficiently implemented by first callingREDUCE on A to get a dense row vector v that contains thesums of columns, then obtaining the multiplicative inverseof each entry in v by calling the APPLY function with theunary function object that performs f !vi" # 1=vi for everyvi it is applied to, and finally calling SCALE(V) onA to effec-tively divide each non-zero entry in a column by its sum.Consequently, we do not provide a special function to makea matrix column stochastic.

On the other hand, a commonly occurring operation is tozero out some of the non-zeros of a sparse matrix. Thisoften comes up in graph traversals, where Xk represents thekth frontier, the set of vertices that are discovered during

the kth iteration. After the frontier expansion ATXk ,previously discovered vertices can be pruned by performingan elementwise multiplication with a matrixY that includesa zero for every vertex that has been discovered before, andnon-zeros elsewhere. However, this approach might not bework-efficient as Y will often be dense, especially in theearly stages of the graph traversal.

Consequently, we provide a generalized functionSPEWISEX that performs the elementwise multiplicationof sparse matrices op!A" and op!B". It also accepts twoauxiliary parameters, notA and notB, that are used to negatethe sparsity structure of A and B. If notA is true, thenop!A"!i; j" # 0 for every non-zero A!i; j" 6# 0, andop!A"!i; j" # 1 for every zero A!i; j" # 0. The role ofnotB is identical. Direct support for the logical NOT oper-ations is crucial to avoid the explicit construction of thedense not!B" object.

(3) To avoid expensive object creation and copying, manyfunctions also have in-place versions. For operationsthat can be implemented in place, we deny access toany other variants only if those increase the runningtime.

Table 2. Summary of the current API for the Combinatorial BLAS

Function Applies to Parameters Returns Matlab Phrasing

Sparse Matrix A, B: sparse matricesSPGEMM (as friend) trA: transpose A if true Sparse Matrix C # A $ B

trB: transpose B if true

SPMV Sparse Matrix A: sparse matrices(as friend) x: sparse or dense vector(s) Sparse or Dense y # A $ x

trA: transpose A if true Vector(s)

Sparse Matrices A, B: sparse matricesSPEWISEX (as friend) notA: negate A if true Sparse Matrix C # A $ B

notB: negate B if true

Any Matrix dim: dimension to reduceREDUCE (as method) binop: reduction operator Dense Vector sum(A)

Sparse Matrix p: row indices vectorSPREF (as method) q: column indices vector Sparse Matrix B # A(p, q)

Sparse Matrix p: row indices vectorSPASGN (as method) q: column indices vector none A(p, q) # B

B: matrix to assign

Any Matrix rhs: any object Check guidingSCALE (as method) (except a sparse matrix) none principles 3 and 4

Any Vector rhs: any vector none noneSCALE (as method)

Any Object unop: unary operatorAPPLY (as method) (applied to non-zeros) None none

Buluc and Gilbert 5

at UNIV CALIFORNIA BERKELEY LIB on June 6, 2011hpc.sagepub.comDownloaded from

Combinatorial BLAS: Distributed-‐memory reference implementa+on

CommGrid

DCSC CSC CSB Triples

SpMat SpDistMat DenseDistMat

DistMat

Enforces interface only

Combinatorial BLAS func7ons and operators

DenseDistVec SpDistVec

FullyDistVec ... HAS A

Polymorphism

Matrix/vector distribu+ons, interleaved on each other.

5

8

€

x1,1

€

x1,2

€

x1,3

€

x2,1

€

x2,2

€

x2,3

€

x3,1

€

x3,2

€

x3,3

€

A1,1

€

A1,2

€

A1,3

€

A2,1

€

A2,2

€

A2,3

€

A3,1

€

A3,2

€

A3,3

€

n pr€

n pc

2D layout for sparse matrices & vectors

-‐ 2D matrix layout wins over 1D with large core counts and with limited bandwidth/compute -‐ 2D vector layout some+mes important for load balance

Default distribu+on in Combinatorial BLAS. Scalable with increasing number of processes

Parallel sparse matrix-‐matrix mul+plica+on algorithm

x =

€

Cij

100K 25K

5K

25K

100K

5K

A B C

2D algorithm: Sparse SUMMA (based on dense SUMMA) General implementation that handles rectangular matrices

Cij += HyperSparseGEMM(Arecv, Brecv)

1D vs. 2D scaling for sparse matrix-‐matrix mul+plica+on 16 BULUC AND GILBERT

0

5

10

15

20

25

30

35

9 36 64 121 150 180 256

Seco

nds

Number of Cores

1.6X

2.1X2.2X

3.1X

66XSpSUMMAEpetraExt

(a) R-MAT � R-MAT product (scale 21).

0

10

20

30

40

50

60

70

4 9 16 36 64 121

Seco

nds

Number of Cores

3.9X

32X

65XSpSUMMAEpetraExt

(b) Multiplication of an R-MAT matrix of scale23 with the restriction operator of order 8.

Fig. 4.10: Comparison of SpGEMM implementation of Trilinos’ EpetraExt packagewith our Sparse SUMMA implementation. The data labels on the plots show thespeedup of our implementation over EpetraExt.

rithms.Finally, with the number of cores per node increasing due to multicore scaling, the

contention on the network interface card increases. Without hierarchical parallelismthat exploits the faster on-chip network, the flat MPI parallelism will be unscalablebecause more processes will be competing for the same network link. Therefore,designing hierarchically SpGEMM and SpRef algorithms is an important future di-rection.

REFERENCES

[1] Franklin, Nersc’s Cray XT4 System. http://www.nersc.gov/users/computational-systems/franklin/.

[2] Mark Adams and James W. Demmel. Parallel multigrid solver for 3d unstructured finiteelement problems. In Supercomputing ’99: Proceedings of the 1999 ACM/IEEE conferenceon Supercomputing, page 27, New York, NY, USA, 1999. ACM.

[3] Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. Minimizing communication innumerical linear algebra. SIAM. J. Matrix Anal. & Appl, 32:pp. 866–901, 2011.

[4] William L. Briggs, Van Emden Henson, and Steve F. McCormick. A multigrid tutorial: secondedition. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.

[5] Aydın Buluc and John R. Gilbert. New ideas in sparse matrix-matrix multiplication. InJ. Kepner and J. Gilbert, editors, Graph Algorithms in the Language of Linear Algebra.SIAM, Philadelphia. 2011.

[6] Aydın Buluc and John R. Gilbert. Challenges and advances in parallel sparse matrix-matrixmultiplication. In ICPP’08: Proc. of the Intl. Conf. on Parallel Processing, pages 503–510,Portland, Oregon, USA, 2008. IEEE Computer Society.

[7] Aydın Buluc and John R. Gilbert. On the representation and multiplication of hypersparsematrices. In IPDPS’08: Proceedings of the 2008 IEEE International Symposium on Par-allel&Distributed Processing, pages 1–11. IEEE Computer Society, 2008.

[8] Aydın Buluc and John R. Gilbert. Highly parallel sparse matrix-matrix multiplication. Tech-nical Report UCSB-CS-2010-10, Computer Science Department, University of California,Santa Barbara, 2010. http://arxiv.org/abs/1006.2183.

[9] Aydın Buluc and John R. Gilbert. The Combinatorial BLAS: Design, implementation, andapplications. The International Journal of High Performance Computing Applications,online first, 2011.

•  1-D data layout •  2-D data layout (Combinatorial BLAS)

Almost linear scaling until bandwidth costs starts to dominate

Scaling to more processors…

Scaling proportional to √p afterwards

0.125

0.25

0.5

1

2

4

8

16

4 9 16 36 64 121 256

529 1024

2025 4096

Seco

nds

Number of Cores

Slope = -0.854

Slope = -0.471

Scale-21Compute bound

Bandwidth-bound

Tcomm =O(n p )

Tcomp =O(n)

30

Combinatorial BLAS users (Sep 2013)

•  IBM (T.J. Watson, Zurich, & Tokyo) •  Microsoft •  Intel •  Cray •  Stanford •  UC Berkeley •  Carnegie-Mellon •  Georgia Tech •  Ohio State •  Columbia •  U Minnesota

•  King Fahd U •  Tokyo Inst of Technology

•  Chinese Academy of Sciences

•  U Ghent (Belgium)

•  Bilkent U (Turkey)

•  U Canterbury (New Zealand)

•  Purdue •  Indiana U •  Mississippi State •  UC Merced

31

Outline

•  Motivation





Parallel graph analysis soVware


Graph theory

Computers

Parallel graph analysis soVware


Graph theory

Computers Communica+on Support (MPI, GASNet, etc)

Threading Support (OpenMP, Cilk, etc))

Distributed Combinatorial BLAS

Shared-‐address space Combinatorial BLAS

HPC scien+sts and engineers

Graph algorithm developers

Knowledge Discovery Toolbox (KDT)

•  KDT is higher level (graph abstrac+ons) •  Combinatorial BLAS is for performance

Domain scien+sts

•  (Seman+c) directed graphs –  constructors, I/O –  basic graph metrics (e.g., degree()) –  vectors

•  Clustering / components •  Centrality / authority: betweenness centrality, PageRank •  Hypergraphs and sparse matrices •  Graph primi+ves (e.g., bfsTree()) •  SpMV / SpGEMM on semirings

Domain expert vs. graph expert

Markov Clustering

Input Graph

Largest Component

Graph of Clusters

Markov Clustering

Input Graph

Largest Component

Graph of Clusters




comp = bigG.connComp() giantComp = comp.hist().argmax() G = bigG.subgraph(comp==giantComp) clusters = G.cluster(‘Markov’) clusNedge = G.nedge(clusters) smallG = G.contract(clusters) # visualize


Markov Clustering

Input Graph

Largest Component

Graph of Clusters



[…] L = G.toSpParMat() d = L.sum(kdt.SpParMat.Column) L = -L L.setDiag(d) M = kdt.SpParMat.eye(G.nvert()) – mu*L pos = kdt.ParVec.rand(G.nvert()) for i in range(nsteps): pos = M.SpMV(pos)

comp = bigG.connComp() giantComp = comp.hist().argmax() G = bigG.subgraph(comp==giantComp) clusters = G.cluster(‘Markov’) clusNedge = G.nedge(clusters) smallG = G.contract(clusters) # visualize

•  Aimed at domain experts who know their problem well but don’t know how to program a supercomputer

•  Easy-‐to-‐use Python interface •  Runs on a laptop as well as a cluster with 10,000 processors

•  Open source soVware (New BSD license) •  V0.3 release April 2013

A general graph library with opera+ons based on linear

algebraic primi+ves

Knowledge Discovery Toolbox hHp://kdt.sourceforge.net/

38

PageRank

0*1.2!/3%4!56(!%7608.*,6%98556%%

<'()b'*@$&'0&$'$

7)+2)?$%&$%6-3+2'*2$

%=$32.)+$%6-3+2'*2$

7)+4>)&$,%*@$23$%28$

e'>.$7)+2)?$c1)/-'()d$732)&$/0$&-,%}*($

%2&$<'()b'*@$&>3+)$)7)*,0$'63*($%2&$3;2$

)5()&$c,%*@&d8$#.%&$/+3'5>'&2$c'*$G-KPd$%&$

=3,,31)5$/0$'$*3+6',%N'43*$&2)-$

cF3,f%&)d8$b)-)'2$;*4,$>3*7)+()*>)8$$

$

<'()b'*@$%&$2.)$&2'43*'+0$5%&2+%/;43*$3=$'$

K'+@37$F.'%*$2.'2$&%6;,'2)&$'$~+'*536$

&;+=)+Ä8$

{$<R%$s$µ$R%$X$6)&&'()&$

/)21))*$*)%(./3+&$

!"#$"%&'

(%)*+,'-./'#,//*0,/'+1'-+/'2,-03415/'

M';&&%'*$/),%)=$-+3-'('43*$cM'H<d$%&$'*$

%2)+'47)$',(3+%2.6$=3+$&3,7%*($2.)$,%*)'+$

&0&2)6$3=$)W;'43*&$Y?$^$/D$1.)+)$Y$%&$

&066)2+%>$-3&%47)$5)T*%2)8$$

M'H<$'&&;6)&$)'>.$7'+%'/,)$=3,,31&$'$

*3+6',$5%&2+%/;43*8$92$%2)+'47),0$>',>;,'2)&$

2.)$-+)>%&%3*$<$'*5$6)'*$7',;)$µ$3=$)'>.$7'+%'/,)|$2.)$>3*7)+()5$6)'*:7',;)$7)>23+$

'--+3?%6'2)&$2.)$'>2;',$&3,;43*8$

Belief Propagation

Markov Clustering

K'+@37$F,;&2)+%*($cKFId$T*5&$>,;&2)+&$/0$

-3&2;,'4*($2.'2$'$+'*536$1',@$2.'2$7%&%2&$

'$5)*&)$>,;&2)+$1%,,$-+3/'/,0$7%&%2$6'*0$3=$

%2&$7)+4>)&$/)=3+)$,)'7%*(8$

$

f)$;&)$'$K'+@37$>.'%*$=3+$2.)$+'*536$

1',@8$#.%&$-+3>)&&$%&$+)%*=3+>)5$/0$'55%*($

'*$%*U'43*$&2)-$2.'2$;&)&$2.)$Z'5'6'+5$

-+35;>2$'*5$+)&>',%*(8$

6:8#!%0*1.2!/3%;<=,%>8,%?*,#!,%%

Betweenness Centrality

H)21))**)&&$F)*2+',%20$&'0&$2.'2$'$7)+2)?$

%&$%6-3+2'*2$%=$%2$'--)'+&$3*$6'*0$

&.3+2)&2$-'2.&$/)21))*$32.)+$7)+4>)&8$

Y*$)?'>2$>36-;2'43*$+)W;%+)&$'$HQG$=3+$

)7)+0$7)+2)?8$Y$(335$'--+3?%6'43*$>'*$

/)$'>.%)7)5$/0$&'6-,%*($&2'+4*($7)+4>)&8$

$

6:8#!%0*1.2!/3%@581A6*%B*00C6,6%

PageRank

0*1.2!/3%4!56(!%7608.*,6%98556%%

<'()b'*@$&'0&$'$

7)+2)?$%&$%6-3+2'*2$

%=$32.)+$%6-3+2'*2$

7)+4>)&$,%*@$23$%28$

e'>.$7)+2)?$c1)/-'()d$732)&$/0$&-,%}*($

%2&$<'()b'*@$&>3+)$)7)*,0$'63*($%2&$3;2$

)5()&$c,%*@&d8$#.%&$/+3'5>'&2$c'*$G-KPd$%&$

=3,,31)5$/0$'$*3+6',%N'43*$&2)-$

cF3,f%&)d8$b)-)'2$;*4,$>3*7)+()*>)8$$

$

<'()b'*@$%&$2.)$&2'43*'+0$5%&2+%/;43*$3=$'$

K'+@37$F.'%*$2.'2$&%6;,'2)&$'$~+'*536$

&;+=)+Ä8$

{$<R%$s$µ$R%$X$6)&&'()&$

/)21))*$*)%(./3+&$

!"#$"%&'

(%)*+,'-./'#,//*0,/'+1'-+/'2,-03415/'

M';&&%'*$/),%)=$-+3-'('43*$cM'H<d$%&$'*$

%2)+'47)$',(3+%2.6$=3+$&3,7%*($2.)$,%*)'+$

&0&2)6$3=$)W;'43*&$Y?$^$/D$1.)+)$Y$%&$

&066)2+%>$-3&%47)$5)T*%2)8$$

M'H<$'&&;6)&$)'>.$7'+%'/,)$=3,,31&$'$

*3+6',$5%&2+%/;43*8$92$%2)+'47),0$>',>;,'2)&$

2.)$-+)>%&%3*$<$'*5$6)'*$7',;)$µ$3=$)'>.$7'+%'/,)|$2.)$>3*7)+()5$6)'*:7',;)$7)>23+$

'--+3?%6'2)&$2.)$'>2;',$&3,;43*8$

Belief Propagation

Markov Clustering

K'+@37$F,;&2)+%*($cKFId$T*5&$>,;&2)+&$/0$

-3&2;,'4*($2.'2$'$+'*536$1',@$2.'2$7%&%2&$

'$5)*&)$>,;&2)+$1%,,$-+3/'/,0$7%&%2$6'*0$3=$

%2&$7)+4>)&$/)=3+)$,)'7%*(8$

$

f)$;&)$'$K'+@37$>.'%*$=3+$2.)$+'*536$

1',@8$#.%&$-+3>)&&$%&$+)%*=3+>)5$/0$'55%*($

'*$%*U'43*$&2)-$2.'2$;&)&$2.)$Z'5'6'+5$

-+35;>2$'*5$+)&>',%*(8$

6:8#!%0*1.2!/3%;<=,%>8,%?*,#!,%%


H)21))**)&&$F)*2+',%20$&'0&$2.'2$'$7)+2)?$

%&$%6-3+2'*2$%=$%2$'--)'+&$3*$6'*0$

&.3+2)&2$-'2.&$/)21))*$32.)+$7)+4>)&8$

Y*$)?'>2$>36-;2'43*$+)W;%+)&$'$HQG$=3+$

)7)+0$7)+2)?8$Y$(335$'--+3?%6'43*$>'*$

/)$'>.%)7)5$/0$&'6-,%*($&2'+4*($7)+4>)&8$

$

6:8#!%0*1.2!/3%@581A6*%B*00C6,6%

PageRank

0*1.2!/3%4!56(!%7608.*,6%98556%%

<'()b'*@$&'0&$'$

7)+2)?$%&$%6-3+2'*2$

%=$32.)+$%6-3+2'*2$

7)+4>)&$,%*@$23$%28$

e'>.$7)+2)?$c1)/-'()d$732)&$/0$&-,%}*($

%2&$<'()b'*@$&>3+)$)7)*,0$'63*($%2&$3;2$

)5()&$c,%*@&d8$#.%&$/+3'5>'&2$c'*$G-KPd$%&$

=3,,31)5$/0$'$*3+6',%N'43*$&2)-$

cF3,f%&)d8$b)-)'2$;*4,$>3*7)+()*>)8$$

$

<'()b'*@$%&$2.)$&2'43*'+0$5%&2+%/;43*$3=$'$

K'+@37$F.'%*$2.'2$&%6;,'2)&$'$~+'*536$

&;+=)+Ä8$

{$<R%$s$µ$R%$X$6)&&'()&$

/)21))*$*)%(./3+&$

!"#$"%&'

(%)*+,'-./'#,//*0,/'+1'-+/'2,-03415/'

M';&&%'*$/),%)=$-+3-'('43*$cM'H<d$%&$'*$

%2)+'47)$',(3+%2.6$=3+$&3,7%*($2.)$,%*)'+$

&0&2)6$3=$)W;'43*&$Y?$^$/D$1.)+)$Y$%&$

&066)2+%>$-3&%47)$5)T*%2)8$$

M'H<$'&&;6)&$)'>.$7'+%'/,)$=3,,31&$'$

*3+6',$5%&2+%/;43*8$92$%2)+'47),0$>',>;,'2)&$

2.)$-+)>%&%3*$<$'*5$6)'*$7',;)$µ$3=$)'>.$7'+%'/,)|$2.)$>3*7)+()5$6)'*:7',;)$7)>23+$

'--+3?%6'2)&$2.)$'>2;',$&3,;43*8$

Belief Propagation

Markov Clustering

K'+@37$F,;&2)+%*($cKFId$T*5&$>,;&2)+&$/0$

-3&2;,'4*($2.'2$'$+'*536$1',@$2.'2$7%&%2&$

'$5)*&)$>,;&2)+$1%,,$-+3/'/,0$7%&%2$6'*0$3=$

%2&$7)+4>)&$/)=3+)$,)'7%*(8$

$

f)$;&)$'$K'+@37$>.'%*$=3+$2.)$+'*536$

1',@8$#.%&$-+3>)&&$%&$+)%*=3+>)5$/0$'55%*($

'*$%*U'43*$&2)-$2.'2$;&)&$2.)$Z'5'6'+5$

-+35;>2$'*5$+)&>',%*(8$

6:8#!%0*1.2!/3%;<=,%>8,%?*,#!,%%


H)21))**)&&$F)*2+',%20$&'0&$2.'2$'$7)+2)?$

%&$%6-3+2'*2$%=$%2$'--)'+&$3*$6'*0$

&.3+2)&2$-'2.&$/)21))*$32.)+$7)+4>)&8$

Y*$)?'>2$>36-;2'43*$+)W;%+)&$'$HQG$=3+$

)7)+0$7)+2)?8$Y$(335$'--+3?%6'43*$>'*$

/)$'>.%)7)5$/0$&'6-,%*($&2'+4*($7)+4>)&8$

$

6:8#!%0*1.2!/3%@581A6*%B*00C6,6%

PageRank

0*1.2!/3%4!56(!%7608.*,6%98556%%

<'()b'*@$&'0&$'$

7)+2)?$%&$%6-3+2'*2$

%=$32.)+$%6-3+2'*2$

7)+4>)&$,%*@$23$%28$

e'>.$7)+2)?$c1)/-'()d$732)&$/0$&-,%}*($

%2&$<'()b'*@$&>3+)$)7)*,0$'63*($%2&$3;2$

)5()&$c,%*@&d8$#.%&$/+3'5>'&2$c'*$G-KPd$%&$

=3,,31)5$/0$'$*3+6',%N'43*$&2)-$

cF3,f%&)d8$b)-)'2$;*4,$>3*7)+()*>)8$$

$

<'()b'*@$%&$2.)$&2'43*'+0$5%&2+%/;43*$3=$'$

K'+@37$F.'%*$2.'2$&%6;,'2)&$'$~+'*536$

&;+=)+Ä8$

{$<R%$s$µ$R%$X$6)&&'()&$

/)21))*$*)%(./3+&$

!"#$"%&'

(%)*+,'-./'#,//*0,/'+1'-+/'2,-03415/'

M';&&%'*$/),%)=$-+3-'('43*$cM'H<d$%&$'*$

%2)+'47)$',(3+%2.6$=3+$&3,7%*($2.)$,%*)'+$

&0&2)6$3=$)W;'43*&$Y?$^$/D$1.)+)$Y$%&$

&066)2+%>$-3&%47)$5)T*%2)8$$

M'H<$'&&;6)&$)'>.$7'+%'/,)$=3,,31&$'$

*3+6',$5%&2+%/;43*8$92$%2)+'47),0$>',>;,'2)&$

2.)$-+)>%&%3*$<$'*5$6)'*$7',;)$µ$3=$)'>.$7'+%'/,)|$2.)$>3*7)+()5$6)'*:7',;)$7)>23+$

'--+3?%6'2)&$2.)$'>2;',$&3,;43*8$

Belief Propagation

Markov Clustering

K'+@37$F,;&2)+%*($cKFId$T*5&$>,;&2)+&$/0$

-3&2;,'4*($2.'2$'$+'*536$1',@$2.'2$7%&%2&$

'$5)*&)$>,;&2)+$1%,,$-+3/'/,0$7%&%2$6'*0$3=$

%2&$7)+4>)&$/)=3+)$,)'7%*(8$

$

f)$;&)$'$K'+@37$>.'%*$=3+$2.)$+'*536$

1',@8$#.%&$-+3>)&&$%&$+)%*=3+>)5$/0$'55%*($

'*$%*U'43*$&2)-$2.'2$;&)&$2.)$Z'5'6'+5$

-+35;>2$'*5$+)&>',%*(8$

6:8#!%0*1.2!/3%;<=,%>8,%?*,#!,%%


H)21))**)&&$F)*2+',%20$&'0&$2.'2$'$7)+2)?$

%&$%6-3+2'*2$%=$%2$'--)'+&$3*$6'*0$

&.3+2)&2$-'2.&$/)21))*$32.)+$7)+4>)&8$

Y*$)?'>2$>36-;2'43*$+)W;%+)&$'$HQG$=3+$

)7)+0$7)+2)?8$Y$(335$'--+3?%6'43*$>'*$

/)$'>.%)7)5$/0$&'6-,%*($&2'+4*($7)+4>)&8$

$

6:8#!%0*1.2!/3%@581A6*%B*00C6,6%

A few KDT applications

Example: •  Vertex types: Person, Phone,

Camera, Gene, Pathway •  Edge types: PhoneCall, TextMessage,

CoLoca+on, SequenceSimilarity •  Edge aHributes: Time, Dura+on

•  Calculate centrality just for emails among engineers sent between given start and end +mes

AHributed seman+c graphs and filters

def onlyEngineers (self): return self.position == Engineer def timedEmail (self, sTime, eTime): return ((self.type == email) and (self.Time > sTime) and (self.Time < eTime)) G.addVFilter(onlyEngineers) G.addEFilter(timedEmail(start, end)) # rank via centrality based on recent email transactions among engineers bc = G.rank(’approxBC’)

KDT$Algorithm$

CombBLAS$Primi4ve$

Filter$(Py)$

Python'

C++'

Semiring$(Py)$KDT$Algorithm$

CombBLAS$Primi4ve$ Filter$(C++)$

Semiring$(C++)$

Standard$KDT$ KDT+SEJITS$

SEJITS$$$$Transla4on$

Filter$(Py)$

Semiring$(Py)$

SEJITS for filter/semiring accelera+on

KDT$Algorithm$

CombBLAS$Primi4ve$

Filter$(Py)$

Python'

C++'

Semiring$(Py)$KDT$Algorithm$

CombBLAS$Primi4ve$ Filter$(C++)$

Semiring$(C++)$

Standard$KDT$ KDT+SEJITS$

SEJITS$$$$Transla4on$

Filter$(Py)$

Semiring$(Py)$

SEJITS for filter/semiring accelera+on

Embedded DSL: Python for the whole applica+on •  Introspect, translate Python to equivalent C++ code •  Call compiled/op+mized C++ instead of Python

Filtered BFS with SEJITS

!"#$%!"$!%&"!!%#"!!%'"!!%("!!%&)"!!%*#"!!%)'"!!%

&#&% #$)% $+)% &!#'% #!#$%

!"#$%&'

(%)*

"%

+,*-".%/0%!12%3./4"55"5%

,-.% /012./3,-.% 456789:/%

Time (in seconds) for a single BFS itera+on on scale 25 RMAT (33M ver+ces, 500M edges) with 10% of elements passing filter. Machine is NERSC’s Hopper.

43

Outline

•  Motivation





44

History of BLAS

•  Separation of concerns: •  Experts in mapping algorithms onto hardware tuned BLAS to specific platforms.

•  Experts in linear algebra built software on top of the BLAS to obtain high performance “for free”.

•  Today every computer, phone, etc. comes with /usr/lib/libblas!

The Basic Linear Algebra Subroutines had a revolutionary impact

on computational linear algebra.

BLAS 1 vector ops Lawson, Hanson, Kincaid, Krogh, 1979

LINPACK

BLAS 2 matrix-vector ops Dongarra, Du Croz, Hammarling, Hanson, 1988

LINPACK on vector machines

BLAS 3 matrix-matrix ops Dongarra, Du Croz, Hammarling, Hanson, 1990

LAPACK on cache based machines

45

•  No, it is not reasonable to define a universal set of graph algorithm building blocks: –  Huge diversity in matching algorithms to hardware platforms.

–  No consensus on data structures and linguistic primitives. –  Lots of graph algorithms remain to be discovered.

–  Early standardization can inhibit innovation.

•  Yes, it is reasonable to define a common set of graph algorithm building blocks … for Graphs as Linear Algebra: –  Representing graphs in the language of linear algebra is a mature

field.

–  Algorithms, high level interfaces, and implementations vary. –  But the core primitives are well established.

Can we define and standardize the “Graph BLAS”?

47

Conclusion

•  It helps to look at things from two directions.

•  Sparse arrays and matrices yield useful primitives and algorithms for high-performance graph computation.

•  Graphs in the language of linear algebra are sufficiently mature to support a standard set of BLAS.

Sparse Matrices for High-Performance Graph Analyticsgilbert/talks/GilbertCLSAC17oct2013.pdf · 3...

Documents

Transcript of Sparse Matrices for High-Performance Graph Analyticsgilbert/talks/GilbertCLSAC17oct2013.pdf · 3...