The Vampir Performance Analysis Tool Hans–Christian Hoppe Gesellschaft für Parallele Anwendungen und Systeme mbH Pallas GmbH Hermülheimer Straße 10 D-50321 Brühl, Germany [email protected] http://www.pallas.com SCICOMP 2000 Tutorial, San Diego


The Vampir Performance Analysis Tool

Hans–Christian Hoppe

Gesellschaft für Parallele Anwendungen und Systeme mbH

Pallas GmbH
Hermülheimer Straße 10
D-50321 Brühl, Germany

[email protected]
http://www.pallas.com

SCICOMP 2000 Tutorial, San Diego


© Pallas GmbH

Outline

Performance tools for parallel programming

Performance analysis for MPI

The Vampir tool

The Vampir roadmap


Why performance tools?

CPUs and interconnects are getting faster all the time

Compilers are improving

“Abundance of computing power”

Shouldn’t it be sufficient to just write an application and let the system do the rest?


Why performance tools?

In reality, there remain severe performance bottlenecks
– slow memory access (instructions and data)
– cache consistency effects
– starvation of instruction units
– contention of interconnection systems
– adverse interaction with schedulers


Why performance tools?

The application programmer does the rest
– excessive sequential sections
– bad load balance
– non–optimized communication patterns
– excessive synchronization

Performance analysis tools can
– help to diagnose system–level performance problems
– help to identify user–level performance bottlenecks
– assist the users in improving their applications


Achieved performance vs. effort

[Figure: code performance plotted against effort, with curves for OpenMP and MPI. Annotations: "Code doesn’t work", "KAP, Debuggers", "Performance tools".]


Performance tools – goals?

Holy grail
– Automatic parallelisation and optimization
– One code version for sequential and parallel
– One code version for all platforms
– Automatic code verification
– Automatic performance verification
– Automatic detection of performance problems
– Integration of performance analysis and parallelisation


Event–based MPI Analysis

Record trace of application execution
– Calls to MPI and user routines
– MPI communication events
– Source locations
– Values of performance registers or program variables

From a trace, a performance analysis tool can show
– Protocol of execution over time
– Statistics for MPI routine execution
– Statistics for communication
– Dynamic calling tree

Important advantage
– Focus on any phase of the execution
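As an illustration of the event-based approach, the following minimal Python sketch models a trace as a list of timestamped records and computes statistics for an arbitrary time window. The record types `CallEvent` and `MessageEvent`, their fields, and the sample data are assumptions made for this example; they are not the Vampirtrace record format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEvent:
    time: float     # timestamp in seconds
    process: int    # MPI rank
    kind: str       # "enter" or "leave"
    routine: str    # MPI or user routine name


@dataclass(frozen=True)
class MessageEvent:
    time: float
    sender: int
    receiver: int
    tag: int
    nbytes: int


def window(events, t_start, t_end):
    """Keep only the events with t_start <= time < t_end."""
    return [e for e in events if t_start <= e.time < t_end]


def summarize(events):
    """Return (number of calls per routine, total bytes sent)."""
    calls = Counter(
        e.routine for e in events
        if isinstance(e, CallEvent) and e.kind == "enter"
    )
    sent = sum(e.nbytes for e in events if isinstance(e, MessageEvent))
    return dict(calls), sent


# Hypothetical trace of one process with two sends.
trace = [
    CallEvent(0.0, 0, "enter", "solve"),
    CallEvent(1.0, 0, "enter", "MPI_Send"),
    MessageEvent(1.1, 0, 1, 7, 4096),
    CallEvent(1.2, 0, "leave", "MPI_Send"),
    CallEvent(5.0, 0, "enter", "MPI_Send"),
    MessageEvent(5.1, 0, 1, 7, 1024),
    CallEvent(5.2, 0, "leave", "MPI_Send"),
    CallEvent(6.0, 0, "leave", "solve"),
]

early_phase = summarize(window(trace, 0.0, 2.0))
late_phase = summarize(window(trace, 4.0, 7.0))
```

Because every record carries a timestamp, the same statistics can be recomputed for any phase simply by changing the window bounds.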


Vampirtrace details

Vampirtrace™
– Instrumentation library producing traces for Vampir and Dimemas
– Supports MPI–1 (incl. collective operations) and MPI–I/O
– Exploits MPI profiling interface
– Works with vendor MPI implementations
– API for user–level instrumentation
– Capability to filter for event subsets

Developed, productized and marketed by Pallas

Available for IBM SP, PE 3.x
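The MPI profiling interface lets a library supply its own version of an MPI routine, record an event, and forward the call to the name-shifted `PMPI_` entry point. The Python sketch below shows the same interposition idea in miniature, together with a simple event filter. The `Tracer` class and the routines `exchange` and `compute` are hypothetical names for this example only; this is not the Vampirtrace API.

```python
import functools
import time


class Tracer:
    def __init__(self, enabled=None):
        # None records every routine; otherwise only the named routines.
        self.enabled = enabled
        self.records = []   # (timestamp, "enter" | "leave", routine name)

    def wants(self, name):
        return self.enabled is None or name in self.enabled

    def instrument(self, func):
        """Wrap func so that enter/leave events are recorded around the call."""
        name = func.__name__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not self.wants(name):
                return func(*args, **kwargs)   # filtered out: just forward
            self.records.append((time.perf_counter(), "enter", name))
            try:
                return func(*args, **kwargs)
            finally:
                self.records.append((time.perf_counter(), "leave", name))

        return wrapper


tracer = Tracer(enabled={"exchange"})   # filter: trace only "exchange"


@tracer.instrument
def exchange(x):
    return x + 1


@tracer.instrument
def compute(x):
    return x * 2


exchange(1)
compute(3)
recorded = [(kind, name) for _, kind, name in tracer.records]
```

Only the two events for `exchange` end up in `recorded`; the call to `compute` is forwarded without being traced.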


Vampir details

Vampir™
– Event–trace visualization tool
– Analyzes MPI and user routines
– Analyzes point–to–point, collective and MPI–IO operations
– Focus on arbitrary execution phases
– Execution and communication statistics
– Filter processes, messages, and user/MPI routines

Jointly developed by TU Dresden and Pallas

Productized and marketed by Pallas

Available for IBM RS6000, AIX 4.2/AIX 4.3


Dimemas details

Dimemas
– Event–based performance prediction tool
– Parameterized machine model
  • CPU performance
  • Communication and network performance
– Predicts performance on modeled platform
– What–if analysis determines influence of parameters

Jointly developed by UPC Barcelona and Pallas

Productized and marketed by Pallas

Available for IBM RS6000, AIX 4.2/AIX 4.3
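To make the idea of a parameterized machine model concrete, here is a deliberately simple Python sketch of a what-if analysis. The latency-plus-bandwidth cost formula, the `Machine` fields, and all numbers are assumptions chosen for illustration; the actual Dimemas model is not described in these slides and is certainly more detailed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Machine:
    cpu_speed: float   # relative to the machine the trace was recorded on
    latency: float     # seconds per message
    bandwidth: float   # bytes per second


def predict(compute_seconds, message_sizes, machine):
    """Predicted run time of one process, without compute/communication overlap."""
    compute = compute_seconds / machine.cpu_speed
    comm = sum(machine.latency + n / machine.bandwidth for n in message_sizes)
    return compute + comm


# Hypothetical workload: 2 s of computation and ten 1 MB messages.
sizes = [1_000_000] * 10
base = Machine(cpu_speed=1.0, latency=50e-6, bandwidth=100e6)

t_base = predict(2.0, sizes, base)
# What if the network had twice the bandwidth?
t_fast_net = predict(2.0, sizes, replace(base, bandwidth=200e6))
# What if the CPU were twice as fast?
t_fast_cpu = predict(2.0, sizes, replace(base, cpu_speed=2.0))
```

With these made-up numbers the run is compute-bound, so doubling the CPU speed helps far more than doubling the network bandwidth. Ranking parameters by their influence in this way is the purpose of a what-if analysis.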


Vampir main window

Vampir 2.5 main window

– Tracefile loading can be interrupted at any time
– Tracefile loading can be resumed
– Tracefile can be loaded starting at a specified time offset
– Tracefile can be re–written


Summary chart

Aggregated profiling information
– Execution time
– Number of calls

Inclusive or exclusive of called routines
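The difference between inclusive and exclusive time can be sketched in a few lines of Python. The example assumes a time-ordered list of (time, kind, routine) enter/leave events for a single process; that event layout is an assumption for this illustration, not Vampir's internal representation. Recursive routines would have their inclusive time counted once per nesting level in this simple version.

```python
from collections import defaultdict


def profile(events):
    """Return (calls, inclusive, exclusive) dictionaries keyed by routine name."""
    calls = defaultdict(int)
    inclusive = defaultdict(float)
    exclusive = defaultdict(float)
    stack = []   # entries: [routine, enter_time, time_spent_in_callees]
    for t, kind, routine in events:
        if kind == "enter":
            stack.append([routine, t, 0.0])
            calls[routine] += 1
        else:
            name, t_enter, in_callees = stack.pop()
            total = t - t_enter
            inclusive[name] += total                # includes called routines
            exclusive[name] += total - in_callees   # excludes called routines
            if stack:
                stack[-1][2] += total               # charge the caller
    return dict(calls), dict(inclusive), dict(exclusive)


# Hypothetical events: "solve" runs for 10 s and spends 2 s inside MPI_Send.
events = [
    (0.0, "enter", "solve"),
    (1.0, "enter", "MPI_Send"),
    (3.0, "leave", "MPI_Send"),
    (10.0, "leave", "solve"),
]
calls, inclusive, exclusive = profile(events)
```

Here `solve` has an inclusive time of 10 s but an exclusive time of 8 s, because the 2 s spent in `MPI_Send` is attributed to the callee.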


Vampir state model

User specifies activities and symbol grouping

Look at all/any activities or all symbols

[Figure: summary chart. Activities shown: Calculation, Tracing, MPI. Symbols shown: MPI_Send, MPI_Recv, MPI_Wait, ssor, exchange.]


Timeline display

To zoom, mark region with the mouse


Timeline display – message details

Click on message line

Message receive op

Message send op

Message information


Communication statistics

Message statistics for each process/node pair:
– Byte and message count
– min/max/avg message length, bandwidth
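A per-pair statistics matrix of this kind can be derived from message records along the following lines. The tuple layout (sender, receiver, bytes, transfer seconds) and the sample values are assumptions for this Python sketch.

```python
from collections import defaultdict


def pair_statistics(messages):
    """Group messages by (sender, receiver) and compute summary statistics."""
    groups = defaultdict(list)
    for sender, receiver, nbytes, seconds in messages:
        groups[(sender, receiver)].append((nbytes, seconds))

    stats = {}
    for pair, msgs in groups.items():
        lengths = [n for n, _ in msgs]
        rates = [n / s for n, s in msgs if s > 0]   # bytes per second
        stats[pair] = {
            "messages": len(msgs),
            "bytes": sum(lengths),
            "min_len": min(lengths),
            "max_len": max(lengths),
            "avg_len": sum(lengths) / len(lengths),
            "avg_bandwidth": sum(rates) / len(rates) if rates else None,
        }
    return stats


# Hypothetical messages between ranks 0 and 1.
messages = [
    (0, 1, 1000, 0.001),
    (0, 1, 3000, 0.001),
    (1, 0, 500, 0.002),
]
stats = pair_statistics(messages)
```

Each entry of `stats` corresponds to one cell of the sender/receiver matrix.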


Message histograms

Message statistics by length, tag or communicator
– Byte and message count
– Min/max/avg bandwidth
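The histogram view differs from the pair statistics only in the grouping key. The Python sketch below groups by an arbitrary key function; the power-of-two length bins and the dictionary fields `nbytes`, `tag` and `comm` are assumptions made for this example.

```python
from collections import defaultdict


def length_bin(nbytes):
    """Lower bound of the power-of-two bin that contains nbytes (0 stays 0)."""
    return 0 if nbytes == 0 else 1 << (nbytes.bit_length() - 1)


def histogram(messages, key):
    """Count messages and bytes per bucket; key maps a message to its bucket."""
    buckets = defaultdict(lambda: {"messages": 0, "bytes": 0})
    for msg in messages:
        bucket = buckets[key(msg)]
        bucket["messages"] += 1
        bucket["bytes"] += msg["nbytes"]
    return dict(buckets)


# Hypothetical message records.
msgs = [
    {"nbytes": 1000, "tag": 1, "comm": "world"},
    {"nbytes": 1500, "tag": 2, "comm": "world"},
    {"nbytes": 4096, "tag": 1, "comm": "world"},
]
by_length = histogram(msgs, lambda m: length_bin(m["nbytes"]))
by_tag = histogram(msgs, lambda m: m["tag"])
by_comm = histogram(msgs, lambda m: m["comm"])
```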


Collective operations

For each process: mark operation locally

Connect start/stop points by lines

Start of op

Data being sent

Data being received

Stop of op

Connection lines


Collective operations

Click on collective operation display

See global timing info

See local timing info


MPI–I/O operations

I/O transfers are shown as lines

Click on I/O line

See detailed I/O information


Activity chart

Profiling information for all processes


Global calling tree

Display for each symbol:
– Number of calls, min/max. execution time

Fold/unfold or restrict to subtrees
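A dynamic calling tree with call counts and min/max execution times can be built from enter/leave events as in the following Python sketch. It handles the event stream of one process; a global tree would merge the per-process trees. The `Node` class and the event layout are assumptions for this example.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.calls = 0
        self.min_time = float("inf")
        self.max_time = 0.0
        self.children = {}   # routine name -> Node


def build_tree(events):
    """events: time-ordered (time, kind, routine) tuples of one process."""
    root = Node("<root>")
    path = [(root, 0.0)]   # current call path with enter times
    for t, kind, routine in events:
        if kind == "enter":
            parent = path[-1][0]
            if routine not in parent.children:
                parent.children[routine] = Node(routine)
            path.append((parent.children[routine], t))
        else:
            node, t_enter = path.pop()
            node.calls += 1
            node.min_time = min(node.min_time, t - t_enter)
            node.max_time = max(node.max_time, t - t_enter)
    return root


def render(node, depth=0):
    """Return one indented text line per tree node below `node`."""
    lines = []
    for child in node.children.values():
        lines.append(f"{'  ' * depth}{child.name}: calls={child.calls}")
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical events: main calls solve twice.
events = [
    (0.0, "enter", "main"),
    (1.0, "enter", "solve"),
    (2.0, "leave", "solve"),
    (3.0, "enter", "solve"),
    (6.0, "leave", "solve"),
    (7.0, "leave", "main"),
]
tree = build_tree(events)
text = render(tree)
```

Restricting the display to a subtree amounts to calling `render` on one of the child nodes instead of the root.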


Process–local displays

– Timeline (showing calling levels)
– Activity chart
– Calling tree (showing number of calls)


Effects of zooming

Select one iteration

Updated summary

Updated message statistics


Compare traces

Compare profiling information
– To check load balance (between processes)
– To evaluate scalability (different runs)
– To look at optimization effects (different code versions)

Compare processes 6 and 19

Comparison by routine


Coupling Vampir and Dimemas

Actual program run

vs.

Ideal communication


Vampir/Vampirtrace roadmap

Ongoing developments
– Scalability enhancements
– Functionality enhancements
– Instrumentation enhancements

Will be first available commercially on NEC and Compaq platforms
– Earth Simulator
– ASCI machines

PathForward developments for ASCI machines


Scalability challenges

Scalability in processor count
– ASCI–class machines have 1000s of processors
– High–end systems have 100s of processors
– Applications use most of them

Scalability in time
– Need to analyze actual production runs (hours/days)

Scalability in detail
– Record and analyze system–specific performance data
– Support for threaded and hybrid models


Scalability problems

Counter–based profiling tools are basically OK
– Severely limited in the level of detail
– Can’t focus on parts of the application run

Event–based tools have problems
– Event traces get really large
– Display tools use huge amounts of memory
– Many displays do not scale

Example: Vampir tracefiles for NAS NPB–LU
– 128 processes: 3,000,000 records (120 Mbyte)
– 256 processes: 15,000,000 records (600 Mbyte)
– 512 processes: 150,000,000 records (6 Gbyte)
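The NPB–LU figures can be checked with a little arithmetic, shown here as a short Python sketch. It assumes 1 Mbyte = 10^6 bytes and 1 Gbyte = 10^9 bytes; the slide does not say which convention is meant.

```python
# processes -> (trace records, tracefile size in bytes), taken from the slide
examples = {
    128: (3_000_000, 120e6),
    256: (15_000_000, 600e6),
    512: (150_000_000, 6e9),
}

bytes_per_record = {p: size / recs for p, (recs, size) in examples.items()}
records_per_process = {p: recs / p for p, (recs, _) in examples.items()}

# Growth of the record count each time the process count doubles.
growth_128_to_256 = examples[256][0] / examples[128][0]
growth_256_to_512 = examples[512][0] / examples[256][0]
```

Under that assumption every record takes 40 bytes in all three cases, so the file size tracks the record count exactly. The record count grows by a factor of 5 and then 10 while the process count only doubles, which is why the trace volume becomes the limiting factor.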


Threaded programming models

Enhance Vampir to display
– Thread fork/join
– Thread synchronization
– Show a timeline per thread / aggregate threads into single timeline
– Display subroutine/code block execution for each thread

Create instrumentation library for thread packages

Integrate instrumentation capability into OpenMP systems


Cluster node display

Cluster information is already recorded

Enhance Vampir to
– show aggregate execution information per node
– show communication volume per node


Cluster timeline display

– Display node–level information
– Show communication volume within nodes
– Show communication between nodes as usual
– Allow nodes to be expanded into processes

There may be more than two hierarchy levels ...


Cluster timeline display


Structured tracefile format

Subdivide the tracefile into frames
– Time intervals, thread/process/node subsets

Put frame data
– All in one file (as today)
– In multiple files (one per frame ...)
– On a parallel filesystem (exploit parallelism)

Frame index file holds
– Location of frame start/end
– Frame statistic data for immediate display
– “Frame thumbnail”
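One possible shape for such a frame index is sketched below in Python. All field names, the sample file names, and the lookup function are hypothetical; the slides describe the concept of a structured tracefile, not a concrete layout.

```python
from dataclasses import dataclass, field


@dataclass
class FrameEntry:
    t_start: float      # start of the frame's time interval (seconds)
    t_end: float        # end of the frame's time interval (seconds)
    processes: range    # subset of processes covered by the frame
    path: str           # file that holds the frame data
    offset: int         # byte offset of the frame start within that file
    length: int         # byte length of the frame
    stats: dict = field(default_factory=dict)   # precomputed statistics


def frames_for(index, t0, t1, process):
    """Select the frames that overlap [t0, t1) and cover the given process."""
    return [
        f for f in index
        if f.t_start < t1 and f.t_end > t0 and process in f.processes
    ]


# Hypothetical index: two time intervals, one file per frame.
index = [
    FrameEntry(0.0, 10.0, range(0, 64), "run.frame0", 0, 4_000_000,
               {"MPI_Send calls": 1200}),
    FrameEntry(10.0, 20.0, range(0, 64), "run.frame1", 0, 6_000_000,
               {"MPI_Send calls": 1900}),
]
needed = frames_for(index, 12.0, 15.0, process=3)
```

A viewer can display the `stats` of every entry straight from the index and read only the frames returned by `frames_for` when the user zooms in.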


Structured tracefile format

Vampir loads the frame index

Displays immediately available
– Global profiling/communication statistics
– By–frame profiling/communication statistics
– Thumbnail timeline

User gets overview of application run
– Can load particular frame data
– Can navigate between frames

User can refine instrumentation/tracing
– Get detailed trace of interesting frames


Dynamic tracing control

What can be controlled
– Definition of frames
– Data to be recorded per frame

Control methods
– Instrumentation with Vampirtrace API
– Binary instrumentation (atom) or use of a debugger
– Configuration file
– Interactive control agent (debugger)

Tracing the right data is an iterative process!


Cluster timeline display

For very large systems, still can’t look at complete system (too many nodes)

Display “interesting” nodes only
– Regarding communication volume/delays
– Regarding load imbalance
– Regarding execution times of particular code modules
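One way to pick "interesting" nodes is to rank them by how far a metric deviates from the mean over all nodes. The Python sketch below does this for a single metric. The selection criterion, the metric name `busy` and the sample values are assumptions for illustration; the slides do not specify how Vampir would choose the nodes.

```python
def most_interesting(node_metrics, metric, count):
    """Return the `count` nodes whose metric deviates most from the mean."""
    values = {node: m[metric] for node, m in node_metrics.items()}
    mean = sum(values.values()) / len(values)
    ranked = sorted(values, key=lambda n: abs(values[n] - mean), reverse=True)
    return ranked[:count]


# Hypothetical per-node busy times in seconds; n2 is clearly underloaded.
node_metrics = {
    "n0": {"busy": 10.0},
    "n1": {"busy": 10.5},
    "n2": {"busy": 4.0},
    "n3": {"busy": 9.5},
}
selected = most_interesting(node_metrics, "busy", 2)
```

The same function works for communication volume or the execution time of a particular code module by passing a different metric name.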


Scalable Vampir structure

Scalable user–interface

Scalable internals

[Diagram labels: Vampir SC (User Interaction, Trace Data Processing, Trace Data I/O); Vampir DC (User Interaction, Trace Data Analysis, Display Handling); Structured Trace Data; connections labelled Data and Control. Annotations: "runs on WS", "runs on parallel system", "may exploit parallel FS".]


Access to Pallas tools

Download free evaluation copies from

http://www.pallas.com