Fakultät für informatik informatik 12 technische universität dortmund Worst-Case Execution Time...

38
fakultät für informatik informatik 12 technische universität dortmund Worst-Case Execution Time Analysis - Session 19 - Heiko Falk TU Dortmund Informatik 12 Germany Slides use Microsoft cliparts. All Microsoft restrictions apply.

Transcript of Fakultät für informatik informatik 12 technische universität dortmund Worst-Case Execution Time...

fakultät für informatikinformatik 12

technische universität dortmund

Worst-CaseExecution Time Analysis

- Session 19 -

Heiko FalkTU DortmundInformatik 12

Germany

Slides use Microsoft cliparts. All Microsoft restrictions apply.

- 2 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Schedule of the course

Time Monday Tuesday Wednesday Thursday Friday

09:30-11:00

1: Orientation, introduction

2: Models of computation + specs

5: Models of computation + specs

9: Mapping of applications to platforms

13: Memory aware compilation

17: Memory aware compilation

11:00  Brief break  Brief break Brief break   Brief break

11:15-12:30

6: Lab*: Ptolemy

10: Lab*: Scheduling

14: Lab*: Mem. opt.

18: Lab*: Mem. opt.

12:30 Lunch Lunch Lunch Lunch Lunch

14:00-15:20

3: Models of computation + specs

7: Mapping of applications to platforms

11: High-level optimizations*

15: Memory aware compilation

19: WCET & compilers*

15:20 Break  Break Break Break Break

15:40-17:00

4: Lab*: Kahn process networks

8: Mapping of applications to platforms

12: High-level optimizations*

16: Memory aware compilation

20: Wrap-up

* Dr. Heiko Falk

- 3 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Outline

Worst-Case Execution Time Definition Static WCET Analysis WCET-aware Compilation and Optimization

Function Specialization / Procedure Cloning Introduction and Code Examples Function Specialization and WCETEST

ResultsReferences & Summary

- 4 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Worst-Case Execution Time

Definition of the Worst-Case Execution Time (WCET): Upper bound for the execution time of a program irrespective of the program’s input data.

Problem:Determination of a program’s actual WCET is not computable!(WCET computation can be reduced to the Halting Problem)

- 5 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Estimation of Worst-Case Execution Times

Solution: Estimation of upper bounds for the actual (unknown) WCET

Requirements on WCET estimates: Safeness: WCET WCETEST! Tightness: WCETEST – WCET minimal

- 6 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Loop BoundAnalysis

AbsInt’sC R L2

W C ETEstim ates

D ecoderexec2crl

ValueAnalysis

PathAnalysis

P ipelineAnalysis

Executable C acheAnalysis

Workflow of the static WCET-Analyzer aiT

Input: Binary executable program P to be analyzed.

exec2crl: Disassembler, translates P into aiT’s own LIR CRL2.

Value Analysis: Determines potential contents of processor registers for any point in P’s execution time.Note: P is never executed, emulated or simulated by aiT! Only static analyses are applied.

- 7 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Loop BoundAnalysis

AbsInt’sC R L2

W C ETEstim ates

D ecoderexec2crl

ValueAnalysis

PathAnalysis

P ipelineAnalysis

Executable C acheAnalysis

Workflow of the static WCET-Analyzer aiT

Loop Bound Analysis: Tries to determine lower and upper iteration bounds for each loop of P.

Cache Analysis: Uses formal cache model, classifies each memory access within P as guaranteed cache hit or as cache miss.

- 8 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Loop BoundAnalysis

AbsInt’sC R L2

W C ETEstim ates

D ecoderexec2crl

ValueAnalysis

PathAnalysis

P ipelineAnalysis

Executable C acheAnalysis

Workflow of the static WCET-Analyzer aiT

Pipeline Analysis: Includes accurate model of processor pipeline.Depending on initial pipeline states, possible cache states etc., possible final pipeline states are determined per basic block.Result of pipeline analysis is the WCETEST of each basic block.

- 9 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Loop BoundAnalysis

AbsInt’sC R L2

W C ETEstim ates

D ecoderexec2crl

ValueAnalysis

PathAnalysis

P ipelineAnalysis

Executable C acheAnalysis

Workflow of the static WCET-Analyzer aiT

Path Analysis: Models all possible execution paths of P under consideration of the WCETEST of all basic blocks and computes the overall longest execution path leading to the global WCETEST of P.Output of aiT is e.g. the length of this longest path, i.e. the WCETEST of P.

- 10 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Loop BoundAnalysis

AbsInt’sC R L2

W C ETEstim ates

D ecoderexec2crl

ValueAnalysis

PathAnalysis

P ipelineAnalysis

Executable C acheAnalysis

Complexity of Static WCET-Analysis

Problem: WCET analysis intractable for computers! If WCET were computable, one could solve the Halting Problem by checking in O(1) if WCET < holds.

Reason: It is not computable how long P stays in a loop. Automatic loop bound analysis only feasible for simple classes of loops.(Similar problems arise for recursive functions.)

- 11 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Loop BoundAnalysis

AbsInt’sC R L2

W C ETEstim ates

D ecoderexec2crl

ValueAnalysis

PathAnalysis

P ipelineAnalysis

Executable C acheAnalysis

User Annotations for WCET Analysis

Way out: aiT user must provide information about e.g. minimal / maximal loop iterations and recursion depths.

Annotations file: Contains such user annotations (“Flow Facts”) and is – amongst program P to be analyzed – one more input to aiT.

Loop BoundAnalysis

AbsInt’sC R L2

C R L2 w ithW C ET

Estim ates

D ecoderexec2crl

ValueAnalysis

PathAnalysis

P ipelineAnalysis

Executable C acheAnalysis

U ser Anno-tations

[AbsInt Angewandte Informatik GmbH,

http://www.absint.com ]

- 12 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

ICD-CParser

ANSI C ICD-CIR

CodeSelector

LLIR

RegisterAllocator

LLIR CRL2

CRL2

aiT WCET

Analysis

CRL2 +

WCETEST

CRL2 LLIR

WCET-Opt. ASM

Analyses,

Optimi-

zations

A WCET-Aware C-Compiler (WCC)

Distinctive Feature:Tight integration of WCET analyzer aiT into WCC’s backend.

- 13 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Tight Coupling WCC aiT

• CRL2: Low-level intermediate representation (LIR) of aiT.• Structure of CRL2: Control Flow Graph (CFG) consisting of routines, basic blocks, instructions and operations.

• WCET analysis applied to low-level machine code coupling WCC aiT done in compiler backend.• Converter LLIR2CRL: translates WCC’s LIR into CRL2, executes aiT in the background.• After successful WCET analysis: relevant WCET data extracted from CRL2, imported into WCC.

- 14 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Imported WCET Data & Flow Facts in WCC

Most important WCET data imported into WCC: WCETEST of the entire program WCETEST of each single basic block Worst-Case execution frequencies of each CFG edge

Flow Fact Annotation within WCC: Annotation of e.g. loop iteration bounds directly in C source code: _Pragma( “loopbound min 10 max 10” ); Since compiler optimizations may restructure loops and thus their annotated bounds, WCC automatically keeps

Flow Facts consistent during all applied optimizations.

- 15 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Problems during WCETEST Minimization

The Worst-Case Execution Path (WCEP): WCET of a program P = length of longest execution path of P (WCEP) To minimize P’s WCETEST, optimizations must exclusively focus on those parts of P lying on the WCEP.

Optimization of parts not lying on the WCEP don’t reduce WCETEST at all!

Optimization strategies for WCETEST Minimization must have detailed knowledge about the WCEP.

WCET Analyzer aiT provides this detailed information about the WCEP, but…

- 16 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Switching WCEPs (1)

main() {

if ( … )

a();

else

d();

return;

}

a() {

b();

}

b() {

c();

}

d() {

c();

}

main

a

b

c

d

10 Cyc.

50 Cyc.

80 Cyc.

65 Cyc.

120 Cyc.WCETEST of a for

a in Main Mem.

Example:Scratchpad allocation of functions for WCETEST minimizationAbove: Function Call Graph

- 17 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Switching WCEPs (2)

main() {

if ( … )

a();

else

d();

return;

}

a() {

b();

}

b() {

c();

}

d() {

c();

}

main

a

b

c

d

10 Cyc.

50 Cyc.

80 Cyc.

65 Cyc.

120 Cyc.

WCETEST = 205 Cyc.

• Initial WCEP: main, a, b, c• Length of WCEP = WCETEST: 205• SPM-Allocation of b reduces WCETEST of b down to 40

- 18 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Switching WCEPs (3)

main() {

if ( … )

a();

else

d();

return;

}

a() {

b();

}

b() {

c();

}

d() {

c();

}

main

a

b

c

d

10 Cyc.

50 Cyc.

40 Cyc.

65 Cyc.

120 Cyc.

• Initial WCEP: main, a, b, c• Length of WCEP = WCETEST: 205• SPM-Allocation of b reduces WCETEST of b down to 40

- 19 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Switching WCEPs (4)

main() {

if ( … )

a();

else

d();

return;

}

a() {

b();

}

b() {

c();

}

d() {

c();

}

main

a

b

c

d

10 Cyc.

50 Cyc.

40 Cyc.

65 Cyc.

120 Cyc.

WCETEST = 195 Cyc.

• New WCEP: main, d, c• New WCETEST: 195WCEP has switched as side effect of optimization!

- 20 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Consequences for Compiler Optimizations

Optimizations aiming at WCETEST minimization… …always have to consider that the WCEP may switch after each and every decision taken by the optimization. …should not take their decision where to optimize on the basis of local information. Instead, they should always consider the global effects of a decision.

(SPM allocation of b in previous example leads to a local reduction of b’s WCETEST by 40 cycles. However, a global reduction of only 10 cycles was achieved!)

Challenge: To design completely new optimization strategies matching the above requirements and always considering the global CFG with its current WCEP.

- 21 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Function Specialization

Also known as “Procedure Cloning”: Well-known optimization technique (1993) Goals: To enable further standard optimizations and to reduce overhead for parameter passing during function calls Approach: For worthwhile functions, specialized copies (“Clones”) are created having less parameters than their original versions.

- 22 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Typical Function Calls

int f( int *x, int n, int p ) { for (i=0; i<n; i++) { x[i] = p * x[i];

if ( i == 10 ) { ... }

}

return x[n];

}

int main() {

... f( y, 5, 2 ) ...

... f( z, 5, 2 ) ...

return f( a, 5, 2 );

}

Observations: f is called several times. f has three parameters. Constants 5 and 2 are passed several times for parameters n and p.

- 23 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Further Observations

int f( int *x, int n, int p ) { for (i=0; i<n; i++) { x[i] = p * x[i];

if ( i == 10 ) { ... }

}

return x[n];

}

int main() {

... f( y, 5, 2 ) ...

... f( z, 5, 2 ) ...

return f( a, 5, 2 );

}

• f is so-called general purpose function: Via parameter n, an array x of arbitrary size can be processed.• Control flow within f depends on function parameter.

Within main, f is used for special purpose:Processing of arrays ofsize n = 5.

- 24 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

int f( int *x, int n, int p ) { for (i=0; i<n; i++) { x[i] = p * x[i];

if ( i == 10 ) { ... }

}

return x[n];

}

int main() {

... f_5_2( y ) ...

... f_5_2( z ) ...

return f_5_2( a );

}

int f_5_2( int *x ) { for (i=0; i<5; i++) { x[i] = 2 * x[i];

if ( i == 10 ) { ... }

}

return x[5];

}

Specialization of General Purpose Routines

- 25 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Expected Effects of Function Specialization

Reduction of the Average-Case Execution Time (ACET): Enabling of further standard optimizations in cloned function:

• Constant propagation and constant folding• Strength reduction• Control flow simplification

Less code to be executed in order to pass arguments to cloned function.

Increase of Code Size: Potentially, many new highly specialized clones are generated.

- 26 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Why Function Specialization and WCETEST?

Motivation: In Embedded Software: General Purpose functions heavily used in special purpose contexts. In particular, loop bounds are very often controlled by function parameters. Recall: Loop bounds are extremely critical for a precise WCET analysis. Function Specialization enables the extremely precise annotation of loop bounds!

[P. Lokuciejewski, H. Falk et al., Influence of Procedure Cloning on WCET Prediction, Salzburg, CODES+ISSS 2007]

- 27 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Loop Bound Annotations & Cloning

int f( int *x, int n, int p ) { // loopbound min 0, max n for (i=0; i<n; i++ ) {

x[i] = p * x[i];

if ( i == 10 ) { ... }

}

return x[n];

}

Original Code:Loop annotations extremely imprecise since annotation must cover all calls of f in a safe way.

If f is called somewhere with n = 2000 as maximal value, the annotation must always consider a maximum of 2000 iterations.

For calls like e.g. f(a, 5, 2), 2000 iterations are considered as well Massive WCET overestimation.

- 28 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Loop Bound Annotations & Cloning

int f_5_2( int *x ) { // loopbound min 5, max 5 for (i=0; i<5; i++) {

x[i] = 2 * x[i];

if ( i == 10 ) { ... }

}

return x[5];

}

Specialized Code:Loop annotations extremely precise since annotation covers all calls of f_5_2 exactly.

For calls like e.g. f_5_2(a), exactly 5 iterations are considered Exact WCET estimation.

Original function f or other additional clones of f fully independent of annotations within f_5_2 no interdependencies during WCET analysis for all these different clones of f.

- 29 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Workflow of Function Specialization

Attention when applying Function Specialization: Code size increases should be kept clearly in mind during specialization! Cloning of each function which is called at least twice with the same constant parameter usually unacceptable.

Parameterized Application of Function Specialization: Size G: Never specialize a function f having a size larger than G. Fraction K of same constant parameters: clone f only iff at least K% of all calls of f use the same constant parameters.

- 30 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Workflow of Function Specialization

Com pilerBackend

B inaryExecutable

ANSI-CSource

O ptim izedH igh-Level

IR

W CET

Tim ingAnalyzer

S im ulator

SimulatedTime

FlowFacts

IC D -CO ptim izer

• Function Specialization• Const. Folding & Propagation• Strength Reduction• If-Statement Optimization

• G = 2000 ICD-C Expressions• K = 50%

Considered Processors:• Infineon TriCore TC1796• ARM 7 TDMI (ARM & THUMB instruction sets)

- 31 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Relative WCETEST after Function Specialization

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

TriCore ARM Thumb

Rel

. WC

ET

ES

T [

%]

EPIC MPEG2 GSM Average

WCETEST improvements between 13% and up to 95%!

- 32 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Relative ACETs after Function Specialization

0%10%

20%30%40%50%

60%70%80%90%

100%110%

TriCore ARM Thumb

Re

l. A

CE

T [

%]

EPIC MPEG2 GSM Average

• Only marginal ACET improvements of max. 3%. Influence of overhead for parameter passing and effect of successive optimizations on ACET obviously marginal.

- 33 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Discussion of Results

Very astonishing: How can one optimization impact two apparently similar objectives like WCET and ACET so differently?

Reasons: Code before specialization only poorly annotatable aiT provides very imprecise WCET estimates

Code after specialization very precisely annotatable aiT provides highly precise WCET estimates

Function Specialization obviously effective in increasing the tightness of WCETEST (cf. Slide 4).The actually unknown (since intractable) WCET should be reduced by only a similar order of magnitude as the ACET.

- 34 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Relative Code Sizes after Specialization

0%25%50%75%

100%125%150%175%200%225%250%275%300%325%350%

TriCore ARM Thumb

Re

l. C

od

e S

ize

[%

]

EPIC MPEG2 GSM Average

Code size increases between 2% up to 325%!

- 35 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

References

Function Specialization: D. Bacon, S. Graham, O. Sharp, Compiler Transformations for High-Performance Computing, ACM Computing Surveys 26 (4), 1994. P. Lokuciejewski, H. Falk, M. Schwarzer et al., Influence of Procedure Cloning on WCET Prediction, CODES+ISSS, Salzburg 2007.

- 36 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Summary

WCET Upper bound of a program’s execution time Impossible to compute a program’s actual WCET Static WCET analysis using aiT

Function Specialization / Procedure Cloning Specialization of general purpose functions Enabling of standard compiler optimizations in specialized functions, simplification of code required for parameter passing Small impact on ACET, huge impact on WCETEST

- 37 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

12th International Workshop on Software & Compilersfor Embedded Systems April 23-24, 2009

Acropolis, Nice, France Co-located with DATE 2009 Paper Submission: around mid November 2008

Website & CfP: http://www.scopesconf.orgMailing List: http://www.scopesconf.org/list

- 38 -technische universitätdortmund

fakultät für informatik

h. falk, informatik 12, 2008

TU Dortmund

Coffee/tea break (if on schedule)

Q&A?