Heiko Schröder, 2003


Page 1: Heiko Schröder,  2003

Heiko Schröder, 2003

Routing? Sorting? Image Processing? Sparse Matrices?

Reconfigurable Meshes!

Page 2: Heiko Schröder,  2003

Reconfigurable architectures

• FPGAs

• reconfigurable multibus

• reconfigurable networks (Transputers, PVM)

• dynamically reconfigurable mesh

Aim: efficiency

special purpose --> general purpose architectures

Page 3: Heiko Schröder,  2003

Contents

1.) Motivation for the reconfigurable mesh

2.) Routing (and sorting):
• better than PRAM
• better than mesh

3.) Image processing

4.) Sparse matrix multiplication

5.) Bounded bus length

Page 4: Heiko Schröder,  2003

PRAM

[Figure: processors 0-9 connected to memory cells 0-9, with a cut marking the bisection]

diameter O(1), bisection width Θ(n)

EREW, CRCW

Page 5: Heiko Schröder,  2003

Mesh/Torus

2D mesh: diameter Θ(√n), bisection width Θ(√n)

Page 6: Heiko Schröder,  2003

Hypercube

[Figure: hypercubes of dimension 0-D to 4-D; nodes carry binary addresses (0, 1; 00, 01, 10, 11; 000 to 111)]

diameter O(log n), bisection width Θ(n)

Page 7: Heiko Schröder,  2003

reconfigurable mesh

reconfigurable mesh = mesh + interior connections

15 positions

diameter 1 !!

low cost
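The figure of 15 positions matches the usual reconfigurable-mesh model, in which each processor can fuse its four ports (N, E, S, W) into arbitrary groups. Assuming that reading, the positions are exactly the set partitions of the four ports. The helper name `set_partitions` is illustrative only and not from the slides.

```python
def set_partitions(items):
    """Yield every way of splitting `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing groups ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a group of its own.
        yield [[first]] + partition


# Each partition is one switch setting: ports in the same group are fused.
positions = list(set_partitions(["N", "E", "S", "W"]))
print(len(positions))
```

Ports in the same group behave as one wire, so a setting such as [["N", "S"], ["E", "W"]] lets a vertical and a horizontal bus cross without touching.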

Page 8: Heiko Schröder,  2003

global OR

[Figure: a row of processors holding the bits 1 0 000 1 0; signals (*) are put on the bus and the OR (“V”) is read off]

Time: O(1) on RM -- Ω(log n) on EREW-PRAM
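A minimal sketch of the global OR on a single row, assuming the common scheme in which processors holding 0 fuse their west and east ports and processors holding 1 drive a signal towards the west end. The loop only models how the signal propagates; on the mesh all processors act in the same step. The function name is hypothetical.

```python
def global_or(bits):
    """Return the OR of `bits` as seen at the west end of a row bus."""
    signal_on_bus = False
    # Walk from the east end to the west end of the row.
    for bit in reversed(bits):
        if bit:
            # A 1-processor cuts the bus and drives its west segment.
            signal_on_bus = True
        # A 0-processor fuses W-E and just passes on what it receives.
    return signal_on_bus


print(global_or([1, 0, 0, 0, 0, 1, 0]))
```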

Page 9: Heiko Schröder,  2003

Prefix sum

Input bits: 0 1 1 0 1 0 0 1 1 1

[Figure: a signal (*) enters at row 0, moves up one row at every column holding a 1 and leaves at row 6; rows are labelled 0-9]

Fast but expensive

Time: O(1)
Area: O(n x n)
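The staircase idea can be sketched as follows, assuming one column of processors per input bit: a column holding 0 passes the signal straight through, a column holding 1 shifts it up by one row. The row at which the signal crosses column c is the prefix sum up to c. The loop stands in for signal propagation; the function name is illustrative.

```python
def staircase_prefix_sums(bits):
    """Return the row of the signal after each column, i.e. the prefix sums."""
    row = 0
    rows_after_column = []
    for bit in bits:
        if bit:
            row += 1  # column with a 1: bus steps up by one row
        rows_after_column.append(row)
    return rows_after_column


print(staircase_prefix_sums([0, 1, 1, 0, 1, 0, 0, 1, 1, 1]))
```

For the input on the slide the signal leaves at row 6, the number of ones.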

Page 10: Heiko Schröder,  2003

Modulo 3 counter

[Figure: input 10 11 10; the signal (*) wraps around an array of 3 rows; result: 1 mod 3]

Time: O(1) on RM -- Ω(log n / log log n) on CRCW-PRAM

Page 11: Heiko Schröder,  2003

modulo k² counter (ranking)

• 2-digit numbers to the base k represent all numbers smaller than k².

• 1.) determine x mod k (= lsd)

• 2.) count number of “wraps” (= msd).

[Figure: input 10 11 10; signal (*); result: 1 mod k]

--> modulo k² counting in 2 steps on a k x k² array
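A small sketch of the two-digit idea, under the assumption that the signal climbs one row per 1-bit in an array of k rows and wraps from the top row back to row 0. The final row gives the least significant digit and the number of wraps gives the most significant digit. Function names are hypothetical.

```python
def mod_k_counter(bits, k):
    """Return (final row, number of wraps) of a signal in a k-row array."""
    row, wraps = 0, 0
    for bit in bits:
        if bit:
            row += 1
            if row == k:  # signal leaves the top row and re-enters at row 0
                row = 0
                wraps += 1
    return row, wraps


def count_mod_k2(bits, k):
    """Combine lsd (x mod k) and msd (wraps mod k) into x mod k*k."""
    lsd, wraps = mod_k_counter(bits, k)
    msd = wraps % k
    return msd * k + lsd


print(count_mod_k2([1, 0, 1, 1, 1, 0], 3))
```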

Page 12: Heiko Schröder,  2003

enumeration / prefix sum

1 1 1 1 1 1 1 1
1 2 1 2 1 2 1 2
1 2 3 4 1 2 3 4
1 2 3 4 5 6 7 8

time: O(log n)

wire efficiency ! -- (compared with tree)
1/2 number of processors
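The four rows of the table can be reproduced by a doubling scheme: in each step the last rank of a left block is broadcast to the right block of the same size and added there. This is a sketch under that assumption; the function name is illustrative.

```python
def enumerate_by_doubling(n):
    """Return (ranks 1..n, number of doubling steps)."""
    ranks = [1] * n
    size, steps = 1, 0
    while size < n:
        for start in range(0, n, 2 * size):
            if start + size >= n:
                continue  # no right-hand block to update
            offset = ranks[start + size - 1]  # last rank of the left block
            for i in range(start + size, min(start + 2 * size, n)):
                ranks[i] += offset
        size *= 2
        steps += 1
    return ranks, steps


print(enumerate_by_doubling(8))
```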

Page 13: Heiko Schröder,  2003

permutation routing - 2 steps

n x n

2 steps !!!

Page 14: Heiko Schröder,  2003

Kunde’s all-to-all mapping

Sorting:
• sort blocks
• all-to-all (columns)
• sort blocks
• all-to-all (rows)
• o-e-sort blocks

Page 15: Heiko Schröder,  2003

sorting in constant time

[Figure: array partitioned into blocks; labels n^(2/3), n^(1/3), block]

Complete sort:
• sort blocks
• all-to-all (2)
• sort blocks
• all-to-all (2)
• o-e-sort blocks

Sort blocks:
• broadcast (1)
• broadcast (1)
• rank (2)

Page 16: Heiko Schröder,  2003


• better than PRAM --- but useless!!!

Page 17: Heiko Schröder,  2003

Kunde’s all-to-all mapping

[Figure: n x n array partitioned into blocks; label n^(2/3)]

Page 18: Heiko Schröder,  2003

vertical all-to-all

Page 19: Heiko Schröder,  2003

horizontal all-to-all

Page 20: Heiko Schröder,  2003

Use of bus -- no conflict

1 step, 2 steps, 3 steps, ..., k/2 steps, ..., 3 steps, 2 steps, 1 step

(k/2)² steps

Page 21: Heiko Schröder,  2003

sorting in optimal time (Kunde / Schröder)

(k/2)² steps, k = n^(1/3)

each step takes n^(1/3) time --> T = n/4

x 2 --> T = n/2 (all-to-all)

Sorting:
• sort blocks (O(n^(2/3)))
• all-to-all (n/2)
• sort blocks (O(n^(2/3)))
• all-to-all (n/2)
• o-e sort blocks (O(n^(2/3)))
(snake-like order of blocks)

time: n + o(n)
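The step counts on this slide and the previous one can be checked with a short calculation. It is a sketch that assumes the factor "x 2" stands for the two passes that make up one all-to-all and that the three block sorts contribute only the lower-order term.

```latex
\begin{align*}
1 + 2 + \dots + \tfrac{k}{2} + \dots + 2 + 1 &= \left(\tfrac{k}{2}\right)^{2} \\
\left(\tfrac{k}{2}\right)^{2} \cdot n^{1/3}
  &= \frac{n^{2/3}}{4} \cdot n^{1/3} = \frac{n}{4}
  \qquad (k = n^{1/3}) \\
T_{\text{all-to-all}} &= 2 \cdot \frac{n}{4} = \frac{n}{2} \\
T_{\text{sort}} &= 2 \cdot \frac{n}{2} + O\!\left(n^{2/3}\right) = n + o(n)
\end{align*}
```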

Page 22: Heiko Schröder,  2003

Why optimal?

Sorter for n keys

Bisection of data with k wires

Sorting time Ω(n/k)

Page 23: Heiko Schröder,  2003

Use of theorem

1.) n keys on a k x k RM: time Ω(n/k)

Proof: Wherever the data is stored, there is always a bisection of length k. This can be demonstrated by sweeping from left to right through the array. Q.e.d.

2.) n x n keys on an n x n RM: time Ω(n).

Proof: trivial

Page 24: Heiko Schröder,  2003

n + o(n)

Optimal --- but ...

Page 25: Heiko Schröder,  2003

enumeration / prefix sum

1 1 1 1 1 1 1 1
1 2 1 2 1 2 1 2
1 2 3 4 1 2 3 4
1 2 3 4 5 6 7 8

time: O(log n)

wire efficiency ! -- (compared with tree)
1/2 number of processors

Page 26: Heiko Schröder,  2003

ABCD-routing

• move and smooth

[Figure: array divided into quadrants A, B, C and D]

Row-major enumeration of A, B, C and D packets within each quadrant in time 4 log n. Determine destination position of each packet.

Page 27: Heiko Schröder,  2003

elementary steps

[Figure: ten packets numbered 1-10 shown in four panels: initial placement, after move, after smooth, after collect]

Page 28: Heiko Schröder,  2003

time analysis

[Figure: quadrants A, B, C, D during move, smooth and collect]

time: 3 x n/2

T = 3n + o(n)

Page 29: Heiko Schröder,  2003

T < 2n

4 destination squares: time 3n + 4 log n

16 destination squares: time 2n + 16 log n

64 destination squares: time 12/7 n + 64 log n

mesh diameter: 2n
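To see when the log term stops mattering, the three bounds can be evaluated for concrete n. This sketch assumes base-2 logarithms, which the slide does not state; the table and function names are illustrative.

```python
import math

# (coefficient of n, coefficient of log n) per number of destination squares,
# copied from the slide.
BOUNDS = {4: (3.0, 4), 16: (2.0, 16), 64: (12 / 7, 64)}


def routing_time(squares, n):
    """Evaluate the slide's bound, assuming log means log base 2."""
    linear, log_factor = BOUNDS[squares]
    return linear * n + log_factor * math.log2(n)


for n in (1024, 4096):
    times = {s: round(routing_time(s, n)) for s in BOUNDS}
    print(n, times, "mesh diameter:", 2 * n)
```

Under that assumption the 64-square variant is still above the mesh diameter 2n for n = 1024 but below it for n = 4096.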

Page 30: Heiko Schröder,  2003

enough of routing/sorting

Constant factor! Can we do better? What kind of problems?

Image processing
Sparse problems!

Page 31: Heiko Schröder,  2003

Image processing

• Border following

• Edge detection

• Component labelling

• Skeletons

• Transforms

Page 32: Heiko Schröder,  2003

Component labelling

Object
Define border (candidates)
Set bus

While own label is not received:
1.) Candidates break bus and send their label
   a) clockwise
   b) anti-clockwise
2.) Candidates switch off and restore bus if they see a smaller label

Time: O(1) -- O(log n)
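The loop above can be simulated on a closed border, modelled here as a ring of candidates with distinct labels (an assumption; processor addresses would serve). In each round every active candidate sees the labels of its nearest active neighbours in both directions and switches off if either is smaller. The function name is hypothetical.

```python
def label_component(border_labels):
    """Return (smallest label, rounds) for candidates arranged on a ring."""
    labels = list(border_labels)
    if not labels:
        raise ValueError("border must contain at least one candidate")
    active = list(range(len(labels)))  # active candidates in ring order
    rounds = 0
    while True:
        rounds += 1
        m = len(active)
        if m == 1:
            # The only remaining candidate receives its own label back.
            return labels[active[0]], rounds
        survivors = []
        for pos, idx in enumerate(active):
            from_anticlockwise = labels[active[(pos - 1) % m]]
            from_clockwise = labels[active[(pos + 1) % m]]
            if labels[idx] < from_anticlockwise and labels[idx] < from_clockwise:
                survivors.append(idx)  # stays a candidate, bus stays broken
        active = survivors  # the others switch off and restore the bus


print(label_component([5, 3, 8, 1, 9, 2, 7, 4]))
```

Of two neighbouring candidates the larger always switches off, so at most half survive each round, which is consistent with the O(log n) bound on the slide.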

Page 33: Heiko Schröder,  2003

Transforms

• Wavelet transform: time log n on RM -- time n on mesh

• FFT: time n on RM and mesh

• Hough transform: time m x log n on RM -- time m x n on mesh

Page 34: Heiko Schröder,  2003

systolic matrix multiplication

[Figure: matrices A and B streamed into the array holding C; processor (i, j) accumulates c_ij]

$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$

time: n

Page 35: Heiko Schröder,  2003

sparse matrix multiplication

A x B = C

$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$

Time: n (n x n mesh)

A and B column sparse: O(k²)
A and B row sparse: O(k²)
A row sparse, B column sparse: O(k²)
A column sparse, B row sparse: k√n

Page 36: Heiko Schröder,  2003

unlimited bus length

• ring broadcast

[Figure: three elements 1, 2, 3 broadcast along a ring; rows of processors showing which element each one currently receives]

Page 37: Heiko Schröder,  2003

A row-sparse, B column-sparse

Repeat r times
Begin
   horizontal ring broadcast a_ik
   Repeat c times
      vertical ring broadcast b_kj
End.

[Figure: matrices A, B and C with sparsity labels r and c]
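A sequential sketch of the two nested broadcast loops, assuming that every broadcast element carries its index k and that processor (i, j) multiplies only when the indices of the two elements it receives agree. Here r bounds the non-zeros per row of A and c the non-zeros per column of B. The function name and data layout are illustrative.

```python
def sparse_multiply(A, B, r, c):
    """Multiply row-sparse A by column-sparse B in r*c broadcast rounds."""
    n = len(A)
    # Row i of A as a list of (k, a_ik); column j of B as a list of (k, b_kj).
    a_rows = [[(k, v) for k, v in enumerate(row) if v != 0] for row in A]
    b_cols = [[(k, B[k][j]) for k in range(n) if B[k][j] != 0] for j in range(n)]
    C = [[0] * n for _ in range(n)]
    rounds = 0
    for p in range(r):          # horizontal ring broadcast of the p-th a_ik
        for q in range(c):      # vertical ring broadcast of the q-th b_kj
            rounds += 1
            for i in range(n):
                if p >= len(a_rows[i]):
                    continue
                ka, a = a_rows[i][p]
                for j in range(n):
                    if q >= len(b_cols[j]):
                        continue
                    kb, b = b_cols[j][q]
                    if ka == kb:  # indices meet: this product belongs to c_ij
                        C[i][j] += a * b
    return C, rounds


A = [[1, 0, 2], [0, 3, 0], [4, 0, 0]]
B = [[0, 1, 0], [2, 0, 0], [0, 0, 3]]
print(sparse_multiply(A, B, r=2, c=1))
```

The round count r*c is independent of n, which corresponds to the k² entry for this case when r = c = k.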

Page 38: Heiko Schröder,  2003

A and B column-sparse

Repeat c1 times
Begin
   horizontal ring broadcast a_ik (meets b_kj)
   Repeat c2 times
      vertical ring broadcast product to final position
End.

[Figure: A and B/C^T; labels c1, c2 elements, i, j, k]

Page 39: Heiko Schröder,  2003

lower bound (c,r), c = r = k

$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$

[Figure: matrices A, B and C for k = 3, n = 48, with blocks of side √(nk)]

t ≥ √(nk)

Page 40: Heiko Schröder,  2003

splitting the problem

Repeat k times
Begin
   vertical ring broadcast
   Repeat s times
      horizontal ring broadcast
End.

[Figure: A with the first s elements marked, and B/C^T with k B-elements]

A = A_s + A_r

C = A_s B + A_r B

C = C_s + C_r

time: ks

Page 41: Heiko Schröder,  2003

CR

[Figure: A with the first s elements marked]

A = A_s + A_r

A has nk non-zero elements, so A_r has at most nk/s non-zero rows. For s = √n, A_r has at most k√n non-zero rows.

A_s B is an RR-problem; it takes time k√n.
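The split can be sketched as follows, assuming that A_s keeps the first s non-zero elements of every row and A_r holds whatever is left. The example matrix is made up for illustration; only the counting argument comes from the slide.

```python
def split_rows(A, s):
    """Split A into A_s (first s non-zeros per row) and A_r (the rest)."""
    A_s = [[0] * len(row) for row in A]
    A_r = [[0] * len(row) for row in A]
    for i, row in enumerate(A):
        seen = 0
        for j, value in enumerate(row):
            if value != 0:
                (A_s if seen < s else A_r)[i][j] = value
                seen += 1
    return A_s, A_r


def nonzero_rows(M):
    """Count rows of M that contain at least one non-zero element."""
    return sum(1 for row in M if any(v != 0 for v in row))


# Hypothetical column-sparse example: n = 16, k = 2 non-zeros per column,
# all of them packed into rows 0 and 1, and s = sqrt(n) = 4.
n, k, s = 16, 2, 4
A = [[1 if i < k else 0 for _ in range(n)] for i in range(n)]
A_s, A_r = split_rows(A, s)
print(nonzero_rows(A_r), "non-zero rows in A_r; bound nk/s =", n * k // s)
```

A row only reaches A_r after it has used up s elements, and there are nk elements in total, which gives the nk/s bound.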

Page 42: Heiko Schröder,  2003

A_r B: calculating products

[Figure: A_r with k A-elements, and B/C^T with k B-elements]

time: k²

elementsk n2

Page 43: Heiko Schröder,  2003

column sum

[Figure: column j with rows i-1, i and i+1]

time: log n

only k√n elements per column -- routing time: k√n

Page 44: Heiko Schröder,  2003

routing within columns

routing time: k√n

Page 45: Heiko Schröder,  2003

Reconfigurable architectures

Reconfigurable mesh ?

constant diameter !

No !!!

Physical laws!

Page 46: Heiko Schröder,  2003

Physical limits

c = 300 000 km/s = 30 cm/ns

• on chip: 1 cm/ns

• --> bounded bus length

good idea !

Page 47: Heiko Schröder,  2003

bounded broadcast

[Figure: three elements 1, 2, 3 broadcast over bus segments of bounded length; rows of processors showing which element each one holds in successive steps]

time: k + n/l
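The shape of the bound k + n/l can be reproduced with a generic pipelining model. This is an assumption, not the slide's exact mechanism: the row of n processors is cut into segments of length l, one new element enters per step, and every element already on the bus advances by one segment per step. The function name is hypothetical.

```python
import math


def bounded_broadcast_steps(n, l, k):
    """Steps until k pipelined elements have covered n processors (k >= 1)."""
    segments = math.ceil(n / l)
    reached = [0] * k  # number of segments each element has covered
    steps = 0
    while min(reached) < segments:
        steps += 1
        for v in range(k):
            # Element v enters the bus in step v + 1, then moves one
            # segment per step.
            if v < steps and reached[v] < segments:
                reached[v] += 1
    return steps


print(bounded_broadcast_steps(n=12, l=4, k=3))
```

In this model the count is k + ceil(n/l) - 1, which has the same form as the bound on the slide.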

Page 48: Heiko Schröder,  2003

creating main stations

[Figure: elements 1, 2, 3 repeated along the row as main stations; below, rows in which every processor holds element 1, 2 or 3]

time: k

Page 49: Heiko Schröder,  2003

A row-sparse, B column-sparse

Create main stations 1,…,k for A and B (time: n/l + k)

For i = 1,…,k do
Begin
   horizontal ring broadcast i of A
   For j = 1,…,k do
      vertical ring broadcast j of B
End.

[Figure: matrices A, B and C, each with sparsity label k]

Page 50: Heiko Schröder,  2003

A and B column-sparse

Create main stations 1,…,k for A (time: n/l + k)

For i = 1,…,k do
Begin
   horizontal ring broadcast i
   k bounded vertical broadcasts of products
   merging new products
End.

[Figure: A and B/C^T; labels k, k elements, i]

Page 51: Heiko Schröder,  2003

remove minor stations

[Figure: rows of processors holding elements 1, 2, 3 before and after the minor stations are removed]

Page 52: Heiko Schröder,  2003

results

Time: n (n x n mesh)

On the RM (unlimited bus length; bus length bounded by l):
A and B column sparse: O(k²); O(k² + 2n/l)
A and B row sparse: O(k²); O(k² + 2n/l)
A row sparse, B column sparse: O(k²); O(k² + n/l)
A column sparse, B row sparse: 3k√n; 3k√n + 11n/l

• image processing
• sorting
• routing
• load balancing

better than the mesh !

(Kunde, Middendorf, Schmeck, Schröder, Turner)

(Kapoor, Kunde, Kaufmann, Schroeder, Sibeyn)

Page 53: Heiko Schröder,  2003

• The RM is in some cases “better” than PRAM
• The RM is always at least as “good” as mesh
• The RM is often “better” than the mesh

Page 54: Heiko Schröder,  2003

Fault tolerant on-board computing

10 km/s
1 image/s
100 Mbit/image
4000 s/orbit
400 Gbit/orbit
download: 400 Mbit/orbit

On-board image analysis and compression

800 km

Singapore
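The numbers on this slide fit together as follows. The variable names are illustrative; the values are the ones given above.

```python
IMAGES_PER_SECOND = 1
MBIT_PER_IMAGE = 100
SECONDS_PER_ORBIT = 4000
DOWNLOAD_MBIT_PER_ORBIT = 400

# Data acquired during one orbit, in Mbit (400 000 Mbit = 400 Gbit).
acquired_mbit_per_orbit = IMAGES_PER_SECOND * MBIT_PER_IMAGE * SECONDS_PER_ORBIT

# Factor by which on-board analysis and compression must shrink the data
# so that it fits into the download budget of one orbit.
required_reduction = acquired_mbit_per_orbit / DOWNLOAD_MBIT_PER_ORBIT

print(acquired_mbit_per_orbit, "Mbit acquired per orbit")
print("required reduction factor:", required_reduction)
```

The gap between acquisition and download is a factor of 1000, which is the motivation for processing images on board.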

Page 55: Heiko Schröder,  2003

Due to radiation:
• Single event upsets (many)
• latch ups (extra hardware)
• total loss (rare at 800 km)

• 1 task per processor
• several tasks/instruments/sources per processor
• 1 task per 3 to 4 processors

16 processors (+ spares?)
fault tolerant reconfigurable network

Page 56: Heiko Schröder,  2003

Methods currently used

• shadow processors
• majority voting

Byzantine systems: ASTRIUM, deep space

Page 57: Heiko Schröder,  2003

1 CAN
2 CANs

• Industrial spec.
• mil-spec.
• radiation tolerant
• radiation hardened

386 is modern

Page 58: Heiko Schröder,  2003

Fault tolerance through reconfiguration in regular networks

Page 59: Heiko Schröder,  2003

A simple solution with high fault tolerance (torus)

Every fault pattern that does not contain a 2x2 array of faulty PEs survives.

PS(7) = 0.7

[Figure: torus with labels processor, data source, instrument and “atomic fault pattern”]
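The survival condition can be tested directly. This sketch assumes a grid of at least 2 x 2 processors and treats the network as a torus by default, so a 2x2 block may wrap around the edges; the function name is hypothetical.

```python
def contains_2x2_fault_block(faulty, torus=True):
    """Return True if `faulty` (grid of booleans) has a 2x2 all-faulty block."""
    rows, cols = len(faulty), len(faulty[0])
    # On a torus every cell can be the corner of a block; on a plain mesh
    # the last row and column cannot.
    row_range = range(rows) if torus else range(rows - 1)
    col_range = range(cols) if torus else range(cols - 1)
    for r in row_range:
        for c in col_range:
            r2, c2 = (r + 1) % rows, (c + 1) % cols
            if faulty[r][c] and faulty[r][c2] and faulty[r2][c] and faulty[r2][c2]:
                return True
    return False


def survives(faulty):
    """Apply the slide's criterion: no 2x2 array of faulty PEs."""
    return not contains_2x2_fault_block(faulty)


grid = [[False] * 4 for _ in range(4)]
for r, c in [(0, 0), (0, 1), (1, 0)]:
    grid[r][c] = True
print(survives(grid))
```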

Page 60: Heiko Schröder,  2003


To the right

up

Replacement paths

Replace to the right -- 1 fault per row

Faulty processors

Page 61: Heiko Schröder,  2003


To the right

Replacement paths

Page 62: Heiko Schröder,  2003

Two separate networks for horizontal and vertical connections

[Figure: two networks with spares and replacement paths, one preserving horizontal connections and one preserving vertical connections]

Page 63: Heiko Schröder,  2003

Number of switches: 2, 4, 6, 30
Wire area: 0, 2.8, 3.8, 4

[Figure: switch with ports N, E, S, W, P and I; plot of PS (0 to 1) against the number of faults (1 to 16)]

Page 64: Heiko Schröder,  2003

Nemo

… an array of SHARCs to provide throughput of 160 Mb/s.
… 2.5 billion floating point operations per second.
… first demonstration of real-time image processing in space.

image cube from a 30 km wide swath of Korea’s coastline. (Launch: 2001?)

Page 65: Heiko Schröder,  2003

1.) You need to know which algorithm you want to use. In image processing (if you do not use FFT; wavelets can also be a problem) you can usually assume that every calculation depends only on data in the close neighbourhood. As a result, at any stage of your processing you need to hold in memory only the data of 3 neighbouring planes.

2.) You have several choices for processing the data (each time you do all the processing related to a single plane in parallel). You can slice your cube either into horizontal planes or into vertical planes (sometimes it is desirable to cut “diagonally”).

3.) It is important that you read every data element only once.

4.) Such data processing is called systolic, i.e. you move the raw data through the architecture with constant speed and constant direction.

In the picture below I have cut the data cube vertically -- obviously there are many directions of slicing vertically.

If the memory of the processors is large enough, you might be able to hold the complete cube in memory; then you can also process FFTs and wavelets easily.
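Points 1.) and 3.) can be sketched with a sliding window: planes are read once, in a fixed order, and only the three most recent ones are kept. The function names and the toy `process` callback are assumptions for illustration; a real kernel would work on whole image planes.

```python
from collections import deque


def stream_planes(planes, process):
    """Feed each window of 3 neighbouring planes to `process`, reading once."""
    window = deque(maxlen=3)  # the oldest plane drops out automatically
    results = []
    for plane in planes:      # every plane is read exactly once
        window.append(plane)
        if len(window) == 3:
            results.append(process(window[0], window[1], window[2]))
    return results


# Toy example: each plane holds a single value and the kernel adds them up.
planes = [[1], [2], [3], [4]]
print(stream_planes(planes, lambda a, b, c: a[0] + b[0] + c[0]))
```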

Page 66: Heiko Schröder,  2003

The satellite efficiency cube

Compression ratio (CR = 4, loss-less)

Segmentation gain (SG = 16: 1/16 of a useful image is useful)

Classification gain (CG = 5: 1 in 5 images contains useful information)

[Figure: cube with origin (0,0,0); points labelled U = .2, U = .8, U = 1, U = 4, U = 16, U = 32 and U = 64; further labels: LOSSY = 60, “Not likely”]

Page 67: Heiko Schröder,  2003

Our aim: High performance via COTS

16 processors (+ spares), off-the-shelf, connected via a fault tolerant reconfigurable network

In X-SAT restricted to image processing

Mesh/torus

Page 68: Heiko Schröder,  2003

[Figure labels: processors, fault tolerant mesh, on-board]

Page 69: Heiko Schröder,  2003

[Figure labels: switch, current communication, FPGA, ctrl, h/v, o/e, r/w, instructions to PEs, link to PE]

Page 70: Heiko Schröder,  2003

C3 -- torus

[Figure: torus with spares]

Replacement algorithm exists for up to 4 faults.
Reconfiguration software runs on FPGA.
Could be repaired within << 1 sec.

Page 71: Heiko Schröder,  2003

[Figure labels: ctrl, h/v, o/e, r/w, instructions to PEs, diagnostics, set switches]

Page 72: Heiko Schröder,  2003

4 FPGAs, connected to k*(k+1) SA-processors each

k+1 horizontal and vertical connections plus diagonals

Theorem: Given a 2x2 array of FPGAs, each connected to k²+k processors, with k+1 vertical and horizontal connections and 1 connection in the diagonals, the processors can be connected to a 2k x 2k mesh as long as the number of working processors is at least 4k².

Page 73: Heiko Schröder,  2003

Proof: There are 3 cases, which we treat separately:
1. Two neighbouring FPGAs have more than k² working processors.
2. No two neighbouring FPGAs have more than k² working processors, but two opposite FPGAs have more than k² working processors.
3. Only one FPGA has more than k² working processors.

Case 1: red and green are greater than k²

Figure 1 shows all possible combinations for red and green having more than k² processors (in the drawing k = 8). The light red and green areas show the minimal number of working processors. If more than k² processors are working in the red and green FPGAs, they are added in the order indicated by the arrows in the dark red and dark green areas. These places are otherwise occupied by processors belonging to the yellow and blue areas.

The border between yellow and blue is determined by the 4 numbers and is within the orange area. The maximal number for the yellow or green area is k²+k. If red also has this many elements, then the yellow area needs to be extended to the right into the orange area. The maximal size of yellow plus orange is (k-1)(k+1) + 2(k-2) = k² - 5 + 2k = k(k+1) + k - 5, which is greater than or equal to k(k+1) for k ≥ 5. Please note that due to the above inequality for k = 8 (as shown in the drawing) the length of the left and right arrow can be reduced by 3 each. It is easy to see from the drawing that for any possible sum of red and yellow this can be done with the length of the yellow/blue borderline not exceeding k+1. Remark: The length of the border between orange and blue is k+1. Also, at most one diagonal connection is required.

Let the sum of red and yellow be smaller than 2k(k+1), i.e. smaller than or equal to 2k²+2k-1. The maximal number of red, yellow (assuring that no border is longer than k+1) and orange elements is 2k²+3k-5, which is sufficient to cover all 2k²+2k-1 elements as long as 3k-5 ≥ 2k-1, i.e. k ≥ 4. The case where red and yellow together have 2k²+2k elements can be solved easily as shown in Figure …

Figure 1

Page 74: Heiko Schröder,  2003

[Figure: all cases with only 3 FPGAs > 0; all cases with 4 FPGAs > 0 under case 1 above]

Page 75: Heiko Schröder,  2003

Case 2: Two FPGAs have more than k² processors (64 in the drawing, where k = 8), but no two neighbouring FPGAs do.

Let red and blue have between k²+1 and k²+k working processors.

Case 2a: Assume that either green or yellow has more than k²-k elements. Let us assume that green has at least k²-k+1 elements (and due to the general assumption of case 2 it has at most k² elements). If yellow has more than k²-k elements we produce the mirror image.

Under these assumptions red can have up to k²+k elements and green can have up to k² elements, as indicated by the horizontal arrows (see Figure 2). Green plus blue can range from 2k²-k+2 to 2k²+k. Thus 2k-4 ≥ k needs to be satisfied, thus k ≥ 4.

Case 2b: Neither green nor yellow is greater than k²-k. Then green and yellow have to be exactly k²-k, and red and blue need to be k²+k in order to have 4k² elements. This can be solved as shown in Figure 3.

Figure 2, Figure 3

Page 76: Heiko Schröder,  2003

Case 3: Only red is larger than k². All other colours are at least k²-k and at most k².

Case 3a: No colour is k²-k. Yellow occupies the yellow area plus orange if required (see Figure 4). Red occupies the bright red area plus the rest of orange plus the dark red in the order indicated by the two vertical arrows. Blue occupies the light blue and dark blue area in the order indicated by the arrow.

Case 3b: One colour has k²-k elements. This can either be a neighbour of red (say green), see Figure 5, or it can be opposite red, see Figure 6.

This completes the proof that for k > 4 there is always a solution, as long as at least 4k² processors are active.

Figure 4, Figure 5, Figure 6

Page 77: Heiko Schröder,  2003

For the case of more than 4 FPGAs we also assume that we have k+1 horizontal and vertical connections plus one diagonal connection. We also attach k²+k processors to every FPGA.

Page 78: Heiko Schröder,  2003


Page 79: Heiko Schröder,  2003

27 x 27 = 729
8 x 90 = 720
21 yellow (9 lower bound)

24 x 24 = 576
8 x 72 = 576
8 yellow (0 lower bound)

Page 80: Heiko Schröder,  2003

30 x 30 = 900
8 x 110 = 880
29 yellow (20 lower bound)

Page 81: Heiko Schröder,  2003
