Reconfigurable Computing

On-line communication strategies

Chapter 7

Prof. Dr.-Ing. Jürgen Teich
Lehrstuhl für Hardware-Software-Co-Design

On-line connection: Motivation

Routing-conscious temporal placement algorithms consider the distance among components during placement. However, they do not consider the implementation of the dynamic connection mechanism required for communication among components.

In this section, we investigate existing approaches to the communication problem between components that are dynamically placed on and removed from the device, namely:

• Bus-based approaches
• Circuit routing
• Network-on-Chip (NoC) approaches

Bus-based communication

Bus-oriented communication

• Many components are connected at fixed locations
• One arbiter handles bus management
• SoC (System on Chip): buses can be used to connect different modules
  – ARM AMBA: Advanced High-performance Bus (AHB), Advanced Peripheral Bus (APB)
  – IBM CoreConnect: Processor Local Bus (PLB), On-chip Peripheral Bus (OPB)
  – Silicore Wishbone

[Figure: modules Mod 1 to Mod 4 attached to a shared bus with an arbiter]

Bus-oriented communication: using a standard bus arbiter (Becker)

• The device is divided into slots
• Each task must be placed in a slot
• Each component implements the bus transaction
• Each component can be a master
• An arbiter manages the bus assignment

[Figure: OS frame with Decompressor, Control, Controller, Com, ICAP and Bus-Macro Master-Module blocks; Modules 0 to 4, each with a ModCom block. Source: ITIV, Uni Karlsruhe (TH)]
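The slides leave the arbitration policy open. As a minimal sketch, assuming a round-robin policy (an assumption, not stated in the source) and the hypothetical names `RoundRobinArbiter` and `grant`, bus assignment among several slot masters could look like this:

```python
class RoundRobinArbiter:
    """Grants the bus to one requesting slot at a time.

    Round-robin is an assumed policy chosen for illustration only.
    """

    def __init__(self, num_slots):
        self.num_slots = num_slots
        # Start so that slot 0 is checked first on the first grant.
        self.last = num_slots - 1

    def grant(self, requests):
        """requests: set of slot indices asking for the bus.

        Returns the granted slot index, or None if nobody requests.
        """
        for offset in range(1, self.num_slots + 1):
            slot = (self.last + offset) % self.num_slots
            if slot in requests:
                self.last = slot
                return slot
        return None
```

Because the search always starts just after the slot that was granted last, no requesting master can be starved by another one that requests continuously.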

Communication via the OS

Encapsulating the bus transaction in a wrapper (Platzner, Walder)

• Divide the device into slots
• Each task must be placed in a given slot
• Each slot is enveloped in a wrapper that hides the bus-transaction process

Communication takes place through a fixed module called the OS.

• Each module sends a message by writing it into its send buffer
• The OS copies messages from the send buffers to the receive buffers of the destination modules
• Each receiving module reads its messages from its receive buffer

[Figure: OS frame and four task slots connected by Inter Frame Communication Channels (IFCC)]
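The buffer scheme above can be sketched in a few lines. This is only an illustrative software model; the names `Slot`, `OSFrame`, `send`, `transfer` and `receive` are hypothetical, and the message layout (destination id plus payload) is an assumption:

```python
from collections import deque


class Slot:
    """A task slot whose wrapper exposes only a send and a receive buffer."""

    def __init__(self):
        self.send_buffer = deque()     # entries: (destination slot id, payload)
        self.receive_buffer = deque()  # entries: (source slot id, payload)


class OSFrame:
    """Fixed module that moves messages between the slots' buffers."""

    def __init__(self, num_slots):
        self.slots = [Slot() for _ in range(num_slots)]

    def send(self, src, dst, payload):
        # A module only writes into its own send buffer.
        self.slots[src].send_buffer.append((dst, payload))

    def transfer(self):
        # The OS copies every pending message to the destination's receive buffer.
        for src, slot in enumerate(self.slots):
            while slot.send_buffer:
                dst, payload = slot.send_buffer.popleft()
                self.slots[dst].receive_buffer.append((src, payload))

    def receive(self, dst):
        # A module only reads from its own receive buffer.
        buf = self.slots[dst].receive_buffer
        return buf.popleft() if buf else None
```

Modules never address each other directly, which is why a task can be replaced in its slot without the other tasks noticing.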

Communication with off-chip modules is also done via the OS.

Circuit switching

Dynamic Networks: circuit routing

Architecture:
• Set of processing elements (PEs)
• Communication signals between two PEs are established by setting switches along a path from the source to the destination

Advantage:
• Direct communication; no packets need to be processed

Drawbacks:
• Computing a route is expensive and difficult to do on-line
• Routed lines create a large amount of prohibited area

The prohibited-area problem can be overcome by using an extra layer exclusively for circuit routing.

[Figure: routed lines and the prohibited area they create]

The reconfigurable multiple bus (RMB) approach

• A set of n processing elements and k segmented buses
• Crosspoints (switches) set the connections between the segments at run-time
• The sender always initiates a communication request and terminates (frees) an established communication path
• Each communication path is granted until the end of the communication

[Figure: PE 1 to PE 5 and the OS frame, connected through switches to the segmented buses]
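The request/release protocol above can be modelled as follows. This is a simplified sketch under stated assumptions: PEs are numbered from 0 along one row, a path may change bus level at every crosspoint, and the first free segment is taken in each hop. The class name `RMB` and its methods are hypothetical:

```python
class RMB:
    """Toy model of n PEs in a row connected by k segmented buses."""

    def __init__(self, num_pes, num_buses):
        # owner[hop][bus] is the (sender, receiver) pair holding that segment,
        # or None if it is free; hop h lies between PE h and PE h + 1.
        self.owner = [[None] * num_buses for _ in range(num_pes - 1)]

    def request(self, sender, receiver):
        """Sender asks for a path; returns [(hop, bus), ...] or None if rejected."""
        lo, hi = sorted((sender, receiver))
        path = []
        for hop in range(lo, hi):
            free = [b for b, o in enumerate(self.owner[hop]) if o is None]
            if not free:
                return None  # some hop has no free segment
            path.append((hop, free[0]))
        # Reserve only after the whole path is known to be available.
        for hop, bus in path:
            self.owner[hop][bus] = (sender, receiver)
        return path

    def release(self, sender, receiver):
        """Sender frees the path once the communication has ended."""
        for hop_owner in self.owner:
            for bus, o in enumerate(hop_owner):
                if o == (sender, receiver):
                    hop_owner[bus] = None
```

A path stays reserved until the sender releases it, so a later request crossing a fully occupied hop is rejected rather than queued in this model.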

• On a column-wise reconfigurable device, the RMB provides a modular communication infrastructure
• All the switches in one column are grouped together
• The horizontal reconfigurable regions are separated by bus macros

[Figure: PE 1 to PE 5 and the OS frame, with bus macros between the regions]

Algorithms for Reconfiguration

[Figure sequence: task graph with tasks T1 to T9; modules M1, M2 and M3 placed on the FPGA and connected through the RMB in different arrangements]

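Algorithms for Reconfiguration: objective functions

[Figure: three arrangements of the modules M1, M2 and M3 on the FPGA along the RMB]

For a communication graph with edge set E and an arrangement σ that assigns each of the n modules a position:

Minimum Bandwidth (MBW): $\min_{\sigma} \max_{(i,j) \in E} |\sigma(i) - \sigma(j)|$

Minimum Cutwidth Linear Arrangement (MCLA): $\min_{\sigma} \max_{k \in \{1,\dots,n\}} |\{(i,j) \in E : \sigma(i) \le k \le \sigma(j)\}|$

Optimal Linear Arrangement (OLA): $\min_{\sigma} \sum_{(i,j) \in E} |\sigma(i) - \sigma(j)|$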
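To make the three objectives concrete, the sketch below evaluates them for a given module order and finds the best order by exhaustive search. Exhaustive search is swapped in here purely for illustration and is only practical for a handful of modules. The function names are hypothetical, edges are treated as undirected, and the cut at position k counts every edge whose end positions enclose k (both ends inclusive):

```python
from itertools import permutations


def positions(order):
    """Map each module to its 1-based position sigma(module)."""
    return {m: p for p, m in enumerate(order, start=1)}


def bandwidth(order, edges):
    """Length of the longest connection (MBW objective)."""
    s = positions(order)
    return max(abs(s[i] - s[j]) for i, j in edges)


def linear_arrangement_cost(order, edges):
    """Sum of all connection lengths (OLA objective)."""
    s = positions(order)
    return sum(abs(s[i] - s[j]) for i, j in edges)


def cutwidth(order, edges):
    """Largest number of edges spanning any position k (MCLA objective)."""
    s = positions(order)
    return max(
        sum(1 for i, j in edges if min(s[i], s[j]) <= k <= max(s[i], s[j]))
        for k in range(1, len(order) + 1)
    )


def best_order(modules, edges, cost):
    """Try every permutation and keep one with minimal cost."""
    return min(permutations(modules), key=lambda order: cost(order, edges))
```

For example, `best_order(["a", "c", "b"], [("a", "b"), ("b", "c")], bandwidth)` returns an order in which both connections have length 1.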


Example: Video game Pong

Video game: Module Relocation

[Figure: communication graph of the modules User Input, Racket Position, Ball Position and Visualization, with edge labels 4, 20, 20 and 38]

[Figure: the communication graph (edge labels 4, 20, 20 and 38) and the four modules placed in the order User Input, Racket Position, Ball Position, Visualization, each with a CP block]

Task:
• Place the modules such that the least number of bus segments is required

Solution:
• Integer Linear Program (FPL'06)

[Figure: the order User Input, Racket Position, Ball Position, Visualization next to the relocated order User Input, Ball Position, Visualization, Racket Position; annotation: 58 parallel segments]

[Figure: placement in the order User Input, Ball Position, Visualization, Racket Position; the length of the longest connection is 3, with 58 parallel segments]

Task:
• Place the modules such that, for a given maximal number of parallel bus segments, the length of the longest connection is minimized

Solution:
• Integer Linear Program (FPL'06)

[Figure sequence: the Racket Position module is relocated, giving the order Racket Position, User Input, Ball Position, Visualization; the length of the longest connection is 2]

Erlangen Slot Machine

Video game: Erlangen Slot Machine (ESM)

Implementation

[Figure: CP0 to CP3 with the modules Racket Position, User Input, Display and Ball Position]

References

[1] Minimizing Communication Costs for Reconfigurable Slot Modules. S. Fekete, J. van der Veen, M. Majer, J. Teich. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), Madrid, Spain, August 28-30, 2006.

[2] A Practical Approach for Circuit Routing on Dynamic Reconfigurable Devices. A. Ahmadinia, C. Bobda, J. Ding, M. Majer, J. Teich, J. van der Veen and S. Fekete. In Proceedings of the IEEE International Workshop on Rapid System Prototyping (RSP), Montreal, Canada, pp. 84-90, June 8-10, 2005.

[3] The Erlangen Slot Machine: A flexible FPGA-platform for partially reconfigurable applications at run-time. J. Angermeier, D. Göhringer, M. Majer and J. Teich. Tutorial, 20th International Conference on Architecture of Computing Systems (ARCS 2007), Springer LNCS series, Zurich, Switzerland, March 12-15, 2007.

[4] The Erlangen Slot Machine: A Dynamically Reconfigurable FPGA-Based Computer. M. Majer, J. Teich, A. Ahmadinia and C. Bobda. Journal of VLSI Signal Processing Systems, Springer, vol. 46(2), March 2007.

[5] The Erlangen Slot Machine - A Platform for Interdisciplinary Research in Reconfigurable Computing. J. Angermeier, D. Göhringer, M. Majer, J. Teich, S. Fekete and J. van der Veen. it - Information Technology, issue 3/2007, Oldenbourg, Munich, 2007.

[6] Optimal free-space management and routing-conscious dynamic placement for reconfigurable computing. A. Ahmadinia, C. Bobda, S. Fekete, J. Teich and J. van der Veen. IEEE Transactions on Computers, volume 56, number 3, 2007.

Packet-based communication

NoC (Network on Chip) based communication

A Network on Chip consists of:
• A set of processing elements (PEs)
• A set of network elements, also called routers
• Each PE is connected to a network element
• Each PE is assigned the same address as its corresponding network element
• Communication is packet-based
• Each packet contains the destination address and some data
• Routers forward packets in the right direction according to the destination address
• A router contains little logic. It may have some buffers to store packets in case of high traffic

[Figure: NoC with routers]

Dynamic Networks

Limitations of fixed NoC communication:
• Fixed positions for modules
• Larger modules must be split
• Packet-based communication inside a component is not efficient
• Direct communication must be used on a module boundary

We seek a network infrastructure which
• allows modules placed at a given location to use all the resources in their area
• changes according to the placement of modules on the device
• lets each component always access other components and pins for communication

Dynamic Networks: DyNoC (Dynamic NoC)

Architecture, similar to the NoC architecture:
• Set of processing elements
• Set of network elements implementing routers in their basic configuration
• Each PE is connected to a network element
• Direct communication among neighbouring PEs
• Communication is packet-based
• Each packet contains the destination address and some data
• The ratio of router size to module size must be kept small

Dynamics in the NoC:
• Each module is represented as a rectangular box encapsulating a given function
• All resources (routers and PEs) in the placement area of a module are assigned to the module
  – Therefore, the network logic should be flexible enough to be used as logic in a given module
• Upon completion, each module restores its routers to their basic configuration
• Except for one selected router, the routers in the area of a component are no longer accessible from the network
• Each placed component accesses the network through the router attached to its north-east (NE) PE
• The network varies with the temporal placement of modules on the device

Dynamic Networks: DyNoC, Reachability

Module and pin reachability: a module (pin) is reachable iff all messages sent to this module (pin) can reach their destination.

We define the component graph G = (V, E) as follows:
• V is the set of components and pins
• An edge (u, v) belongs to E iff a path exists between u and v

If G is connected, then all components and pins are reachable. Guaranteeing this increases the architectural requirements.
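The reachability condition above amounts to a connectivity test on the component graph. A minimal sketch, assuming the graph is given as an undirected edge list (the function name `all_reachable` is hypothetical):

```python
from collections import deque


def all_reachable(vertices, edges):
    """Return True iff the undirected component graph (V, E) is connected."""
    vertices = list(vertices)
    if not vertices:
        return True
    neighbours = {v: set() for v in vertices}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    # Breadth-first search from an arbitrary start vertex.
    seen = {vertices[0]}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in neighbours[u] - seen:
            seen.add(v)
            queue.append(v)
    return len(seen) == len(vertices)
```

With components `A`, `B` and a pin, the edge list `[("A", "B"), ("B", "pin")]` is connected, while `[("A", "B")]` alone leaves the pin unreachable.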

Additional architectural requirements:
• A ring of network elements must be available around the chip
• The PEs at the chip boundary must be connected to the routers at the chip boundary
• Each placed component accesses the network through the router associated with its north-east (NE) PE
• Only PEs are allowed at the boundary of a component

Theorem (Bobda et al.): If each component is synthesized in such a way that it is internally surrounded only by processing elements, then each placement on the reconfigurable device yields a strongly connected component graph.

Proof: Assume that the corresponding component graph is not strongly connected. Then either (1) at least two components abut, or (2) one component abuts the device boundary.

Consider, for example, case (1):
• Either the two components overlap,
• or one component uses some routers on its boundary.

Both contradict the assumptions: placed components never overlap, and a component has only PEs on its boundary.

[Figure: component A surrounded by PEs, with positions marked X]

[Figure: example of a feasible placement]

Dynamic Networks: DyNoC, Routing

Routing in a mesh without obstacles: the XY router
• Fast and efficient
• Decides locally
• 5 input and 5 output channels
• Input FIFO on each channel

The router compares its address to the destination address of a packet:
• If X-router < X-packet, the packet is sent east
• If X-router > X-packet, the packet is sent west
• If X-router = X-packet and Y-router < Y-packet, the packet is sent north
• If X-router = X-packet and Y-router > Y-packet, the packet is sent south
• If X-router = X-packet and Y-router = Y-packet, the packet is copied to the local FIFO
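The five rules above translate directly into a routing function. The sketch assumes that x grows towards the east and y towards the north, which is implied by the rules; the names `xy_route`, `STEP` and `trace` are hypothetical helpers for illustration:

```python
def xy_route(router, dest):
    """Return the output direction chosen by a router for a packet.

    router and dest are (x, y) addresses.
    """
    rx, ry = router
    dx, dy = dest
    if rx < dx:
        return "east"
    if rx > dx:
        return "west"
    if ry < dy:
        return "north"
    if ry > dy:
        return "south"
    return "local"  # packet has arrived: copy it to the local FIFO


# Coordinate change for each direction (assumed orientation of the mesh).
STEP = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


def trace(source, dest):
    """Follow a packet hop by hop through an obstacle-free mesh."""
    path = [source]
    while (direction := xy_route(path[-1], dest)) != "local":
        x, y = path[-1]
        sx, sy = STEP[direction]
        path.append((x + sx, y + sy))
    return path
```

A packet from (0, 0) to (2, 1) first corrects its x coordinate and then its y coordinate, visiting (1, 0), (2, 0) and finally (2, 1).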

• The dynamic placement of components creates obstacles in the network
• The routing must be able to recognize obstacles and to route around (surround) components
• Vertical and horizontal obstacles are treated differently

[Figure: mesh with placed components acting as obstacles]

Dynamic Networks: DyNoC, Routing, S-XY

Dealing with obstacles:
• XY router
• Additionally: an activation signal to the neighbours. If the router is available, this control signal is high; otherwise, it is low
• Component-surrounding strategies are required

The S-XY (Surround XY) router operates in three modes:
• N-XY: normal operating mode. Packets are routed according to the XY strategy
• SH-XY: the router enters this mode when a horizontal obstacle is found
• SV-XY: the router enters this mode when a vertical obstacle is found
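The slides do not spell out the exact trigger for each mode. The sketch below shows one possible reading, stated as an assumption: a "horizontal obstacle" is an inactive east or west neighbour in the direction plain XY routing would choose, and a "vertical obstacle" is an inactive north or south neighbour. The function name `sxy_mode` and the dictionary layout are hypothetical:

```python
def sxy_mode(direction, active):
    """Pick the S-XY operating mode for one packet (assumed interpretation).

    direction: output that plain XY routing would choose
               ("east", "west", "north", "south" or "local").
    active:    dict mapping each neighbour direction to its activation
               signal (True = router available, False = covered by a module).
    """
    if direction == "local" or active[direction]:
        return "N-XY"   # nothing in the way: normal XY routing
    if direction in ("east", "west"):
        return "SH-XY"  # blocked while moving horizontally
    return "SV-XY"      # blocked while moving vertically
```

The surrounding moves themselves (which way to go around the component, and how packets are stamped) are not modelled here.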

The SH-XY mode: surrounding obstacles in the horizontal direction

[Figure: Routing Path 1 and Routing Path 2 around an obstacle component towards the destination; the figure is annotated with the conditions YDest > YRouter; XDest < XRouter; XDest = XRouter and YDest < YRouter; XDest = XRouter and YDest = YRouter]

Packets are stamped to avoid a "ping-pong" game.

Implementation

• Virtex II 6000
  – 4x4 DyNoC
  – 7% of FPGA usage
  – Router latency: 2.5 ns
  – 32-bit data bus and 6x4x32-bit FIFO per router

• Surrounding obstacles in the vertical direction
• Place a stamp on packets to avoid a "ping-pong" game

[Figure: Routing Path 1 and Routing Path 2 around an obstacle component towards the destination component, with the ping-pong game indicated]

Theorem (Bobda et al.): The S-XY algorithm is deadlock-free, i.e., each packet will reach its destination after a finite number of steps.

Proof: Exercise. Prove that
• Each component is reachable, i.e., a path is always available from source to destination
• A packet is never blocked in the network (Theorem 1)
• Since a packet can never be blocked, it can fail to reach its destination only if it loops around a component. Prove that this will never happen!

• The decision to go left/right or up/down is taken arbitrarily
• In the worst case, the path can be very long
• To avoid this, consider letting the components guide the routers

[Figure: guided routing from source S to destination D around components C1 to C4; routers carry two-bit labels such as 00 and 01]