2. ALU Design

Olle Seger (olles@isy.liu.se)
Dake Liu (dake@isy.liu.se)

• ALU, an overview
• AU, a case study
• Exercises
• About Lab-2

ALU
• Key component in the datapath of a DSP processor
• Usually all operands come from the register file (RF), except immediates
• Execution cost: 1 clock cycle
• Use one guard bit

Key Components of ALU
• Arithmetic unit (AU)
• Logic unit (AND, OR, XOR, etc.)
• Shifter (LRS, LLS, ASR, ASL)
• Special functions (e.g. bit manipulation)
• Multiplexers


ALU Overview

[Figure: ALU block diagram with blocks labelled Pre-Processing, AU, Logic, Shift, Special, Post-Processing, Saturation, Flags, and Result.]


Let's design a small AU
Functional Specification

 0. A + B with saturation               OP=0000
 1. A + B without saturation            OP=0001
 2. A + B + Cin with saturation         OP=0010
 3. A + B + Cin without saturation      OP=0011
 4. A - B with saturation               OP=0100
 5. A - B without saturation            OP=0101
 6. A compare to B with saturation      OP=0110
 7. ABS(A) Absolute operation on A      OP=0111
 8. NEG(A) Negate operation on A        OP=1000
 9. (A+B)/2 Average operation           OP=1001
10. NOP                                 OP=1010

The C, Z, V, and N flags should be updated for OP 0-9.
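The specification can be sketched as a small behavioural reference model in C, which is handy for checking hardware results. This is only a sketch under assumptions that are not on the slide: the names `au_op`, `au_model` and `prev` are hypothetical, ABS and NEG are assumed not to saturate, the average is assumed to round towards minus infinity (as an arithmetic shift right does), and flags are left out.

```c
#include <stdint.h>

/* Opcode values taken from the OP column of the specification. */
enum au_op {
    OP_ADD_SAT = 0, OP_ADD = 1, OP_ADDC_SAT = 2, OP_ADDC = 3,
    OP_SUB_SAT = 4, OP_SUB = 5, OP_CMP = 6, OP_ABS = 7,
    OP_NEG = 8, OP_AVG = 9, OP_NOP = 10
};

/* Clamp a wide value into the signed 16-bit range (saturation). */
static int16_t clamp16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Keep only the low 16 bits, reinterpreted as two's complement. */
static int16_t wrap16(int32_t x)
{
    uint16_t u = (uint16_t)x;
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Floor division by two, matching an arithmetic shift right. */
static int32_t floor_half(int32_t x)
{
    return (x < 0) ? (x - 1) / 2 : x / 2;
}

/* prev (hypothetical) is returned for operations that write no result. */
int16_t au_model(enum au_op op, int16_t a, int16_t b, unsigned cin, int16_t prev)
{
    int32_t wa = a, wb = b, c = (int32_t)(cin & 1u);

    switch (op) {
    case OP_ADD_SAT:  return clamp16(wa + wb);
    case OP_ADD:      return wrap16(wa + wb);
    case OP_ADDC_SAT: return clamp16(wa + wb + c);
    case OP_ADDC:     return wrap16(wa + wb + c);
    case OP_SUB_SAT:  return clamp16(wa - wb);
    case OP_SUB:      return wrap16(wa - wb);
    case OP_ABS:      return wrap16(wa < 0 ? -wa : wa); /* assumed non-saturating */
    case OP_NEG:      return wrap16(-wa);               /* assumed non-saturating */
    case OP_AVG:      return wrap16(floor_half(wa + wb));
    case OP_CMP:      /* flag-only operation */
    case OP_NOP:
    default:          return prev;
    }
}
```

For example, `au_model(OP_ADD_SAT, 32767, 1, 0, 0)` stays at 32767, while `au_model(OP_ADD, 32767, 1, 0, 0)` wraps around to -32768.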


AU functions

[Figure: one adder configuration per AU function. Panel titles: SAT(A + B), A + B, SAT(A + B + C), A + B + C, Average (A+B), SAT(A - B), A - B, compare (flag-only), ABS(A), NEG(A). Other labels: Cin, carry-in '1', B=0, MSB of A selecting between 0 and 1, ASR, Saturation.]
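One common way to get NEG(A) and ABS(A) out of a single adder is to invert A conditionally and feed the same condition in as carry-in. The C sketch below illustrates that idea only; the function names are hypothetical, and the exact multiplexer wiring of the slide's datapath is not reproduced here.

```c
#include <stdint.h>

/* Two's complement negate with one adder: invert every bit, then add 1.
 * When negate is 0 the operand passes through unchanged. */
static uint16_t cond_negate(uint16_t a, unsigned negate)
{
    uint16_t mask = negate ? 0xFFFFu : 0x0000u;   /* XOR mask for the inversion */
    return (uint16_t)((a ^ mask) + (negate & 1u)); /* carry-in equals the condition */
}

/* NEG(A): always invert and add 1. */
uint16_t au_neg(uint16_t a)
{
    return cond_negate(a, 1u);
}

/* ABS(A): invert and add 1 only when the MSB (sign bit) of A is set. */
uint16_t au_abs(uint16_t a)
{
    return cond_negate(a, (a >> 15) & 1u);
}
```

Note that without saturation `au_abs(0x8000)` returns `0x8000` again, because +32768 does not fit in 16 bits.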


HW with multiplexing

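[Figure: AU datapath built around a 17-bit adder with multiplexers. Labels: inputs A[15:0] and B[15:0]; A[15]; an XOR gate (=1); multiplexer controls C1, C2, C3, C4; constants 0 and 1 and the carry flag C; Cin; Cout = S[16]; sum S; SAT, trunc, and ASR blocks; result R; Flags controlled by C5; decoder DEC driven by OP.]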


HW with multiplexing

Flags

    // Flags are only updated when C5 is set
    always @(posedge clk)
      if (c5) begin
        C <= Cout;
        Z <= !|R;
        N <= R[15];
        V <= (S[16] != S[15]);
      end

ASR ½

    assign R = S[16:1];

Sat

    // Combinational logic, so blocking assignments are used
    always @(*)
      if (S[16] == S[15])
        R = S[15:0];
      else if (S[16] == 0)
        R = 16'h7fff;   // positive overflow
      else
        R = 16'h8000;   // negative overflow

Trunc

    assign R = S[15:0];

DEC

OP  Function     C1  C2  C3  C4  C5
0   Sat(A+B)     00  00  01  00  1
1   A+B          00  00  01  01  1
2   Sat(A+B+C)   00  00  10  00  1
3   A+B+C        00  00  10  01  1
4   A-B          00  01  00  01  1
5   Sat(A-B)     00  01  00  00  1
6   Cmp(A,B)     00  01  00  -   1
7   Abs(A)       10  10  01  01  1
8   Neg(A)       01  10  01  01  1
9   (A+B)/2      00  00  01  10  1
10  NOP          -   -   -   -   0
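The Sat and Flags blocks can be mirrored bit for bit in C, which also shows what the guard bit is for: the operands are sign-extended to 17 bits, so S[16] and S[15] disagree exactly when the 16-bit result overflows. This is a sketch under assumptions: the names `add17`, `sat17`, `flags17` and `au_flags` are hypothetical, and the C flag is left out because it depends on how Cout is defined in the datapath.

```c
#include <stdint.h>

typedef struct { unsigned z, n, v; } au_flags;

/* Sign-extend both operands to 17 bits, add with carry-in, keep S[16:0]. */
static uint32_t add17(int16_t a, int16_t b, unsigned cin)
{
    uint32_t s = (uint32_t)(int32_t)a + (uint32_t)(int32_t)b + (cin & 1u);
    return s & 0x1FFFFu;
}

/* Same decision as the Sat block: compare the guard bit with the sign bit. */
static uint16_t sat17(uint32_t s)
{
    unsigned s16 = (s >> 16) & 1u;
    unsigned s15 = (s >> 15) & 1u;

    if (s16 == s15)
        return (uint16_t)(s & 0xFFFFu);   /* no overflow: pass S[15:0] */
    return s16 ? 0x8000u : 0x7FFFu;       /* negative or positive overflow */
}

/* Z, N and V computed the same way as in the Flags block. */
static au_flags flags17(uint32_t s, uint16_t r)
{
    au_flags f;
    f.z = (r == 0);
    f.n = (r >> 15) & 1u;
    f.v = ((s >> 16) & 1u) != ((s >> 15) & 1u);
    return f;
}
```

With these helpers, `sat17(add17(32767, 1, 0))` gives `0x7FFF` and `sat17(add17(-32768, -1, 0))` gives `0x8000`; in both cases the V flag computed by `flags17` is set.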


Exercise 2.1


Exercise 2.2


Exercise 2.3

We have a processor with a pipeline where we can:
* Read out two operands from the register file and write one operand to the register file, all at the same time
* Instead of reading out one of the operands, you can choose to take a 16-bit immediate from the instruction word
* We have 32 16-bit registers
* A conditional branch takes 3 clock cycles
* We have a repeat instruction
* We have only one load instruction of interest: load Rd, DM0[AR0++]. AR0 is set with the instruction set AR0, Rs
* The store instruction works the same way: store DM0[AR0++], Rs
* After a load instruction we must wait a clock cycle before we can use the result


Exercise 2.3

Function 1 (execution time max 105 clock cycles, excluding the RET instruction)

    #include <stdint.h>
    #include <stdlib.h>

    int16_t dct_indata[32];

    // Return value in r0
    uint16_t find_maxabsval(void)
    {
        uint16_t biggest = 0, b;
        int16_t a;

        for (int i = 0; i < 32; i++) {
            a = dct_indata[i];
            b = abs(a);
            if (b > biggest)
                biggest = b;
        }
        return biggest;
    }


Exercise 2.3

Max 25 clock cycles (excluding the RET instruction)

    int64_t packet_ctr;

    /* Length is in register r0 when this function is called */
    void update_statistics(int16_t length)
    {
        packet_ctr += length;
    }


Exercise 2.3

        SET ar0,dct_indata
        SET r0,0            ; max value
        REPEAT loop,32
        LD r1,(ar0++)
        NOP
        ABS r2,r1
        MAX r0,r2,r0
    loop
        RET

4*32 + 3 = 131

        SET ar0,dct_indata
        SET r0,0            ; max value
        REPEAT loop,16
        LD r1,(ar0++)
        LD r3,(ar0++)
        ABS r2,r1
        MAX r0,r2,r0
        ABS r4,r3
        MAX r0,r4,r0
    loop
        RET

6*16 + 3 = 99

A gold star if you can do it faster!
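The cycle totals follow a simple cost model: loop body length times iteration count, plus the straight-line instructions outside the loop (RET excluded). The sketch below assumes one cycle per instruction and no per-iteration overhead for REPEAT; that assumption matches the totals on the slides but is not stated there, and `loop_cycles` is a hypothetical helper name.

```c
/* Total cycles for one REPEAT loop plus the straight-line code around it.
 * Assumes one cycle per instruction and a zero-overhead REPEAT. */
unsigned loop_cycles(unsigned body_len, unsigned iterations, unsigned straight_line)
{
    return body_len * iterations + straight_line;
}
```

The first version costs `loop_cycles(4, 32, 3)` = 131 cycles, which misses the 105-cycle budget; the unrolled version costs `loop_cycles(6, 16, 3)` = 99 cycles and fits.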


Exercise 2.3

        ; prolog
        SET ar0,dct_indata
        LD r1,(ar0++)
        SET r0,0            ; max value
        ABS r2,r1
        ; loop
        REPEAT loop,31
        LD r1,(ar0++)
        MAX r0,r2,r0
        ABS r2,r1
    loop:
        ; epilog
        MAX r0,r2,r0
        RET

3*31 + 6 = 99
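The same prolog, loop and epilog structure can be written out in C to make the reordering visible: each iteration loads the next element while finishing the MAX for the previous one. This is an illustrative sketch, not the course solution; the function name and the parameters `data` and `n` are hypothetical, `n` is assumed to be at least 1, and `int` is assumed to be wider than 16 bits so that `abs(-32768)` is representable.

```c
#include <stdint.h>
#include <stdlib.h>

uint16_t find_maxabsval_pipelined(const int16_t *data, int n)
{
    uint16_t biggest = 0;
    int16_t loaded = data[0];               /* prolog: first load */
    uint16_t mag = (uint16_t)abs(loaded);   /* prolog: first ABS */

    for (int i = 1; i < n; i++) {
        loaded = data[i];                   /* load for the next element */
        if (mag > biggest)                  /* MAX on the previous element */
            biggest = mag;
        mag = (uint16_t)abs(loaded);        /* ABS on the element just loaded */
    }

    if (mag > biggest)                      /* epilog: last MAX */
        biggest = mag;
    return biggest;
}
```

Because the load sits at the top of the loop body and its value is first used two instructions later, the one-cycle wait after a load is filled with useful work instead of a NOP.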


Exercise 2.3

        set ar0,packet_ctr
        set r4,0
        add r1,r0,0x8000    ; carry = (length<0)
        addc r4,r4,r4       ; r4 = (length<0)
        ld r1,(ar0)
        sub r4,0,r4         ; r4 = (length<0)?-1:0
        add r1,r0
        st (ar0++),r1
        repeat endloop,3
        ld r1,(ar0)
        nop                 ; Silver star if you remove this
                            ; without unrolling the loop completely!
        addc r1,r4
        st (ar0++),r1
    endloop
        ret

[Figure: packet_ctr in memory as the words P_c[0], P_c[1], P_c[2], P_c[3], addressed through ar0; length (r0) is added to P_c[0] and its sign extension (ext) to the remaining words.]

3*4 + 9 = 21
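In C terms, the assembly performs a multi-word addition: the 16-bit length is added to the lowest word, and its sign extension (all zeros or all ones) is added with carry to the three upper words. The sketch below assumes that word 0 is the least significant word; `add_length` and `ctr` are hypothetical names.

```c
#include <stdint.h>

/* Add a signed 16-bit length to a 64-bit counter stored as four 16-bit
 * words. Assumption: ctr[0] is the least significant word. */
void add_length(uint16_t ctr[4], int16_t length)
{
    /* Sign extension word: 0xFFFF for negative lengths, otherwise 0. */
    uint16_t ext = (length < 0) ? 0xFFFFu : 0x0000u;

    /* Lowest word: plain add, remember the carry out. */
    uint32_t sum = (uint32_t)ctr[0] + (uint16_t)length;
    unsigned carry = (unsigned)(sum >> 16);
    ctr[0] = (uint16_t)sum;

    /* Upper words: add the extension word plus the incoming carry. */
    for (int i = 1; i < 4; i++) {
        sum = (uint32_t)ctr[i] + ext + carry;
        carry = (unsigned)(sum >> 16);
        ctr[i] = (uint16_t)sum;
    }
}
```

The loop over the three upper words corresponds to the `repeat endloop,3` section, and `ext` plays the role of r4.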

Exercise 2.3 software pipelining

        set ar0,packet_ctr
        set r4,0
        add r1,r0,0x8000    ; carry = (length<0)
        addc r4,r4,r4       ; 1 in r4 if length<0
        ld r1,(ar0)
        sub r4,0,r4         ; -1 in r4 if neg
        add r2,r1,r0
        ; loop
        repeat endloop,3
        ld r1,(ar0+1)
        st (ar0++),r2
        addc r2,r1,r4
    endloop
        st (ar0++),r2
        ret

3*3 + 9 = 18

Exercise 2.3

ALU

Function   C1  C2  C3  C4  C5
ABS(A)     1   10  11  0   0
MAX(A,B)   0   01  00  1   0
A+B        0   00  01  0   1
A-B        0   01  00  0   1
A+B+C      0   00  10  0   1

[Figure: ALU datapath for this exercise. Labels: 17-bit adder with sign-extended inputs {A[15],A[15:0]} and {B[15],B[15:0]}; multiplexers controlled by C1, C2, C3, and C4; an XOR gate (=1); A[15]; constants 0 and 1; carry flag C; Cout; outputs S[15:0] and S[16].]

    always @(posedge clk)
      if (C5) begin
        C <= Cout;
      end
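In the table, MAX(A,B) uses the same C2 and C3 settings as A-B, which suggests that the maximum is chosen from the sign of the difference. The C sketch below illustrates that idea under the assumption that the sign of the 17-bit difference selects the output; `max_by_sub` is a hypothetical name and the real multiplexer wiring may differ.

```c
#include <stdint.h>

/* Pick the larger operand from the sign of A - B. The 32-bit difference
 * stands in for the 17-bit adder: it cannot overflow for 16-bit inputs,
 * so its sign is always trustworthy. */
int16_t max_by_sub(int16_t a, int16_t b)
{
    int32_t diff = (int32_t)a - (int32_t)b;
    return (diff < 0) ? b : a;   /* negative difference means B is larger */
}
```

A plain 16-bit subtraction would give the wrong sign for inputs such as -32768 and 32767; the extra bit is what makes the comparison safe.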


Exercise 2.4


Exercise 2.4
Software pipelining

        SET ar0,dct_indata
        SET r0,0            ; max value
        LD r1,(ar0++)       ; prolog
        REPEAT loop,31
        LD r1,(ar0++)
        MAXABS r0,r1,r0     ; loop
    loop:
        MAXABS r0,r1,r0     ; epilog
        RET

2*31+5=67

This code utilizes pipeline delay!

Exercise 2.4
Loop unrolling

        SET ar0,dct_indata
        SET r0,0            ; max value
        REPEAT loop,16
        LD r1,(ar0++)
        LD r2,(ar0++)
        MAXABS r0,r1,r0
        MAXABS r0,r2,r0
    loop
        RET

4*16+3=67
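MAXABS replaces the separate ABS and MAX instructions of Exercise 2.3. Its semantics are not spelled out on the slide; the C sketch below assumes that `MAXABS rd,ra,rb` computes max(|ra|, rb) with rb treated as an unsigned running maximum, and the function name `maxabs` is hypothetical.

```c
#include <stdint.h>

/* Assumed semantics of MAXABS: the larger of |a| and the running maximum.
 * The magnitude is formed in 32 bits so that |-32768| = 32768 is exact. */
uint16_t maxabs(int16_t a, uint16_t acc)
{
    int32_t wide = a;
    uint16_t mag = (uint16_t)(wide < 0 ? -wide : wide);
    return (mag > acc) ? mag : acc;
}
```

Fusing the two operations shortens the loop body from three instructions per element to two, which is where the drop from 99 to 67 cycles comes from.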


About Lab 2 (Datapath)
• Manual for Lab 2 (Ch-2)
• Source code for Lab 2
• You can use Verilog or VHDL.
• Go through Ch-0 and Ch-2 for all details

Read the manuals carefully before starting the labs!


About Lab 2

Write this HW: saturation.vhd, mac_dp.vhd, adder_ctrl.vhd, min_max_ctrl.vhd
Write this SW: saturation.asm, rounding_vector.asm, alu_test.asm

1) Run SW on srsim for reference
2) Run SW and HW using vsim
3) Compare output
4) Check coverage. Was all your HW tested?

SW should test all corner cases


About Lab 2 Verification

– Write an assembly program to test your modules
– Some templates are provided
– Fill in your choice of registers and operands
– Perform the operation
– Write the results to a file using "out 0x11, r?"
– Use coverage metrics to find obvious missing corner cases
– Run the Modelsim simulator using the commands mentioned in Section 0.5
– Simulate and debug
