EECS 470 Lecture 13 Basic Caches · Lecture 13 EECS 470 Slide 1 © Wenisch 2016 -- Portions ©...

EECS 470 Lecture 13 Basic Caches
Winter 2019
Prof. Ronald Dreslinski
http://www.eecs.umich.edu/courses/eecs470

Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, and Vijaykumar of Carnegie Mellon University, Purdue University, University of Michigan, and University of Wisconsin.

© Wenisch 2016 -- Portions © Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Shen, Smith, Sohi, Tyson, Vijaykumar

Readings

For Today:
❒ H&P 2.1

For Wednesday:
❒ H&P 2.2, 2.3, B.3
❒ N. Jouppi. Improving direct-mapped cache performance…

Memory Systems: Basic Caches

Memory Systems

Basic caches
❒ introduction
❒ fundamental questions
❒ cache size, block size, associativity

Advanced caches

Main memory

Virtual memory

Start today

Motivation

Want memory to appear:
❒ as fast as CPU
❒ as large as required by all of the running applications

[Figure: Performance (log scale, 1 to 10000) vs. year (1985 to 2010), with one curve for Processor and one for Memory.]

Memory Hierarchy

Make common case fast:
❒ common: temporal & spatial locality
❒ fast: smaller, more expensive memory

[Figure: hierarchy of Registers, Caches, Memory, Disk (MEMS?), with arrows labeled "Faster" and "Larger".]

Storage Hierarchies

Storage is layered into hierarchies in order of
❒ increasing latency ($t_i$): $t_i < t_{i+1}$
❒ increasing size ($s_i$) ⇒ decreasing unit cost ($c_i$): $s_i < s_{i+1}$, $c_i > c_{i+1}$
❒ decreasing bandwidth ($b_i$): $b_i > b_{i+1}$
❒ increasing xfer unit ($x_i$): $x_i < x_{i+1}$

Level 0: Registers
Level 1: (n levels of) Caches
Level 1.5: NVRAM?
Level 2: Main Memory (Primary Storage)
Level 2.5: Flash?
Level 3: Disks (Secondary Storage)
Level 4: Tape Backup (Tertiary Storage)

[Figure annotations beside the levels: "ISA feature", "Memory Abstractions".]

Processor/Memory Boundaries

[Figure: block diagram of the processor/memory boundary. Labels: Processor; I-Unit; E-Unit; Regs; I-TLB; D-TLB; L1 I-Cache; L1 D-Cache; L2 Cache (SRAM on-chip); L3 Cache (SRAM off-chip); Main Memory (DRAM).]

Caches

An automatically managed hierarchy

“A hiding place, esp. of goods, treasure, etc.” -- OED

Keep recently accessed block
❒ temporal locality

Break memory into blocks (several bytes) and transfer data to/from cache in blocks
❒ spatial locality

A lot of architectures opt for software-managed scratch-pad memory instead, e.g., Cray-1, embedded processors. Why??

[Figure: CPU, $ (cache), Memory]

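Cache (Abstractly)

Keep recently accessed block in “block frame”
❒ state (e.g., valid)
❒ address tag
❒ data

[Figure: a block frame drawn as address | state | data. The address and state fields are bookkeeping overhead; the data field holds multiple bytes per block frame to amortize that overhead.]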

Cache (Abstractly)

On memory read

if incoming address corresponds to one of the stored address tags then
❍ HIT
❍ return data

else
❍ MISS
❍ choose & displace a current block in use
❍ fetch new (referenced) block from memory into frame
❍ return data

- Where and how to look for a block? (Block placement)
- Which block is replaced on a miss? (Block replacement)
- What happens on a write? Write strategy (Later)
- What is kept? (Bookkeeping, data)
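
The read procedure above can be sketched in a few lines of Python. This is only an illustrative model, not how hardware is built: the class name `TinyCache`, the 4-byte block size, the fully associative organization, and the first-in-first-out choice of victim are all assumptions made for the example.

```python
from collections import OrderedDict

BLOCK_SIZE = 4  # bytes per block (assumed for illustration)


class TinyCache:
    """Hypothetical fully associative cache holding `num_frames` block frames."""

    def __init__(self, memory, num_frames=2):
        self.memory = memory          # backing store: a bytes object
        self.num_frames = num_frames
        self.frames = OrderedDict()   # address tag -> block data; presence means valid

    def read(self, address):
        tag, offset = divmod(address, BLOCK_SIZE)
        if tag in self.frames:        # HIT: address matches a stored tag
            return "HIT", self.frames[tag][offset]
        # MISS: choose and displace a block if every frame is in use.
        if len(self.frames) == self.num_frames:
            self.frames.popitem(last=False)  # oldest-filled frame (FIFO, an assumption)
        # Fetch the referenced block from memory into the frame, then return data.
        start = tag * BLOCK_SIZE
        self.frames[tag] = self.memory[start:start + BLOCK_SIZE]
        return "MISS", self.frames[tag][offset]


memory = bytes(range(64))
cache = TinyCache(memory)
print(cache.read(5))   # first touch of the block holding addresses 4-7
print(cache.read(6))   # same block, so spatial locality pays off
```

The four questions on the slide map onto this sketch: placement is the dictionary lookup, replacement is the `popitem` call, bookkeeping is the tag key, and writes are not modeled at all.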

Terminology

block (cache line): minimum unit that may be present

hit: block is found in the cache

miss: block is not found in the cache

miss ratio: fraction of references that miss

hit time: time to access the cache

miss penalty
❒ time to replace block in the cache + deliver to upper level
❒ access time: time to get first word
❒ transfer time: time for remaining words

Cache Performance

Assume
❒ cache access time is equal to 1 cycle
❒ cache miss ratio is 0.01
❒ cache miss penalty is 20 cycles

Mean access time
= cache access time + miss ratio × miss penalty
= 1 + 0.01 × 20 = 1.2 cycles

Typically
❒ level-1 is 16K-64K, level-2 is 512K-4M, memory is 128M-4G
❒ level-1 as fast as the processor (increasingly 2 cycles)
❒ level-1 is 1/10000 the capacity but contains 98% of references

Memoization & amortization
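
The mean access time calculation above is easy to experiment with as a small function. The function name and parameter names below are illustrative; the numbers are the ones assumed on the slide.

```python
def mean_access_time(hit_time, miss_ratio, miss_penalty):
    """Mean access time in cycles: hit time plus the expected cost of a miss."""
    return hit_time + miss_ratio * miss_penalty


# Slide assumptions: 1-cycle access, 1% miss ratio, 20-cycle miss penalty.
print(mean_access_time(1, 0.01, 20))

# Doubling the miss ratio adds only 0.2 cycles here, because the penalty is small.
print(mean_access_time(1, 0.02, 20))
```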

Fundamental Cache Parameters that affect miss rate

Cache size (C)

Block size (b)

Cache associativity (a)

Cache Size

Cache size is the total data (not including tag) capacity
❒ bigger can exploit temporal locality better
❒ not ALWAYS better

Too large a cache
❒ smaller is faster => bigger is slower
❒ access time may degrade critical path

Too small a cache
❒ don’t exploit temporal locality well
❒ useful data constantly replaced

[Figure: hit rate vs. C, holding b and a constant; the “working set” size is marked on the C axis.]

Block Size

Block size is the data that is
❒ associated with an address tag
❒ not necessarily the unit of transfer between hierarchies (sub-blocking)

Too small blocks
❒ don’t exploit spatial locality well
❒ have inordinate tag overhead

Too large blocks
❒ useless data transferred
❒ useful data prematurely replaced: too few total # blocks

[Figure: plot against b, holding C and a constant.]

Associativity

Fully-associative: block goes in any frame
(think all frames in 1 set)

Direct-mapped: block goes in exactly one frame
(think 1 frame per set)

Set-associative: a block goes in any frame in exactly one set
(frames grouped into sets)

Where does block 12 (b’1100) go?

[Figure: an 8-frame cache drawn three ways, with frames numbered 0-7 by block and, in the set-associative case, grouped into sets 0-3.]
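
One way to work out where block 12 can go is sketched below. It assumes an 8-frame cache, modulo placement (set = block number mod number of sets), and that set s occupies frames s × ways through s × ways + ways - 1; the function name `candidate_frames` and that frame numbering are illustrative choices.

```python
def candidate_frames(block, num_frames, ways):
    """Return (set index, frames that may hold `block`).

    Assumes modulo placement and that the frames of a set are contiguous.
    ways=1 is direct-mapped; ways=num_frames is fully associative.
    """
    num_sets = num_frames // ways
    set_index = block % num_sets
    first = set_index * ways
    return set_index, list(range(first, first + ways))


for name, ways in [("direct-mapped", 1), ("2-way set-associative", 2), ("fully associative", 8)]:
    print(name, candidate_frames(12, 8, ways))
```

Under these assumptions the direct-mapped cache allows only frame 12 mod 8 = 4, the 2-way cache allows either frame of set 12 mod 4 = 0, and the fully associative cache allows all eight frames.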

Impact of Associativity

Typical values for associativity
❒ 1-, 2-, 4-, 8-way associative

Larger associativity
❒ lower miss rate, less variation among programs
❒ only important for small “C/b”

Smaller associativity
❒ lower cost, faster hit time

[Figure: hit rate vs. a, holding C and b constant; “~5” is marked on the a axis.]

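Direct Mapped Caches

[Figure: direct-mapped cache datapath. The address splits into tag | idx | b.o.; a decoder selects the frame, the stored tag is compared (=) with the address tag to produce the tag match (hit?), and a multiplexor selects the data. Other labels in the figure: “tag index”, “block index”.]

Don’t forget to check the valid/state bits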
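
A minimal sketch of a direct-mapped lookup, including the valid-bit check, might look like the following. The field widths (2 offset bits, 3 index bits), the tuple layout of a frame, and the function names are assumptions for illustration only.

```python
OFFSET_BITS = 2  # 4-byte blocks (assumed)
INDEX_BITS = 3   # 8 frames (assumed)


def split_address(addr):
    """Split an address into (tag, index, block offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def lookup(frames, addr):
    """frames: list of (valid, tag, data) tuples, one per index.

    Returns the addressed byte on a hit, or None on a miss.
    """
    tag, index, offset = split_address(addr)
    valid, stored_tag, data = frames[index]
    if valid and stored_tag == tag:  # a tag match alone is not a hit
        return data[offset]
    return None


frames = [(False, 0, bytes(4))] * 8
frames[3] = (True, 5, b"abcd")
addr = (5 << 5) | (3 << 2) | 1       # tag 5, index 3, offset 1
print(lookup(frames, addr))          # byte value of "b"
```

Dropping the `valid` test would let a frame that still holds a stale or power-on tag report a hit, which is the mistake the slide warns about.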

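Fully Associative Cache

[Figure: fully associative cache datapath. The address splits into tag | blk.offset; the address tag is compared (=) against every stored tag in parallel (associative search), and a multiplexor selects the data.]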

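N-Way Set Associative Cache

[Figure: N-way set-associative cache datapath. The address splits into tag | idx | b.o.; each way (bank) has its own decoder and tag-match comparator (=), and a multiplexor selects among the ways. Labels mark “a set” and “a way (bank)”.]

$\text{Cache Size} = N \times 2^{B+b}$

(Here B and b appear to be the widths, in bits, of the idx and b.o. address fields.)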

Mark Hill’s DM vs. SA: “Bigger & Dumber is Better”

$t_{avg} = t_{hit} + \text{miss ratio} \times t_{miss}$
❒ comparable DM and SA caches with same $t_{miss}$
❒ but, associativity that minimizes $t_{avg}$ is often smaller than associativity that minimizes miss ratio

remember:

diff($t_{cache}$) = $t_{cache}$(SA) - $t_{cache}$(DM) ≥ 0 (SA needs slower clock)

diff(miss) = miss(SA) - miss(DM) ≤ 0 (DM misses more)

e.g., if diff($t_{cache}$) = 0 ⇒ SA better, but assuming diff(miss) = -1%, $t_{miss}$ = 20
⇒ if diff($t_{cache}$) > 0.2 cycle then SA loses
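
The break-even point follows from subtracting the two mean access times: with the same miss time for both caches, t_avg(SA) - t_avg(DM) = diff(t_cache) + diff(miss) × t_miss, so SA wins only while that sum is negative. A small sketch of the comparison, with illustrative function and parameter names:

```python
def sa_beats_dm(diff_t_cache, diff_miss, t_miss):
    """True when the set-associative cache has the lower mean access time.

    diff_t_cache: t_cache(SA) - t_cache(DM), in cycles (>= 0)
    diff_miss:    miss(SA) - miss(DM), as a fraction (<= 0)
    t_miss:       miss time in cycles, assumed equal for both caches
    """
    return diff_t_cache + diff_miss * t_miss < 0


# Slide numbers: SA removes 1% of misses and a miss costs 20 cycles,
# so SA can afford at most 0.2 cycles of extra hit time.
for extra in (0.0, 0.1, 0.3):
    print(extra, sa_beats_dm(extra, -0.01, 20))
```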

Associative Block Replacement

Which block in a set to replace on a miss?

Ideally: Belady’s algorithm, replace the block that “will” be accessed the furthest in the future
❒ How do you implement it?

Approximations:

Least recently used (LRU)
❒ optimized (assume) for temporal locality (expensive for more than 2-way)

Not most recently used (NMRU)
❒ track MRU, random select from others, good compromise

Random
❒ nearly as good as LRU, simpler (usually pseudo-random)

How much can block replacement policy matter?
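
As an illustration of the LRU policy listed above, here is a software model of a single set. It is a sketch, not a hardware design: the class name `LRUSet` is hypothetical, and an ordered dictionary stands in for the recency state that real hardware would keep in a few bits per set.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set of `ways` frames with least-recently-used replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit; on a miss install `tag`, evicting the LRU tag if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # now the most recently used
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)  # evict the least recently used
        self.blocks[tag] = None
        return False


two_way = LRUSet(ways=2)
print([two_way.access(tag) for tag in "ABACB"])
```

In this trace the second access to A hits, C then evicts B (the least recently used), and the final access to B misses again. Cycling through one more block than the set has ways is the pattern where LRU performs worst.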

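Example: a=2, C=1kB, b=4B, word-size=2B

Basic Solution

[Figure: datapath of the 2-way set-associative cache.]

Address fields:
❒ tag: PA[31:9] (23 bits)
❒ idx: PA[8:2] (7 bits)
❒ b.o.: PA[1]
❒ PA[0]: drawn separately from b.o. (presumably the byte within a 2B word)

Storage arrays, each indexed by the 7-bit idx:
❒ data 0, data 1: 128 lines × 4 bytes
❒ tag0, tag1: 128 lines × 23 bits
❒ v0, v1: 128 lines × 1 bit

Selection logic:
❒ two 23-bit comparators (=) against the address tag produce hit0 and hit1
❒ a 2-to-1 mux per way selects the word using b.o.
❒ a final 2-to-1 mux selects between the ways using hit0 and hit1

Outputs: HIT, DATA (16 bits)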
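
The field widths in this example can be checked with a short calculation. The sketch assumes a 32-bit physical address (the figure shows PA[31:9]) and power-of-two sizes; the function name `field_widths` is illustrative.

```python
def field_widths(cache_bytes, block_bytes, ways, address_bits=32):
    """Return (tag bits, index bits, block-offset bits) for a set-associative cache.

    Assumes cache size, block size, and set count are all powers of two.
    """
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1  # log2 of the block size
    index_bits = num_sets.bit_length() - 1      # log2 of the number of sets
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


tag_bits, index_bits, offset_bits = field_widths(1024, 4, 2)
print(tag_bits, index_bits, offset_bits)

# Cross-check: ways x 2^(index bits + offset bits) recovers the 1kB capacity.
print(2 * 2 ** (index_bits + offset_bits))
```

Here the 2 offset bits cover PA[1:0] together. The figure separates PA[0] because the datapath delivers 2-byte words, so only PA[1] is needed to pick a word within the 4-byte block.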

Write Policies

Writes are more interesting
❒ on reads, data can be accessed in parallel with tag compare
❒ on writes, needs two steps
❒ is turn-around time important for writes? Cache optimizations often defer writes in favor of reads

Choices of Write Policies
❒ On write hits, update memory?
❍ Yes: write-through (+ no coherence issue, + immediate observability, - more bandwidth)
❍ No: write-back
❒ On write misses, allocate a cache block frame?
❍ Yes: write-allocate
❍ No: no-write-allocate

Write Policies (Cont.)

Write-through
❒ update memory on each write
❒ keeps memory up-to-date
❒ traffic/reference = $f_{writes}$, e.g. 0.20, independent of cache performance (miss rate)

Write-back
❒ update memory only on block replacement
❒ many cache lines are only read and never written to
❒ add “dirty” bit to status word
❍ originally cleared after replacement
❍ set when a block frame is written to
❍ only write back a dirty block, and “drop” clean blocks w/o memory update
❒ traffic/reference = $f_{dirty} \times \text{miss} \times B$
❍ e.g., traffic/reference = 1/2 × 0.05 × 4 = 0.1
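
The two traffic estimates above can be compared directly. In this sketch B is taken to be the block size in transfer units (an assumption; the slide does not define it), and the function names are illustrative.

```python
def write_through_traffic(f_writes):
    """Memory writes per reference: every store goes to memory."""
    return f_writes


def write_back_traffic(f_dirty, miss_ratio, block_units):
    """Memory writes per reference: only dirty blocks are written back on a miss.

    block_units is B, assumed here to be the block size in transfer units.
    """
    return f_dirty * miss_ratio * block_units


# Slide numbers: 20% of references are writes; half of replaced blocks are
# dirty, the miss ratio is 0.05, and B = 4.
print(write_through_traffic(0.20))
print(write_back_traffic(0.5, 0.05, 4))
```

With these numbers write-back generates half the traffic of write-through, but unlike write-through its traffic grows with the miss ratio and the block size.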

Store Buffers

Buffer CPU writes
❒ allows reads to proceed
❒ stall only when full
❒ data dependence?
❍ What happens on dependent loads/stores?

[Figure: store buffer between the CPU and $ (cache)]
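
One common answer to the dependence question is store-to-load forwarding: a load first checks the buffer for a pending store to the same address. The slide leaves the question open, so the sketch below is just one possible design; the class `StoreBuffer`, its method names, and the dictionary standing in for the cache are all hypothetical.

```python
from collections import deque


class StoreBuffer:
    """Hypothetical FIFO store buffer sitting between the CPU and the cache."""

    def __init__(self, capacity, cache):
        self.capacity = capacity
        self.cache = cache      # dict: address -> value, a stand-in for the cache
        self.pending = deque()  # (address, value) pairs, oldest first

    def store(self, address, value):
        """Buffer a write; return False (the CPU must stall) when the buffer is full."""
        if len(self.pending) == self.capacity:
            return False
        self.pending.append((address, value))
        return True

    def load(self, address):
        """Forward from the youngest matching buffered store, else read the cache."""
        for addr, value in reversed(self.pending):
            if addr == address:
                return value
        return self.cache.get(address)

    def drain_one(self):
        """Retire the oldest buffered store into the cache."""
        if self.pending:
            addr, value = self.pending.popleft()
            self.cache[addr] = value


cache = {0x10: 1}
buf = StoreBuffer(capacity=2, cache=cache)
buf.store(0x10, 7)
print(buf.load(0x10), cache[0x10])  # the load sees the buffered value before the cache does
```

Without the search in `load`, a dependent load would read the stale value from the cache. The simpler alternative is to stall the load until the buffer drains.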

Writeback Buffers

Between write-back cache and next level
1. Move replaced, dirty blocks to buffer
2. Read new line
3. Move replaced data to memory

Usually only need 1 or 2 write-back buffer entries

[Figure: write-back buffer between $ and $$/Memory]

“Harvard” vs. “Princeton”

Unified (sometimes known as Princeton)
❒ less costly, dynamic response, handles writes to instructions

Split I and D (sometimes known as Harvard)
❒ most of the time code and data don’t mix
❒ 2x bandwidth, place close to I/D ports
❒ can customize size (I-footprint generally smaller than D-footprint), no interference between I/D
❒ self-modifying code can cause “coherence” problems

Caches should be split for frequent simultaneous I & D access
❒ no longer a question in “high-performance” on-chip L-1 caches