Massively Parallel Computing on Silicon: SIMD...

Massively Parallel Computing on Silicon: SIMD Implementations
V.M. Brea, Univ. of Santiago de Compostela, Spain

Transcript of Massively Parallel Computing on Silicon: SIMD...

Page 1: Massively Parallel Computing on Silicon: SIMD …users.salleurl.edu/.../seminarios/presentaciones/SIMD.pdfand Cost of SIMD Array Designs J. of Parallel and Distributed Computing 60,

Massively Parallel Computing on Silicon: SIMD Implementations

V.M. Brea
Univ. of Santiago de Compostela, Spain

Page 2:

GOAL

Give an overview of the state of the art of digital on-chip CMOS SIMD solutions, mainly visual processors, through papers found in the literature

Page 3:

Outline

– Parallel Computing
– SIMD Computing
– Digital Solutions
– Conclusions

Page 4:

Outline

– Parallel Computing
– SIMD Computing
– Digital Solutions
– Conclusions

Page 5:

Trends in High Performance Computing
IEEE Circuits & Devices Magazine, January/February 2006

Supercomputers: evolution

Page 6:

Trends in High Performance Computing
IEEE Circuits & Devices Magazine, January/February 2006

Supercomputers: evolution

– Late 70s to early 80s: vector systems
– 80s: symmetric multiprocessors (SMPs), memory sharing
– Late 80s: distributed-memory computer systems, overcoming the hardware scalability limitations of shared memory
– 90s: massively parallel processors (MPP), 256 to 10000 processors, scalable (distributed memory)
– Today: off-the-shelf components, clusters of PCs or workstations
– Today and tomorrow: grid computing

Page 7:

Outline

– Parallel Computing
– SIMD Computing
– Digital Solutions
– Conclusions

Page 8:

A System for Evaluating Performance and Cost of SIMD Array Designs
J. of Parallel and Distributed Computing 60, 217-246, 2000

Goal: efficient evaluation of SIMD arrays with respect to complex applications while accounting for operating frequency and chip area

Remark: the first 10% of the design cycle determines nearly 80% of a system's cost … many alternatives in SIMD realization

Page 9:

A System for Evaluating Performance and Cost of SIMD Array Designs
J. of Parallel and Distributed Computing 60, 217-246, 2000

Design Alternatives in SIMD Machines (I)

– Asymmetric: control and SIMD array
– PE design: CPU, datapath: 1-, 8-, 32-bit …
– PE memory hierarchy: on- and/or off-chip memory, cache (propagating templates)
– PE nearest-neighbor communication: NEWS network

Page 10:

A System for Evaluating Performance and Cost of SIMD Array Designs
J. of Parallel and Distributed Computing 60, 217-246, 2000

Design Alternatives in SIMD Machines (II)

– General communication: additional inter-PE communication networks
– Feedback from array to controller: global OR mechanisms, count of corresponding PEs, flags …
– Array instruction issue:
  – The speed at which instructions can be issued
  – The latency of the instructions from the issuer to the PEs
  – The skew in instruction distribution to the various PEs in the array
– Mapping data to processing elements: one-to-one correspondence, virtual PEs (VPEs)
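The two feedback mechanisms named above (global OR and count) can be sketched in a few lines. This is only an illustrative software model, not the mechanism of any particular machine; the names `global_or`, `global_count` and the `Grid` alias are hypothetical, and each PE is assumed to expose a single Boolean flag.

```python
from typing import List

# One Boolean flag per PE, arranged as a 2-D array (assumed layout).
Grid = List[List[bool]]

def global_or(flags: Grid) -> bool:
    """True if at least one PE raises its flag (wired-OR style feedback)."""
    return any(flag for row in flags for flag in row)

def global_count(flags: Grid) -> int:
    """Number of PEs whose flag is set (the 'count' feedback)."""
    return sum(1 for row in flags for flag in row if flag)

flags = [[False, True, False],
         [False, False, True]]
print(global_or(flags), global_count(flags))
```

A controller would typically use the OR result to decide whether to keep iterating (for example, "did any PE change state?") and the count as a scalar measurement of the array.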

Page 11:

A System for Evaluating Performance and Cost of SIMD Array Designs
J. of Parallel and Distributed Computing 60, 217-246, 2000

Variable parameters at PE level:

– PE datapath from 1 to 64 bits
– FP units, possibly shared among PEs
– ALU complexity, multiplier and/or divider
– Nearest-neighbor communication network parameters
– Number of internal buses and ports in the register file
– Local indexing
– Per-PE caching
– Simple pipelined PE datapaths

Page 12:

A System for Evaluating Performance and Cost of SIMD Array Designs
J. of Parallel and Distributed Computing 60, 217-246, 2000

Variable parameters at array level:

– Size: 256 to 16 million PEs
– Load/store mechanism
– Communication networks: broadcast …
– Feedback, including OR and count

Page 13:

Outline

– Parallel Computing
– SIMD Computing
– Digital Solutions
– Conclusions

Page 14:

Digital Implementation of Cellular Sensor-Computers
Int. J. Circ. Theor. Appl. 2006; 34: 409-428

Contributions

– Bit-sliced digital implementation method of the sensor-computer. For 0.18 µm and below, digital is a viable alternative to analog technology
– Photocells and A/D converters shared among several PEs (4) without performance degradation
– Note: the chip, named XENON, has not been tested …

Page 15:

Digital Implementation of Cellular Sensor-Computers
Int. J. Circ. Theor. Appl. 2006; 34: 409-428

The Processing Element

– Look-up table (LUT) for morphology, hit-and-miss, and any other bitwise logic operation
– Arithmetic datapath: full adder for addition/subtraction-based arithmetic
– Independent paths; both can work in parallel
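The two paths of the processing element can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the circuit described in the paper: the LUT is modeled as a 4-entry truth table for two input bits, and the arithmetic path as a full adder applied one bit per step (bit-serial). The names `lut_apply`, `full_adder` and `bit_serial_add`, and the 8-bit default width, are hypothetical.

```python
def lut_apply(table: int, a: int, b: int) -> int:
    """Evaluate a 2-input logic function stored as a 4-bit truth table.

    Bit (2*a + b) of `table` holds the output for inputs (a, b).
    """
    return (table >> ((a << 1) | b)) & 1

def full_adder(a: int, b: int, carry_in: int):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out

def bit_serial_add(x: int, y: int, width: int = 8):
    """Add two unsigned words one bit per step, LSB first.

    Returns (result modulo 2**width, final carry).
    """
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

AND_TABLE = 0b1000  # output 1 only for inputs (1, 1)
XOR_TABLE = 0b0110  # output 1 for inputs (0, 1) and (1, 0)

print(lut_apply(AND_TABLE, 1, 1), lut_apply(XOR_TABLE, 1, 1))
print(bit_serial_add(100, 55))
```

Changing the truth table changes the logic function without changing the hardware, which is why a LUT suits operations such as hit-and-miss; the adder path needs one step per bit of the operands.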

Page 16:

Digital Implementation of Cellular Sensor-Computers
Int. J. Circ. Theor. Appl. 2006; 34: 409-428

The Processing Element

– 64 bits for each pixel: seven 8-bit variables, LAM, plus 8 Boolean variables with random access
– Neighborhood interconnections: directly from the memory, or through the crossbar switch
– Global buses for global interconnection and data condition flags

Page 17:

Digital Implementation of Cellular Sensor-Computers
Int. J. Circ. Theor. Appl. 2006; 34: 409-428

Page 18:

Digital Implementation of Cellular Sensor-Computers
Int. J. Circ. Theor. Appl. 2006; 34: 409-428

Page 19:

Digital Implementation of Cellular Sensor-Computers
Int. J. Circ. Theor. Appl. 2006; 34: 409-428

Page 20:

Digital Implementation of Cellular Sensor-Computers
Int. J. Circ. Theor. Appl. 2006; 34: 409-428

Sensor Signal AD Conversion

– Single-slope AD conversion: the slope is provided by a global analog signal
– The digital staircase value is calculated by the processor itself, specifically by its arithmetic part
– A sample-and-hold (S/H) is needed
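The single-slope scheme can be sketched as a simulation: a shared ramp rises one step at a time, and every PE increments its own counter for as long as the ramp has not passed its held sample. This is an illustrative model only; the function name `single_slope_adc`, the reference voltage `v_ref` and the 8-bit resolution are assumptions, not values from the paper.

```python
def single_slope_adc(sampled_voltages, v_ref=1.0, bits=8):
    """Convert held sample voltages to digital codes with a shared ramp.

    All PEs see the same global ramp; each PE counts ramp steps until the
    ramp exceeds its own sampled voltage.
    """
    levels = 1 << bits
    counts = [0] * len(sampled_voltages)
    for step in range(1, levels):
        ramp = step * v_ref / levels  # global analog ramp, one step per cycle
        for i, v in enumerate(sampled_voltages):
            if ramp <= v:  # comparator still low: the PE keeps counting
                counts[i] += 1
    return counts

print(single_slope_adc([0.0, 0.25, 0.5, 1.0]))
```

The conversion takes a fixed number of ramp steps regardless of the array size, since every pixel is converted in parallel. The sample-and-hold matters because the input must stay constant while the ramp sweeps.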

Page 21:

Digital Implementation of Cellular Sensor-Computers
Int. J. Circ. Theor. Appl. 2006; 34: 409-428

Page 22:

Digital Implementation of Cellular Sensor-Computers
Int. J. Circ. Theor. Appl. 2006; 34: 409-428

Sensing Schemes

– Local integration time control feature
– Local intensity information is fed back to the sensing medium to increase the integration time in the dark regions and decrease it in the bright regions
– As a consequence, realistic compressed-dynamic-range images

Page 23:

Digital Implementation of Cellular Sensor-Computers
Int. J. Circ. Theor. Appl. 2006; 34: 409-428

Computational Throughput

– Higher for bit-serial arithmetic than for bit-parallel
– Straightforward pipeline structures
– High-speed logic

Page 24:

Digital Implementation of Cellular Sensor-Computers
Int. J. Circ. Theor. Appl. 2006; 34: 409-428

Chip Data: 0.18 µm

– Clock: 100 MHz
– Cell size: 33 x 33 µm
– Processor kernel size: 33000 λ², where λ is the feature size
– Sensor: 5 x 5 µm
– AD and accompanying circuitry: 3000 λ²
– Note: there are no measurements!!!

Page 25:

A Digital Vision Chip Specialized for High-Speed Target Tracking
IEEE Transactions on Electron Devices, vol. 50, n. 1, January 2003

Goal

– High-speed target tracking, including multitarget tracking with collision and separation

Page 26:

A Digital Vision Chip Specialized for High-Speed Target Tracking
IEEE Transactions on Electron Devices, vol. 50, n. 1, January 2003

Features

– Global feature extraction: I/O bottleneck, scalar features extracted from the array
– Need for high-speed global operations
– The chip calculates moments (global information)
– Serial communication through nearest-neighbor connections, and global operations performed with bit-serial cumulative adders at the end of rows and columns
– Sensor output binarized: time-controlled
– Datapath: bit-serial
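The moment calculation can be sketched as follows, assuming a binarized image and modeling the adders at the ends of rows and columns as plain row and column sums. The function names `moments` and `centroid` are hypothetical, and the sketch ignores the bit-serial timing of the real accumulators.

```python
def moments(binary_image):
    """Zeroth- and first-order moments of a binary image.

    Row and column sums stand in for the accumulators at the array edges.
    Returns (m00, m10, m01), with x as the column index and y as the row index.
    """
    row_sums = [sum(row) for row in binary_image]
    col_sums = [sum(col) for col in zip(*binary_image)]
    m00 = sum(row_sums)                                # object area
    m10 = sum(x * c for x, c in enumerate(col_sums))   # sum of x over set pixels
    m01 = sum(y * r for y, r in enumerate(row_sums))   # sum of y over set pixels
    return m00, m10, m01

def centroid(binary_image):
    """Target position (x, y) as the ratio of first to zeroth moments."""
    m00, m10, m01 = moments(binary_image)
    if m00 == 0:
        return None  # no target pixels in the image
    return m10 / m00, m01 / m00

image = [[0, 0, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 0]]
print(moments(image), centroid(image))
```

Only three scalars leave the array per frame instead of the full image, which is how this kind of global feature extraction sidesteps the I/O bottleneck.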

Page 27:

A Digital Vision Chip Specialized for High-Speed Target Tracking
IEEE Transactions on Electron Devices, vol. 50, n. 1, January 2003

Page 28:

A Digital Vision Chip Specialized for High-Speed Target Tracking
IEEE Transactions on Electron Devices, vol. 50, n. 1, January 2003

Tracking

– Local operations: logic and arithmetic
– Global operations: extraction of moments

Page 29:

A Digital Vision Chip Specialized for High-Speed Target Tracking
IEEE Transactions on Electron Devices, vol. 50, n. 1, January 2003

Page 30:

A Digital Vision Chip Specialized for High-Speed Target Tracking
IEEE Transactions on Electron Devices, vol. 50, n. 1, January 2003

Page 31:

A Dynamically Reconfigurable SIMD Processor for a Vision Chip
IEEE Journal of Solid-State Circuits, vol. 39, n. 1, January 2004

Goal

– To overcome the poor performance in global operations shown by conventional SIMD image processors
– Contribution: SIMD vision chip that reconfigures its hardware dynamically by chaining processing elements

Page 32:

A Dynamically Reconfigurable SIMD Processor for a Vision Chip
IEEE Journal of Solid-State Circuits, vol. 39, n. 1, January 2004

Features

– Early visual processing: edge detection, smoothing …
– Global feature calculation: calculate moments and output them as scalar values
– Scalar feature values such as summations at high speed
– Fast communication between distant PEs
– Grain size of the PE and network structure dynamically reconfigurable
– Key to high speed: n PEs chained
  – The ALU behaves as one n-bit ALU, memory capacity multiplied by n
  – Number of instructions reduced

Page 33:

A Dynamically Reconfigurable SIMD Processor for a Vision Chip
IEEE Journal of Solid-State Circuits, vol. 39, n. 1, January 2004

Page 34:

Near-Sensor Image Processing: A New Paradigm
IEEE Transactions on Image Processing, vol. 3, n. 6, Nov. 1994

Goal and contributions

– The physical properties of the image sensor itself are utilized to do part of the signal processing task
– Analog temporal behavior of photodiodes combined with thresholding amplifiers
– Non-linear operations like median filtering, or other tasks like convolution
– Adaptivity to different light levels
– Moments and shape factors

Page 35:

Near-Sensor Image Processing: A New Paradigm
IEEE Transactions on Image Processing, vol. 3, n. 6, Nov. 1994

The Photodiode Sensor

– Reverse-biased diode
– Generated photocurrent used to discharge a small capacitor that has been initially charged to a nominal level: U = U0 - kIt
– Two ways to proceed:
  – 1st: keep the exposure time constant and change the threshold level until it is equal to the photodiode voltage (CCD)
  – 2nd: keep the threshold constant and measure the time (NSIP)
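The second (NSIP) way of proceeding follows directly from the discharge law: setting U = U_th in U = U0 - kIt gives a crossing time t = (U0 - U_th) / (kI), so the measured time is inversely proportional to the intensity. The sketch below works this out numerically; the function names and the values chosen for `u0`, `u_th` and `k` are illustrative assumptions, not figures from the paper.

```python
def discharge_voltage(intensity, t, u0=3.0, k=1.0):
    """Photodiode voltage after time t: U = U0 - k * I * t."""
    return u0 - k * intensity * t

def time_to_threshold(intensity, u0=3.0, u_th=1.0, k=1.0):
    """Time at which U(t) reaches the fixed threshold u_th.

    Solving u_th = u0 - k * I * t for t gives (u0 - u_th) / (k * I).
    """
    if intensity <= 0:
        raise ValueError("a dark pixel never reaches the threshold")
    return (u0 - u_th) / (k * intensity)

for intensity in (1.0, 2.0, 4.0):
    print(intensity, time_to_threshold(intensity))
```

Doubling the intensity halves the crossing time, so a wide range of light levels maps onto a range of times that the array can observe simply by waiting.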

Page 36:

Near-Sensor Image Processing: A New Paradigm
IEEE Transactions on Image Processing, vol. 3, n. 6, Nov. 1994

Page 37:

Near-Sensor Image Processing: A New Paradigm
IEEE Transactions on Image Processing, vol. 3, n. 6, Nov. 1994

The Processing Element

– Based on the traditional bit-serial SIMD architecture
– Extensions to handle image operations of global nature
– PE register-accumulator oriented
– Multilevel, gray-scale operations performed bitwise using the registers for intermediate storage (bit-serial operations)
– Neighbor communication: NEWS
– Global status value: COUNT

Page 38:

Near-Sensor Image Processing: A New Paradigm
IEEE Transactions on Image Processing, vol. 3, n. 6, Nov. 1994

Page 39:

Near-Sensor Image Processing: A New Paradigm
IEEE Transactions on Image Processing, vol. 3, n. 6, Nov. 1994

Some operations: location of the highest-intensity pixel. Solution: sense the array of photodiodes continuously after precharge. The first photodiode to pass the threshold level corresponds to the location of highest intensity
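A software model of this operation, assuming the linear discharge law U = U0 - kIt, is a search for the earliest threshold crossing. The function name `brightest_pixel` and the default values of `u0`, `u_th` and `k` are hypothetical; in the sensor itself no search takes place, since the first PE to fire simply identifies itself.

```python
def brightest_pixel(intensities, u0=3.0, u_th=1.0, k=1.0):
    """Return (row, col) of the photodiode that crosses the threshold first.

    Crossing time under U = U0 - k*I*t is (u0 - u_th) / (k * I), so the
    earliest crossing belongs to the highest intensity.
    """
    best_position = None
    best_time = float("inf")
    for r, row in enumerate(intensities):
        for c, intensity in enumerate(row):
            if intensity <= 0:
                continue  # a dark pixel never crosses the threshold
            t = (u0 - u_th) / (k * intensity)
            if t < best_time:
                best_time = t
                best_position = (r, c)
    return best_position

print(brightest_pixel([[1.0, 2.0],
                       [5.0, 3.0]]))
```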

Page 40:

Near-Sensor Image Processing: A New Paradigm
IEEE Transactions on Image Processing, vol. 3, n. 6, Nov. 1994

Some operations: median filter. For each neighborhood of three, the second pixel to cross the threshold is the median pixel
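The same timing idea gives a three-point median: order the three pixels of each neighborhood by crossing time and take the second. The sketch below models one image row, assuming strictly positive intensities and the linear discharge law U = U0 - kIt; the function name and parameter values are illustrative, and border pixels are simply left out.

```python
def median3_by_crossing(intensities, u0=3.0, u_th=1.0, k=1.0):
    """Three-point median of a 1-D row using threshold-crossing order.

    In each window of three, the pixel that crosses second has the
    median intensity. Intensities must be strictly positive.
    """
    times = [(u0 - u_th) / (k * i) for i in intensities]
    medians = []
    for j in range(1, len(intensities) - 1):
        window = (j - 1, j, j + 1)
        crossing_order = sorted(window, key=lambda idx: times[idx])
        medians.append(intensities[crossing_order[1]])  # second to cross
    return medians

print(median3_by_crossing([1, 9, 2, 3, 8]))
```

No sorting hardware is implied by this: in the sensor, each PE only needs to detect when two of its three inputs have fired.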

Page 41:

Near-Sensor Image Processing: A New Paradigm
IEEE Transactions on Image Processing, vol. 3, n. 6, Nov. 1994

Global feature extraction

– Through asynchronous operations: propagating-type templates in the CNN framework. An in-depth study in “Global Feature Extraction Operations for Near-Sensor Image Processing”, IEEE Transactions on Image Processing, vol. 5, n. 1, January 1996
– Chip measured with 32x32 resolution in 0.8 µm CMOS technology in “VLSI Implementation of a Focal Plane Image Processor - A Realization of the Near-Sensor Image Processing Concept”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 4, n. 3, Sept. 1996

Page 42:

Some other solutions

SIMD solutions

– SRAM or any other memory-based chip (still to read something on this issue)
– CAM-based (still to read something on this issue)
– FPGA solutions

Page 43:

FPGA-Based Real-Time Optical-Flow System
IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, n. 2, February 2006

Goal and contributions

– Pipelined optical-flow processing system: virtual motion sensor
– Conventional system: conventional camera acting as a front end supported by an FPGA processing device
– FPGA programmability permits easy change of parameters to adapt to different conditions such as speed, environment or light intensity
– Stand-alone: 30 Hz with 320x240 pixels
– Hardware: stand-alone or PCI board hardware accelerator
  – RC1000-PP, Celoxica: Virtex 2000E-6 Xilinx FPGA
  – RC 200-PP, Celoxica: XC2V2000-4 FPGA

Page 44:

Conclusions

Early visual processing

– SIMD solutions with CMOS on-chip implementations for local and global operations
  – Digital
  – Analog
– FPGA-based solutions

Things to explore: SRAM and memory-based approaches, along with how our techniques could be adopted in conventional SIMD architectures (not CNNs)

Contributions on SIMD: at the first stage of the design cycle, where the ideas have significant impact at layout level. Look at new paradigms, techniques, algorithms …