SQL Server 2012 Parallel Data Warehouse

Click here to load reader

  • date post

    22-Feb-2016
  • Category

    Documents

  • view

    40
  • download

    0

Embed Size (px)

description

SQL Server 2012 Parallel Data Warehouse . Meinrad Weiss Principal Consultant Trivadis AG. Trivadis solution portfolio and competences. CUSTOMER. IT SOLUTIONS, SERVICES & PRODUCTS. Business departments. Business Integration Services. Business Intelligence. Infrastructure Engineering. - PowerPoint PPT Presentation

Transcript of SQL Server 2012 Parallel Data Warehouse

HERZLICH WILLKOMMEN ODER ANDEREN TEXT

SQL Server 2012 Parallel Data Warehouse

Meinrad WeissPrincipal ConsultantTrivadis AG

BASEL BERN LAUSANNE ZRICH DSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MNCHEN STUTTGART WIEN

2012 Trivadis1Trivadis solution portfolio and competencesMai 2013SQL Server 2012 Parallel Data Warehouse 2IT SOLUTIONS, SERVICES & PRODUCTSTECHNOLOGIESOracle, Microsoft, IBM, Open SourceIntegration, Application Performance Management, SecurityInfrastructureEngineeringApplication DevelopmentITdepartmentsBusinessdepartmentsCUSTOMERBusiness IntelligenceBusiness Integration ServicesTrainingManaged Services2012 Trivadis2Trivadis facts & figuresdateTrivadis the company311 Trivadis locations with more than 600 employeesFinancially independent and sustainably profitableKey figures 2011Revenue CHF 104 / EUR 84 Mio.Services for more than 800 clients in over 1,900 projects200 Service Level AgreementsMore than 4,000 training participantsResearch and development budget: CHF 5.0 / EUR 4 Mio.

HamburgDusseldorfFrankfurtStuttgartMunichFreiburgViennaBaselBernZurichLausanne~380 employees~200 employees~30 employees

Mai 20133SQL Server 2012 Parallel Data Warehouse 2012 Trivadis3Visit us, win a price and be prepared for your next adventureSQL Server 2012 Parallel Data Warehouse

Mai 201342012 TrivadisAgendaPositioning Parallel Data Warehouse (PDW)ArchitectureWorking with PDWPDW and Big Data

SQL Server 2012 Parallel Data Warehouse Mai 20135

2012 TrivadisSQL Server 2012 Parallel Data Warehouse

Mai 201362012 TrivadisData Warehouse Products Positioning12Balanced solution for mostly scan-centric workloads.Max HW tune-up for most DW scenarios.3Most flexible architecture for handling all DW scenarios.ScaleComplexityHA by defaultSW-HW integrationSQL Server 2012Fast Track SQL Server 2012SQL Server 2012PDW123PDW with Distributed Data ArchitectureSQL Server 2012 Parallel Data Warehouse Mai 201372012 Trivadis7Built For Big DataNext-generation Performance At ScaleScale-Out Architecture on Industry Standard Hardware

SQL Server 2012 Parallel Data Warehouse

SQL Server 2012 Parallel Data Warehouse Mai 201382012 TrivadisBig Data Integration with Polybase (transparent translation from SQL to Map/Reduce)Performance with Columnstore In-Memory technology (xVelocity feature)Industry Standard Hardware leveraged from HP and Dell, Best TCO in Market8AgendaPositioning Parallel Data Warehouse (PDW)ArchitectureWorking with PDWPDW and Big Data

SQL Server 2012 Parallel Data Warehouse Mai 20139

2012 TrivadisParallel Data Warehouse at a glanceShared-nothing parallel database systemMassively parallel processing (MPP)A Control server that accepts user queries, generates a plan, and distributes operations in parallel to compute nodesMultiple Compute servers running SQL ServerDelivered as an applianceBalanced and pre-configured hardware and softwareScales from 2 to 56 NodesFastest Time to Market

SQL Server 2012 Parallel Data Warehouse Mai 2013102012 Trivadis10Base Unit (1/4 Rack)All hosts and VMs run Windows Server 2012 StandardAll Fabric and workload activity happens in Hyper-V virtual machines, with Fabric VMs sharing 1 serverFailover is handled by Hyper-VPDW Agent runs on all hosts and all VMs, collects appliance health data on fabric and workloadWindows Storage Spaces handles mirroring and spares

Host 1Host 0Host 2Host 3Storage SpacesIB &EthernetDirect attached SASControlFAB ADCompute 1Compute 2SQL Server 2012 Parallel Data Warehouse Mai 201311VMM2012 TrivadisWindow Server 2012DMS Core SQL Server 2012 8 Distributions spread across 32 Disk Spindles (+Spare), per Node!

11Current Limitations: Performance and Scale Up todayScale Up (and pay) Old server will be obsolete if bigger system is required$$$$$$$$$Mai 201312SQL Server 2012 Parallel Data Warehouse ??2012 TrivadisToday, if you are not using a MPP scale out appliance, most likely your data warehouse is built on the traditional Scale UP, SMP architecture, organized as row stores.A Scale Up solution runs queries sequentially on a shared everything architecture. This essentially means that everything is processed on a single box which thus shares memory, disk, I/O operations, etc. In order to get more scale in a scale up solution, you will need to acquire a more powerful HW box every time. You will not be able to add more HW to the existing rack solution. A scale up solution also has diminishing returns after a certain scale. Rowstores store data in traditional tables in row-wise fashion. The values comprising one row are stored contiguously on a page. Rowstores are sometimes not optimal for many queries that are being issued to the data warehouse. This is because, the query will return back the entire row of data including fields that might not be needed as part of the query.The combination of Scale UP SMP and rowstores are often common limitations to existing warehouses that affect performance.

12SCALING FROM 2 TO 56 NODESAppliance can grow with increasing workload

Provisioning consists of 3 phases:

Bare metal provisioning of new nodes

Provisioning of workload VMs and hooking up to other workload VMs

Redistribution of data

Host 1Host 0Host 2Host 3Storage SpacesIB &EthernetDirect attached SASControlFAB ADCompute 1Compute 2Host 2Host 3Storage SpacesCompute 1Compute 2SQL Server 2012 Parallel Data Warehouse Mai 201313VMM2012 TrivadisWindow Server 2012DMS Core SQL Server 2012 8 Distributions spread across 32 Disk Spindles (+Spare), per Node!

13481216202428323640SQL Server Compute NodesSystem ThroughputRegular SQL Server( 1 Node)Seamless Scalability5 Nodes10 Nodes20 Nodes30 Nodes30 NodesMai 201314SQL Server 2012 Parallel Data Warehouse 2012 TrivadisHigh AvailabilityStorage Spaces manages the physical disks in the disk enclosures

Failover:One cluster across the whole applianceVMs are automatically migrated on host failureAffinity and anti-affinity maps enforce rules

Host 1Host 0Host 2Host 3Storage SpacesIB &EthernetDirect attached SASControlFAB ADCompute 1Compute 2SQL Server 2012 Parallel Data Warehouse Mai 201315VMM2012 TrivadisAgendaPositioning Parallel Data Warehouse (PDW)ArchitectureWorking with PDWIntroduction to Big DataMicrosoft and Big Data

SQL Server 2012 Parallel Data Warehouse Mai 201316

2012 TrivadisDistribution and Replication of Data: ReplicateTime DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar DayStore DimStore Dim IDStore NameStore MgrStore SizeProduct DimProd Dim IDProd CategoryProd Sub CatProd DescMktgCampaign DimMktg Camp IDCamp NameCamp MgrCamp StartCamp EndSQLSQLSQLSQLTDPDSDMDTDPDSDMDTDPDSDMDSmaller ( 10 GB) Fact Table is Hash Distributed Across All Compute Nodes

SF-1SF-2SF-3SF-4Time DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar DayStore DimStore Dim IDStore NameStore MgrStore SizeProduct DimProd Dim IDProd CategoryProd Sub CatProd DescMktgCampaign DimMktg Camp IDCamp NameCamp MgrCamp StartCamp EndSales FactsDate Dim IDStore Dim IDProd Dim IDMktg Camp IdQty SoldDollars SoldSQL Server 2012 Parallel Data Warehouse Mai 201319

2012 Trivadis19PDW Design Basics: Distributed and Replicated TablesDesign the schema to minimize data movement during the typical workload:Moving large amounts of data to the Control Node (Partition Move) is most expensive operation Shuffling data between nodes is also expensive (Shuffle Move) but is more scalableDesign to encourage local joins and local aggregations

Replicated TablesCreate all lookup tables, small reference tables, and dimension tables as replicated. Exception: Very large dimensions (e.g., customer) may be better as distributed tables.Any queries purely against replicated tables occur on just one node selected randomly, reducing performance.If you distribute a dimension table, try to distribute joined fact (or transaction) tables on the same key.Beware: Load and update times are much longer for replicated tables ~ 1% the load speed of distributed tables.

Distribution on a PDWPDW Node 1Create Table _aCreate Table _bCreate Table _h8 Tables per NodePDW Node 2Create Table _aCreate Table _bCreate Table _hPDW Node 8Create Table _aCreate Table _bCreate Table _hPDW Node Final Result:64 individual tables across a 8 node (1 data rack HP) applianceCREATE TABLE myTable (column Defs)WITH (DISTRIBUTION = HASH (id));Mai 201320SQL Server 2012 Parallel Data Warehouse 2012 TrivadisA bit PDW-Arithmetics 2 QuadCore Procs per Compute Node 8 Cores x 10 Nodes = 80 individual Tables per logical table per Appliance ;-P20

xVelocity gives next-gen performanceLightning Fast Data Query Processing

CustomerSalesCountrySupplierProductsColumnstore Provides Dramatic Performance

Updateable and clustered xVelocity columnstoreStores data in columnar formatMemory-optimized for next-generation performanceUpdateable to support bulk and/or trickle loading

Save Timeand Costs

Real-TimeDW

Up to50X Faster

Up to 15x compressionMai 201321SQL Server 2012 Parallel Data Warehouse 2012 TrivadisBy changing the primary storage engine of SQL Server 2012 Parallel Data Warehouse to a new updateable version of xVelocity memory optimized columnstore, organizations can change to a columnstore (as opposed to rowstore). In a columnstore, data is grouped and stored one column at a time. The benefits to doing this are as follows:Only the columns needed must be read. Therefore, less data is read from disk to memory and later moved from memory to processor cache.Columns are heavily compressed. This reduces the number of bytes that must be read and moved.Most queries do not touch all columns of the table. Therefore, many columns will never be brought into memory. This, comb