SQL Server 2012 Parallel Data Warehouse
date post
22-Feb-2016Category
Documents
view
40download
0
Embed Size (px)
description
Transcript of SQL Server 2012 Parallel Data Warehouse
HERZLICH WILLKOMMEN ODER ANDEREN TEXT
SQL Server 2012 Parallel Data Warehouse
Meinrad WeissPrincipal ConsultantTrivadis AG
BASEL BERN LAUSANNE ZRICH DSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MNCHEN STUTTGART WIEN
2012 Trivadis1Trivadis solution portfolio and competencesMai 2013SQL Server 2012 Parallel Data Warehouse 2IT SOLUTIONS, SERVICES & PRODUCTSTECHNOLOGIESOracle, Microsoft, IBM, Open SourceIntegration, Application Performance Management, SecurityInfrastructureEngineeringApplication DevelopmentITdepartmentsBusinessdepartmentsCUSTOMERBusiness IntelligenceBusiness Integration ServicesTrainingManaged Services2012 Trivadis2Trivadis facts & figuresdateTrivadis the company311 Trivadis locations with more than 600 employeesFinancially independent and sustainably profitableKey figures 2011Revenue CHF 104 / EUR 84 Mio.Services for more than 800 clients in over 1,900 projects200 Service Level AgreementsMore than 4,000 training participantsResearch and development budget: CHF 5.0 / EUR 4 Mio.
HamburgDusseldorfFrankfurtStuttgartMunichFreiburgViennaBaselBernZurichLausanne~380 employees~200 employees~30 employees
Mai 20133SQL Server 2012 Parallel Data Warehouse 2012 Trivadis3Visit us, win a price and be prepared for your next adventureSQL Server 2012 Parallel Data Warehouse
Mai 201342012 TrivadisAgendaPositioning Parallel Data Warehouse (PDW)ArchitectureWorking with PDWPDW and Big Data
SQL Server 2012 Parallel Data Warehouse Mai 20135
2012 TrivadisSQL Server 2012 Parallel Data Warehouse
Mai 201362012 TrivadisData Warehouse Products Positioning12Balanced solution for mostly scan-centric workloads.Max HW tune-up for most DW scenarios.3Most flexible architecture for handling all DW scenarios.ScaleComplexityHA by defaultSW-HW integrationSQL Server 2012Fast Track SQL Server 2012SQL Server 2012PDW123PDW with Distributed Data ArchitectureSQL Server 2012 Parallel Data Warehouse Mai 201372012 Trivadis7Built For Big DataNext-generation Performance At ScaleScale-Out Architecture on Industry Standard Hardware
SQL Server 2012 Parallel Data Warehouse
SQL Server 2012 Parallel Data Warehouse Mai 201382012 TrivadisBig Data Integration with Polybase (transparent translation from SQL to Map/Reduce)Performance with Columnstore In-Memory technology (xVelocity feature)Industry Standard Hardware leveraged from HP and Dell, Best TCO in Market8AgendaPositioning Parallel Data Warehouse (PDW)ArchitectureWorking with PDWPDW and Big Data
SQL Server 2012 Parallel Data Warehouse Mai 20139
2012 TrivadisParallel Data Warehouse at a glanceShared-nothing parallel database systemMassively parallel processing (MPP)A Control server that accepts user queries, generates a plan, and distributes operations in parallel to compute nodesMultiple Compute servers running SQL ServerDelivered as an applianceBalanced and pre-configured hardware and softwareScales from 2 to 56 NodesFastest Time to Market
SQL Server 2012 Parallel Data Warehouse Mai 2013102012 Trivadis10Base Unit (1/4 Rack)All hosts and VMs run Windows Server 2012 StandardAll Fabric and workload activity happens in Hyper-V virtual machines, with Fabric VMs sharing 1 serverFailover is handled by Hyper-VPDW Agent runs on all hosts and all VMs, collects appliance health data on fabric and workloadWindows Storage Spaces handles mirroring and spares
Host 1Host 0Host 2Host 3Storage SpacesIB &EthernetDirect attached SASControlFAB ADCompute 1Compute 2SQL Server 2012 Parallel Data Warehouse Mai 201311VMM2012 TrivadisWindow Server 2012DMS Core SQL Server 2012 8 Distributions spread across 32 Disk Spindles (+Spare), per Node!
11Current Limitations: Performance and Scale Up todayScale Up (and pay) Old server will be obsolete if bigger system is required$$$$$$$$$Mai 201312SQL Server 2012 Parallel Data Warehouse ??2012 TrivadisToday, if you are not using a MPP scale out appliance, most likely your data warehouse is built on the traditional Scale UP, SMP architecture, organized as row stores.A Scale Up solution runs queries sequentially on a shared everything architecture. This essentially means that everything is processed on a single box which thus shares memory, disk, I/O operations, etc. In order to get more scale in a scale up solution, you will need to acquire a more powerful HW box every time. You will not be able to add more HW to the existing rack solution. A scale up solution also has diminishing returns after a certain scale. Rowstores store data in traditional tables in row-wise fashion. The values comprising one row are stored contiguously on a page. Rowstores are sometimes not optimal for many queries that are being issued to the data warehouse. This is because, the query will return back the entire row of data including fields that might not be needed as part of the query.The combination of Scale UP SMP and rowstores are often common limitations to existing warehouses that affect performance.
12SCALING FROM 2 TO 56 NODESAppliance can grow with increasing workload
Provisioning consists of 3 phases:
Bare metal provisioning of new nodes
Provisioning of workload VMs and hooking up to other workload VMs
Redistribution of data
Host 1Host 0Host 2Host 3Storage SpacesIB &EthernetDirect attached SASControlFAB ADCompute 1Compute 2Host 2Host 3Storage SpacesCompute 1Compute 2SQL Server 2012 Parallel Data Warehouse Mai 201313VMM2012 TrivadisWindow Server 2012DMS Core SQL Server 2012 8 Distributions spread across 32 Disk Spindles (+Spare), per Node!
13481216202428323640SQL Server Compute NodesSystem ThroughputRegular SQL Server( 1 Node)Seamless Scalability5 Nodes10 Nodes20 Nodes30 Nodes30 NodesMai 201314SQL Server 2012 Parallel Data Warehouse 2012 TrivadisHigh AvailabilityStorage Spaces manages the physical disks in the disk enclosures
Failover:One cluster across the whole applianceVMs are automatically migrated on host failureAffinity and anti-affinity maps enforce rules
Host 1Host 0Host 2Host 3Storage SpacesIB &EthernetDirect attached SASControlFAB ADCompute 1Compute 2SQL Server 2012 Parallel Data Warehouse Mai 201315VMM2012 TrivadisAgendaPositioning Parallel Data Warehouse (PDW)ArchitectureWorking with PDWIntroduction to Big DataMicrosoft and Big Data
SQL Server 2012 Parallel Data Warehouse Mai 201316
2012 TrivadisDistribution and Replication of Data: ReplicateTime DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar DayStore DimStore Dim IDStore NameStore MgrStore SizeProduct DimProd Dim IDProd CategoryProd Sub CatProd DescMktgCampaign DimMktg Camp IDCamp NameCamp MgrCamp StartCamp EndSQLSQLSQLSQLTDPDSDMDTDPDSDMDTDPDSDMDSmaller ( 10 GB) Fact Table is Hash Distributed Across All Compute Nodes
SF-1SF-2SF-3SF-4Time DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar DayStore DimStore Dim IDStore NameStore MgrStore SizeProduct DimProd Dim IDProd CategoryProd Sub CatProd DescMktgCampaign DimMktg Camp IDCamp NameCamp MgrCamp StartCamp EndSales FactsDate Dim IDStore Dim IDProd Dim IDMktg Camp IdQty SoldDollars SoldSQL Server 2012 Parallel Data Warehouse Mai 201319
2012 Trivadis19PDW Design Basics: Distributed and Replicated TablesDesign the schema to minimize data movement during the typical workload:Moving large amounts of data to the Control Node (Partition Move) is most expensive operation Shuffling data between nodes is also expensive (Shuffle Move) but is more scalableDesign to encourage local joins and local aggregations
Replicated TablesCreate all lookup tables, small reference tables, and dimension tables as replicated. Exception: Very large dimensions (e.g., customer) may be better as distributed tables.Any queries purely against replicated tables occur on just one node selected randomly, reducing performance.If you distribute a dimension table, try to distribute joined fact (or transaction) tables on the same key.Beware: Load and update times are much longer for replicated tables ~ 1% the load speed of distributed tables.
Distribution on a PDWPDW Node 1Create Table _aCreate Table _bCreate Table _h8 Tables per NodePDW Node 2Create Table _aCreate Table _bCreate Table _hPDW Node 8Create Table _aCreate Table _bCreate Table _hPDW Node Final Result:64 individual tables across a 8 node (1 data rack HP) applianceCREATE TABLE myTable (column Defs)WITH (DISTRIBUTION = HASH (id));Mai 201320SQL Server 2012 Parallel Data Warehouse 2012 TrivadisA bit PDW-Arithmetics 2 QuadCore Procs per Compute Node 8 Cores x 10 Nodes = 80 individual Tables per logical table per Appliance ;-P20
xVelocity gives next-gen performanceLightning Fast Data Query Processing
CustomerSalesCountrySupplierProductsColumnstore Provides Dramatic Performance
Updateable and clustered xVelocity columnstoreStores data in columnar formatMemory-optimized for next-generation performanceUpdateable to support bulk and/or trickle loading
Save Timeand Costs
Real-TimeDW
Up to50X Faster
Up to 15x compressionMai 201321SQL Server 2012 Parallel Data Warehouse 2012 TrivadisBy changing the primary storage engine of SQL Server 2012 Parallel Data Warehouse to a new updateable version of xVelocity memory optimized columnstore, organizations can change to a columnstore (as opposed to rowstore). In a columnstore, data is grouped and stored one column at a time. The benefits to doing this are as follows:Only the columns needed must be read. Therefore, less data is read from disk to memory and later moved from memory to processor cache.Columns are heavily compressed. This reduces the number of bytes that must be read and moved.Most queries do not touch all columns of the table. Therefore, many columns will never be brought into memory. This, comb