SQL Server 2012 Parallel Data Warehouse

of 33 /33
2012 © Trivadis BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN SQL Server 2012 Parallel Data Warehouse Meinrad Weiss Principal Consultant Trivadis AG

Embed Size (px)

description

SQL Server 2012 Parallel Data Warehouse . Meinrad Weiss Principal Consultant Trivadis AG. Trivadis solution portfolio and competences. CUSTOMER. IT SOLUTIONS, SERVICES & PRODUCTS. Business departments. Business Integration Services. Business Intelligence. Infrastructure Engineering. - PowerPoint PPT Presentation

Transcript of SQL Server 2012 Parallel Data Warehouse

HERZLICH WILLKOMMEN ODER ANDEREN TEXT

SQL Server 2012 Parallel Data Warehouse

Meinrad WeissPrincipal ConsultantTrivadis AG

BASEL BERN LAUSANNE ZRICH DSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MNCHEN STUTTGART WIEN

2012 Trivadis1Trivadis solution portfolio and competencesMai 2013SQL Server 2012 Parallel Data Warehouse 2IT SOLUTIONS, SERVICES & PRODUCTSTECHNOLOGIESOracle, Microsoft, IBM, Open SourceIntegration, Application Performance Management, SecurityInfrastructureEngineeringApplication DevelopmentITdepartmentsBusinessdepartmentsCUSTOMERBusiness IntelligenceBusiness Integration ServicesTrainingManaged Services2012 Trivadis2Trivadis facts & figuresdateTrivadis the company311 Trivadis locations with more than 600 employeesFinancially independent and sustainably profitableKey figures 2011Revenue CHF 104 / EUR 84 Mio.Services for more than 800 clients in over 1,900 projects200 Service Level AgreementsMore than 4,000 training participantsResearch and development budget: CHF 5.0 / EUR 4 Mio.

HamburgDusseldorfFrankfurtStuttgartMunichFreiburgViennaBaselBernZurichLausanne~380 employees~200 employees~30 employees

Mai 20133SQL Server 2012 Parallel Data Warehouse 2012 Trivadis3Visit us, win a price and be prepared for your next adventureSQL Server 2012 Parallel Data Warehouse

Mai 201342012 TrivadisAgendaPositioning Parallel Data Warehouse (PDW)ArchitectureWorking with PDWPDW and Big Data

SQL Server 2012 Parallel Data Warehouse Mai 20135

2012 TrivadisSQL Server 2012 Parallel Data Warehouse

Mai 201362012 TrivadisData Warehouse Products Positioning12Balanced solution for mostly scan-centric workloads.Max HW tune-up for most DW scenarios.3Most flexible architecture for handling all DW scenarios.ScaleComplexityHA by defaultSW-HW integrationSQL Server 2012Fast Track SQL Server 2012SQL Server 2012PDW123PDW with Distributed Data ArchitectureSQL Server 2012 Parallel Data Warehouse Mai 201372012 Trivadis7Built For Big DataNext-generation Performance At ScaleScale-Out Architecture on Industry Standard Hardware

SQL Server 2012 Parallel Data Warehouse

SQL Server 2012 Parallel Data Warehouse Mai 201382012 TrivadisBig Data Integration with Polybase (transparent translation from SQL to Map/Reduce)Performance with Columnstore In-Memory technology (xVelocity feature)Industry Standard Hardware leveraged from HP and Dell, Best TCO in Market8AgendaPositioning Parallel Data Warehouse (PDW)ArchitectureWorking with PDWPDW and Big Data

SQL Server 2012 Parallel Data Warehouse Mai 20139

2012 TrivadisParallel Data Warehouse at a glanceShared-nothing parallel database systemMassively parallel processing (MPP)A Control server that accepts user queries, generates a plan, and distributes operations in parallel to compute nodesMultiple Compute servers running SQL ServerDelivered as an applianceBalanced and pre-configured hardware and softwareScales from 2 to 56 NodesFastest Time to Market

SQL Server 2012 Parallel Data Warehouse Mai 2013102012 Trivadis10Base Unit (1/4 Rack)All hosts and VMs run Windows Server 2012 StandardAll Fabric and workload activity happens in Hyper-V virtual machines, with Fabric VMs sharing 1 serverFailover is handled by Hyper-VPDW Agent runs on all hosts and all VMs, collects appliance health data on fabric and workloadWindows Storage Spaces handles mirroring and spares

Host 1Host 0Host 2Host 3Storage SpacesIB &EthernetDirect attached SASControlFAB ADCompute 1Compute 2SQL Server 2012 Parallel Data Warehouse Mai 201311VMM2012 TrivadisWindow Server 2012DMS Core SQL Server 2012 8 Distributions spread across 32 Disk Spindles (+Spare), per Node!

11Current Limitations: Performance and Scale Up todayScale Up (and pay) Old server will be obsolete if bigger system is required$$$$$$$$$Mai 201312SQL Server 2012 Parallel Data Warehouse ??2012 TrivadisToday, if you are not using a MPP scale out appliance, most likely your data warehouse is built on the traditional Scale UP, SMP architecture, organized as row stores.A Scale Up solution runs queries sequentially on a shared everything architecture. This essentially means that everything is processed on a single box which thus shares memory, disk, I/O operations, etc. In order to get more scale in a scale up solution, you will need to acquire a more powerful HW box every time. You will not be able to add more HW to the existing rack solution. A scale up solution also has diminishing returns after a certain scale. Rowstores store data in traditional tables in row-wise fashion. The values comprising one row are stored contiguously on a page. Rowstores are sometimes not optimal for many queries that are being issued to the data warehouse. This is because, the query will return back the entire row of data including fields that might not be needed as part of the query.The combination of Scale UP SMP and rowstores are often common limitations to existing warehouses that affect performance.

12SCALING FROM 2 TO 56 NODESAppliance can grow with increasing workload

Provisioning consists of 3 phases:

Bare metal provisioning of new nodes

Provisioning of workload VMs and hooking up to other workload VMs

Redistribution of data

Host 1Host 0Host 2Host 3Storage SpacesIB &EthernetDirect attached SASControlFAB ADCompute 1Compute 2Host 2Host 3Storage SpacesCompute 1Compute 2SQL Server 2012 Parallel Data Warehouse Mai 201313VMM2012 TrivadisWindow Server 2012DMS Core SQL Server 2012 8 Distributions spread across 32 Disk Spindles (+Spare), per Node!

13481216202428323640SQL Server Compute NodesSystem ThroughputRegular SQL Server( 1 Node)Seamless Scalability5 Nodes10 Nodes20 Nodes30 Nodes30 NodesMai 201314SQL Server 2012 Parallel Data Warehouse 2012 TrivadisHigh AvailabilityStorage Spaces manages the physical disks in the disk enclosures

Failover:One cluster across the whole applianceVMs are automatically migrated on host failureAffinity and anti-affinity maps enforce rules

Host 1Host 0Host 2Host 3Storage SpacesIB &EthernetDirect attached SASControlFAB ADCompute 1Compute 2SQL Server 2012 Parallel Data Warehouse Mai 201315VMM2012 TrivadisAgendaPositioning Parallel Data Warehouse (PDW)ArchitectureWorking with PDWIntroduction to Big DataMicrosoft and Big Data

SQL Server 2012 Parallel Data Warehouse Mai 201316

2012 TrivadisDistribution and Replication of Data: ReplicateTime DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar DayStore DimStore Dim IDStore NameStore MgrStore SizeProduct DimProd Dim IDProd CategoryProd Sub CatProd DescMktgCampaign DimMktg Camp IDCamp NameCamp MgrCamp StartCamp EndSQLSQLSQLSQLTDPDSDMDTDPDSDMDTDPDSDMDSmaller ( 10 GB) Fact Table is Hash Distributed Across All Compute Nodes

SF-1SF-2SF-3SF-4Time DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar DayStore DimStore Dim IDStore NameStore MgrStore SizeProduct DimProd Dim IDProd CategoryProd Sub CatProd DescMktgCampaign DimMktg Camp IDCamp NameCamp MgrCamp StartCamp EndSales FactsDate Dim IDStore Dim IDProd Dim IDMktg Camp IdQty SoldDollars SoldSQL Server 2012 Parallel Data Warehouse Mai 201319

2012 Trivadis19PDW Design Basics: Distributed and Replicated TablesDesign the schema to minimize data movement during the typical workload:Moving large amounts of data to the Control Node (Partition Move) is most expensive operation Shuffling data between nodes is also expensive (Shuffle Move) but is more scalableDesign to encourage local joins and local aggregations

Replicated TablesCreate all lookup tables, small reference tables, and dimension tables as replicated. Exception: Very large dimensions (e.g., customer) may be better as distributed tables.Any queries purely against replicated tables occur on just one node selected randomly, reducing performance.If you distribute a dimension table, try to distribute joined fact (or transaction) tables on the same key.Beware: Load and update times are much longer for replicated tables ~ 1% the load speed of distributed tables.

Distribution on a PDWPDW Node 1Create Table _aCreate Table _bCreate Table _h8 Tables per NodePDW Node 2Create Table _aCreate Table _bCreate Table _hPDW Node 8Create Table _aCreate Table _bCreate Table _hPDW Node Final Result:64 individual tables across a 8 node (1 data rack HP) applianceCREATE TABLE myTable (column Defs)WITH (DISTRIBUTION = HASH (id));Mai 201320SQL Server 2012 Parallel Data Warehouse 2012 TrivadisA bit PDW-Arithmetics 2 QuadCore Procs per Compute Node 8 Cores x 10 Nodes = 80 individual Tables per logical table per Appliance ;-P20

xVelocity gives next-gen performanceLightning Fast Data Query Processing

CustomerSalesCountrySupplierProductsColumnstore Provides Dramatic Performance

Updateable and clustered xVelocity columnstoreStores data in columnar formatMemory-optimized for next-generation performanceUpdateable to support bulk and/or trickle loading

Save Timeand Costs

Real-TimeDW

Up to50X Faster

Up to 15x compressionMai 201321SQL Server 2012 Parallel Data Warehouse 2012 TrivadisBy changing the primary storage engine of SQL Server 2012 Parallel Data Warehouse to a new updateable version of xVelocity memory optimized columnstore, organizations can change to a columnstore (as opposed to rowstore). In a columnstore, data is grouped and stored one column at a time. The benefits to doing this are as follows:Only the columns needed must be read. Therefore, less data is read from disk to memory and later moved from memory to processor cache.Columns are heavily compressed. This reduces the number of bytes that must be read and moved.Most queries do not touch all columns of the table. Therefore, many columns will never be brought into memory. This, combined with excellent compression, improves buffer pool usage, which reduces total I/O.Once xVelocity columnstore is implemented, organizations to enjoy high compression on their data (4-15x) and next-generation performance (up to 50x faster). New in SQL Server 2012 PDW: Updateable and clustered columnstore:SQL Server 2012 PDW is using the new updateable xVelocity columnstore which is yet to be released in the next major version of SQL Server. Updates and direct bulk load are fully supported simplifying and speeding up data loading and enabling real-time data warehousing and trickle loading. It can also save roughly 70% on overall storage space if you chose to eliminate the row store copy of the data entirely.

21

Web-Based Management DashboardSQL Server 2012 Parallel Data Warehouse Mai 2013222012 Trivadis22Web-Based Management Dashboard (2)SQL Server 2012 Parallel Data Warehouse

Mai 2013232012 Trivadis23Web-Based Management Dashboard (3)SQL Server 2012 Parallel Data Warehouse

Mai 2013242012 Trivadis24PDW Querying 1 Petabyte of data in 1 second

Mai 201325SQL Server 2012 Parallel Data Warehouse 294000000000 rowshttp://www.sqlpass.org/summit/2012/DayOneKeynote.aspx2012 Trivadishttp://www.sqlpass.org/summit/2012/DayOneKeynote.aspx 1hr 10min / 1hr 17min / 1hr 24min (Polybase)http://www.sqlpass.org/summit/2012/DayTwoKeynote.aspx 1hr 15min25Reference Case: Todays process flow / Building blocksDB_GSAPOPDB_MasterTablesDB_ReportTablesFinanceCubeBaseline : Once data extracted from SAP:Time taken to create end-end Reports and Cubes insights 13+ hours (In production typical 20+ hours with multiple companies)DW_FinanceTransactions

MasterFinance table population6 hours 21min6 hours 1 hourSuspicious wordsReports 3hr21minSAP

Mai 201326SQL Server 2012 Parallel Data Warehouse 2012 Trivadis26Reference Case: Audit Process with PDWDB_GSAPOPDB_MasterTablesDB_ReportTablesFinanceCubeOnce data is extracted from SAP:Creating 5 CM Reports & FSCP Finance Cube;Time taken: 30 MinutesDW_FinanceTransactions

MasterFinance table population 8m50sec load from FlatFile23min10m10sec11 min

All 5 Reportswithin 6min(80)(80)(80)(80)SAP

Mai 201327SQL Server 2012 Parallel Data Warehouse 2012 Trivadis27AgendaPositioning Parallel Data Warehouse (PDW)ArchitectureWorking with PDWPDW and Big Data

SQL Server 2012 Parallel Data Warehouse Mai 201328

2012 Trivadis

Introducing PolyBaseSeamless integration of two worlds

Single Query; Structured and Unstructured

Query and join Hadoop tables with Relational TablesUse Standard SQL language Select, From Where

Existing SQLSkillset

Save Timeand Costs

DatabaseHDFS (Hadoop)

SQL Server 2012 PDW Powered by PolyBase

SQL

Analyze AllData Types

Mai 201329SQL Server 2012 Parallel Data Warehouse 2012 TrivadisMicrosoft solves this limitation through a new feature in SQL Server 2012 PDW called PolyBase. For PolyBase, we significantly improve how traditional data warehouses interacts with Hadoop. It enables integrated query across Hadoop and relational data. Without a prior manual intervention (moving the data by IT), PolyBase Query Processor can accept a standard SQL query and join tables from a relational source with tables from a Hadoop source to return a combined result seamlessly to the user. There are two main benefits around this:The user only issues a standard SQL Query. They dont need to learn MapReduce. This means, standard BI tools can be used to query structured and unstructured data togetherExample Standard SQL query: select c from hdfsCustomer, o from PDW where c.c_custkey = o.o_custkey

IT doesnt need to do a prior mapping and moving of data from HDFS into the warehouse. PolyBase was pioneered in Created in Jim Gray Systems Labs by David DeWitt, former Professor Emeritus of Computer Sciences. Dr. DeWitt is known for pioneering research in parallel databases, benchmarking, object-oriented and XML databases. He has over 120 papers published.Details of how this work is as follows:When the user uses PolyBase, they can see the tables in Hadoop through external tables. They can then select the tables they want in Hadoop and join them with PDWPolyBase will then import/export Hadoop data to and from PDW in parallel for performance (using DMS on each PDW compute node) Unstructured data that returns will be temporarily or permanently stored on PDW and compute node does all processing to join and return back resultsAdditionally, Microsoft also released a Windows-based Hadoop distribution (HD Inisghts) both on premise on Windows Server and on the cloud with Azure. 29Additional ResourcesSQL Server Parallel Data Warehouse (PDW) Landing Page:www.microsoft.com/PDWIntroduction to Polybase:http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/polybase.aspx Price/TB comparison:http://www.valueprism.com/resources/resources/Resources/PDW%20Compete%20Pricing%20FINAL.pdf HP QuickSpecshttp://h18000.www1.hp.com/products/quickspecs/13830_div/13830_div.html http://h18000.www1.hp.com/products/quickspecs/13830_div/13830_div.pdf

Mai 201330SQL Server 2012 Parallel Data Warehouse 2012 TrivadisConclusionSQL Server 2012 Parallel Data Warehouse is an MPP based appliance optimized for data warehouse workloadVery Similar to a regular SQL Server, but T-SQL not 100 % identicalSupport for development via SQL Server Data ToolsScalable, balanced platformNot just storage (data + CPU power)Central part of Microsoft BI stackWell integrated in Microsoft management tools

SQL Server 2012 Parallel Data Warehouse Mai 2013312012 TrivadisVisit us, win a price and be prepared for your next adventureSQL Server 2012 Parallel Data Warehouse

Mai 2013322012 TrivadisLets go.Wettbewerb

BASEL BERN LAUSANNE ZRICH DSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MNCHEN STUTTGART WIEN

2012 Trivadis33