SQL Server 2012 Parallel Data Warehouse
Embed Size (px)
description
Transcript of SQL Server 2012 Parallel Data Warehouse

2012 © Trivadis
BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
SQL Server 2012 Parallel Data Warehouse
Meinrad WeissPrincipal ConsultantTrivadis AG

2012 © Trivadis
Trivadis solution portfolio and competences
Mai 2013SQL Server 2012 Parallel Data Warehouse
2
IT SOLUTIONS, SERVICES & PRODUCTS
TECHNOLOGIESOracle, Microsoft, IBM, Open Source
Integration, Application Performance Management, Security
InfrastructureEngineering
Application Development
ITdepartment
s
Businessdepartment
s
CUSTOMER
Business Intelligence
Business Integration Services
Training
Managed Services

2012 © Trivadis
Trivadis facts & figures
dateTrivadis – the company
3
11 Trivadis locations with more than 600 employees
Financially independent and sustainably profitable
Key figures 2011
Revenue CHF 104 / EUR 84 Mio.
Services for more than 800 clients in over 1,900 projects
200 Service Level Agreements
More than 4,000 training participants
Research and development budget: CHF 5.0 / EUR 4 Mio.
Hamburg
Dusseldorf
Frankfurt
Stuttgart
MunichFreiburg
Vienna
Basel
Bern
Zurich
Lausanne~380 employees
~200 employees
~30 employees
Mai 2013
3SQL Server 2012 Parallel Data Warehouse

2012 © Trivadis
Visit us, win a price and be prepared for your next adventure
SQL Server 2012 Parallel Data Warehouse Mai 2013
4

2012 © Trivadis
Agenda
1. Positioning Parallel Data Warehouse (PDW)2. Architecture3. Working with PDW4. PDW and Big Data
SQL Server 2012 Parallel Data Warehouse Mai 2013
5

2012 © TrivadisSQL Server 2012 Parallel Data Warehouse Mai 2013
6

2012 © Trivadis
Data Warehouse – Products Positioning
1
2
Balanced solution for mostly scan-centric workloads.Max HW tune-up for most DW scenarios.
3 Most flexible architecture for handling all DW scenarios.
ScaleComplexityHA by defaultSW-HW integration
SQL Server 2012Fast Track
SQL Server 2012
SQL Server 2012PDW
1
2
3
PDW with Distributed
Data Architecture
SQL Server 2012 Parallel Data Warehouse Mai 2013
7

2012 © Trivadis
Built For Big DataNext-generation Performance At Scale
Scale-Out Architecture on Industry Standard
Hardware
SQL Server 2012 Parallel Data Warehouse
SQL Server 2012 Parallel Data Warehouse Mai 2013
8

2012 © Trivadis
Agenda
1. Positioning Parallel Data Warehouse (PDW)2. Architecture3. Working with PDW4. PDW and Big Data
SQL Server 2012 Parallel Data Warehouse Mai 2013
9

2012 © Trivadis
Parallel Data Warehouse at a glance
Shared-nothing parallel database system Massively parallel processing (MPP) A “Control” server that accepts user queries,
generates a plan, and distributes operations in parallel to compute nodes
Multiple “Compute” servers running SQL ServerDelivered as an appliance Balanced and pre-configured hardware and software Scales from 2 to 56 Nodes Fastest Time to Market
SQL Server 2012 Parallel Data Warehouse Mai 2013
10

2012 © Trivadis
Base Unit (1/4 Rack)
All hosts and VM’s run Windows Server 2012 Standard All Fabric and workload
activity happens in Hyper-V virtual machines, with Fabric VM’s sharing 1 server
Failover is handled by Hyper-V
PDW Agent runs on all hosts and all VMs, collects appliance health data on fabric and workload
Windows Storage Spaces handles mirroring and spares
Host 1
Host 0
Host 2
Host 3
Storage Spaces
IB &Ethernet Direct attached
SAS
Control FAB AD
Compute 1
Compute 2
SQL Server 2012 Parallel Data Warehouse Mai 2013
11
VMM

2012 © Trivadis
Current Limitations: Performance and Scale Up today
Scale Up (and pay)
Old server will be obsolete if bigger system is required
$$$ $$$ $$$
Mai 2013
12SQL Server 2012 Parallel Data Warehouse
? ?

2012 © Trivadis
SCALING FROM 2 TO 56 NODES
Appliance can grow with increasing workload
Provisioning consists of 3 phases:
Bare metal provisioning of new nodes
Provisioning of workload VMs and ‘hooking up’ to other workload VMs
Redistribution of data
Host 1
Host 0
Host 2
Host 3
Storage Spaces
IB &Ethernet Direct attached
SAS
Control FAB AD
Compute 1
Compute 2
Host 2
Host 3
Storage Spaces
Compute 1
Compute 2
SQL Server 2012 Parallel Data Warehouse Mai 2013
13
VMM

2012 © Trivadis4 8 12 16 20 24 28 32 36 40SQL Server Compute Nodes
Sys
tem
Thr
ough
put
Regular SQL Server
( 1 Node)
Seamless Scalability
5 Nodes
10 Nodes
20 Nodes
30 Nodes
30 Nodes
Mai 2013
14SQL Server 2012 Parallel Data Warehouse

2012 © Trivadis
High Availability
Storage Spaces manages the physical disks in the disk enclosures
Failover: One cluster across the whole
appliance VMs are automatically
migrated on host failure Affinity and anti-affinity maps
enforce rules
Host 1
Host 0
Host 2
Host 3
Storage Spaces
IB &Ethernet Direct attached
SAS
Control FAB AD
Compute 1
Compute 2
SQL Server 2012 Parallel Data Warehouse Mai 2013
15
VMM

2012 © Trivadis
Agenda
1. Positioning Parallel Data Warehouse (PDW)2. Architecture3. Working with PDW4. Introduction to Big Data5. Microsoft and Big Data
SQL Server 2012 Parallel Data Warehouse Mai 2013
16

2012 © Trivadis
Distribution and Replication of Data: Replicate
Time DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar Day
Store DimStore Dim ID
Store NameStore MgrStore Size
Product DimProd Dim ID
Prod CategoryProd Sub CatProd Desc
MktgCampaign DimMktg Camp IDCamp NameCamp MgrCamp StartCamp End
SQL
SQL
SQL
SQL
TD
PD
SD
MD
TD
PD
SD
MD
TD
PD
SD
MD
Smaller (<5GB ) Dimension Tables are Replicated on Every
Compute NodeTD
PD
SD
MD
Sales Facts
Date Dim IDStore Dim IDProd Dim IDMktg Camp IdQty SoldDollars Sold
Result: Fact -Dimension Joins can be performed
locally
SQL Server 2012 Parallel Data Warehouse Mai 2013
17

2012 © Trivadis
Create Replicated TableCREATE TABLE myTable ( id int NOT NULL, lastName varchar(20), zipCode varchar(6) )WITH (CLUSTERED COLUMNSTORE INDEX);
Creates tables on each of the individual compute nodes and assigns them to the REPLICATED file group.
Data Compression is automatically turned on
SQL Server 2012 Parallel Data Warehouse Mai 2013
18

2012 © Trivadis
TD
PD
SD
MD
TD
PD
SD
MD
TD
PD
SD
MD
TD
PD
SD
MD
Distribution and Replication of Data: Distribute
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1
SF-1SQL
SQL
SQL
SQL
Larger (> 10 GB) Fact Table is Hash
Distributed Across All Compute Nodes
SF-1
SF-2
SF-3
SF-4
Time DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar Day
Store DimStore Dim ID
Store NameStore MgrStore Size
Product DimProd Dim ID
Prod CategoryProd Sub CatProd Desc
MktgCampaign DimMktg Camp IDCamp NameCamp MgrCamp StartCamp End
Sales Facts
Date Dim IDStore Dim IDProd Dim IDMktg Camp IdQty SoldDollars Sold
SQL Server 2012 Parallel Data Warehouse Mai 2013
19

2012 © Trivadis
Distribution on a PDW
PDW Node 1Create Table <myTable GUID>_aCreate Table <myTable GUID>_b…Create Table <myTable GUID>_h
8 Tables per Node
PDW Node 2Create Table <myTable GUID>_aCreate Table <myTable GUID>_b…Create Table <myTable GUID>_h
PDW Node 8Create Table <myTable GUID>_aCreate Table <myTable GUID>_b…Create Table <myTable GUID>_h
PDW Node …
Final Result:64 individual tables across a 8 node (1 data rack HP) appliance
CREATE TABLE myTable (column Defs)WITH (DISTRIBUTION = HASH (id));
Mai 2013
20SQL Server 2012 Parallel Data Warehouse

2012 © Trivadis
xVelocity gives next-gen performanceLightning Fast Data Query Processing
Custom
er
Sales
Country
Supplier
Products
Columnstore Provides Dramatic Performance• Updateable and clustered xVelocity
columnstore• Stores data in columnar format• Memory-optimized for next-generation
performance• Updateable to support bulk and/or trickle
loadingSave Timeand Costs
Real-TimeDW
Up to50X Faster
Up to 15x compression
Mai 2013
21SQL Server 2012 Parallel Data Warehouse

2012 © Trivadis
Web-Based Management Dashboard
SQL Server 2012 Parallel Data Warehouse Mai 2013
22

2012 © Trivadis
Web-Based Management Dashboard (2)
SQL Server 2012 Parallel Data Warehouse Mai 2013
23

2012 © Trivadis
Web-Based Management Dashboard (3)
SQL Server 2012 Parallel Data Warehouse Mai 2013
24

2012 © Trivadis
PDW Querying 1 Petabyte of data in 1 second
Mai 2013
25SQL Server 2012 Parallel Data Warehouse
294‘000‘000‘000 rows
http://www.sqlpass.org/summit/2012/DayOneKeynote.aspx

2012 © Trivadis
Reference Case: Today’s process flow / Building blocks
DB_GSAPOP
DB_MasterTables
DB_ReportTables
FinanceCube
Baseline : Once data extracted from SAP:Time taken to create end-end Reports and Cubes insights 13+ hours (In production typical 20+ hours with multiple companies)
DW_FinanceTransactions
MasterFinance table population
6 hours 21min
6 hours 1 hour
Suspicious words
Reports
3hr21min
SAP
Mai 2013
26SQL Server 2012 Parallel Data Warehouse

2012 © Trivadis
Reference Case: Audit Process with PDW
DB_GSAPOP
DB_MasterTables
DB_ReportTables
FinanceCube
Once data is extracted from SAP:Creating 5 CM Reports & FSCP Finance Cube;Time taken: 30 Minutes
DW_FinanceTransactions
MasterFinance table population 8m50sec
load from FlatFile23min
10m10sec 11 min
All 5 Reportswithin 6min
(80) (80) (80)
(80)
SAP
Mai 2013
27SQL Server 2012 Parallel Data Warehouse

2012 © Trivadis
Agenda
1. Positioning Parallel Data Warehouse (PDW)2. Architecture3. Working with PDW4. PDW and Big Data
SQL Server 2012 Parallel Data Warehouse Mai 2013
28

2012 © Trivadis
Introducing PolyBaseSeamless integration of two worlds
Single Query; Structured and Unstructured• Query and join Hadoop tables
with Relational Tables• Use Standard SQL language • Select, From Where
Existing SQLSkillset
Save Timeand CostsDatabase HDFS
(Hadoop)
SQL Server 2012 PDW Powered by PolyBase
SQL
Analyze AllData Types
Mai 2013
29SQL Server 2012 Parallel Data Warehouse

2012 © Trivadis
Additional Resources
SQL Server Parallel Data Warehouse (PDW) Landing Page:
www.microsoft.com/PDWIntroduction to Polybase:
http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/polybase.aspx
Price/TB comparison:http://
www.valueprism.com/resources/resources/Resources/PDW%20Compete%20Pricing%20FINAL.pdf
HP QuickSpecshttp://
h18000.www1.hp.com/products/quickspecs/13830_div/13830_div.html
http://h18000.www1.hp.com/products/quickspecs/13830_div/13830_div.pdf
Mai 2013
30SQL Server 2012 Parallel Data Warehouse

2012 © Trivadis
Conclusion
SQL Server 2012 Parallel Data Warehouse is an MPP based appliance optimized for data warehouse workload Very Similar to a regular SQL Server,
but T-SQL not 100 % identical Support for development via SQL Server Data Tools
Scalable, balanced platform Not just storage (data + CPU power)
Central part of Microsoft BI stack Well integrated in Microsoft management tools
SQL Server 2012 Parallel Data Warehouse Mai 2013
31

2012 © Trivadis
Visit us, win a price and be prepared for your next adventure
SQL Server 2012 Parallel Data Warehouse Mai 2013
32

2012 © Trivadis
BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
Let‘s go.Wettbewerb