SQL Server 2012 Parallel Data Warehouse

2012 © Trivadis

BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

SQL Server 2012 Parallel Data Warehouse

Meinrad WeissPrincipal ConsultantTrivadis AG

http://www.msrd.io/logo/msrd-logo-96px-alpha.png

2012 © Trivadis

Trivadis solution portfolio and competences

Mai 2013SQL Server 2012 Parallel Data Warehouse

2

IT SOLUTIONS, SERVICES & PRODUCTS

TECHNOLOGIESOracle, Microsoft, IBM, Open Source

Integration, Application Performance Management, Security

InfrastructureEngineering

Application Development

ITdepartment

s

Businessdepartment

s

CUSTOMER

Business Intelligence

Business Integration Services

Training

Managed Services

2012 © Trivadis

Trivadis facts & figures

dateTrivadis – the company

3

11 Trivadis locations with more than 600 employees

Financially independent and sustainably profitable

Key figures 2011

Revenue CHF 104 / EUR 84 Mio.

Services for more than 800 clients in over 1,900 projects

200 Service Level Agreements

More than 4,000 training participants

Research and development budget: CHF 5.0 / EUR 4 Mio.

Hamburg

Dusseldorf

Frankfurt

Stuttgart

MunichFreiburg

Vienna

Basel

Bern

Zurich

Lausanne~380 employees

~200 employees

~30 employees

Mai 2013

3SQL Server 2012 Parallel Data Warehouse

2012 © Trivadis

Visit us, win a price and be prepared for your next adventure

SQL Server 2012 Parallel Data Warehouse Mai 2013

4

http://www.swisssolarcharger.com/dynapage.asp?mpage=SOLARLADEGERAETE-TABELLE

2012 © Trivadis

Agenda

1. Positioning Parallel Data Warehouse (PDW)2. Architecture3. Working with PDW4. PDW and Big Data


5

2012 © TrivadisSQL Server 2012 Parallel Data Warehouse Mai 2013

6

2012 © Trivadis

Data Warehouse – Products Positioning

1

2

Balanced solution for mostly scan-centric workloads.Max HW tune-up for most DW scenarios.

3 Most flexible architecture for handling all DW scenarios.

ScaleComplexityHA by defaultSW-HW integration

SQL Server 2012Fast Track

SQL Server 2012

SQL Server 2012PDW

1

2

3

PDW with Distributed

Data Architecture


7

2012 © Trivadis

Built For Big DataNext-generation Performance At Scale

Scale-Out Architecture on Industry Standard

Hardware

SQL Server 2012 Parallel Data Warehouse


8

2012 © Trivadis

Agenda



9

2012 © Trivadis

Parallel Data Warehouse at a glance

Shared-nothing parallel database system Massively parallel processing (MPP) A “Control” server that accepts user queries,

generates a plan, and distributes operations in parallel to compute nodes

Multiple “Compute” servers running SQL ServerDelivered as an appliance Balanced and pre-configured hardware and software Scales from 2 to 56 Nodes Fastest Time to Market


10

2012 © Trivadis

Base Unit (1/4 Rack)

All hosts and VM’s run Windows Server 2012 Standard All Fabric and workload

activity happens in Hyper-V virtual machines, with Fabric VM’s sharing 1 server

Failover is handled by Hyper-V

PDW Agent runs on all hosts and all VMs, collects appliance health data on fabric and workload

Windows Storage Spaces handles mirroring and spares

Host 1

Host 0

Host 2

Host 3

Storage Spaces

IB &Ethernet Direct attached

SAS

Control FAB AD

Compute 1

Compute 2


11

VMM

2012 © Trivadis

Current Limitations: Performance and Scale Up today

Scale Up (and pay)

Old server will be obsolete if bigger system is required

$$$ $$$ $$$

Mai 2013


? ?

2012 © Trivadis

SCALING FROM 2 TO 56 NODES

Appliance can grow with increasing workload

Provisioning consists of 3 phases:

Bare metal provisioning of new nodes

Provisioning of workload VMs and ‘hooking up’ to other workload VMs

Redistribution of data

Host 1

Host 0

Host 2

Host 3

Storage Spaces


SAS

Control FAB AD

Compute 1

Compute 2

Host 2

Host 3

Storage Spaces

Compute 1

Compute 2


13

VMM

2012 © Trivadis4 8 12 16 20 24 28 32 36 40SQL Server Compute Nodes

Sys

tem

Thr

ough

put

Regular SQL Server

( 1 Node)

Seamless Scalability

5 Nodes

10 Nodes

20 Nodes

30 Nodes

30 Nodes

Mai 2013


2012 © Trivadis

High Availability

Storage Spaces manages the physical disks in the disk enclosures

Failover: One cluster across the whole

appliance VMs are automatically

migrated on host failure Affinity and anti-affinity maps

enforce rules

Host 1

Host 0

Host 2

Host 3

Storage Spaces


SAS

Control FAB AD

Compute 1

Compute 2


15

VMM

2012 © Trivadis

Agenda

1. Positioning Parallel Data Warehouse (PDW)2. Architecture3. Working with PDW4. Introduction to Big Data5. Microsoft and Big Data


16

2012 © Trivadis

Distribution and Replication of Data: Replicate

Time DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar Day

Store DimStore Dim ID

Store NameStore MgrStore Size

Product DimProd Dim ID

Prod CategoryProd Sub CatProd Desc

MktgCampaign DimMktg Camp IDCamp NameCamp MgrCamp StartCamp End

SQL

SQL

SQL

SQL

TD

PD

SD

MD

TD

PD

SD

MD

TD

PD

SD

MD

Smaller (<5GB ) Dimension Tables are Replicated on Every

Compute NodeTD

PD

SD

MD

Sales Facts

Date Dim IDStore Dim IDProd Dim IDMktg Camp IdQty SoldDollars Sold

Result: Fact -Dimension Joins can be performed

locally


17

2012 © Trivadis

Create Replicated TableCREATE TABLE myTable ( id int NOT NULL, lastName varchar(20), zipCode varchar(6) )WITH (CLUSTERED COLUMNSTORE INDEX);

Creates tables on each of the individual compute nodes and assigns them to the REPLICATED file group.

Data Compression is automatically turned on


18

2012 © Trivadis

TD

PD

SD

MD

TD

PD

SD

MD

TD

PD

SD

MD

TD

PD

SD

MD

Distribution and Replication of Data: Distribute

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1SQL

SQL

SQL

SQL

Larger (> 10 GB) Fact Table is Hash

Distributed Across All Compute Nodes

SF-1

SF-2

SF-3

SF-4

Time DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar Day

Store DimStore Dim ID

Store NameStore MgrStore Size

Product DimProd Dim ID

Prod CategoryProd Sub CatProd Desc

MktgCampaign DimMktg Camp IDCamp NameCamp MgrCamp StartCamp End

Sales Facts

Date Dim IDStore Dim IDProd Dim IDMktg Camp IdQty SoldDollars Sold


19

2012 © Trivadis

Distribution on a PDW

PDW Node 1Create Table <myTable GUID>_aCreate Table <myTable GUID>_b…Create Table <myTable GUID>_h

8 Tables per Node



PDW Node …

Final Result:64 individual tables across a 8 node (1 data rack HP) appliance

CREATE TABLE myTable (column Defs)WITH (DISTRIBUTION = HASH (id));

Mai 2013


2012 © Trivadis

xVelocity gives next-gen performanceLightning Fast Data Query Processing

Custom

er

Sales

Country

Supplier

Products

Columnstore Provides Dramatic Performance• Updateable and clustered xVelocity

columnstore• Stores data in columnar format• Memory-optimized for next-generation

performance• Updateable to support bulk and/or trickle

loadingSave Timeand Costs

Real-TimeDW

Up to50X Faster

Up to 15x compression

Mai 2013


2012 © Trivadis

Web-Based Management Dashboard


22

2012 © Trivadis

PDW Querying 1 Petabyte of data in 1 second

Mai 2013


294‘000‘000‘000 rows

http://www.sqlpass.org/summit/2012/DayOneKeynote.aspx


http://www.youtube.com/watch?feature=player_detailpage&v=8_EXwvNtch8


2012 © Trivadis

Reference Case: Today’s process flow / Building blocks

DB_GSAPOP

DB_MasterTables

DB_ReportTables

FinanceCube

Baseline : Once data extracted from SAP:Time taken to create end-end Reports and Cubes insights 13+ hours (In production typical 20+ hours with multiple companies)

DW_FinanceTransactions

MasterFinance table population

6 hours 21min

6 hours 1 hour

Suspicious words

Reports

3hr21min

SAP

Mai 2013


2012 © Trivadis

Reference Case: Audit Process with PDW

DB_GSAPOP

DB_MasterTables

DB_ReportTables

FinanceCube

Once data is extracted from SAP:Creating 5 CM Reports & FSCP Finance Cube;Time taken: 30 Minutes

DW_FinanceTransactions

MasterFinance table population 8m50sec

load from FlatFile23min

10m10sec 11 min

All 5 Reportswithin 6min

(80) (80) (80)

(80)

SAP

Mai 2013


2012 © Trivadis

Introducing PolyBaseSeamless integration of two worlds

Single Query; Structured and Unstructured• Query and join Hadoop tables

with Relational Tables• Use Standard SQL language • Select, From Where

Existing SQLSkillset

Save Timeand CostsDatabase HDFS

(Hadoop)

SQL Server 2012 PDW Powered by PolyBase

SQL

Analyze AllData Types

Mai 2013


2012 © Trivadis

Additional Resources

SQL Server Parallel Data Warehouse (PDW) Landing Page:

www.microsoft.com/PDWIntroduction to Polybase:

http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/polybase.aspx

Price/TB comparison:http://

www.valueprism.com/resources/resources/Resources/PDW%20Compete%20Pricing%20FINAL.pdf

HP QuickSpecshttp://

h18000.www1.hp.com/products/quickspecs/13830_div/13830_div.html

http://h18000.www1.hp.com/products/quickspecs/13830_div/13830_div.pdf

Mai 2013


http://www.microsoft.com/PDW




http://www.valueprism.com/resources/resources/Resources/PDW%20Compete%20Pricing%20FINAL.pdf



http://h18000.www1.hp.com/products/quickspecs/13830_div/13830_div.html






2012 © Trivadis

Conclusion

SQL Server 2012 Parallel Data Warehouse is an MPP based appliance optimized for data warehouse workload Very Similar to a regular SQL Server,

but T-SQL not 100 % identical Support for development via SQL Server Data Tools

Scalable, balanced platform Not just storage (data + CPU power)

Central part of Microsoft BI stack Well integrated in Microsoft management tools


31

2012 © Trivadis

Visit us, win a price and be prepared for your next adventure


32

http://www.swisssolarcharger.com/dynapage.asp?mpage=SOLARLADEGERAETE-TABELLE

2012 © Trivadis

BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

Let‘s go.Wettbewerb

SQL Server 2012 Parallel Data Warehouse

Documents

Transcript of SQL Server 2012 Parallel Data Warehouse