Welcome

24
2012 © Trivadis BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN Welcome November 2012 Vorstellung Parallel Data Warehouse 1 Vorstellung Parallel Data Warehouse November 2012 Meinrad Weiss

description

Welcome. Vorstellung Parallel Data Warehouse November 2012 Meinrad Weiss. Data Warehouse – Products Positioning. PDW with Distributed Data Architecture. Minimal HW tune-up/optimization; supports mixed workloads. 1. Scale Complexity HA by default SW-HW integration. 4. - PowerPoint PPT Presentation

Transcript of Welcome

Page 1: Welcome

2012 © Trivadis

BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

Welcome

November 2012Vorstellung Parallel Data Warehouse

1

Vorstellung Parallel Data Warehouse

November 2012Meinrad Weiss

Page 2: Welcome

2012 © Trivadis

Vorstellung Parallel Data Warehouse2

Data Warehouse – Products Positioning

1

2

3

Minimal HW tune-up/optimization; supports mixed workloads

Balanced solution for mostly scan-centric workloads.

Max HW tune-up for most DW scenarios.

4 Most flexible architecture for handling all DW scenarios.

ScaleComplexityHA by defaultSW-HW integration

SQL Server 2008 R2Fast Track

SQL Server 2008 R2Enterprise

PDWSQL Server 2008 R2Data Center

1

2

3

4

PDW with Distributed

Data Architecture

November 2012

Page 3: Welcome

2012 © Trivadis

Vorstellung Parallel Data Warehouse3

Microsoft Data Warehousing Solutions

Tier 1 Offerings

Scalable and reliable platform for data

warehousing on any hardware

Reference Architectures offering best price

performance for data warehousing

Scalable and reliable platform for data

warehousing on any hardware

Appliance for high-end data warehousing requiring highest

scalability, performance, or complexity

Ideal for data marts or small to mid-sized EDWs

Ideal for data marts or small to mid-sized DWs

with scan- centric workloads

Ideal for large data marts or mid-sized EDWs

Offers flexibility in hardware and architecture

Software only Reference Architectures (software and hardware)

Software onlyDW appliance

(fully integrated software and hardware)

Scale-up DW Scale-up DW Scale-up DW Scale-out DW with MPP

10s of TB 2 – 80 TB 10s of TB 10s - 100s of TB

November 2012

Page 4: Welcome

2012 © Trivadis

Vorstellung Parallel Data Warehouse4

Data Warehouse – Products Positioning

100% SQL Server 2008 R2 Compatibility

ScaleComplexityHA by defaultSW-HW integration

SQL Server 2008 R2with Fast Track

Reference Architecture

SQL Server 2008 R2Enterprise

PDWSQL Server 2008 R2Data Center

PDW with Distributed Data

Architecture

November 2012

Page 5: Welcome

2012 © Trivadis

Vorstellung Parallel Data Warehouse

MPP vs. SMP

November 2012

5

MPP - Massively Parallel Processing Uses many separate CPUs running in parallel to execute a single

program

Each CPU has its own memory and disks

High-speed communications between nodes

Applications must be segmented

SMP MPP SMP - Symmetric Multiprocessing

Multiple CPUs used to complete individual processes simultaneously

All CPUs share the same memory, disks, and network controllers

All SQL Server implementations up until now have been SMP

Page 6: Welcome

2012 © Trivadis

Two hardware vendors: HP and Dell

November 2012Vorstellung Parallel Data Warehouse

6

Microsoft+DellParallel Data Warehouse Appliance

Microsoft+HP Enterprise Data

Warehouse Appliance

Page 7: Welcome

2012 © Trivadis

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

Control Node

Management Node

Landing Zone

Backup Node

Control Rack Data Rack(s)

November 2012

7Vorstellung Parallel Data Warehouse

Page 8: Welcome

2012 © Trivadis

SQL

Control Node

Management Node

Landing Zone

Backup Node

Control Rack Data Rack(s)

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

Client connections always go through the control node

Windows Failover Cluster for Availability

Contains no persistent user data Processes SQL requests

Prepares execution plan

Orchestrates distributed execution

Local SQL Server processes final query plan and aggregates results

November 2012

8Vorstellung Parallel Data Warehouse

Page 9: Welcome

2012 © Trivadis

SQL

Control Node

Management Node

Landing Zone

Backup Node

Control Rack Data Rack(s)

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

Provides Support and Patching for the Appliance

Holds image for re-deployment of compute node

Holds Active Directory

November 2012

9Vorstellung Parallel Data Warehouse

Page 10: Welcome

2012 © Trivadis

SQL

Control Node

Management Node

Landing Zone

Backup Node

Control Rack Data Rack(s)

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

Provides high-capacity storage for data files from ETL processes

Is available as a sandbox for other applications and scripts that run on the internal network

Provides SQL Server Integration Services

SourceLanding

Zone Files

Data Loader

Compute Nodes

DWLoader or SQL Server Integration

Services

November 2012

10Vorstellung Parallel Data Warehouse

Page 11: Welcome

2012 © Trivadis

SQL

Control Node

Management Node

Landing Zone

Backup Node

Control Rack Data Rack(s)

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL Provides Integrated Backup Solution

Integrates with 3rd party backup products

Orderable in different sizesNovember 2012

11Vorstellung Parallel Data Warehouse

Page 12: Welcome

2012 © Trivadis

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

SQL

Control Node

Management Node

Landing Zone

Backup Node

Control Rack Data Rack(s)

SQL

Data Rack Servers 5/10 active + 1 passive per Rack

InfiniBand, FC and Ethernet switching

Expansion Grow from 1/2–4 data racks, storage options, test/dev system

Consists of COMPUTE NODES and STORAGE NODES

Shared Nothing

Spare Node provides failover in case of node failure

November 2012

12Vorstellung Parallel Data Warehouse

Page 13: Welcome

2012 © Trivadis

Vorstellung Parallel Data Warehouse

Connectivity and Tools

Nexus Query Chameleon

DWSQL

November 2012

13

Page 14: Welcome

2012 © Trivadis

Vorstellung Parallel Data Warehouse

Creating a Database

CREATE DATABASE PDW WITH (AUTOGROW = ON, REPLICATED_SIZE = 1024 GB, -- (per Node) DISTRIBUTED_SIZE = 16384 GB, -- (whole System) LOG_SIZE = 1024 GB);

November 2012

14

Page 15: Welcome

2012 © Trivadis

Vorstellung Parallel Data Warehouse

Distribution and Replication of Data: Replicate

November 2012

15

Time DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar Day

Store Dim

Store Dim IDStore NameStore MgrStore Size

Product Dim

Prod Dim IDProd CategoryProd Sub CatProd Desc

MktgCampaign Dim

Mktg Camp IDCamp NameCamp MgrCamp StartCamp End

SQL

SQL

SQL

SQL

TD

PD

SD

MD

TD

PD

SD

MD

TD

PD

SD

MD

Smaller (<5GB ) Dimension Tables are Replicated on Every

Compute Node

TD

PD

SD

MD

Sales Facts

Date Dim IDStore Dim IDProd Dim IDMktg Camp IdQty SoldDollars Sold

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-2

SF-3

SF-4

Result: Fact -Dimension Joins can be performed

locally

Page 16: Welcome

2012 © Trivadis

Vorstellung Parallel Data Warehouse

Create Replicated Table

November 2012

16

CREATE TABLE DimProduct(ProductId BIGINT NOT NULL,Description VARCHAR(50),CategoryId INT NOT NULL,ListPrice DECIMAL(12,2))

WITH (DISTRIBUTION = REPLICATE);

Creates tables on each of the individual compute nodes and

assigns them to the REPLICATED file group.

Data Compression is automatically turned on

Page 17: Welcome

2012 © Trivadis

Vorstellung Parallel Data Warehouse

Distribution and Replication of Data: Distribute

November 2012

17

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

SF-1

Sales Facts

Date Dim IDStore Dim IDProd Dim IDMktg Camp IdQty SoldDollars Sold

SQL

SQL

SQL

SQL

Larger (> 10 GB) Fact Table is Hash

Distributed Across All Compute Nodes

SF-1

SF-2

SF-3

SF-4

Time DimDate Dim IDCalendar YearCalendar QtrCalendar MoCalendar Day

Store Dim

Store Dim IDStore NameStore MgrStore Size

Product Dim

Prod Dim IDProd CategoryProd Sub CatProd Desc

MktgCampaign Dim

Mktg Camp IDCamp NameCamp MgrCamp StartCamp End

Page 18: Welcome

2012 © Trivadis

Vorstellung Parallel Data WarehouseNovember 2012

Distribution on a PDW

PDW Node 1Create Table <myTable GUID>_aCreate Table <myTable GUID>_b…Create Table <myTable GUID>_h

8 Tables per Node

PDW Node 2Create Table <myTable GUID>_aCreate Table <myTable GUID>_b…Create Table <myTable GUID>_h

PDW Node 10Create Table <myTable GUID>_aCreate Table <myTable GUID>_b…Create Table <myTable GUID>_h

PDW Node …

Final Result:80 individual tables across a 10 node (1 data rack) appliance

CREATE TABLE myTable (column Defs)WITH (DISTRIBUTION = HASH (id));

18

Page 19: Welcome

2012 © Trivadis

Reference Case: Today’s process flow / Building blocks

DB_GSAPOP

DB_MasterTables

DB_ReportTables

FinanceCube

Baseline : Once data extracted from SAP:Time taken to create end-end Reports and Cubes insights 13+ hours (In production typical 20+ hours with multiple companies)

DW_FinanceTransactions

MasterFinance table population

6 hours 21min

6 hours 1 hour

Suspicious words

Reports

3hr21min

SAP

Page 20: Welcome

2012 © Trivadis

Reference Case: Audit Process with PDW

DB_GSAPOP

DB_MasterTables

DB_ReportTables

FinanceCube

Once data is extracted from SAP:Creating 5 CM Reports & FSCP Finance Cube;Time taken: 30 Minutes

DW_FinanceTransactions

MasterFinance table population 8m50sec

load from FlatFile23min

10m10sec 11 min

All 5 Reportswithin 6min

(80) (80) (80)

(80)

SAP

Page 21: Welcome

2012 © Trivadis

Vorstellung Parallel Data Warehouse

Appliance Update AU3

November 2012

21

Performance – up to 10x improvement Data Movement Services New cost based Query Optimizer New Data Movement Service

1/2 rack appliances from HP and Dell

System Center 2012 Integration (SCOM pack)

And YES … Support for Stored Procedures (subset)

Collations: Full support for international data

Native SQL Server drivers

Page 22: Welcome

2012 © Trivadis

Vorstellung Parallel Data Warehouse

Landing Zone

ETL Tools

Hub and SpokeDepartmental

Reporting

RegionalReporting

High-Performance ReportingCentral EDW Hub

RegionalReporting with

Business Decision

Appliance

Third-Party

RDBMS

Third-PartyData

Integration

Mobile Applicatio

ns

November 2012

22

Page 23: Welcome

2012 © Trivadis4 8 12 16 20 24 28 32 36 40

SQL Server Compute Nodes

Sys

tem

Thr

ough

put

Regular SQL Server

( 1 Node)

Seamless Scalability

Half Rack PDW

( 5 Nodes)

Full Rack PDW

( 10 Nodes)

2 Rack PDW( 20 Nodes)

3 Rack PDW( 30 Nodes)

4 Rack PDW( 30 Nodes)

November 2012

25Vorstellung Parallel Data Warehouse

Page 24: Welcome

2012 © Trivadis

BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

Let‘s go.

November 2012Vorstellung Parallel Data Warehouse

26

Wettbewerb