DMDW Lesson 03 - Data Warehouse Theory

25
STAATLICH ANERKANNTE FACHHOCHSCHULE Author I: Dip.-Inf. (FH) Johannes Hoppe Author II: M.Sc. Johannes Hofmeister Author III: Prof. Dr. Dieter Homeister Date: 18.03.2011 STUDIEREN UND DURCHSTARTEN.

description

I'm planning to give you a detailed introduction to the concepts of the data warehouse world. We will also see why data mining and data warehouses are closely connected to each other.

Transcript of DMDW Lesson 03 - Data Warehouse Theory

Page 1: DMDW Lesson 03 - Data Warehouse Theory

STAATLICHANERKANNTEFACHHOCHSCHULE

Author I: Dip.-Inf. (FH) Johannes HoppeAuthor II: M.Sc. Johannes HofmeisterAuthor III: Prof. Dr. Dieter HomeisterDate: 18.03.2011

STUDIERENUND DURCHSTARTEN.

Page 2: DMDW Lesson 03 - Data Warehouse Theory

STAATLICHANERKANNTEFACHHOCHSCHULE

Data Warehouse

Author I: Dip.-Inf. (FH) Johannes HoppeAuthor II: M.Sc. Johannes HofmeisterAuthor III: Prof. Dr. Dieter Homeister Date: 18.03.2011

Page 3: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

01

3

Page 4: DMDW Lesson 03 - Data Warehouse Theory

Definition DW

4

“A data warehouse is a single source for key, corporate information needed to enable business decisions .”

Dieter Homeister (his DM Script)

Page 5: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

OLTP and DSS Defined

› An application that updates is called an on-line transaction processing (OLTP) application

› An application that issues queries to the readonly database is called a decision support system (DSS)

5

Page 6: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Stovepipe vs. Integration

› When systems stand by themselves they are often referred to as stovepipes

› Systems that easily share data are called well integrated systems

6

Page 7: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Problems with Stovepipe Architecture (1/2)

› Problems› Users who wish to access data must query several different DSS

to find it › Data may have fundamental conflicts between DSS › a department code table in one DSS may differ in another DSS › a measurement may be stored in meters in one DSS and yards in

another

7

Page 8: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Problems with Stovepipe Architecture (2/2)

› Solution:› Use a data warehouse, where data is integrated from the

several different stovepipe systems › Data warehouse is really sharing-lite -- you don’t have to

co-ordinate as much when applications are built and you still reap the benefits of data sharing

8

Page 9: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Data Warehouse Solution

› A data warehouse is an attempt to integrate separate DSS so that users can query one place to find the answers to their questions

› A data warehouse has the key, corporate data in the organization

› A data warehouse tracks historical data

9

Page 10: DMDW Lesson 03 - Data Warehouse Theory

Selling the Data Warehouse

02

10

Page 11: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Selling the Data Warehouse (1/2)

› A data warehouse project will fail without corporate sponsorship

› Preferably, the project should be sponsored by the CEO › The CEO must be sold on the value to the business to improve

competitive advantage by deploying a data warehouse

11

Page 12: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Selling the Data Warehouse (2/2)

› If an active, corporate sponsor does not exist, data sources will be very difficult to identify

› Only add data to the warehouse that will answer key, corporate questions asked by the corporate sponsor. Otherwise, you will have a data dump

12

Page 13: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Building a Useful Data Warehouse

You really need: › strong executive sponsorship › good knowledge of the data › sound software engineering › stability from source systems › users who want a success › A 75 percent failure rate is often cited

13

Page 14: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Enterprise Information System

› An EIS (Enterprise Information System) allows users to query data in a data warehouse

› Now users can access key, corporate data in the data warehouse

14

Page 15: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Users of an Enterprise Information System (1/3)

› multiple EIS (or different graphical interfaces) are needed to satisfy different types of users

General users› want a tool that provides detailed data, but is easy to use › Want access to the data warehouse to do routine tasks

such as Find me Joes phone number, etc. › Simple application, not focused on large reports

15

Page 16: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Users of an Enterprise Information System (2/3)

Executives› Want a high-level, summary data (and a simple tool)› Must be easy to use, users want to click a few buttons and

get data they want › Results must be graphs › Users should be able to drill-down into key areas.

16

Page 17: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Users of an Enterprise Information System (3/3)

Analysts› want a flexible, more detailed tool › Often very knowledgeable about the data › Willing to do more work to learn about the data › Sometimes even learn SQL to issue their own ad-hoc

queries

17

Page 18: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Need for Data Warehouses

› Data warehouses provide a single place to store key corporate data › users can go one place to find this key data using an enterprise

information system (EIS) › also a place to store and access historical data

› Users measure performance goals for their company over a period of time› Company statistics are available › Data not stored in the same place is difficult to locate and compare, easily lost › Single query can be used to access key data

18

Page 19: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Security & Data Warehouses

› Building a data warehouse does increase security risk because key, corporate information are all in one place

› Risk reduction: database system components can be used to protect the data warehouse. These include › Views › Access control › Security Administration › Encryption › Audit (logging of all accesses)

19

Page 20: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Moving Data into the Data Warehouse

› Moving data from source OLTP systems to the data warehouse is one of the hardest tasks in data warehousing

› Updates to the data warehouse are performed periodically › weekly , nightly, monthly …

› Occasionally, real-time data is needed in a data warehouse, but this is not very common

see the document about ETL, too!

20

Page 21: DMDW Lesson 03 - Data Warehouse Theory

Data Warehouse

Data Mart

› A data mart is a subset of the data warehouse that may make it simpler for users to access key corporate data

› Sometimes, users only need a piece of data from the data warehouse

› The data mart is typically fed from the data warehouse

21

Page 22: DMDW Lesson 03 - Data Warehouse Theory

References

Data Warehouse Books and References

Ralph Kimball, Margy Rossl: The Data Warehouse Toolkit, 2nd Ed., John Wiley & Sons 2002 (Lists of pitfalls, very detailed for several applications like CRM, HR, Insurances)

W. H. Inmon: Building the Data Warehouse, 3rd Ed., John Wiley & Sons 2002 (DW design, migration, techical details)

Claudia Imhoff, Nicholas Galemmo, Jonathan G. Geiger: Mastering Data Warehouse Design, John Wiley & Sons 2003 (Technical and business view, design, optimization)

Donald K. Burleson, W. H. Inmon, Joseph Hudicka: The Data Warehouse eBusiness DBA Handbook, BMC Software and DBAzine/Rampant Techpress 2003 (Available as eBook, technical details, eBusiness, focus on Oracle, DB/DW administration, tools)

22

Page 23: DMDW Lesson 03 - Data Warehouse Theory

References

Data Warehouse Books and References

Maria Sueli Almeida, Missao Ishikawa, Joerg Reinschmidt, Torsten Roeber: Getting Started with DataWarehouse and Business Intelligence, www.redbooks.ibm.com, 1999 (eBook from IBM, focus on DB2, very technical)

Mark W. Humphries, Michael W. Hawkins, Michelle C. Dy: Data Warehousing, Pearson Education, 1998 (Very technical, incl. project mgmt., architecture, hardware and parallel computing)

Chris Todman: Designing a Data Warehouse, Prentice Hall 2000 (Introduction, not very detailed)

23

Page 24: DMDW Lesson 03 - Data Warehouse Theory

THANK YOUFOR YOUR ATTENTION

24

Page 25: DMDW Lesson 03 - Data Warehouse Theory

References

Data Warehouse Books and References

David Grossman, Ophir Frieder: Introduction to Data Warehouse, Illinois Institute of Technology 2005

Dr. Andreas Geppert, Credit Suisse: Data Warehousing - Data-Warehouse-Entwurf, 2006, http://arvo.ifi.unizh.ch/dbtg/Classes/DWH/Slides/dwh-04-sl.pdf (p31: Explaination of star/snowflake/galaxy scheme, in German)

Carmela R. Balassiano: Data Warehouse Design Feb. 2007, http://academic.brooklyn.cuny.edu/cis/cbalassiano/CIS717-2%20course%20documents/week2/Data%20Warehouse%20primer.ppt (p12, p18: Explaination of star/snowflake/galaxy scheme, in English)

25