Intelligent Data Lake mit Informatica Teil 2/3
© 2015 Mieschke Hofmann und Partner Gesellschaft für Management- und IT-Beratung mbH
Big Data – from data swamp to the “Intelligent Data Lake” (IDL)
Intelligent Data Lake mit Informatica Teil 2/3
Sascha Dorner & Sören Eickhoff | MHPBoxenstopp: 14.02.2017
© 2017 MHP – A Porsche Company
Further MHPBoxenstopps: www.mhp.com/events
21.02.2017 SAP Solution Manager 7.2 – Requirements Management: use in the requirements analysis of rollout projects
07.03.2017 Data Governance with Informatica, Part 3/3: requirements and opportunities for the use of modern data tools
21.03.2017 Mobility in the urban space of tomorrow: challenges for the smart city
Schedule
All participants are muted at the start.
1. MHPBoxenstopp talk: Sascha Dorner, Sören Eickhoff (Informatica)
2. Press conference (questions & answers): you can already submit questions during the web session via the chat function in the right-hand window.
Missed an MHPBoxenstopp? All past MHPBoxenstopps can be found here:
www.youtube.de/MHPProzesslieferant
www.slideshare.net/MHPInsights
MHPBoxenstopp: Intelligent Data Lake mit Informatica Teil 2/3
Driver profiles
Sascha Dorner
Manager
Big Data & IoT Technologies
Consulting (MHP)
Product development KIS
Dipl. Informatiker (FH)
Manager (MHP)
Sören Eickhoff (Informatica GmbH)
Sales Consultant Big Data
Service Management Solution Architect (IBM)
Senior Technical Sales Professional (IBM)
Dipl. Wirtschaftsinformatiker
Sales Consultant Big Data Management
1. Everybody talks Big Data…
2. The Lake – Solution Informatica Data Lake (IDL)
3. Use Cases: Data Lake
1. Everybody talks Big Data…
#1 in 6 Data Categories …
• Big Data Management
• Cloud Data Management
• Data Integration
• Master Data Management
• Data Quality
• Data Security
Big Data Related Business Initiatives
Financial Services
• Fraud Detection
• Risk & Portfolio Analysis
• Investment Recommendations
• Customer Analytics
Retail & Telco
• Proactive Customer Engagement
• Location Based Services
Media & Entertainment
• Online & In-Game Behavior
• Customer X/Up-Sell
Manufacturing
• Connected Vehicle
• Predictive Maintenance
Healthcare & Pharma
• Predicting Patient Outcomes
• Total Cost of Care
• Drug Discovery
Public Sector
• Health Insurance Exchanges
• Public Safety
• Tax Optimization
• Fraud Detection
Big Data Journey in Phases
Data sources: Machine Device, Cloud; Documents and Emails; Relational, Mainframe; Social Media, Web Logs
Phases, moving from "Driven by IT" (Lower Infrastructure Cost) to "Driven by Business" (Added Business Value):
• First Pilot(s): Prove out initial use cases
• Data Warehouse Optimization: Lower infrastructure costs
• Data Discovery & Analytics: Discover new insights to drive business value
• Real-Time Operational Intelligence: Manage data assets for new & better services
Intelligent Data Lake
Business outcomes: Increase Customer Loyalty; Reduce Security Risk; Improve Predictive Maintenance; Increase Operational Efficiency
Use Case: Data Lake / Data Platform Reference Architecture
Data sources: Machine Device, Cloud; Documents and Emails; Relational, Mainframe; Social Media, Web Logs
Data Lake
• Landing Zone: Structured and unstructured enterprise and external data is landed in its raw form, normalized and ready for use
• Discovery Zone: User sandbox for self-serve access to data for exploration, data blending, hypothesis testing, analytics, and collaboration
• Production Zone: Sanitized transactional, master, and reference data & enriched data models certified for enterprise use
Data Platform: Data Modeler, Data Scientist, Data Analyst, Data Steward, Data Engineer
Business: Increase Customer Loyalty; Improve Fraud Detection; Reduce Security Risk; Improve Predictive Maintenance; Increase Operational Efficiency
Challenges Faced by the Business and IT Today
Data Analysts
• Can’t easily find trusted data
• Limited access to the data
• Frustrated by slow response from IT due to long backlog
• Constrained by disparate desktop tools, manual steps
• No way to collaborate, share, and update curated datasets
IT
• Can’t cope with growing demand from the business
• No visibility into what the business is doing with the data
• Struggling to deliver value to the business
• Losing the ability to govern and manage data as an asset
2. The Lake – Solution Informatica Data Lake (IDL)
Informatica Data Lake Management
Products: Enterprise Information Catalog; Intelligent Data Lake; Secure@Source; Big Data Management; Intelligent Streaming
Also shown: TITAN; Blaze
Foundation: Live Data Map (metadata integration); Big Data Management (data integration)
Users: Data Architect / Steward; Data Scientist / Analyst; InfoSec Analyst; Data Engineer
Enterprise Information Catalog
Unified view into enterprise information assets
• Business-user oriented solution
• Semantic search with dynamic facets
• Detailed Lineage and Impact Analysis
• Business Glossary Integration
• Relationships discovery
• High level data profiling
• Automatic Classifications with Data domains
• Business classifications with Custom Attributes
• Broad metadata source connectivity
• Big data scale
Intelligent Data Lake
Self-service data preparation with collaborative data governance
• Collaborative project workspaces
• Automated data ingestion
• Search data asset catalog
• Rapid blend of datasets
• Crowd-sourced data asset tagging & data sharing
• Automated data asset discovery & recommendations
• Rapid ‘industrialization’ of preparation steps into re-usable workflows
• Complete tracking of usage, lineage, and security
• Easily support Data Discovery Platforms
Big Data Management
Easily integrate more data faster from more data sources
• Visual development interface accelerates developer productivity
• Near universal data connectivity
• Complex data parsing on Hadoop
• Data profiling on Hadoop
• High-speed data ingestion and extraction
• Process and deliver data at scale on Hadoop
• Dynamic schemas and mapping templates
• Data Quality and Data Governance on Hadoop
Diagram labels: Informatica Big Data Management; Smart Executor; Informatica Data Transformation Engine on dedicated DI servers; Data Connectivity
Informatica Intelligent Streaming
Collect, ingest, and process real-time and streaming data
• Streaming analytics capability in the Intelligent Data Platform
• Unified UI with multiple engines underneath the covers
• Frictionless integration: conversion/extension of batch mappings into a streaming context
• Abstracted from the runtime framework
Intelligent Data Lake
Who? Data Analyst / Scientist
• Prepare & Publish
• Search & Discover
• Share and Collaborate
Intelligent Data Lake - Terminology
Data Asset
• Data you work with as a unit.
Project
• A project contains data assets and worksheets.
Recipe
• The steps taken to prepare data in a worksheet.
Data Preparation
• The process of combining, cleansing, transforming, and structuring data from one or more data assets so that it is ready for analysis.
Data Publication
• The process of making prepared data available in the data lake.
Search and Discovery - Data discovery through a powerful search engine to find relevant data
• Semantic search
• Facet filtering by asset, resource type, latest, size, custom attributes…
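The filtering idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the catalog's actual search: the `CatalogEntry` record, its field names, and the sample entries are all hypothetical, and a plain substring match stands in for the semantic search.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; field names are illustrative only."""
    name: str
    asset_type: str   # e.g. "table" or "file"
    resource: str     # source system the asset was scanned from
    size_mb: float


def search(entries, text, **facets):
    """Return entries whose name contains `text` and that match every facet."""
    hits = []
    for entry in entries:
        if text.lower() not in entry.name.lower():
            continue
        if all(getattr(entry, key) == value for key, value in facets.items()):
            hits.append(entry)
    return hits


def facet_counts(entries, facet):
    """Count results per facet value, as a search UI shows next to each filter."""
    return Counter(getattr(entry, facet) for entry in entries)


# Made-up sample catalog.
catalog = [
    CatalogEntry("customer_orders", "table", "oracle_crm", 120.0),
    CatalogEntry("customer_clicks", "file", "hdfs_weblogs", 5400.0),
    CatalogEntry("supplier_master", "table", "oracle_crm", 8.5),
]

hits = search(catalog, "customer")
print(facet_counts(hits, "asset_type"))
tables_only = search(catalog, "customer", asset_type="table")
print([entry.name for entry in tables_only])
```

Each facet narrows the result set, and the counts per facet value tell the user which filters are still worth applying.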
Data Asset Overview - Overview with asset attributes and integrated profiling stats
• Add a data asset to a project from any exploration view
• Column profiling stats include null/unique/duplicate percentages, inferred data types, and data domains
• Detailed stats include value and pattern distributions
• Asset attributes enriched by users to add business context
• Asset attributes collected from the source system
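To make the profiling statistics concrete, here is a minimal Python sketch that computes them for one column of a sample. It is not how the product computes its profile. The definitions are assumptions made for illustration: "unique" counts values that occur exactly once, "duplicate" counts rows that share a value, and the type inference only distinguishes integer, decimal, and string.

```python
from collections import Counter


def infer_type(values):
    """Guess a type from string values: integer, decimal, or string."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except (TypeError, ValueError):
            return False

    if not values:
        return "unknown"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    return "string"


def pattern(value):
    """Map digits to 9 and letters to A, keeping other characters as they are."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value
    )


def profile_column(values):
    """Profile a column given as a list of strings, with None marking a null."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    unique = sum(1 for c in counts.values() if c == 1)     # values seen once
    duplicate = sum(c for c in counts.values() if c > 1)   # rows sharing a value

    def pct(n):
        return round(100.0 * n / total, 1) if total else 0.0

    return {
        "null_pct": pct(total - len(non_null)),
        "unique_pct": pct(unique),
        "duplicate_pct": pct(duplicate),
        "inferred_type": infer_type(non_null),
        "value_distribution": counts.most_common(3),
        "pattern_distribution": Counter(pattern(v) for v in non_null).most_common(3),
    }


stats = profile_column(["10", "20", "20", None, "35"])
print(stats)
```

With these definitions the three percentages always add up to 100, which makes the result easy to sanity-check.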
Data Lineage - Interactively trace data origin through summarized lineage views for analysts
• Use the Lineage and Impact sliders to drill down to the desired lineage levels on either side of the seed object.
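The two sliders can be read as depth limits on a graph walk: lineage follows edges upstream from the seed object, impact follows them downstream. The Python sketch below illustrates that reading with a breadth-first search. The edge list and asset names are invented, and the product's own lineage model is not shown here.

```python
from collections import deque

# Hypothetical lineage edges: (upstream asset, downstream asset).
EDGES = [
    ("crm.customers", "stg.customers"),
    ("web.clicks", "stg.clicks"),
    ("stg.customers", "dwh.customer_360"),
    ("stg.clicks", "dwh.customer_360"),
    ("dwh.customer_360", "report.churn"),
]


def neighbours(edges, node, downstream):
    """Direct successors (downstream=True) or direct predecessors of `node`."""
    if downstream:
        return [dst for src, dst in edges if src == node]
    return [src for src, dst in edges if dst == node]


def trace(edges, seed, depth, downstream):
    """Breadth-first walk from `seed`, at most `depth` hops in one direction.

    Returns a dict mapping each reached asset to its distance from the seed.
    """
    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # slider limit reached on this branch
        for nxt in neighbours(edges, node, downstream):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[seed]
    return seen


def lineage_view(edges, seed, lineage_depth, impact_depth):
    """Combine upstream (lineage) and downstream (impact) up to the slider depths."""
    return {
        "lineage": trace(edges, seed, lineage_depth, downstream=False),
        "impact": trace(edges, seed, impact_depth, downstream=True),
    }


view = lineage_view(EDGES, "dwh.customer_360", lineage_depth=1, impact_depth=1)
print(view)
```

Raising `lineage_depth` to 2 would additionally pull in `crm.customers` and `web.clicks`, which corresponds to moving the lineage slider one level further out.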
Relationship View - Shows the ecosystem of the asset in the enterprise based on its association to other assets
• Get a 360-degree view of a data asset using the relationship view. It includes related tables, views, domains, reports, users, etc.
• Ability to zoom, find specific assets in the view, and filter by asset type
• Expand relationship circles to get more details on relationship types and objects
Data Preparation continued… - Excel-based data preparation on sample data
• Advanced functionality such as Join, Merge, Aggregate, Filter, Sort, etc.
• New values are calculated and shown right away
• Large number of functions available for all types of data: string, numeric, date, statistical, math, etc.
• New formula definition with type-ahead
Data Preparation continued… - Excel-based data preparation on sample data
• Column-level suggestions
• Data preparation steps captured as a “Recipe”
• Column value distributions
• Column-level summary
Data Publication - Execution of data preparation steps on actual data using an Informatica mapping
• Publish the output of the data preparation steps back to the lake
• The user's credentials are used to access the underlying database
• Recipe steps are translated into an Informatica mapping
• The Informatica mapping is handed over to the BDM platform for execution on the actual data sources
• The BDM platform uses either MapReduce, Blaze, or Spark to execute the mapping
• The mapping is available to ETL specialists, who can open it in the Informatica Developer tool to operationalize it
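The core idea of this slide is that preparation steps are recorded once on a sample and then replayed unchanged on the full data. The plain-Python sketch below shows only that idea. In the product, the recipe is translated into an Informatica mapping and run by BDM on MapReduce, Blaze, or Spark; none of that is reproduced here. The step format, the operation names, and the sample rows are hypothetical.

```python
def apply_step(rows, step):
    """Apply one recorded recipe step to a list of row dicts."""
    op = step["op"]
    if op == "filter":
        return [r for r in rows if r[step["column"]] == step["equals"]]
    if op == "multiply":
        return [
            {**r, step["target"]: r[step["left"]] * r[step["right"]]} for r in rows
        ]
    if op == "sort":
        return sorted(
            rows,
            key=lambda r: r[step["column"]],
            reverse=step.get("descending", False),
        )
    raise ValueError(f"unknown recipe step: {op}")


def run_recipe(rows, recipe):
    """Replay all recipe steps in order; the input rows are left unchanged."""
    for step in recipe:
        rows = apply_step(rows, step)
    return rows


# Steps are plain records, so they can be stored, reviewed, and replayed.
RECIPE = [
    {"op": "filter", "column": "country", "equals": "DE"},
    {"op": "multiply", "target": "revenue", "left": "price", "right": "qty"},
    {"op": "sort", "column": "revenue", "descending": True},
]

full_data = [
    {"country": "DE", "price": 10, "qty": 3},
    {"country": "FR", "price": 99, "qty": 1},
    {"country": "DE", "price": 20, "qty": 2},
    {"country": "DE", "price": 5, "qty": 1},
]
sample = full_data[:2]

preview = run_recipe(sample, RECIPE)        # interactive preparation on the sample
published = run_recipe(full_data, RECIPE)   # publication: same steps, all rows
print(preview)
print(published)
```

Keeping the steps as data rather than as ad hoc edits is what makes the hand-over to an execution engine, and later to ETL specialists, possible.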
3. Use Cases: Data Lake
Data sources: Machine Device, Cloud; Documents and Emails; Relational, Mainframe; Social Media, Web Logs
1. Load or archive batch data
2. Replicate change data
3. Stream real-time data
4. Discover & profile data
5. Mask sensitive data
6. Govern & metadata
7. Prepare data for analysis – curate data
8. Subscribe to datasets
Also shown: Visualization & Analytics; Dashboards & Mobile Apps; Data Integration Hub
Organizations need ONE solution that helps them…
• Easily Find & Catalog Data & Discover Relationships
• Rapidly Prepare & Share Data Exactly When it is Needed
• Get Instant Access to Trusted & Secure Data for Advanced Analytics
• Ingest, Cleanse, Integrate & Protect Data at Scale
Gesellschaft für Management- und IT-Beratung mbH
MHP – A PORSCHE COMPANY
Sascha Dorner
Manager
Business Intelligence
Mobile: +4915120301647
E-Mail: [email protected]
MHPBoxenstopp: Timetable 2017
• 21.02.17, 11:00: MHPBoxenstopp SAP Solution Manager 7.2
• 07.03.17, 11:00: MHPBoxenstopp Data Governance with Informatica, Part 3
• 21.03.17, 11:00: MHPBoxenstopp Mobility in the urban space of tomorrow
Missed an MHPBoxenstopp? No problem!
Recordings and videos: http://www.youtube.com/MHPProzesslieferant
Presentation slides: http://de.slideshare.net/MHPInsights