Milano, September 12, 2013

Andrea Ghiglietti

Francesca Ieva

Table overview

Big Data means…

Example I: Big Data and Social Networks

Example II: Big Data in HealthCare Context

discussion

Profile Monitoring, Process Monitoring & Multichannel Data Analysis in Manufacturing Applications

discussion

Big Data means . . . 3V

Volume, Variety, Velocity

http://www.pros.com/big-vs-big-data/

Volume: . . . a lot of data, more than can easily be handled by a single database, computer or spreadsheet

Variety: . . . different kinds of information, lacking inherent structure or predictable size, rate of arrival, transformation or analysis when processed

Velocity: . . . process incoming data and get answers quickly enough as to not delay research or decision making . . .

Big Data means . . . more Vs?

http://www.pros.com/big-vs-big-data/

--> FROM STATISTICIAN TO DATA SCIENTIST <--

Article “THE DEATH OF STATISTICIAN”

http://www.analyticbridge.com/profiles/blogs/the-death-of-the-statistician

Does the statistician's role have to change?

Which skills are required today of a data scientist?

Establish a method for assessing the viability of information, regardless of field type and size of data

Establish a method that is quick and cost-effective

Confirm a variable’s relevance before investing in the creation of a fully formed model

PRE-PROCESSING!!!
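A minimal sketch of what a quick, cheap relevance check could look like before any full model is built. The slides do not prescribe a method; the choice of Pearson correlation on a random subsample, the function names and the toy variables ("age", "noise") are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def screen_variables(columns, outcome, sample_size, seed=0):
    """Rank candidate variables by |correlation| with the outcome,
    computed on a random subsample to keep the check cheap."""
    rng = random.Random(seed)
    n = len(outcome)
    idx = rng.sample(range(n), min(sample_size, n))
    y = [outcome[i] for i in idx]
    scores = {
        name: abs(pearson([col[i] for i in idx], y))
        for name, col in columns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: "age" drives the outcome, "noise" does not.
rng = random.Random(42)
age = [rng.uniform(20, 90) for _ in range(10_000)]
noise = [rng.gauss(0, 1) for _ in range(10_000)]
outcome = [0.05 * a + rng.gauss(0, 1) for a in age]

ranking = screen_variables({"age": age, "noise": noise}, outcome, sample_size=500)
print(ranking)
```

A linear correlation only catches linear relevance; it is a screening step under that assumption, not a replacement for modelling.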

Big Data means . . . NEW data

Big data: “proxies” of social life

Big data originate from clinical practice and support clinical practice

Big Data means . . . NEW questions

"You don't know what question you're going to answer tomorrow, and when you ask it,

you'll be relieved that you kept the data"

http://www.informationweek.com/big-data/news/big-data-analytics/big-datas-big-question-what-to-keep/240158277

“Who exactly is going to help us make sense of all this data and do we need to recruit

new people or re-train existing staff?”

http://www.itpro.co.uk/business-intelligence/20200/big-data-creates-equally-big-questions#ixzz2dpoxN6yt

CRITICAL QUESTIONS FOR BIG DATA

Provocations for a cultural, technological, and scholarly phenomenon

• Big Data changes the definition of knowledge?

• Claims to objectivity and accuracy are misleading?

• Bigger data are always better data?

• Just because it is accessible does not make it ethical?

Ask the right Questions

Don’t get bogged down by Big Data

Big data is massive and messy, and it’s coming at you

fast. These characteristics pose a problem for data

storage and processing, but focusing on these factors

has resulted in a lot of navel-gazing and an unnecessary

emphasis on technology.

http://www.thoughtworks.com/big-data-analytics

It’s not about Data. It’s about Insight and Impact

The potential of Big Data is in its ability to solve

problems and provide new opportunities.

So to get the most from your Big Data investments, focus

on the questions you’d love to answer for your business.

This simple shift can transform your perspective,

changing big data from a technological problem to a

business solution.

The value of data is only realised through insight. And insight is useless until it’s turned into action.

Finding the right questions will lead you to the well.

To strike upon insight, you first need to know where to dig.

Big Data means . . . New answers from new “us”

Data scientist, the sexiest job of the 21st century

… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data.

There is no point in bringing data into the data warehouse without integrating it. If the data arrives at the data warehouse in an unintegrated state, it cannot be used to support a corporate view of data. And a corporate view of data is the essence of the architected environment.

Example I: Big Data in Social Networks

More than 950 million users, spending on average

6.5 hours per month, generate every day…

“If you aren’t taking advantage of big data, then you

don’t have big data, you have just a pile of data”

Is this Big Data or just a pile of data?

Facebook on Big Data analytics: an insider’s view

(http://www.informationweek.com/cloud-computing/platform/facebook-on-big-data-analytics-an-inside/240150902?pgno=1)

- very interesting interview with Jay Parikh.

said Jay Parikh, VP of infrastructure at Facebook.

How to deal with this huge amount of data?

- a sweeping software platform for processing and analyzing epic amounts of data -

In terms of raw Hadoop capacity, Facebook has reached the upper limit:

the company owns the world's largest Hadoop cluster, weighing in at 100 petabytes.

..and yet, the company says,

that's not big enough!

..but Hadoop is not perfect...

“single point of failure” :

if a master server

overseeing the cluster

went down, the whole

cluster went down.

Facebook has solved the

problem with Corona.

Traditionally, Hadoop used

a single “job tracker” to

manage tasks across a

cluster of servers, but

Corona creates multiple

job trackers.

..but Facebook will soon outgrow this cluster!

It’s not possible to run

Hadoop across

geographically separate

facilities because

network packets

couldn’t travel between

the servers fast enough.

Prism replicates and

moves data wherever

it’s needed across a

vast network of

computing facilities.

• What can Facebook do with that

amount of data?

Descriptive statistics

• Which techniques can Facebook use?

A/B testing

• ..and what else?

The human face of Big Data

(https://www.facebook.com/FaceOfBigData)
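A minimal sketch of how an A/B test result might be evaluated. The slides only name the technique; the two-proportion z-test used here is one common choice, and the conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between
    variant A and variant B (normal approximation, pooled variance)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 10,000 users per variant.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The normal approximation is reasonable for large groups like these; for small samples an exact test would be a safer choice.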

interesting links on Facebook and Big Data you may like to explore..

Big Data: Facebook’s next big idea

(http://www.zdnet.com/big-data-facebooks-next-big-idea-7000001983/)

Traditional EDW vs Big Data

(http://blog.prabasiva.com/2012/04/09/traditional-edw-vs-big-data/)

Most Data isn’t big, but businesses

are wasting money pretending it is

(http://qz.com/81661/most-data-isnt-big-and-businesses-are-wasting-money-pretending-it-is/)

Big Data! If you don’t have it, you better get yourself some!

“If your data is little, your rivals are going to kick sand

in your face and steal your girlfriend”

Even web giants like Facebook and Yahoo generally aren’t dealing

with big data, and the application of Google-style tools is

inappropriate.

Is more data always better?

- Big data has become a synonym for “data analysis,” which is confusing and counter-productive.

- Supersizing your data is going to cost you and may yield very little.

If you’re looking for

correlations, gathering more

data could actually hurt you.

In some cases, big data is

as likely to confuse as it is

to enlighten.
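One way to see why more data can hurt a search for correlations: a small simulation, under the assumption that "more data" means more variables measured on the same number of subjects. Every column below is independent noise, so any "strong" correlation found is spurious. The threshold of 0.3 and the sizes are arbitrary illustrative choices.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_pairs(n_rows, n_cols, threshold=0.3, seed=0):
    """Generate n_cols independent noise columns and count the pairs
    whose sample correlation exceeds the threshold in absolute value."""
    rng = random.Random(seed)
    cols = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_cols)]
    return sum(
        1
        for a, b in itertools.combinations(cols, 2)
        if abs(pearson(a, b)) > threshold
    )


# Same 50 subjects, increasingly many unrelated variables.
results = {n_cols: count_spurious_pairs(n_rows=50, n_cols=n_cols) for n_cols in (5, 20, 80)}
for n_cols, found in results.items():
    print(f"{n_cols} variables -> {found} spurious pairs")
```

The number of candidate pairs grows quadratically with the number of variables, so the count of chance findings grows with it even though no real relationship exists.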

Does your business need data?

But buying into something as faddish as the supposed importance of the size of one’s data is the kind of thing only pointy-haired Dilbert bosses would do.

The important thing is gathering

the right data, not gathering some

arbitrary quantity of it.

Example II: Big Data in HealthCare context

One of the major challenges for statistics applied to clinical practice is to distil causal conclusions from observational data, whenever available…

Long complex sequences of events and measurements will become increasingly available in medicine.

VOLUME, VARIETY

Main issues:

- Enhancing information available in routinely collected data
- Gaining insights into economic burdens of diseases
- Causality relationships between covariates and complex outcomes
- Heterogeneity between individuals (frailty and risk assessment)
- Changes in structures over time
- Providers’ profiling

Let’s think about…

… how many times during your life YOU may contact the National Health Service

Birth

Visits and controls (blood, dental care, screening, …)

Hospitalizations (orthopedics, …)

Drugs (headache, stomachache, …)

All these data are routinely collected and stored in healthcare databases

… that this happens for your PARENTS, your FRIENDS, your FELLOW CITIZENS, … for all their life.

Millions of people interact

every day with the

national health service

There is no point in bringing data into the data warehouse without integrating it.

If the data arrives at the data warehouse in an unintegrated state, it cannot be used to support a corporate view of data.

And a corporate view of data is the essence of the architected environment.

Now, let’s change perspective and think about…

… CLINICIANS, who take care of people affected by a disease

… HEALTHCARE government, which has to quantify the burden of such disease

2.448.111 hospitalizations for heart failure between 2000 and 2012

1.424.106 hospitalizations (tot.) 2010

1.398.318 hospitalizations (tot.) 2011

2010 MDC 01: 108199

2010 MDC 04: 101739

2010 MDC 05: 191255

2010 MDC 11: 61633

2011 MDC 01: 106791

2011 MDC 04: 100680

2011 MDC 05: 186742

2011 MDC 11: 60983
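As a small worked example of quantifying burden, the relative change from 2010 to 2011 can be computed from the figures on the slide. This assumes the per-MDC figures are yearly hospitalization counts comparable across the two years, which the slide does not state explicitly; the dictionary layout and function name are illustrative.

```python
# Figures copied from the slide; interpreting them as yearly
# hospitalization counts is an assumption.
counts = {
    "MDC 01": {2010: 108199, 2011: 106791},
    "MDC 04": {2010: 101739, 2011: 100680},
    "MDC 05": {2010: 191255, 2011: 186742},
    "MDC 11": {2010: 61633, 2011: 60983},
    "Total": {2010: 1424106, 2011: 1398318},
}


def relative_change(before, after):
    """Relative change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before


changes = {
    label: relative_change(by_year[2010], by_year[2011])
    for label, by_year in counts.items()
}

for label, change in changes.items():
    print(f"{label}: {change:+.2%}")
```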

Chronic Heart Failure (CHF)

Chronic Heart Failure is a degenerative

disease of the cardiovascular system.

Starting from the Administrative Database of a Regional District, it can be defined by epidemiologists and clinicians using MDC codes.

The joint modelling of the death outcome and the hospitalization process is of interest.

Hospitalizations represent the observable process

of the latent degenerative disease evolution.

Major Diagnostic Category (MDC) for identification of cases:

01 - Nervous System

04 - Respiratory System

05 - Circulatory System

11 - Kidney

15.298 patients (35.224 records)

with first admission ending in 2006 (pts with admission date = discharge date =

death date have been removed)

4 years follow up

(up to December 31st 2010)

ID  Date of admission  Date of discharge  Date of death  Gender  …
1   15/09/2006         31/09/2006         NA             F
1   21/02/2007         23/02/2007         NA             F
1   31/03/2007         04/04/2007         NA             F
1   10/11/2007         18/11/2007         NA             F
2   10/01/2008         15/01/2008         23/10/2009     M
2   16/06/2009         01/07/2009         23/10/2009     M
3   11/04/2008         28/04/2008         03/07/2010     F
…   …                  …                  …              …
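A minimal sketch of how records in this layout (one row per hospitalization, several rows per patient) could be collapsed into one summary per patient. The records below are hypothetical and only mimic the layout of the table; the function names and the choice of summary fields are assumptions.

```python
from collections import defaultdict
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"


def parse_date(text):
    """Return a date, or None when the value is missing ("NA")."""
    if text == "NA":
        return None
    return datetime.strptime(text, DATE_FORMAT).date()


# Hypothetical records: (ID, admission, discharge, death, gender)
records = [
    (1, "15/09/2006", "30/09/2006", "NA", "F"),
    (1, "21/02/2007", "23/02/2007", "NA", "F"),
    (2, "10/01/2008", "15/01/2008", "23/10/2009", "M"),
    (2, "16/06/2009", "01/07/2009", "23/10/2009", "M"),
]


def summarize(records):
    """Per patient: number of admissions, total days in hospital,
    and whether a death date is recorded."""
    stays = defaultdict(list)
    death = {}
    for pid, adm, dis, dth, _gender in records:
        admission, discharge = parse_date(adm), parse_date(dis)
        stays[pid].append((discharge - admission).days)
        death[pid] = parse_date(dth)
    return {
        pid: {
            "admissions": len(lengths),
            "total_days": sum(lengths),
            "died": death[pid] is not None,
        }
        for pid, lengths in stays.items()
    }


summary = summarize(records)
for pid, info in summary.items():
    print(pid, info)
```

Strict parsing with `strptime` raises an error on impossible calendar dates, which is a cheap consistency check when loading administrative data.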

List of ICD-9-CM codes referred

to HF has been created as the

union of codes from

“Heart failure mortality rate” by

AHRQ-IQI and from CMS-HCC

Model Category 80.

Example: identification of clinical patterns of patients affected by chronic or acute heart diseases

[Figure: two patient timelines running from birth along a time axis, with the start and end of follow-up marked. X = clinical event; labelled events include infarction, diabetes and death.]

[Figure: latent states S1, S2, S3, S4 and two example patient trajectories of observed events: Hosp 1, Hosp 2, Hosp 3, Death; and Hosp 1, Hosp 2, Hosp 3, Hosp 4, Hosp 5, Hosp 6/death.]

Hidden Markov Process

representing the latent disease progression

generating patients’ trajectories
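A toy simulation of the idea: a hidden state that can only stay put or worsen, and an observed sequence of hospitalizations that ends with death. The state names follow the slide (S1 to S4), but every probability, the left-to-right structure and the stopping rule are hypothetical choices for illustration, not the model used by the authors.

```python
import random

# Hypothetical transition probabilities: the latent disease state
# either stays the same or worsens (degenerative progression).
TRANSITION = {
    "S1": {"S1": 0.6, "S2": 0.4},
    "S2": {"S2": 0.6, "S3": 0.4},
    "S3": {"S3": 0.5, "S4": 0.5},
    "S4": {"S4": 1.0},
}

# Hypothetical emission probabilities: what is observed at each event,
# given the hidden state. Death becomes more likely in worse states.
EMISSION = {
    "S1": {"hosp": 0.98, "death": 0.02},
    "S2": {"hosp": 0.90, "death": 0.10},
    "S3": {"hosp": 0.70, "death": 0.30},
    "S4": {"hosp": 0.30, "death": 0.70},
}


def draw(rng, dist):
    """Sample one label from a {label: probability} dictionary."""
    labels = list(dist)
    weights = list(dist.values())
    return rng.choices(labels, weights=weights, k=1)[0]


def simulate_patient(rng, max_events=20):
    """Return (hidden states, observed events) for one patient.
    The trajectory stops at death or after max_events (censoring)."""
    state = "S1"
    hidden, observed = [], []
    for _ in range(max_events):
        event = draw(rng, EMISSION[state])
        hidden.append(state)
        observed.append(event)
        if event == "death":
            break
        state = draw(rng, TRANSITION[state])
    return hidden, observed


rng = random.Random(2013)
for _ in range(3):
    hidden, observed = simulate_patient(rng)
    print(list(zip(hidden, observed)))
```

In real data only the observed column is available; fitting a model means inferring the hidden column and the two probability tables from many such trajectories.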

Main Issues:

- Record Linkage
- Epidemiology
- Performance Assessments
- Survival Analysis
- Continuity of care
- Cost-Effectiveness analysis

References and links

All the material will be available soon on the website

(Please, interact and send links or papers you’d like to add)

Contact us @ Polimi!

andrea.ghiglietti@polimi.it

francesca.ieva@polimi.it