[WUC 2015] Prof. Dr. Markus Löcher, Professor für Mathematik und Statistik, Hochschule für...

Post on 10-Aug-2015



New Methods for Customer Lifetime Value Modeling

Prof. Dr. Markus Löcher

How to teach an old dog new tricks


• Guest lectures, bachelor's and master's theses

• Joint third-party-funded projects (RTB, data mining, etc.), application in progress

• External partner for student projects

• New BIPM master's degree at HWR

• Scientific data mining for Webtrekk customers

HWR <-> Webtrekk

• Multiple testing
• Curse of high dimensions
• Wide and tall data
• Variable selection
• Bias and variance
• Regularization

My Teaching Goals

"Small Data"


• Galton, 1894
• Reckless Optimization

"Small and Big Data"


Search queries: "Grippe Medikamente" (flu medications) vs. "Grippe Erkrankungen" (flu cases)

How can this become a “big data” problem?

"Big Data"


“Our database of queries contains 50 million of the most common search queries on all possible topics, without pre-filtering. Billions of queries occurred infrequently and were excluded.”

“In total, we fit 450 million different models to test each of the candidate queries.”

Pitfalls of Big Data


• Seasonal correlations
• Nonstationarity
• Google constantly changing its algorithms

"Big Data"


• 40M rows, 24 cols
• Tall Data?

Example: Click Through Prediction


• 24 columns sounds manageable
• But most of these are categorical variables!
• Dummy coding leads to 924 columns (tossing out high-cardinality variables)
• Including a few selected interactions, the matrix quickly grows to > 10,000 columns

Example: Click Through Prediction

Tall Wide Data!


Classification/Regression Trees


Classification/Regression Trees

+ High interpretability
+ Fast to build
+ Automatic variable selection

- Prediction accuracy not great
- No linear relationships possible
- Everything is an interaction!
- No pooling of strength
- High variance


Overfitting


Overfitting

• Single parameter controls the flexibility of the model


Back to the Basics

• SVMs, Bagging, Boosting, Random Forests


• p > n NOT ALLOWED IN REGRESSION!
• Lots of spurious correlations
• Collinearities lead to wildly varying coefficients

Bad things happen for large p
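The spurious-correlation point can be demonstrated with simulated noise. This sketch (NumPy; seed and sizes are arbitrary) fits no model at all, yet the best of 1000 useless predictors still correlates noticeably with a pure-noise response:

```python
import numpy as np

# With large p and small n, some predictor correlates strongly with the
# response by pure chance -- there is no signal anywhere in these data.
rng = np.random.default_rng(0)
n, p = 50, 1000
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)  # response is pure noise

corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)]
max_abs_corr = max(corrs)
print(round(max_abs_corr, 2))  # typically well above 0.4
```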


• How can linear models overfit? What is the model flexibility parameter?

• Number of variables!
• Variable selection avoids overfitting.

Bad things happen for large p


• Ridge regression ("L2")

• "L1" regularization (LASSO)

Regularization to the Rescue
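Ridge regression stays solvable even when p > n, because the penalty makes the normal-equations matrix invertible. A NumPy sketch on simulated data (sizes and the λ grid are arbitrary choices for illustration):

```python
import numpy as np

# Ridge ("L2") closed form: beta(lam) = (X'X + lam*I)^(-1) X'y.
# The lam*I term makes X'X invertible even when p > n.
rng = np.random.default_rng(1)
n, p = 40, 60  # p > n: ordinary least squares has no unique solution
X = rng.standard_normal((n, p))
y = X[:, 0] + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

norms = [float(np.linalg.norm(ridge(X, y, lam))) for lam in (0.1, 1, 10, 100)]
print(norms)  # decreasing: larger penalty, smaller coefficients
```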


Regularization to the Rescue

• The lasso does variable selection and shrinkage, while ridge only shrinks
• Now we can run regression with p > n
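The selection-vs-shrinkage contrast is easiest to see one coefficient at a time. In the orthonormal-design special case both estimators reduce to simple formulas, sketched here:

```python
# Orthonormal-design special case, per coefficient z (least-squares fit):
# ridge shrinks proportionally, the lasso soft-thresholds -- so small
# coefficients are set exactly to zero (variable selection).
def ridge_shrink(z, lam):
    return z / (1.0 + lam)

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

print(ridge_shrink(0.5, 1.0))    # 0.25 -- shrunk, but never zero
print(soft_threshold(0.5, 1.0))  # 0.0  -- dropped from the model
print(soft_threshold(2.5, 1.0))  # 1.5  -- large effects survive, shrunk
```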


Choosing lambda

• Ten-fold cross-validation on simulated data. We have 1000 observations and 100 predictors, but the response depends on only 10 predictors.

• Ten-fold cross-validation on kaggle click data. We scored a 0.4 on the leaderboard with this method.
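A bare-bones version of that cross-validation loop, using ridge in place of the full lasso path (NumPy; the simulated data mirror the slide's setup of 1000 observations and 100 predictors with 10 carrying signal, but the λ grid and fold count here are illustrative):

```python
import numpy as np

# Ten-fold cross-validation to choose lambda: for each candidate, fit on
# nine folds, measure squared error on the held-out fold, and average.
rng = np.random.default_rng(2)
n, p, k = 1000, 100, 10
X = rng.standard_normal((n, p))
y = X[:, :10] @ np.ones(10) + rng.standard_normal(n)  # 10 true predictors

def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

folds = np.array_split(np.arange(n), k)
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_err = []
for lam in grid:
    errs = []
    for held_out in folds:
        train = np.ones(n, dtype=bool)
        train[held_out] = False
        beta = ridge(X[train], y[train], lam)
        errs.append(float(np.mean((y[held_out] - X[held_out] @ beta) ** 2)))
    cv_err.append(float(np.mean(errs)))

best_lam = grid[int(np.argmin(cv_err))]
print(best_lam)
```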

Elastic Net

• The elastic net handles correlated variables by using a penalty that is part L1, part L2.

• Compromise between the ridge regression penalty (α = 0) and the lasso penalty (α = 1).

• This penalty is particularly useful in the p >> N situation, or any situation where there are many correlated predictor variables.
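In glmnet's parameterization the penalty is λ·(α·‖β‖₁ + (1−α)/2·‖β‖₂²). A tiny check that α interpolates between the lasso and ridge ends of the family:

```python
# Elastic-net penalty (glmnet parameterization):
#   lambda * ( alpha * ||beta||_1 + (1 - alpha)/2 * ||beta||_2^2 )
def enet_penalty(beta, lam, alpha):
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_sq)

beta = [1.0, -2.0, 0.5]
print(enet_penalty(beta, 1.0, 1.0))  # pure lasso: ||beta||_1 = 3.5
print(enet_penalty(beta, 1.0, 0.0))  # pure ridge: ||beta||_2^2 / 2 = 2.625
```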

[Figure: cross-validation curves for α = 1 and α = 0.4]

1. “lasso and elastic-net regularized generalized linear models are fast, work on huge data sets, and avoid over-fitting automatically. They are available in the glmnet package in R.”

2. „For black box prediction ensembles of decision trees have been the most successful general-purpose algorithm in modern times. For instance, most Kaggle competitions have at least one top entry that heavily uses this approach. This algorithm is very simple to understand, and is fast and easy to apply. It is available in the randomForest package in R.”

Regression Revival

Kaggle chief scientist Jeremy Howard:

• Elastic Net: potentially a great addition to Webtrekk's toolbox
1. Churn probability
2. Conversion probability
3. Next basket value / next-30-days value / lifetime value

RTA Bidding / Profit Margin

Outlook