[WUC 2015] Prof. Dr. Markus Löcher, Professor für Mathematik und Statistik, Hochschule für...

Post on 10-Aug-2015



New Methods for Customer Lifetime Value Modeling

Prof. Dr. Markus Löcher

How to teach an old dog new tricks


• Guest lectures, bachelor's and master's theses

• Joint third-party-funded projects (RTB, data mining, etc.), application in progress

• External partner for student projects

• New BIPM master's degree at HWR

• Scientific data mining for Webtrekk customers

HWR <-> Webtrekk

• Multiple testing
• Curse of high dimensions
• Wide and tall data
• Variable selection
• Bias and variance
• Regularization

My Teaching Goals

"Small Data"


• Galton, 1894
• Reckless Optimization

"Small and Big Data"


Search queries: "Grippe Medikamente" (flu medications) vs. "Grippe Erkrankungen" (flu cases)

How can this become a “big data” problem?

"Big Data"


“Our database of queries contains 50 million of the most common search queries on all possible topics, without pre-filtering. Billions of queries occurred infrequently and were excluded.”

“In total, we fit 450 million different models to test each of the candidate queries.”

Pitfalls of Big Data


• Seasonal correlations
• Nonstationarity
• Google constantly changing its algorithms

"Big Data"


• 40M rows, 24 cols
• Tall Data?

Example: Click Through Prediction


• 24 columns sounds manageable
• But most of these are categorical variables!
• Dummy coding leads to 924 columns (tossing out high-cardinality variables)
• Including a few selected interactions, the matrix quickly grows to > 10,000 columns

Example: Click Through Prediction

Tall Wide Data!


Classification/Regression Trees


Classification/Regression Trees

+ High interpretability
+ Fast to build
+ Automatic variable selection

- Prediction accuracy not great
- No linear relationships possible
- Everything is an interaction!
- No pooling of strength
- High variance


Overfitting


Overfitting

• Single parameter controls the flexibility of the model


Back to the Basics

• SVMs, Bagging, Boosting, Random Forests


• p > n NOT ALLOWED IN REGRESSION!
• Lots of spurious correlations
• Collinearities lead to wildly varying coefficients

Bad things happen for large p
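The spurious-correlation point can be demonstrated with simulated noise. This sketch (NumPy; seed and sizes are arbitrary) fits no model at all, yet the best of 1000 useless predictors still correlates noticeably with a pure-noise response:

```python
import numpy as np

# With large p and small n, some predictor correlates strongly with the
# response by pure chance -- there is no signal anywhere in these data.
rng = np.random.default_rng(0)
n, p = 50, 1000
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)  # response is pure noise

corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)]
max_abs_corr = max(corrs)
print(round(max_abs_corr, 2))  # typically well above 0.4
```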


• How can linear models overfit? What is the model flexibility parameter?

• Number of variables!
• Variable selection avoids overfitting.

Bad things happen for large p


• Ridge regression ("L2")

• "L1" regularization (LASSO)

Regularization to the Rescue
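Ridge regression stays solvable even when p > n, because the penalty makes the normal-equations matrix invertible. A NumPy sketch on simulated data (sizes and the λ grid are arbitrary choices for illustration):

```python
import numpy as np

# Ridge ("L2") closed form: beta(lam) = (X'X + lam*I)^(-1) X'y.
# The lam*I term makes X'X invertible even when p > n.
rng = np.random.default_rng(1)
n, p = 40, 60  # p > n: ordinary least squares has no unique solution
X = rng.standard_normal((n, p))
y = X[:, 0] + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

norms = [float(np.linalg.norm(ridge(X, y, lam))) for lam in (0.1, 1, 10, 100)]
print(norms)  # decreasing: larger penalty, smaller coefficients
```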


Regularization to the Rescue

• The lasso does variable selection and shrinkage, while ridge only shrinks
• Now we can run regression with p > n
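The selection-vs-shrinkage contrast is easiest to see one coefficient at a time. In the orthonormal-design special case both estimators reduce to simple formulas, sketched here:

```python
# Orthonormal-design special case, per coefficient z (least-squares fit):
# ridge shrinks proportionally, the lasso soft-thresholds -- so small
# coefficients are set exactly to zero (variable selection).
def ridge_shrink(z, lam):
    return z / (1.0 + lam)

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

print(ridge_shrink(0.5, 1.0))    # 0.25 -- shrunk, but never zero
print(soft_threshold(0.5, 1.0))  # 0.0  -- dropped from the model
print(soft_threshold(2.5, 1.0))  # 1.5  -- large effects survive, shrunk
```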


Choosing lambda

• Ten-fold cross-validation on simulated data. We have 1000 observations and 100 predictors, but the response depends on only 10 predictors.

• Ten-fold cross-validation on kaggle click data. We scored a 0.4 on the leaderboard with this method.
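A bare-bones version of that cross-validation loop, using ridge in place of the full lasso path (NumPy; the simulated data mirror the slide's setup of 1000 observations and 100 predictors with 10 carrying signal, but the λ grid and fold count here are illustrative):

```python
import numpy as np

# Ten-fold cross-validation to choose lambda: for each candidate, fit on
# nine folds, measure squared error on the held-out fold, and average.
rng = np.random.default_rng(2)
n, p, k = 1000, 100, 10
X = rng.standard_normal((n, p))
y = X[:, :10] @ np.ones(10) + rng.standard_normal(n)  # 10 true predictors

def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

folds = np.array_split(np.arange(n), k)
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_err = []
for lam in grid:
    errs = []
    for held_out in folds:
        train = np.ones(n, dtype=bool)
        train[held_out] = False
        beta = ridge(X[train], y[train], lam)
        errs.append(float(np.mean((y[held_out] - X[held_out] @ beta) ** 2)))
    cv_err.append(float(np.mean(errs)))

best_lam = grid[int(np.argmin(cv_err))]
print(best_lam)
```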

Elastic Net

• The elastic net handles correlated variables by using a penalty that is part L1, part L2.

• Compromise between the ridge regression penalty (α = 0) and the lasso penalty (α = 1).

• This penalty is particularly useful in the p >> N situation, or any situation where there are many correlated predictor variables.
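In glmnet's parameterization the penalty is λ·(α·‖β‖₁ + (1−α)/2·‖β‖₂²). A tiny check that α interpolates between the lasso and ridge ends of the family:

```python
# Elastic-net penalty (glmnet parameterization):
#   lambda * ( alpha * ||beta||_1 + (1 - alpha)/2 * ||beta||_2^2 )
def enet_penalty(beta, lam, alpha):
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_sq)

beta = [1.0, -2.0, 0.5]
print(enet_penalty(beta, 1.0, 1.0))  # pure lasso: ||beta||_1 = 3.5
print(enet_penalty(beta, 1.0, 0.0))  # pure ridge: ||beta||_2^2 / 2 = 2.625
```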

[Figure: cross-validation curves for α = 1 and α = 0.4]

1. “lasso and elastic-net regularized generalized linear models are fast, work on huge data sets, and avoid over-fitting automatically. They are available in the glmnet package in R.”

2. „For black box prediction ensembles of decision trees have been the most successful general-purpose algorithm in modern times. For instance, most Kaggle competitions have at least one top entry that heavily uses this approach. This algorithm is very simple to understand, and is fast and easy to apply. It is available in the randomForest package in R.”

Regression Revival

Kaggle chief scientist Jeremy Howard:

• Elastic Net: potentially a great addition to Webtrekk's toolbox
1. Churn probability
2. Conversion probability
3. Next basket value / next-30-days value / lifetime value

RTA Bidding / Profit Margin

Outlook