[WUC 2015] Prof. Dr. Markus Löcher, Professor of Mathematics and Statistics, Hochschule für...
New Methods for Customer Lifetime Value Modeling
Prof. Dr. Markus Löcher
How to teach an old dog new tricks
• Guest lectures, bachelor's/master's theses
• Joint third-party-funded projects (RTB, data mining, etc.), currently in application
• External partner for student projects
• New BIPM master's degree at HWR
• Scientific data mining for Webtrekk customers
HWR <-> Webtrekk
• Multiple testing
• Curse of high dimensions
• Wide and tall data
• Variable selection
• Bias and variance
• Regularization
My Teaching Goals
"Small Data"
• Galton, 1894
• Reckless Optimization
"Small and Big Data"
Flu medications vs. flu cases
How can this become a “big data” problem?
"Big Data"
“Our database of queries contains 50 million of the most common search queries on all possible topics, without pre-filtering. Billions of queries occurred infrequently and were excluded.”
“In total, we fit 450 million different models to test each of the candidate queries.”
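Fitting hundreds of millions of models to candidate queries is a textbook multiple-testing setting: given enough candidates, some pure-noise series will correlate strongly with the target purely by chance. A minimal sketch on simulated data (a tiny stand-in, not the actual Google query database):

```python
import numpy as np

rng = np.random.default_rng(0)
n_weeks, n_queries = 128, 50_000  # small stand-in for the 50M-query database

flu = rng.normal(size=n_weeks)                    # pure-noise "flu activity"
queries = rng.normal(size=(n_queries, n_weeks))   # pure-noise query volumes

# Pearson correlation of every candidate query series with the target
flu_z = (flu - flu.mean()) / flu.std()
q_z = (queries - queries.mean(axis=1, keepdims=True)) / queries.std(axis=1, keepdims=True)
r = q_z @ flu_z / n_weeks

# the best of 50,000 noise queries already looks like a real signal
print(f"max |r|: {np.abs(r).max():.2f}")
```

With 50 million candidates instead of 50 thousand, the best chance correlation is larger still, which is why candidate queries have to be validated out of sample.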
Pitfalls of Big Data
• Seasonal correlations
• Nonstationarity
• Google constantly changing its algorithms
"Big Data"
• 40M rows, 24 cols
• Tall Data?
Example: Click Through Prediction
• 24 columns sounds manageable
• But most of these are categorical variables!
• Dummy coding leads to 924 columns (tossing out high-level vars)
• Including a few selected interactions, the matrix quickly grows to > 10,000 columns
Example: Click Through Prediction
Tall Wide Data!
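The column blow-up is simple arithmetic: dummy coding turns a categorical with k levels into k − 1 indicator columns, and an interaction between two categoricals multiplies their counts. A back-of-the-envelope sketch with made-up cardinalities (illustrative only, not the actual click-data schema):

```python
# Hypothetical cardinalities for a handful of categorical columns
cardinalities = {"site_id": 300, "app_id": 400, "device_type": 5,
                 "banner_pos": 7, "hour_of_day": 24}

# dummy coding: k levels -> k - 1 indicator columns (reference level dropped)
main_cols = sum(k - 1 for k in cardinalities.values())
print(main_cols)  # 731

# a single interaction between two categoricals multiplies their (k - 1) counts
inter_cols = (cardinalities["site_id"] - 1) * (cardinalities["banner_pos"] - 1)
print(main_cols + inter_cols)  # 2525
```

A couple more interactions involving the high-cardinality ID columns push the design matrix past 10,000 columns, as on the slide.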
Classification/Regression Trees
+ Interpretability high
+ Fast to build
+ Automatic variable selection
- Prediction accuracy not great
- No linear relationship possible
- Everything is an interaction!
- No pooling of strength
- High variance
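The greedy split search that CART repeats at every node can be sketched in a few lines. This toy stump on simulated data (hypothetical, squared-error criterion) shows why trees are fast to build and select variables automatically:

```python
import numpy as np

def best_stump(X, y):
    """Greedy single split: the (feature, threshold) pair that most
    reduces the squared error. CART applies this search recursively."""
    best_j, best_t, best_sse = None, None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:   # drop the max so both sides are nonempty
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best_sse:
                best_j, best_t, best_sse = j, t, sse
    return best_j, best_t

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.where(X[:, 2] > 0, 1.0, -1.0) + rng.normal(scale=0.1, size=200)
j, t = best_stump(X, y)   # recovers feature 2 with a threshold near 0
```

The flip side is visible here too: the fit is piecewise constant, so a smooth linear relationship can only be approximated by stacking many such splits.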
Overfitting
• A single parameter (e.g., tree depth) controls the flexibility of the model
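The same idea holds outside of trees: for a polynomial fit, the degree is the flexibility knob. A quick sketch on simulated data (a sine signal plus noise, assumed here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)  # noisy training data
x_new = np.linspace(0, 1, 200)
y_new = np.sin(2 * np.pi * x_new)                               # noiseless truth

results = {}
for degree in (1, 3, 15):            # the degree is the flexibility parameter
    coefs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coefs, x) - y) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_new) - y_new) ** 2)
    results[degree] = (train_mse, test_mse)
    print(degree, round(train_mse, 3), round(test_mse, 3))
```

Training error can only go down as the degree grows; the error on fresh points is what eventually turns back up, and that gap is overfitting.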
Back to the Basics
• SVMs, Bagging, Boosting, Random Forests
• p > n NOT ALLOWED IN REGRESSION!
• Lots of spurious correlations
• Collinearities lead to wildly varying coefficients
Bad things happen for large p
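The collinearity point is easy to demonstrate: give the regression two nearly identical copies of a predictor and the individual coefficients become unstable, even though their sum remains well determined. A simulated sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # near-duplicate of x1
y = x1 + rng.normal(scale=0.5, size=n)     # truth: only x1 matters, coefficient 1

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta[1], beta[2])        # individually unstable, can be far from the truth...
print(beta[1] + beta[2])       # ...while their sum stays close to 1
```

Only the sum of the two coefficients is pinned down by the data; the least-squares fit is free to trade large positive and negative values between the near-duplicates.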
• How can linear models overfit? What is the model flexibility parameter?
• Number of variables!
• Variable selection avoids overfitting.
Bad things happen for large p
• Ridge Regression ("L2")
• "L1" Regularization (LASSO)
Regularization to the Rescue
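Both penalties have simple computational cores. Ridge has a closed form, and the lasso's coordinate-wise update is just soft-thresholding. A minimal sketch in plain NumPy (not glmnet, which the talk actually uses):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge ("L2"): beta = (X'X + lam * I)^(-1) X'y -- well-posed even for p > n."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(z, lam):
    """The lasso's ("L1") building block: shrink toward zero, and set
    small values exactly to zero (this is the variable selection)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 200))      # p = 200 > n = 50
y = X[:, 0] + rng.normal(size=50)
b = ridge(X, y, lam=10.0)           # solvable despite p > n
```

Note that `soft_threshold(0.3, 1.0)` returns 0.0: coefficients below the penalty level are dropped entirely, which is exactly why the lasso selects variables while ridge only shrinks.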
Regularization to the Rescue
• Lasso does variable selection and shrinkage, while ridge only shrinks
• Now we can run regression with p > n
Choosing lambda
• Ten-fold cross-validation on simulated data. We have 1000 observations and 100 predictors, but the response depends on only 10 predictors.
• Ten-fold cross-validation on Kaggle click data. We scored 0.4 on the leaderboard with this method.
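The cross-validation recipe can be sketched end to end. Here the simulated setup mirrors the slide (1000 observations, 100 predictors, only 10 active), with a closed-form ridge fit standing in for the lasso so no external package is needed:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, k = 1000, 100, 10
beta_true = np.zeros(p)
beta_true[:10] = 1.0                 # response depends on only 10 predictors
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(size=n)

def ridge_fit(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

folds = np.arange(n) % k             # ten-fold split
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
cv_err = []
for lam in lambdas:
    fold_mse = [np.mean((X[folds == f] @ ridge_fit(X[folds != f], y[folds != f], lam)
                         - y[folds == f]) ** 2) for f in range(k)]
    cv_err.append(np.mean(fold_mse))

best_lam = lambdas[int(np.argmin(cv_err))]
print(best_lam, min(cv_err))
```

The held-out error curve is U-shaped in lambda: too little penalty overfits, too much penalty shrinks the true signal away, and the minimum picks the compromise.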
Elastic Net
• The elastic net handles correlated variables, using a penalty that is part L1, part L2.
• Compromise between the ridge regression penalty (α = 0) and the lasso penalty (α = 1).
• This penalty is particularly useful in the p >> N situation, or any situation where there are many correlated predictor variables.
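Written out in glmnet's parameterization, the mixed penalty is:

```latex
% Elastic-net penalty (glmnet parameterization):
% \alpha = 1 gives the lasso, \alpha = 0 gives ridge.
\lambda \sum_{j=1}^{p} \left[ \alpha \, |\beta_j| + \frac{1 - \alpha}{2}\, \beta_j^{2} \right]
```

Intermediate values of alpha keep the lasso's ability to zero out coefficients while borrowing ridge's tendency to keep groups of correlated predictors together.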
[Figure: coefficient paths for α = 1 and α = 0.4]
1. “lasso and elastic-net regularized generalized linear models are fast, work on huge data sets, and avoid over-fitting automatically. They are available in the glmnet package in R.”
2. “For black box prediction ensembles of decision trees have been the most successful general-purpose algorithm in modern times. For instance, most Kaggle competitions have at least one top entry that heavily uses this approach. This algorithm is very simple to understand, and is fast and easy to apply. It is available in the randomForest package in R.”
Regression Revival
Kaggle chief scientist Jeremy Howard:
• Elastic Net: potentially a great addition to Webtrekk's toolbox
1. Churn probability
2. Conversion probability
3. Next basket value / next 30 days value / lifetime value
RTA Bidding/ Profit Margin
Outlook