Approximation Algorithms for Stochastic Optimization · 2020-01-03
Approximation Algorithms for Stochastic Optimization
Kamesh Munagala, Duke University
Markov Decision Process
• Set S of states of the system
• Set A of actions
• If action a is taken in state s:
  ▫ Reward Ra(s) drawn from a known distribution
  ▫ System transitions to state q with probability pa(s, q)
• Input:
  ▫ Rewards and state-transition matrices for each action
  ▫ Start state s
  ▫ Time horizon T
Policy for an MDP
• Maximize expected reward over T steps
  ▫ Expectation over the stochastic nature of rewards and state transitions
• Policy: a mapping from states S to actions A
  ▫ Specifies the optimal action for each observed state
• Dynamic Programming [Bellman ‘54]
  ▫ Optimal policy computable in time poly(|S|, |A|, T)
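Bellman's backward induction can be sketched in a few lines. Below is a minimal illustration on a hypothetical 2-state, 2-action instance; the reward matrix R and transition tensor P are invented here purely for illustration:

```python
# Finite-horizon dynamic programming (backward induction) for a tiny MDP.
S, A, T = 2, 2, 3
R = [[1.0, 0.0], [0.0, 2.0]]          # R[s][a] = expected reward of action a in state s
P = [  # P[a][s][q] = transition probability p_a(s, q)
    [[0.9, 0.1], [0.5, 0.5]],         # action 0
    [[0.2, 0.8], [0.1, 0.9]],         # action 1
]

V = [0.0] * S                          # value with 0 steps to go
for _ in range(T):                     # walk the horizon backward
    V = [max(R[s][a] + sum(P[a][s][q] * V[q] for q in range(S))
             for a in range(A))
         for s in range(S)]

print([round(v, 3) for v in V])        # optimal T-step value from each start state
```

Each pass costs O(|S|·|A|·|S|), giving the poly(|S|, |A|, T) bound from the slide.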
This talk
• For many problems:
  ▫ |S| is exponentially large in the problem parameters…
  ▫ …or |A| is exponentially large
  ▫ Many examples to follow
• Simpler decision policies?
  ▫ Approximately optimal in a provable sense
  ▫ Efficient to compute and execute
Talk Overview
Classes of Decision Problems
Stochastic Optimization
Covering/Ordering Problems
Scheduling Problems
Set Cover Variants
Multi-stage Optimization
Knapsack, Matchings, Bandits
Machine Scheduling
Bayesian Auctions
Inventory Management
Linear Programming Relaxations!
Part 1. Maximum Value Problem
• Really simple decision problem
  ▫ Illustrates basic concepts
  ▫ Adaptive vs. non-adaptive policies
• Non-adaptive policies
  ▫ Submodularity and the Greedy algorithm
• Adaptive policies
  ▫ LP relaxation and “weak coupling”
  ▫ Rounding using Markov’s inequality
• Duality
  ▫ Simple structure of the LP optimum
  ▫ Gap between adaptive and non-adaptive policies
Part 2. Weakly Coupled LPs
• General technique via LP and duality
  ▫ LP relaxation has very few constraints
  ▫ Dual yields infeasible policies with simple structure
• Examples
  ▫ Stochastic knapsack
  ▫ Stochastic matching
  ▫ Bayesian multi-item pricing
Part 3. Sampling Scenarios
• Exponential-sized LP over all possible “scenarios” of the underlying distributions
• Solve the LP or its Lagrangian by sampling the scenarios
• Examples:
  ▫ 2-stage vertex cover
  ▫ Stochastic Steiner trees
  ▫ Bayesian auctions
  ▫ Solving LPs online
Part 4. Stochastic Scheduling
• New aspect: timing the actions
• Two techniques:
  ▫ Stronger LP relaxations than weak coupling
    – Stochastic scheduling on identical machines
    – Stochastic knapsack (not covered)
  ▫ Greedy policies
    – Gittins index theorem
Important Disclaimer
By no means is this comprehensive!
Part 1. The Maximum Value Problem [Guha, Munagala ’07, ’09, Dean, Goemans, Vondrak ’04]
The Maximum Value Problem
• There is a gambler who is shown n boxes
▫ Box j has reward drawn from distribution Xj
▫ Gambler knows Xj but box is closed
▫ All distributions are independent
The Maximum Value Problem
• Gambler knows all the distributions
• Distributions are independent
The Maximum Value Problem
Open some box, say Box 2
The Maximum Value Problem
Open another box based on observing X2 = 20.
• Can open at most k boxes
• Payoff = maximum reward observed in these k boxes
• Adaptivity: the gambler can choose the next box to open based on observations so far
Example: Bernoulli Boxes
X1 = 50 with probability ½ (0 otherwise)
X2 = 60 with probability 1/3 (0 otherwise)
X3 = 25 with probability 1
The gambler can open k = 2 boxes.
Optimal Decision Policy
X1 = B(50, 1/2), X2 = B(60, 1/3), X3 = B(25, 1)
Decision tree: open X1 first. (X3 has expected payoff 25; X2 has expected payoff 60/3 = 20.)
▫ If X1 = 0 (prob ½): open X3 for a guaranteed payoff of 25
▫ If X1 = 50 (prob ½): a payoff of 50 is guaranteed, so it is pointless to open X3; open X2 instead, yielding payoff 60 with prob 1/3 and 50 with prob 2/3
Expected payoff = 25/2 + 50/3 + 60/6 = 39.167
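The tree above can be checked mechanically. Here is a small sketch of an exhaustive dynamic program over adaptive policies (box data as on the slides) that recovers the 39.167 value:

```python
from functools import lru_cache

# Bernoulli boxes from the slides: lists of (value, probability) outcomes.
boxes = [
    [(50, 0.5), (0, 0.5)],   # X1 = B(50, 1/2)
    [(60, 1/3), (0, 2/3)],   # X2 = B(60, 1/3)
    [(25, 1.0)],             # X3 = 25 deterministically
]
k = 2

@lru_cache(maxsize=None)
def value(opened, best, budget):
    """Optimal expected payoff given the frozenset of opened boxes,
    the best value seen so far, and the remaining number of openings."""
    unopened = [j for j in range(len(boxes)) if j not in opened]
    if budget == 0 or not unopened:
        return best
    cand = [best]  # the gambler may also stop early
    for j in unopened:
        cand.append(sum(p * value(opened | frozenset([j]), max(best, v), budget - 1)
                        for v, p in boxes[j]))
    return max(cand)

opt = value(frozenset(), 0, k)
print(round(opt, 3))  # 39.167 = 25/2 + 50/3 + 60/6
```

The state space here is exponential in k in general, which is exactly why the talk looks for simpler approximately optimal policies.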
Can the Gambler be Non-adaptive?
• Choose k boxes upfront, before opening them
  ▫ Open these boxes and obtain the maximum value
• Best solution = pick X1 and X3 upfront
  ▫ Payoff = ½ × 50 + ½ × 25 = 37.5 < 39.167
  ▫ Adaptively choosing the next box after opening X1 is better!
  ▫ Subtler point: it’s not that much better…
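The non-adaptive numbers are easy to confirm by brute force over all size-k subsets (a sketch, using the same three boxes):

```python
from itertools import combinations, product

boxes = [
    [(50, 0.5), (0, 0.5)],   # X1
    [(60, 1/3), (0, 2/3)],   # X2
    [(25, 1.0)],             # X3
]
k = 2

def f(S):
    """f(S) = E[max_{i in S} X_i], by enumerating the joint outcomes
    (the boxes are independent, so probabilities multiply)."""
    total = 0.0
    for outcome in product(*(boxes[i] for i in S)):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        total += prob * max(v for v, _ in outcome)
    return total

best_set, best_val = max(
    ((S, f(S)) for S in combinations(range(len(boxes)), k)),
    key=lambda t: t[1])
print(best_set, round(best_val, 3))  # (0, 2) 37.5: pick X1 and X3
```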
Benchmark
• Value of the optimal decision policy (decision tree)
  ▫ Call this value OPT
  ▫ The optimal decision tree can have size exponential in k
• Can we design a:
  ▫ polynomial-time algorithm…
  ▫ …that produces a poly-sized decision tree…
  ▫ …that approximates OPT?
Outline for Part 1
• Approximation algorithms for Maximum Value
  ▫ Non-adaptive policy
  ▫ Linear programming relaxation
  ▫ Duality and “adaptivity gap”
  ▫ Please ignore the constant factors!
• Later on: “weakly coupled” decision systems
  ▫ Applications to matching, pricing, scheduling, …
Non-adaptive Algorithm: Submodularity [Kempe, Kleinberg, Tardos ’03, …]
Non-adaptive Problem
• For any subset S of boxes, if the gambler opens S non-adaptively, the payoff observed is
  f(S) = E[ max_{i∈S} Xi ]
• Goal:
  ▫ Find S such that |S| ≤ k
  ▫ Maximize f(S)
Submodularity of Set Functions
For S1 ⊆ S2 and any element t ∉ S2:
  f(S1 ∪ {t}) − f(S1) ≥ f(S2 ∪ {t}) − f(S2)
Also need non-negativity and monotonicity: f(S2) ≥ f(S1) ≥ 0
The Greedy Algorithm
S ← ∅
While |S| < k:
  t ← argmax_{q∉S} ( f(S ∪ {q}) − f(S) )
  S ← S ∪ {t}
Output S
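A minimal runnable sketch of this greedy on the running Bernoulli example, with f(S) = E[max_{i∈S} Xi] computed exactly by enumerating the joint outcomes:

```python
from itertools import product

boxes = [
    [(50, 0.5), (0, 0.5)],   # X1
    [(60, 1/3), (0, 2/3)],   # X2
    [(25, 1.0)],             # X3
]
k = 2

def f(S):
    """Exact E[max_{i in S} X_i] (boxes are independent)."""
    if not S:
        return 0.0
    total = 0.0
    for outcome in product(*(boxes[i] for i in S)):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        total += prob * max(v for v, _ in outcome)
    return total

# Greedy: repeatedly add the element with the largest marginal gain.
S = []
while len(S) < k:
    t = max((q for q in range(len(boxes)) if q not in S),
            key=lambda q: f(S + [q]) - f(S))
    S.append(t)

print(sorted(S), round(f(S), 3))  # [0, 2] 37.5
```

On this instance greedy happens to find the optimal non-adaptive set; in general it is only guaranteed a 1 − 1/e fraction of the optimum.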
Classical Result [Nemhauser, Wolsey, Fisher ‘78]
• Greedy is a 1 − 1/e ≈ 0.632 approximation to the value of the optimal subset of size k
• Similar results hold even when:
  ▫ different elements have different costs and there is a budget on the total cost of the chosen set S
  ▫ there are general matroid constraints on the chosen set S
Maximum Value is Submodular
• Let D = joint distribution of X1, X2, …, Xn
• Consider any sample r drawn from D
  ▫ Yields a sample of values v1r, v2r, …, vnr
  ▫ Let f(S, r) = max_{i∈S} vir
  ▫ Easy to check this is submodular
• f(S) is the expectation over samples r of f(S, r)
  ▫ Submodularity is preserved under taking expectation!
• Note: we do not need independence of the variables!
More things that are Submodular
• Payoff from many opened boxes [Guha, Munagala ‘07]:
  f(S) = E[ max { Σ_{i∈S} xi Xi : x ∈ [0,1]^n, Σ_{i∈S} si xi ≤ B } ]
• Payoff = minimizing the minimum value [Goel, Guha, Munagala ‘06]:
  f(S) = − log E[ min_{i∈S} Xi ]
• Spread of an epidemic with seed set S [Kempe, Kleinberg, Tardos ‘03]
• Discrete entropy of the joint distribution of S [Krause, Guestrin ‘05]
Adaptive Algorithms: Linear Programming [Dean, Goemans, Vondrak ’04; Guha, Munagala ‘07]
Linear Programming
Consider the optimal decision policy:
▫ Adaptively opens at most k boxes
▫ Obtains payoff from one opened box
Define:
  yj = Pr[Box j is opened]
  zjv = Pr[Policy’s payoff is from box j ∧ Xj = v]
Example from before…
X1 = B(50, 1/2), X2 = B(60, 1/3), X3 = B(25, 1)
For the optimal policy (open X1; then X2 if X1 = 50, else X3):
  y1 = 1, y2 = ½, y3 = ½
  z1,50 = 1/3, z2,60 = 1/6, z3,25 = ½
Basic Idea
• The LP captures the behavior of the policy
  ▫ Use the yj and zjv as the variables
• These variables are insufficient to capture the entire structure of the optimal policy
  ▫ What we end up with will be a relaxation
• Steps:
  ▫ Understand the structure of the relaxation
  ▫ Convert its solution into a feasible policy for the gambler
  ▫ Bound the adaptivity gap
Constraints
Let Z = identity of the box from which the payoff is finally obtained:
  zjv = Pr[Z = j ∧ Xj = v]
For this event to happen, both of the following must have happened:
• Box j was opened by the policy
• Box j has value Xj = v
These two events are independent, since all the X’s are independent!
Constraints
  zjv = Pr[Z = j ∧ Xj = v] ≤ Pr[Box j opened] × Pr[Xj = v] = yj × fj(v)
(uses independence here)
Constraints
Any policy obtains payoff from one box: Σ_{j,v} zjv ≤ 1
  (the left-hand side is the expected number of boxes from which payoff is obtained)
Can only get payoff from an opened box: zjv ≤ yj × fj(v)
Relaxation: we only encode the expected number of boxes from which payoff is obtained
Any policy opens at most k boxes: Σj yj ≤ k
  (the left-hand side is the expected number of boxes opened)
Relaxation: we only encode the expected number of boxes opened, not a bound along every decision path
yj is a probability value: yj ∈ [0, 1]
LP Relaxation of the Optimal Policy
Maximize payoff = Σ_{j,v} v × zjv
subject to:
  Σ_{j,v} zjv ≤ 1   (payoff from one box)
  zjv ≤ yj × fj(v)  (payoff only from an opened box)
  Σj yj ≤ k         (at most k boxes opened)
  yj ∈ [0, 1]       (yj is a probability)
Simple Example: Open all boxes
Xa = 2 with probability ½, 0 with probability ½
Xb = 1 with probability ½, 0 with probability ½
k = 2, so ya = yb = 1
LP Relaxation
Maximize 2 × za2 + 1 × zb1
subject to: za2 + zb1 ≤ 1, za2 ∈ [0, 1/2], zb1 ∈ [0, 1/2]
LP Optimum
za2 = 1/2, zb1 = 1/2
LP optimal payoff = 2 × ½ + 1 × ½ = 1.5
Optimal Decision Policy?
The best decision policy achieves za2 = 1/2, zb1 = 1/4
Optimal payoff = 2 × ½ + 1 × ¼ = 1.25 < 1.5, so the LP strictly upper-bounds the optimal policy
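A quick numeric check of both values (a sketch; the adaptive payoff follows the policy “open box a, keep its payoff if Xa = 2, otherwise fall back on box b’s payoff”):

```python
# Xa = 2 w.p. 1/2 (else 0), Xb = 1 w.p. 1/2 (else 0), k = 2.

# LP point (za2, zb1) = (1/2, 1/2): feasible, objective 1.5.
za2, zb1 = 0.5, 0.5
assert za2 + zb1 <= 1 and 0 <= za2 <= 0.5 and 0 <= zb1 <= 0.5
lp_value = 2 * za2 + 1 * zb1

# Best decision policy: take a's payoff whenever Xa = 2,
# otherwise take b's payoff if Xb = 1.
adaptive = 2 * 0.5 + 1 * 0.5 * 0.5   # 2*Pr[Xa=2] + 1*Pr[Xa=0]*Pr[Xb=1]

print(lp_value, adaptive)  # 1.5 1.25
```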
What do we do with the LP solution?
• Convert it into a feasible policy
• Bound its payoff in terms of the LP optimum
  ▫ The LP optimum upper-bounds the optimal payoff
LP Variables yield a Single-box Policy Pj
Box j has Xj = v with probability fj(v):
• Open j with probability yj
• If Xj = v, take this payoff with probability zjv / (yj fj(v))
Simpler Notation for Policy Pj
O(Pj) = Pr[j opened] = yj
C(Pj) = Pr[payoff of j chosen] = Σv zjv
R(Pj) = E[reward from j] = Σv v × zjv
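On the three-box example (y and z as on the “Example from before” slide), these quantities can be tabulated directly; note that Σj C(Pj) = 1, Σj O(Pj) = k = 2, and Σj R(Pj) = 39.167, the adaptive optimum:

```python
# LP solution induced by the optimal policy on the running example (k = 2).
y = [1.0, 0.5, 0.5]
z = [{50: 1/3}, {60: 1/6}, {25: 0.5}]   # z[j] maps value v -> z_jv

O = list(y)                                           # O(Pj) = yj
C = [sum(zj.values()) for zj in z]                    # C(Pj) = sum_v z_jv
R = [sum(v * p for v, p in zj.items()) for zj in z]   # R(Pj) = sum_v v*z_jv

print(round(sum(C), 3), round(sum(O), 3), round(sum(R), 3))  # 1.0 2.0 39.167
```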
LP Relaxation
The LP yields a collection of single-box policies!
In the (y, z) variables:
  Maximize Σ_{j,v} v × zjv
  subject to Σ_{j,v} zjv ≤ 1, Σj yj ≤ k, zjv ≤ yj × fj(v) ∀ j,v, yj ∈ [0, 1] ∀ j
Equivalently, in policy notation:
  Maximize Σj R(Pj)
  subject to Σj C(Pj) ≤ 1, Σj O(Pj) ≤ k, each Pj feasible
What does the LP give us?
• The LP yields single-box policies such that:
  ▫ Σi R(Pi) ≥ OPT
  ▫ Σi C(Pi) ≤ 1
  ▫ Σi O(Pi) ≤ k
• To convert to a feasible policy:
  ▫ Step 1: order the boxes arbitrarily as 1, 2, 3, …
  ▫ Consider the boxes in this order
Final Algorithm
• When box j is encountered:
  ▫ With probability ¾, skip this box
  ▫ With probability ¼, execute policy Pj
• Policy Pj probabilistically decides to open j and, if opened, whether to take its payoff
• If the policy decides to take the payoff from j: take this payoff and STOP
• Else move to box j + 1
• If k boxes have already been opened, STOP
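The full rounding pipeline can be simulated. A sketch on the running example (LP values as on the “Example from before” slide; skip probability ¾ as above), estimating the policy’s expected payoff and checking that it clears the (Σj R(Pj))/8 bound proved on the next slides:

```python
import random

random.seed(0)

boxes = [[(50, 0.5), (0, 0.5)], [(60, 1/3), (0, 2/3)], [(25, 1.0)]]
k = 2
y = [1.0, 0.5, 0.5]                      # y_j from the example LP solution
z = [{50: 1/3}, {60: 1/6}, {25: 0.5}]    # z[j]: value v -> z_jv

def run_once():
    opened = 0
    for j in range(len(boxes)):
        if opened >= k:                  # k boxes already opened: STOP
            return 0
        if random.random() < 0.75:       # skip box j with probability 3/4
            continue
        if random.random() >= y[j]:      # P_j opens j with probability y_j
            continue
        opened += 1
        r, v = random.random(), boxes[j][-1][0]
        for val, p in boxes[j]:          # draw X_j
            if r < p:
                v = val
                break
            r -= p
        f_jv = dict(boxes[j])[v]         # f_j(v) = Pr[X_j = v]
        if random.random() < z[j].get(v, 0.0) / (y[j] * f_jv):
            return v                     # take this payoff and STOP
    return 0

trials = 100_000
avg = sum(run_once() for _ in range(trials)) / trials
lp_value = sum(v * p for zj in z for v, p in zj.items())   # = 39.167 here
print(round(lp_value, 3), round(avg, 2))
```

On this instance the simulated payoff sits well above lp_value/8, illustrating that the 1/8 factor in the analysis is loose.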
Box-by-box Accounting
• Let Oj = 1 if policy Pj opens j
• Let Cj = 1 if policy Pj chooses the payoff from j
• The policy reaches box i iff:
  Σ_{j<i} Cj < 1  and  Σ_{j<i} Oj < k
Let’s lower bound this probability.
Markov’s Inequality
Pr[ Σ_{j<i} Cj < 1 ] ≥ 1 − Σ_{j<i} E[Cj]
Pr[ Σ_{j<i} Oj < k ] ≥ 1 − ( Σ_{j<i} E[Oj] ) / k
Union Bounds
PrhP
j<i Cj < 1 andP
j<i Oj < ki
� 1�⇣P
j<i E[Cj ] +P
j<i E[Oj ]
k
⌘
Policy reaches box i
Use Independence of Boxes
E[Cj] = E[ Cj | box j not skipped ] × Pr[ box j not skipped ] ≤ C(Pj) × ¼
E[Oj] = E[ Oj | box j not skipped ] × Pr[ box j not skipped ] ≤ O(Pj) × ¼
Putting It Together
The policy reaches box i with probability
Pr[ Σ_{j<i} Cj < 1 and Σ_{j<i} Oj < k ]
≥ 1 − ( Σ_{j<i} E[Cj] + ( Σ_{j<i} E[Oj] ) / k )
≥ 1 − ¼ × ( Σ_{j<i} C(Pj) + ( Σ_{j<i} O(Pj) ) / k )
≥ 1 − ¼ × (1 + 1) = ½
Expected contribution to the reward from Pi
≥ Pr[ box i is reached ] × E[ reward from i ]
≥ ½ × Pr[ box i is not skipped ] × R(Pi)
≥ R(Pi) / 8
Summing over all boxes gives an 8-approximation.
Adaptivity Gap via Duality [Guha, Munagala '09]
Recall LP Relaxation
Maximize Σj R(Pj)           (maximize payoff)
Σj C(Pj) ≤ 1                (the policy obtains the payoff from at most one box)
Σj O(Pj) ≤ k                (any policy opens at most k boxes)
Each Pj feasible            (each single-box policy is feasible)
Relaxed LP
Maximize Σj R(Pj)
Σj ( C(Pj) + O(Pj)/k ) ≤ 2
Each Pj feasible
Scale down the variables by a factor of 2:
Maximize Σj R(Pj)
Σj ( C(Pj) + O(Pj)/k ) ≤ 1
Each Pj feasible
Lagrangian
Dual variable = w
Maximize Σj R(Pj)
  subject to  Σj ( C(Pj) + O(Pj)/k ) ≤ 1,  each Pj feasible
becomes
Max  w + Σj [ R(Pj) − w × C(Pj) − (w/k) × O(Pj) ]
  subject to  each Pj feasible
Interpretation of Lagrangian
• Decouples into a separate optimization per box!
• Can open and choose payoffs from many boxes

Max  w + Σj [ R(Pj) − w × C(Pj) − (w/k) × O(Pj) ]
  subject to  each Pj feasible
Optimization Problem for Box j
• Net value from choosing j:
  ▫ If j is opened, pay cost w/k
  ▫ If we choose the payoff of j, pay cost w
  ▫ If we choose the payoff of j, obtain that reward
• Net value = reward minus costs paid

Max  R(Pj) − w × C(Pj) − (w/k) × O(Pj)
  subject to  Pj feasible
Optimal Solution to Lagrangian
• For box j, choose the solution with the better value
• Solution 1: Don't open the box
  ▫ Net value = 0
• Solution 2: Open the box
  ▫ Pay cost w/k
  ▫ If Reward > w, choose this reward and pay cost w
  ▫ Net value = E[ Reward − Cost ]
• The decision to open any box is deterministic!
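For a discrete reward distribution, the two candidate solutions can be compared directly. A minimal sketch, with my own function name and input format, assuming the box's reward distribution is given as (value, probability) pairs:

```python
def box_lagrangian_value(dist, w, k):
    """Net Lagrangian value of one box.

    Solution 2 opens the box (cost w/k) and takes any realized reward v > w,
    paying a further cost w for taking it; Solution 1 skips the box (value 0).
    dist: list of (value, probability) pairs summing to 1."""
    open_value = -w / k + sum(p * (v - w) for v, p in dist if v > w)
    return max(0.0, open_value)   # deterministic choice: open iff value positive
```

For example, a box worth 10 with probability ½ (else 0), with w = 2 and k = 4, gives net value −0.5 + 0.5·8 = 3.5, so the box is opened.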
Strong Duality (roughly speaking)
Choose the Lagrange multiplier w such that
Lag(w) = Σj Rj + w × ( 1 − Σj ( Cj + Oj/k ) )
with  Σj ( Cj + Oj/k ) = 1
⇒ Σj Rj ≥ OPT / 2
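One way to search for such a multiplier is bisection on w, since raising w only shrinks the set of opened boxes. For discrete rewards the left-hand side is piecewise constant and may jump past 1, so exact equality can be unattainable (hence "roughly speaking"). A sketch under my own input conventions, with each box's distribution given as (value, probability) pairs:

```python
def coupling_usage(dists, w, k):
    """Sum_j (Cj + Oj/k) under the Lagrangian-optimal per-box policies at w."""
    total = 0.0
    for dist in dists:
        take_prob = sum(p for v, p in dist if v > w)
        open_value = -w / k + sum(p * (v - w) for v, p in dist if v > w)
        if open_value > 0:              # box is opened deterministically
            total += take_prob + 1.0 / k
    return total

def solve_multiplier(dists, k, iters=60):
    """Bisect for a w where the coupling usage crosses 1."""
    lo, hi = 0.0, max(v for dist in dists for v, _ in dist)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if coupling_usage(dists, mid, k) > 1:
            lo = mid                    # usage too high: raise the price w
        else:
            hi = mid
    return (lo + hi) / 2
```

With a single box always worth 10 and k = 1, the usage drops from 2 to 0 as w crosses 5, so the bisection converges to w ≈ 5.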
Non-adaptive Policy
• Since each Oj is either 0 or 1:
  ▫ The LP optimum opens at most k boxes deterministically!
  ▫ Suppose we open all of these boxes
• The expected maximum payoff of these boxes is at least the value of rounding the LP
  ▫ But rounding has value at least OPT/16
• Therefore, the adaptivity gap is at most 16!
  ▫ A better choice of w improves this to a factor of 3 [Guha, Munagala, Shi '09]
Takeaways…
• The LP-based proof is oblivious to the non-linear closed form of the maximum
• It automatically yields policies with the right "form"
  ▫ The adaptivity gap follows from duality
• It needs independence of the random variables
  ▫ Weakly coupled linear program and rounding
  ▫ More on weak and strong relaxations in the next half!
Part 2. Weakly Coupled Relaxations
Weakly Coupled Decision Systems
• Independent decision spaces
• Few constraints coupling decisions across the spaces
[Singh & Cohn ’97; Meuleau et al. ‘98]
General Recipe
• Write an LP with constraints on expected values
  ▫ Important: a constant number of such constraints
  ▫ Stronger relaxations are sometimes needed
• Solve the LP and use Markov's inequality to round
• The dual typically yields a more structured solution
  ▫ For instance, threshold policies and adaptivity gaps
Maximum Value Setting
• Each box defines its own decision space
  ▫ Payoffs of the boxes are independent
• Coupling constraints (written in expectation):
  ▫ At most k boxes opened
  ▫ At most one box's payoff finally chosen
• The LP yields a threshold policy:
  ▫ Choose the payoff if its value > the dual multiplier w
Stochastic Knapsack [Dean, Goemans, Vondrak '04; Bhalgat, Goel, Khanna '11]
• Size of item i is drawn from distribution Xi
  ▫ The actual size is learned only after placing i in the knapsack
  ▫ Sizes of items are independent
  ▫ Any size is at most the knapsack capacity B
• Adaptive policy for placing items in the knapsack
  ▫ If the knapsack capacity is violated, STOP
• Maximize expected reward
Weakly Coupled Relaxation
yj = Pr[ j placed in knapsack ];  Rj yj = expected reward from j

Maximize Σj Rj yj
Σj yj × E[Xj] ≤ 2B
yj ∈ [0, 1]
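Because there is a single coupling constraint, this relaxation is just a fractional knapsack over expected sizes, so it can be solved greedily by reward density. A minimal sketch (function and variable names are my own):

```python
def solve_knapsack_relaxation(rewards, mean_sizes, B):
    """Greedy optimum of: max sum R_j y_j  s.t.  sum y_j E[X_j] <= 2B, y_j in [0,1].
    Sorting by reward per unit of expected size is optimal for a fractional knapsack."""
    y = [0.0] * len(rewards)
    cap = 2.0 * B
    order = sorted(range(len(rewards)),
                   key=lambda j: rewards[j] / mean_sizes[j], reverse=True)
    for j in order:
        if cap <= 0:
            break
        y[j] = min(1.0, cap / mean_sizes[j])   # take as much of item j as fits
        cap -= y[j] * mean_sizes[j]
    return y
```

For instance, rewards (10, 6, 1) with expected sizes (2, 2, 2) and B = 1.5 fill the budget 2B = 3 with the first item fully and half of the second.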
Stochastic Matching
[Figure: bipartite graph between Men and Women; edge (i, j) has success probability pij and reward rij]
• Can send some man i and some woman j on a date
• The date succeeds with probability pij and yields reward rij
• A successful match removes i and j from the graph
• A failed match deletes the edge (i, j)
Stochastic Matching [Chen et al. '09; Bansal et al. '10]
• Input: matrices of pij and rij
• Decision policy:
  ▫ Adaptive order of setting up dates
• Goal:
  ▫ Maximize the expected reward of successful matches
LP Relaxation
xij = Pr[ i goes on a date with j ]

Maximize Σ_{i,j} rij pij xij
Σj pij xij ≤ 1   ∀i
Σi pij xij ≤ 1   ∀j
xij ∈ [0, 1]   ∀i, j

(The expected number of successful matches per man and per woman is at most 1.)
Bayesian Pricing
[Figure: n items; item j has value vj ~ Xj]
Unit Demand Setting [Chawla, Hartline, Kleinberg '07; Chawla et al. '10; Bhattacharya et al. '10]
• One agent and n items
  ▫ The agent wants only one item
• Value vj follows an independent distribution Xj
  ▫ The exact value is known only to the agent
  ▫ The seller knows only the distributions
Item Pricing Scheme
[Figure: each item j, with value distribution Xj, is assigned a posted price pj]
The buyer chooses the item that maximizes vj − pj
Revenue Maximization
• Bayesian pricing:
  ▫ Post a price pj for each item j, based on knowing Xj
  ▫ The agent chooses the item that maximizes vj − pj
  ▫ The seller earns the price pj of the chosen item
• Seller's goal:
  ▫ Maximize revenue = expected price earned
LP Variables
xjp = Pr[ price of j = p ]
yjp(v) = Pr[ price of j = p  ∧  Xj = v  ∧  j is bought ]

LP constraints:
• Every item has exactly one price
• The agent buys at most one item
• The agent buys an item only if its value is larger than its price
LP Relaxation
fj(v) = Pr[ Xj = v ]

Maximize Σ_{j,p,v} p · yjp(v)
Σ_{j,p,v} yjp(v) ≤ 1              (E[items bought] is at most 1)
Σp xjp ≤ 1   ∀j                   (one price for each j)
yjp(v) ≤ xjp fj(v)   ∀j, p, v ≥ p
The Lagrangian decouples across items!

Maximize Σ_{j,p,v} (p − λ) · yjp(v)
Σp xjp ≤ 1   ∀j                   (integral variable)
yjp(v) ≤ xjp fj(v)   ∀j, p, v

For each j, the Lagrangian chooses one price pj
The Lagrangian optimum is simple:
p*_j(λ) = argmax_{p ≥ λ} ( (p − λ) · Pr[ Xj ≥ p ] )
The LP optimum chooses λ so that the expected number of items bought is exactly 1
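For a discrete value distribution, the argmax only needs to scan the support points at or above λ. A sketch (the helper name and the (value, probability) input format are my own):

```python
def best_price(dist, lam):
    """p*_j(lam) = argmax over p >= lam of (p - lam) * Pr[Xj >= p].
    dist: list of (value, probability) pairs for item j."""
    def objective(p):
        return (p - lam) * sum(q for v, q in dist if v >= p)
    candidates = [v for v, _ in dist if v >= lam] or [lam]  # fallback: no sale
    return max(candidates, key=objective)
```

E.g. with values 1 and 10, each with probability ½ and λ = 0, price 10 earns 5 in expectation versus 1 for price 1, so the high price wins; tilting the mass toward the low value flips the choice.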
Lagrangian Optimum for Item j
[Figure: plot of Pr[Xj ≥ p] against the price p, with the optimal price pj and the multiplier λ marked]
Some Complexity Results
• Bayesian pricing
  ▫ (Q)PTAS for "reasonable" distributions [Cai, Daskalakis '11]
  ▫ NP-complete in general [Chen et al. '13]
  ▫ Correlated distributions: hard to approximate beyond logarithmic factors [Briest '11]
• Stochastic knapsack
  ▫ PTAS [Bhalgat, Goel, Khanna '11]
Part 3. Sampling-based Approaches
Overview
• MDPs with a small number of "stages"
• Exponential-sized LP over all possible "scenarios" of the underlying distribution
  ▫ Solve the LP or its Lagrangian by sampling the scenarios
• Examples:
  ▫ 2-stage vertex cover
  ▫ Stochastic Steiner trees (combinatorial algorithm)
  ▫ Bayesian auctions
  ▫ Solving LPs online
Multi-stage Vertex Cover
• Vertex v costs cv
• Distribution D over the possible edge sets that can be realized
Stage 1: Buy some vertices cheaply
• Buy some vertices knowing only D
• Pay cost cv for each vertex v bought
Stage 2: Edge set realized
• Need to buy vertices at a scaled-up price to cover the realized edges
[Figure: v bought in Stage 1 and u bought in Stage 2; total cost = cv + λ cu]
Multi-stage Covering Problems [Kleywegt, Shapiro, Homem-de-Mello '01; Shmoys, Swamy '04; Charikar, Chekuri, Pal '05]
• Decision policy:
  ▫ Which vertices should we buy in Stage 1?
  ▫ Knowing only D, the costs, and the scaling factor λ > 1
• Minimize the total expected cost of the bought vertices
  ▫ Expectation over the realization of edges from D
LP when |D| is small
Minimize Σv cv xv + λ · E_{σ∼D} [ Σv cv yv(σ) ]
xu + xv + yu(σ) + yv(σ) ≥ 1   ∀σ, (u, v) ∈ E(σ)

• Rounding is similar to vertex cover; randomized rounding yields a tight 2-approximation
• Generalizes to multi-stage vertex cover
Black-Box Access to D
• Sample Average Approximation
  ▫ Draw polynomially many samples; solve the LP on these samples
  ▫ Approximation results carry over with a small loss
• Combinatorial "boosted sampling" [Gupta et al. '04]
  ▫ Draw a set of samples from D in Stage 1
  ▫ Solve the covering problem on the union of these samples
  ▫ Augment this solution with the realization in Stage 2
Stochastic Steiner Tree
[Figure: metric with root r; vertices i and j at distance d(i, j); distribution D over the vertices V]
Stochastic Steiner Tree [Garg et al. '08]
• K vertices arrive one at a time
  ▫ Drawn i.i.d. from distribution D
• Goal:
  ▫ Construct an online Steiner tree connecting the arriving vertices to r
• Technique: sampling from D
Algorithm: Offline Stage 1
• Draw K samples from D
• Construct a 2-approximate Steiner tree T on the samples
• Expected cost is at most 2 OPT
  ▫ The samples are statistically identical to the online input
[Figure: tree T connecting the K samples from D to the root]
Algorithm: Online Stage 2
• When an input vertex v arrives online:
  ▫ Connect v by a shortest path to T
[Figure: v connected by a shortest path to the tree T]
Sampling Analysis
• The K points from Stage 1 together with v form a random sample of size K+1 from D
  ▫ Therefore, the expected cost of connecting v is at most 2 OPT / K
• Overall cost is at most 4 OPT!
Bayesian Multi-item Auctions
[Figure: n items]
Bayesian Setting [Cai, Daskalakis Weinberg, ‘12-’15, Bhalgat, Gollapudi, Munagala ‘13]
• Many bidders and items � Constraints on possible allocations
• Bidder j’s valuation vector follows distribution σj � Exact value known only to bidder � Distributions for different bidders independent � Auctioneer only knows distribution
• Assume: Single bidder’s distribution σj is poly-size
Auction Design
• Design an auction maximizing expected revenue (total price charged)
  ▫ Auction = (allocations, prices) given the revealed bids
• Bayesian incentive compatibility:
  ▫ Revealing the true value maximizes a bidder's expected utility
  ▫ The expectation is over the distributions of the other agents
• Individual rationality:
  ▫ Charge prices so that the utility of any agent is non-negative
  ▫ This constraint can be per scenario rather than in expectation
Why Is This Easier than Pricing?
• We allow "lotteries"
  ▫ A randomized menu of allocations and prices
  ▫ Incentive compatibility holds in expectation
  ▫ Lotteries can be encoded by an LP
• Deterministic menus are hard to approximate! [Briest '11]
Two Types of LP Variables
• Expected-value (marginal) variables — expectation over the valuations of the other agents:
  Xj(v⃗j) = E[ allocation to j | σj = v⃗j ]
  Pj(v⃗j) = E[ price for j | σj = v⃗j ]
• Per-scenario variables — exponentially many scenarios!
  x⃗(η) = allocations | valuations = η
  p⃗(η) = prices | valuations = η
LP Constraints
• Expected-value constraints, for every agent j and valuation vector v⃗j:
  ▫ Bayesian incentive compatibility
  ▫ Maximize expected revenue
• Per-scenario constraints (exponentially many):
  ▫ Allocations and prices are feasible for every scenario η
  ▫ Individual rationality
• Coupling constraints (exponentially large summations!):
  Xj(v⃗j) = Σ_{η : ηj = v⃗j} Pr[η] · xj(η)
  Pj(v⃗j) = Σ_{η : ηj = v⃗j} Pr[η] · pj(η)
Key Idea: Sample Scenarios
• Take the Lagrangian of the coupling constraints
  ▫ One Lagrange multiplier for each agent and each of its values
  ▫ Polynomially many multipliers, or "virtual welfares"
• The Lagrangian decouples into two separate problems:
  ▫ An LP over the expected-value variables
  ▫ A separate maximization problem for each scenario η; take the expectation over scenarios
  ▫ Estimate this expectation by sampling the scenarios!
• Given an efficient oracle for solving the Lagrangian:
  ▫ Solve the LP using no-regret learning, Ellipsoid, …
"Online" Algorithms [Agarwal, Devanur '14]
• Suppose scenarios arrive i.i.d. from an unknown distribution
• Need to solve some LP over expected allocations
  ▫ But with feasibility constraints per scenario
  ▫ Motivation: budgeted allocations, envy-freeness, …
• The arriving scenarios can be treated as samples!
  ▫ So the overall LP can be solved online via the Lagrangian
  ▫ We need not even know the distribution upfront!
Part 4. Scheduling Problems
Overview
• New aspect: timing the actions
  ▫ So far, we have ignored timing completely!
• Two techniques:
  ▫ Stronger LP relaxations than weak coupling: stochastic scheduling on identical machines; stochastic knapsack (not covered)
  ▫ Greedy policies: the Gittins index theorem
Stochastic Scheduling
[Figure: jobs with processing times pj ~ Xj, scheduled on m parallel machines]
![Page 129: Approximation Algorithms for Stochastic Optimization · 2020-01-03 · • Approximation algorithms for Maximum Value Non-adaptive policy Linear programming relaxation Duality and](https://reader033.fdokument.com/reader033/viewer/2022042911/5f4187e81c25e8370d251eb3/html5/thumbnails/129.jpg)
Stochastic Scheduling [Mohring, Schulz, Uetz ‘96]
• Exact length is realized only after a job is scheduled
▫ No preemption or release dates
• Adaptive policy:
▫ Adaptively choose the order of jobs and the machines to assign them to
• Goal:
▫ Minimize expected sum of completion times
Adaptive Policy
(Figure: an adaptive policy assigning jobs to m parallel machines)
LP-based Reduction to Determinism
• Write LP assuming job lengths are deterministic
• Variables are start times Sj of jobs

\[
\text{Minimize} \quad \sum_j (p_j + S_j)
\]
\[
\text{subject to} \quad \sum_{j \in A} p_j S_j \;\ge\; \frac{1}{2m} \sum_{i \neq j \in A} p_i p_j \;-\; \frac{m-1}{2m} \sum_{j \in A} p_j^2
\qquad \forall \text{ subsets } A \text{ of jobs}
\]
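As a sanity check, the constraint can be verified numerically: it holds for the schedule produced by any ordering of the jobs on m machines (checked here for the set A of all jobs; the helper names are illustrative).

```python
import heapq
import random

def start_times(p, m, order):
    """List-schedule jobs in the given order on m machines (each job goes to
    the machine that frees up first); return each job's start time."""
    free = [0.0] * m          # heap of machine free times
    heapq.heapify(free)
    S = [0.0] * len(p)
    for j in order:
        S[j] = heapq.heappop(free)
        heapq.heappush(free, S[j] + p[j])
    return S

def valid_inequality_holds(p, m, S, tol=1e-9):
    """Check the LP constraint for A = all jobs."""
    n = len(p)
    lhs = sum(p[j] * S[j] for j in range(n))
    cross = sum(p[i] * p[j] for i in range(n) for j in range(n) if i != j)
    rhs = cross / (2 * m) - (m - 1) / (2 * m) * sum(x * x for x in p)
    return lhs >= rhs - tol

rng = random.Random(1)
for _ in range(100):
    m = rng.randint(1, 4)
    p = [rng.uniform(0.1, 5.0) for _ in range(rng.randint(1, 8))]
    order = list(range(len(p)))
    rng.shuffle(order)
    assert valid_inequality_holds(p, m, start_times(p, m, order))
```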
LP for Stochastic Case
• Take expectations over job lengths
▫ Note: a job’s length is independent of its start time
• Rounding: Schedule jobs in increasing order of LP objective
\[
\text{Minimize} \quad \sum_j \left( \mathbb{E}[S_j] + \mu_j \right)
\]
\[
\text{subject to} \quad \sum_{j \in A} \mu_j \, \mathbb{E}[S_j] \;\ge\; \frac{1}{2m} \Big( \sum_{j \in A} \mu_j \Big)^2 \;-\; \frac{1}{2} \sum_{j \in A} \mu_j^2 \;-\; \frac{m-1}{2m} \sum_{j \in A} \sigma_j^2
\qquad \forall \text{ subsets } A \text{ of jobs}
\]
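The rounding step can be sketched as a simulation. This is a simplified illustration: for the unweighted sum of completion times, scheduling in increasing order of the LP quantities amounts to ordering by expected length, and the truncated-normal job lengths are an assumption made for the example.

```python
import heapq
import random

def sept_completion_sum(mu, sigma, m, rng):
    """Schedule jobs in increasing order of expected length on m machines,
    always assigning the next job to the machine that frees up first.
    Job lengths are drawn as normals truncated at 0 (illustrative choice).
    Returns the realized sum of completion times."""
    order = sorted(range(len(mu)), key=lambda j: mu[j])
    free = [0.0] * m          # heap of machine free times
    heapq.heapify(free)
    total = 0.0
    for j in order:
        start = heapq.heappop(free)
        finish = start + max(0.0, rng.gauss(mu[j], sigma[j]))
        total += finish
        heapq.heappush(free, finish)
    return total

rng = random.Random(0)
# average over realizations to estimate the expected objective
est = sum(sept_completion_sum([1, 2, 3, 4], [0.5] * 4, 2, rng)
          for _ in range(1000)) / 1000
```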
Multi-armed Bandits [Gittins and Jones ’74, Tsitsiklis ‘80]
• n independent bandit arms
▫ Each arm defines its own Markov decision space
▫ Only two actions per arm: “PLAY” or “STOP”
(Figure: state space of an arm, with states a, b, c, d, transition probability pab, and reward ra)
• n independent bandit arms
▫ Each arm defines its own Markov decision space
▫ Only two actions per arm: “PLAY” or “STOP”
• At each step, can play at most one arm
(Figure: playing arm 1 among arms 1, 2, 3; an arm’s state changes only when it is played)
Multi-armed Bandits [Gittins and Jones ’74, Tsitsiklis ‘80]
• Rt = Reward at time t
• γ = Discount factor < 1
• Find policy that maximizes discounted reward:
\[
\mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^t R_t \right]
\]
Multi-armed Bandits [Gittins and Jones ’74, Tsitsiklis ‘80]
What is a policy?
• Given current state of each arm
▫ Which arm to play next?
• “State space” is exponential in number of arms
• Surprising and non-trivial result:
▫ A greedy policy is optimal!
▫ Polynomial-time computable and executable!
Why is this non-trivial?
• Playing the arm whose current state has the highest reward may be sub-optimal
▫ An arm can have low reward right now, but playing it yields a state with high reward
▫ And this payoff can come two or more states down the road, …
• This means the policy needs to take the entire future behavior of each arm into account!
Single Arm Problem via Duality
• Fix penalty (or dual cost) λ
• Focus on some state s of some arm i
▫ Suppose this is the start state
• Suppose arm i were the only arm in the system
▫ At each step, can play arm i by paying penalty λ
▫ Or can STOP and exit
• Vi(s, λ) = optimal discounted payoff
▫ Easy to compute by dynamic programming
The Gittins Index
• For state s of arm i, Gittins index: Smallest penalty λ such that Vi(s, λ) = 0 (the largest λ at which playing is still worthwhile)
• Same as:
▫ Expected discounted per-step reward if we keep playing i as long as the state is “at least as good as” s
• “At least as good as” = Larger Gittins index!
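The dynamic program and the index can be sketched together for a small finite-state arm. A sketch under stated assumptions: nonnegative rewards, γ < 1, and illustrative value-iteration and bisection parameters; the function names are not from the literature.

```python
def arm_value(rewards, P, gamma, lam, iters=500):
    """Value iteration for a single arm with penalty lam: in state s, either
    STOP (payoff 0) or PLAY, earning rewards[s] - lam and moving via row P[s]."""
    n = len(rewards)
    V = [0.0] * n
    for _ in range(iters):
        V = [max(0.0, rewards[s] - lam + gamma * sum(P[s][q] * V[q] for q in range(n)))
             for s in range(n)]
    return V

def gittins_index(s, rewards, P, gamma, tol=1e-4):
    """Bisection for the break-even penalty: the smallest lam with V(s, lam) = 0.
    For nonnegative rewards the index lies in [0, max reward]."""
    lo, hi = 0.0, max(rewards)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if arm_value(rewards, P, gamma, mid)[s] > 0:
            lo = mid          # still worth playing: index is larger
        else:
            hi = mid
    return (lo + hi) / 2
```

The second test case below illustrates the earlier point about non-triviality: a state with zero immediate reward that leads to a reward-10 state still earns a large index.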
Intuition
• A state has a large Gittins index if either:
▫ The state itself has high reward
   – So play in this state and then STOP
▫ The state leads to states with large reward
   – So the long-term per-step reward is large
• In either case, this state is a “good” state to play
Gittins index policy
• At each step, play the arm whose current state has the largest Gittins index
▫ Optimal!
• Proof of optimality:
▫ Exchange argument similar to greedy analyses
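Given precomputed index tables, executing the policy is straightforward. A sketch with illustrative data layout (per-arm lists of indices, rewards, and transition rows):

```python
import random

def run_index_policy(indices, rewards, P, start, gamma, steps, rng):
    """Play, at each step, the arm whose current state has the largest index;
    collect the discounted reward and advance only the played arm's state."""
    state = list(start)
    total, disc = 0.0, 1.0
    for _ in range(steps):
        a = max(range(len(state)), key=lambda k: indices[k][state[k]])
        s = state[a]
        total += disc * rewards[a][s]
        # sample the played arm's next state from its transition row
        r, acc = rng.random(), 0.0
        for q, pr in enumerate(P[a][s]):
            acc += pr
            if r <= acc:
                state[a] = q
                break
        disc *= gamma
    return total
```

Note how the exponential joint state space never appears: the policy only compares one number per arm at each step.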
Other Problems and Approaches
• Stochastic makespan, bin packing
[Kleinberg, Rabani, Tardos ’97]
• Inventory management [Levi, Pal, Roundy, Shmoys ‘04]
• Stochastic set cover and probing problems [Etzioni et al., ‘96; Munagala, Srivastava, Widom ‘06; Liu et al., ’08; Gupta-Nagarajan ’15 …]
• Techniques:
▫ Analysis of greedy policies
▫ Discretizing distributions and dynamic programming
Open Questions
• How far can we push LP-based techniques?
▫ Can we encode adaptive policies more generally?
▫ For instance, bandits with matroid constraints?
• Several problem classes are poorly understood
▫ Stochastic machine scheduling
▫ Auctions with budget constraints
• What if we don’t have full independence?
▫ Some success in auction design
▫ In general, need tractable models of correlation
Thanks!