Lasso Slides Tibshirani
The lasso:
some novel algorithms and
applications
Robert Tibshirani
Stanford University
[Photo from MyHeritage.com]
[Figure: estimation picture for the lasso and ridge regression — elliptical contours of the residual sum of squares around the least-squares estimate β̂ in the (β_1, β_2) plane, with the diamond-shaped lasso constraint region and the circular ridge constraint region.]
[Figure: lasso coefficient profiles for the prostate cancer example — coefficients for lcavol, svi, lweight, pgg45, lbph, gleason, age and lcp plotted against the shrinkage factor s.]
Emerging themes
Outline
Soft-thresholding
[Figure: soft-thresholding of coefficients β_1, β_2, β_3, β_4 at threshold λ.]
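Soft-thresholding can be sketched in a couple of lines of NumPy (the function name is mine):

```python
import numpy as np

def soft_threshold(x, lam):
    """S(x, λ) = sign(x) · max(|x| − λ, 0): shrink toward zero,
    and set values smaller than λ in magnitude exactly to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

beta = np.array([2.5, -0.3, 0.8, -1.9])
print(soft_threshold(beta, 1.0))   # small coefficients become exactly zero
```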
• Turns out that this is coordinate descent for the lasso criterion
    Σ_i ( y_i − Σ_j x_ij β_j )² + λ Σ_j |β_j|
• Start with large value for λ (very sparse model) and slowly
decrease it
• Most coordinates that are zero never become non-zero
• coordinate descent code for Lasso is just 73 lines of
Fortran!
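The coordinate-descent update — cyclically soft-threshold each coefficient against its partial residual — can be sketched in NumPy (names and the plain cyclic sweep are mine; production code adds refinements such as warm starts over a decreasing λ sequence, as described above):

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for  sum_i (y_i - sum_j x_ij b_j)^2 + lam * sum_j |b_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)              # x_j' x_j for each column
    resid = y.astype(float).copy()             # current residual y - X beta
    for _ in range(n_sweeps):
        for j in range(p):
            r_j = resid + X[:, j] * beta[j]    # partial residual with beta_j removed
            beta[j] = soft_threshold(X[:, j] @ r_j, lam / 2.0) / col_sq[j]
            resid = r_j - X[:, j] * beta[j]
    return beta
```

With λ = 0 this reduces to ordinary least squares; with λ large enough every coefficient is thresholded to zero.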
Extensions
Logistic regression
Multiclass classification
• Isotonic regression:

    minimize Σ_i (y_i − ŷ_i)²   subject to ŷ_1 ≤ ŷ_2 ≤ … ≤ ŷ_n

  Solved by the Pool Adjacent Violators algorithm (PAVA).
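PAVA itself is short: scan left to right, and whenever a new value violates monotonicity, merge it with the preceding block and replace both by their weighted mean. A sketch (variable names mine):

```python
import numpy as np

def pava(y):
    """Pool Adjacent Violators: least-squares fit with yhat_1 <= ... <= yhat_n."""
    means, weights = [], []                    # blocks of pooled values
    for v in y:
        m, w = float(v), 1.0
        # merge backwards while the new block violates monotonicity
        while means and means[-1] > m:
            m = (m * w + means[-1] * weights[-1]) / (w + weights[-1])
            w += weights[-1]
            means.pop(); weights.pop()
        means.append(m); weights.append(w)
    # expand each block back to its original length
    return np.concatenate([np.full(int(w), m) for m, w in zip(means, weights)])
```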
• Near-isotonic regression:
    β̂_λ = argmin_{β ∈ R^n}  (1/2) Σ_{i=1}^{n} (y_i − β_i)² + λ Σ_{i=1}^{n−1} (β_i − β_{i+1})_+
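A small evaluator of this criterion makes the asymmetry explicit — only downward steps (β_i > β_{i+1}) are penalized (a sketch of the objective only, not the path algorithm from the talk):

```python
import numpy as np

def near_iso_objective(beta, y, lam):
    """Near-isotonic criterion: 0.5 * sum (y_i - beta_i)^2 + lam * sum (beta_i - beta_{i+1})_+."""
    beta, y = np.asarray(beta, float), np.asarray(y, float)
    fit = 0.5 * np.sum((y - beta) ** 2)
    down_steps = np.maximum(beta[:-1] - beta[1:], 0.0)   # only decreases are penalized
    return fit + lam * down_steps.sum()
```

At λ = 0 the data interpolate themselves for free; as λ grows, downward steps are squeezed out and the fit approaches the isotonic solution.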
Numerical approach
Toy example
[Figure: near-isotonic fits at λ = 0, λ = 0.25, λ = 0.7 and λ = 0.77.]
[Figure: near-isotonic fits to annual temperature anomalies, 1850–2000. Four panels: (lam = 0, ss = 0, viol = 5.6), (lam = 0.3, ss = 0.68, viol = 0.5), (lam = 0.6, ss = 0.88, viol = 0.3), (lam = 1.8, ss = 1.39, viol = 0); at λ = 1.8 the violation is zero and the fit is monotone.]
minimize rank(Z)
subject to Zij = Xij , ∀(i, j) ∈ Ω (1)
Not convex!
[Table: movie ratings with missing entries — the matrix completion example.]

              Lord of the  Pretty  Harry   Pulp     Kill  Blue
              Rings        Woman   Potter  Fiction  Bill  Velvet
    Daniela     5            5       4       1        1     1
    Genevera    4            5       4       2        1     ?
    Larry       1            5       ?       2        5     4
    Jim         ?            ?       2       4        3     5
    Andy        1            1       3       ?        ?     5
minimize ‖Z‖_*
subject to Z_ij = X_ij, ∀(i,j) ∈ Ω        (2)

minimize ‖Z‖_*
subject to Σ_{(i,j)∈Ω} (Z_ij − X_ij)² ≤ δ        (3)
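The nuclear norm ‖Z‖_* is simply the sum of the singular values of Z, which makes it easy to compute and to see why it serves as the convex surrogate for rank (a small numerical illustration, not from the slides):

```python
import numpy as np

def nuclear_norm(Z):
    """‖Z‖_* = sum of singular values — the convex relaxation of rank(Z)."""
    return np.linalg.svd(Z, compute_uv=False).sum()

# A rank-1 matrix u v' has a single nonzero singular value, ‖u‖·‖v‖.
Z = np.outer([1.0, 2.0], [3.0, 0.0, 4.0])
print(nuclear_norm(Z))   # equals ‖u‖·‖v‖ = √5 · 5
```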
Idea of Algorithm
Notation
Algorithm
Properties of Algorithm
minimize_Z  (1/2) ‖P_Ω(X) − P_Ω(Z)‖²_F + λ ‖Z‖_*        (7)

which is equivalent to the bound version (3).
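One way to attack (7), assuming the fixed-point form Z ← S_λ(P_Ω(X) + P_Ω⊥(Z)) where S_λ soft-thresholds the singular values, is the following NumPy sketch (names mine; no convergence checks or warm starts):

```python
import numpy as np

def svd_soft_threshold(Z, lam):
    """S_lam(Z) = U diag((d - lam)_+) V' — soft-threshold the singular values."""
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(d - lam, 0.0)) @ Vt

def soft_impute(X, mask, lam, n_iter=100):
    """Iterate Z <- S_lam(P_Omega(X) + P_Omega_perp(Z)).
    mask is a boolean array marking the observed entries Omega."""
    Z = np.zeros_like(X)
    for _ in range(n_iter):
        filled = np.where(mask, X, Z)   # observed entries from X, rest from current Z
        Z = svd_soft_threshold(filled, lam)
    return Z
```

Each iteration costs one SVD of the filled-in matrix; larger λ gives a lower-rank (smaller nuclear norm) solution.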
Timings
[Table: timings for a 10^5 × 10^5 problem — remaining row values: 10^4, 15, 10, (5, 14, 32, 62), (37, 74.5, 199.8, 653); column headers were lost in extraction.]
Accuracy
[Figure: two panels plotting accuracy against the nuclear norm.]
Discussion
Some challenges