
1

The lasso:
some novel algorithms and
applications
Robert Tibshirani
Stanford University

Dept. of Statistics, Purdue, February 2011

Collaborations with Trevor Hastie, Jerome Friedman, Holger Hoefling, Rahul Mazumder, Ryan Tibshirani
Email: [email protected]
http://www-stat.stanford.edu/~tibs
2-5

[Slides 2-5: photographs of collaborators Jerome Friedman and Trevor Hastie, including look-alike images from MyHeritage.com]
6

Linear regression via the Lasso (Tibshirani, 1995)

• Outcome variable yi, for cases i = 1, 2, . . . , n; features xij, j = 1, 2, . . . , p
• Minimize
  $$\sum_{i=1}^{n}\Big(y_i - \sum_{j} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
• Equivalent to minimizing the sum of squares subject to the constraint $\sum_j |\beta_j| \le s$.
• Similar to ridge regression, which uses the constraint $\sum_j \beta_j^2 \le t$.
• Lasso does variable selection and shrinkage; ridge only shrinks (illustrated in the sketch below).
• See also “Basis Pursuit” (Chen, Donoho and Saunders, 1998).
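To make the selection-versus-shrinkage contrast concrete, here is a minimal sketch (added here for illustration, not part of the original slides) that fits lasso and ridge on synthetic sparse data with scikit-learn; the data dimensions and penalty values are arbitrary assumptions.

```python
# Sketch: lasso selects variables (exact zeros), ridge only shrinks.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]          # sparse truth: only 3 active features
y = X @ beta_true + rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)        # alpha plays the role of lambda
ridge = Ridge(alpha=10.0).fit(X, y)

print("lasso nonzero coefficients:", np.sum(lasso.coef_ != 0))  # a handful
print("ridge nonzero coefficients:", np.sum(ridge.coef_ != 0))  # all 20, just shrunk
```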
7

Picture of Lasso and Ridge regression

[Figure: two panels in the (β1, β2) plane showing the lasso constraint region (a diamond) and the ridge constraint region (a disk), with elliptical contours of the residual sum of squares centered at the full least squares estimate β̂.]
8

Example: Prostate Cancer Data

yi = log(PSA); the xij are measurements on a man and his prostate.

[Figure: lasso coefficient profiles for the eight predictors (lcavol, svi, lweight, pgg45, lbph, gleason, age, lcp) plotted against the shrinkage factor s from 0 to 1.]
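A profile plot like this can be reproduced in spirit with the short sketch below (an illustration added here, not the original analysis); the data are synthetic stand-ins for the eight standardized prostate predictors, and scikit-learn's lasso_path is used rather than the talk's software.

```python
# Sketch: compute lasso coefficient profiles over a grid of penalties and
# plot them against the shrinkage factor s = ||beta||_1 / max ||beta||_1.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(1)
X = rng.standard_normal((97, 8))                      # stand-in for 8 standardized predictors
y = 0.7 * X[:, 0] + 0.3 * X[:, 1] + 0.5 * rng.standard_normal(97)   # stand-in for log PSA

alphas, coefs, _ = lasso_path(X, y, n_alphas=100)     # coefs: (n_features, n_alphas)

l1_norms = np.abs(coefs).sum(axis=0)
s = l1_norms / l1_norms.max()

for j in range(coefs.shape[0]):
    plt.plot(s, coefs[j])
plt.xlabel("Shrinkage factor s")
plt.ylabel("Coefficients")
plt.show()
```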
9

Emerging themes

• Lasso (ℓ1) penalties have powerful statistical and computational advantages.
• ℓ1 penalties provide a natural way to encourage or enforce sparsity and simplicity in the solution.
• "Bet on sparsity" principle (in The Elements of Statistical Learning): assume that the underlying truth is sparse and use an ℓ1 penalty to try to recover it. If you're right, you will do well. If you're wrong (the underlying truth is not sparse), then no method can do well. [Bickel, Buhlmann, Candes, Donoho, Johnstone, Yu ...]
• ℓ1 penalties are convex, and the assumed sparsity can lead to significant computational advantages.
10

Outline

• New fast algorithm for the lasso: pathwise coordinate descent
• Three examples of applications/generalizations of the lasso:
• Logistic/multinomial regression for classification, with a later example of classification from microarray data
• Near-isotonic regression: a modern take on an old idea
• The matrix completion problem
• Not covering: sparse multivariate methods (principal components, canonical correlation, clustering; Daniela Witten's thesis). Google 'Daniela Witten' → "Penalized matrix decomposition"
11

Algorithms for the lasso

• Standard convex optimizers
• Least angle regression (LAR), Efron et al. (2004): computes the entire path of solutions. State of the art until 2008.
• Pathwise coordinate descent: new
12

Pathwise coordinate descent for the lasso

• Coordinate descent: optimize one parameter (coordinate) at a time.
• How? Suppose we had only one predictor. The problem is to minimize
  $$\sum_i (y_i - x_i\beta)^2 + \lambda|\beta|$$
• The solution is the soft-thresholded estimate
  $$\mathrm{sign}(\hat\beta)\,(|\hat\beta| - \lambda)_+$$
  where $\hat\beta$ is the usual least squares estimate.
• Idea: with multiple predictors, cycle through each predictor k in turn. Compute the partial residuals $r_i = y_i - \sum_{j \neq k} x_{ij}\hat\beta_j$ and apply univariate soft-thresholding, pretending that our data is $(x_{ik}, r_i)$ (see the sketch below).
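A minimal numerical sketch of this update (added here for concreteness; it is not the glmnet Fortran code): cyclic coordinate descent for the lasso at a fixed λ, assuming the objective (1/2)Σ(yi − Σ xij βj)² + λΣ|βj| and columns of X scaled to unit norm, so that each univariate update is exactly the soft-threshold shown above.

```python
# Sketch of cyclic coordinate descent for the lasso at a fixed lambda.
# Assumes unit-norm columns of X, so each coordinate update is a soft-threshold
# of the inner product of that column with the partial residual.
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=100, tol=1e-8, beta=None):
    n, p = X.shape
    beta = np.zeros(p) if beta is None else beta.copy()
    r = y - X @ beta                         # full residual, kept up to date
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            r_j = r + X[:, j] * beta[j]      # partial residual: leave predictor j out
            b_new = soft_threshold(X[:, j] @ r_j, lam)   # univariate soft-threshold
            r -= X[:, j] * (b_new - beta[j])
            max_change = max(max_change, abs(b_new - beta[j]))
            beta[j] = b_new
        if max_change < tol:
            break
    return beta

# Toy check
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
X /= np.linalg.norm(X, axis=0)               # unit-norm columns
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(50)
print(lasso_cd(X, y, lam=0.05))
```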
13

Soft-thresholding

[Figure: soft-thresholding of coefficients β1, . . . , β4; estimates with magnitude below λ are set exactly to zero, the others are shrunk toward zero by λ.]
14

• It turns out that this is coordinate descent for the lasso criterion
  $$\sum_i \Big(y_i - \sum_j x_{ij}\beta_j\Big)^2 + \lambda \sum_j |\beta_j|$$
• Like skiing to the bottom of a hill, going north-south, east-west, north-south, etc. [Show movie]
• Too simple?!
15

A brief history of coordinate descent for the lasso

• 1997: Tibshirani's student Wenjiang Fu at the University of Toronto develops the "shooting algorithm" for the lasso. Tibshirani doesn't fully appreciate it.
• 2002: Ingrid Daubechies gives a talk at Stanford and describes a one-at-a-time algorithm for the lasso. Hastie implements it, makes an error, and Hastie + Tibshirani conclude that the method doesn't work.
• 2006: Friedman is the external examiner at the PhD oral of Anita van der Kooij (Leiden), who uses the coordinate descent idea for the elastic net. Friedman wonders whether it works for the lasso. Friedman, Hastie + Tibshirani start working on this problem. See also Wu and Lange (2008)!
16

Pathwise coordinate descent for the lasso

• Start with a large value of λ (a very sparse model) and slowly decrease it, using each solution as a warm start for the next λ (a sketch of this loop appears below)
• Most coordinates that are zero never become non-zero
• The coordinate descent code for the lasso is just 73 lines of Fortran!
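A sketch of the pathwise strategy (again illustrative, not the actual Fortran): it reuses the lasso_cd function from the earlier sketch and warm-starts each solve from the previous solution.

```python
# Pathwise coordinate descent sketch: solve over a decreasing grid of lambdas,
# warm-starting each fit from the previous solution. Assumes lasso_cd() from
# the earlier sketch is in scope and that X has unit-norm columns.
import numpy as np

def lasso_path_cd(X, y, n_lambdas=50, eps=1e-3):
    lam_max = np.max(np.abs(X.T @ y))            # beta = 0 is optimal for lam >= lam_max
    lams = lam_max * np.logspace(0, np.log10(eps), n_lambdas)
    betas = []
    beta = np.zeros(X.shape[1])
    for lam in lams:
        beta = lasso_cd(X, y, lam, beta=beta)    # warm start from the previous lambda
        betas.append(beta.copy())
    return lams, np.array(betas)
```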
17

Extensions

• Pathwise coordinate descent can be generalized to many other models: logistic/multinomial for classification, graphical lasso for undirected graphs, fused lasso for signals.
• Its speed and simplicity are quite remarkable.
• glmnet R package now available on CRAN
18-19

Logistic regression

• Outcome Y = 0 or 1; logistic regression model
  $$\log\frac{\Pr(Y=1)}{1-\Pr(Y=1)} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots$$
• Criterion is the binomial log-likelihood plus an absolute value (ℓ1) penalty (a toy sketch follows below)
• Example: sparse data with N = 50,000, p = 700,000
• State-of-the-art interior point algorithm (Stephen Boyd, Stanford), exploiting sparsity of the features: 3.5 hours for 100 values along the path
• Pathwise coordinate descent: 1 minute
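For concreteness, here is a small sketch of ℓ1-penalized logistic regression (not the talk's software; glmnet in R or its Matlab port would be the natural choice). The data are synthetic and far smaller than the N = 50,000, p = 700,000 example above.

```python
# Sketch: L1-penalized logistic regression on a small synthetic problem.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 500, 200
X = rng.standard_normal((n, p))
logits = X[:, :5] @ np.array([2.0, -1.5, 1.0, -1.0, 0.5])
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# C is the inverse of the penalty strength (roughly 1/lambda)
fit = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=5000).fit(X, y)
print("nonzero coefficients:", np.sum(fit.coef_ != 0), "of", p)
```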
20

Multiclass classification

Microarray classification: 16,000 genes, 144 training samples, 54 test samples, 14 cancer classes. Multinomial regression model.

Method                                   CV errors (out of 144)   Test errors (out of 54)   # genes used
1. Nearest shrunken centroids            35 (5)                   17                        6520
2. L2-penalized discriminant analysis    25 (4.1)                 12                        16063
3. Support vector classifier             26 (4.2)                 14                        16063
4. Lasso regression (one vs. all)        30.7 (1.8)               12.5                      1429
5. K-nearest neighbors                   41 (4.6)                 26                        16063
6. L2-penalized multinomial              26 (4.2)                 15                        16063
7. Lasso-penalized multinomial           17 (2.8)                 13                        269
8. Elastic-net penalized multinomial     22 (3.7)                 11.8                      384
21

Near Isotonic regression

Ryan Tibshirani, Holger Hoefling, Rob Tibshirani (2010)

• A generalization of isotonic regression: given a data sequence y1, y2, . . . , yn,
  $$\text{minimize } \sum_i (y_i - \hat y_i)^2 \quad \text{subject to } \hat y_1 \le \hat y_2 \le \cdots$$
  Solved by the Pool Adjacent Violators algorithm (see the sketch below).
• Near-isotonic regression:
  $$\beta_\lambda = \operatorname*{argmin}_{\beta \in \mathbb{R}^n} \; \frac{1}{2}\sum_{i=1}^{n} (y_i - \beta_i)^2 + \lambda \sum_{i=1}^{n-1} (\beta_i - \beta_{i+1})_+,$$
  with $x_+$ denoting the positive part, $x_+ = x \cdot 1(x > 0)$.
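A compact sketch of the Pool Adjacent Violators algorithm for plain isotonic regression (the λ → ∞ limit of the near-isotonic problem); this is an illustrative implementation added here, not the authors' path algorithm.

```python
# Pool Adjacent Violators: isotonic (non-decreasing) least squares fit.
# Maintains a stack of blocks (mean, weight) and merges adjacent blocks
# whenever a monotonicity violation appears.
import numpy as np

def pava(y):
    blocks = []                               # each block: [mean, weight]
    for value in y:
        blocks.append([float(value), 1.0])
        # merge while the last block's mean drops below the previous one
        while len(blocks) > 1 and blocks[-1][0] < blocks[-2][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2])
    return np.concatenate([np.full(int(w), m) for m, w in blocks])

y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 4.5])
print(pava(y))   # [1.0, 2.5, 2.5, 4.5, 4.5, 4.5]
```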


22

Near-isotonic regression, continued

• A convex problem. The solution path starts at β̂i = yi when λ = 0 and culminates in the usual isotonic regression as λ → ∞. Along the way it gives nearly monotone approximations.
23

Numerical approach

How about using coordinate descent?


• Surprise! Although the criterion is convex, it is not differentiable, and coordinate descent can get stuck in the "cusps"
24

[Figure: at a cusp, neither coordinate-wise move gives an improvement ("no improvement" in each coordinate direction), yet a joint move in both coordinates still gives an improvement.]
25
26

When does coordinate descent work?

Paul Tseng (1988, 2001)

If
$$f(\beta_1, \ldots, \beta_p) = g(\beta_1, \ldots, \beta_p) + \sum_j h_j(\beta_j)$$
where $g(\cdot)$ is convex and differentiable and each $h_j(\cdot)$ is convex, then coordinate descent converges to a minimizer of $f$.

The non-differentiable part of the loss function must be separable.
27

Solution: devise a path algorithm

• A simple algorithm that computes the entire path of solutions: a modified version of the well-known pool adjacent violators algorithm
• Analogous to the LARS algorithm for the lasso in regression
• Bonus: we show that the degrees of freedom equals the number of "plateaus" in the solution, using results from Ryan Tibshirani's PhD work with Jonathan Taylor
28

Toy example

[Figure: near-isotonic fits to a toy sequence at λ = 0, 0.25, 0.7 and 0.77.]
29

Global warming data


30

[Figure: near-isotonic fits to global temperature anomaly data, plotted against Year (1850-2000), y-axis Temperature anomalies. Four panels labelled (lam = 0, ss = 0, viol = 5.6), (lam = 0.3, ss = 0.68, viol = 0.5), (lam = 0.6, ss = 0.88, viol = 0.3) and (lam = 1.8, ss = 1.39, viol = 0).]
31

The matrix completion problem

• Data $X_{m\times n}$, for which only a relatively small number of entries are observed. The problem is to "complete" or impute the matrix based on the observed entries, e.g. the Netflix database (see next slide).
• For a matrix $X_{m\times n}$ let $\Omega \subset \{1,\ldots,m\} \times \{1,\ldots,n\}$ denote the indices of the observed entries. Consider the following optimization problem:
  $$\text{minimize } \operatorname{rank}(Z) \quad \text{subject to } Z_{ij} = X_{ij} \;\; \forall (i,j)\in\Omega \qquad (1)$$
  Not convex!
32

[Slide 32: a toy Netflix-style ratings matrix. Rows are raters (Daniela, Genevera, Larry, Jim, Andy); columns are movies with rotated labels (Lord of the Rings, Pretty Woman, Harry Potter, Pulp Fiction, Kill Bill, Blue Velvet); entries are ratings from 1 to 5 with several missing values marked "?".]
33

• The following seemingly small modification to (1)
  $$\text{minimize } \|Z\|_* \quad \text{subject to } Z_{ij} = X_{ij} \;\; \forall (i,j)\in\Omega \qquad (2)$$
  makes the problem convex [Faz02]. Here $\|Z\|_*$ is the nuclear norm, the sum of the singular values of Z.
• This criterion is used by [CT09, CCS08, CR08]. Fascinating work! See figure.
• But this criterion requires the training error to be zero. This is too harsh and can overfit!
• Instead we use the criterion
  $$\text{minimize } \|Z\|_* \quad \text{subject to } \sum_{(i,j)\in\Omega} (Z_{ij} - X_{ij})^2 \le \delta \qquad (3)$$
34

Nuclear norm is like L1 norm for matrices


35

Idea of Algorithm

1. Impute the missing data with some initial values
2. Compute the SVD of the current matrix, and soft-threshold the singular values
3. Reconstruct from the thresholded SVD and hence obtain new imputations for the missing values
4. Repeat steps 2-3 until convergence
36

Notation

• Define a matrix $P_\Omega(X)$ (with dimension $m \times n$) by
  $$P_\Omega(X)(i,j) = \begin{cases} X_{ij} & \text{if } (i,j)\in\Omega \\ 0 & \text{if } (i,j)\notin\Omega, \end{cases} \qquad (4)$$
  which is the projection of the matrix X onto the observed entries.
• Let
  $$S_\lambda(W) \equiv U D_\lambda V' \quad \text{with } D_\lambda = \operatorname{diag}\big[(d_1-\lambda)_+, \ldots, (d_r-\lambda)_+\big], \qquad (5)$$
  where $UDV'$ is the singular value decomposition of W.
37

Algorithm

1. Initialize $Z^{\text{old}} = 0$ and create a decreasing grid Λ of values $\lambda_1 > \ldots > \lambda_K$.
2. For each fixed $\lambda = \lambda_1, \lambda_2, \ldots \in \Lambda$, iterate until convergence: compute $Z^{\text{new}} \leftarrow S_\lambda\big(P_\Omega(X) + P_\Omega^\perp(Z^{\text{old}})\big)$ and set $Z^{\text{old}} \leftarrow Z^{\text{new}}$.
3. Output the sequence of solutions $\hat Z_{\lambda_1}, \ldots, \hat Z_{\lambda_K}$.

If X is sparse, then at each step the non-sparse matrix has the structure
$$X = X_{SP} \text{ (sparse)} + X_{LR} \text{ (low rank)} \qquad (6)$$
so Lanczos methods can be applied to compute the SVD efficiently (a numerical sketch follows).
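A compact numerical sketch of this iteration, under the notation above (an illustration added here, not the authors' optimized code, and without the sparse plus low-rank Lanczos tricks):

```python
# Soft-impute-style sketch: fill in the missing entries, soft-threshold the
# singular values, re-impute, and repeat. Dense SVD only.
import numpy as np

def soft_impute(X, observed, lam, n_iter=200, tol=1e-6):
    """X: m x n array; observed: boolean mask of the same shape (Omega)."""
    Z = np.zeros_like(X)
    for _ in range(n_iter):
        # P_Omega(X) + P_Omega^perp(Z): observed entries from X, the rest from Z
        filled = np.where(observed, X, Z)
        U, d, Vt = np.linalg.svd(filled, full_matrices=False)
        d_thresh = np.maximum(d - lam, 0.0)          # soft-threshold singular values
        Z_new = (U * d_thresh) @ Vt
        if np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z)):
            return Z_new
        Z = Z_new
    return Z

# Toy example: a rank-2 matrix with roughly half the entries missing
rng = np.random.default_rng(0)
m, n, r = 40, 30, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
mask = rng.random((m, n)) < 0.5
Z_hat = soft_impute(A, mask, lam=1.0)
print("rank of fit:", np.linalg.matrix_rank(Z_hat, tol=1e-3))
```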


38

Properties of Algorithm

We show that this iterative algorithm converges to the solution of
$$\underset{Z}{\text{minimize}} \;\; \frac{1}{2}\|P_\Omega(X) - P_\Omega(Z)\|_F^2 + \lambda\|Z\|_*, \qquad (7)$$
which is equivalent to the bound version (3).
39

Timings

(m, n)                    |Ω|     true rank   SNR   effective rank    time (s)
(3 × 10^4, 10^4)          10^4    15          1     (13, 47, 80)      (41.9, 124.7, 305.8)
(10^5, 10^5)              10^4    15          10    (5, 14, 32, 62)   (37, 74.5, 199.8, 653)
(10^5, 10^5)              10^5    15          10    (18, 80)          (202, 1840)
(5 × 10^5, 5 × 10^5)      10^4    15          10    11                628.14
(5 × 10^5, 5 × 10^5)      10^5    15          1     (3, 11, 52)       (341.9, 823.4, 4810.75)
(10^6, 10^6)              10^5    15          1     80                8906


40

Accuracy

50% missing entries with SNR = 1, true rank = 10

[Figure: test error (left panel) and training error (right panel) plotted against the nuclear norm of the solution, comparing methods labelled L1, L1-U, L1-L0 and C.]
41

Discussion

• Lasso penalties are useful for fitting a wide variety of models to large datasets; pathwise coordinate descent makes it possible to fit these models at this scale for the first time
• On CRAN: coordinate descent in R via glmnet (linear regression, logistic, multinomial, Cox model, Poisson)
• Also: LARS, nearIso, cghFLasso, glasso
• Matlab software for glmnet and matrix completion:
http://www-stat.stanford.edu/~tibs/glmnet-matlab/
http://www-stat.stanford.edu/~rahulm/SoftShrink
42

Ongoing work in lasso/sparsity

• Grouped lasso (Yuan and Lin) and many variations (Peng, Zhu, ..., Wang: "RemMap")
• Sparse multivariate methods: principal components, canonical correlation, clustering (Witten and others)
• Matrix-variate normal (Genevera Allen)
• Graphical models, graphical lasso (Yuan + Lin; Friedman, Hastie + Tibshirani; Peng, Wang et al.: "SPACE")
• Compressed sensing (Candes and co-authors)
• "Strong rules" (Tibshirani et al. 2010) provide a 5-80 fold speedup in computation, with no loss in accuracy
43

Some challenges

• Develop tools and theory that allow these methods to be used in statistical practice: standard errors, p-values and confidence intervals that account for the adaptive nature of the estimation.
• While it's fun to develop these methods, as statisticians our ultimate goal is to provide better answers to scientific questions.
43-1

References
[CCS08] Jian-Feng Cai, Emmanuel J. Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion, 2008.
[CR08] Emmanuel Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 2008.
[CT09] Emmanuel J. Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion, 2009.
[Faz02] M. Fazel. Matrix Rank Minimization with Applications. PhD thesis, Stanford University, 2002.
