Explaining total variance in PCA

This post discusses how to choose the target dimensionality in PCA: keep enough components to retain 95% of the total variance. It also explains what total variance means in PCA.

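While reading the LMNN paper recently, I noticed that the first step of the authors' experiments is to reduce the dimensionality of the high-dimensional sample features with PCA. On how to choose the target dimension, the method they give is: "account for 95% of its total variance."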


What does "total variance" mean here? After some googling, I found that the following chapter explains it well:

http://support.sas.com/publishing/pubcat/chaps/55129.pdf


It contains this passage:

What is meant by “total variance” in the data set?  To understand the meaning of “total
variance” as it is used in a principal component analysis, remember that the observed
variables are standardized in the course of the analysis.  This means that each variable is
transformed so that it has a mean of zero and a variance of one.  The “total variance” in the
data set is simply the sum of the variances of these observed variables.  Because they have
been standardized to have a variance of one, each observed variable contributes one unit of
variance to the “total variance” in the data set.  Because of this, the total variance in a
principal component analysis will always be equal to the number of observed variables
being analyzed.  For example, if seven variables are being analyzed, the total variance will
equal seven.  The components that are extracted in the analysis will partition this variance: 
perhaps the first component will account for 3.2 units of total variance; perhaps the second
component will account for 2.1 units.  The analysis continues in this way until all of the
variance in the data set has been accounted for.


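The key statement is: the total variance in a principal component analysis will always be equal to the number of observed variables. This holds because the variables are standardized, so each one contributes exactly one unit of variance. The example given later in the chapter shows that what matters is the size of the eigenvalues once they are sorted. For instance, if the original feature space has 20 dimensions, PCA yields 20 eigenvalues (sorted in descending order); find the smallest N for which the first N eigenvalues sum to at least 95% of the sum of all 20. That N is the target dimension for the reduction.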
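The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the LMNN authors' code: the function name `choose_n_components` is hypothetical, and the eigenvalues are made up. Only the first two (3.2 and 2.1) and the total of 7 come from the quoted passage; the remaining five values are assumptions chosen so that the sum is 7.

```python
def choose_n_components(eigenvalues, threshold=0.95):
    """Return the smallest N whose N largest eigenvalues reach
    `threshold` of the total variance (hypothetical helper)."""
    if not eigenvalues:
        raise ValueError("eigenvalues must be non-empty")
    ordered = sorted(eigenvalues, reverse=True)  # descending order
    total = sum(ordered)                         # the total variance
    cumulative = 0.0
    for n, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return n
    return len(ordered)


# Seven standardized variables, so the total variance is 7.
# 3.2 and 2.1 are from the quoted passage; the rest are invented.
eigenvalues = [3.2, 2.1, 0.8, 0.4, 0.3, 0.15, 0.05]
print(choose_n_components(eigenvalues))  # prints 5
```

Here the first four components cover 6.5 / 7 (about 92.9%) and the first five cover 6.8 / 7 (about 97.1%), so N = 5. In practice the eigenvalues would come from the correlation matrix of the data, for example via `numpy.linalg.eigvalsh`.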
