@@ -1049,6 +1049,18 @@ from sklearn.preprocessing import StandardScaler
10491049 - Parameter estimation: ![$$ {u_1},{u_2}, \cdots ,{u_n};\sigma _1^2,\sigma _2^2 \cdots ,\sigma _n^2 $$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%7Bu_1%7D%2C%7Bu_2%7D%2C%20%5Ccdots%20%2C%7Bu_n%7D%3B%5Csigma%20_1%5E2%2C%5Csigma%20_2%5E2%20%5Ccdots%20%2C%5Csigma%20_n%5E2%24%24)
10501050 - Compute `p(x)`; if `p(x) < ε`, treat the example as an anomaly, where `ε` is the probability threshold we choose
10511051- This is only the **univariate Gaussian distribution**, which assumes the features are independent; the **multivariate Gaussian distribution** covered below captures the relationships between features automatically
1052+ - **Parameter estimation** implementation
1053+ ```
1054+ # Parameter estimation: per-feature mean and variance (assumes numpy is imported as np)
1055+ def estimateGaussian(X):
1056+     m,n = X.shape
1057+     mu = np.zeros((n,1))
1058+     sigma2 = np.zeros((n,1))
1059+
1060+     mu = np.mean(X, axis=0)     # axis=0 works column-wise: mean of each feature
1061+     sigma2 = np.var(X, axis=0)  # variance of each feature
1062+     return mu,sigma2
1063+ ```
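
The step from the estimated parameters to an anomaly decision can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name `univariate_gaussian_p`, the toy data, and the hand-picked `epsilon` are all assumptions made for the example. It multiplies the per-feature Gaussian densities (the independence assumption) and compares the result with the threshold.

```python
import numpy as np

def univariate_gaussian_p(X, mu, sigma2):
    """Product of per-feature Gaussian densities for each row of X (hypothetical helper)."""
    X = np.atleast_2d(X)
    # Density of each feature value under its own N(mu_j, sigma2_j)
    per_feature = np.exp(-(X - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)
    # Independence assumption: p(x) is the product over features
    return np.prod(per_feature, axis=1)

# Toy training data (assumed): two independent, roughly Gaussian features
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(500, 2))

# Same estimates that estimateGaussian returns: per-column mean and variance
mu = np.mean(X_train, axis=0)
sigma2 = np.var(X_train, axis=0)

p_typical = univariate_gaussian_p(np.array([5.0, 10.0]), mu, sigma2)[0]
p_outlier = univariate_gaussian_p(np.array([12.0, 25.0]), mu, sigma2)[0]

epsilon = 1e-4  # hypothetical threshold, picked by hand for this toy data
print("typical point flagged:", p_typical < epsilon)
print("outlier flagged:", p_outlier < epsilon)
```

In practice `epsilon` would not be set by hand but chosen on a labelled cross-validation set.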
10521064
10531065### 3. Evaluating `p(x)` and choosing `ε`
10541066- Error metrics for **skewed data**
@@ -1064,6 +1076,56 @@ from sklearn.preprocessing import StandardScaler
10641076
10651077- Choosing `ε`
10661078 - Try many values of `ε` and keep the one that gives the highest `F1Score`
1079+ - Implementation
1080+ ```
1081+ # Select the best epsilon, i.e. the one that maximizes F1Score
1082+ def selectThreshold(yval,pval):
1083+     '''Initialize the variables'''
1084+     bestEpsilon = 0.
1085+     bestF1 = 0.
1086+     F1 = 0.
1087+     step = (np.max(pval)-np.min(pval))/1000
1088+     '''Search over candidate thresholds'''
1089+     for epsilon in np.arange(np.min(pval),np.max(pval),step):
1090+         cvPrecision = pval<epsilon   # predicted anomalies: p(x) below the threshold
1091+         tp = np.sum((cvPrecision == 1) & (yval == 1)).astype(float)  # np.sum returns an int, so convert to float
1092+         fp = np.sum((cvPrecision == 1) & (yval == 0)).astype(float)
1093+         fn = np.sum((cvPrecision == 0) & (yval == 1)).astype(float)  # anomalies that were missed
1094+         precision = tp/(tp+fp)  # precision (nan if nothing is flagged)
1095+         recall = tp/(tp+fn)     # recall
1096+         F1 = (2*precision*recall)/(precision+recall)  # F1Score formula
1097+         if F1 > bestF1:  # keep the best F1 Score so far (a nan F1 never passes this test)
1098+             bestF1 = F1
1099+             bestEpsilon = epsilon
1100+     return bestEpsilon,bestF1
1101+ ```
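
To see why `F1Score` rather than plain accuracy is used to pick `ε` on skewed data, here is a small sketch under assumed toy labels (990 normal examples, 10 anomalies). The helper `precision_recall_f1` is hypothetical and, unlike `selectThreshold`, returns 0 instead of `nan` when a denominator is zero; that guard is a choice made for this example only.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels where 1 marks an anomaly."""
    tp = float(np.sum((y_pred == 1) & (y_true == 1)))  # anomalies correctly flagged
    fp = float(np.sum((y_pred == 1) & (y_true == 0)))  # normal examples flagged by mistake
    fn = float(np.sum((y_pred == 0) & (y_true == 1)))  # anomalies that were missed
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1

# Skewed toy labels (assumed): the last 10 of 1000 examples are anomalies
y_true = np.array([0] * 990 + [1] * 10)

# A useless detector that never flags anything still looks accurate
always_normal = np.zeros_like(y_true)
acc_useless = float(np.mean(always_normal == y_true))
_, _, f1_useless = precision_recall_f1(y_true, always_normal)

# A detector that catches 8 of the 10 anomalies and raises 2 false alarms
detector = np.zeros_like(y_true)
detector[990:998] = 1
detector[0:2] = 1
acc_detector = float(np.mean(detector == y_true))
_, _, f1_detector = precision_recall_f1(y_true, detector)

print("never flags:", acc_useless, f1_useless)
print("real detector:", acc_detector, f1_detector)
```

The detector that never flags anything reaches 99% accuracy yet has an F1 of 0, so accuracy alone cannot separate it from a detector that actually finds anomalies.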
1102+
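1103+ ### 4. Choosing which features to use (univariate Gaussian)
1104+ - If a feature does not follow a Gaussian distribution, transform it first, for example with `log(x+C)` or `x^(1/2)`
1105+ - If `p(x)` is large for normal and anomalous examples alike, try combining several features into a new one, since the features may be related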
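
The effect of such transformations can be checked numerically. The sketch below is only an illustration: the log-normal toy feature, the offset `C = 1.0`, and the `skewness` helper are assumptions for the example. It measures how far each version of the feature is from symmetric; a value near 0 is closer to the shape a Gaussian model expects.

```python
import numpy as np

def skewness(x):
    """Sample skewness: third standardized moment (0 for a symmetric distribution)."""
    centered = x - np.mean(x)
    return np.mean(centered ** 3) / np.std(x) ** 3

# Toy feature (assumed): strictly positive and heavily right-skewed
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

C = 1.0  # hypothetical offset; keeps the log defined if a feature contains zeros
x_log = np.log(x + C)
x_sqrt = np.sqrt(x)

print("raw  :", skewness(x))
print("log  :", skewness(x_log))
print("sqrt :", skewness(x_sqrt))
```

Plotting a histogram of each version is the usual way to choose between transformations; the skewness number is just a quick proxy.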
1106+
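1107+ ### 5. Multivariate Gaussian distribution
1108+ - The problem with the univariate Gaussian distribution
1109+ - In the figure below, the red point is an anomaly and all the others are normal (for example, CPU and memory usage)
1110+ ![enter description here][50]
1111+ - The Gaussian distribution of x1:
1112+ ![enter description here][51]
1113+ - The Gaussian distribution of x2:
1114+ ![enter description here][52]
1115+ - The values of p(x1) and p(x2) for the red point differ little from those of normal points, so it is not flagged as an anomaly
1116+ - Because we assume the features are independent, the contours in the figure above expand as **perfect circles**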
1117+ - Multivariate Gaussian distribution
1118+ - ![$$ x \in {R^n} $$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24x%20%5Cin%20%7BR%5En%7D%24%24); instead of modelling `p(x1),p(x2)...p(xn)` separately, build a single `p(x)`
1119+ - Its parameters are ![$$ \mu \in {R^n},\Sigma \in {R^{n \times {\rm{n}}}} $$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24%5Cmu%20%5Cin%20%7BR%5En%7D%2C%5CSigma%20%5Cin%20%7BR%5E%7Bn%20%5Ctimes%20%7B%5Crm%7Bn%7D%7D%7D%7D%24%24), where `Σ` is the **covariance matrix**
1120+ - ![$$ p(x) = {1 \over {{{(2\pi )}^{{n \over 2}}}|\Sigma {|^{{1 \over 2}}}}}{e^{ - {1 \over 2}{{(x - u)}^T}{\Sigma ^{ - 1}}(x - u)}} $$](http://latex.codecogs.com/png.latex?%5Cfn_cm%20%24%24p%28x%29%20%3D%20%7B1%20%5Cover%20%7B%7B%7B%282%5Cpi%20%29%7D%5E%7B%7Bn%20%5Cover%202%7D%7D%7D%7C%5CSigma%20%7B%7C%5E%7B%7B1%20%5Cover%202%7D%7D%7D%7D%7D%7Be%5E%7B%20-%20%7B1%20%5Cover%202%7D%7B%7B%28x%20-%20u%29%7D%5ET%7D%7B%5CSigma%20%5E%7B%20-%201%7D%7D%28x%20-%20u%29%7D%7D%24%24)
1121+ - As before, the smaller `|Σ|` is, the sharper the peak of `p(x)`
1122+ - For example:
1123+ ![enter description here][53],
1124+ means x1 and x2 are **positively correlated**: the larger x1 is, the larger x2 tends to be
1125+ ![enter description here][54]
1126+ If:
1127+ ![enter description here][55],
1128+ then x1 and x2 are **negatively correlated**
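
A minimal sketch of the multivariate density, and of the situation described above where the independent model misses an anomaly, might look like this. The function name `multivariate_gaussian_p`, the toy covariance, and the two test points are assumptions for illustration. Passing a diagonal `Sigma` reproduces the independent (univariate) model, so the same function serves for both cases.

```python
import numpy as np

def multivariate_gaussian_p(X, mu, Sigma):
    """Density of N(mu, Sigma) evaluated at each row of X (hypothetical helper)."""
    X = np.atleast_2d(X)
    n = mu.shape[0]
    diff = X - mu
    # Solve Sigma * z = diff^T rather than forming the inverse explicitly
    solved = np.linalg.solve(Sigma, diff.T).T
    quad_form = np.sum(diff * solved, axis=1)  # (x - mu)^T Sigma^{-1} (x - mu)
    norm = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad_form) / norm

# Toy data (assumed): two strongly positively correlated features
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.9], [0.9, 1.0]])
X_train = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=2000)

mu = np.mean(X_train, axis=0)
Sigma = np.cov(X_train, rowvar=False)          # full covariance matrix
Sigma_diag = np.diag(np.var(X_train, axis=0))  # independence: off-diagonal entries set to zero

points = np.array([
    [1.5, 1.5],    # follows the positive correlation
    [1.5, -1.5],   # equally far from the mean per feature, but breaks the correlation
])

p_full = multivariate_gaussian_p(points, mu, Sigma)
p_indep = multivariate_gaussian_p(points, mu, Sigma_diag)
print("full covariance :", p_full)
print("independent     :", p_indep)
```

With the full covariance matrix the second point receives a far lower density than the first, while the independent model scores the two almost equally.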
10671129
10681130
10691131 [1]: ./images/LinearRegression_01.png "LinearRegression_01.png"
@@ -1114,4 +1176,10 @@ from sklearn.preprocessing import StandardScaler
11141176 [46]: ./images/PCA_06.png "PCA_06.png"
11151177 [47]: ./images/PCA_07.png "PCA_07.png"
11161178 [48]: ./images/PCA_08.png "PCA_08.png"
1117- [49]: ./images/AnomalyDetection.png "AnomalyDetection.png"
1179+ [49]: ./images/AnomalyDetection_01.png "AnomalyDetection_01.png"
1180+ [50]: ./images/AnomalyDetection_04.png "AnomalyDetection_04.png"
1181+ [51]: ./images/AnomalyDetection_02.png "AnomalyDetection_02.png"
1182+ [52]: ./images/AnomalyDetection_03.png "AnomalyDetection_03.png"
1183+ [53]: ./images/AnomalyDetection_05.png "AnomalyDetection_05.png"
1184+ [54]: ./images/AnomalyDetection_07.png "AnomalyDetection_07.png"
1185+ [55]: ./images/AnomalyDetection_06.png "AnomalyDetection_06.png"