The binary cross-entropy loss (logistic loss) is defined as:

$$
L_{\text{logistic}}(\hat{y},y)
=-y\log\hat{y}-(1-y)\log(1-\hat{y})
$$
where $y\in\{0,1\}$ is the label and $\hat{y}=\sigma(\bar{y})$ is the prediction. The sigmoid function and its derivative are

$$
\sigma(\bar{y})=\dfrac{1}{1+e^{-\bar{y}}}, \qquad
\dfrac{\partial \sigma}{\partial \bar{y}}=\sigma(1-\sigma),
$$

and the linear score and its derivative with respect to a weight are

$$
\bar{y}=\mathbf{w}\cdot \mathbf{x} + b =\sum_j w_j x_j+b, \qquad
\dfrac{\partial \bar{y}}{\partial w_j}=x_j.
$$

The sample $\mathbf{x}=(x_1,\cdots,x_j,\cdots,x_n)$ contains $n$ features, and the weight vector $\mathbf{w}=(w_1,\cdots,w_j,\cdots,w_n)$ contains $n$ weights, one per feature. Writing $\sigma$ for $\sigma(\bar{y})=\hat{y}$ and applying the chain rule gives
$$
\begin{aligned}
\dfrac{\partial L}{\partial w_j}
&=
-y \dfrac{1}{\sigma}\dfrac{\partial \sigma}{\partial \bar{y}}\dfrac{\partial \bar{y}}{\partial w_j}
-(1-y)\dfrac{-1}{1-\sigma}\dfrac{\partial \sigma}{\partial \bar{y}}\dfrac{\partial \bar{y}}{\partial w_j} \\
&=\dfrac{\sigma - y}{\sigma(1-\sigma)}\dfrac{\partial \sigma}{\partial \bar{y}}\dfrac{\partial \bar{y}}{\partial w_j} \\
&=\dfrac{\sigma - y}{\sigma(1-\sigma)}\sigma(1-\sigma)x_j \\
&=(\sigma -y)x_j \\
&=[\sigma(\mathbf{w}\cdot \mathbf{x} + b)-y]x_j
\end{aligned}
$$
Gradient derivation of the binary cross-entropy loss
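The result $\partial L/\partial w_j=(\sigma-y)x_j$ can be sanity-checked numerically. The following is a minimal sketch, not part of the original post: the function names (`sigmoid`, `loss`, `grad_w`, `numeric_grad_w`) and the sample values for `w`, `b`, `x`, `y` are illustrative assumptions. It compares the closed-form gradient with a central finite-difference estimate of the same loss.

```python
import math

def sigmoid(z):
    """Sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy for one sample, with y_hat = sigmoid(w . x + b)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    s = sigmoid(z)
    return -y * math.log(s) - (1 - y) * math.log(1 - s)

def grad_w(w, b, x, y):
    """Closed-form gradient from the derivation: (sigma - y) * x_j."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    s = sigmoid(z)
    return [(s - y) * xj for xj in x]

def numeric_grad_w(w, b, x, y, eps=1e-6):
    """Central finite-difference estimate of dL/dw_j for each j."""
    grads = []
    for j in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        grads.append((loss(w_plus, b, x, y) - loss(w_minus, b, x, y)) / (2 * eps))
    return grads

# Illustrative values (assumptions, chosen arbitrarily).
w, b, x, y = [0.5, -0.3, 0.8], 0.1, [1.0, 2.0, -1.5], 1

analytic = grad_w(w, b, x, y)
numeric = numeric_grad_w(w, b, x, y)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print("analytic:", analytic)
print("numeric: ", numeric)
print("max abs difference:", max_diff)
```

If the derivation holds, the two gradients should agree up to finite-difference error. Note that the $\sigma(1-\sigma)$ factor cancels in the closed form, so `grad_w` never divides by a quantity that can approach zero.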