Homework 3
2. Gaussian Classification
Let $f_{X \mid Y=C_i}(x) \sim \mathcal{N}(\mu_i, \sigma^2)$ for a two-class, one-dimensional ($d = 1$) classification problem with classes $C_1$ and $C_2$, $P(Y = C_1) = P(Y = C_2) = 1/2$, and $\mu_2 > \mu_1$.
Q1
Find the Bayes optimal decision boundary and the corresponding Bayes decision rule by finding the point(s) at which the posterior probabilities are equal. Use the 0-1 loss function.
Answer: Under the 0-1 loss function, the Bayes optimal decision rule is: for a given observation $x$, choose the class $C_i$ with the largest posterior probability $P(Y=C_i \mid X=x)$. The decision boundary is therefore the point where the two posteriors are equal:
$$P(Y=C_1 \mid X=x) = P(Y=C_2 \mid X=x)$$
Equivalently, by Bayes' rule:
$$p(x \mid Y=C_1)\,P(Y=C_1) = p(x \mid Y=C_2)\,P(Y=C_2)$$
Since the two classes have equal priors, this reduces to:
$$p(x \mid Y=C_1) = p(x \mid Y=C_2)$$
Substituting the normal PDFs, the common normalizing factor $\frac{1}{\sqrt{2\pi\sigma^2}}$ cancels:
$$\begin{aligned}
&\exp\left(-\frac{(x-\mu_1)^2}{2\sigma^2}\right)=\exp\left(-\frac{(x-\mu_2)^2}{2\sigma^2}\right) \\
&\Rightarrow -\frac{(x-\mu_1)^2}{2\sigma^2} = -\frac{(x-\mu_2)^2}{2\sigma^2} \quad(\text{taking } \ln \text{ of both sides}) \\
&\Rightarrow (x-\mu_1)^2 = (x-\mu_2)^2 \\
&\Rightarrow x^2 - 2x\mu_1 + \mu_1^2 = x^2 - 2x\mu_2 + \mu_2^2 \\
&\Rightarrow 2x\mu_2 - 2x\mu_1 = \mu_2^2 - \mu_1^2 \\
&\Rightarrow 2x(\mu_2 - \mu_1) = (\mu_2 - \mu_1)(\mu_2 + \mu_1) \\
&\Rightarrow 2x = \mu_1 + \mu_2 \quad(\text{dividing by } \mu_2 - \mu_1 > 0) \\
&\Rightarrow x = \frac{\mu_1 + \mu_2}{2}
\end{aligned}$$
The Bayes optimal decision boundary is the midpoint of the two means, $x^* = \frac{\mu_1 + \mu_2}{2}$, and the corresponding decision rule is: choose $C_2$ if $x > x^*$ and $C_1$ if $x < x^*$ (since $\mu_2 > \mu_1$).
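As a quick numerical sanity check of the derivation, the sketch below evaluates the two class-conditional densities at the midpoint and confirms they coincide, which with equal priors means the posteriors are equal there. The parameter values (`mu1`, `mu2`, `sigma`) are arbitrary choices for illustration, not part of the problem statement.

```python
import math

# Hypothetical example parameters (any values with mu2 > mu1 work).
mu1, mu2, sigma = 0.0, 2.0, 1.0

def pdf(x, mu, sigma):
    """Gaussian density N(mu, sigma^2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Derived decision boundary: the midpoint of the two means.
b = (mu1 + mu2) / 2

# With equal priors the posteriors are proportional to the class densities,
# so the densities must agree exactly at x = b.
assert abs(pdf(b, mu1, sigma) - pdf(b, mu2, sigma)) < 1e-12
```

Because the two Gaussians share the same variance, $x = b$ is equidistant from both means, so the squared deviations $(b-\mu_1)^2$ and $(b-\mu_2)^2$ match term for term.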
Q2
Suppose the decision boundary for your classifier is $x = b$. The Bayes error is the probability of misclassification, namely
$$P_e = P\big((C_1\ \text{misclassified as}\ C_2) \cup (C_2\ \text{misclassified as}\ C_1)\big)$$
Show that the Bayes error associated with this decision rule, in terms of $b$, is
$$e(b)=\frac{1}{2\sqrt{2\pi}\,\sigma}\left(\int_{-\infty}^{b}\exp\left(-\frac{(x-\mu_2)^2}{2\sigma^2}\right)dx + \int_{b}^{\infty}\exp\left(-\frac{(x-\mu_1)^2}{2\sigma^2}\right)dx\right)$$
Proof: The two misclassification events are disjoint, so by the law of total probability:
$$P_e(b)=P\big(\text{decide }C_2 \mid \text{actual }C_1\big)\,P(\text{actual }C_1) + P\big(\text{decide }C_1 \mid \text{actual }C_2\big)\,P(\text{actual }C_2)$$
Substituting the priors $P(Y=C_1) = P(Y=C_2) = 1/2$ into this formula:
$$P_e(b) = \frac{1}{2} P(X > b \mid Y=C_1) + \frac{1}{2} P(X < b \mid Y=C_2) \tag{1}$$
Since the decision boundary is $x = b$ and $\mu_2 > \mu_1$, the natural classification rule is:
$$\text{decide}\;\begin{cases} C_2, & x > b \\ C_1, & x < b \end{cases}$$
The probability of deciding $C_2$ when the true class is $C_1$ is $P(X > b \mid Y=C_1)$, obtained by integrating the class-conditional density:
$$\int_b^\infty p(x \mid Y=C_1)\,dx = \int_b^\infty \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu_1)^2}{2\sigma^2}\right) dx$$
Likewise, the probability of deciding $C_1$ when the true class is $C_2$ is $P(X < b \mid Y=C_2)$:
$$\int_{-\infty}^b p(x \mid Y=C_2)\,dx = \int_{-\infty}^b \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu_2)^2}{2\sigma^2}\right) dx$$
Substituting both integrals into $(1)$ and factoring out the common constant $\frac{1}{\sqrt{2\pi}\,\sigma}$ gives:
$$P_e(b) = \frac{1}{2\sqrt{2\pi}\,\sigma} \left( \int_b^\infty \exp\left(-\frac{(x-\mu_1)^2}{2\sigma^2}\right) dx + \int_{-\infty}^b \exp\left(-\frac{(x-\mu_2)^2}{2\sigma^2}\right) dx \right) \qquad \blacksquare$$
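Each integral above is a Gaussian tail probability, so $e(b)$ can be evaluated in closed form with the standard normal CDF $\Phi$: $e(b) = \frac{1}{2}\Phi\!\left(\frac{b-\mu_2}{\sigma}\right) + \frac{1}{2}\left(1 - \Phi\!\left(\frac{b-\mu_1}{\sigma}\right)\right)$. The sketch below checks this formula against a Monte Carlo simulation of the classifier; the parameter values are hypothetical, chosen only for illustration.

```python
import math
import random

# Hypothetical parameters for illustration (not from the problem statement).
mu1, mu2, sigma = 0.0, 2.0, 1.0
b = (mu1 + mu2) / 2  # the Bayes optimal boundary from Q1

def phi(z):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bayes_error(b, mu1, mu2, sigma):
    """Closed form of e(b): each integral is a Gaussian tail probability."""
    return 0.5 * phi((b - mu2) / sigma) + 0.5 * (1.0 - phi((b - mu1) / sigma))

# Monte Carlo check: pick a class with prior 1/2, draw x ~ N(mu_i, sigma^2),
# classify by thresholding at b, and count misclassifications.
random.seed(0)
n, errors = 200_000, 0
for _ in range(n):
    if random.random() < 0.5:
        x, label = random.gauss(mu1, sigma), 1
    else:
        x, label = random.gauss(mu2, sigma), 2
    pred = 2 if x > b else 1
    errors += (pred != label)

print(bayes_error(b, mu1, mu2, sigma))  # exact e(b)
print(errors / n)                       # simulated estimate, should be close
```

At the midpoint boundary both terms reduce to the same tail, $e(b) = \Phi\!\left(-\frac{\mu_2-\mu_1}{2\sigma}\right)$, and the simulated error rate should agree with the closed form to within Monte Carlo noise.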