Lecture 3

Posted on 2025-02-23 Edited on 2025-02-25 In Machine Learning

Linear Regression

Normal Equation

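For the linear regression model $h_\theta(x) = \theta^T x$, minimizing the squared-error cost $J(\theta) = \frac{1}{2} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2$ gives the closed-form solution $\theta = (X^T X)^{-1} X^T y$,
where $X$ is the design matrix (one training input $x^{(i)T}$ per row), $y$ is the vector of observed targets, and $X^T X$ is assumed to be invertible.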
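A minimal NumPy sketch of the closed-form solution, assuming NumPy is available; `normal_equation` is a hypothetical helper name and the four data points are made up. It solves $(X^T X)\theta = X^T y$ with `np.linalg.solve` instead of forming the inverse explicitly, which is the usual numerically safer choice.

```python
import numpy as np

def normal_equation(X, y):
    """Least-squares fit theta = (X^T X)^{-1} X^T y.

    X: (m, n) design matrix, one training example per row.
    y: (m,) target vector.
    Solves the linear system (X^T X) theta = X^T y rather than
    computing the matrix inverse.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data generated from y = 1 + 2x with no noise; the column of ones
# is the intercept feature x_0 = 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

theta = normal_equation(X, y)
print(theta)  # approximately [1. 2.]
```

Because the toy targets lie exactly on $y = 1 + 2x$, the recovered $\theta$ is $(1, 2)$ up to floating-point rounding.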

Locally Weighted Linear Regression (LWLR)

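Core idea: for each query point $x$, give training examples close to $x$ larger weights and fit $\theta$ locally by minimizing the weighted squared error (the prediction at $x$ is then $\theta^T x$):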
$$
\min_\theta \sum_{i=1}^m w^{(i)}(x) (h_\theta(x^{(i)}) - y^{(i)})^2
$$
Weight function: a common choice is the Gaussian kernel:
$$
w^{(i)}(x) = \exp\left(-\frac{(x^{(i)} - x)^T (x^{(i)} - x)}{2\tau^2}\right)
$$
where $\tau$ is the bandwidth parameter, which controls how quickly a sample's weight decays with its distance from $x$.
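Properties:
- Non-parametric: the whole training set must be kept and a new weighted fit is solved for every query, so the cost of prediction grows markedly with the amount of data.
- Extrapolates poorly to regions outside the range of the training data.
- Best suited to low-dimensional data (e.g. $n = 2, 3, 4$) with a moderate number of examples (hundreds to thousands).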
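The procedure can be sketched in NumPy as below, assuming NumPy is available. `lwlr_predict` and the sine toy data are hypothetical; each query solves the weighted normal equation $\theta = (X^T W X)^{-1} X^T W y$ with $W = \mathrm{diag}(w^{(1)}, \dots, w^{(m)})$, which is the closed-form minimizer of the weighted objective above.

```python
import numpy as np

def lwlr_predict(x_query, X, y, tau):
    """Locally weighted linear regression prediction at a single query point.

    x_query: (n,) query point, including the intercept feature 1.
    X: (m, n) design matrix; y: (m,) targets; tau: bandwidth.
    """
    # Gaussian-kernel weights w_i = exp(-||x_i - x||^2 / (2 tau^2)).
    # The intercept column adds 0 to the distance since 1 - 1 = 0.
    diff = X - x_query
    w = np.exp(-np.sum(diff * diff, axis=1) / (2.0 * tau ** 2))
    # Weighted normal equation (X^T W X) theta = X^T W y with W = diag(w);
    # X.T * w scales column i of X.T by w_i, same as X.T @ np.diag(w).
    XtW = X.T * w
    theta = np.linalg.solve(XtW @ X, XtW @ y)
    # theta is refit for every query, so the whole training set is kept.
    return x_query @ theta

# Toy data: y = sin(x) on [0, 2*pi] with an intercept column.
x = np.linspace(0.0, 2.0 * np.pi, 50)
X = np.column_stack([np.ones_like(x), x])
y = np.sin(x)

x_q = np.array([1.0, np.pi / 2])          # query x = pi/2, where sin(x) = 1
local = lwlr_predict(x_q, X, y, tau=0.3)  # local fit tracks the curve near the peak
global_theta = np.linalg.solve(X.T @ X, X.T @ y)
global_pred = x_q @ global_theta          # one straight line for all data misses the peak
print(local, global_pred)
```

A smaller `tau` makes the fit more local (and noisier); a very large `tau` gives nearly equal weights and approaches the ordinary straight-line fit.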

Logistic Regression

Model Hypothesis

Hypothesis (the sigmoid function):
$$
h_\theta(x) = g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}}
$$
Probabilistic interpretation:
$$
\begin{aligned}
p(y=1|x;\theta) &= h_\theta(x) \\
p(y=0|x;\theta) &= 1 - h_\theta(x)
\end{aligned}
$$
Combined form (for $y \in \{0, 1\}$):
$$
p(y|x;\theta) = h_\theta(x)^y (1-h_\theta(x))^{1-y}
$$
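A small sketch of the hypothesis and the combined probability formula, assuming NumPy; the helper names and the values of `theta` and `x` are illustrative only. Plugging $y = 1$ and $y = 0$ into the combined form recovers the two cases of the probabilistic interpretation.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z}).

    For very negative z, np.exp(-z) can overflow to inf (with a warning),
    which still yields the correct limit 0.
    """
    return 1.0 / (1.0 + np.exp(-z))

def prob_y_given_x(y, x, theta):
    """p(y | x; theta) = h^y * (1 - h)^(1 - y) for y in {0, 1}."""
    h = sigmoid(theta @ x)
    return h ** y * (1.0 - h) ** (1 - y)

theta = np.array([0.5, -1.0])
x = np.array([1.0, 2.0])           # theta^T x = -1.5
p1 = prob_y_given_x(1, x, theta)   # equals h_theta(x)
p0 = prob_y_given_x(0, x, theta)   # equals 1 - h_theta(x)
print(p1, p0, p1 + p0)
```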

Parameter Estimation

Maximum Likelihood Estimation (MLE)

Likelihood (assuming the $m$ training examples were generated independently):
$$
L(\theta) = \prod_{i=1}^m h_\theta(x^{(i)})^{y^{(i)}} (1-h_\theta(x^{(i)}))^{1-y^{(i)}}
$$
Log-likelihood:
$$
l(\theta) = \sum_{i=1}^m \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1-y^{(i)}) \log (1-h_\theta(x^{(i)})) \right]
$$
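The log-likelihood can be evaluated as follows, a sketch assuming NumPy (`log_likelihood` and the data are hypothetical). Writing $\log h = -\log(1 + e^{-z})$ and $\log(1 - h) = -\log(1 + e^{z})$ with $z = \theta^T x$ avoids taking $\log 0$ when the sigmoid saturates.

```python
import numpy as np

def log_likelihood(theta, X, y):
    """l(theta) = sum_i [ y_i log h_i + (1 - y_i) log(1 - h_i) ], h = sigmoid(X theta).

    Uses log h = -log(1 + e^{-z}) and log(1 - h) = -log(1 + e^{z}),
    evaluated with np.logaddexp(0, .) for numerical stability.
    """
    z = X @ theta
    return np.sum(-y * np.logaddexp(0.0, -z) - (1 - y) * np.logaddexp(0.0, z))

X = np.array([[1.0, -2.0], [1.0, 0.5], [1.0, 3.0]])  # intercept column + one feature
y = np.array([0.0, 1.0, 1.0])
print(log_likelihood(np.zeros(2), X, y))  # 3 * log(0.5), about -2.079
```

At $\theta = 0$ every $h_\theta(x^{(i)})$ equals $0.5$, so each example contributes $\log 0.5$.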
Gradient ascent update:
$$
\theta := \theta + \alpha \sum_{i=1}^m \left( y^{(i)} - h_\theta(x^{(i)}) \right) x^{(i)}
$$
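(Note: gradient ascent is used because we are maximizing the log-likelihood; if the loss is defined as the negative log-likelihood, the same update is obtained by gradient descent.)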
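A minimal batch gradient ascent loop, assuming NumPy; `gradient_ascent`, the step size `alpha = 0.1`, the iteration count and the toy data are illustrative assumptions. The sum $\sum_i (y^{(i)} - h_\theta(x^{(i)})) x^{(i)}$ is computed in matrix form as `X.T @ (y - h)`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_ascent(X, y, alpha=0.1, iters=1000):
    """Batch gradient ascent on the logistic log-likelihood l(theta).

    Each step applies theta := theta + alpha * sum_i (y_i - h(x_i)) x_i.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta = theta + alpha * X.T @ (y - h)
    return theta

# Toy 1-D data with an intercept column. The classes are not linearly
# separable (x = -1 has y = 1 but x = 0 has y = 0), so the maximizer is finite.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])

theta = gradient_ascent(X, y)
print(theta)
```

Because the update sums over all $m$ examples, a step size that is stable here may need to shrink for larger data sets.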

Newton's Method

Update rule:
$$
\theta := \theta - H^{-1} \nabla_\theta l(\theta)
$$
Hessian matrix:
$$
H = -\sum_{i=1}^m h_\theta(x^{(i)}) (1-h_\theta(x^{(i)})) x^{(i)} x^{(i)T}
$$
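(Note: $H$ is negative semidefinite, and negative definite when the design matrix has full column rank, so $l(\theta)$ is concave and any stationary point is a global maximum. Plain Newton steps are not guaranteed to converge from every starting point, but for logistic regression they typically converge within a few iterations from $\theta = 0$.)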
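Pros and cons:
- Pros: quadratic convergence near the optimum, so only a few iterations are needed.
- Cons: each iteration must form and invert (or solve with) the $n \times n$ Hessian, which costs $O(n^3)$, so the method is practical mainly when the dimension is small (e.g. $n < 50$).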
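A sketch of Newton's method for the logistic log-likelihood, assuming NumPy; `newton_logistic`, the fixed iteration count and the toy data are hypothetical. The gradient is $\nabla_\theta l(\theta) = X^T(y - h)$ and the Hessian is the matrix $H$ above; the code solves $H\,\delta = \nabla_\theta l(\theta)$ rather than inverting $H$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, iters=10):
    """Newton's method for maximizing the logistic log-likelihood l(theta)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (y - h)                      # gradient of l(theta)
        H = -(X.T * (h * (1.0 - h))) @ X          # -sum_i h_i (1 - h_i) x_i x_i^T
        theta = theta - np.linalg.solve(H, grad)  # theta - H^{-1} grad
    return theta

# Toy 1-D data with an intercept column; the classes are not linearly
# separable, so the maximum-likelihood theta is finite.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])

theta = newton_logistic(X, y)
print(theta)
```

Each step needs an $n \times n$ solve, which is trivial here ($n = 2$) but is exactly the $O(n^3)$ cost listed under the cons.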

Model Properties

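Decision boundary: predicting $y = 1$ when $h_\theta(x) \ge 0.5$ is equivalent to $\theta^T x \ge 0$, so the decision boundary is the hyperplane $\theta^T x = 0$, which is linear in $x$.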
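Since $g$ is increasing and $g(0) = 0.5$, thresholding $h_\theta(x)$ at $0.5$ only needs the sign of $\theta^T x$. A tiny NumPy sketch with a hypothetical `predict` helper and a made-up $\theta$:

```python
import numpy as np

def predict(theta, X):
    """Predict y = 1 exactly when theta^T x >= 0, i.e. when h_theta(x) >= 0.5."""
    return (X @ theta >= 0).astype(int)

theta = np.array([-1.0, 2.0])   # boundary: -1 + 2x = 0, i.e. x = 0.5
X = np.array([[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])
print(predict(theta, X))        # [0 1 1]
```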
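Comparison with linear regression:
- Logistic regression is used for classification and outputs a probability; linear regression is used for regression and outputs a continuous value.
- Logistic regression has no closed-form solution and must be optimized iteratively; linear regression can be solved directly with the normal equation.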
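Regularization:
- Adding a penalty term to the negative log-likelihood helps prevent overfitting. With an L2 penalty (an L1 penalty would use $\lambda \|\theta\|_1$ instead):
  $$
  J(\theta) = -l(\theta) + \frac{\lambda}{2} \|\theta\|_2^2
  $$
- Gradient descent on $J$ then picks up a regularization term in the gradient: $\theta_j := \theta_j - \alpha \left( \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} + \lambda \theta_j \right)$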
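A sketch of L2-regularized logistic regression trained by gradient descent on $J(\theta) = -l(\theta) + \frac{\lambda}{2}\|\theta\|_2^2$, whose gradient is $\sum_i (h_\theta(x^{(i)}) - y^{(i)}) x^{(i)} + \lambda \theta$. NumPy is assumed; `fit_l2_logistic`, `lam`, `alpha`, the iteration count and the toy data are illustrative. For simplicity this penalizes the intercept too; implementations often leave $\theta_0$ unpenalized.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_l2_logistic(X, y, lam=1.0, alpha=0.05, iters=2000):
    """Gradient descent on J(theta) = -l(theta) + (lam / 2) * ||theta||^2.

    Update: theta_j := theta_j - alpha * (sum_i (h(x_i) - y_i) x_ij + lam * theta_j).
    Every component, including the intercept, is penalized here.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) + lam * theta
        theta = theta - alpha * grad
    return theta

# Linearly separable toy data: the unregularized log-likelihood has no
# maximizer here (it keeps rising as theta is scaled up), while the
# L2 term keeps the minimizer of J finite.
x = np.array([-2.0, -1.0, 1.0, 2.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = fit_l2_logistic(X, y, lam=1.0)
print(theta)
```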

Least Squares and Maximum Likelihood

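Gaussian error assumption: if $y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$ with errors $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$ drawn i.i.d., then the least-squares estimate coincides with the maximum likelihood estimate.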
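The equivalence follows from writing out the log-likelihood of this model, a standard derivation assuming $y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}$ with i.i.d. $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$ and $\sigma$ held fixed:

$$
\begin{aligned}
\ell(\theta) &= \log \prod_{i=1}^m \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right) \\
&= m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^m \left(y^{(i)} - \theta^T x^{(i)}\right)^2
\end{aligned}
$$

The first term does not depend on $\theta$, so maximizing $\ell(\theta)$ is the same as minimizing $J(\theta) = \frac{1}{2} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2$, whatever the value of $\sigma^2$.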
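Gauss-Markov theorem: if the errors have zero mean and constant variance (homoscedasticity), are uncorrelated with one another, and are uncorrelated with the regressors, then the least-squares estimator is the best linear unbiased estimator (BLUE), i.e. it has the smallest variance among all linear unbiased estimators. Normality of the errors is not required.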

Multiclass Extension

Softmax regression: for $K$ classes, the hypothesis is:
$$
h_\theta(x)_k = \frac{e^{\theta_k^T x}}{\sum_{j=1}^K e^{\theta_j^T x}}
$$
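Parameter estimation: maximize the log-likelihood; the gradient has the same form as in binary logistic regression, $\nabla_{\theta_k} l(\theta) = \sum_{i=1}^m \left(1\{y^{(i)} = k\} - h_\theta(x^{(i)})_k\right) x^{(i)}$.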
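A sketch of softmax regression trained by batch gradient ascent, assuming NumPy; `softmax`, `fit_softmax`, the step size, the iteration count and the three-cluster toy data are hypothetical. `Theta` stores one column $\theta_k$ per class, and the update uses the per-class gradient $\sum_i (1\{y^{(i)} = k\} - h_\theta(x^{(i)})_k)\, x^{(i)}$, mirroring the binary case.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; subtracting each row's max avoids overflow in exp."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax(X, y, K, alpha=0.05, iters=1000):
    """Batch gradient ascent on the softmax log-likelihood.

    Column k of Theta is theta_k; the gradient for all classes at once
    is X^T (Y - P), with Y the one-hot labels and P the predicted probabilities.
    """
    m, n = X.shape
    Theta = np.zeros((n, K))
    Y = np.eye(K)[y]              # one-hot labels, shape (m, K)
    for _ in range(iters):
        P = softmax(X @ Theta)    # P[i, k] = h_theta(x_i)_k
        Theta = Theta + alpha * X.T @ (Y - P)
    return Theta

# Three small 2-D clusters (class 0 near the origin, class 1 near (3, 0),
# class 2 near (0, 3)), with an intercept column prepended.
pts = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
                [3.0, 0.0], [3.5, 0.0], [3.0, 0.5],
                [0.0, 3.0], [0.5, 3.0], [0.0, 3.5]])
X = np.column_stack([np.ones(len(pts)), pts])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

Theta = fit_softmax(X, y, K=3)
P = softmax(X @ Theta)
print(P.argmax(axis=1))           # predicted class for each training point
```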
