Linear Regression

Hypothesis (based on the two-dimensional case)

Training set

\[ \begin{bmatrix} 1&x_{11}&{\cdots}&x_{1n}\\ 1&x_{21}&{\cdots}&x_{2n}\\ {\vdots}&{\vdots}&{\ddots}&{\vdots}\\ 1&x_{m1}&{\cdots}&x_{mn}\\ \end{bmatrix} \begin{bmatrix} \theta_{0}\\ \theta_{1}\\ {\vdots}\\ \theta_{n}\\ \end{bmatrix}= \begin{bmatrix} y_{1}\\ y_{2}\\ {\vdots}\\ y_{m}\\ \end{bmatrix} \]
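As a quick sketch (names like `X_raw` are mine, not from the text), the design matrix above, with its leading column of ones for \(\theta_0\), can be assembled like this:

```python
import torch

m, n = 4, 2                                       # m samples, n features (toy sizes)
X_raw = torch.randn(m, n)                         # the x_ij entries
X = torch.cat([torch.ones(m, 1), X_raw], dim=1)   # prepend the column of 1s
print(X.shape)                                    # torch.Size([4, 3]) == (m, n+1)
```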

Hypothesis function

\[ h_\theta(x)=\theta_0+\theta_1x_1+\cdots+\theta_nx_n \quad (x\ \text{is a feature vector}) \]

Define the cost function

\[ J(\theta_0,\theta_1,\ldots,\theta_n)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2 \]
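A minimal sketch of this cost in code (assuming `X` already carries the bias column and `theta`, `y` are column vectors; the helper name is mine):

```python
import torch

def cost(X, y, theta):
    # J(theta) = (1 / (2m)) * sum_i (h(x_i) - y_i)^2
    m = X.shape[0]
    residual = X @ theta - y
    return (residual ** 2).sum() / (2 * m)
```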

Gradient descent

\[ \min_{\theta_0,\ldots,\theta_n} J(\theta_0,\theta_1,\ldots,\theta_n) \]

\[ \theta_j := \theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1,\ldots,\theta_n) \quad (\alpha\ \text{is the learning rate}) \]

\[ \frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1,\ldots,\theta_n)=\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)} \]

Substituting gives

\[ \theta_j := \theta_j-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)} \]
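In vector form, all \(\theta_j\) can be updated at once; a sketch, assuming the bias column is already in `X`:

```python
import torch

def gd_step(X, y, theta, alpha):
    m = X.shape[0]
    grad = X.T @ (X @ theta - y) / m   # (1/m) * sum_i (h(x_i) - y_i) * x_i
    return theta - alpha * grad        # theta_j <- theta_j - alpha * dJ/dtheta_j
```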

Implementing gradient descent

Define a tolerance \(P\)

Iteration stops once \(|\theta_j^{(k)}-\theta_j^{(k-1)}|\le P\), i.e. when an update changes \(\theta_j\) by no more than \(P\).

Initialization

Set \((\theta_0,\theta_1,\ldots,\theta_n)\) to \((0,0,\ldots,0)\) or any other starting point.

Iterate

Each \(\theta_j\) changes step by step until \(J(\theta)\) reaches its minimum.
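Putting initialization, the update, and the tolerance test together (a minimal sketch; `gd_step` is the helper above, and the iteration cap is my own safeguard):

```python
import torch

def gradient_descent(X, y, alpha=0.1, P=1e-6, max_iters=10_000):
    theta = torch.zeros(X.shape[1], 1)            # start from (0, 0, ..., 0)
    for _ in range(max_iters):
        new_theta = gd_step(X, y, theta, alpha)   # one descent step
        if (new_theta - theta).abs().max() <= P:  # change within tolerance P: stop
            return new_theta
        theta = new_theta
    return theta
```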

Why does this work?

(Figure: a one-dimensional slice of \(J(\theta)\) along \(\theta_j\), with minimum at \(h\) and sample points \(x_1\), \(x_2\))

The figure shows a slice of \(J(\theta)\) with \((\theta_0,\ldots,\theta_{j-1},\theta_{j+1},\ldots,\theta_n)\) held fixed. When \(\theta_j=x_1\), \(\frac{\partial}{\partial\theta_j}J(\theta)<0\), so the update increases \(\theta_j\) and moves it toward the minimum \(h\); the tolerance \(P\) decides when to stop iterating.

The case \(\theta_j=x_2\) is symmetric.

Learning rate \(\alpha\)

Too large, and gradient descent may fail to converge: each step can overshoot the minimum and oscillate or diverge.


Too small, and convergence is too slow. So run several experiments and pick a suitable \(\alpha\), e.g. from

\[ \alpha\in\{0.001,\ 0.01,\ 0.1,\ 1,\ 10\} \]
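A sweep over that grid might look like this (a sketch reusing the `gradient_descent` helper above; the synthetic data is made up). With \(\alpha=10\) on this data the iterates blow up, which is exactly the "too large" failure mode:

```python
import torch

torch.manual_seed(0)
m = 100
X = torch.cat([torch.ones(m, 1), torch.randn(m, 2)], dim=1)
y = X @ torch.tensor([[1.7], [4.5], [6.0]])

for alpha in [0.001, 0.01, 0.1, 1, 10]:
    theta = gradient_descent(X, y, alpha=alpha)
    J = ((X @ theta - y) ** 2).sum() / (2 * m)
    print(f'alpha={alpha}: final cost {J.item():.3g}')
```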

### Feature Scaling

When one feature's raw values are much larger than the others', the cost surface is badly elongated and gradient descent converges slowly; rescale each feature to a similar range, e.g.

\[ \begin{bmatrix}x_{1i}\\x_{2i}\\{\vdots}\\x_{ni}\\\end{bmatrix}=\begin{bmatrix}1000\\1001\\{\vdots}\\1020\\\end{bmatrix}\rightarrow 1000\times\begin{bmatrix}1.000\\1.001\\{\vdots}\\1.020\\\end{bmatrix} \]
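One common concrete scheme is mean normalization (a sketch; the exact scaling rule is my choice, the text only shows factoring out the magnitude):

```python
import torch

x = torch.tensor([1000., 1001., 1005., 1020.])
x_scaled = (x - x.mean()) / (x.max() - x.min())  # centered, roughly in [-0.5, 0.5]
print(x_scaled)
```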

The same machinery also covers polynomial terms: for a hypothesis \(h_\theta=\theta_0+\theta_1x_1^2+\theta_2x_2^3+\cdots\)

we can let

\[ x_1^2=t_1,\quad x_2^3=t_2,\ \ldots \]

and the problem becomes ordinary linear regression in \(t_1, t_2, \ldots\)
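The substitution in code (a sketch with made-up data); after it, `X` is an ordinary linear-regression design matrix:

```python
import torch

m = 100
x1, x2 = torch.randn(m, 1), torch.randn(m, 1)
t1, t2 = x1 ** 2, x2 ** 3                         # x1^2 = t1, x2^3 = t2
X = torch.cat([torch.ones(m, 1), t1, t2], dim=1)  # linear in t1, t2
```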

Normal equation

\[ \theta=(X^TX)^{-1}X^TY \quad (X^T\ \text{is the transpose of}\ X,\ (X^TX)^{-1}\ \text{the matrix inverse}) \]
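A sketch of the closed form (the synthetic data is mine; in practice `torch.linalg.lstsq` is numerically safer than an explicit inverse):

```python
import torch

torch.manual_seed(0)
m = 100
X = torch.cat([torch.ones(m, 1), torch.randn(m, 2)], dim=1)
Y = X @ torch.tensor([[1.7], [4.5], [6.0]])

theta = torch.linalg.inv(X.T @ X) @ X.T @ Y    # theta = (X^T X)^{-1} X^T Y
# theta = torch.linalg.lstsq(X, Y).solution    # equivalent, better conditioned
print(theta)
```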

d2l: implementation from scratch

```python
import torch
import random

# Generate synthetic data: y = x·w + b + Gaussian noise
def normal_data(w, b, n):
    x = torch.normal(0, 1, (n, len(w)))
    y = torch.matmul(x, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return x, y.reshape((-1, 1))

# Yield shuffled minibatches of (features, labels)
def batch_data(x, y, batch_size):
    n = len(x)
    indices = list(range(n))
    random.shuffle(indices)
    for i in range(0, n, batch_size):
        batch_indices = torch.tensor(indices[i: min(i + batch_size, n)])
        yield x[batch_indices], y[batch_indices]

# The model: linear regression
def linreg(w, b, x):
    return torch.matmul(x, w) + b

# Squared loss
def squared_loss(y, y_hat):
    return (y.reshape(y_hat.shape) - y_hat) ** 2 / 2

# Minibatch stochastic gradient descent
def sgd(params, lr, batch_size):
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

# Generate the training data
n = 1000
true_w = torch.tensor([4.5, 6])
true_b = torch.tensor([1.7])
x, y = normal_data(true_w, true_b, n)

# Hyperparameters
lr = 0.03
num_epochs = 10
net = linreg
loss = squared_loss

# Initialize the parameters to be learned
batch_size = 50
w = torch.normal(0, 1, [2, 1], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

# Training loop
for i in range(num_epochs):
    for x_batch, y_batch in batch_data(x, y, batch_size):
        l = loss(y_batch, net(w, b, x_batch))
        l.sum().backward()
        sgd([w, b], lr, batch_size)

    with torch.no_grad():
        l = loss(y, net(w, b, x))
        print('epoch:{},loss:{}'.format(i, l.mean()))
```
```
epoch:0,loss:5.008117198944092
epoch:1,loss:1.4251900911331177
epoch:2,loss:0.4058946669101715
epoch:3,loss:0.11567537486553192
epoch:4,loss:0.0330023355782032
epoch:5,loss:0.009440700523555279
epoch:6,loss:0.0027248619589954615
epoch:7,loss:0.0008111481438390911
epoch:8,loss:0.0002640637394506484
epoch:9,loss:0.0001081191876437515
w,b:
(tensor([[4.4979], [5.9894]], requires_grad=True),
 tensor([1.6971], requires_grad=True))
```

Q&A

1. Why does `-=` work in `sgd` while `=` does not? `-=` and `zero_()` (functions whose names end with an underscore) are PyTorch in-place operations: they modify the tensor at its existing address, so the references held by the training loop stay valid, as the check below shows.
    ```python
    import torch

    a = torch.arange(5, dtype=torch.float32, requires_grad=True)
    b = a * a
    b.sum().backward()

    print(id(a))
    a.data -= a.grad   # in-place update: the address is unchanged
    print(id(a))

    a.grad.zero_()     # in-place zeroing: still the same address
    print(id(a))
    ```

    ```
    2076247332328
    2076247332328
    2076247332328
    ```
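
    For contrast (my own counter-example, not from the original post), rebinding with plain `=` produces a new tensor at a new address, which is why it breaks the update:

    ```python
    import torch

    a = torch.arange(5, dtype=torch.float32, requires_grad=True)
    b = a * a
    b.sum().backward()

    print(id(a))
    a = a - a.grad   # '=' builds a fresh tensor; external references still see the old one
    print(id(a))     # a different id
    ```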

Unset git proxy

```bash
git config --global --unset http.proxy
git config --global --unset https.proxy
```