Gradient Descent

\(w_{t+1} = w_t - \eta \nabla L(w_t)\), where \(\eta\) is the learning rate and \(L\) is the loss function.

Stochastic Gradient Descent (SGD)
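
Instead of computing the gradient over the full dataset, SGD estimates it from a single sample or a mini-batch \(\mathcal{B}\) at each step; a common formulation:

\(w_{t+1} = w_t - \eta \nabla L_{\mathcal{B}}(w_t)\)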

Momentum
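
Momentum accumulates an exponentially decaying average of past gradients to damp oscillations and speed up progress along consistent directions. One common formulation, with momentum coefficient \(\mu\) (e.g. 0.9) and \(g_t = \nabla L(w_t)\):

\(v_{t+1} = \mu v_t + g_t\), \(\quad w_{t+1} = w_t - \eta v_{t+1}\)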

Nesterov
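
Nesterov momentum evaluates the gradient at the look-ahead point \(w_t + \mu v_t\) rather than at \(w_t\), which tends to correct the velocity earlier. One standard form (sign conventions vary across references):

\(v_{t+1} = \mu v_t - \eta \nabla L(w_t + \mu v_t)\), \(\quad w_{t+1} = w_t + v_{t+1}\)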

AdaGrad
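
AdaGrad scales each parameter's step by the inverse square root of its accumulated squared gradients, so frequently updated parameters take smaller steps:

\(G_t = G_{t-1} + g_t^2\), \(\quad w_{t+1} = w_t - \frac{\eta}{\sqrt{G_t} + \epsilon}\, g_t\)

where \(\epsilon\) is a small constant for numerical stability. Because \(G_t\) only grows, the effective learning rate decays monotonically, which can stall training late on.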

RMSprop
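
RMSprop replaces AdaGrad's ever-growing sum with an exponential moving average, so the effective learning rate no longer shrinks toward zero; with decay rate \(\rho\) (e.g. 0.99):

\(E[g^2]_t = \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2\), \(\quad w_{t+1} = w_t - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon}\, g_t\)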

Adam
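
Adam combines momentum (a first-moment estimate) with RMSprop-style second-moment scaling, plus bias correction for the zero-initialized averages. In its standard form, with defaults \(\beta_1 = 0.9\), \(\beta_2 = 0.999\):

\(m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t\), \(\quad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2\)

\(\hat{m}_t = \frac{m_t}{1-\beta_1^t}\), \(\quad \hat{v}_t = \frac{v_t}{1-\beta_2^t}\), \(\quad w_{t+1} = w_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t\)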

https://pytorch.org/docs/stable/generated/torch.optim.Adam.html
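
A minimal PyTorch usage sketch following the docs linked above; the model, data, and step count here are hypothetical placeholders for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical toy model and data, just to drive the optimizer
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8)

x = torch.randn(32, 10)  # batch of 32 samples, 10 features
y = torch.randn(32, 1)

for step in range(100):
    optimizer.zero_grad()             # clear accumulated gradients
    loss = criterion(model(x), y)
    loss.backward()                   # compute gradients
    optimizer.step()                  # apply the Adam update
```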

Learning-Rate Scheduling: StepLR
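
StepLR multiplies the learning rate by a factor `gamma` every `step_size` epochs. A minimal sketch (the model and the 90-epoch count are arbitrary placeholders; the training loop body is elided):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # hypothetical placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
# Decay lr by a factor of 0.1 every 30 epochs: 0.1 -> 0.01 -> 0.001 ...
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... one epoch of training (forward, backward, optimizer.step()) ...
    scheduler.step()  # advance the schedule once per epoch, after optimizer.step()
```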