
- Data flow between layers: the forward and backward passes can be viewed as tensors (multi-dimensional arrays) flowing through the network layers (inputs and outputs flow forward, gradients flow backward); each layer performs its own computation and passes the result on to the next layer.
- Loss computation: the step that connects the forward and backward passes; it quantifies the gap between the model's output and the ground truth and supplies the information needed for back-propagation.
- Parameter update: the computations that apply the gradients obtained above to the network parameters.
- tensor: the tensor is the basic unit of data in a neural network.
- layer: a network layer receives the input from the previous layer, performs its own computation, and passes the result to the next layer. Because tensors flow in two directions, forward and backward, every layer type has to implement both a forward and a backward operation.
- loss: given the model's predictions and the ground truth, this component produces the loss value and the gradient with respect to the last layer (used to start back-propagation).
- optimizer: uses the gradients to update the model's parameters.
- The net component manages the forward and backward flow of tensors through the layers, and exposes interfaces for getting parameters, setting parameters, and getting gradients.
- The model component ties everything together into the full pipeline: the net component runs the forward pass -> the loss component computes the loss and the gradient -> the net component back-propagates the gradient -> the optimizer component turns the gradients into parameter updates.

# define model
net = Net([layer1, layer2, ...])
model = Model(net, loss_fn, optimizer)
# training
pred = model.forward(train_X)
loss, grads = model.backward(pred, train_Y)
model.apply_grad(grads)
# inference
test_pred = model.forward(test_X)
tensor
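The core directory shown later has no tensor.py: in the code below, tensors are assumed to simply be numpy.ndarray objects, which already provide the matrix multiplication, broadcasting, and elementwise operations the layers need. A minimal sketch under that assumption:

import numpy as np

x = np.random.randn(32, 784)          # a batch of 32 flattened images
w = np.random.randn(784, 400) * 0.01  # a weight "tensor"
y = x @ w                             # a linear layer's forward pass: shape (32, 400)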
layer
# layer.py
import numpy as np


class Layer(object):
    def __init__(self, name):
        self.name = name
        # layers without trainable parameters keep these as empty dicts
        self.params, self.grads = {}, {}

    def forward(self, inputs):
        raise NotImplementedError

    def backward(self, grad):
        raise NotImplementedError
# layer.py
class Dense(Layer):
    def __init__(self, num_in, num_out,
                 w_init=XavierUniformInit(),
                 b_init=ZerosInit()):
        super().__init__("Linear")
        self.params = {
            "w": w_init([num_in, num_out]),
            "b": b_init([1, num_out])}
        self.inputs = None

    def forward(self, inputs):
        self.inputs = inputs
        return inputs @ self.params["w"] + self.params["b"]

    def backward(self, grad):
        self.grads["w"] = self.inputs.T @ grad
        self.grads["b"] = np.sum(grad, axis=0)
        return grad @ self.params["w"].T
# layer.py
class Activation(Layer):
    """Base activation layer"""
    def __init__(self, name):
        super().__init__(name)
        self.inputs = None

    def forward(self, inputs):
        self.inputs = inputs
        return self.func(inputs)

    def backward(self, grad):
        return self.derivative_func(self.inputs) * grad

    def func(self, x):
        raise NotImplementedError

    def derivative_func(self, x):
        raise NotImplementedError


class ReLU(Activation):
    """ReLU activation function"""
    def __init__(self):
        super().__init__("ReLU")

    def func(self, x):
        return np.maximum(x, 0.0)

    def derivative_func(self, x):
        return x > 0.0
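The forward/backward contract above can be checked numerically. The snippet below is only a sanity-check sketch (not part of tinynn): it gives a Dense layer ad-hoc initializers, runs forward and backward with an all-ones upstream gradient, and compares one entry of grads["w"] against a finite difference.

# quick gradient check for Dense.backward (illustrative sketch, not part of tinynn)
np.random.seed(0)
layer = Dense(3, 2,
              w_init=lambda shape: np.random.randn(*shape),  # ad-hoc initializers
              b_init=lambda shape: np.zeros(shape))
x = np.random.randn(4, 3)

out = layer.forward(x)              # shape (4, 2)
layer.backward(np.ones_like(out))   # pretend dL/dout is all ones

# numerical gradient of sum(out) w.r.t. w[0, 0]
eps = 1e-6
layer.params["w"][0, 0] += eps
numeric = (np.sum(layer.forward(x)) - np.sum(out)) / eps
layer.params["w"][0, 0] -= eps

print(np.allclose(numeric, layer.grads["w"][0, 0], atol=1e-4))   # True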
net
# net.py
class Net(object):
    def __init__(self, layers):
        self.layers = layers

    def forward(self, inputs):
        for layer in self.layers:
            inputs = layer.forward(inputs)
        return inputs

    def backward(self, grad):
        all_grads = []
        for layer in reversed(self.layers):
            grad = layer.backward(grad)
            all_grads.append(layer.grads)
        return all_grads[::-1]

    def get_params_and_grads(self):
        for layer in self.layers:
            yield layer.params, layer.grads

    def get_parameters(self):
        return [layer.params for layer in self.layers]

    def set_parameters(self, params):
        for i, layer in enumerate(self.layers):
            for key in layer.params.keys():
                layer.params[key] = params[i][key]
losses
# loss.py
import numpy as np


class BaseLoss(object):
    def loss(self, predicted, actual):
        raise NotImplementedError

    def grad(self, predicted, actual):
        raise NotImplementedError


class CrossEntropyLoss(BaseLoss):
    def loss(self, predicted, actual):
        m = predicted.shape[0]
        exps = np.exp(predicted - np.max(predicted, axis=1, keepdims=True))
        p = exps / np.sum(exps, axis=1, keepdims=True)
        nll = -np.log(np.sum(p * actual, axis=1))
        return np.sum(nll) / m

    def grad(self, predicted, actual):
        m = predicted.shape[0]
        # gradient w.r.t. the logits: softmax(predicted) - actual, averaged over the batch
        exps = np.exp(predicted - np.max(predicted, axis=1, keepdims=True))
        p = exps / np.sum(exps, axis=1, keepdims=True)
        return (p - actual) / m
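The grad method returns the derivative of the loss with respect to the logits, softmax(predicted) - actual, averaged over the batch. As a sanity check (again a sketch, not part of tinynn), it can be compared against a finite difference of the loss:

# finite-difference check for CrossEntropyLoss.grad (illustrative sketch)
np.random.seed(0)
logits = np.random.randn(4, 3)
labels = np.eye(3)[[0, 2, 1, 1]]    # one-hot targets

ce = CrossEntropyLoss()
analytic = ce.grad(logits, labels)

eps = 1e-6
i, j = 2, 1                         # check a single entry
perturbed = logits.copy()
perturbed[i, j] += eps
numeric = (ce.loss(perturbed, labels) - ce.loss(logits, labels)) / eps

print(np.allclose(numeric, analytic[i, j], atol=1e-5))   # True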
optimizer
# optimizer.py
import numpy as np


class BaseOptimizer(object):
    def __init__(self, lr, weight_decay):
        self.lr = lr
        self.weight_decay = weight_decay

    def compute_step(self, grads, params):
        step = list()
        # flatten all gradients
        flatten_grads = np.concatenate(
            [np.ravel(v) for grad in grads for v in grad.values()])
        # compute step
        flatten_step = self._compute_step(flatten_grads)
        # reshape gradients
        p = 0
        for param in params:
            layer = dict()
            for k, v in param.items():
                block = np.prod(v.shape)
                _step = flatten_step[p:p+block].reshape(v.shape)
                _step -= self.weight_decay * v
                layer[k] = _step
                p += block
            step.append(layer)
        return step

    def _compute_step(self, grad):
        raise NotImplementedError


class Adam(BaseOptimizer):
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999,
                 eps=1e-8, weight_decay=0.0):
        super().__init__(lr, weight_decay)
        self._b1, self._b2 = beta1, beta2
        self._eps = eps
        self._t = 0
        self._m, self._v = 0, 0

    def _compute_step(self, grad):
        self._t += 1
        self._m = self._b1 * self._m + (1 - self._b1) * grad
        self._v = self._b2 * self._v + (1 - self._b2) * (grad ** 2)
        # bias correction
        _m = self._m / (1 - self._b1 ** self._t)
        _v = self._v / (1 - self._b2 ** self._t)
        return -self.lr * _m / (_v ** 0.5 + self._eps)
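Because compute_step handles the flatten/reshape bookkeeping, a new optimizer only has to map a flat gradient vector to a flat update vector in _compute_step. As an illustration of the interface (a sketch, not code from the article), plain SGD is a one-liner:

# optimizer.py (illustrative sketch: vanilla SGD under the same interface)
class SGD(BaseOptimizer):
    def __init__(self, lr=0.01, weight_decay=0.0):
        super().__init__(lr, weight_decay)

    def _compute_step(self, grad):
        # step against the gradient, scaled by the learning rate
        return -self.lr * grad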
model
# model.py
class Model(object):
    def __init__(self, net, loss, optimizer):
        self.net = net
        self.loss = loss
        self.optimizer = optimizer

    def forward(self, inputs):
        return self.net.forward(inputs)

    def backward(self, preds, targets):
        loss = self.loss.loss(preds, targets)
        grad = self.loss.grad(preds, targets)
        grads = self.net.backward(grad)
        params = self.net.get_parameters()
        step = self.optimizer.compute_step(grads, params)
        return loss, step

    def apply_grad(self, grads):
        for grad, (param, _) in zip(grads, self.net.get_params_and_grads()):
            for k, v in param.items():
                param[k] += grad[k]
tinynn
├── core
│ ├── initializer.py
│ ├── layer.py
│ ├── loss.py
│ ├── model.py
│ ├── net.py
│ └── optimizer.py
- Dataset: MNIST (http://yann.lecun.com/exdb/mnist/)
- Task type: multi-class classification
- Network architecture: three fully connected layers, INPUT(784) -> FC(400) -> FC(100) -> OUTPUT(10). The network takes an input of shape (N, 784), where N is the number of samples per batch and 784 is the flattened vector of each 28x28 image; the output has shape (N, 10), where N is the number of samples and 10 gives the probabilities of the image over the 10 classes.
- Activation function: ReLU
- Loss function: SoftmaxCrossEntropy
- optimizer: Adam(lr=1e-3)
- batch_size: 128
- Num_epochs: 20
# example/mnist/run.py
net = Net([
    Dense(784, 400),
    ReLU(),
    Dense(400, 100),
    ReLU(),
    Dense(100, 10)
])
model = Model(net=net, loss=SoftmaxCrossEntropyLoss(), optimizer=Adam(lr=args.lr))

iterator = BatchIterator(batch_size=args.batch_size)
evaluator = AccEvaluator()
for epoch in range(num_ep):
    for batch in iterator(train_x, train_y):
        # training
        pred = model.forward(batch.inputs)
        loss, grads = model.backward(pred, batch.targets)
        model.apply_grad(grads)
    # evaluate every epoch
    test_pred = model.forward(test_x)
    test_pred_idx = np.argmax(test_pred, axis=1)
    test_y_idx = np.asarray(test_y)
    res = evaluator.evaluate(test_pred_idx, test_y_idx)
    print(res)
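BatchIterator and AccEvaluator are small utilities from tinynn that the article does not show. The sketch below (names and fields assumed from the usage above, not tinynn's actual implementation) gives a rough idea of what the iterator does: shuffle the data each epoch and yield mini-batches with .inputs and .targets fields.

# rough sketch of a batch iterator (assumed from its usage above, not tinynn's actual code)
from collections import namedtuple
import numpy as np

Batch = namedtuple("Batch", ["inputs", "targets"])

class BatchIterator(object):
    def __init__(self, batch_size=128, shuffle=True):
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __call__(self, inputs, targets):
        indices = np.arange(len(inputs))
        if self.shuffle:
            np.random.shuffle(indices)
        for start in range(0, len(inputs), self.batch_size):
            idx = indices[start:start + self.batch_size]
            yield Batch(inputs=inputs[idx], targets=targets[idx])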
# tinynn
Epoch 0 {'total_num': 10000, 'hit_num': 9658, 'accuracy': 0.9658}
Epoch 1 {'total_num': 10000, 'hit_num': 9740, 'accuracy': 0.974}
Epoch 2 {'total_num': 10000, 'hit_num': 9783, 'accuracy': 0.9783}
Epoch 3 {'total_num': 10000, 'hit_num': 9799, 'accuracy': 0.9799}
Epoch 4 {'total_num': 10000, 'hit_num': 9805, 'accuracy': 0.9805}
Epoch 5 {'total_num': 10000, 'hit_num': 9826, 'accuracy': 0.9826}
Epoch 6 {'total_num': 10000, 'hit_num': 9823, 'accuracy': 0.9823}
Epoch 7 {'total_num': 10000, 'hit_num': 9819, 'accuracy': 0.9819}
Epoch 8 {'total_num': 10000, 'hit_num': 9820, 'accuracy': 0.982}
Epoch 9 {'total_num': 10000, 'hit_num': 9838, 'accuracy': 0.9838}
Epoch 10 {'total_num': 10000, 'hit_num': 9825, 'accuracy': 0.9825}
Epoch 11 {'total_num': 10000, 'hit_num': 9810, 'accuracy': 0.981}
Epoch 12 {'total_num': 10000, 'hit_num': 9845, 'accuracy': 0.9845}
Epoch 13 {'total_num': 10000, 'hit_num': 9845, 'accuracy': 0.9845}
Epoch 14 {'total_num': 10000, 'hit_num': 9835, 'accuracy': 0.9835}
Epoch 15 {'total_num': 10000, 'hit_num': 9817, 'accuracy': 0.9817}
Epoch 16 {'total_num': 10000, 'hit_num': 9815, 'accuracy': 0.9815}
Epoch 17 {'total_num': 10000, 'hit_num': 9835, 'accuracy': 0.9835}
Epoch 18 {'total_num': 10000, 'hit_num': 9826, 'accuracy': 0.9826}
Epoch 19 {'total_num': 10000, 'hit_num': 9819, 'accuracy': 0.9819}
# Tensorflow 1.13.1
Epoch 0 {'total_num': 10000, 'hit_num': 9591, 'accuracy': 0.9591}
Epoch 1 {'total_num': 10000, 'hit_num': 9734, 'accuracy': 0.9734}
Epoch 2 {'total_num': 10000, 'hit_num': 9706, 'accuracy': 0.9706}
Epoch 3 {'total_num': 10000, 'hit_num': 9756, 'accuracy': 0.9756}
Epoch 4 {'total_num': 10000, 'hit_num': 9722, 'accuracy': 0.9722}
Epoch 5 {'total_num': 10000, 'hit_num': 9772, 'accuracy': 0.9772}
Epoch 6 {'total_num': 10000, 'hit_num': 9774, 'accuracy': 0.9774}
Epoch 7 {'total_num': 10000, 'hit_num': 9789, 'accuracy': 0.9789}
Epoch 8 {'total_num': 10000, 'hit_num': 9766, 'accuracy': 0.9766}
Epoch 9 {'total_num': 10000, 'hit_num': 9763, 'accuracy': 0.9763}
Epoch 10 {'total_num': 10000, 'hit_num': 9791, 'accuracy': 0.9791}
Epoch 11 {'total_num': 10000, 'hit_num': 9773, 'accuracy': 0.9773}
Epoch 12 {'total_num': 10000, 'hit_num': 9804, 'accuracy': 0.9804}
Epoch 13 {'total_num': 10000, 'hit_num': 9782, 'accuracy': 0.9782}
Epoch 14 {'total_num': 10000, 'hit_num': 9800, 'accuracy': 0.98}
Epoch 15 {'total_num': 10000, 'hit_num': 9837, 'accuracy': 0.9837}
Epoch 16 {'total_num': 10000, 'hit_num': 9811, 'accuracy': 0.9811}
Epoch 17 {'total_num': 10000, 'hit_num': 9793, 'accuracy': 0.9793}
Epoch 18 {'total_num': 10000, 'hit_num': 9818, 'accuracy': 0.9818}
Epoch 19 {'total_num': 10000, 'hit_num': 9811, 'accuracy': 0.9811}

- layer: fully connected, 2D convolution, 2D transposed convolution, MaxPooling, Dropout, BatchNormalization, and RNN layers, plus activation functions such as ReLU, Sigmoid, Tanh, LeakyReLU, and SoftPlus
- loss: SigmoidCrossEntropy, SoftmaxCrossEntropy, MSE, MAE, Huber
- optimizer: RAdam, Adam, SGD, RMSProp, Momentum, and other optimizers, plus an LRScheduler for dynamically adjusting the learning rate
- Common models implemented as examples: mnist (classification), nn_paint (regression), DQN (reinforcement learning), AutoEncoder and DCGAN (unsupervised). See tinynn/examples: https://github.com/borgwang/tinynn/tree/master/examples
- Deep Learning, Goodfellow, et al. (2016)
- Joel Grus – Livecoding Madness – Let's Build a Deep Learning Library
- TensorFlow Documentation
- PyTorch Documentation

