《PyTorch深度学习实践》12. Recurrent Neural Networks (Basics)

Recurrent Neural Networks (Basics)

RNNs are well suited to sequence problems.

[Figure: RNN applied to sequence data]

RNN Cell and RNN

[Figure: RNN Cell and the unrolled RNN]

# Note: despite the name, this is a full RNN (it loops over the sequence
# internally), not a single cell. num_layers is the number of stacked
# recurrent layers.
cell = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers)
out, hidden = cell(inputs, hidden)

The input and output shapes in detail:

[Figure: input/output tensor shapes of torch.nn.RNN]

The case when num_layers=3:

[Figure: a stacked RNN with num_layers=3]
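A minimal runnable sketch of these shapes (the sizes below are illustrative, not from the lecture):

import torch

seq_len, batch_size = 5, 3
input_size, hidden_size, num_layers = 4, 8, 3

rnn = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
                   num_layers=num_layers)

inputs = torch.randn(seq_len, batch_size, input_size)      # (seqLen, batchSize, inputSize)
hidden = torch.zeros(num_layers, batch_size, hidden_size)  # (numLayers, batchSize, hiddenSize)

out, hidden = rnn(inputs, hidden)
print(out.shape)     # torch.Size([5, 3, 8])  -> (seqLen, batchSize, hiddenSize)
print(hidden.shape)  # torch.Size([3, 3, 8])  -> (numLayers, batchSize, hiddenSize)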

Representing RNN inputs and outputs as vectors

Text can be represented with one-hot vectors.

[Figure: one-hot encoding of the input characters]

The loss at each RNN Cell can be computed with cross-entropy:

[Figure: per-cell cross-entropy loss]
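For a single step, the cell's output is treated as a vector of unnormalized class scores, and CrossEntropyLoss (which combines log-softmax and negative log-likelihood) compares it against the index of the correct next character. A minimal sketch with made-up numbers:

import torch

hidden = torch.tensor([[0.1, 0.2, 0.7, 0.0]])  # (batchSize=1, numClasses=4): scores from one cell
label = torch.LongTensor([2])                  # index of the correct character

criterion = torch.nn.CrossEntropyLoss()
print(criterion(hidden, label).item())         # per-step cross-entropy loss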

The code

Using RNNCell

class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size):
        super(Model, self).__init__()
        # self.num_layers = num_layers
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.rnncell = torch.nn.RNNCell(input_size=self.input_size,
                                        hidden_size=self.hidden_size)

    def forward(self, input, hidden):
        hidden = self.rnncell(input, hidden)
        return hidden

    def init_hidden(self):
        # The initial hidden state h0 is all zeros: (batchSize, hiddenSize)
        return torch.zeros(self.batch_size, self.hidden_size)

net = Model(input_size, hidden_size, batch_size)
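The snippet above assumes input_size, hidden_size, and batch_size are already defined, and the training loop below additionally assumes inputs, labels, and idx2char. A minimal preparation sketch for the lecture's "hello" → "ohlol" example (treat the exact values as assumptions):

import torch

input_size = 4   # 4 distinct characters
hidden_size = 4
batch_size = 1

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]   # "hello"
y_data = [3, 1, 2, 3, 2]   # "ohlol"

one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]

inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)  # (seqLen, batchSize, inputSize)
labels = torch.LongTensor(y_data).view(-1, 1)                      # (seqLen, 1)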

The training loop

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)

for epoch in range(15):
    loss = 0
    optimizer.zero_grad()
    hidden = net.init_hidden()
    print('Predicted string: ', end='')
    for input, label in zip(inputs, labels):  # Shape of inputs: (seqLen, batchSize, inputSize)
        # Shape of input: (batchSize, inputSize)
        hidden = net(input, hidden)
        # Note: no .item() here -- the losses are summed as tensors, so the
        # computational graph covers the joint loss of all the outputs in the
        # sequence (see the figure below)
        loss += criterion(hidden, label)
        _, idx = hidden.max(dim=1)
        print(idx2char[idx.item()], end='')
    loss.backward()
    optimizer.step()
    print(', Epoch [%d/15] loss=%.4f' % (epoch + 1, loss.item()))

[Figure: the sequence loss is the sum of the per-cell losses]

Using RNN directly

This is considerably simpler than using RNNCell: there is no need to loop over the inputs by hand. You provide the initial $h_0$ and the whole input sequence, and a single call handles the rest.

class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(Model, self).__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.rnn = torch.nn.RNN(input_size=self.input_size,
                                hidden_size=self.hidden_size,
                                num_layers=num_layers)

    def forward(self, input):
        # h0 is all zeros: (numLayers, batchSize, hiddenSize)
        hidden = torch.zeros(self.num_layers,
                             self.batch_size,
                             self.hidden_size)
        out, _ = self.rnn(input, hidden)
        # Flatten (seqLen, batchSize, hiddenSize) to (seqLen * batchSize, hiddenSize)
        # so CrossEntropyLoss can be applied to the whole sequence in one call
        return out.view(-1, self.hidden_size)

net = Model(input_size, hidden_size, batch_size, num_layers)

The training loop

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)

for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)              # Shape of inputs: (seqLen, batchSize, inputSize)
    loss = criterion(outputs, labels)  # Shape of outputs: (seqLen * batchSize, hiddenSize);
                                       # labels must be flattened to (seqLen * batchSize,)
    loss.backward()
    optimizer.step()

    _, idx = outputs.max(dim=1)
    idx = idx.data.numpy()
    print('Predicted: ', ''.join([idx2char[x] for x in idx]), end='')
    print(', Epoch [%d/15] loss = %.3f' % (epoch + 1, loss.item()))

Embedding

Some problems with one-hot vectors:

  • The dimensionality is too high
  • The vectors are sparse
  • The encoding is hard-coded rather than learned

The solution: Embedding.

The network after adding an Embedding layer:

[Figure: the network with Embedding and Linear layers]

The Linear layer is there to make the output dimension match that of the labels.
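As a quick sketch of what the embedding does (sizes are illustrative): nn.Embedding is a learnable lookup table that maps integer indices straight to dense vectors, so the inputs no longer need to be one-hot encoded.

import torch

emb = torch.nn.Embedding(num_embeddings=4, embedding_dim=10)  # 4 tokens -> 10-d vectors

x = torch.LongTensor([[1, 0, 2, 2, 3]])  # (batch=1, seqLen=5): plain indices, not one-hot
print(emb(x).shape)                      # torch.Size([1, 5, 10])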

The model with the Embedding and Linear layers added:

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        self.rnn = torch.nn.RNN(input_size=embedding_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers,
                                batch_first=True)  # inputs/outputs are (batch, seqLen, ...)
        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        x = self.emb(x)             # (batch, seqLen, embeddingSize)
        x, _ = self.rnn(x, hidden)  # (batch, seqLen, hiddenSize)
        x = self.fc(x)              # (batch, seqLen, numClass)
        return x.view(-1, num_class)

net = Model()
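The hyperparameters and data this model expects are not shown above; a minimal setup consistent with the same "hello" → "ohlol" task (the specific sizes are assumptions):

import torch

num_class = 4       # number of character classes
input_size = 4      # vocabulary size
embedding_size = 10
hidden_size = 8
num_layers = 2
batch_size = 1
seq_len = 5

idx2char = ['e', 'h', 'l', 'o']
x_data = [[1, 0, 2, 2, 3]]  # (batch, seqLen): "hello"
y_data = [3, 1, 2, 3, 2]    # (batch * seqLen): "ohlol"

# With batch_first=True and an embedding layer, the inputs are plain index tensors
inputs = torch.LongTensor(x_data)
labels = torch.LongTensor(y_data)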

LSTM and GRU

[Figure: LSTM]

[Figure: GRU]

GRU is more computationally efficient than LSTM (fewer gates, fewer parameters).
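Both are drop-in replacements for torch.nn.RNN in the code above; a quick shape sketch (sizes are illustrative):

import torch

seq_len, batch_size, input_size, hidden_size = 5, 3, 4, 8
x = torch.randn(seq_len, batch_size, input_size)

lstm = torch.nn.LSTM(input_size, hidden_size)
out, (h_n, c_n) = lstm(x)  # LSTM also carries a cell state c alongside h
print(out.shape)           # torch.Size([5, 3, 8])

gru = torch.nn.GRU(input_size, hidden_size)
out, h_n = gru(x)          # GRU has no separate cell state
print(out.shape)           # torch.Size([5, 3, 8])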

Run it on Colab

Course source: 《PyTorch深度学习实践》完结合集 (the complete lecture series)