Error description

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Solution

Add the following at the very top of your code, before any CUDA work starts:

import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

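Then run the file again. With kernel launches now synchronous, the stack trace points to the line that actually triggered the error.
The cause is most likely one of the following: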

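  • Classification: the labels do not match the number of classes (a label falls outside the range the model outputs)
  • The loss is NaN
  • An index is out of range, and similar cases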
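
The first two cases can often be caught on the host before a kernel is ever launched. A minimal sketch, assuming integer class labels with no `ignore_index` sentinel; the helper names `check_labels` and `check_finite` are hypothetical and not part of the original code:

```python
import torch


def check_labels(labels: torch.Tensor, num_classes: int) -> None:
    """Raise ValueError if any label falls outside [0, num_classes).

    Assumes plain class indices; a sentinel such as ignore_index=-100
    would need to be masked out before calling this.
    """
    lo, hi = int(labels.min()), int(labels.max())
    if lo < 0 or hi >= num_classes:
        raise ValueError(
            f"labels span [{lo}, {hi}] but the model has {num_classes} classes"
        )


def check_finite(loss: torch.Tensor) -> None:
    """Raise ValueError if the loss contains NaN or infinity."""
    if not torch.isfinite(loss).all():
        raise ValueError("loss is NaN or infinite")


# Usage: call these right before the loss / backward step.
check_labels(torch.tensor([0, 1, 2]), num_classes=3)  # passes silently
check_finite(torch.tensor(0.25))                      # passes silently
```

Checks like these force a device-to-host sync, so they are probably best left on only while debugging.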

In my case the error occurred because the position indices exceeded max_length, so increasing max_length fixed it:

self.pos_embedding = nn.Embedding(max_length, hid_dim)



----

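# pos has shape [16, 102], i.e. the sequence length is 102
# pos_embedding is nn.Embedding(max_length, 768) with max_length=100, so a length-102 sequence overflows the table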
trg = self.dropout((self.tok_embedding(trg) * self.scale) + self.pos_embedding(pos))
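
Running the same lookup on the CPU usually gives a much clearer message than the device-side assert. The sketch below reproduces the shapes from the comments above. It assumes the positions are built with `torch.arange`, which the original does not show, and the variable names are illustrative only:

```python
import torch
import torch.nn as nn

max_length, hid_dim = 100, 768      # values from the comments above
batch_size, trg_len = 16, 102

pos_embedding = nn.Embedding(max_length, hid_dim)  # stays on the CPU

# Assumed construction of pos: indices 0..trg_len-1, repeated per batch row.
pos = torch.arange(trg_len).unsqueeze(0).repeat(batch_size, 1)  # [16, 102]

try:
    pos_embedding(pos)
    failed = False
except (IndexError, RuntimeError) as exc:
    # On the CPU the out-of-range lookup raises a readable Python exception.
    failed = True
    print("CPU error:", exc)

# Fix: make the table at least as long as the longest sequence.
fixed_embedding = nn.Embedding(max(max_length, trg_len), hid_dim)
out = fixed_embedding(pos)  # shape [16, 102, 768]
```

If the sequences cannot be bounded in advance, truncating the inputs to max_length is an alternative to enlarging the table.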