Description
When I run the default training script for SD 1.5, an error occurs at the line `accelerator.backward(loss)`. The detailed error is as follows:
Traceback (most recent call last):
  File "/mnt/lustre/zhuguibo/dcy/BrushNet/examples/brushnet/train_brushnet.py", line 1403, in <module>
    main(args)
  File "/mnt/lustre/zhuguibo/dcy/BrushNet/examples/brushnet/train_brushnet.py", line 1303, in main
    accelerator.backward(loss)
  File "/mnt/lustre/zhuguibo/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/accelerator.py", line 2248, in backward
    loss.backward(**kwargs)
  File "/mnt/lustre/zhuguibo/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/mnt/lustre/zhuguibo/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: an illegal instruction was encountered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Has anyone else run into the same issue?
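CUDA errors can be reported asynchronously, so the Python frame shown in a traceback like the one above may not be the operation that actually failed. A minimal sketch for gathering details that usually help when reproducing this kind of error is below. It assumes PyTorch is importable in the same environment, and the helper name `collect_env_report` is hypothetical (it is not part of BrushNet, diffusers, or accelerate). It sets `CUDA_LAUNCH_BLOCKING=1` before CUDA is initialised and prints the PyTorch build, its CUDA version, and the GPU's compute capability.

```python
import importlib
import os


def collect_env_report():
    """Gather version and GPU details that are useful when reporting a CUDA error."""
    # Set before CUDA is initialised so kernel launches run synchronously and
    # the Python traceback points closer to the operation that really failed.
    os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

    report = {"CUDA_LAUNCH_BLOCKING": os.environ["CUDA_LAUNCH_BLOCKING"]}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this environment
        return report

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Running this in the same conda environment as the training script, and then re-running the training script with `CUDA_LAUNCH_BLOCKING=1` exported in the shell, should give a traceback and environment summary that are easier to compare against other reports.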