RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:859, invalid usage, NCCL version

小鱼儿 2022-10-15 13:49
Traceback (most recent call last):
  File "tools/train_net.py", line 209, in <module>
    launch(
  File "/home/e300/code/detectron2/detectron2/engine/launch.py", line 67, in launch
    mp.spawn(
  File "/home/e300/anaconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 247, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/e300/anaconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 205, in start_processes
    while not context.join():
  File "/home/e300/anaconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 166, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "/home/e300/anaconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/e300/code/detectron2/detectron2/engine/launch.py", line 108, in _distributed_worker
    raise e
  File "/home/e300/code/detectron2/detectron2/engine/launch.py", line 98, in _distributed_worker
    dist.init_process_group(
  File "/home/e300/anaconda3/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 467, in init_process_group
    barrier()
  File "/home/e300/anaconda3/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2145, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:859, invalid usage, NCCL version 2.7.8
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).

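Cause of the error:
The job was launched with multiple GPUs (the failing worker above is process 3, so at least four processes were started). Set the number of GPUs to match what your own machine actually has, and the error goes away.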
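
To pick a safe GPU count before launching, it may help to ask PyTorch how many CUDA devices it can see. The following is only a minimal sketch: `visible_gpu_count` and `clamp_num_gpus` are hypothetical helper names, not part of detectron2 or PyTorch, and the example assumes the training script takes its GPU count from a `--num-gpus` option, as detectron2's default argument parser does.

```python
def visible_gpu_count() -> int:
    """Return how many CUDA devices PyTorch can see (0 if PyTorch is not installed)."""
    try:
        import torch
    except ImportError:
        return 0
    # This count respects CUDA_VISIBLE_DEVICES, so it can be lower than
    # the number of GPUs physically installed in the machine.
    return torch.cuda.device_count()


def clamp_num_gpus(requested: int, available: int) -> int:
    """Hypothetical helper: never request more worker processes than visible GPUs."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    if available < 1:
        raise RuntimeError("no CUDA device is visible")
    return min(requested, available)


if __name__ == "__main__":
    available = visible_gpu_count()
    print(f"visible GPUs: {available}")
    if available:
        # Assumed scenario: the original command asked for 4 GPUs.
        print(f"value to pass to --num-gpus: {clamp_num_gpus(4, available)}")
```

If the printed value is lower than the count used in the failing command, relaunching with the smaller number is the change the note above describes.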
