Description
Traceback (most recent call last):
  File "/home/bradley/venv/lib64/python3.7/site-packages/kwola/tasks/RunTrainingStep.py", line 646, in runTrainingStep
    results = agent.learnFromBatches(batches)
  File "/home/bradley/venv/lib64/python3.7/site-packages/kwola/components/agents/DeepLearningAgent.py", line 1499, in learnFromBatches
    "computeRewards": True
  File "/home/bradley/venv/lib64/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bradley/venv/lib64/python3.7/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
    self._sync_params()
  File "/home/bradley/venv/lib64/python3.7/site-packages/torch/nn/parallel/distributed.py", line 515, in _sync_params
    self.broadcast_bucket_size)
  File "/home/bradley/venv/lib64/python3.7/site-packages/torch/nn/parallel/distributed.py", line 485, in _distributed_broadcast_coalesced
    dist._broadcast_coalesced(self.process_group, tensors, buffer_size)
RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:84] Timed out waiting 1800000ms for recv operation to complete