Steps to reproduce
```yaml
type: volume
name: demo-volume
backend: gcp  # or aws
region: <REGION>
availability_zone: <AZ>
size: 10GB
```
```yaml
type: fleet
name: demo-fleet
nodes: 1
backends: [gcp]  # or [aws]
regions: [<REGION>]
availability_zones: [<AZ>]
resources:
  cpu: 4..
  memory: 1GB..
  disk: 1GB..
  gpu: 0
blocks: auto
```
```yaml
type: dev-environment
volumes:
  - demo-volume:/volume
init:
  - echo $DSTACK_JOB_ID > /volume/job_id
resources:
  cpu: 1..
  memory: 1GB..
  gpu: 0..
  disk: 1GB..
```
1. Create a fleet and a volume using the configurations above.
2. Start the first dev environment:
   ```shell
   dstack apply --name devenv-1 --fleet demo-fleet
   ```
3. Verify the volume is usable from the first job:
   ```shell
   ssh devenv-1 cat /volume/job_id
   # f5ecabff-61d7-4914-9ca4-bc4043069a66
   ```
4. Start a second dev environment on the same instance:
   ```shell
   dstack apply --name devenv-2 --fleet demo-fleet --reuse
   # Error (Volume error)
   # Failed to attach volume: unexpected error
   ```
5. Read the volume from the first job again:
   ```shell
   ssh devenv-1 cat /volume/job_id
   # cat: /volume/job_ids: Input/output error
   ```
   (See the Server logs section for the error produced by this step.)
Actual behaviour
The second job fails in `Compute.attach_volume()` because the volume is already in use (attached to the same instance). Then, while terminating the failed job, the server calls `Compute.detach_volume()`, which successfully detaches the volume from the instance even though it is still in use by the first job.
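One way the termination path could avoid this is to check whether any other active job on the instance still uses the volume before detaching. The sketch below is hypothetical: `should_detach_volume` and the `jobs_by_instance` structure are illustrative assumptions, not dstack's actual API.

```python
# Hypothetical sketch: only detach a volume on job termination if no other
# active job on the same instance still uses it. Names and data shapes are
# illustrative, not dstack's actual internals.

def should_detach_volume(
    volume_name: str,
    instance_id: str,
    terminating_job_id: str,
    jobs_by_instance: dict[str, list[dict]],
) -> bool:
    """Return True only if no other active job on the instance uses the volume."""
    for job in jobs_by_instance.get(instance_id, []):
        if job["id"] == terminating_job_id:
            continue  # the job being terminated doesn't count
        if volume_name in job["volumes"]:
            return False  # volume still in use by another job
    return True
```

With such a guard, terminating the failed `devenv-2` job would leave the volume attached, since `devenv-1` still uses it.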
Expected behaviour
No response
dstack version
0.20.19
Server logs
```
ERROR dstack._internal.server.background.pipeline_tasks.jobs_terminating:981 Got exception when detaching volume volume-gcp from instance gcp-0
Traceback (most recent call last):
  File "/home/def/dev/dstack/src/dstack/_internal/server/background/pipeline_tasks/jobs_terminating.py", line 940, in _detach_volume_from_job_instance
    await common.run_async(
    ...<4 lines>...
    )
  File "/home/def/dev/dstack/src/dstack/_internal/utils/common.py", line 50, in run_async
    return await asyncio.get_running_loop().run_in_executor(None, func_with_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/def/.local/share/uv/python/cpython-3.13.9-linux-x86_64-gnu/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/def/dev/dstack/src/dstack/_internal/core/backends/gcp/compute.py", line 857, in detach_volume
    attachment_data = get_or_error(volume.get_attachment_data_for_instance(instance_id))
  File "/home/def/dev/dstack/src/dstack/_internal/utils/common.py", line 292, in get_or_error
    raise ValueError("Optional value is None")
ValueError: Optional value is None
```
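The `ValueError` in the traceback means `volume.get_attachment_data_for_instance()` returned `None`, i.e. the server found no attachment record for the instance when `detach_volume` ran. A self-contained sketch of that failure mode, using simplified stand-ins rather than dstack's actual implementations:

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def get_or_error(value: Optional[T]) -> T:
    # Mirrors the helper shown in the traceback: unwrap an Optional or raise.
    if value is None:
        raise ValueError("Optional value is None")
    return value


# Simplified stand-in for volume.get_attachment_data_for_instance():
# attachment data is looked up per instance and is None when no record exists.
def get_attachment_data_for_instance(
    attachments: dict, instance_id: str
) -> Optional[dict]:
    return attachments.get(instance_id)
```

With no attachment record for `gcp-0`, `get_or_error(get_attachment_data_for_instance({}, "gcp-0"))` raises `ValueError: Optional value is None`, matching the last line of the traceback.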
Additional information
No response