[Refactor] Adapt deepseek-v3.2 to vllm 0.11.0 #3432
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review

This pull request refactors the deepseek-v3.2 model to adapt to vllm 0.11.0, removing obsolete patches and changing the mechanism for detecting sparse attention. The renaming of use_sfa to use_sparse improves code clarity. While the refactoring is extensive and generally well executed, I've identified several critical issues where the old configuration access pattern was not fully updated. These will lead to AttributeError exceptions at runtime and need to be fixed.
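To make the failure mode concrete, here is a minimal sketch (the SparseFlags container is hypothetical; only the use_sfa/use_sparse names come from this PR):

class SparseFlags:
    """Hypothetical stand-in for the refactored config object."""

    def __init__(self, use_sparse: bool) -> None:
        # Renamed in this PR: use_sfa -> use_sparse.
        self.use_sparse = use_sparse


flags = SparseFlags(use_sparse=True)
print(flags.use_sparse)  # updated access pattern: works
try:
    print(flags.use_sfa)  # stale access pattern left by the rename
except AttributeError as err:
    print(f"runtime failure the review flags: {err}")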
Test passes with deepseek-v3.2-w8a8. Test script:

import os
from vllm import LLM, SamplingParams

# Fetch the model from ModelScope rather than the Hugging Face Hub.
os.environ["VLLM_USE_MODELSCOPE"] = "True"
# Enlarge the HCCL communication buffer (value in MB) for 16-way tensor parallelism.
os.environ["HCCL_BUFFSIZE"] = "1024"


def main():
    prompts = [
        "窗前明月光,",
        "The president of the United States is Mr.",
        "The capital of France is",
        "The future of AI is",
        "感时花溅泪,",
        "家书抵万金啥意思?",
        "plz tell me a story: ",
    ]
    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100,
                                     temperature=0.6,
                                     top_k=40,
                                     top_p=0.95)
    # Create an LLM.
    llm = LLM(model="/vllm-workspace/cache/modelscope/hub/models/vllm-ascend/DeepSeek-V3___2-Exp-W8A8",
              tensor_parallel_size=16,
              enforce_eager=True,
              trust_remote_code=True,
              max_model_len=1024,
              # max_num_seqs=2,
              gpu_memory_utilization=0.9,
              quantization="ascend",
              additional_config={"ascend_scheduler_config": {"enabled": True}})
    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()
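One note before the results: per the review above, this PR drops the old patch-based wiring and instead detects sparse attention directly from the model configuration. The diff is not shown in this excerpt, so the following is only a plausible minimal sketch; the index_topk field name is an assumption (a DeepSeek-V3.2-style sparse-indexer config key), not something confirmed by this PR.

def uses_sparse_attention(hf_config) -> bool:
    # Assumed heuristic: DeepSeek-V3.2 configs describe a sparse indexer;
    # treat the presence of its top-k field as the sparse-attention marker.
    return getattr(hf_config, "index_topk", None) is not None

Results:

root@cmq-docker:/vllm-workspace/vllm-ascend# python scripts/run_ds.py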
/vllm-workspace/vllm/vllm/__init__.py:7: RuntimeWarning: Failed to read commit hash:
No module named 'vllm._version'
from .version import __version__, __version_tuple__ # isort:skip
INFO 10-14 03:13:10 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 10-14 03:13:10 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 10-14 03:13:10 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 10-14 03:13:10 [__init__.py:207] Platform plugin ascend is activated
WARNING 10-14 03:13:13 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 10-14 03:13:13 [registry.py:582] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 10-14 03:13:13 [registry.py:582] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
WARNING 10-14 03:13:13 [registry.py:582] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
WARNING 10-14 03:13:13 [registry.py:582] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 10-14 03:13:13 [registry.py:582] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 10-14 03:13:13 [registry.py:582] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 10-14 03:13:13 [registry.py:582] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 10-14 03:13:13 [registry.py:582] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 10-14 03:13:13 [registry.py:582] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
INFO 10-14 03:13:13 [utils.py:233] non-default args: {'trust_remote_code': True, 'max_model_len': 1024, 'tensor_parallel_size': 16, 'disable_log_stats': True, 'quantization': 'ascend', 'enforce_eager': True, 'additional_config': {'ascend_scheduler_config': {'enabled': True}}, 'model': '/vllm-workspace/cache/modelscope/hub/models/vllm-ascend/DeepSeek-V3___2-Exp-W8A8'}
INFO 10-14 03:13:13 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
You are using a model of type deepseek_v32 to instantiate a model of type deepseek_v3. This is not supported for all configurations of models and can yield errors.
INFO 10-14 03:13:14 [config.py:388] Replacing legacy 'type' key with 'rope_type'
INFO 10-14 03:13:14 [model.py:547] Resolved architecture: DeepseekV32ForCausalLM
`torch_dtype` is deprecated! Use `dtype` instead!
INFO 10-14 03:13:14 [model.py:1510] Using max model len 1024
INFO 10-14 03:13:14 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 10-14 03:13:14 [config.py:422] Using custom fp8 kv-cache format for DeepSeekV3.2
INFO 10-14 03:13:14 [__init__.py:381] Cudagraph is disabled under eager mode
INFO 10-14 03:13:14 [platform.py:179] Compilation disabled, using eager mode by default
(EngineCore_DP0 pid=610071) INFO 10-14 03:13:15 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=610071) INFO 10-14 03:13:15 [core.py:77] Initializing a V1 LLM engine (vdev) with config: model='/vllm-workspace/cache/modelscope/hub/models/vllm-ascend/DeepSeek-V3___2-Exp-W8A8', speculative_config=None, tokenizer='/vllm-workspace/cache/modelscope/hub/models/vllm-ascend/DeepSeek-V3___2-Exp-W8A8', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=16, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=ascend, enforce_eager=True, kv_cache_dtype=bfloat16, device_config=npu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/vllm-workspace/cache/modelscope/hub/models/vllm-ascend/DeepSeek-V3___2-Exp-W8A8, enable_prefix_caching=True, chunked_prefill_enabled=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":0,"local_cache_dir":null}
(EngineCore_DP0 pid=610071) WARNING 10-14 03:13:15 [multiproc_executor.py:720] Reducing Torch parallelism from 320 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(EngineCore_DP0 pid=610071) INFO 10-14 03:13:15 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], buffer_handle=(16, 16777216, 10, 'psm_528dbe69'), local_subscribe_addr='ipc:///tmp/4b94b813-95b9-4404-8669-00a689a1219c', remote_subscribe_addr=None, remote_addr_ipv6=False)
(EngineCore_DP0 pid=610071) WARNING 10-14 03:13:15 [camem.py:64] Failed to import vllm_ascend_C:libvllm_ascend_kernels.so: cannot open shared object file: No such file or directory. Sleep mode will be disabled.
[the same camem.py warning is printed 16 times, once per worker]
(EngineCore_DP0 pid=610071) INFO 10-14 03:13:19 [worker_v1.py:102] custom_ops module loaded successfully. Custom operators like torch.ops.custom.npu_sparse_flash_attention are now available.
(EngineCore_DP0 pid=610071) INFO 10-14 03:13:19 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_249dc2ae'), local_subscribe_addr='ipc:///tmp/d86fd2ed-d279-40ae-b560-764e15f8399a', remote_subscribe_addr=None, remote_addr_ipv6=False)
[rank1]:[W1014 03:13:20.861616957 ProcessGroupGloo.cpp:727] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[the custom_ops INFO, shm_broadcast handle, and Gloo hostname warning repeat for each of the 16 ranks]
(EngineCore_DP0 pid=610071) INFO 10-14 03:13:28 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], buffer_handle=(15, 4194304, 6, 'psm_226b3c13'), local_subscribe_addr='ipc:///tmp/0989bbaa-cf89-462b-95ee-1aebb1d18e45', remote_subscribe_addr=None, remote_addr_ipv6=False)
(EngineCore_DP0 pid=610071) INFO 10-14 03:13:28 [parallel_state.py:1208] rank 0 in world size 16 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
[analogous lines for ranks 1-15: each rank N is DP rank 0, PP rank 0, TP rank N, EP rank N]
(EngineCore_DP0 pid=610071) (Worker_TP0 pid=610078) INFO 10-14 03:13:28 [model_runner_v1.py:2567] Starting to load model /vllm-workspace/cache/modelscope/hub/models/vllm-ascend/DeepSeek-V3___2-Exp-W8A8...
[the same "Starting to load model" line repeats for Worker_TP1 through Worker_TP15]
(EngineCore_DP0 pid=610071) (Worker_TP12 pid=610138) INFO 10-14 03:13:29 [utils.py:64] Using the vLLM Ascend Quantization now!
[the same quantization notice repeats for all 16 workers]
Loading safetensors checkpoint shards: 0% Completed | 0/163 [00:00<?, ?it/s]
[per-shard progress lines omitted; all 163 shards load at roughly 2.3-3.3 it/s]
(EngineCore_DP0 pid=610071) (Worker_TP13 pid=610143) INFO 10-14 03:14:34 [default_loader.py:267] Loading weights took 62.18 seconds
(EngineCore_DP0 pid=610071) (Worker_TP15 pid=610153) INFO 10-14 03:14:36 [default_loader.py:267] Loading weights took 63.84 seconds
(EngineCore_DP0 pid=610071) (Worker_TP13 pid=610143) INFO 10-14 03:14:36 [model_runner_v1.py:2593] Loading model weights took 42.5184 GB
Loading safetensors checkpoint shards: 100% Completed | 163/163 [01:04<00:00, 2.53it/s]
(EngineCore_DP0 pid=610071) (Worker_TP0 pid=610078) INFO 10-14 03:14:37 [default_loader.py:267] Loading weights took 64.60 seconds
(EngineCore_DP0 pid=610071) (Worker_TP0 pid=610078) INFO 10-14 03:14:39 [model_runner_v1.py:2593] Loading model weights took 42.5184 GB
[the remaining workers report weight loading in 62-70 seconds and 42.5184 GB each]
(EngineCore_DP0 pid=610071) (Worker_TP15 pid=610153) WARNING 10-14 03:14:44 [cudagraph_dispatcher.py:106] cudagraph dispatching keys are not initialized. No cudagraph will be used.
[the same cudagraph warning repeats for all 16 workers]
(EngineCore_DP0 pid=610071) (Worker_TP0 pid=610078) INFO 10-14 03:14:52 [worker_v1.py:238] Available memory: 9180513484, total memory: 65464696832
[the other 15 workers report 10.2-10.5 GB available against the same 65464696832-byte total]
(EngineCore_DP0 pid=610071) INFO 10-14 03:14:52 [kv_cache_utils.py:1087] GPU KV cache size: 65,280 tokens
(EngineCore_DP0 pid=610071) INFO 10-14 03:14:52 [kv_cache_utils.py:1091] Maximum concurrency for 1,024 tokens per request: 63.75x
[analogous KV-cache reports follow for the other workers: 72,576-74,368 tokens, 70.88x-72.62x concurrency]
(EngineCore_DP0 pid=610071) INFO 10-14 03:14:52 [core.py:210] init engine (profile, create kv cache, warmup model) took 9.04 seconds
(EngineCore_DP0 pid=610071) WARNING 10-14 03:14:53 [core.py:112] Using configured V1 scheduler class vllm_ascend.core.scheduler.AscendScheduler. This scheduler interface is not public and compatibility may not be maintained.
(EngineCore_DP0 pid=610071) INFO 10-14 03:14:53 [__init__.py:381] Cudagraph is disabled under eager mode
(EngineCore_DP0 pid=610071) INFO 10-14 03:14:53 [platform.py:179] Compilation disabled, using eager mode by default
INFO 10-14 03:14:53 [llm.py:306] Supported_tasks: ['generate']
Adding requests: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 329.43it/s]
Processed prompts: 0%| | 0/7 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s](EngineCore_DP0 pid=610071) (Worker_TP7 pid=610113) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP5 pid=610103) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP7 pid=610113) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP5 pid=610103) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP14 pid=610148) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP14 pid=610148) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP4 pid=610098) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP4 pid=610098) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP6 pid=610108) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP6 pid=610108) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP3 pid=610093) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP3 pid=610093) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP10 pid=610128) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP10 pid=610128) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP9 pid=610123) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP9 pid=610123) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP12 pid=610138) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP12 pid=610138) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP15 pid=610153) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP15 pid=610153) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP1 pid=610083) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP1 pid=610083) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP11 pid=610133) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP11 pid=610133) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP2 pid=610088) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP2 pid=610088) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP8 pid=610118) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP8 pid=610118) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP0 pid=610078) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP0 pid=610078) actual_query_lens = torch.tensor(query_lens[reqs_start:],
(EngineCore_DP0 pid=610071) (Worker_TP13 pid=610143) /vllm-workspace/vllm-ascend/vllm_ascend/attention/sfa_v1.py:374: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.detach().clone() or sourceTensor.detach().clone().requires_grad_(True), rather than torch.tensor(sourceTensor).
(EngineCore_DP0 pid=610071) (Worker_TP13 pid=610143) actual_query_lens = torch.tensor(query_lens[reqs_start:],
Processed prompts: 100%|████████████████████████████████████████████████████████████████| 7/7 [00:15<00:00, 2.25s/it, est. speed input: 3.31 toks/s, output: 44.52 toks/s]
Prompt: '窗前明月光,', Generated text: '疑是地上霜。\n\n举头望明月,低头思故乡。\n\n李白《静夜思》\n\n床前洒满皎洁的月光,诗人恍惚间以为是地上的秋霜。抬起头来仰望天上的明月,低下头来不由得思念起遥远的故乡。\n\n这首小诗,没有精工华美的辞藻,没有奇特新颖的立意,只是用叙述的语气,写远客思乡之情,然而它却意味深长,耐人寻绎,千百年来,如此广泛地吸引着'
Prompt: 'The president of the United States is Mr.', Generated text: ' Obama.\n\nMr. Obama is the president of the United States.\n\nThe president of the United States is Mr. Obama.\n\nMr. Obama is the president of the United States.\n\nThe president of the United States is Mr. Obama.\n\nMr. Obama is the president of the United States.\n\nThe president of the United States is Mr. Obama.\n\nMr. Obama is the president of the United States.\n\nThe president of the United States is Mr. Obama.\n\nMr. Obama is the president of the United States'
Prompt: 'The capital of France is', Generated text: ' Paris, one of the most important and influential cities in the world. Paris is located in the north-central part of the country, on the banks of the Seine River. It is not only the political center of France but also a global hub for art, fashion, gastronomy, and culture. The city is renowned for its iconic landmarks such as the Eiffel Tower, Notre-Dame Cathedral, the Louvre Museum, and the Champs-Élysées. Paris has a rich history that dates'
Prompt: 'The future of AI is', Generated text: " here, and it's changing everything. From healthcare to transportation, AI is revolutionizing industries and transforming the way we live and work. But what does this mean for you? How can you stay ahead of the curve and thrive in this new era of intelligence? In this video, we'll explore the latest advancements in AI and what they mean for the future. We'll dive into the world of machine learning, natural language processing, and computer vision, and see how these technologies are being applied in real-world"
Prompt: '感时花溅泪,', Generated text: '恨别鸟惊心。 烽火连三月,家书抵万金。 白头搔更短,浑欲不胜簪。 4、望岳 杜甫 岱宗夫如何,齐鲁青未了。 造化钟神秀,阴阳割昏晓。 荡胸生层云,决眦入归鸟。 会当凌绝顶,一览众山小。 5、春望 杜甫 国破山河在,城春'
Prompt: '家书抵万金啥意思?', Generated text: '家书抵万金的意思及全诗出处和翻译赏析\n\n家书抵万金,这是一句流传千古的诗句,它表达了家书在人们心中的珍贵和重要性。那么,家书抵万金到底是什么意思呢?本文将从全诗出处、翻译赏析等方面进行探讨。\n\n一、全诗出处\n\n家书抵万金出自唐代诗人杜甫的《春望》。全诗如下:\n\n国破山河在,城春草木深。\n\n感时花溅泪'
Prompt: 'plz tell me a story: ', Generated text: "2nd grade reading level, about a girl who wants to be a scientist when she grows up and she goes to the moon\n\nOf course! Here is a story for you.\n\n### Luna's Big Dream\n\nLuna loved science. While other kids had posters of pop stars or cartoon characters, Luna had a giant poster of the solar system above her bed. Her favorite subject in school was when her class got to go to the library and learn about planets and stars.\n\nOne night, she pointed her"
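A side note on the repeated UserWarning above: PyTorch is flagging that sfa_v1.py:374 copy-constructs a tensor from another tensor. A minimal sketch of the rewrite the warning itself recommends, assuming query_lens is already a torch.Tensor (the values below are illustrative):

import torch

query_lens = torch.tensor([3, 5, 2, 7], dtype=torch.int32)  # illustrative values
reqs_start = 1

# This form triggers the UserWarning (copy-constructing from a tensor slice):
# actual_query_lens = torch.tensor(query_lens[reqs_start:])

# The form recommended by the warning message:
actual_query_lens = query_lens[reqs_start:].detach().clone()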
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: MengqingCao <[email protected]>
class AscendDeepseekV2Model(DeepseekV2Model, nn.Module):

    def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
        # Rewrite this init func mainly for removing cuda-hard code
We need to add this for vLLM 0.11.0 because of the CUDA hard code upstream, ptal @zzzzwwjj @wangxiyuan
Is there any vLLM PR to fix this hard code?
Yes, it was merged in vllm-project/vllm@302ef40
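For context, a minimal sketch of the fix direction, assuming the offender is a buffer allocated with a hard-coded device="cuda"; the helper name and shape are illustrative, not the exact upstream source:

import torch

from vllm.platforms import current_platform

def make_topk_indices_buffer(max_num_batched_tokens: int,
                             index_topk: int) -> torch.Tensor:
    # Allocate on whatever device the active platform reports
    # ("npu" on Ascend, "cuda" on NVIDIA) instead of hard-coding "cuda".
    return torch.zeros(max_num_batched_tokens,
                       index_topk,
                       dtype=torch.int32,
                       device=current_platform.device_type)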
Test passed with torchair enabled:

import os

from vllm import LLM, SamplingParams

os.environ["VLLM_USE_MODELSCOPE"] = "True"
os.environ["HCCL_BUFFSIZE"] = "1024"


def main():
    prompts = [
        "窗前明月光,",
        "The president of the United States is Mr.",
        "The capital of France is",
        "The future of AI is",
        "感时花溅泪,",
        "家书抵万金啥意思?",
        "plz tell me a story: ",
    ]
    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100, temperature=0.6, top_k=40, top_p=0.95)
    # Create an LLM.
    llm = LLM(model="/vllm-workspace/cache/modelscope/hub/models/vllm-ascend/DeepSeek-V3___2-Exp-W8A8",
              tensor_parallel_size=16,
              trust_remote_code=True,
              max_model_len=1024,
              # max_num_seqs=2,
              gpu_memory_utilization=0.9,
              quantization="ascend",
              additional_config={
                  "ascend_scheduler_config": {"enabled": True},
                  "torchair_graph_config": {"enabled": True, "graph_batch_sizes": [16]},
              })
    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()
What this PR does / why we need it?
Adapt deepseek-v3.2 to vLLM 0.11.0 and remove the patches that are no longer needed.
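As a sketch of what the alignment looks like, assuming vLLM 0.11.0 detects DeepSeek V3.2's sparse attention from the model config itself (e.g. the presence of an index_topk field) rather than through a patched flag; the helper below is illustrative, not the exact upstream code:

from transformers import PretrainedConfig

def uses_sparse_attention(config: PretrainedConfig) -> bool:
    # DeepSeek V3.2 configs carry an `index_topk` field for the sparse
    # indexer; treat its presence as the sparse-attention signal.
    return getattr(config, "index_topk", None) is not None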
The final goal is to remove all the patches and align the code architecture with vLLM, so the following work needs to be done in follow-up PRs.
TODO:
Does this PR introduce any user-facing change?
N/A
How was this patch tested?