sglang inference error
#1 by henryhaohao - opened
Does anyone know how to run inference on this model with sglang?
I tried the following command, but it ultimately failed:
uv run python -m sglang.launch_server \
--model-path /home/models/GLM-4.7-REAP-40-W4A16 \
--served-model-name GLM-4.7-REAP-40-W4A16 \
--tp-size 2 \
--host 0.0.0.0 \
--trust-remote-code \
--port 8888
And the error:
File "/home/hoo/Project/sglang-nightly/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 445, in fused_experts_impl
invoke_fused_moe_kernel(
~~~~~~~~~~~~~~~~~~~~~~~^
curr_hidden_states,
^^^^^^^^^^^^^^^^^^^
...<22 lines>...
filter_expert=filter_expert,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/hoo/Project/sglang-nightly/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe_triton_kernels.py", line 704, in invoke_fused_moe_kernel
C.stride(2),
~~~~~~~~^^^
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
[2026-01-03 17:53:44] Received sigquit from a child process. It usually means the child failed.
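For anyone reading the traceback: the range in the message seems to say that `C` had only 2 dimensions when `invoke_fused_moe_kernel` called `C.stride(2)`, which needs at least 3. Below is a minimal sketch for decoding that kind of message. The helper name `explain_dim_error` is made up for illustration and is not part of sglang or PyTorch, and the rank is inferred from the printed range rather than confirmed by inspecting the tensor.

```python
import re

# Matches the "Dimension out of range" message format seen in the traceback.
_PATTERN = re.compile(
    r"Dimension out of range \(expected to be in range of "
    r"\[(-?\d+), (-?\d+)\], but got (-?\d+)\)"
)


def explain_dim_error(message: str) -> dict:
    """Hypothetical helper: infer tensor rank and requested dim from the message."""
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("not a 'Dimension out of range' message")
    low, high, requested = (int(group) for group in match.groups())
    # For a tensor with n >= 1 dimensions the valid range is [-n, n - 1],
    # so n = high + 1. (A 0-dim tensor reports [-1, 0]; this sketch ignores that case.)
    return {"tensor_ndim": high + 1, "requested_dim": requested}


msg = "IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)"
info = explain_dim_error(msg)
print(info)  # {'tensor_ndim': 2, 'requested_dim': 2}
```

So the output buffer reaching the fused MoE kernel appears to be 2-D while the kernel indexes a third dimension. That only narrows down where the shape mismatch shows up, not why it happens.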
I ran it with vLLM, but it reports that "gptq" is an invalid parameter. Did they test it with vLLM at all?
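In case it helps when comparing backends, here is a minimal sketch for checking which quantization method the checkpoint itself declares. It assumes the model directory holds a Hugging Face style `config.json` with a `quantization_config` block containing a `quant_method` key. The function names are hypothetical, and the example dictionary is illustrative only, not the actual contents of this model's config.

```python
import json
from pathlib import Path


def describe_quantization(config: dict) -> str:
    """Summarize the quantization method declared in a parsed config.json."""
    quant = config.get("quantization_config")
    if not isinstance(quant, dict):
        return "no quantization_config found"
    return f"quant_method={quant.get('quant_method', 'unknown')}"


def describe_model_dir(model_dir: str) -> str:
    """Read <model_dir>/config.json and summarize its quantization settings."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        return describe_quantization(json.load(f))


# Illustrative input only; the real config of this model may differ.
example = {"quantization_config": {"quant_method": "gptq"}}
print(describe_quantization(example))  # quant_method=gptq
```

Running `describe_model_dir("/home/models/GLM-4.7-REAP-40-W4A16")` on the local copy would show what the checkpoint declares, which can then be compared with the quantization option passed to the serving backend.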