sglang inference error

#1
by henryhaohao - opened

Does anyone know how to run inference on this model with sglang? 😊
I try this, but failed finally.

uv run python -m sglang.launch_server \
  --model-path /home/models/GLM-4.7-REAP-40-W4A16 \
  --served-model-name GLM-4.7-REAP-40-W4A16 \
  --tp-size 2 \
  --host 0.0.0.0 \
  --trust-remote-code \
  --port 8888

and the error:

  File "/home/hoo/Project/sglang-nightly/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 445, in fused_experts_impl
    invoke_fused_moe_kernel(
    ~~~~~~~~~~~~~~~~~~~~~~~^
        curr_hidden_states,
        ^^^^^^^^^^^^^^^^^^^
    ...<22 lines>...
        filter_expert=filter_expert,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/hoo/Project/sglang-nightly/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe_triton_kernels.py", line 704, in invoke_fused_moe_kernel
    C.stride(2),
    ~~~~~~~~^^^
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)

[2026-01-03 17:53:44] Received sigquit from a child process. It usually means the child failed.

I run with vllm, but it shows "gptq" is wrong parameter, did they run with vllm for any testing?

Sign up or log in to comment