0xSero/GLM-4.7-REAP-40-W4A16 · sglang inference error

by henryhaohao - opened Jan 3

Does anyone know how to run inference on this model with sglang? 😊
I tried the command below, but it failed.

uv run python -m sglang.launch_server \
  --model-path /home/models/GLM-4.7-REAP-40-W4A16 \
  --served-model-name GLM-4.7-REAP-40-W4A16 \
  --tp-size 2 \
  --host 0.0.0.0 \
  --trust-remote-code \
  --port 8888

and the error:

  File "/home/hoo/Project/sglang-nightly/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 445, in fused_experts_impl
    invoke_fused_moe_kernel(
    ~~~~~~~~~~~~~~~~~~~~~~~^
        curr_hidden_states,
        ^^^^^^^^^^^^^^^^^^^
    ...<22 lines>...
        filter_expert=filter_expert,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/hoo/Project/sglang-nightly/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe_triton_kernels.py", line 704, in invoke_fused_moe_kernel
    C.stride(2),
    ~~~~~~~~^^^
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)

[2026-01-03 17:53:44] Received sigquit from a child process. It usually means the child failed.
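
A note on reading the traceback: the range [-2, 1] in the IndexError is the set of valid dimension indices for a 2-D tensor, so `C` had only two dimensions when `invoke_fused_moe_kernel` asked for `C.stride(2)`, which requires at least three. The sketch below is a pure-Python imitation of that bounds check, not PyTorch's or sglang's own code; the helper names `valid_dim_range`, `ndim_from_range`, and `check_dim` are hypothetical and exist only for illustration. It does not explain why `C` ends up 2-D for this checkpoint.

```python
# Pure-Python imitation of a tensor dimension bounds check.
# Assumption: this mirrors the rule behind the error message; it is not library code.


def valid_dim_range(ndim: int) -> tuple[int, int]:
    """Inclusive range of valid dimension indices for a tensor of rank ndim (ndim >= 1)."""
    return (-ndim, ndim - 1)


def ndim_from_range(low: int, high: int) -> int:
    """Recover the tensor rank from the range printed in the error message."""
    if low != -(high + 1):
        raise ValueError("not a valid dimension range")
    return high + 1


def check_dim(dim: int, ndim: int) -> int:
    """Raise IndexError for an out-of-range dim, else return the non-negative index."""
    low, high = valid_dim_range(ndim)
    if not low <= dim <= high:
        raise IndexError(
            f"Dimension out of range (expected to be in range of [{low}, {high}], but got {dim})"
        )
    return dim % ndim  # map negative indices onto 0..ndim-1


# The message above printed the range [-2, 1], which corresponds to a rank-2 tensor.
print(ndim_from_range(-2, 1))
```

Calling `check_dim(2, 2)` raises the same message as in the log, while `check_dim(2, 3)` succeeds, which is why the kernel launcher appears to expect a 3-D output buffer at that point.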

ArthurWWW

Jan 6

I ran it with vLLM, but it reports that "gptq" is an invalid parameter. Was the model tested with vLLM at all?
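
One way to narrow down the "gptq" error is to check which quantization format the checkpoint itself declares. Quantized Hugging Face checkpoints commonly record this under `quantization_config.quant_method` in `config.json`. The following is a minimal sketch that assumes this checkpoint follows that convention (not verified against its files); `read_quant_method` is a hypothetical helper, and the model path is the one from the first post.

```python
import json
from pathlib import Path


def read_quant_method(model_dir):
    """Return quantization_config.quant_method from config.json, or None if absent."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # Assumption: the checkpoint stores its quantization settings under this key.
    quant_config = config.get("quantization_config") or {}
    return quant_config.get("quant_method")


if __name__ == "__main__":
    model_dir = "/home/models/GLM-4.7-REAP-40-W4A16"  # path from the original post
    if (Path(model_dir) / "config.json").exists():
        print(read_quant_method(model_dir))
    else:
        print("config.json not found; adjust model_dir")
```

If the printed method is something other than `gptq`, that mismatch may be why an explicit `gptq` setting is rejected; leaving the quantization option unset so the engine reads it from the config could be worth trying.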
