ubergarm/Step-3.5-Flash-GGUF

anikifoss

9 days ago

Great quant, as always, thank you!

ubergarm

Owner 8 days ago

The iq4_xs has no imatrix, just for you! ;p

ik updated it, and it's working right now off the tip of main as per: https://github.com/ikawrakow/ik_llama.cpp/pull/1240

So I'll cook an imatrix and release some of those too, haha.

anikifoss

8 days ago

Thanks, this quant fits perfectly on 4x R9700, getting about 2000 tokens/sec prompt processing (PP) and 35 tokens/sec token generation (TG).
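As a rough, hypothetical back-of-the-envelope use of those numbers (prompt processing, PP, about 2000 tokens/sec; token generation, TG, about 35 tokens/sec), a request's wall-clock time is approximately prefill time plus generation time. This sketch assumes both rates stay constant, which they usually don't at long context; `request_seconds` is an illustrative helper, not part of any tool.

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    pp_tps: float = 2000.0, tg_tps: float = 35.0) -> float:
    """Rough estimate: prefill time (prompt / PP rate) plus decode time (output / TG rate).

    Assumes constant PP and TG rates; in practice TG tends to slow as context grows.
    """
    return prompt_tokens / pp_tps + output_tokens / tg_tps

# Example: a 16k-token prompt followed by a 2k-token (thinking-heavy) reply.
print(f"{request_seconds(16_000, 2_000):.1f} s")  # prints 65.1 s
```

With these rates the decode side dominates for long replies, which is why a chatty thinking phase costs more wall-clock time than a big prompt.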

ubergarm

Owner 8 days ago

@anikifoss

I just updated the perplexity graphs. I got lucky with that iq4_xs, as it is pretty good even compared to ik's newer quantization types!

It also seems to be better than the "official Int4" version that just got updated: https://huggingface.co/stepfun-ai/Step-3.5-Flash-Int4/discussions/13
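For readers comparing quants on those graphs: perplexity is the exponential of the average negative log-likelihood the model assigns to each token of a test text, so lower is better, and a quant whose perplexity sits closer to the full-precision baseline has lost less. A minimal sketch of just the formula (it does not reproduce `llama-perplexity`'s chunking or context settings; `perplexity` here is a hypothetical helper):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), using natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Toy check: if every token gets probability 1/4, perplexity is 4.
print(perplexity([math.log(0.25)] * 10))
```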

I did a quick test, and it seems able to vibe code some small C++ okay at least, though folks have been saying it is quite chatty when thinking.
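Since the thinking output can be long, clients usually separate it from the final answer. A minimal Python sketch that mirrors the `split('</think>')` string logic in the chat template posted later in this thread; `split_reasoning` is a hypothetical helper. Because that template's generation prompt already ends with `<think>\n`, raw output may start without an opening `<think>` tag, and this logic handles that case too.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Everything before the first '</think>' (after any '<think>') is treated as
    reasoning; everything after the last '</think>' is the answer.
    """
    if "</think>" not in text:
        return "", text
    reasoning = text.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    answer = text.split("</think>")[-1].lstrip("\n")
    return reasoning, answer

r, a = split_reasoning("<think>\nLet me count.\n</think>\n\n42")
print(repr(r), repr(a))
```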

anikifoss

8 days ago

It seems decent at coding, but I can't get tool calling to work with llama.cpp. I'm getting this error, probably because of limited templating support in llama.cpp:
perator(): got exception: {"error":{"code":500,"message":"\n------------\nWhile executing FilterExpression at line 55, column 63 in source:\n...- for args_name, args_value in arguments|items %}↵ {{- '<...\n ^\nError: Unknown (built-in) filter 'items' for type String","type":"server_error"}}
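The error says the template's `arguments|items` loop received a String. In OpenAI-style chat APIs, `tool_calls[].function.arguments` is defined as a JSON-encoded string rather than an object, so a template has to decode it (or skip it) before iterating over its items. A hedged Python sketch of that normalization, emitting the `<parameter=...>` layout this model's chat template uses; `render_parameters` is a hypothetical helper, not part of llama.cpp:

```python
import json
from typing import Any

def render_parameters(arguments: Any) -> str:
    """Render tool-call arguments as <parameter=NAME>\\nVALUE\\n</parameter> blocks.

    OpenAI-style clients send `arguments` as a JSON string, so decode it first;
    calling an items filter on the raw string is what fails in the error above.
    """
    if isinstance(arguments, str):
        arguments = json.loads(arguments) if arguments.strip() else {}
    if not isinstance(arguments, dict):
        raise TypeError(f"expected a JSON object, got {type(arguments).__name__}")
    out = []
    for name, value in arguments.items():
        # Nested objects/arrays are re-serialized as JSON; scalars use str().
        if isinstance(value, (dict, list)):
            value = json.dumps(value, ensure_ascii=False)
        else:
            value = str(value)
        out.append(f"<parameter={name}>\n{value}\n</parameter>\n")
    return "".join(out)

print(render_parameters('{"command": "ls -la", "timeout": 30}'))
```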

mindkrypted

8 days ago

@anikifoss

I've been able to use native tool calls with OpenCode using this modified version.

It's been working great for hundreds of calls, with the only drawback being that artifacts of the tool call show up in the TUI, e.g.:

First, let me check what's in the current directory and explore the project structure.
I'll analyze this codebase to create an AGENTS.md file with build/lint/test commands and code style guidelines.<tool_call>
<function=bash

{% macro render_content(content) %}
    {%- if content is none -%}
        {{- '' }}
    {%- elif content is string -%}
        {{- content }}
    {%- elif content is mapping -%}
        {{- content['value'] if 'value' in content else content['text'] }}
    {%- elif content is iterable -%}
        {%- for item in content -%}
            {%- if item is string -%}
                {{- item }}
            {%- elif item.type == 'text' -%}
                {{- item['value'] if 'value' in item else item['text'] }}
            {%- elif item.type == 'image' -%}
                <im_patch>
            {%- endif -%}
        {%- endfor -%}
    {%- endif -%}
{% endmacro %}
{{- bos_token }}
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- render_content(messages[0].content) + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou have access to the following functions in JSONSchema format:\n\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson(ensure_ascii=False) }}
    {%- endfor %}
    {{- "\n</tools>\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...>\n...\n</function> block must be nested within <tool_call>\n...\n</tool_call> XML tags\n- Required parameters MUST be specified\n</IMPORTANT><|im_end|>\n" }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + render_content(messages[0].content) + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(last_query_index=0) %}
{%- for message in messages %}
    {%- if message.role == "user" %}
        {%- set content_str = render_content(message.content) %}
        {%- if content_str is string and not(content_str.startswith('<tool_response>') and content_str.endswith('</tool_response>')) %}
            {%- set ns.last_query_index = loop.index0 %}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- for message in messages %}
    {%- set content = render_content(message.content) %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {%- set role_name = 'observation' if (message.role == "system" and not loop.first and message.name == 'observation') else message.role %}
        {{- '<|im_start|>' + role_name + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = render_content(message.reasoning_content) %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- else %}
                {%- set reasoning_content = '' %}
            {%- endif %}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n' + content }}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if tool_call.function is defined %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
                {%- if tool_call.arguments is defined %}
                    {%- set arguments = tool_call.arguments %}
                    {# FIX: Removed fromjson, use .items(), added mapping check #}
                    {%- if arguments is mapping %}
                        {%- for args_name, args_value in arguments.items() %}
                            {{- '<parameter=' + args_name + '>\n' }}
                            {%- set args_value = args_value | tojson(ensure_ascii=False) | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
                            {{- args_value }}
                            {{- '\n</parameter>\n' }}
                        {%- endfor %}
                    {%- endif %}
                {%- endif %}
                {{- '</function>\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>tool_response\n' }}
        {%- endif %}
        {{- '<tool_response>' }}
        {{- content }}
        {{- '</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}
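The TUI artifact mentioned above is raw tool-call markup showing up as text. Whatever sits between the model and the client has to recognize the format prescribed in the template's system prompt (`<tool_call>`, `<function=...>`, `<parameter=...>`) and turn it into structured calls. A minimal regex-based sketch in Python, assuming well-formed output in exactly that format; the pattern and function names are hypothetical, and this is not llama.cpp's own parser:

```python
import re

# <tool_call>\n<function=NAME>\n<parameter=P>\nVALUE\n</parameter>\n</function>\n</tool_call>
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>\s]+)>(.*?)</function>\s*</tool_call>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>\n?(.*?)\n?</parameter>", re.DOTALL)

def parse_tool_calls(text: str) -> tuple[str, list[dict]]:
    """Return (plain_content, calls), where each call is {"name": ..., "arguments": {...}}.

    Parameter values stay raw strings; a real client would coerce them
    using each tool's JSON schema.
    """
    calls = []
    for m in TOOL_CALL_RE.finditer(text):
        args = {p.group(1): p.group(2) for p in PARAM_RE.finditer(m.group(2))}
        calls.append({"name": m.group(1), "arguments": args})
    # Strip the markup so it does not leak into the displayed content.
    content = TOOL_CALL_RE.sub("", text).strip()
    return content, calls

sample = ("I'll list files.<tool_call>\n<function=bash>\n<parameter=command>\n"
          "ls -la\n</parameter>\n</function>\n</tool_call>")
print(parse_tool_calls(sample))
```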

ubergarm

Owner 8 days ago

@anikifoss

Yeah, I noticed similar issues on mainline for me too, and pwilkin's recommendation is to try this branch: https://github.com/ggml-org/llama.cpp/pull/18675

@mindkrypted

Thanks for sharing yours! With all the new models, it's been tricky getting the chat templates just right so they work with the web interface for simple chats and also with all the various clients. I'm mostly testing with pydantic-ai.

I've had good luck in a couple of simple tests with the default chat template on ik_llama.cpp, but if I hit any snags I'll try your jinja template out!!
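When checking a chat template against clients, it helps to replay a short history that hits every branch involved in tool use: a `tools` list, an assistant message with `tool_calls`, and a `tool` result. A hedged sketch of such an OpenAI-style request body in Python; the model name, the `bash` tool, and the helper name are made up for illustration. It could be POSTed to mainline llama-server's `/v1/chat/completions` endpoint when the server is started with `--jinja` (plus `--chat-template-file` to load a custom template).

```python
import json

def build_tool_roundtrip_request(model: str = "step-3.5-flash") -> dict:
    """OpenAI-style chat body exercising tools, an assistant tool call, and a tool result."""
    return {
        "model": model,  # hypothetical name; llama-server serves whatever model it loaded
        "messages": [
            {"role": "system", "content": "You are a coding assistant."},
            {"role": "user", "content": "What files are in the current directory?"},
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [{
                    "id": "call_1",
                    "type": "function",
                    # Per the OpenAI schema, arguments is a JSON-encoded *string*.
                    "function": {"name": "bash", "arguments": json.dumps({"command": "ls"})},
                }],
            },
            {"role": "tool", "tool_call_id": "call_1", "content": "AGENTS.md\nsrc\n"},
        ],
        "tools": [{
            "type": "function",
            "function": {
                "name": "bash",
                "description": "Run a shell command.",
                "parameters": {
                    "type": "object",
                    "properties": {"command": {"type": "string"}},
                    "required": ["command"],
                },
            },
        }],
    }

body = json.dumps(build_tool_roundtrip_request(), ensure_ascii=False)
print(body[:80])
```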

anikifoss

8 days ago

@mindkrypted that worked, thank you!

ubergarm

Owner 8 days ago

Super thanks for testing, y'all! I linked to this discussion in the model card. Cheers!

ubergarm

Owner 7 days ago

@gelim

I saw your post over here: https://github.com/ggml-org/llama.cpp/pull/19283#issuecomment-3868260203

Have you tried the chat template above? Or have you tried running mainline llama.cpp with pwilkin's autoparser branch: https://github.com/ggml-org/llama.cpp/pull/18675

Thank you!