Calls that default to conversational task fail with a 404

#82
by ccozad - opened

tl;dr: The HuggingFaceInferenceAPI class in LlamaIndex calls the conversational API, which now causes a 404. A workaround is to add a task="text-generation" parameter to force the library to use a valid task name. The same problem may appear in other areas that default to the conversational task.

I ran into an issue in the "Components of LlamaIndex" notebook. The section where you create a VectorStoreIndex and then use it (index.as_query_engine(...) followed by query_engine.query(...)) throws a 404 Not Found exception like:

huggingface_hub.errors.HfHubHTTPError: 404 Client Error: Not Found for url: https://router.huggingface.co/hf-inference/models/Qwen/Qwen2.5-Coder-32B-Instruct/v1/chat/completions

Based on web searches, the conversational task was deprecated in 2024; it looks like it may have finally been removed recently.

The relevant class in LlamaIndex documents the following:

class HuggingFaceInferenceAPI(FunctionCallingLLM):
    """
    Wrapper on the Hugging Face's Inference API.

    Overview of the design:
    - Synchronous uses InferenceClient, asynchronous uses AsyncInferenceClient
    - chat uses the conversational task: https://huggingface.co/tasks/conversational
    - complete uses the text generation task: https://huggingface.co/tasks/text-generation

    Note: some models that support the text generation task can leverage Hugging
    Face's optimized deployment toolkit called text-generation-inference (TGI).
    Use InferenceClient.get_model_status to check if TGI is being used.

    Relevant links:
    - General Docs: https://huggingface.co/docs/api-inference/index
    - API Docs: https://huggingface.co/docs/huggingface_hub/main/en/package_reference/inference_client
    - Source: https://github.com/huggingface/huggingface_hub/tree/main/src/huggingface_hub/inference
    """

To work around the issue, HuggingFaceInferenceAPI can be called with the task parameter set to "text-generation", which forces the library to use a valid task name.

The full call should look like the following:

llm = HuggingFaceInferenceAPI(
    token=hf_token, 
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    task="text-generation"
)

or like so if using a notebook with the HF token set earlier:

llm = HuggingFaceInferenceAPI(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    task="text-generation"
)

Relevant issue on the LlamaIndex side: https://github.com/run-llama/llama_index/issues/18547
