HF documentation doesn't work when trying to deploy on SageMaker

Hi, I am following this guide (Deploy models to Amazon SageMaker) to deploy a model on AWS SageMaker. I have installed the latest SageMaker SDK 3.0, but I'm running into a weird problem deploying the model. It seems the documentation is inaccurate or out of date. Can anyone help fix this?

Error:

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[32], line 1
----> 1 from sagemaker.huggingface.model import HuggingFaceModel

ModuleNotFoundError: No module named 'sagemaker.huggingface'

Script:

WARNING: This snippet is not yet compatible with SageMaker version >= 3.0.0.

To use this snippet, install a compatible version:

pip install 'sagemaker<3.0.0'

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

# Hub model configuration
hub = {
    "HF_MODEL_ID": "google/medgemma-27b-it",
    "SM_NUM_GPUS": json.dumps(1),
    "HF_TOKEN": "",
}

assert hub["HF_TOKEN"] != "", "You have to provide a token."

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="3.3.6"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "Hi, what can you help me with?",
})

SageMaker version mismatch?


Your error is exactly what happens when you follow Hugging Face’s SageMaker “v2-style” examples while you have SageMaker Python SDK v3.x installed.

In SDK v3, the old high-level integration modules (including sagemaker.huggingface) are not available the same way. Hugging Face’s own AWS landing page warns that most docs/tutorials still target SDK v2 and explicitly tells you to install sagemaker<3.0.0. (Hugging Face)

So the “cause” is not mysterious. It is a version mismatch between:

  • Your installed SDK (v3.x), and
  • Docs/snippets that import sagemaker.huggingface (v2.x-era API). (Hugging Face)

Below are the practical fixes and the other pitfalls you will hit next (token name, instance size, GPU count).


What is happening (and why sagemaker.huggingface is missing)

1) Hugging Face docs are internally inconsistent right now

The “Deploy models to Amazon SageMaker” page still says things like “upgrade to the latest sagemaker” and shows v2 imports (from sagemaker.huggingface import HuggingFaceModel). (Hugging Face)
But the Hugging Face “Hugging Face on AWS” index page explicitly says SDK v3 was released and most docs still use v2, so you should pin <3.0.0. (Hugging Face)

2) SageMaker SDK v3 is a different interface

AWS describes v3 as a new “experience” that replaces a lot of v2 patterns with newer abstractions (notably ModelBuilder). (GitHub)
Net effect: v2 tutorial code will often break on v3.
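
A quick way to confirm which SDK your notebook kernel actually has:

import sagemaker

# 2.x -> sagemaker.huggingface exists; 3.x -> the v2 imports raise ModuleNotFoundError
print(sagemaker.__version__)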


Fix option A (fastest): use SageMaker SDK v2 since the HF docs you’re using are v2

Step 1: pin SDK to v2 in the SAME kernel/environment you run the notebook in

pip install -U "sagemaker<3.0.0"
# then restart the Jupyter kernel

This is exactly what Hugging Face tells you to do while they update tutorials. (Hugging Face)

Step 2: use the correct import (v2)

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

This is the import Hugging Face shows in their inference doc. (Hugging Face)

Step 3: fix your environment variables (your script is currently wrong here)

In the LLM (TGI) section, the variable is HUGGING_FACE_HUB_TOKEN, not HF_TOKEN. (Hugging Face)

So your hub/env should look like:

hub = {
  "HF_MODEL_ID": "google/medgemma-27b-it",
  "SM_NUM_GPUS": "1",
  "HUGGING_FACE_HUB_TOKEN": "<YOUR_TOKEN>",
}

Hugging Face’s own example uses HUGGING_FACE_HUB_TOKEN and asserts it is set. (Hugging Face)
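
Putting steps 1 to 3 together, here is a sketch of the corrected v2-style script. The TGI image version and instance type are carried over from your snippet; see the hardware section below before settling on ml.g5.2xlarge:

import json
import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

# TGI expects HUGGING_FACE_HUB_TOKEN, not HF_TOKEN
hub = {
    "HF_MODEL_ID": "google/medgemma-27b-it",
    "SM_NUM_GPUS": json.dumps(1),   # must match the instance's GPU count
    "HUGGING_FACE_HUB_TOKEN": "",   # paste your HF token here
}
assert hub["HUGGING_FACE_HUB_TOKEN"] != "", "You have to provide a token."

huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="3.3.6"),
    env=hub,
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",   # likely too small for 27B, see below
    container_startup_health_check_timeout=300,
)

predictor.predict({"inputs": "Hi, what can you help me with?"})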


Fix option B (stay on SDK v3): switch to the v3 deployment path (ModelBuilder)

If you must stay on sagemaker>=3, you need to adopt the v3 approach (sketched below):

  • ModelBuilder is the central concept in AWS’s v3 docs and blog posts. (AWS Document)
  • SageMaker’s documentation shows ModelBuilder being used with a Hugging Face Hub model ID and passing HUGGING_FACE_HUB_TOKEN via env_vars. (Amazon SageMaker)
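
A minimal sketch of that path, assuming the ModelBuilder/SchemaBuilder API from sagemaker.serve; check the current v3 docs for exact signatures:

from sagemaker.serve import ModelBuilder, SchemaBuilder

# Sample request/response pair ModelBuilder uses to infer the serving schema
schema = SchemaBuilder(
    sample_input={"inputs": "Hi, what can you help me with?"},
    sample_output=[{"generated_text": "Hello! I can help with..."}],
)

model_builder = ModelBuilder(
    model="google/medgemma-27b-it",   # Hugging Face Hub model ID
    schema_builder=schema,
    env_vars={"HUGGING_FACE_HUB_TOKEN": "<YOUR_TOKEN>"},  # gated model needs a token
)

model = model_builder.build()
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",   # 4x A10G; see hardware notes below
)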

Reality check: v3 support for all Hugging Face tasks and containers has been an active area with open issues and ongoing work (so expect sharp edges). (GitHub)

If your primary goal is “deploy now”, option A is usually lower risk.


Problems you will likely hit next (even after the import works)

1) Your instance is almost certainly too small for a 27B model (unless you quantize heavily)

ml.g5.2xlarge has 1 NVIDIA A10G GPU with 24 GB VRAM. (Amazon Web Services, Inc.)
A rough sizing rule:

  • 27B parameters at fp16 are ~54 GB just for weights, plus runtime memory (quick arithmetic below).
  • So you usually need multi-GPU tensor parallelism or quantization.
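
To make that arithmetic concrete (weights only; KV cache and activations come on top):

# Back-of-the-envelope VRAM estimate for weights alone
params = 27e9           # medgemma-27b-it, roughly 27B parameters
bytes_per_param = 2     # fp16 / bf16
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB for weights alone")  # ~54 GB vs 24 GB on one A10G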

Hugging Face’s doc explicitly says SM_NUM_GPUS controls tensor parallel degree and is needed when the model is too big for one GPU. (Hugging Face)

Also note: G5 sizes with more GPUs exist (4-GPU and 8-GPU variants). (Amazon Web Services, Inc.)

2) You must set SM_NUM_GPUS to match the instance’s GPU count

  • ml.g5.2xlarge → 1 GPU, so SM_NUM_GPUS="1" is consistent.
  • If you move to a 4-GPU G5 size, set SM_NUM_GPUS="4".

HF calls this out and gives an example of choosing SM_NUM_GPUS to match instance GPU count. (Hugging Face)
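
For example, a sketch of the adjusted config on a 4-GPU size such as ml.g5.12xlarge (4×A10G), assuming the imports and role from option A:

hub = {
    "HF_MODEL_ID": "google/medgemma-27b-it",
    "SM_NUM_GPUS": "4",   # tensor parallel degree = GPUs on the instance
    "HUGGING_FACE_HUB_TOKEN": "<YOUR_TOKEN>",
}

huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="3.3.6"),
    env=hub,
    role=role,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",   # 4x 24 GB A10G = 96 GB total VRAM
    container_startup_health_check_timeout=600,   # big models take longer to load
)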

3) Your pasted script uses “smart quotes”

You pasted curly quotes like ‘ ’ and “ ” instead of straight quotes. Python will fail on those even before SageMaker does. Replace them with plain ' and ".
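
If you want to catch these before Python does, a small check (deploy.py is a hypothetical filename; point it at your actual script):

# Flag typographic quotes that break Python source
SMART = {"\u2018", "\u2019", "\u201c", "\u201d"}  # ‘ ’ “ ”

with open("deploy.py", encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        hits = [ch for ch in line if ch in SMART]
        if hits:
            print(f"line {lineno}: smart quote(s) {hits}")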

4) Sanity check: you might have a local naming collision

A classic cause of weird import behavior is having a file named sagemaker.py (or a folder named sagemaker) in your working directory. A Stack Overflow case with the same family of error was fixed by renaming a local sagemaker.py. (Stack Overflow)
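
You can verify which sagemaker Python actually imports:

import sagemaker

# Should point into site-packages; if it points at your working
# directory, a local sagemaker.py or sagemaker/ folder is shadowing the SDK
print(sagemaker.__file__)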


What I would do in your exact case (lowest friction)

  1. Downgrade to v2 in that notebook kernel: pip install -U "sagemaker<3.0.0" then restart kernel. (Hugging Face)

  2. Use the v2 import: from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri. (Hugging Face)

  3. Fix token variable name: use HUGGING_FACE_HUB_TOKEN, not HF_TOKEN. (Hugging Face)

  4. Re-evaluate hardware:

    • If you insist on medgemma-27b-it, plan for a bigger multi-GPU instance or quantization.
    • ml.g5.2xlarge is 24 GB VRAM. That is commonly not enough for a 27B model in fp16. (Amazon Web Services, Inc.)

Similar issues and good references (high signal)

  • HF warning that most docs still use SDK v2 and to install sagemaker<3.0.0. (Hugging Face)
  • HF “Deploy models to SageMaker” page showing the v2-style imports and TGI env vars (HUGGING_FACE_HUB_TOKEN, SM_NUM_GPUS). (Hugging Face)
  • AWS explanation of ModelBuilder (the v3 direction) and how deployment flow changes. (AWS Document)
  • SageMaker docs example showing ModelBuilder + HF Hub model ID + HUGGING_FACE_HUB_TOKEN. (Amazon SageMaker)
  • Active v3 support discussion for Hugging Face tasks (expect gaps). (GitHub)
  • HF inference toolkit repo (what powers “zero-code” behavior). (GitHub)
  • G5 GPU memory and GPU counts by size (so you can choose instance properly). (Amazon Web Services, Inc.)
  • Similar import-confusion case caused by a local sagemaker.py file. (Stack Overflow)

Summary

  • Your import error is because you installed SageMaker SDK v3, but you are using HF docs/snippets written for v2. (Hugging Face)
  • Fast fix: pip install "sagemaker<3.0.0" + restart kernel. (Hugging Face)
  • Fix your env var name: use HUGGING_FACE_HUB_TOKEN. (Hugging Face)
  • ml.g5.2xlarge is 1×A10G 24 GB. A 27B model likely needs multi-GPU or quantization. (Amazon Web Services, Inc.)