VertexAI Training and Gated model access

Hi everyone,

I hope that you are well.

Quick question: is using a valid token (WRITE) enough to set up a custom training job on Vertex AI using a gated model (e.g. TxGemma)?

Although my token is valid (for WRITE), my custom training job seems to fail to retrieve the TxGemma-2b-predict model. This is a gated model, and I was assuming that providing a valid token was enough, but I am not sure anymore.

Thanks

Best regards

Jerome

Assuming the token itself is correct.
In Vertex AI, it seems you need to explicitly pass the token “as an environment variable” to the container…


Short direct answer:

  • From the Hugging Face side, yes: a valid user access token with write scope (which includes read) is enough to access a gated model like google/txgemma-2b-predict, as long as the same HF account behind that token has accepted the TxGemma terms. (Hugging Face)
  • From the Vertex AI side, that token only works if you inject it into the training container as an environment variable (for example HF_TOKEN) and your code (or transformers/TRL) uses it when calling from_pretrained. The official Vertex + TRL examples for gated models do exactly this. (Hugging Face)

So: just having a valid token in your account is not enough; you must pass it to the Vertex job in the right way.

Below is the “correct way” with context and a concrete pattern.


1. Hugging Face side: gating + token requirements

1.1 TxGemma is a gated model

TxGemma models on Hugging Face (e.g. google/txgemma-9b-chat, google/txgemma-9b-predict, google/txgemma-2b-predict) are “gated”:

  • The repo page says:

    “This repository is publicly accessible, but you have to accept the conditions to access its files and content.” (Hugging Face)

  • To get access, you must:

    • Log in to Hugging Face in a browser.
    • Open the model page.
    • Click through and accept the Health AI Developer Foundations terms.
    • Access is granted immediately to that user account. (Hugging Face)

So step 0 is verifying that the HF user who owns your token actually sees “Access granted” on the TxGemma page.
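
If you want to check this from code rather than in the browser, a minimal sketch using huggingface_hub (the model id and env var are simply the ones discussed in this thread) is:

import os

from huggingface_hub import HfApi

# Use the exact token you will later pass to the Vertex job.
api = HfApi(token=os.environ["HF_TOKEN"])

try:
    # Succeeds only if the token's account has accepted the TxGemma terms.
    info = api.model_info("google/txgemma-2b-predict")
    print("Gated access OK:", info.id)
except Exception as err:  # typically a gated-repo / 401 / 403 error
    print("No access with this token:", err)

If this fails locally, fix it on the Hugging Face side before touching the Vertex job.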

1.2 Token scopes: read vs write

Hugging Face “User Access Tokens” are the standard way to authenticate apps and notebooks:

  • Docs: “User Access Tokens are the preferred way to authenticate… You can set the role (read, write, admin).” (Hugging Face)

  • For private or gated models, the requirement is: token must have read or broader scope. For example, BigQuery’s Vertex integration docs explicitly say:

    “The token must have the read role scope or broader” for gated models. (Google Cloud Documentation)

Your write token satisfies this; write ⊇ read, so scope is not the problem.

A typical error when gating is not correctly satisfied looks like this one reported on Kaggle for TxGemma:

“Access to model google/txgemma-2b-predict is restricted. You must have access to it and be authenticated to access it.” (Kaggle)

That can happen if:

  • Token is missing in the environment, or
  • Token belongs to a different user than the one who accepted the terms.
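
To rule out the second case, you can check which account the token actually resolves to; a quick sketch (assuming the token is available as HF_TOKEN) is:

import os

from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
user = api.whoami()

# "name" is the username that owns this token; it must be the same account
# that accepted the TxGemma terms on the model page.
print("Token belongs to:", user.get("name"))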

2. Vertex AI side: how to specify the token correctly

The canonical pattern is shown in the Hugging Face + Google Cloud examples for fine-tuning Mistral or Gemma with TRL on Vertex AI. They all do the same thing:

  1. Use a Hugging Face PyTorch Training DLC image.
  2. Create a CustomContainerTrainingJob whose command runs trl sft.
  3. Pass the HF token via environment_variables={"HF_TOKEN": ...}. (Hugging Face)

2.1. Official example pattern (Mistral / Gemma on Vertex)

From the “Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT on Vertex AI” guide: (Hugging Face)

  • They define a CustomContainerTrainingJob:

    job = aiplatform.CustomContainerTrainingJob(
        display_name="trl-full-sft",
        container_uri=os.getenv("CONTAINER_URI"),
        command=[
            "sh",
            "-c",
            'exec trl sft "$@"',
            "--",
        ],
    )
    
  • They build args for trl sft (model, dataset, hyperparameters). (Hugging Face)

  • Crucially, when they call job.submit(...), they pass the token via environment_variables:

    from huggingface_hub import get_token
    
    job.submit(
        args=args,
        replica_count=1,
        machine_type="a2-highgpu-4g",
        accelerator_type="NVIDIA_TESLA_A100",
        accelerator_count=4,
        base_output_dir=f"{os.getenv('BUCKET_URI')}/Mistral-7B-v0.3-SFT-Guanaco",
        environment_variables={
            "HF_HOME": "/root/.cache/huggingface",
            "HF_TOKEN": get_token(),
            "TRL_USE_RICH": "0",
            "ACCELERATE_LOG_LEVEL": "INFO",
            "TRANSFORMERS_LOG_LEVEL": "INFO",
            "TQDM_POSITION": "-1",
        },
    )
    

The Gemma/LoRA TRL example is identical in structure and explicitly notes:

“As you are fine-tuning a gated model … you need to set the HF_TOKEN environment variable.” (Hugging Face)

The Vertex AI community samples for Gemma and Llama follow the same pattern: they read HF_TOKEN from the notebook/UI and add it to env_vars before constructing the job. (GitHub)

2.2. How transformers / TRL picks up the token

Recent huggingface_hub / transformers automatically look for environment variables like:

  • HF_TOKEN
  • HUGGING_FACE_HUB_TOKEN

If one is set, AutoModel.from_pretrained("google/txgemma-2b-predict") will use it to authenticate. (Hugging Face)

So as long as:

  • Your TRL CLI (inside the DLC) uses standard from_pretrained, and
  • You pass HF_TOKEN correctly,

you do not need to manually add token=... in your training script.

If you want to be explicit in a custom script, you can still do:

import os
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/txgemma-2b-predict"
token = os.environ["HF_TOKEN"]

tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
model = AutoModelForCausalLM.from_pretrained(model_id, token=token)

But with the DLC + TRL CLI examples, the environment variable alone is normally enough.


3. Concrete “correct way” for your TxGemma job

Assuming:

  • You’re using the Hugging Face PyTorch Training DLC (as in the TRL examples).
  • You want to fine-tune google/txgemma-2b-predict with trl sft on Vertex AI.

The steps are:

  1. Verify HF-side gating with the right account

    • Log in to Hugging Face as the user who owns the token.
    • Open https://huggingface.co/google/txgemma-2b-predict.
    • Confirm that it shows you have access (no “request access” banner; terms accepted). (Hugging Face)
  2. Create or copy your HF token (write is fine)

    • Go to “Settings → Access Tokens” and create a token with read or write scope. (Hugging Face)
    • This token must belong to the same user as step 1.
  3. In your notebook/script, pass the token into Vertex via env vars

    In Python:

    from google.cloud import aiplatform
    import os
    
    aiplatform.init(
        project=os.getenv("PROJECT_ID"),
        location=os.getenv("LOCATION"),
        staging_bucket=os.getenv("BUCKET_URI"),
    )
    
    HF_TOKEN = "hf_xxx"  # or read from a secret / env var
    
    job = aiplatform.CustomContainerTrainingJob(
        display_name="txgemma-2b-sft",
        container_uri=os.getenv("CONTAINER_URI"),  # HF PyTorch DLC
        command=[
            "sh",
            "-c",
            'exec trl sft "$@"',
            "--",
        ],
    )
    
    args = [
        "--model_name_or_path=google/txgemma-2b-predict",
        "--torch_dtype=bfloat16",
        # + your dataset + training args...
    ]
    
    job.run(
        args=args,
        replica_count=1,
        machine_type="g2-standard-12",
        accelerator_type="NVIDIA_L4",
        accelerator_count=1,
        base_output_dir=f"{os.getenv('BUCKET_URI')}/txgemma-2b-sft",
        environment_variables={
            "HF_HOME": "/root/.cache/huggingface",
            "HF_TOKEN": HF_TOKEN,  # <- key line
            "TRL_USE_RICH": "0",
            "ACCELERATE_LOG_LEVEL": "INFO",
            "TRANSFORMERS_LOG_LEVEL": "INFO",
        },
    )
    

    This mirrors the official HF/Vertex examples almost exactly, just swapping mistralai/Mistral-7B-v0.3 for google/txgemma-2b-predict. (Hugging Face)

  4. Use the correct model ID

    In --model_name_or_path, use the full HF repo id:

    --model_name_or_path=google/txgemma-2b-predict
    

    not a Vertex Model Garden ID or a local alias. (Featherless)

  5. Check Cloud Logs if it still fails

    • 401 or “restricted” → token or gating issue (wrong account, typo in token, env var missing).
    • 404 → wrong model id.
  • Timeout / DNS → networking (VPC / VPC Service Controls) blocking outbound access to huggingface.co. (Hugging Face Forums)

4. Answering your exact questions explicitly

Is using a valid token (WRITE) enough to set up a custom training job on Vertex AI using a gated model (e.g. TxGemma)?

From Hugging Face’s perspective: yes. Any token with read or broader scope is sufficient to download a gated model, provided the associated user has accepted the model’s terms. (Hugging Face)

From Vertex AI’s perspective: the token must be correctly passed into the container (as HF_TOKEN or HUGGING_FACE_HUB_TOKEN) and actually used by transformers / TRL. If your job “fails to retrieve the model,” the usual reasons are:

  • The token is not set in environment_variables,
  • It belongs to a user that hasn’t accepted TxGemma’s terms,
  • The model id is wrong, or
  • There’s a network restriction.

What is the correct way to specify the Hugging Face token on Vertex AI?

The standard, documented way is:

  • In your CustomContainerTrainingJob.run() (or .submit()), pass:

    environment_variables={
        "HF_TOKEN": "<your_hf_token>",
        "HF_HOME": "/root/.cache/huggingface",
        ...
    }
    

    exactly as the Mistral/Gemma TRL examples do when fine-tuning gated models. (Hugging Face)

  • Optionally, use HUGGING_FACE_HUB_TOKEN instead; recent huggingface_hub respects both names.

This is the “correct” and common approach used both in Hugging Face’s official docs and Google’s Vertex AI sample notebooks.

Thanks John

I confirm that I have the correct token and that HF authenticates me correctly. As I have a 404, I am now checking whether the name / path of the model is retrieved correctly from the YAML file I am using for the configuration. :slight_smile:

Best regards

Jerome

I think I have an issue with the way the config file is parsed by TRL. Gemini gave me several ways to bypass the possible issue, and I am testing them now. Hopefully I can find a way to keep using the YAML file, as the code is nicer when I use the CLI command; I cannot pass all the configuration parameters explicitly LOL :smiley:

Oh… It seems like you have to explicitly pass the YAML path as an argument… :sweat_smile:


Below is a focused, Vertex-AI-specific view of how TRL + YAML config works, what can go wrong, and how to set it up cleanly.


1. Architecture: what actually runs in “Vertex + TRL”

In the standard pattern that Google and Hugging Face document:

  1. You use a Hugging Face PyTorch Training DLC as the container image.

  2. On Vertex AI, you create a CustomContainerTrainingJob whose command runs the TRL CLI, typically:

    command=[
        "sh",
        "-c",
        'exec trl sft "$@"',
        "--",
    ]
    

    so Vertex executes trl sft inside the DLC. (Hugging Face)

  3. You pass TRL CLI arguments via the job’s args (e.g. --config=..., --model_name_or_path=...).

  4. Vertex automatically mounts your GCS bucket into the container at /gcs/<BUCKET>. (Hugging Face Forums)

From TRL’s perspective, once the container starts, this is just:

trl sft --config /gcs/my-bucket/configs/sft_config.yaml

running on a Linux machine. Everything about YAML parsing, dataset handling, etc., is exactly the same as on your local machine.

So the key pieces are:

  • The Python that launches the Vertex job
  • The YAML config consumed by TRL
  • The environment variables (HF token, HF cache, etc.)

2. How the YAML is parsed by TRL inside the Vertex container

The TRL CLI (trl sft, trl dpo, etc.) uses TrlParser, which is a thin extension around HfArgumentParser. (Hugging Face)

Mechanism:

  1. You call:

    trl sft --config /path/to/sft_config.yaml
    
  2. Internally:

    • TrlParser.parse_args_and_config():

      • Loads the YAML file.
      • Applies any env: block at the top level (sets environment variables).
      • Maps other top-level keys into dataclasses like SFTConfig, ScriptArguments, etc.
    • CLI flags (like --num_train_epochs) override values from the YAML. (Hugging Face)

A minimal example from the docs:

# config.yaml
env:
  VAR1: value1
arg1: 23
arg2: alpha

# main.py
from trl import TrlParser

parser = TrlParser(dataclass_types=[MyArguments])
training_args = parser.parse_args_and_config()
# python main.py --config config.yaml  -> arg1=23, arg2='alpha', VAR1 set in the environment

If you call python main.py --arg1 5 --arg2 beta, the CLI args override the YAML. (Hugging Face)

So:

  • YAML is authoritative unless overridden on the CLI.
  • All keys must match known dataclass fields (model_name_or_path, dataset_name, datasets, etc.).
  • env: must be at the top level.

This behaviour is identical in Vertex and locally.
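
Because the parsing is identical, you can validate your YAML locally before launching anything on Vertex. A minimal sketch (the dataclass trio below mirrors what recent TRL sft scripts use; adjust it to your TRL version) is:

# check_config.py - run locally: python check_config.py --config sft_config.yaml
from trl import ModelConfig, ScriptArguments, SFTConfig, TrlParser

parser = TrlParser((ScriptArguments, SFTConfig, ModelConfig))
script_args, training_args, model_args = parser.parse_args_and_config()

# Unknown or misspelled keys typically surface here as "unrecognized argument"
# errors, which is exactly what you want to catch before paying for a Vertex job.
print("model:", model_args.model_name_or_path)
print("output_dir:", training_args.output_dir)
print("dataset_name:", script_args.dataset_name)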


3. Typical “Vertex + TRL + YAML” layout

3.1. YAML config on GCS

You place your config in GCS, for example:

gs://my-bucket/configs/sft_config.yaml

Inside the Vertex container, this is:

/gcs/my-bucket/configs/sft_config.yaml

You then reference this path in your job’s args:

args = [
    "--config=/gcs/my-bucket/configs/sft_config.yaml",
]
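
To get the file there from your notebook, one option is the Cloud Storage Python client (a sketch; the bucket and object names are the placeholders used above):

from google.cloud import storage

# Upload the local YAML so the training container can read it under /gcs/my-bucket/configs/.
client = storage.Client()
bucket = client.bucket("my-bucket")
bucket.blob("configs/sft_config.yaml").upload_from_filename("sft_config.yaml")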

3.2. Example config for Vertex + TRL

A typical sft_config.yaml for Gemma/TxGemma on Vertex looks like:

# ---------------------------
# 1. Environment variables
# ---------------------------
env:
  HF_HOME: /root/.cache/huggingface

# You *can* put HF_TOKEN here, but on Vertex it is safer
# to inject it from the job's environment_variables (see below).

# ---------------------------
# 2. Model + training params
# ---------------------------
model_name_or_path: google/txgemma-2b-predict
output_dir: /gcs/my-bucket/outputs/txgemma-herg
overwrite_output_dir: true

max_seq_length: 1024
per_device_train_batch_size: 2
per_device_eval_batch_size: 2
gradient_accumulation_steps: 8
num_train_epochs: 3
learning_rate: 5e-5
warmup_ratio: 0.05
weight_decay: 0.01
bf16: true

# ---------------------------
# 3. LoRA / PEFT
# ---------------------------
use_peft: true
lora_r: 8
lora_alpha: 16
lora_dropout: 0.1
lora_target_modules:
  - q_proj
  - v_proj
  - o_proj

# ---------------------------
# 4. Dataset (local/GCS)
# ---------------------------
datasets:
  - path: json
    data_files:
      train: /gcs/my-bucket/drug-herg/train.jsonl
      validation: /gcs/my-bucket/drug-herg/eval.jsonl
    split: train
    columns: [prompt, completion]

dataset_name: null       # ignored when datasets: is present
dataset_text_field: null # ignored

# ---------------------------
# 5. SFT-specific options
# ---------------------------
completion_only_loss: true

This structure matches TRL’s script utilities docs: datasets: is a mixture, with each entry mapping to datasets.load_dataset(path, data_files, ...). (Hugging Face)

On Vertex, GCS paths under /gcs/... look like local files to datasets.load_dataset. (Hugging Face Forums)
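
Inside the container, that datasets: entry behaves roughly like the direct call below, which you can also run against local copies of the JSONL files to sanity-check them before training (paths are the placeholders from the YAML above):

from datasets import load_dataset

# Equivalent of the `datasets:` entry: JSON Lines files loaded as train/validation
# splits, each row expected to carry `prompt` and `completion` columns.
ds = load_dataset(
    "json",
    data_files={
        "train": "/gcs/my-bucket/drug-herg/train.jsonl",
        "validation": "/gcs/my-bucket/drug-herg/eval.jsonl",
    },
)
print(ds)
print(ds["train"].column_names)  # expect ["prompt", "completion"]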


4. Vertex job definition: how YAML and environment tie together

In the Vertex notebook/script that launches training, you typically do:

from google.cloud import aiplatform
import os

aiplatform.init(
    project=os.getenv("PROJECT_ID"),
    location=os.getenv("LOCATION"),
    staging_bucket=os.getenv("BUCKET_URI"),
)

HF_TOKEN = "hf_..."  # ideally loaded from a secret or env var

job = aiplatform.CustomContainerTrainingJob(
    display_name="txgemma-2b-sft",
    container_uri=os.getenv("CONTAINER_URI"),  # HF PyTorch Training DLC
    command=[
        "sh",
        "-c",
        'exec trl sft "$@"',
        "--",
    ],
)

args = [
    "--config=/gcs/my-bucket/configs/sft_config.yaml",
]

job.run(
    args=args,
    replica_count=1,
    machine_type="g2-standard-12",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    base_output_dir=f"{os.getenv('BUCKET_URI')}/outputs/txgemma-2b-sft",
    environment_variables={
        "HF_TOKEN": HF_TOKEN,
        "HF_HOME": "/root/.cache/huggingface",
        "TRL_USE_RICH": "0",
        "ACCELERATE_LOG_LEVEL": "INFO",
        "TRANSFORMERS_LOG_LEVEL": "INFO",
    },
)

This pattern matches the official Mistral/Gemma TRL examples almost exactly; they simply swap in the appropriate model ID and dataset. (Hugging Face)

Key points:

  • YAML is loaded inside the container when trl sft --config ... runs.
  • environment_variables are set by Vertex before your command runs.
  • HF_TOKEN in environment_variables is the recommended way to authenticate to gated Hugging Face models from Vertex. (Hugging Face)

If YAML also has an env: block, those variables are set in addition to what Vertex provided, but in practice you normally prefer to inject secrets like HF_TOKEN via Vertex and use YAML env: for non-sensitive defaults.


5. Common “Vertex + TRL + YAML” failure modes

Given you suspect a parsing issue, here are the most common pitfalls in this exact setup.

5.1. Wrong path for --config in Vertex

  • You must pass the path as seen inside the container.
  • gs://... is not a valid local path inside the container; inside, it becomes /gcs/<BUCKET>/.... (Hugging Face Forums)
  • If you pass --config=gs://my-bucket/..., TRL will fail to open the file and ignore the config, falling back to CLI/defaults.

Check that:

args = ["--config=/gcs/my-bucket/configs/sft_config.yaml"]

and that this file exists (you can test with a tiny debug job that runs ls -R /gcs/my-bucket).
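
A sketch of such a debug job (same DLC image as the real run, with the trl command swapped for a one-off shell command; names are placeholders) could look like:

import os

from google.cloud import aiplatform

debug_job = aiplatform.CustomContainerTrainingJob(
    display_name="gcs-path-debug",
    container_uri=os.getenv("CONTAINER_URI"),  # same HF PyTorch Training DLC
    command=[
        "sh",
        "-c",
        # List the mounted bucket and print the TRL version, then exit.
        "ls -R /gcs/my-bucket/configs && python -c 'import trl; print(trl.__version__)'",
    ],
)

debug_job.run(
    replica_count=1,
    machine_type="n1-standard-4",
)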

5.2. YAML keys not matching TRL’s expected names

The TRL script utilities docs for your exact TRL version are the source of truth here. (Hugging Face)

Common mistakes:

  • Using train_dataset instead of datasets / dataset_name.
  • Misspelling keys like completion_only_loss, per_device_train_batch_size.
  • Nesting keys under sub-dicts that TRL doesn’t know (e.g. training: {max_seq_length: 1024} instead of max_seq_length: 1024 at top level).

If a key is unknown, TrlParser may:

  • Throw an “unrecognized argument” error (best case), or
  • Simply not set that field, leaving the default (worst case, more confusing).

5.3. Mixing datasets: and dataset_name incorrectly

In current TRL, when you provide datasets: (mixture), it is supposed to be used instead of dataset_name. (Hugging Face)

However, some earlier CLI versions had issues when:

  • YAML defined datasets:, but
  • The CLI still had --dataset_name or default assumptions, leading to errors like “the following arguments are required: --dataset_name”.

Safer patterns:

  • If you use datasets: in YAML, do not pass --dataset_name on the CLI.
  • In YAML, explicitly set dataset_name: null (or omit it), and rely only on datasets:.

5.4. env: block mis-indented or unused

To have TRL set env vars from YAML, env: must be at the top level. (Hugging Face)

Correct:

env:
  HF_HOME: /root/.cache/huggingface
  SOME_FLAG: "1"
model_name_or_path: ...

Incorrect (and ignored):

training:
  env:
    HF_HOME: /root/.cache/huggingface

For HF_TOKEN, it is usually better to inject it via Vertex environment_variables rather than YAML, because:

  • You can use secrets / runtime injection.
  • You do not need to put secrets into a file stored in GCS.

5.5. TRL version mismatch inside the DLC

It’s easy to end up with:

  • HF PyTorch DLC’s pre-installed TRL version, plus
  • Another TRL version pulled at runtime (pip install trl -U in a startup script).

If you follow docs for TRL 0.15 but the container uses 0.8 or a dev version, YAML keys or semantics may differ. Issues like “TrlParser not working with --config” have been reported around version changes. (GitHub)

Check inside the container (in a small debug job):

python -c "import trl; print('TRL version:', trl.__version__)"

and then consult the docs for that exact version.


6. Practical debugging loop for your case

Given where you are now, a targeted debugging sequence in Vertex would be:

  1. Debug job 1: Verify config path and TRL version

    Submit a tiny job whose command is:

    sh -c 'ls -R /gcs && python -c "import trl; print(trl.__version__)"'
    

    to ensure:

    • /gcs/my-bucket/configs/sft_config.yaml exists.
    • You know the TRL version.
  2. Debug job 2: Minimal TRL + YAML

    Put a minimal YAML on GCS:

    model_name_or_path: google/txgemma-2b-predict
    dataset_name: stanfordnlp/imdb
    output_dir: /gcs/my-bucket/debug-out
    num_train_epochs: 1
    per_device_train_batch_size: 1
    

    Run:

    args = ["--config=/gcs/my-bucket/configs/minimal.yaml"]
    

    and confirm the job starts and uses those values. If this works, you know YAML parsing is fine in principle.

  3. Debug job 3: Add datasets: and LoRA step by step

    Incrementally extend the YAML with:

    • datasets: with a local JSONL dataset.
    • LoRA settings.
    • completion_only_loss.

    If at any step something breaks, you can isolate exactly which section is mis-specified.


Summary for “Vertex + TRL” specifically

  • In Vertex custom training with HF PyTorch DLC, trl sft runs inside the container exactly as it would locally.

  • You pass a YAML config via --config=/gcs/<BUCKET>/.../sft_config.yaml.

  • TRL’s TrlParser reads that YAML, sets env: variables, and maps keys into SFTConfig etc. CLI args override YAML. (Hugging Face)

  • Common Vertex-specific pitfalls are:

    • Using gs://... instead of /gcs/... for --config and data_files. (Hugging Face Forums)
    • Mixing dataset_name and datasets: incorrectly.
    • Mis-indented or wrong YAML keys.
    • TRL version mismatch inside the DLC.
  • For gated models (TxGemma), pass your HF token via Vertex environment_variables (HF_TOKEN) and let TRL/Transformers pick it up in the container.