How to use keras/vit_base_patch16_384_imagenet with KerasHub:
import keras_hub
import keras
# Load ImageClassifier model
image_classifier = keras_hub.models.ImageClassifier.from_preset(
    "hf://keras/vit_base_patch16_384_imagenet",
    num_classes=2,
)
# Fine-tune
image_classifier.fit(
    x=keras.random.randint((32, 64, 64, 3), 0, 256),
    y=keras.random.randint((32, 1), 0, 2),
)
# Classify image
image_classifier.predict(keras.random.randint((1, 64, 64, 3), 0, 256))
import keras_hub
# Create a Backbone model unspecialized for any task
backbone = keras_hub.models.Backbone.from_preset("hf://keras/vit_base_patch16_384_imagenet")
How to use keras/vit_base_patch16_384_imagenet with Keras:
# Available backend options are: "jax", "torch", "tensorflow".
import os
os.environ["KERAS_BACKEND"] = "jax"
import keras
model = keras.saving.load_model("hf://keras/vit_base_patch16_384_imagenet")
Vision Transformer (ViT) adapts the Transformer architecture, originally designed for natural language processing, to computer vision. It treats an image as a sequence of patches, much as a Transformer treats a sentence as a sequence of words. It was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".
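To make the "image as a sequence of patches" idea concrete, here is a minimal NumPy sketch that cuts a 384x384 RGB image into non-overlapping 16x16 patches and flattens each one into a vector. The `patchify` helper is hypothetical and written only for illustration; it is not KerasHub's implementation, and real ViT models typically follow this step with a learned linear projection and position embeddings.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping square patches.

    Hypothetical helper for illustration only; not part of KerasHub.
    """
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("Image sides must be divisible by patch_size")
    grid_h, grid_w = height // patch_size, width // patch_size
    # Separate each spatial axis into (grid index, offset inside the patch)
    patches = image.reshape(grid_h, patch_size, grid_w, patch_size, channels)
    # Bring the two grid axes to the front: (grid_h, grid_w, ps, ps, C)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, read left to right, top to bottom
    return patches.reshape(grid_h * grid_w, patch_size * patch_size * channels)


image = np.zeros((384, 384, 3), dtype=np.float32)
patches = patchify(image, 16)
print(patches.shape)  # (576, 768)
```

With a 384x384 input and 16x16 patches the grid is 24x24, so the model sees 576 patch tokens, each holding 16 * 16 * 3 = 768 raw pixel values.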
Keras and KerasHub can be installed with:
pip install -U -q keras-hub
pip install -U -q keras
| Model ID | img_size | Acc | Top-5 | Parameters |
|---|---|---|---|---|
| Base | | | | |
| vit_base_patch16_224_imagenet | 224 | - | - | 85798656 |
| vit_base_patch16_224_imagenet21k | 224 | - | - | 85798656 |
| vit_base_patch16_384_imagenet | 384 | - | - | 86090496 |
| vit_base_patch32_224_imagenet21k | 224 | - | - | 87455232 |
| vit_base_patch32_384_imagenet | 384 | - | - | 87528192 |
| Large | | | | |
| vit_large_patch16_224_imagenet | 224 | - | - | 303301632 |
| vit_large_patch16_224_imagenet21k | 224 | - | - | 303301632 |
| vit_large_patch16_384_imagenet | 384 | - | - | 303690752 |
| vit_large_patch32_224_imagenet21k | 224 | - | - | 305510400 |
| vit_large_patch32_384_imagenet | 384 | - | - | 305607680 |
| Huge | | | | |
| vit_huge_patch14_224_imagenet21k | 224 | - | - | 630764800 |
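The preset names appear to follow a `vit_<variant>_patch<patch size>_<image size>_<dataset>` pattern. As a rough sketch under that assumption (the pattern is inferred from the names, not taken from KerasHub documentation), the hypothetical `describe_preset` helper below parses a name and derives how many patches the model would see per image.

```python
import re

# Assumed naming scheme, inferred from the preset names; not an official spec.
PRESET_PATTERN = re.compile(
    r"^vit_(?P<variant>base|large|huge)"
    r"_patch(?P<patch>\d+)"
    r"_(?P<size>\d+)"
    r"_(?P<dataset>\w+)$"
)


def describe_preset(name):
    """Parse a ViT preset name into its parts (hypothetical helper)."""
    match = PRESET_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized preset name: {name!r}")
    patch_size = int(match["patch"])
    image_size = int(match["size"])
    return {
        "variant": match["variant"],
        "patch_size": patch_size,
        "image_size": image_size,
        "dataset": match["dataset"],
        # Non-overlapping square patches on a square image
        "num_patches": (image_size // patch_size) ** 2,
    }


info = describe_preset("vit_base_patch16_384_imagenet")
print(info["num_patches"])  # 576
```

The patch count excludes any extra tokens the model may prepend, such as a class token.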
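import numpy as np
import keras_hub
# Load the pretrained ViT image classifier
image_classifier = keras_hub.models.ImageClassifier.from_preset(
    "vit_base_patch16_384_imagenet"
)
# Calling the model directly bypasses the preprocessor, so pass images at 384x384
input_data = np.random.uniform(0, 1, size=(2, 384, 384, 3))
image_classifier(input_data)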
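# Load the backbone and preprocessor separately to attach a new classification head
backbone = keras_hub.models.Backbone.from_preset(
    "vit_base_patch16_384_imagenet"
)
preprocessor = keras_hub.models.ViTImageClassifierPreprocessor.from_preset(
    "vit_base_patch16_384_imagenet"
)
# CLASSES is a placeholder list of label names; replace it with your own
CLASSES = ["class_a", "class_b"]
model = keras_hub.models.ViTImageClassifier(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)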
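import numpy as np
import keras_hub
# Load the pretrained ViT image classifier from its Hugging Face URI
image_classifier = keras_hub.models.ImageClassifier.from_preset(
    "hf://keras/vit_base_patch16_384_imagenet"
)
# Calling the model directly bypasses the preprocessor, so pass images at 384x384
input_data = np.random.uniform(0, 1, size=(2, 384, 384, 3))
image_classifier(input_data)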
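# Load the backbone and preprocessor from the Hugging Face URI to attach a new head
backbone = keras_hub.models.Backbone.from_preset(
    "hf://keras/vit_base_patch16_384_imagenet"
)
preprocessor = keras_hub.models.ViTImageClassifierPreprocessor.from_preset(
    "hf://keras/vit_base_patch16_384_imagenet"
)
# CLASSES is a placeholder list of label names; replace it with your own
CLASSES = ["class_a", "class_b"]
model = keras_hub.models.ViTImageClassifier(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)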