| license: other | |
| license_name: custom-apple-license | |
| license_link: https://github.com/apple/ml-tic-clip/blob/main/LICENSE | |
| tags: | |
| - vision | |
| - zero-shot-image-classification | |
| datasets: | |
| - apple/TiC-DataComp | |
| library_name: tic-clip | |
| # Model Card for TiC-CLIP-bestpool-sequential | |
| This repository contains TiC-CLIP models trained on TiC-DataComp-Yearly (xlarge, bestpool filtering) with data from 2014 to 2022 using our modified OpenCLIP code. | |
| For additional information refer to our [GitHub repo](https://github.com/apple/ml-tic-clip). | |
| ## Model Details | |
| ### Model Description | |
| Keeping large foundation models up to date on the latest data is inherently expensive. | |
| To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. | |
| This problem is exacerbated by the lack of any large scale continual learning benchmarks or baselines. | |
| We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: | |
| TiC-DataComp, TiC-YFCC, and TiC-Redcaps. TiC-DataComp, our largest dataset, | |
| contains over 12.7B timestamped image-text pairs spanning 9 years (2014-2022). | |
| We first use our benchmarks to curate various dynamic evaluations to measure temporal robustness of existing models. | |
| We show that OpenAI's CLIP (trained on data up to 2020) loses ≈8% zero-shot accuracy on our curated retrieval task from 2021-2022 compared with more recently trained models in the OpenCLIP repository. | |
| We then study how to efficiently train models on time-continuous data. | |
| We demonstrate that a simple rehearsal-based approach that continues training from the last checkpoint and replays old data reduces compute by 2.5× when compared to the standard practice of retraining from scratch. | |
| Code is available at [https://github.com/apple/ml-tic-clip](https://github.com/apple/ml-tic-clip). | |
| - **Developed by:** Apple | |
| - **License:** See [LICENSE](https://github.com/apple/ml-tic-clip/blob/main/LICENSE) | |
| ### Model Sources | |
| - **Repository:** [ml-tic-clip GitHub repo](https://github.com/apple/ml-tic-clip) | |
| - **Paper:** [TiC-CLIP: Continual Training of CLIP Models, Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F., International Conference on Learning Representations (ICLR), 2024.](https://arxiv.org/abs/2310.16226) | |
| ## Uses | |
| Researchers can use the pretrained TiC-CLIP models to design continual learning methods faster: start from a pretrained checkpoint and continue training on the next year's or next month's data. | |
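As a rough illustration of that workflow, the sketch below picks the checkpoint to initialize from when training on a given year. It assumes checkpoints are stored as `checkpoints/<YEAR>.pt`, the layout used in the snippets in this card. The helper names `checkpoint_years` and `init_checkpoint_for` are hypothetical and not part of the TiC-CLIP code.

```python
import re
from typing import Iterable, List, Optional

# Assumed file layout, taken from the snippets in this card: checkpoints/<YEAR>.pt
_CKPT_PATTERN = re.compile(r"^checkpoints/(\d{4})\.pt$")


def checkpoint_years(repo_files: Iterable[str]) -> List[int]:
    """Return the sorted years for which a yearly checkpoint file exists."""
    years = []
    for name in repo_files:
        match = _CKPT_PATTERN.match(name)
        if match:
            years.append(int(match.group(1)))
    return sorted(years)


def init_checkpoint_for(target_year: int, repo_files: Iterable[str]) -> Optional[str]:
    """Pick the most recent checkpoint from a year strictly before target_year.

    Returns None when no earlier checkpoint exists, in which case there is
    nothing to continue training from.
    """
    earlier = [y for y in checkpoint_years(repo_files) if y < target_year]
    if not earlier:
        return None
    return f"checkpoints/{earlier[-1]}.pt"


if __name__ == "__main__":
    # Hypothetical file listing; a real one could come from
    # huggingface_hub.list_repo_files("apple/TiC-CLIP-bestpool-sequential").
    files = ["README.md", "checkpoints/2016.pt", "checkpoints/2017.pt", "checkpoints/2018.pt"]
    print(init_checkpoint_for(2019, files))
    print(init_checkpoint_for(2016, files))
```

The returned path can be passed as the `filename` argument of `hf_hub_download` to fetch the starting checkpoint.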
| ## How to Get Started with the Model | |
| The models are compatible with the DataComp evaluation suite and with our patched version of DataComp for evaluation on TiC-DataComp-Retrieval and TiC-DataCompNet. | |
| The models can also be used to resume training or to initialize a new training run using OpenCLIP code. | |
| Please follow instructions in our [GitHub repo](https://github.com/apple/ml-tic-clip) to create the evaluation sets or follow [DataComp](https://github.com/mlfoundations/datacomp) for the standard evaluations on 38 datasets. | |
| The following snippets assume the TiC-DataComp data has been prepared following the instructions in the GitHub repo. | |
| ### Training | |
| ```bash | |
| YEAR=2016 # There are no models before 2016 since data from 2014-2016 were combined into one year | |
| REPO="apple/TiC-CLIP-bestpool-sequential" | |
| huggingface-cli download $REPO checkpoints/$YEAR.pt | |
| ## Train Cumulative | |
| pushd datacomp | |
| final_data_dir=$TIC_DATACOMP_Y_PATH/train/$YEAR/ | |
| torchrun --nproc_per_node 8 --nnodes 1 \ | |
| train.py \ | |
| --scale "tic_medium" \ | |
| --dataset_resampled \ | |
| --data_dir $final_data_dir \ | |
| --output_dir "./results/" \ | |
| --exp_name "datacomp_medium-basic_cumulative" \ | |
| --imagenet_val $IMAGENET_VAL_PATH \ | |
| --save_frequency 1 \ | |
| --resume | |
| popd | |
| ``` | |
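If the same command has to be issued for several yearly splits, the argument list can be assembled programmatically. The sketch below only rebuilds and prints the `torchrun` command from the bash snippet above for a range of years and launches nothing. `build_train_command` and the placeholder paths are hypothetical, and whether `--resume` picks up the intended checkpoint when moving from one year to the next depends on the training script in the GitHub repo, which this sketch does not cover.

```python
import shlex
from typing import List


def build_train_command(year: int, tic_datacomp_path: str, imagenet_val_path: str,
                        gpus_per_node: int = 8) -> List[str]:
    """Assemble the torchrun argument list from the bash snippet for one year."""
    # Mirrors final_data_dir=$TIC_DATACOMP_Y_PATH/train/$YEAR/
    data_dir = f"{tic_datacomp_path}/train/{year}/"
    return [
        "torchrun", "--nproc_per_node", str(gpus_per_node), "--nnodes", "1",
        "train.py",
        "--scale", "tic_medium",
        "--dataset_resampled",
        "--data_dir", data_dir,
        "--output_dir", "./results/",
        "--exp_name", "datacomp_medium-basic_cumulative",
        "--imagenet_val", imagenet_val_path,
        "--save_frequency", "1",
        "--resume",
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute the values of $TIC_DATACOMP_Y_PATH
    # and $IMAGENET_VAL_PATH from your own setup.
    for year in range(2016, 2019):
        cmd = build_train_command(year, "/path/to/tic-datacomp-yearly", "/path/to/imagenet-val")
        # Printing only; subprocess.run(cmd, cwd="datacomp", check=True) would launch it.
        print(shlex.join(cmd))
```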
| ### Evaluation | |
| ```bash | |
| ## Evaluate Model | |
| # Evaluate a ViT-B/16 model on TiC/Retrieval/Yearly/$YEAR and | |
| # TiC/DataCompNet/Yearly/$YEAR | |
| pushd datacomp | |
| python ../dataset_creation/tic-datacomp/generate_tasklist.py --yaml-path tasklist.yml --sample-eval --eval-tasks retrieval/yearly,datacompnet/yearly | |
| python evaluate.py --data_dir data/ --train_output_dir ./results --use_model "ViT-B-16 $YEAR.pt" --skip_hf --skip_db --skip_notification | |
| ``` | |
| ### OpenCLIP Load and Inference Example | |
| ```python | |
| import open_clip | |
| import torch | |
| from huggingface_hub import hf_hub_download | |
| from PIL import Image | |
| # Download the 2016 checkpoint and load it as a ViT-B/16 OpenCLIP model. | |
| filename = hf_hub_download(repo_id="apple/TiC-CLIP-bestpool-sequential", filename="checkpoints/2016.pt") | |
| model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-16', filename) | |
| tokenizer = open_clip.get_tokenizer('ViT-B-16') | |
| image = preprocess(Image.open("image.png").convert('RGB')).unsqueeze(0) | |
| text = tokenizer(["a diagram", "a dog", "a cat"]) | |
| # Encode both inputs, L2-normalize the features, and score each caption. | |
| with torch.no_grad(), torch.cuda.amp.autocast(): | |
|     image_features = model.encode_image(image) | |
|     text_features = model.encode_text(text) | |
|     image_features /= image_features.norm(dim=-1, keepdim=True) | |
|     text_features /= text_features.norm(dim=-1, keepdim=True) | |
|     text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1) | |
| print("Label probs:", text_probs) | |
| ``` | |
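The scoring step of the snippet turns similarities into label probabilities: both feature vectors are L2-normalized, so their dot product is a cosine similarity, which is scaled by 100 and passed through a softmax over the candidate texts. The plain-Python sketch below reproduces that arithmetic on toy 3-dimensional vectors (made-up numbers, not real CLIP features). `l2_normalize` and `zero_shot_probs` are illustrative helpers, not OpenCLIP functions.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_vec: Sequence[float],
                    text_vecs: Sequence[Sequence[float]],
                    scale: float = 100.0) -> List[float]:
    """Softmax over scaled cosine similarities between one image and several texts."""
    image_unit = l2_normalize(image_vec)
    logits = []
    for text_vec in text_vecs:
        text_unit = l2_normalize(text_vec)
        cosine = sum(a * b for a, b in zip(image_unit, text_unit))
        logits.append(scale * cosine)
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy features: the image vector points mostly along the first text vector.
    image = [0.9, 0.1, 0.0]
    texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(zero_shot_probs(image, texts))
```

Because cosine similarities lie in [-1, 1], the factor of 100 sharpens the softmax so that small differences in similarity produce clearly separated probabilities.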
| ## Training Details | |
| ### Training Data | |
| Please refer to [TiC-DataComp](https://huggingface.co/datasets/apple/TiC-DataComp). | |
| ### Training Procedure | |
| Please refer to Sections 2-3 of our [TiC-CLIP paper](https://arxiv.org/abs/2310.16226). | |
| ## Citation | |
| **[TiC-CLIP: Continual Training of CLIP Models](https://arxiv.org/abs/2310.16226). (ICLR 2024)** | |
| *Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F.* | |
| ```bibtex | |
| @inproceedings{garg2024tic, | |
| title={TiC-CLIP: Continual Training of CLIP Models}, | |
| author={Garg, Saurabh and Farajtabar, Mehrdad and Pouransari, Hadi and Vemulapalli, Raviteja and Mehta, Sachin and Tuzel, Oncel and Shankar, Vaishaal and Faghri, Fartash}, | |
| booktitle={The Twelfth International Conference on Learning Representations (ICLR)}, | |
| year={2024}, | |
| url={https://openreview.net/forum?id=TLADT8Wrhn} | |
| } | |
| ``` | |