Upload README.md with huggingface_hub
README.md
CHANGED
````diff
@@ -20,6 +20,7 @@ Disclaimer: The team releasing Mask2Former did not write a model card for this m
 Mask2Former addresses instance, semantic and panoptic segmentation with the same paradigm: by predicting a set of masks and corresponding labels. Hence, all 3 tasks are treated as if they were instance segmentation. Mask2Former outperforms the previous SOTA,
 [MaskFormer](https://arxiv.org/abs/2107.06278) both in terms of performance and efficiency by (i) replacing the pixel decoder with a more advanced multi-scale deformable attention Transformer, (ii) adopting a Transformer decoder with masked attention to boost performance
 without introducing additional computation and (iii) improving training efficiency by calculating the loss on subsampled points instead of whole masks.
+
 In the paper [Mask2Former for Video Instance Segmentation
 ](https://arxiv.org/abs/2112.10764), the authors have shown that Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline.
 
@@ -34,9 +35,9 @@ You can use this particular checkpoint for instance segmentation. See the [model
 Here is how to use this model:
 
 ```python
-import requests
 import torch
-
+import torchvision
+from huggingface_hub import hf_hub_download
 from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation
 
 
@@ -46,7 +47,7 @@ model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/video-mask
 
 file_path = hf_hub_download(repo_id="shivi/video-demo", filename="cars.mp4", repo_type="dataset")
 video = torchvision.io.read_video(file_path)[0]
-video_frames = [image_processor(images=frame, return_tensors="pt"
+video_frames = [image_processor(images=frame, return_tensors="pt").pixel_values for frame in video]
 video_input = torch.cat(video_frames)
 
 with torch.no_grad():
````
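For reference, here is a minimal sketch of what the full updated example from the README plausibly looks like once these changes are applied. It only reuses lines visible in the diff plus a few assumptions: the checkpoint id is truncated to `facebook/video-mask...` in the hunk header, so a hypothetical placeholder is used; the `image_processor` loading line and the forward pass inside `torch.no_grad()` are not shown in the diff and are filled in as reasonable guesses; the printed output fields are the standard ones on the `transformers` Mask2Former output class.

```python
import torch
import torchvision
from huggingface_hub import hf_hub_download
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# Hypothetical placeholder: the diff truncates the real checkpoint id ("facebook/video-mask...")
checkpoint = "<video-mask2former-checkpoint>"
image_processor = AutoImageProcessor.from_pretrained(checkpoint)  # assumed; not visible in the diff
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint)

# Download the demo clip and read it as a (num_frames, height, width, channels) uint8 tensor
file_path = hf_hub_download(repo_id="shivi/video-demo", filename="cars.mp4", repo_type="dataset")
video = torchvision.io.read_video(file_path)[0]

# Preprocess each frame separately, then stack the per-frame pixel values into one batch
video_frames = [image_processor(images=frame, return_tensors="pt").pixel_values for frame in video]
video_input = torch.cat(video_frames)

with torch.no_grad():
    outputs = model(pixel_values=video_input)  # assumed forward pass; hidden context in the diff

# Mask2Former predicts a set of masks and a class label for each mask query
print(outputs.class_queries_logits.shape)  # (batch_size, num_queries, num_labels + 1)
print(outputs.masks_queries_logits.shape)  # (batch_size, num_queries, mask_height, mask_width)
```

Stacking the per-frame `pixel_values` into a single batch lets the unchanged image model process the whole clip in one forward pass, which is consistent with the card's claim that video instance segmentation requires no changes to the architecture, the loss or the training pipeline.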