# NILC Portuguese Word Embeddings: FastText Skip-Gram 600d

Pretrained **static word embeddings** for **Portuguese** (Brazilian + European), trained by the [NILC group](http://nilc.icmc.usp.br/) on a large multi-genre corpus (~1.39B tokens, 17 sources).

This repository contains the **FastText Skip-Gram 600d** model in safetensors format.

---

## 📂 Files
- `embeddings.safetensors` → word vectors (`[vocab_size, 600]`)
- `vocab.txt` → vocabulary (one token per line, aligned with rows)
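
Since row `i` of the matrix is meant to correspond to line `i` of `vocab.txt`, a quick shape check after loading can catch a mismatched or truncated download. The following is a minimal sketch; the helper name `check_alignment` and the toy arrays are illustrative assumptions, not part of this repository.

```python
import numpy as np


def check_alignment(vectors, vocab, dim=600):
    """Raise ValueError if the matrix and the vocabulary do not line up."""
    if vectors.ndim != 2 or vectors.shape[1] != dim:
        raise ValueError(f"expected shape [vocab_size, {dim}], got {vectors.shape}")
    if vectors.shape[0] != len(vocab):
        raise ValueError(
            f"{vectors.shape[0]} rows but {len(vocab)} vocabulary entries"
        )
    return True


# Toy stand-ins so the sketch runs without the real files (hypothetical data).
toy_vectors = np.zeros((3, 600), dtype=np.float32)
toy_vocab = ["rei", "rainha", "casa"]
print(check_alignment(toy_vectors, toy_vocab))
```

With the real files, the same function could be called on the loaded array and the list read from `vocab.txt`.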

---

## 🚀 Usage

```python
from safetensors.numpy import load_file

data = load_file("embeddings.safetensors")
vectors = data["embeddings"]

# Read as UTF-8 explicitly: Portuguese tokens contain accented characters
with open("vocab.txt", encoding="utf-8") as f:
    vocab = [w.strip() for w in f]

word2idx = {w: i for i, w in enumerate(vocab)}
print(vectors[word2idx["rei"]])  # vector for "rei"
```

Or in PyTorch:

```python
from safetensors.torch import load_file
tensors = load_file("embeddings.safetensors")
vectors = tensors["embeddings"]  # torch.Tensor
```
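
A common next step with static embeddings is a nearest-neighbour query by cosine similarity. The sketch below is one possible way to do it with NumPy, not an API shipped with this repository: the function name `nearest_neighbors` and the toy 2-dimensional vectors are assumptions for illustration. It also returns an empty list for tokens missing from the vocabulary, because a plain row lookup has no entry for them.

```python
import numpy as np


def nearest_neighbors(word, vectors, word2idx, vocab, k=5):
    """Return up to k (token, cosine similarity) pairs closest to `word`."""
    if word not in word2idx:
        return []  # the lookup table has no row for unseen tokens
    # L2-normalise every row; the floor avoids division by zero
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.maximum(norms, 1e-12)
    # Dot product of unit vectors equals cosine similarity
    sims = unit @ unit[word2idx[word]]
    sims[word2idx[word]] = -np.inf  # exclude the query word itself
    top = np.argsort(-sims)[:k]
    return [(vocab[i], float(sims[i])) for i in top]


# Toy stand-ins so the sketch runs without the real files (hypothetical data).
toy_vocab = ["rei", "rainha", "casa"]
toy_vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], dtype=np.float32)
toy_word2idx = {w: i for i, w in enumerate(toy_vocab)}
print(nearest_neighbors("rei", toy_vectors, toy_word2idx, toy_vocab, k=2))
```

With the real data, pass the loaded array, `word2idx`, and `vocab` instead of the toy values. For repeated queries it is cheaper to normalise the matrix once and reuse it, since this sketch recomputes the norms on every call.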

---

## 📖 Reference
```bibtex
@inproceedings{hartmann-etal-2017-portuguese,
  title        = {{P}ortuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks},
  author       = {Hartmann, Nathan  and Fonseca, Erick  and Shulby, Christopher  and Treviso, Marcos  and Silva, J{\'e}ssica  and Alu{\'i}sio, Sandra},
  year         = 2017,
  month        = oct,
  booktitle    = {Proceedings of the 11th {B}razilian Symposium in Information and Human Language Technology},
  publisher    = {Sociedade Brasileira de Computa{\c{c}}{\~a}o},
  address      = {Uberl{\^a}ndia, Brazil},
  pages        = {122--131},
  url          = {https://aclanthology.org/W17-6615/},
  editor       = {Paetzold, Gustavo Henrique  and Pinheiro, Vl{\'a}dia}
}
```

---

## 📜 License
Creative Commons Attribution 4.0 International (CC BY 4.0)