xun committed
Commit fd8c050 · verified · 1 Parent(s): c7e7011

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.
.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ samples/HEARME_en.wav filter=lfs diff=lfs merge=lfs -text
+ samples/HEARME_zf_001.wav filter=lfs diff=lfs merge=lfs -text
+ samples/HEARME_zm_010.wav filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,100 @@
+ ---
+ license: apache-2.0
+ base_model:
+ - hexgrad/Kokoro-82M
+ pipeline_tag: text-to-speech
+ ---
+ 🐈 GitHub: https://github.com/hexgrad/kokoro
+
+ <audio controls><source src="https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh/resolve/main/samples/HEARME_en.wav" type="audio/wav"></audio>
+
+ **Kokoro** is an open-weight series of small but powerful TTS models.
+
+ This model is the result of a short training run that added 100 Chinese speakers from a professional dataset. The Chinese data was freely and permissively granted to us by [LongMaoData](https://www.longmaosoft.com/), a professional dataset company. Thank you for making this model possible.
+
+ Separately, some crowdsourced synthetic English data also entered the training mix:<sup>[1]</sup>
+ - 1 hour of Maple, an American female.
+ - 1 hour of Sol, another American female.
+ - And 1 hour of Vale, an older British female.
+
+ This model is not a strict upgrade over its predecessor, since it drops many voices, but it is released early to gather feedback on new voices and tokenization. Aside from the Chinese dataset and the 3 hours of English, the rest of the data was left behind for this training run. The goal is to push the model series forward and ultimately restore some of the voices that were left behind.
+
+ Current guidance from the U.S. Copyright Office indicates that synthetic data generally does not qualify for copyright protection. Since this synthetic data is crowdsourced, the model trainer is not bound by any Terms of Service. This Apache-licensed model also aligns with OpenAI's stated mission of broadly distributing the benefits of AI. If you would like to help further that mission, consider contributing permissive audio data to the cause.
+
+ <sup>[1] LongMaoData had no involvement in the crowdsourced synthetic English data.</sup><br/>
+ <sup>[2] The following Chinese text is machine-translated.</sup>
+
+ > Kokoro 是一系列体积虽小但功能强大的 TTS 模型。
+ >
+ > 该模型是经过短期训练的结果,从专业数据集中添加了100名中文使用者。中文数据由专业数据集公司「[龙猫数据](https://www.longmaosoft.com/)」免费且无偿地提供给我们。感谢你们让这个模型成为可能。
+ >
+ > 另外,一些众包合成英语数据也进入了训练组合:
+ > - 1小时的 Maple,美国女性。
+ > - 1小时的 Sol,另一位美国女性。
+ > - 和1小时的 Vale,一位年长的英国女性。
+ >
+ > 由于该模型删除了许多声音,因此它并不是对其前身的严格升级,但它提前发布以收集有关新声音和标记化的反馈。除了中文数据集和3小时的英语之外,其余数据都留在本次训练中。目标是推动模型系列的发展,并最终恢复一些被遗留的声音。
+ >
+ > 美国版权局目前的指导表明,合成数据通常不符合版权保护的资格。由于这些合成数据是众包的,因此模型训练师不受任何服务条款的约束。该 Apache 许可模式也符合 OpenAI 所宣称的广泛传播 AI 优势的使命。如果您愿意帮助进一步完成这一使命,请考虑为此贡献许可的音频数据。
+
+ <audio controls><source src="https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh/resolve/main/samples/HEARME_zf_001.wav" type="audio/wav"></audio>
+
+ <audio controls><source src="https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh/resolve/main/samples/HEARME_zm_010.wav" type="audio/wav"></audio>
+
+ - [Releases](#releases)
+ - [Usage](#usage)
+ - [Samples](https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh/blob/main/samples) ↗️
+ - [Model Facts](#model-facts)
+ - [Acknowledgements](#acknowledgements)
+
+ ### Releases
+
+ | Model | Published | Training Data | Langs & Voices | SHA256 |
+ | ----- | --------- | ------------- | -------------- | ------ |
+ | **v1.1-zh** | **2025 Feb 26** | **>100 hours** | **2 & 103** | `b1d8410f` |
+ | [v1.0](https://huggingface.co/hexgrad/Kokoro-82M) | 2025 Jan 27 | Few hundred hrs | 8 & 54 | `496dba11` |
+ | [v0.19](https://huggingface.co/hexgrad/kLegacy/tree/main/v0.19) | 2024 Dec 25 | <100 hrs | 1 & 10 | `3b0c392f` |
+
+ | Training Costs | v0.19 | v1.0 | v1.1-zh | **Total** |
+ | -------------- | ----- | ---- | ------- | --------- |
+ | in A100 80GB GPU hours | 500 | 500 | 120 | **1120** |
+ | average hourly rate | $0.80/h | $1.20/h | $0.90/h | |
+ | in USD | $400 | $600 | $110 | **$1110** |
+
+ ### Usage
+ You can run this cell on [Google Colab](https://colab.research.google.com/).
+ ```py
+ !pip install -q "kokoro>=0.8.2" "misaki[zh]>=0.8.2" soundfile
+ !apt-get -qq -y install espeak-ng > /dev/null 2>&1
+ from IPython.display import display, Audio
+
+ !wget https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh/resolve/main/samples/make_en.py
+ !python make_en.py
+ display(Audio('HEARME_en.wav', rate=24000, autoplay=True))
+
+ !wget https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh/resolve/main/samples/make_zh.py
+ !python make_zh.py
+ display(Audio('HEARME_zf_001.wav', rate=24000, autoplay=False))
+ ```
+ TODO: Improve usage. Usage is similar to https://hf.co/hexgrad/Kokoro-82M#usage, but you should pass `repo_id='hexgrad/Kokoro-82M-v1.1-zh'` when constructing a `KModel` or `KPipeline`. See [`make_en.py`](https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh/blob/main/samples/make_en.py) and [`make_zh.py`](https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh/blob/main/samples/make_zh.py).
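In the meantime, the pattern in `make_zh.py` reduces to a short sketch. The `KModel`/`KPipeline` constructor arguments, the `'z'` language code, the `zf_001` voice, and the 24 kHz output rate below are taken from that script; the example text and output filename are placeholders, so treat this as an illustration rather than an official snippet.

```py
# Minimal sketch distilled from samples/make_zh.py (illustrative, not an official example)
# pip install -q "kokoro>=0.8.2" "misaki[zh]>=0.8.2" soundfile
import soundfile as sf
import torch
from kokoro import KModel, KPipeline

REPO_ID = 'hexgrad/Kokoro-82M-v1.1-zh'  # point KModel/KPipeline at the v1.1-zh repo
device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = KModel(repo_id=REPO_ID).to(device).eval()
zh_pipeline = KPipeline(lang_code='z', repo_id=REPO_ID, model=model)  # 'z' selects Mandarin Chinese

# 'zf_001' is one of the Chinese voices shipped under voices/; the text is a placeholder
result = next(zh_pipeline('你好,世界。', voice='zf_001'))
sf.write('output.wav', result.audio, 24000)  # the model outputs 24 kHz audio
```

For English, `make_en.py` follows the same pattern with `lang_code='a'` (American) or `'b'` (British) and the `af_maple`, `af_sol`, and `bf_vale` voices.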
+
+ ### Model Facts
+
+ **Architecture:**
+ - StyleTTS 2: https://arxiv.org/abs/2306.07691
+ - ISTFTNet: https://arxiv.org/abs/2203.02395
+ - Decoder only: no diffusion, no encoder release
+ - 82 million parameters, same as https://hf.co/hexgrad/Kokoro-82M
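The 82 million figure is easy to sanity-check. A minimal sketch, assuming `KModel` behaves as a standard `torch.nn.Module` (the sample scripts in this commit call `.to(device).eval()` on it, which suggests it does):

```py
from kokoro import KModel

# Sketch: count the parameters of the released checkpoint
model = KModel(repo_id='hexgrad/Kokoro-82M-v1.1-zh')
total = sum(p.numel() for p in model.parameters())
print(f'{total / 1e6:.1f}M parameters')
```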
+
+ **Architected by:** Li et al. @ https://github.com/yl4579/StyleTTS2
+
+ **Trained by:** `@rzvzn` on Discord
+
+ **Languages:** English, Chinese
+
+ **Model SHA256 Hash:** `b1d8410fa44dfb5c15471fd6c4225ea6b4e9ac7fa03c98e8bea47a9928476e2b`
+
+ ### Acknowledgements
+ TODO: Write acknowledgements. Similar to https://hf.co/hexgrad/Kokoro-82M#acknowledgements
+
+ <img src="https://static0.gamerantimages.com/wordpress/wp-content/uploads/2024/08/terminator-zero-41-1.jpg" width="400" alt="kokoro" />
config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "istftnet": {
+ "upsample_kernel_sizes": [20, 12],
+ "upsample_rates": [10, 6],
+ "gen_istft_hop_size": 5,
+ "gen_istft_n_fft": 20,
+ "resblock_dilation_sizes": [
+ [1, 3, 5],
+ [1, 3, 5],
+ [1, 3, 5]
+ ],
+ "resblock_kernel_sizes": [3, 7, 11],
+ "upsample_initial_channel": 512
+ },
+ "dim_in": 64,
+ "dropout": 0.2,
+ "hidden_dim": 512,
+ "max_conv_dim": 512,
+ "max_dur": 50,
+ "multispeaker": true,
+ "n_layer": 3,
+ "n_mels": 80,
+ "n_token": 178,
+ "style_dim": 128,
+ "text_encoder_kernel_size": 5,
+ "plbert": {
+ "hidden_size": 768,
+ "num_attention_heads": 12,
+ "intermediate_size": 2048,
+ "max_position_embeddings": 512,
+ "num_hidden_layers": 12,
+ "dropout": 0.1
+ },
+ "vocab": {
+ ";": 1,
+ ":": 2,
+ ",": 3,
+ ".": 4,
+ "!": 5,
+ "?": 6,
+ "/": 7,
+ "—": 9,
+ "…": 10,
+ "\"": 11,
+ "(": 12,
+ ")": 13,
+ "“": 14,
+ "”": 15,
+ " ": 16,
+ "\u0303": 17,
+ "ʣ": 18,
+ "ʥ": 19,
+ "ʦ": 20,
+ "ʨ": 21,
+ "ᵝ": 22,
+ "ㄓ": 23,
+ "A": 24,
+ "I": 25,
+ "ㄅ": 30,
+ "O": 31,
+ "ㄆ": 32,
+ "Q": 33,
+ "R": 34,
+ "S": 35,
+ "T": 36,
+ "ㄇ": 37,
+ "ㄈ": 38,
+ "W": 39,
+ "ㄉ": 40,
+ "Y": 41,
+ "ᵊ": 42,
+ "a": 43,
+ "b": 44,
+ "c": 45,
+ "d": 46,
+ "e": 47,
+ "f": 48,
+ "ㄊ": 49,
+ "h": 50,
+ "i": 51,
+ "j": 52,
+ "k": 53,
+ "l": 54,
+ "m": 55,
+ "n": 56,
+ "o": 57,
+ "p": 58,
+ "q": 59,
+ "r": 60,
+ "s": 61,
+ "t": 62,
+ "u": 63,
+ "v": 64,
+ "w": 65,
+ "x": 66,
+ "y": 67,
+ "z": 68,
+ "ɑ": 69,
+ "ɐ": 70,
+ "ɒ": 71,
+ "æ": 72,
+ "ㄋ": 73,
+ "ㄌ": 74,
+ "β": 75,
+ "ɔ": 76,
+ "ɕ": 77,
+ "ç": 78,
+ "ㄍ": 79,
+ "ɖ": 80,
+ "ð": 81,
+ "ʤ": 82,
+ "ə": 83,
+ "ㄎ": 84,
+ "ㄦ": 85,
+ "ɛ": 86,
+ "ɜ": 87,
+ "ㄏ": 88,
+ "ㄐ": 89,
+ "ɟ": 90,
+ "ㄑ": 91,
+ "ɡ": 92,
+ "ㄒ": 93,
+ "ㄔ": 94,
+ "ㄕ": 95,
+ "ㄗ": 96,
+ "ㄘ": 97,
+ "ㄙ": 98,
+ "月": 99,
+ "ㄚ": 100,
+ "ɨ": 101,
+ "ɪ": 102,
+ "ʝ": 103,
+ "ㄛ": 104,
+ "ㄝ": 105,
+ "ㄞ": 106,
+ "ㄟ": 107,
+ "ㄠ": 108,
+ "ㄡ": 109,
+ "ɯ": 110,
+ "ɰ": 111,
+ "ŋ": 112,
+ "ɳ": 113,
+ "ɲ": 114,
+ "ɴ": 115,
+ "ø": 116,
+ "ㄢ": 117,
+ "ɸ": 118,
+ "θ": 119,
+ "œ": 120,
+ "ㄣ": 121,
+ "ㄤ": 122,
+ "ɹ": 123,
+ "ㄥ": 124,
+ "ɾ": 125,
+ "ㄖ": 126,
+ "ㄧ": 127,
+ "ʁ": 128,
+ "ɽ": 129,
+ "ʂ": 130,
+ "ʃ": 131,
+ "ʈ": 132,
+ "ʧ": 133,
+ "ㄨ": 134,
+ "ʊ": 135,
+ "ʋ": 136,
+ "ㄩ": 137,
+ "ʌ": 138,
+ "ɣ": 139,
+ "ㄜ": 140,
+ "ㄭ": 141,
+ "χ": 142,
+ "ʎ": 143,
+ "十": 144,
+ "压": 145,
+ "言": 146,
+ "ʒ": 147,
+ "ʔ": 148,
+ "阳": 149,
+ "要": 150,
+ "阴": 151,
+ "应": 152,
+ "用": 153,
+ "又": 154,
+ "中": 155,
+ "ˈ": 156,
+ "ˌ": 157,
+ "ː": 158,
+ "穵": 159,
+ "外": 160,
+ "万": 161,
+ "ʰ": 162,
+ "王": 163,
+ "ʲ": 164,
+ "为": 165,
+ "文": 166,
+ "瓮": 167,
+ "我": 168,
+ "3": 169,
+ "5": 170,
+ "1": 171,
+ "2": 172,
+ "4": 173,
+ "元": 175,
+ "云": 176,
+ "ᵻ": 177
+ }
+ }
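For orientation, the `vocab` block above is the phoneme-to-token mapping: each key is a single phoneme character (IPA, zhuyin, or a tone/word symbol) and each value appears to be the integer id the model consumes, consistent with `n_token: 178` and a largest id of 177. A minimal sketch, assuming `config.json` has been downloaded locally; the phoneme string is the one hardcoded for "Kokoro" in `samples/make_zh.py`:

```py
import json

# Sketch: map a phoneme string to model input ids using the vocab above
with open('config.json', encoding='utf-8') as f:
    vocab = json.load(f)['vocab']

phonemes = 'kˈOkəɹO'  # phonemes for "Kokoro", as hardcoded in samples/make_zh.py
ids = [vocab[p] for p in phonemes]
print(ids)  # [53, 156, 31, 53, 83, 123, 31]
```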
kokoro-v1_1-zh.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b1d8410fa44dfb5c15471fd6c4225ea6b4e9ac7fa03c98e8bea47a9928476e2b
+ size 327247856
model-kokoro-v1_1-zh.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa262a71da3f86d9b7999c4c22e9a72a361a5d709b7a5b707b80079fd9a2e0d2
+ size 327119008
samples/HEARME_en.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b759a65788991932d031d6fc8440f7a8efc402273fc1c2ca9d52ffd8a16a6666
+ size 4528044
samples/HEARME_zf_001.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c8685f06fd809ca2e892f8b71f3549d0640ab992b37648781f9138be33ef035
+ size 4267644
samples/HEARME_zm_010.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:915d93163e2e5319370b539b72a90c69c214c143206024c086c57e5fbdd67484
+ size 4253244
samples/make_en.py ADDED
@@ -0,0 +1,71 @@
+ # This file is hardcoded to transparently reproduce HEARME_en.wav
+ # Therefore it may NOT generalize gracefully to other texts
+ # Refer to Usage in README.md for more general usage patterns
+
+ # pip install kokoro>=0.8.1
+ from kokoro import KModel, KPipeline
+ from pathlib import Path
+ import numpy as np
+ import soundfile as sf
+ import torch
+ import tqdm
+
+ REPO_ID = 'hexgrad/Kokoro-82M-v1.1-zh'
+ SAMPLE_RATE = 24000
+
+ # How much silence to insert between paragraphs: 5000 is about 0.2 seconds
+ N_ZEROS = 5000
+
+ # Whether to join sentences in paragraphs 1 and 3
+ JOIN_SENTENCES = True
+
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+ texts = [(
+     "[Kokoro](/kˈQkəɹQ/) is an open-weight series of small but powerful TTS models.",
+ ), (
+     "This model is the result of a short training run that added 100 Chinese speakers from a professional dataset.",
+     "The Chinese data was freely and permissively granted to us by LongMaoData, a professional dataset company. Thank you for making this model possible.",
+ ), (
+     "Separately, some crowdsourced synthetic English data also entered the training mix:",
+     "1 hour of Maple, an American female.",
+     "1 hour of [Sol](/sˈOl/), another American female.",
+     "And 1 hour of Vale, an older British female.",
+ ), (
+     "This model is not a strict upgrade over its predecessor since it drops many voices, but it is released early to gather feedback on new voices and tokenization.",
+     "Aside from the Chinese dataset and the 3 hours of English, the rest of the data was left behind for this training run.",
+     "The goal is to push the model series forward and ultimately restore some of the voices that were left behind.",
+ ), (
+     "Current guidance from the U.S. Copyright Office indicates that synthetic data generally does not qualify for copyright protection.",
+     "Since this synthetic data is crowdsourced, the model trainer is not bound by any Terms of Service.",
+     "This Apache licensed model also aligns with OpenAI's stated mission of broadly distributing the benefits of AI.",
+     "If you would like to help further that mission, consider contributing permissive audio data to the cause.",
+ )]
+
+ if JOIN_SENTENCES:
+     for i in (1, 3):
+         texts[i] = [' '.join(texts[i])]
+
+ model = KModel(repo_id=REPO_ID).to(device).eval()
+ en_pipelines = [KPipeline(lang_code='b' if british else 'a', repo_id=REPO_ID, model=model) for british in (False, True)]
+
+ path = Path(__file__).parent
+
+ wavs = []
+ for paragraph in tqdm.tqdm(texts):
+     for i, sentence in enumerate(paragraph):
+         voice, british = 'bf_vale', True
+         if 'Maple' in sentence:
+             voice, british = 'af_maple', False
+         elif 'Sol' in sentence:
+             voice, british = 'af_sol', False
+         generator = en_pipelines[british](sentence, voice=voice)
+         f = path / f'en{len(wavs):02}.wav'
+         result = next(generator)
+         wav = result.audio
+         sf.write(f, wav, SAMPLE_RATE)
+         if i == 0 and wavs and N_ZEROS > 0:
+             wav = np.concatenate([np.zeros(N_ZEROS), wav])
+         wavs.append(wav)
+
+ sf.write(path / 'HEARME_en.wav', np.concatenate(wavs), SAMPLE_RATE)
samples/make_zh.py ADDED
@@ -0,0 +1,86 @@
+ # This file is hardcoded to transparently reproduce HEARME_zh.wav
+ # Therefore it may NOT generalize gracefully to other texts
+ # Refer to Usage in README.md for more general usage patterns
+
+ # pip install kokoro>=0.8.1 "misaki[zh]>=0.8.1"
+ from kokoro import KModel, KPipeline
+ from pathlib import Path
+ import numpy as np
+ import soundfile as sf
+ import torch
+ import tqdm
+
+ REPO_ID = 'hexgrad/Kokoro-82M-v1.1-zh'
+ SAMPLE_RATE = 24000
+
+ # How much silence to insert between paragraphs: 5000 is about 0.2 seconds
+ N_ZEROS = 5000
+
+ # Whether to join sentences in paragraphs 1 and 3
+ JOIN_SENTENCES = True
+
+ VOICE = 'zf_001' if True else 'zm_010'
+
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+ texts = [(
+     "Kokoro 是一系列体积虽小但功能强大的 TTS 模型。",
+ ), (
+     "该模型是经过短期训练的结果,从专业数据集中添加了100名中文使用者。",
+     "中文数据由专业数据集公司「龙猫数据」免费且无偿地提供给我们。感谢你们让这个模型成为可能。",
+ ), (
+     "另外,一些众包合成英语数据也进入了训练组合:",
+     "1小时的 Maple,美国女性。",
+     "1小时的 Sol,另一位美国女性。",
+     "和1小时的 Vale,一位年长的英国女性。",
+ ), (
+     "由于该模型删除了许多声音,因此它并不是对其前身的严格升级,但它提前发布以收集有关新声音和标记化的反馈。",
+     "除了中文数据集和3小时的英语之外,其余数据都留在本次训练中。",
+     "目标是推动模型系列的发展,并最终恢复一些被遗留的声音。",
+ ), (
+     "美国版权局目前的指导表明,合成数据通常不符合版权保护的资格。",
+     "由于这些合成数据是众包的,因此模型训练师不受任何服务条款的约束。",
+     "该 Apache 许可模式也符合 OpenAI 所宣称的广泛传播 AI 优势的使命。",
+     "如果您愿意帮助进一步完成这一使命,请考虑为此贡献许可的音频数据。",
+ )]
+
+ if JOIN_SENTENCES:
+     for i in (1, 3):
+         texts[i] = [''.join(texts[i])]
+
+ en_pipeline = KPipeline(lang_code='a', repo_id=REPO_ID, model=False)
+ def en_callable(text):
+     if text == 'Kokoro':
+         return 'kˈOkəɹO'
+     elif text == 'Sol':
+         return 'sˈOl'
+     return next(en_pipeline(text)).phonemes
+
+ # HACK: Mitigate rushing caused by lack of training data beyond ~100 tokens
+ # Simple piecewise linear fn that decreases speed as len_ps increases
+ def speed_callable(len_ps):
+     speed = 0.8
+     if len_ps <= 83:
+         speed = 1
+     elif len_ps < 183:
+         speed = 1 - (len_ps - 83) / 500
+     return speed * 1.1
+
+ model = KModel(repo_id=REPO_ID).to(device).eval()
+ zh_pipeline = KPipeline(lang_code='z', repo_id=REPO_ID, model=model, en_callable=en_callable)
+
+ path = Path(__file__).parent
+
+ wavs = []
+ for paragraph in tqdm.tqdm(texts):
+     for i, sentence in enumerate(paragraph):
+         generator = zh_pipeline(sentence, voice=VOICE, speed=speed_callable)
+         f = path / f'zh{len(wavs):02}.wav'
+         result = next(generator)
+         wav = result.audio
+         sf.write(f, wav, SAMPLE_RATE)
+         if i == 0 and wavs and N_ZEROS > 0:
+             wav = np.concatenate([np.zeros(N_ZEROS), wav])
+         wavs.append(wav)
+
+ sf.write(path / f'HEARME_{VOICE}.wav', np.concatenate(wavs), SAMPLE_RATE)
voices/af_maple.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f1211a6b94795d843cb7957568ccf2208e6ce76d2fbb36c7279b24e1be9b862f
+ size 523425
voices/af_sol.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7d24aad751d7f62618506264c1cf3436276901447d85f1209231e9be29da4261
+ size 523351
voices/bf_vale.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e66bc4578345d490985ce73c49464e6f6a9e7c58586b99a9ae14c988ae14e01f
+ size 523420
voices/zf_001.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9bdc9a87e13e9bb1ea3e7803259c2ecbfebaeeb2ff80b5d0c76df1a464c1c962
+ size 523331
voices/zf_002.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2c8cf221ff2e0915fc807cac5f233f42798ee8e2bd58bc5ad0259fd95e405a26
+ size 523331
voices/zf_003.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ac28a59eefaa7e37b2aabffc792d40081392aa89d679b579859debf5209441a1
+ size 523331
voices/zf_004.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6d50c3a87071a11d703d9d4ff7dd1f77fe6b8c5c3a9e60e81bc848816c0e959f
+ size 523331
voices/zf_005.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64d656103a908954496676529f4e8dee783afd4c8dccd1a9042cd8dbe05e39f4
+ size 523331
voices/zf_006.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ef341ad2c4ec5dab3bf32daa0a70b8779c5aba10a9e18f57e5b6b29c7ec93d37
+ size 523331
voices/zf_007.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:52a966710a29b50d9d11df15b5572c28062d2edf89585fe2c14abe281e2e49a8
+ size 523331
voices/zf_008.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:361c1da6b087284a66c803d413225a09d57334ab515a93d5e16a2d553d9941f6
+ size 523331
voices/zf_017.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e507bb858a75f0518f08918827740530a656a008e3c057435a17b3a95f267624
+ size 523331
voices/zf_018.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e91abc6581fb0fe75a50059ec43ea702c0beef7356727d2d62698e0c58dbd9de
+ size 523331
voices/zf_019.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f766b0f90d6c05cd8a87d76a3c46e60c39bee7fa9513fa4d5258b9d1976e8f81
+ size 523331
voices/zf_021.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:165d873e726d09cefa4340024a539811a6bb7c971f83bf2aae0ef4f95e97c292
+ size 523331
voices/zf_022.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f2691ee6103fd68f2d109ad80236955354c64103216a2e2211e059a5010fd1d
+ size 523331
voices/zf_023.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:41bc8ec1656034c5e9b8eff5401621df79f62d02aae67ac0d9dc23f81c16616e
+ size 523331
voices/zf_024.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:56e3f7a15d8d4c49c601c8f1695822ee1024f3e9099b61003ab5d2c0dcb95afc
+ size 523331
voices/zf_026.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:84797c09de6d1e68704c0795af415a0f59efa42b22446df50838beed9f0308ea
+ size 523331
voices/zf_027.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6ebfe562e6402a58b50b1fd0e026a216950c038871bb386facaec10fd156caa7
+ size 523331
voices/zf_028.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc72e247048665f8633089d6827cd1aa460ec3feb98bf4b47ac053da9036bbc8
+ size 523331
voices/zf_032.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:226ba4218b4337a1157eaaf31f2a78e6e89c5dac69bef5a4686045f80e607280
+ size 523331
voices/zf_036.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da570e76064fa127272682899bda676dc39ac44d6a368302124256ddbd7ab011
+ size 523331
voices/zf_038.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:067843a37247588b5e55de0bbc2e1c9e2f4cb4b8fc89ee6aff800e5bf7d1f038
+ size 523331
voices/zf_039.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b2d4b88445679882edaf7b74bcefe424e53dc2824c79a77a0d98904098886886
+ size 523331
voices/zf_040.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8985324579c2717f7bee61709fa52d0c9749cf51476980b0bd6081a4c3a455b
+ size 523331
voices/zf_042.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:68f16707c8c90ca79323ccfd6c1be1555a181c419ad0d3d465951585ac99ecdf
+ size 523331
voices/zf_043.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f13f57f684b05e204817d6803b82654564fcb3a0da83a04b3dd0e913f99edfc5
+ size 523331
voices/zf_044.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:41523a18272dcc6b76dfcae481394a6639fa66bbca5b89e8ab66fc69956d4c46
+ size 523331
voices/zf_046.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f3389c0135d07bf8b04f7a0833956fa1aef35b457e62307b4885ceb4b9602cf
+ size 523331
voices/zf_047.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:123554ef3d840d2ff612473aeaef8183f9145b30786963b87f913fbbff943462
+ size 523331
voices/zf_048.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba33ae91d6cd00720185ff56afd117bb2685d95def74ea385fa91dea16170e46
+ size 523331
voices/zf_049.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:57e31c0b581cbaa9faf24713b46e196961fbe0751625af1da54ec8563d5aaf5b
+ size 523331
voices/zf_051.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d81d9b1ca12c7e42ca6574cdce12b783829c806cc0d4f71cbc628629c4b6f94
+ size 523331
voices/zf_059.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6df5c9b26f9dffa9b500af5a26da9027f7ceccb40aaba2db69d4dd1fc98af55c
+ size 523331
voices/zf_060.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1927c285a38bf816d0ad4e4afe4a1e43806aa85d285795bfd83ebd20e5e12d72
+ size 523331
voices/zf_067.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:09be635ac4e28824679bf42196860b8e086b60da1e5faa8095919f5a71de0af0
+ size 523331
voices/zf_070.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e23b173ccff20bcfd73c24b726c35f83c23285dc6adde7cc00df82422c9794b8
+ size 523331
voices/zf_071.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:74b074bc00fc501cce07ce927774058f0d6bf1ffc92fe3466db8f282c7a92dec
+ size 523331
voices/zf_072.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:89bb17115014ca7684b7af79636923347276d95e5e9de4fdd7587f64d399f4af
+ size 523331