3D-Speaker
This version of 3D-Speaker has been converted to run on the Axera NPU using w8a16 quantization.
This model has been optimized with the following LoRA:
Compatible with Pulsar2 version: 4.1-patch1
Conversion tool links:
If you are interested in model conversion, you can try exporting the axmodel through
The AXera Platform repo, where you can find a detailed guide
Supported Platforms
| Chip | Model | Latency |
|---|---|---|
| AX650 | ERes2NetV2 | 5.09 ms |
| AX650 | Ecapa-tdnn | 7.37 ms |
How to use
Download all files from this repository to the device
root@ax650:~/3D-Speaker# tree
.
|-- ax650
|   |-- res2netv2.axmodel
|   `-- ecapa-tdnn.axmodel
|-- wavs
|   |-- speaker1_a_cn_16k.wav
|   |-- speaker1_b_cn_16k.wav
|   `-- speaker2_a_cn_16k.wav
|-- run_onnx_res2netv2.py
|-- run_axmodel_res2netv2.py
|-- run_onnx_ecapa_tdnn.py
|-- run_axmodel_ecapa_tdnn.py
|-- res2netv2.onnx
`-- ecapa-tdnn.onnx
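After copying the repository to the device, it can help to confirm that nothing is missing before running inference. The following is a minimal sketch, not part of this repository: the helper name `missing_files` is hypothetical, and the file list is simply copied from the tree listing above.

```python
from pathlib import Path

# File list copied from the tree listing in this README.
EXPECTED_FILES = [
    "ax650/res2netv2.axmodel",
    "ax650/ecapa-tdnn.axmodel",
    "wavs/speaker1_a_cn_16k.wav",
    "wavs/speaker1_b_cn_16k.wav",
    "wavs/speaker2_a_cn_16k.wav",
    "run_onnx_res2netv2.py",
    "run_axmodel_res2netv2.py",
    "run_onnx_ecapa_tdnn.py",
    "run_axmodel_ecapa_tdnn.py",
    "res2netv2.onnx",
    "ecapa-tdnn.onnx",
]


def missing_files(root="."):
    """Return the expected files that are not present under `root`.

    Hypothetical helper for illustration; it only checks that each
    path exists as a regular file, not that its contents are valid.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present.")
```

Run it from the repository root on the device; an empty result means the layout matches the listing.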
Inference
Input Wavs:
|-- wavs
|   |-- speaker1_a_cn_16k.wav
|   |-- speaker1_b_cn_16k.wav
|   `-- speaker2_a_cn_16k.wav
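The `_16k` suffix in the sample file names suggests the models are meant to be fed 16 kHz audio. If you want to try your own recordings, a quick format check with Python's standard `wave` module may save some debugging. This is a sketch under assumptions: the helper names are hypothetical, and the mono requirement is a guess, not something this README states.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration


def is_16k_mono(path):
    """Check for 16 kHz, single-channel audio.

    Assumption: 16 kHz is inferred from the sample file names, and
    mono input is assumed rather than documented in this README.
    """
    rate, channels, _ = wav_info(path)
    return rate == 16000 and channels == 1
```

For example, `is_16k_mono("wavs/speaker1_a_cn_16k.wav")` can be called on each input before passing it to the inference scripts. Files that fail the check would need resampling first.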
Inference on an AX650 host, such as the M4N-Dock (AXera-Pi Pro)
root@ax650:~/3D-Speaker# python3 run_axmodel_ecapa_tdnn.py --wavs ./wavs/speaker1_a_cn_16k.wav ./wavs/speaker2_a_cn_16k.wav
[INFO] Available providers: ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 4.1-patch1-dirty 6247f37c-dirty
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 4.1-patch1-dirty 6247f37c-dirty
[INFO]: Computing the similarity score...
[INFO]: The similarity score between two input wavs is 0.7166
Output: [INFO]: The similarity score between two input wavs is 0.7166
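The internals of the scoring step are not shown here. In speaker verification the score is commonly the cosine similarity between the two speaker embeddings, and the sketch below illustrates that computation under this assumption. The function names are hypothetical, and the decision threshold is deliberately left to the caller because this README does not specify one.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(score, threshold):
    """Accept the pair as the same speaker when score >= threshold.

    Assumption: the threshold must be tuned on your own data; no
    value is given in this README.
    """
    return score >= threshold
```

A score near 1 indicates very similar embeddings and a score near 0 indicates unrelated ones. Where to draw the line between "same" and "different" speaker depends on the model and the application.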