sherpa-onnx

Apache-2.0
- Attribution: Yes
- Linking: Permissive
- Distribution: Permissive
- Modification: Permissive
- Patent grant: Yes
- Private use: Yes
- Sublicensing: Permissive
- Trademark grant: No

Supported functions

Speech recognition ✔️
Speech synthesis ✔️
Speaker identification ✔️
Speaker diarization ✔️
Speaker verification ✔️
Spoken language identification ✔️
Audio tagging ✔️
Voice activity detection ✔️
Keyword spotting ✔️
Add punctuation ✔️
Speech enhancement ✔️

Supported platforms

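| Architecture | Android | iOS | Windows | macOS | linux | HarmonyOS |
|--------------|---------|-----|---------|-------|-------|-----------|
| x64          | ✔️      |     | ✔️      | ✔️    | ✔️    | ✔️        |
| x86          | ✔️      |     | ✔️      |       |       |           |
| arm64        | ✔️      | ✔️  | ✔️      | ✔️    | ✔️    | ✔️        |
| arm32        | ✔️      |     |         |       | ✔️    | ✔️        |
| riscv64      |         |     |         |       | ✔️    |           |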
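To check which architecture row applies to the machine you are on, a small helper along these lines may be handy. It is only a sketch: the row labels come from the platform list above, while the alias sets are assumptions about common `platform.machine()` values, and `table_architecture` is a hypothetical name that is not part of sherpa-onnx.

```python
import platform

# Row labels are taken from the "Supported platforms" section.
# The alias sets are assumptions about typical platform.machine() strings.
_ARCH_ALIASES = {
    "x64": {"x86_64", "amd64"},
    "x86": {"i386", "i486", "i586", "i686", "x86"},
    "arm64": {"aarch64", "arm64"},
    "arm32": {"armv6l", "armv7l", "armv8l", "arm"},
    "riscv64": {"riscv64"},
}


def table_architecture(machine=None):
    """Return the architecture label for a machine string, or None if unknown.

    When no string is given, the current interpreter's platform.machine()
    value is used.
    """
    name = (machine or platform.machine()).lower()
    for label, aliases in _ARCH_ALIASES.items():
        if name in aliases:
            return label
    return None


if __name__ == "__main__":
    # Prints the label for the current machine (or None if unrecognized).
    print(table_architecture())
```

The function only classifies the CPU architecture; whether a given operating system is supported on that architecture still has to be read from the platform list.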

Supported programming languages

1. C++ ✔️
2. C ✔️
3. Python ✔️
4. JavaScript ✔️
5. Java ✔️
6. C# ✔️
7. Kotlin ✔️
8. Swift ✔️
9. Go ✔️
10. Dart ✔️
11. Rust ✔️
12. Pascal ✔️

For Rust support, please see sherpa-rs.

It also supports WebAssembly.

Introduction

This repository supports running the following functions locally:

Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
Text-to-speech (i.e., TTS)
Speaker diarization
Speaker identification
Speaker verification
Spoken language identification
Audio tagging
VAD (e.g., silero-vad)
Keyword spotting

on the following platforms and operating systems:

x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64), RK NPU
Linux, macOS, Windows, openKylin
Android, WearOS
iOS
HarmonyOS
NodeJS
WebAssembly
NVIDIA Jetson Orin NX (supports running on both CPU and GPU)
NVIDIA Jetson Nano B01 (supports running on both CPU and GPU)
Raspberry Pi
RV1126
LicheePi4A
VisionFive 2
旭日X3派 (Horizon Sunrise X3 Pi)
爱芯派 (AXera-Pi)
etc.

with the following APIs:

C++, C, Python, Go, C#
Java, Kotlin, JavaScript
Swift, Rust
Dart, Object Pascal

Links for Huggingface Spaces

You can visit the following Huggingface spaces to try sherpa-onnx without installing anything. All you need is a browser.

| Description | URL |
|---|---|
| Speaker diarization | Click me |
| Speech recognition | Click me |
| Speech recognition with Whisper | Click me |
| Speech synthesis | Click me |
| Generate subtitles | Click me |
| Audio tagging | Click me |
| Spoken language identification with Whisper | Click me |

We also have spaces built using WebAssembly. They are listed below:

| Description | Huggingface space | ModelScope space |
|---|---|---|
| Voice activity detection with silero-vad | Click me | Address |
| Real-time speech recognition (Chinese + English) with Zipformer | Click me | Address |
| Real-time speech recognition (Chinese + English) with Paraformer | Click me | Address |
| Real-time speech recognition (Chinese + English + Cantonese) with Paraformer-large | Click me | Address |
| Real-time speech recognition (English) | Click me | Address |
| VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with SenseVoice | Click me | Address |
| VAD + speech recognition (English) with Whisper tiny.en | Click me | Address |
| VAD + speech recognition (English) with Moonshine tiny | Click me | Address |
| VAD + speech recognition (English) with Zipformer trained with GigaSpeech | Click me | Address |
| VAD + speech recognition (Chinese) with Zipformer trained with WenetSpeech | Click me | Address |
| VAD + speech recognition (Japanese) with Zipformer trained with ReazonSpeech | Click me | Address |
| VAD + speech recognition (Thai) with Zipformer trained with GigaSpeech2 | Click me | Address |
| VAD + speech recognition (Chinese, multiple dialects) with a TeleSpeech-ASR CTC model | Click me | Address |
| VAD + speech recognition (English + Chinese, and multiple Chinese dialects) with Paraformer-large | Click me | Address |
| VAD + speech recognition (English + Chinese, and multiple Chinese dialects) with Paraformer-small | Click me | Address |
| VAD + speech recognition (multilingual, and multiple Chinese dialects) with Dolphin-base | Click me | Address |
| Speech synthesis (English) | Click me | Address |
| Speech synthesis (German) | Click me | Address |
| Speaker diarization | Click me | Address |

Links for pre-built Android APKs

You can find pre-built Android APKs for this repository in the following table.

| Description | URL | Users in China |
|---|---|---|
| Speaker diarization | Address | Click here |
| Streaming speech recognition | Address | Click here |
| Text-to-speech | Address | Click here |
| Voice activity detection (VAD) | Address | Click here |
| VAD + non-streaming speech recognition | Address | Click here |
| Two-pass speech recognition | Address | Click here |
| Audio tagging | Address | Click here |
| Audio tagging (WearOS) | Address | Click here |
| Speaker identification | Address | Click here |
| Spoken language identification | Address | Click here |
| Keyword spotting | Address | Click here |

Links for pre-built Flutter APPs

Real-time speech recognition

| Description | URL | Users in China |
|---|---|---|
| Streaming speech recognition | Address | Click here |

Text-to-speech

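| Description | URL | Users in China |
|---|---|---|
| Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here |
| Linux (x64) | Address | Click here |
| macOS (x64) | Address | Click here |
| macOS (arm64) | Address | Click here |
| Windows (x64) | Address | Click here |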

Note: You need to build from source for iOS.

Links for pre-built Lazarus APPs

Generating subtitles

| Description | URL | Users in China |
|---|---|---|
| Generate subtitles | Address | Click here |

Links for pre-trained models

| Description | URL |
|---|---|
| Speech recognition (speech to text, ASR) | Address |
| Text-to-speech (TTS) | Address |
| VAD | Address |
| Keyword spotting | Address |
| Audio tagging | Address |
| Speaker identification (Speaker ID) | Address |
| Spoken language identification (Language ID) | See multi-lingual Whisper ASR models from Speech recognition |
| Punctuation | Address |
| Speaker segmentation | Address |
| Speech enhancement | Address |

Some pre-trained ASR models (Streaming)

Please see

https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/index.html
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-paraformer/index.html
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-ctc/index.html

for more models. The following table lists only SOME of them.

| Name | Supported Languages | Description |
|---|---|---|
| sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20 | Chinese, English | See also |
| sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16 | Chinese, English | See also |
| sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23 | Chinese | Suitable for Cortex A7 CPU. See also |
| sherpa-onnx-streaming-zipformer-en-20M-2023-02-17 | English | Suitable for Cortex A7 CPU. See also |
| sherpa-onnx-streaming-zipformer-korean-2024-06-16 | Korean | See also |
| sherpa-onnx-streaming-zipformer-fr-2023-04-14 | French | See also |

Some pre-trained ASR models (Non-Streaming)

Please see

https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/index.html
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-paraformer/index.html
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-ctc/index.html
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/telespeech/index.html
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/whisper/index.html

for more models. The following table lists only SOME of them.

| Name | Supported Languages | Description |
|---|---|---|
| Whisper tiny.en | English | See also |
| Moonshine tiny | English | See also |
| sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17 | Chinese, Cantonese, English, Korean, Japanese | Supports multiple Chinese dialects. See also |
| sherpa-onnx-paraformer-zh-2024-03-09 | Chinese, English | Also supports multiple Chinese dialects. See also |
| sherpa-onnx-zipformer-ja-reazonspeech-2024-08-01 | Japanese | See also |
| sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24 | Russian | See also |
| sherpa-onnx-nemo-ctc-giga-am-russian-2024-10-24 | Russian | See also |
| sherpa-onnx-zipformer-ru-2024-09-18 | Russian | See also |
| sherpa-onnx-zipformer-korean-2024-06-24 | Korean | See also |
| sherpa-onnx-zipformer-thai-2024-06-20 | Thai | See also |
| sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04 | Chinese | Supports multiple dialects. See also |

Useful links

Documentation: https://k2-fsa.github.io/sherpa/onnx/
Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi

How to reach us

Please see https://k2-fsa.github.io/sherpa/social-groups.html for the Next-gen Kaldi (新一代 Kaldi) WeChat and QQ discussion groups.

Projects using sherpa-onnx

Open-LLM-VTuber

Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking face, running locally across platforms.

See also https://github.com/t41372/Open-LLM-VTuber/pull/50

voiceapi

Streaming ASR and TTS based on FastAPI

It shows how to use the ASR and TTS Python APIs with FastAPI.

腾讯会议摸鱼工具 TMSpeech

It uses streaming ASR in C# with a graphical user interface.

Video demo in Chinese: 【开源】Windows实时字幕软件（网课/开会必备）

lol互动助手

It uses the JavaScript API of sherpa-onnx along with Electron.

Video demo in Chinese: 爆了！炫神教你开打字挂！真正影响胜率的英雄联盟工具！英雄联盟的最后一块拼图！和游戏中的每个人无障碍沟通！

Sherpa-ONNX 语音识别服务器

A Node.js-based server that provides a RESTful API for speech recognition.

QSmartAssistant

A modular, low-footprint conversational robot / smart speaker that can run fully offline.

It uses Qt. Both ASR and TTS are used.

Flutter-EasySpeechRecognition

It extends ./flutter-examples/streaming_asr by downloading models inside the app to reduce the size of the app.

Note: [Team B] Sherpa AI backend also uses sherpa-onnx in a Flutter APP.

sherpa-onnx-unity

sherpa-onnx in Unity. See also #1695, #1892, and #1859

xiaozhi-esp32-server

A backend service for xiaozhi-esp32 that helps you quickly build an ESP32 device control server.

KaithemAutomation

Pure Python, GUI-focused home automation/consumer grade SCADA.

It uses TTS from sherpa-onnx. See also ✨ Speak command that uses the new globally configured TTS model.
