Search Results (keywords: asr)

youtube-transcript-api

This is a Python API which allows you to get the transcripts/subtitles for a given YouTube video. It also works for automatically generated subtitles, supports translating subtitles, and does not require a headless browser, as other Selenium-based solutions do!

cli
subtitle
subtitles
transcript
transcripts
youtube
youtube-api
youtube-subtitles
youtube-transcripts
asr
captions
python
translating-transcripts
youtube-asr
youtube-captions
youtube-transcript
youtube-video

whisper-timestamped

Multi-lingual Automatic Speech Recognition (ASR) based on Whisper models, with accurate word timestamps, access to language detection confidence, several options for Voice Activity Detection (VAD), and more.

asr
attention-is-all-you-need
attention-mechanism
attention-model
attention-network
attention-seq2seq
attention-visualization
deep-learning
machine-learning
multilingual-models
python
python3
pytorch
speaker-diarization
speech
speech-processing
speech-recognition
speech-to-text
transformers
whisper

deepgram-sdk

The official Python SDK for the Deepgram automated speech recognition platform.

deepgram
speech-to-text
asr
automated-speech-recognition
hacktoberfest
python
speech-recognition
text-to-speech
voice-agent
voice-ai

paddlespeech

Speech tools and models based on PaddlePaddle

SSLspeech
asr
tts
speaker
verfication
speech
classfication
text
frontend
MFA
paddlepaddle
paddleaudio
streaming
beam
search
ctcdecoder
deepspeech2
wav2vec2
hubert
wavlm
transformer
conformer
fastspeech2
hifigan
gan
vocoders
code-switch
kws
punctuation-restoration
self-supervised-learning
sound-classification
speech-alignment
speech-recognition
speech-synthesis
speech-translation
streaming-asr
streaming-tts
vocoder
voice-cloning
voice-recognition
whisper

sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, and a websocket server/client, with support for 11 programming languages.

aarch64
android
arm32
asr
cpp
csharp
dotnet
ios
lazarus
linux
macos
mfc
object-pascal
onnx
raspberry-pi
risc-v
speech-to-text
text-to-speech
vits
windows

vosk

Offline open-source speech recognition API based on Kaldi and Vosk

android
asr
deep-learning
deep-neural-networks
deepspeech
google-speech-to-text
ios
kaldi
offline
privacy
python
raspberry-pi
speaker-identification
speaker-verification
speech-recognition
speech-to-text
speech-to-text-android
stt
voice-recognition
vosk

rustfst-python

Library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). Re-implementation of OpenFst in Rust.

fst
openfst
graph
transducer
acceptor
shortest-path
minimize
determinize
wfst
asr
automata
composition
finite-state-acceptors
finite-state-transducers
fsts
kaldi
kaldi-asr
rust
rust-crate
rust-lang
speech-recognition
tokenizer
transducers

paddleaudio

Speech audio tools based on PaddlePaddle

audio
processpaddlepaddle
asr
code-switch
conformer
kws
punctuation-restoration
self-supervised-learning
sound-classification
speech-alignment
speech-recognition
speech-synthesis
speech-translation
streaming-asr
streaming-tts
transformer
tts
vocoder
voice-cloning
voice-recognition
wav2vec2
whisper

rapid-paraformer

A speech recognition tool.

asr
paraformer
wenet
paddlespeech

nemo-toolkit

NeMo - a toolkit for Conversational AI

NLP
NeMo
deep
gpu
language
learning
machine
nvidia
pytorch
speech
torch
tts
asr
deeplearning
generative-ai
large-language-models
machine-translation
multimodal
neural-networks
speaker-diariazation
speaker-recognition
speech-synthesis
speech-translation

23 packages found