RapidSpeech.cpp

On-device speech AI runtime for ASR, TTS, VAD, and voice cloning. Python-simple, C++-native, GGUF-powered.

RapidSpeech.cpp runs speech recognition, text-to-speech, VAD, speaker embedding, and voice cloning on-device. It gives Python developers a simple API while keeping the runtime pure C/C++, backed by ggml and a unified GGUF model format. No cloud API, no speech server, no heavyweight Python model stack.

Python In 60 Seconds

Install

pip install rapidspeech

GPU wheels:

pip install rapidspeech-metal   # macOS / Apple Silicon
pip install rapidspeech-cuda    # Linux / NVIDIA

Text to speech

The example scripts below live in the repository's python-api-examples/ directory, so run them from a checkout of the repository.

python python-api-examples/tts/tts-offline.py \
  --model /path/to/omnivoice-f16.gguf \
  --text "Hello, welcome to RapidSpeech." \
  --output output.wav

Speech to text

python python-api-examples/asr/asr-offline.py \
  --model /path/to/funasr-nano-fp16.gguf \
  --audio /path/to/audio.wav

Python API

import rapidspeech

tts = rapidspeech.tts_synthesizer("/path/to/omnivoice-f16.gguf")
tts.set_params(instruct="male, young adult", language="English", seed=42)
pcm = tts.synthesize("Hello from a native speech engine.")
sample_rate = tts.get_sample_rate()
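
To listen to the result, the PCM has to be written to an audio file. The following is a minimal sketch using only the Python standard library. It assumes that the returned samples are mono floats in the range [-1.0, 1.0]; the README only states that TTS returns float32 PCM. The helper name `write_wav` is hypothetical and is not part of the rapidspeech package.

```python
import struct
import wave


def write_wav(path, pcm, sample_rate):
    """Write mono float PCM to a 16-bit WAV file.

    Assumption: samples are floats nominally in [-1.0, 1.0].
    """
    frames = bytearray()
    for sample in pcm:
        # Clip out-of-range values, then scale to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16-bit samples
        wav.setframerate(int(sample_rate))
        wav.writeframes(bytes(frames))
```

With the values from the snippet above, `write_wav("output.wav", pcm, sample_rate)` would save the synthesized audio. The per-sample loop favors clarity over speed; for long clips a vectorized NumPy conversion is the more natural choice.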

import rapidspeech

asr = rapidspeech.asr_offline("/path/to/funasr-nano-fp16.gguf")
sample_rate = asr.get_model_meta()["audio_sample_rate"]
pcm = ...  # 1-D float32 mono PCM at sample_rate
asr.push_audio(pcm)
asr.process()
print(asr.get_text())
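
The `pcm` placeholder above has to be filled with decoded audio. One possible way to produce it from a 16-bit PCM WAV file, using only the standard library, is sketched below. The helper name `load_wav_mono_float32` is hypothetical, and the sketch does not resample, so the file's sample rate has to match the model's `audio_sample_rate` already.

```python
import array
import sys
import wave


def load_wav_mono_float32(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate).

    samples is an array('f') of mono float32 values in [-1.0, 1.0).
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    ints = array.array("h")
    ints.frombytes(raw)
    if sys.byteorder == "big":
        ints.byteswap()  # WAV sample data is little-endian

    samples = array.array("f")
    for i in range(0, len(ints) - channels + 1, channels):
        # Average the channels of each frame to downmix to mono.
        frame = ints[i:i + channels]
        samples.append(sum(frame) / (channels * 32768.0))
    return samples, sample_rate
```

Because `array('f')` exposes the buffer protocol, `numpy.frombuffer(samples, dtype=numpy.float32)` wraps the result as a 1-D float32 NumPy array without copying, which is the shape of input the README describes for ASR.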

Why RapidSpeech.cpp

Built for the edge: run speech models locally on laptops, servers, browsers, and embedded devices.
Python-simple, C++-native: write Python, run a C++/ggml engine underneath.
One model format: ASR, TTS, VAD, and speaker models use GGUF.
NumPy in, NumPy out: ASR takes float32 PCM; TTS returns float32 PCM.
Edge-first backends: CPU, Metal, CUDA, Vulkan, CANN, OpenCL, and WebGPU.

Performance Snapshot

Test environment: Apple M1 Pro, funasr-nano-fp16.gguf, 15s audio.

Configuration	RTF	Wall Time	Notes
CPU -t 4	0.465	12.4s	CPU-only inference
GPU -t 4	0.170	5.2s	Metal acceleration
GPU -t 4 Q4_K	0.756	-	Quantized model on GPU; slower than FP16 because of dequantization overhead
CPU -t 4 Q4_K	0.530	-	Quantized model on CPU; 596 MB (3.3x compression)

RTF is processing time divided by audio duration. Lower is faster; RTF < 1 is faster than real time.
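
The definition above translates directly into code. The sketch below shows one way to measure RTF for any inference call; the function names are hypothetical, and `run` stands for whatever zero-argument callable performs the work being timed.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(run, audio_seconds):
    """Time a zero-argument callable and return its RTF."""
    start = time.perf_counter()
    run()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


# Example: 7.5 s of processing for a 15 s clip is twice as fast as real time.
print(real_time_factor(7.5, 15.0))  # 0.5
```

Whether model loading is included in the timed region changes the result noticeably for short clips, so it helps to state which one a reported figure covers.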

Supported Today

Task	Models	Status
ASR	SenseVoice-small, FunASR-nano	Stable
VAD	Silero VAD, FireRedVAD	Stable
TTS	OmniVoice, OpenVoice2, Kokoro	Active
Speaker	CAMPPlus	Stable

In Progress

CosyVoice3, Qwen3-ASR, Qwen3-TTS.

Documentation

Python examples
Technical Notes: architecture, design tradeoffs, backends, model conversion, and binding surfaces.
Browser / WASM examples
Node.js example

Native C++ CLI

Download Models

Models are available on:

🤗 Hugging Face: https://huggingface.co/RapidAI/RapidSpeech
ModelScope: https://www.modelscope.cn/models/RapidAI/RapidSpeech

Build from Source

git clone https://github.com/RapidAI/RapidSpeech.cpp
cd RapidSpeech.cpp
git submodule sync && git submodule update --init --recursive
cmake -B build
cmake --build build --config Release

Build artifacts are located in the build/ directory:

rs-asr-offline — Offline ASR command-line tool
rs-asr-vad-online — VAD-segmented quasi-streaming ASR command-line tool
rs-tts-offline — Offline TTS command-line tool
rs-quantize — Model quantization tool

Core Commands

Offline ASR

./build/rs-asr-offline \
  -m /path/to/funasr-nano-fp16.gguf \
  -w /path/to/audio.wav \
  -t 4 \
  --gpu true

VAD-segmented ASR

./build/rs-asr-offline \
  -m /path/to/funasr-nano-fp16.gguf \
  -v /path/to/silero_vad_v6.gguf \
  -w /path/to/audio.wav \
  -t 4 \
  --vad-threshold 0.5 \
  --silence-ms 600
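
For intuition about the two VAD flags, the sketch below shows how a probability threshold and a silence duration are commonly combined to cut audio into segments. It illustrates the general technique only and is not taken from the RapidSpeech.cpp source; the function name, the `probs` input (one speech probability per frame), and `frame_ms` are assumptions made for this example.

```python
def segment_speech(probs, frame_ms, threshold=0.5, silence_ms=600):
    """Group per-frame speech probabilities into (start_ms, end_ms) segments.

    A segment opens at the first frame whose probability reaches `threshold`
    and closes once `silence_ms` of consecutive below-threshold frames follow.
    """
    segments = []
    start = None        # start time of the currently open segment, if any
    last_speech = None  # end time of the most recent speech frame
    for i, p in enumerate(probs):
        t = i * frame_ms
        if p >= threshold:
            if start is None:
                start = t
            last_speech = t + frame_ms
        elif start is not None and t + frame_ms - last_speech >= silence_ms:
            # Enough trailing silence: close the segment at the last speech frame.
            segments.append((start, last_speech))
            start = None
    if start is not None:
        # Audio ended while a segment was still open.
        segments.append((start, last_speech))
    return segments
```

In this scheme, a higher threshold makes the detector more conservative about what counts as speech, and a longer silence window merges short pauses into a single segment, which means fewer, longer chunks are handed to the recognizer.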

Text to speech

./build/rs-tts-offline \
  -m /path/to/omnivoice-f16.gguf \
  -t "Hello, welcome to RapidSpeech!" \
  --instruct "male, young adult, moderate pitch" \
  --lang English \
  --n-steps 32 \
  -o output.wav

Quantization

./build/rs-quantize /path/to/input-f16.gguf /path/to/output-q4_k.gguf q4_k

Python

See Python examples for offline ASR, streaming ASR, offline TTS, streaming TTS, VAD, and voice cloning.

🤝 Contributing

We welcome PRs and discussion in the following areas:

Adapting more models to the framework.
Refining and optimizing the project architecture.
Improving inference performance.

Acknowledgements

Fun-ASR
llama.cpp
ggml
cppjieba — Chinese word segmentation
WeText — text normalization (ITN/TN)
miniaudio — single-file audio I/O
