Releases: modelscope/FunASR
v1.3.13
What's Changed
- docs(vllm_guide): explain EmbedsPrompt/streaming partial/SPK/concurre… by @qiulang in #3011
- docs(README): make the llama.cpp/GGUF (CPU/edge) runtime a first-class section by @LauraGPT in #3012
- feat: q8_0 GGUF export — half-size encoders, CER unchanged by @LauraGPT in #3013
- docs(README): refresh What's New with June updates by @LauraGPT in #3015
- docs(README): holistic fixes — broken quickstart, stray links, consistency by @LauraGPT in #3016
- docs(README): canonical iic/ ModelScope namespace by @LauraGPT in #3017
- fix(glm_asr): dedup result keys for duplicate audio basenames by @SuperMarioYL in #3014
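The glm_asr fix above concerns result keys that collide when two input files share a basename. A minimal sketch of one way to keep such keys unique follows. The helper name `unique_keys` and the `#n` suffix scheme are hypothetical and are not taken from the FunASR code.

```python
from collections import Counter
from pathlib import PurePosixPath


def unique_keys(paths):
    """Derive one result key per path, disambiguating repeated basenames.

    Hypothetical helper for illustration only; not the FunASR implementation.
    """
    seen = Counter()
    keys = []
    for path in paths:
        stem = PurePosixPath(path).stem  # basename without its extension
        seen[stem] += 1
        # The first occurrence keeps the bare stem; later ones get a numeric suffix.
        keys.append(stem if seen[stem] == 1 else f"{stem}#{seen[stem]}")
    return keys


print(unique_keys(["a/rec.wav", "b/rec.wav", "c/other.wav"]))
```

Without some disambiguation of this kind, a dict keyed by basename would silently keep only the last `rec` result.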
Full Changelog: v1.3.12...v1.3.13
v1.3.12
What's Changed
- docs(vllm_guide): drop stale repetition_penalty hardcode note by @LauraGPT in #3007
- fix(qwen3-asr): map ISO/short language codes to qwen-asr canonical names by @montvid in #3008
- docs(README): make the quickstart runnable (missing model.generate call) by @LauraGPT in #3010
- docs(vllm_guide): let vLLM pin torch/torchaudio in the installation steps by @qiulang in #3009
Full Changelog: v1.3.11...v1.3.12
FunASR llama.cpp runtime runtime-llamacpp-v0.1.2
Prebuilt self-contained binaries for the FunASR llama.cpp / GGUF runtime — SenseVoice, Paraformer and Fun-ASR-Nano with built-in FSMN-VAD (a whisper.cpp-style on-device ASR, strong on Chinese). Get a model with bash download-funasr-model.sh <sensevoice|paraformer|nano>, then run llama-funasr-cli / llama-funasr-sensevoice / llama-funasr-paraformer. No Python, no build. Docs: runtime/llama.cpp/README.md
v1.3.11
What's Changed
- docs: python wss server now supports multiple concurrent clients by @LauraGPT in #2985
- docs: make README quickstart runnable and output truthful by @LauraGPT in #2986
- docs: fix README streaming example (runnable + actually streams) by @LauraGPT in #2987
- Add llama.cpp / GGUF runtime (Fun-ASR-Nano, SenseVoice, Paraformer) by @LauraGPT in #2988
- docs: link llama.cpp / GGUF (CPU/edge) runtime from Deploy section by @LauraGPT in #2991
- ci: auto-create GitHub Release on version tag push by @LauraGPT in #2995
- docs: CPU benchmark vs whisper.cpp (Chinese ASR) by @LauraGPT in #2992
- feat: accept any audio input (any rate/channels, wav/mp3/flac) via miniaudio by @LauraGPT in #2994
- feat: built-in FSMN-VAD (--vad) — single-binary speech segmentation, no Python at runtime by @LauraGPT in #2998
- fix: FSMN-VAD review findings (MSVC M_PI, short-audio guard, tensor validation) by @LauraGPT in #2999
- feat: B1 packaging — one-command download, standalone convert, CI-friendly CMake by @LauraGPT in #3000
- docs: build note for funasr-common (A1 follow-up) by @LauraGPT in #2996
- ci: cross-platform prebuilt binaries for the llama.cpp runtime by @LauraGPT in #3001
- fix: B1 script portability (HF CLI fallback + friendly missing-dep error) by @LauraGPT in #3002
- test: numerical regression harness (frozen golden vs ggml/VAD/CIF/CTC output) by @LauraGPT in #3003
- feat: print transcription text in the binaries (in-binary detok) by @LauraGPT in #3004
- fix: detok review findings (null vocab guard + utf-8 tokens read) by @LauraGPT in #3005
- fix(glm_asr): warn when vLLM dtype=fp16 (degraded output) by @SuperMarioYL in #2993
- fix(glm_asr): honor sampling params in vLLM generate() by @SuperMarioYL in #2997
Full Changelog: v1.3.10...v1.3.11
FunASR llama.cpp runtime runtime-llamacpp-v0.1.1
Prebuilt self-contained binaries for the FunASR llama.cpp / GGUF runtime — SenseVoice, Paraformer and Fun-ASR-Nano with built-in FSMN-VAD (a whisper.cpp-style on-device ASR, strong on Chinese). Get a model with bash download-funasr-model.sh <sensevoice|paraformer|nano>, then run llama-funasr-cli / llama-funasr-sensevoice / llama-funasr-paraformer. No Python, no build. Docs: runtime/llama.cpp/README.md
FunASR llama.cpp runtime runtime-llamacpp-v0.1.0
Prebuilt self-contained binaries for the FunASR llama.cpp / GGUF runtime — SenseVoice, Paraformer and Fun-ASR-Nano with built-in FSMN-VAD (a whisper.cpp-style on-device ASR, strong on Chinese). Get a model with bash download-funasr-model.sh <sensevoice|paraformer|nano>, then run llama-funasr-cli / llama-funasr-sensevoice / llama-funasr-paraformer. No Python, no build. Docs: runtime/llama.cpp/README.md
v1.3.10
FunASR v1.3.10
New features
- Agent-friendly CLI: `funasr audio.wav --output-format json` for structured output
- Fun-ASR-Nano: batched VAD-segment decoding (~1.75× faster) (#2979)
- WebSocket 2-pass server: sentence-level timestamps
- serve_vllm.py: new `--vad-model` / `--spk-model` flags
Fixes
- Fun-ASR-Nano: bf16/fp16 inference no longer crashes; warn on degraded fp16 (#2980)
- Fun-ASR-Nano vLLM: fix CUDA crash from `repetition_penalty`
- CLI: valid SRT timestamps + correct JSON durations (#2982); use `sentence_info` text (#2983); correct model id `Fun-ASR-Nano-2512` (#2984)
- Clearer error for missing audio path (#2981); respect explicit VAD silence threshold; handle `None` encoder/scheduler configs
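For context on the SRT fix listed above: an SRT cue uses timestamps of the form `HH:MM:SS,mmm`, with a comma before the milliseconds. The sketch below shows that formatting. It assumes segment boundaries arrive as integer milliseconds; the function names are illustrative and do not come from the FunASR CLI.

```python
def srt_timestamp(ms: int) -> str:
    """Format a non-negative millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    if ms < 0:
        raise ValueError("timestamp must be non-negative")
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_cue(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """Render one SRT cue: index line, time-range line, then the text line."""
    return f"{index}\n{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n{text}\n"


print(srt_cue(1, 1_250, 3_661_005, "hello"))
```

Using `divmod` on integer milliseconds avoids the rounding drift that can appear when seconds are handled as floats.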
Docs
- New CLI reference; clearer vLLM install guidance
Full changelog: v1.3.9...v1.3.10
v1.3.9: Wheel packaging + SenseVoice speaker diarization fix
What's New
Wheel packaging (fixes #2943)
FunASR now publishes a py3-none-any wheel alongside the source distribution. Installation is faster since pip no longer needs to build from source.
Bug fixes
- SenseVoice + speaker diarization: Fixed crash when using `spk_model="cam++"` with SenseVoice (auto-falls back to VAD-segment mode since SenseVoice doesn't produce word-level timestamps)
- torchaudio >= 2.11 compatibility: Added `soundfile` as intermediate fallback for users with newer torchaudio versions that removed legacy backends
Install / Upgrade
pip install --upgrade funasr
Full changelog: v1.3.3...v1.3.9
v1.3.3: Agent Integration — OpenAI API + MCP Server + funasr-server CLI
Highlights
This release makes FunASR a drop-in speech backend for AI agents.
New: funasr-server CLI
pip install funasr fastapi uvicorn python-multipart
funasr-server --device cuda
One command starts an OpenAI-compatible /v1/audio/transcriptions endpoint.
New: MCP Server
AI assistants (Claude, Cursor, Windsurf) can now transcribe audio directly.
New: OpenAI-Compatible API
Works with any agent framework: LangChain, AutoGen, CrewAI, Dify, Flowise, Open WebUI.
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")
result = client.audio.transcriptions.create(model="sensevoice", file=open("a.wav","rb"))
Bug Fixes
- Fixed `hub="hf"` parameter propagation to sub-models (v1.3.2)
- Fixed Qwen3-ASR ImportError masking
Upgrade
pip install --upgrade funasr
v1.3.2: HuggingFace Hub Fix + Performance Benchmark
What's New
Bug Fix
- Fixed hub parameter propagation: when using `hub="hf"`, the parameter is now correctly forwarded to VAD/PUNC/SPK sub-models. Previously, users on HuggingFace would get 404 errors for sub-models. (#2859)
Improvements
- Updated PyPI metadata with better description, keywords, and project URLs
- Added comprehensive benchmark page: https://modelscope.github.io/FunASR/benchmark.html
Benchmark Results (PyTorch, GPU)
| Model | Type | Speed |
|---|---|---|
| SenseVoice-Small | NAR | 170x realtime |
| Paraformer-Large | NAR | 120x realtime |
| Whisper-large-v3-turbo | AR | 46x realtime |
| Fun-ASR-Nano | LLM | 17x realtime |
| Whisper-large-v3 | AR | 13.4x realtime |
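To make the "x realtime" figures concrete, the short calculation below converts each speed-up factor from the table into the processing time for one hour of audio. It is plain arithmetic on the published numbers and assumes the factor is audio duration divided by processing time; it is not a measurement.

```python
# Speed-up factors copied from the benchmark table above.
SPEEDUP = {
    "SenseVoice-Small": 170.0,
    "Paraformer-Large": 120.0,
    "Whisper-large-v3-turbo": 46.0,
    "Fun-ASR-Nano": 17.0,
    "Whisper-large-v3": 13.4,
}


def processing_seconds(audio_seconds: float, speedup: float) -> float:
    """Time needed to transcribe `audio_seconds` of audio at `speedup` x realtime."""
    return audio_seconds / speedup


for name, factor in SPEEDUP.items():
    print(f"{name}: {processing_seconds(3600, factor):.1f} s per hour of audio")
```

At 120x realtime, for example, one hour of audio takes 30 seconds to process.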
Install / Upgrade
pip install --upgrade funasr
Quick Start
from funasr import AutoModel
model = AutoModel(model="FunAudioLLM/SenseVoiceSmall", hub="hf", vad_model="funasr/fsmn-vad", device="cuda")
result = model.generate(input="audio.wav")