Introducing Scribe v2
Scribe v2 is built for batch transcription, subtitling, and captioning at scale. It improves on the stability and accuracy of Scribe v1, with better handling of long-form audio, pauses, changes in tone, and extended silences.

While Scribe v2 Realtime is optimized for ultra-low latency and agent use cases, Scribe v2 is optimized for long, complex recordings, maintaining accuracy across diverse speakers, accents, and delivery styles. The result is consistently reliable transcripts across a wide range of real-world audio conditions.
Scribe v2 achieves the lowest word error rate recorded on industry-standard benchmarks.

Keyterm Prompting for context-aware transcription
Keyterm prompting goes beyond standard Custom Vocabulary by taking the transcript's context into account. Supply up to 100 words or phrases, and Scribe v2 uses the surrounding context to decide when each term should actually appear in the transcript. This makes it well suited for technical domains, brand names, and industry-specific language.
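As an illustration, here is a client-side sketch that prepares a keyterm list before it is attached to a request: it collapses stray whitespace, drops case-insensitive duplicates, and enforces the 100-term limit. The helper `build_keyterm_fields` and the form-field name `keyterms` are assumptions for illustration only; confirm the actual parameter name in the API reference.

```python
from typing import Iterable

MAX_KEYTERMS = 100  # limit stated in this announcement


def build_keyterm_fields(terms: Iterable[str]) -> list[tuple[str, str]]:
    """Normalize keyterms and return them as repeated (name, value) form fields.

    Assumption: the API accepts keyterms as a repeated multipart field named
    "keyterms". This is a hypothetical name; check the API reference.
    """
    cleaned: list[str] = []
    seen: set[str] = set()
    for term in terms:
        term = " ".join(term.split())  # trim and collapse internal whitespace
        key = term.casefold()
        if term and key not in seen:  # skip blanks and case-insensitive duplicates
            seen.add(key)
            cleaned.append(term)
    if len(cleaned) > MAX_KEYTERMS:
        raise ValueError(f"at most {MAX_KEYTERMS} keyterms allowed, got {len(cleaned)}")
    return [("keyterms", term) for term in cleaned]


fields = build_keyterm_fields(["Scribe v2", "  ElevenLabs ", "scribe v2", "SOC 2"])
print(fields)  # [('keyterms', 'Scribe v2'), ('keyterms', 'ElevenLabs'), ('keyterms', 'SOC 2')]
```

Deduplicating before counting means a list padded with repeated spellings of the same term does not hit the limit unnecessarily.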

Built-in entity detection with precise timestamps
Scribe v2 includes native entity detection for structured audio analysis.
You can select up to 56 categories spanning personally identifiable information (PII), health data, and payment details. Scribe v2 automatically detects each instance along with its exact timestamps in the transcript, making it easier to review, redact, or process sensitive information at scale.
Learn more in the API documentation: https://elevenlabscreator.arsenaldigitalweb.com.br/docs/developers/guides/cookbooks/speech-to-text/batch/entity-detection
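Once entities come back with timestamps, redaction can be done by matching each entity's time span against word-level timestamps. This is a minimal sketch under an assumed, simplified response shape: words and entities each carry `start`/`end` in seconds, and entities carry a type label. The `Word` and `Entity` classes, the `redact` helper, and the `credit_card_number` label are hypothetical and do not reflect the documented schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class Entity:
    entity_type: str  # hypothetical category label
    start: float
    end: float


def overlaps(w: Word, e: Entity) -> bool:
    """True if the word's time span intersects the entity's time span."""
    return w.start < e.end and e.start < w.end


def redact(words: list[Word], entities: list[Entity]) -> str:
    """Replace words covered by a detected entity with a [type] placeholder.

    Consecutive words belonging to the same entity collapse into one placeholder.
    """
    out: list[str] = []
    last: Optional[int] = None  # index of the entity behind the previous placeholder
    for w in words:
        idx = next((i for i, e in enumerate(entities) if overlaps(w, e)), None)
        if idx is None:
            out.append(w.text)
            last = None
        elif idx != last:
            out.append(f"[{entities[idx].entity_type}]")
            last = idx
    return " ".join(out)


words = [
    Word("My", 0.0, 0.2), Word("card", 0.2, 0.5), Word("number", 0.5, 0.9),
    Word("is", 0.9, 1.0), Word("4111", 1.1, 1.5), Word("1111", 1.5, 1.9),
    Word("thanks", 2.2, 2.6),
]
entities = [Entity("credit_card_number", 1.1, 1.9)]
redacted = redact(words, entities)
print(redacted)  # My card number is [credit_card_number] thanks
```

Matching on time overlap rather than on text keeps the redaction aligned with the audio, so the same spans could also be used to mute or bleep the original recording.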
Automatic multi-language transcription
Scribe v2 supports smart multi-language workflows out of the box.
You can send audio that contains multiple languages in a single file. The model automatically detects each language and transcribes it correctly without manual segmentation or configuration.
Additional features for production workflows
Scribe v2 includes a set of features designed for enterprise and developer use cases:
- Smart speaker diarization for clear, intuitive speaker labeling
- Precise word-level timestamps for accurate subtitle alignment and interactive experiences
- Dynamic audio tagging that detects non-speech events such as laughter or footsteps
- Enterprise readiness with SOC 2, ISO 27001, PCI DSS L1, HIPAA, and GDPR compliance, EU and India data residency, and zero retention mode support
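Word-level timestamps and speaker labels are enough to generate subtitles. The following minimal sketch assumes each word arrives with `text`, `start`, `end` (in seconds), and a speaker label; the `Word` class and `speaker_0`-style labels are assumptions, not the documented response format. It groups words into SRT cues, starting a new cue on a speaker change, a long pause, or when a cue reaches a maximum word count.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float
    speaker: str  # diarization label, e.g. "speaker_0" (assumed format)


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(words: list[Word], max_words: int = 7, max_gap: float = 1.0) -> str:
    """Group words into numbered SRT cues.

    A new cue starts on a speaker change, a pause longer than max_gap
    seconds, or once the current cue holds max_words words.
    """
    cues: list[list[Word]] = []
    for w in words:
        if (
            cues
            and cues[-1][-1].speaker == w.speaker
            and w.start - cues[-1][-1].end <= max_gap
            and len(cues[-1]) < max_words
        ):
            cues[-1].append(w)
        else:
            cues.append([w])
    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        blocks.append(
            f"{i}\n{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}\n[{cue[0].speaker}] {text}\n"
        )
    return "\n".join(blocks)  # SRT cues are separated by a blank line


words = [
    Word("Hello", 0.0, 0.4, "speaker_0"),
    Word("there.", 0.4, 0.8, "speaker_0"),
    Word("Hi!", 1.0, 1.3, "speaker_1"),
]
srt = to_srt(words)
print(srt)
```

The `max_words` and `max_gap` defaults are arbitrary; subtitle style guides usually also cap characters per line and cue duration, which this sketch does not enforce.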
Scribe v2, now in ElevenLabs Studio
Scribe v2 is now used in ElevenLabs Studio for more accurate subtitles, captions, and transcriptions, supporting teams that manage large libraries of audio and video across marketing, media, research, training, and compliance use cases.

Try it now: https://elevenlabscreator.arsenaldigitalweb.com.br/app/studio
Build with the API
With Scribe v2, developers and enterprises can automate complex audio pipelines, improve accuracy in global content workflows, and scale securely with full compliance and data residency controls.

Scribe v2 is available today via our API and Creative platform.
Try it now: https://elevenlabscreator.arsenaldigitalweb.com.br/app/speech-to-text
Read the docs: https://elevenlabscreator.arsenaldigitalweb.com.br/docs/capabilities/speech-to-text
Sign up here: https://elevenlabscreator.arsenaldigitalweb.com.br/speech-to-text
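For a starting point, here is a minimal sketch of a batch request using only the Python standard library. It builds, but does not send, a multipart `POST` to the speech-to-text endpoint. The endpoint URL, the `xi-api-key` header, and the `diarize`, `tag_audio_events`, and `timestamps_granularity` fields reflect the public API as we understand it, while the `scribe_v2` model ID and the `build_request` helper are assumptions; verify every name against the docs linked above. Leaving out `language_code` lets the model detect the language (or languages) on its own, matching the multi-language behavior described earlier.

```python
import urllib.request
import uuid

API_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def build_request(api_key: str, audio: bytes, filename: str,
                  fields: dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        # One form part per plain text field.
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The audio file part.
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "xi-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_request(
    "YOUR_API_KEY",
    b"\x00\x01",  # placeholder bytes; use Path("meeting.mp3").read_bytes() for a real file
    "meeting.mp3",
    {
        "model_id": "scribe_v2",  # assumed model identifier
        "diarize": "true",
        "tag_audio_events": "true",
        "timestamps_granularity": "word",
        # no "language_code": the language is detected automatically
    },
)
# To send it: with urllib.request.urlopen(req) as resp: result = json.load(resp)
```

In practice the official SDKs or a library such as `requests` handle multipart encoding for you; the standard-library version is shown only to make the request shape explicit.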




