Introducing Scribe v2

Published: Jan 9, 2026

Scribe v2 is built for batch transcription, subtitling, and captioning at scale. It improves on the stability and accuracy of Scribe v1, with better handling of long-form audio, pauses, changes in tone, and extended silences.

While Scribe v2 Realtime is optimized for ultra-low latency and agent use cases, Scribe v2 is optimized for long and complex recordings, maintaining accuracy across diverse speakers, accents, and delivery styles. The result is consistently reliable transcripts across a wide range of real-world audio conditions.

Scribe v2 achieves the lowest word error rate recorded on industry-standard benchmarks.

Keyterm Prompting for context-aware transcription

Keyterm prompting goes beyond standard Custom Vocabulary by using the transcript’s context. Provide up to 100 words or phrases, and Scribe v2 uses the surrounding context to decide when each term should appear in the transcript. This makes it well suited to technical domains, brand names, and industry-specific language.
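As a rough illustration of the 100-term limit, the sketch below cleans a keyterm list before it is sent with a transcription request. The helper name prepare_keyterms and the case-insensitive de-duplication are assumptions made for this example, not part of the Scribe v2 API; the actual request parameter for keyterms is not shown here and should be taken from the API documentation.

```python
MAX_KEYTERMS = 100  # limit stated in this announcement


def prepare_keyterms(terms):
    """Hypothetical helper: tidy a keyterm list and enforce the 100-term limit.

    Collapses repeated whitespace, drops empty entries, and removes
    case-insensitive duplicates while keeping the first spelling seen.
    """
    seen = set()
    cleaned = []
    for term in terms:
        normalized = " ".join(term.split())
        key = normalized.casefold()
        if not normalized or key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    if len(cleaned) > MAX_KEYTERMS:
        raise ValueError(
            f"{len(cleaned)} keyterms supplied; the limit is {MAX_KEYTERMS}"
        )
    return cleaned


keyterms = prepare_keyterms(["  ElevenLabs ", "elevenlabs", "", "Scribe  v2"])
print(keyterms)
```

Failing early on an oversized list keeps the problem visible in your own pipeline instead of depending on how the service handles it.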

Built-in entity detection with precise timestamps

Scribe v2 includes native entity detection for structured audio analysis.

You can select from up to 56 entity categories spanning personally identifiable information (PII), health data, and payment details. Scribe v2 automatically detects each instance and its exact timestamps in your transcript, making it easier to review, redact, or process sensitive information at scale.
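To show how timestamped entities can drive redaction, here is a minimal sketch that replaces every word overlapping a detected entity with a placeholder. The Word and Entity shapes and the category name phone_number are assumptions for illustration only; the real response format may differ.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Assumed shape: one transcribed word with start/end times in seconds.
    text: str
    start: float
    end: float


@dataclass
class Entity:
    # Assumed shape: one detected entity with its category and time span.
    category: str
    start: float
    end: float


def redact(words, entities):
    """Replace words that overlap an entity's time span with [CATEGORY]."""
    pieces = []
    previous = None  # entity that covered the previous word, if any
    for word in words:
        hit = next(
            (e for e in entities if word.start < e.end and word.end > e.start),
            None,
        )
        if hit is None:
            pieces.append(word.text)
        elif hit is not previous:
            # Emit one placeholder per entity, even if it spans several words.
            pieces.append(f"[{hit.category.upper()}]")
        previous = hit
    return " ".join(pieces)


words = [
    Word("Call", 0.0, 0.3), Word("me", 0.3, 0.5), Word("at", 0.5, 0.6),
    Word("555", 0.6, 1.0), Word("0100", 1.0, 1.5), Word("tomorrow", 1.6, 2.1),
]
entities = [Entity("phone_number", 0.6, 1.5)]
print(redact(words, entities))  # Call me at [PHONE_NUMBER] tomorrow
```

The same time spans could be used to mute or bleep the matching stretch of the audio file.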

Learn more in the API documentation: https://elevenlabscreator.arsenaldigitalweb.com.br/docs/developers/guides/cookbooks/speech-to-text/batch/entity-detection

Automatic multi-language transcription

Scribe v2 supports smart multi-language workflows out of the box.

You can send audio that contains multiple languages in a single file. The model automatically detects each language and transcribes it correctly without manual segmentation or configuration.

Additional features for production workflows

Scribe v2 includes a set of features designed for enterprise and developer use cases:

Smart speaker diarization for clear, intuitive speaker labeling
Precise word-level timestamps for accurate subtitle alignment and interactive experiences
Dynamic audio tagging that detects non-speech events such as laughter or footsteps
Enterprise readiness with SOC 2, ISO 27001, PCI DSS L1, HIPAA, and GDPR compliance, EU and India data residency, and zero retention mode support
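As an example of what word-level timestamps make possible, the sketch below groups timed words into SubRip (SRT) subtitle cues. The Word shape, the 42-character line limit, and the 0.8-second pause threshold are assumptions chosen for this example and are not Scribe v2 defaults.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Assumed shape: one transcribed word with start/end times in seconds.
    text: str
    start: float
    end: float


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_chars=42, max_gap=0.8):
    """Group words into cues, breaking on long lines or long pauses."""
    cues = []
    current = []
    for word in words:
        if current:
            length = len(" ".join(w.text for w in current)) + 1 + len(word.text)
            gap = word.start - current[-1].end
            if length > max_chars or gap > max_gap:
                cues.append(current)
                current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        start = srt_timestamp(cue[0].start)
        end = srt_timestamp(cue[-1].end)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


words = [Word("Hello", 0.0, 0.4), Word("world", 0.5, 0.9), Word("Again", 3.0, 3.4)]
print(words_to_srt(words))
```

Breaking cues on pauses as well as length tends to keep subtitles aligned with natural phrasing. Speaker labels from diarization could be added as a further break condition.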

Scribe v2, now in ElevenLabs Studio

Scribe v2 is now used in ElevenLabs Studio for more accurate subtitles, captions, and transcriptions, supporting teams that manage large libraries of audio and video across marketing, media, research, training, and compliance use cases.

Try it now: https://elevenlabscreator.arsenaldigitalweb.com.br/app/studio

Build with the API

With Scribe v2, developers and enterprises can automate complex audio pipelines, improve accuracy in global content workflows, and scale securely with full compliance and data residency controls.