Product - Features and Deployments

Never Miss A Word with our Speech Recognition API

Set your product apart with our Speech API that delivers a comprehensive range of features, unmatched accuracy and flexible deployment.

With Speechmatics behind you, you have all the tools you need to deliver an exceptional user experience.

Transcription Modes

Our models are built to deliver in real time, so you get the very best performance and fast transcription whether you choose batch or real-time mode.

Batch Transcription

Quickly transcribe large quantities of pre-recorded video or audio files. You can easily set up Speechmatics to process thousands of hours of recordings.

Transcribe your pre-recorded files to get the data you need, when you need it. It’s a great way to extract understanding from your audio quickly and efficiently.

Real-Time Transcription

We offer low-latency, accurate transcription of live audio streams from meetings, calls, or broadcast events.

You’ll get initial transcriptions in milliseconds, with context-driven accuracy improvements over time. Our real-time transcription uses the same core machine learning models to give you the best accuracy.
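
The pattern of fast initial results that are refined later can be sketched as below. This is a minimal illustration and not the Speechmatics client: the `TranscriptBuffer` class and the `"partial"` and `"final"` message kinds are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Holds confirmed text plus the latest provisional (partial) text."""

    finals: list = field(default_factory=list)
    partial: str = ""

    def on_message(self, kind: str, text: str) -> None:
        # Hypothetical message kinds: "partial" results arrive quickly and may
        # still change; "final" results replace them once more context is known.
        if kind == "partial":
            self.partial = text
        elif kind == "final":
            self.finals.append(text)
            self.partial = ""
        else:
            raise ValueError(f"unknown message kind: {kind!r}")

    def display(self) -> str:
        # Show confirmed text followed by the current provisional text.
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)
```

A live caption view could call `display()` after every message, so viewers see provisional words immediately and the corrected wording as soon as it is confirmed.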

Deployment Modes

Meet diverse customer needs with support for Cloud and On-Prem deployments. Switch seamlessly between the two or combine them.

On-Prem Deployment

Meet architecture, security and compliance needs by hosting our API in your own environment. Flexibly combine with Cloud if required.

You can deploy Speechmatics using Docker containers or preconfigured virtual appliances. Going On-Prem enables you to improve workflow efficiency and minimize latency, helping you serve a wider market with diverse customer needs.

Cloud Deployment

Get instant, secure and scalable access to our API through our cloud deployment.

Avoid the cost and complexity of building a high-availability system from scratch while getting instant access to all our new features, languages and updates.


Partner with Speechmatics to maximize your total addressable market. We deliver for multilingual, multicultural and multinational businesses, with coverage of nearly half the world’s languages across a range of dialects and accents.

Language Coverage

We support 50 languages, covering most native languages with unmatched accuracy.

Accents and Dialects

Whether you need Brazilian Portuguese or Canadian French, we have you covered with a single language model that supports all associated accents and dialects.


Transcribe and translate audio to and from English for over 30 languages using a single API call.

Language Identification

Simplify integration and ensure accurate transcription with automatic detection of the language spoken.


The vocabulary used in different contexts and different domains can vary widely. Our customization options allow you to achieve high accuracy with even the most unique words and phrases.


With a single API call you can generate a concise, accurate summary for your use case.

Custom Dictionary

Boost accuracy for proper nouns, acronyms or industry-specific terms by providing a list of custom words.
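
Preparing a list of custom words might look like the sketch below. It only builds and de-duplicates the list; the key names `content` and `sounds_like` and the helper `build_custom_dictionary` are assumptions for illustration, so check the API reference for the actual request format.

```python
def build_custom_dictionary(words, sounds_like=None):
    """Return a de-duplicated list of custom word entries.

    `words` is an iterable of terms; `sounds_like` optionally maps a term to
    alternative pronunciations. Key names here are illustrative assumptions.
    """
    sounds_like = sounds_like or {}
    seen = set()
    entries = []
    for word in words:
        term = word.strip()
        key = term.casefold()
        if not term or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        seen.add(key)
        entry = {"content": term}
        if term in sounds_like:
            entry["sounds_like"] = list(sounds_like[term])
        entries.append(entry)
    return entries
```

For example, `build_custom_dictionary(["Speechmatics", "ASR", "asr"], sounds_like={"ASR": ["A S R"]})` keeps one entry per term and attaches the pronunciation hint to `ASR`.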

Language Model Adaptation

Increase accuracy for a use-case or domain by using a relevant corpus of textual content to customize default models.

Industry Language Packs

We're developing English language packs optimized for specific industries and their terminology. Finance is available now, with more to follow soon.

Speaker Labels

Our diarization enriches the transcript with accurate speaker labels, so your users can identify every speaker in a conversation.

Speaker Diarization

Track who said what and when with speaker labeling for each word, available for both batch and real-time transcription.
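
Per-word speaker labels are often easier to read once collapsed into turns. The sketch below shows one way to do that; the word record keys (`speaker`, `text`, `start`, `end`) are hypothetical and stand in for whatever the transcript output provides.

```python
def group_by_speaker(words):
    """Collapse word-level speaker labels into consecutive speaker turns.

    Each item in `words` is a dict with hypothetical keys "speaker", "text",
    "start" and "end" (times in seconds).
    """
    turns = []
    for word in words:
        if turns and turns[-1]["speaker"] == word["speaker"]:
            # Same speaker is still talking: extend the current turn.
            turns[-1]["text"] += " " + word["text"]
            turns[-1]["end"] = word["end"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(
                {
                    "speaker": word["speaker"],
                    "text": word["text"],
                    "start": word["start"],
                    "end": word["end"],
                }
            )
    return turns
```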

Channel Diarization

Capture exactly what was said, even when there is crosstalk between speakers, with separate transcription on each channel.

Formatting and Presentation

Written and spoken language differ. From punctuation to the formatting of numbers and dates, our API includes a number of features to accurately transform conversation into transcript.

Numeral Formatting

Identify and correctly format numbers, dates and currencies automatically to improve transcript readability and enable effective post-processing.

Advanced Punctuation and Casing

Improve readability with language-specific capitalization and punctuation including commas, question marks and exclamation marks.

Profanity & Disfluency Detection

Aid comprehensibility and compliance by detecting and optionally removing words that are considered profanities or hesitations.
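
Once words are tagged, removing or masking them is a small post-processing step. The sketch below assumes each word carries an optional `tags` list containing `"profanity"` or `"disfluency"`; those key and tag names, and the `clean_transcript` helper, are hypothetical.

```python
def clean_transcript(words, remove_tags=("disfluency",), mask_tags=("profanity",)):
    """Drop words carrying a tag in `remove_tags`; mask those in `mask_tags`.

    Each word is a dict with a "text" key and an optional "tags" list
    (hypothetical shape used for illustration only).
    """
    remove = set(remove_tags)
    mask = set(mask_tags)
    output = []
    for word in words:
        tags = set(word.get("tags", ()))
        if tags & remove:
            continue  # hesitations such as "um" are left out entirely
        if tags & mask:
            output.append("*" * len(word["text"]))  # keep length, hide content
        else:
            output.append(word["text"])
    return " ".join(output)
```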

Transcript Metadata and API

Easily push a variety of media formats to the API and get a rich set of metadata to support your post-processing needs.

Word Timings

Get accurate timestamps for every word in the transcript to allow for post-processing and an improved end-user experience.
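
One common use of word-level timestamps is generating captions. The sketch below turns word timings into SubRip (SRT) caption blocks; the `(text, start, end)` tuple shape and the fixed words-per-caption rule are simplifying assumptions for illustration.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=7):
    """Build SRT captions from (text, start, end) word tuples."""
    blocks = []
    for index in range(0, len(words), max_words):
        chunk = words[index : index + max_words]
        # A caption spans from its first word's start to its last word's end.
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(word[0] for word in chunk)
        number = index // max_words + 1
        blocks.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

A production captioner would usually also break on punctuation and pauses rather than on a fixed word count.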

Confidence Scores

Collect confidence scores for every word in the transcript to enable efficient human review and editing.
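
Confidence scores can direct a human reviewer to the words most likely to need correction. The sketch below assumes each word is a dict with `text` and `confidence` keys and a score between 0 and 1; the key names, the `flag_for_review` helper and the 0.8 threshold are illustrative choices, not documented values.

```python
def flag_for_review(words, threshold=0.8):
    """Return (index, text, confidence) for words below the threshold.

    `words` is a list of dicts with hypothetical keys "text" and "confidence"
    (a float between 0 and 1).
    """
    return [
        (index, word["text"], word["confidence"])
        for index, word in enumerate(words)
        if word["confidence"] < threshold
    ]
```

The returned indices let an editing tool jump straight to uncertain words instead of requiring a full read-through.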

File Formats

Minimize the resources needed to prepare audio or video files with support for all major audio and video formats, along with automatic sample rate detection.

ASR just got an upgrade. Speech Intelligence is here.

Explore the latest breakthroughs in speech and AI, all built on category leading accuracy.

Ready to Understand Every Voice?

Sign up to our free speech-to-text SaaS Portal and we’ll guide you through the integration of our API.