Twilio introduced Speech Recognition


Convert speech to text and analyze its intent during any voice call using Speech Recognition by Twilio. It supports 89 languages and dialects and is available now in public beta.

What is Twilio? Twilio powers the future of business communications by enabling phones, VoIP, and messaging to be embedded into web, desktop, and mobile software.

Twilio Speech Recognition allows developers to convert speech to text and analyze its intent during any voice call, and is available in public beta.

  • Speech-to-text

    Using a simple <Gather> verb, the Speech Recognition API captures your speech in real time, transcribes it, and returns text.

      <Gather input="speech">
        <Say>Say ahoy to Twilio Speech Recognition!</Say>
      </Gather>


  • No training required

    Transcribe a wide range of industry-specific words and phrases out of the box, without any pre-training.

  • Streaming results

    Build responsive applications that act on partial recognition results as your customer speaks.

  • Multiple languages

    Recognizes 89 languages and dialects, with more coming soon, to support your global user base.

<Gather> with Speech

Speech recognition is integrated directly into Twilio’s <Gather> verb so you can update the code you already have in place. Because it supports 89 languages and dialects, you can upgrade your application to support customers across a broad range of regions. Adding speech is as simple as setting a new attribute called “input” on <Gather>, for example <Gather input="speech">.

If you specify speech as an input, Twilio will add a new parameter called SpeechResult to the request it sends to your action URL.


AccountSid     AC25e16e9a616a4aXYZ6a7c83f58e30082
ApiVersion     2010-04-01
CallSid        CA607dee6b7647243904ebcXYZ64a2a5c2
CallStatus     in-progress
Called         +18182004120
Confidence     0.77388394
Direction      inbound
From           +15623000628
Language       en-US
SpeechResult   I’d like to learn more about Speech Recognition
To             +18182104120
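
As a rough sketch of how an action URL handler might use these parameters, the Python below reads SpeechResult and Confidence from a dictionary of request parameters and discards low-confidence transcripts. The function name `interpret_speech` and the 0.6 cutoff are hypothetical choices for illustration, not part of Twilio's API.

```python
from typing import Mapping

# Hypothetical cutoff for illustration; tune it for your own application.
MIN_CONFIDENCE = 0.6


def interpret_speech(params: Mapping[str, str],
                     min_confidence: float = MIN_CONFIDENCE) -> str:
    """Return the transcript if it is confident enough, else an empty string."""
    transcript = params.get("SpeechResult", "").strip()
    try:
        confidence = float(params.get("Confidence", "0"))
    except ValueError:
        # Treat an unparseable confidence value as no confidence at all.
        confidence = 0.0
    if not transcript or confidence < min_confidence:
        return ""
    return transcript


# Parameters copied from the sample request above.
sample = {
    "CallSid": "CA607dee6b7647243904ebcXYZ64a2a5c2",
    "Confidence": "0.77388394",
    "Language": "en-US",
    "SpeechResult": "I’d like to learn more about Speech Recognition",
}
print(interpret_speech(sample))
```

In a real application the dictionary would come from your web framework's parsed request parameters; an empty return value is a natural cue to ask the caller to repeat themselves.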


If you’d like to build more responsive applications, we also offer the ability to get speech results in real time as we process speech. To receive these partial results, set the partialResultCallback attribute on <Gather> to a callback URL.
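
A minimal sketch of TwiML that sets a partial results callback, generated here with Python's standard library. The `/partial` and `/completed` paths and the helper name `build_gather_twiml` are hypothetical placeholders for your own endpoints and code.

```python
import xml.etree.ElementTree as ET


def build_gather_twiml(partial_url: str, action_url: str) -> str:
    """Build a <Response> containing a speech <Gather> with a partial results callback."""
    response = ET.Element("Response")
    gather = ET.SubElement(
        response,
        "Gather",
        {
            "input": "speech",
            "action": action_url,  # receives the final SpeechResult
            "partialResultCallback": partial_url,  # receives results while the caller speaks
        },
    )
    say = ET.SubElement(gather, "Say")
    say.text = "Say ahoy to Twilio Speech Recognition!"
    return ET.tostring(response, encoding="unicode")


# Hypothetical endpoint paths, for illustration only.
twiml = build_gather_twiml("/partial", "/completed")
print(twiml)
```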


Once you specify a callback URL for partialResultCallback, you will get requests as your customers speak. Since HTTP requests may arrive out of order, we include a sequence number with each request so you can put your customer’s speech back in the order it was spoken.
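
One way to cope with out-of-order callbacks is to store each partial result under its sequence number and sort when reading. The sketch below assumes you have already extracted the sequence number and text from each request; the names of the request parameters that carry them are not given here, so check the documentation. The class `PartialResults` is hypothetical.

```python
from typing import Dict, List


class PartialResults:
    """Collect partial results for one call and read them back in spoken order."""

    def __init__(self) -> None:
        self._by_sequence: Dict[int, str] = {}

    def add(self, sequence_number: int, text: str) -> None:
        """Store one partial result; a repeated sequence number overwrites the earlier entry."""
        self._by_sequence[sequence_number] = text

    def in_order(self) -> List[str]:
        """Return the stored results sorted by sequence number."""
        return [self._by_sequence[n] for n in sorted(self._by_sequence)]


results = PartialResults()
results.add(2, "third")   # arrives first
results.add(0, "first")
results.add(1, "second")
print(results.in_order())  # ['first', 'second', 'third']
```

In practice you would keep one such collection per call, keyed by CallSid, and discard it once the final request reaches your action URL.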

This allows you to evaluate your user’s speech as they speak, so you can build responsive voice applications. A detailed explanation of Speech Recognition features and TwiML examples can be found in the Twilio documentation.

Speech Recognition uses a scalable pay-as-you-go model, with requests starting at $0.02 per 15 seconds of recognition.
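
For a rough budget estimate, the quoted starting price can be turned into a small calculation. This sketch assumes usage is billed in whole 15-second blocks, rounded up; that rounding rule is an assumption, not something stated here, and the function name `estimate_cost_cents` is hypothetical.

```python
import math

BLOCK_SECONDS = 15
CENTS_PER_BLOCK = 2  # $0.02, the starting price quoted above


def estimate_cost_cents(recognition_seconds: float) -> int:
    """Estimate cost in cents, assuming billing in whole 15-second blocks (rounded up)."""
    if recognition_seconds <= 0:
        return 0
    blocks = math.ceil(recognition_seconds / BLOCK_SECONDS)
    return blocks * CENTS_PER_BLOCK


print(estimate_cost_cents(40))  # 3 blocks of 15 seconds -> 6 cents
```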
