I've been using IBM Watson's Speech to Text engine for transcribing call audio. Some possible use cases are speech-driven IVRs, voicemail-to-email transcription, or making call recordings text-searchable.

The last time I'd played with speech recognition on voice platforms was in 2012, and it's amazing to see how far the technology has evolved with the help of AI.

IBM's offering is a bit more flexible than the Google offering, and allows long transcriptions (longer than 1 minute) without uploading the files to external storage.

Sadly, Watson doesn't have Australian language models out of the box (+1 point to Google, which does), but you can add Custom Language Models and train them.

Input formats support PCM coded data, so you can pipe PCMA/PCMU (aka G.711 A-law and µ-law) audio straight to it.

The first thing you're going to need is credentials. Select "Speech to Text" and you can view / copy your API key from the Credentials header.

Once you've grabbed your API key we can start transcribing.

I've got an Asterisk instance that manages voicemail, so let's fire the messages to Watson and get it to transcribe the deposited messages:

curl -X POST -u "apikey:yourapikey" --header "Content-Type: audio/wav" --data-binary ""

"confidence": 0.831,
"transcript": "hi Nick this is Nick leaving Nick a test voice mail "

Common Transcription Options

speaker_labels=true

Speaker labels enable you to identify each speaker in a multi-party call. This makes the transcription read more like a script, with "Speaker 1: Hello other person" and "Speaker 2: Hello there Speaker 1", which makes skimming through much easier.

Timestamps mark each word with its offset from the start of the audio file. This reads poorly in curl output, but when used with speaker_labels it lets you see the time and correlate it with a recording.

One useful use case is searching through a call recording transcript and then jumping to that timestamp in the audio. For example, in a long conference call recording you might be interested in when people talked about "Item X". You can search the transcript for "Item" "X", find it's at 1:23:45, and then jump to that point in the call recording audio file, saving yourself an hour and a bit of listening to a conference call recording.

Watson has support for US and GB variants of speech recognition, and for wideband, narrowband and adaptive rate bitrates. Luckily it has wide-ranging WAV support, something GCP doesn't, as well as FLAC, G.729, mpg, mp3, webm and ogg. Unfortunately Watson, like GCP, only has support for MULAW (μ-law companding) and not PCMA as used outside the US.

Per-word confidence gives you a confidence breakdown for each word, so you can mark unknown words in the final output with question marks or similar to show where Watson isn't confident it has transcribed correctly.

max_alternatives

Voice and mail Watson wasn’t sure of
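The speaker-label and timestamp options described above can be combined once the JSON comes back. The following is a minimal Python sketch, not a definitive implementation. It assumes the response contains a "results" list whose first alternative holds "timestamps" entries of the form [word, start, end], plus a top-level "speaker_labels" list with "from" and "speaker" fields; check this against the response you actually receive. The SAMPLE data and the helper names (words_with_speakers, to_script, find_word, hms) are made up for illustration.

```python
# Hypothetical sample shaped like a Speech to Text response with
# timestamps and speaker labels enabled (values invented for illustration).
SAMPLE = {
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "hello there item x ",
                    "confidence": 0.9,
                    "timestamps": [
                        ["hello", 0.0, 0.4],
                        ["there", 0.5, 0.9],
                        ["item", 5025.0, 5025.3],
                        ["x", 5025.4, 5025.6],
                    ],
                }
            ],
            "final": True,
        }
    ],
    "speaker_labels": [
        {"from": 0.0, "to": 0.4, "speaker": 0},
        {"from": 0.5, "to": 0.9, "speaker": 0},
        {"from": 5025.0, "to": 5025.3, "speaker": 1},
        {"from": 5025.4, "to": 5025.6, "speaker": 1},
    ],
}


def words_with_speakers(response):
    """Yield (speaker, word, start, end) for every word in the response.

    Assumption: a word's start time equals the "from" value of its speaker
    label, so the two lists can be joined on that number. Words without a
    matching label get speaker None.
    """
    speaker_at = {lbl["from"]: lbl["speaker"]
                  for lbl in response.get("speaker_labels", [])}
    for result in response.get("results", []):
        best = result["alternatives"][0]  # first alternative only
        for word, start, end in best.get("timestamps", []):
            yield speaker_at.get(start), word, start, end


def to_script(response):
    """Group consecutive words by speaker into script-style lines."""
    turns = []  # list of (speaker, [words])
    for speaker, word, _start, _end in words_with_speakers(response):
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word)
        else:
            turns.append((speaker, [word]))
    return [f"Speaker {speaker}: {' '.join(words)}" for speaker, words in turns]


def find_word(response, term):
    """Return the start time (in seconds) of every occurrence of term."""
    term = term.lower()
    return [start for _spk, word, start, _end in words_with_speakers(response)
            if word.lower() == term]


def hms(seconds):
    """Format seconds as H:MM:SS for jumping to a point in the recording."""
    s = int(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


if __name__ == "__main__":
    for line in to_script(SAMPLE):
        print(line)
    for start in find_word(SAMPLE, "item"):
        print("'item' spoken at", hms(start))
```

Joining on the exact start time keeps the sketch short. If the labels and timestamps in a real response don't line up exactly, matching each word to the label whose from/to range contains it would be a safer choice.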