GoVivace’s patented speech-to-text technology accurately transcribes audio in real time.

Our patented speech-to-text (STT) solution, Listener, is based on state-of-the-art Automatic Speech Recognition (ASR) technology that enables machines to understand and transcribe speech. It uses advanced machine learning algorithms and natural language processing techniques to recognize and transcribe speech accurately in real time. It supports standard telephony as well as web and mobile applications.

On the back end, Listener’s ASR engines use state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) models developed by GoVivace speech and NLP scientists. Listener can transcribe audio of any length, either as an online stream or in offline mode.

Depending on the client’s needs, the large ASR models can also be easily customized to support custom lingo, business terms, and proper nouns. On request, the ASR engine also supports keyword spotting and hint word recognition.
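As a rough illustration of what hint word recognition can mean in practice, the sketch below re-ranks a list of candidate transcripts so that candidates containing hint words score higher. This is not Listener’s implementation: the function `rescore_with_hints`, the `bonus` parameter, and the score values are all hypothetical, and a production engine would more likely apply such biasing inside the decoder.

```python
from typing import Iterable, List, Tuple


def rescore_with_hints(
    nbest: List[Tuple[str, float]],
    hint_words: Iterable[str],
    bonus: float = 1.0,
) -> List[Tuple[str, float]]:
    """Re-rank (text, score) candidates, adding `bonus` per hint word found.

    Hypothetical helper: higher scores are better, and matching is a plain
    case-insensitive comparison on whitespace-separated tokens.
    """
    hints = {word.lower() for word in hint_words}
    rescored = []
    for text, score in nbest:
        matches = sum(1 for token in text.lower().split() if token in hints)
        rescored.append((text, score + bonus * matches))
    # Best candidate first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Example: the proper noun is supplied as a hint word.
candidates = [("call go vivace", -4.0), ("call govivace", -4.5)]
ranked = rescore_with_hints(candidates, ["GoVivace"])
```

With the hint supplied, the candidate containing the proper noun moves ahead of the acoustically similar alternative.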

The speech-to-text solution can also be provided as grammar-based automatic speech recognition, which can process grammars ranging from very simple to very large. It easily supports very large grammars for complex tasks such as dates, complex commands, and yellow-pages-style directory lookups. Performance tuning is another service: we troubleshoot poorly performing grammars by tuning the acoustic and language models for the preferred service.

These grammar-based ASR engines can work with both pre-compiled grammars, which can be referenced by name, and on-the-fly grammars, which evolve as the client uses the application and are detected when reused. Both kinds of grammar are stored on the server after compilation to ensure fast processing. GoVivace also offers consulting services for the design and development of complex grammars for our clients.
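The two kinds of grammar described above can be pictured with a small cache. The sketch below is only an assumption about how such a store might behave, not GoVivace’s server code: `GrammarStore`, `toy_compile`, and the use of a SHA-256 content hash to detect a reused on-the-fly grammar are all illustrative choices.

```python
import hashlib
from typing import Callable, Dict


class GrammarStore:
    """Toy server-side store for compiled grammars (illustrative only)."""

    def __init__(self, compile_fn: Callable[[str], object]) -> None:
        self._compile = compile_fn
        self._by_name: Dict[str, object] = {}    # pre-compiled, referenced by name
        self._by_digest: Dict[str, object] = {}  # on-the-fly, keyed by content hash
        self.compile_count = 0                   # how many compilations actually ran

    def _build(self, source: str) -> object:
        self.compile_count += 1
        return self._compile(source)

    def register(self, name: str, source: str) -> None:
        """Compile a grammar once and make it available under a name."""
        self._by_name[name] = self._build(source)

    def by_name(self, name: str) -> object:
        return self._by_name[name]

    def on_the_fly(self, source: str) -> object:
        """Compile a grammar sent with a request, reusing it if seen before."""
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if digest not in self._by_digest:
            self._by_digest[digest] = self._build(source)
        return self._by_digest[digest]


def toy_compile(source: str) -> frozenset:
    """Stand-in compiler: treats each non-empty line as one allowed phrase."""
    return frozenset(
        line.strip().lower() for line in source.splitlines() if line.strip()
    )


store = GrammarStore(toy_compile)
store.register("yes_no", "yes\nno")
first = store.on_the_fly("call alice\ncall bob")
second = store.on_the_fly("call alice\ncall bob")  # reused, not recompiled
```

Keying on-the-fly grammars by a hash of their text means a client does not need to name them; sending the same grammar again hits the stored copy.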

GoVivace provides multilingual speech-to-text and text-to-speech engines, covering US, UK, and Indian-accented English, Spanish, Portuguese, French, German, and Italian. It also provides English code-switching ASR for major Indic languages such as Hindi, Tamil, Telugu, Kannada, Marathi, Gujarati, Bengali, Malayalam, and Assamese, and we are adding more. Indic ASR supports Indian English accents, taking into account the speaking habits of the Indian population. Check the Automatic Speech Recognition (speech to text and text to speech) Market Overview.

Our distributed client-server architecture supports easy scaling and an ever-growing list of client devices. A load balancer can be used at the front end, and servers can be added at the back end for redundancy, reliability, and scalability. We also support MRCPv2 for our ASR solution. Because ASR is used extensively in commercial settings, businesses can use GoVivace’s ASR plugin for UniMRCP servers to meet their requirements.
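The front-end load balancer idea can be sketched in a few lines. The example below is a generic round-robin picker that skips servers marked as down; it is not GoVivace’s load balancer, and the class name `RoundRobinBalancer` and the server names are hypothetical.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Minimal round-robin picker over a pool of back-end ASR servers."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._servers: List[str] = list(servers)
        if not self._servers:
            raise ValueError("at least one server is required")
        self._down: Set[str] = set()
        self._next = 0  # index of the server to try next

    def mark_down(self, server: str) -> None:
        self._down.add(server)

    def mark_up(self, server: str) -> None:
        self._down.discard(server)

    def pick(self) -> str:
        """Return the next healthy server, skipping any marked as down."""
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server not in self._down:
                return server
        raise RuntimeError("no healthy servers available")


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["asr-1", "asr-2", "asr-3"])
first_round = [balancer.pick() for _ in range(4)]
balancer.mark_down("asr-2")
after_failure = [balancer.pick() for _ in range(2)]
```

Adding a server to the list increases capacity, and marking one as down routes new requests to the remaining servers, which is the redundancy property the paragraph above describes.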

On-device solutions are available upon request to support the hardware of your choice.

Key Features of Automatic Speech Recognition:

  • Accurate and noise-robust
  • SDK library and WebSocket-based live transcription with bidirectional streaming, for use as software as a service (SaaS) or as an on-premises deployment
  • The same speech recognition engine can be used to build mobile, web and high-volume applications to work around the clock, providing a uniform user experience
  • Keyword Spotting and voice-trigger capability
  • Hint word recognition 
  • Supports a distributed client/server architecture for easy scaling and an ever-growing list of devices
  • Supports multiple languages and multiple accents through the speech-to-text API
  • Supports code-switched and multilingual Automatic Speech Recognition
  • Supports unlimited vocabulary and continuous streaming of audio of any length
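Continuous streaming generally means the client sends audio in small pieces rather than as one file. The sketch below shows only that client-side step, under stated assumptions: 16 kHz, 16-bit mono PCM and 100 ms chunks are illustrative defaults, not documented Listener requirements, and `pcm_chunks` is a hypothetical helper. Sending each chunk over a WebSocket is left out because the service’s message format is not described here.

```python
from typing import Iterator


def pcm_chunks(
    audio: bytes,
    sample_rate: int = 16000,  # assumed sampling rate in Hz
    sample_width: int = 2,     # assumed bytes per sample (16-bit PCM)
    chunk_ms: int = 100,       # assumed chunk duration in milliseconds
) -> Iterator[bytes]:
    """Yield consecutive fixed-duration chunks of raw mono PCM audio.

    The final chunk may be shorter than the others.
    """
    samples_per_chunk = sample_rate * chunk_ms // 1000
    bytes_per_chunk = samples_per_chunk * sample_width
    if bytes_per_chunk <= 0:
        raise ValueError("chunk size must be at least one sample")
    for start in range(0, len(audio), bytes_per_chunk):
        yield audio[start:start + bytes_per_chunk]


# One second of silence at the assumed format, split into 100 ms chunks.
one_second = bytes(16000 * 2)
chunks = list(pcm_chunks(one_second))
```

Because the generator never needs the whole recording to be a particular length, the same loop works for a short utterance or for audio of any length.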

Listener is used in a wide range of applications, including:

  • Conversational AI: Fast response times and high accuracy make users feel as if they were talking to a real person.
  • Voicemail: To build applications that convert voicemail into email
  • Dictation: To integrate with live dictation systems, note-taking and e-learning applications
  • Transcription: To transcribe audio to text from the video or audio data of meetings, lectures, and interviews, making it easier to search for and analyze specific information.
  • Keyword spotting and hint word recognition: To find a particular keyword in an audio clip or to boost recognition of a word or a set of words.
  • Call centers: To automate call center operations, such as call routing, sentiment detection, information retrieval, and customer service.
  • Closed Captioning: To create closed captions for television programs, movies, and other video content, making it accessible to viewers who are deaf or hard of hearing.
  • Voice assistants: To integrate with voice assistants like VIVI, allowing users to interact through natural language.
  • Language learning: To integrate with language learning applications, helping learners improve their pronunciation and speaking skills by providing feedback on their speech.
  • Healthcare: To transcribe medical notes and records, which can save time and improve accuracy for a wide range of healthcare professionals.

