Language and Accent Identification Specification


The purpose of this document is to share the Language and Accent Identification API specification so that potential GoVivace customers can test their integration. The contents of this document are GoVivace proprietary and subject to change.


This API aims to expose language and accent identification routines as a RESTful web service. The identification process assumes that the input audio is an 8 kHz, 16-bit linear PCM file. If a WAV file is used, its first 44 bytes (the header) are simply treated as audio; in practice this has been found to work fine.
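A standard PCM WAV header is 44 bytes, so at 16,000 bytes/s (8 kHz × 2 bytes) it amounts to only about 2.75 ms of spurious samples, which is why sending it as audio is harmless. Before uploading, a client may still want to confirm that a WAV file really is 8 kHz, 16-bit PCM. The following is a minimal sketch using Python's standard `wave` module; `check_wav_format` is a hypothetical helper, and the mono-channel check is an assumption, since this specification does not state a channel count.

```python
import io
import wave

# Input format assumed by the service: 8 kHz, 16-bit linear PCM.
EXPECTED_RATE_HZ = 8000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits


def check_wav_format(wav_bytes: bytes) -> list:
    """Return a list of format problems; an empty list means the file matches.

    wave.open raises wave.Error for non-PCM (e.g. compressed) WAV files.
    """
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 8000 Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bits, expected 16")
        # Assumption: mono audio; the specification does not mention channels.
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, mono assumed")
    return problems
```

For example, `check_wav_format(open("sample1.wav", "rb").read())` should return an empty list for a correctly prepared file.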


The language and accent identification service accepts POST requests at the specified URI, with the audio in the body of the message. For example, using curl, one could run:

Curl Command

curl --request POST --data-binary "@sample1.wav" ""

where sample1.wav is an 8 kHz, 16-bit linear PCM file. The body of the POST request contains the entire audio file in this 8 kHz, 16-bit linear PCM format.
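The curl command above can be mirrored with Python's standard library. This is a minimal sketch: `build_identify_request` and `identify_file` are hypothetical names, the service URI is left to the caller because this document does not give it, and decoding the reply as JSON is an assumption based on the JSON results described for the WebSocket API.

```python
import json
import urllib.request


def build_identify_request(uri: str, audio: bytes) -> urllib.request.Request:
    """Build a POST request whose body is the raw audio bytes, like curl --data-binary."""
    # No Content-Type is set explicitly: urllib then sends
    # application/x-www-form-urlencoded, which is also curl's default for --data-binary.
    return urllib.request.Request(uri, data=audio, method="POST")


def identify_file(uri: str, audio_path: str, timeout: float = 60.0) -> dict:
    """POST an 8 kHz, 16-bit linear PCM file and decode the reply (assumed to be JSON)."""
    with open(audio_path, "rb") as f:
        request = build_identify_request(uri, f.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like `identify_file(service_uri, "sample1.wav")`, where `service_uri` is the URI supplied by GoVivace.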


WebSocket API


After the last block of speech data, the client must send the special 3-byte ANSI-encoded string “EOS” (“end-of-stream”) to the server. This tells the server that no more speech is coming.

After sending “EOS”, the client has to keep the WebSocket open to receive the result from the server. The server closes the connection itself when results have been sent to the client. No more audio can be sent via the same WebSocket after an “EOS” has been sent. In order to process a new audio stream, a new WebSocket connection has to be created by the client.
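The end-of-stream handshake can be sketched in Python as follows. This is a minimal sketch, not the official client: `stream_audio` and its `send`/`sleep` parameters are hypothetical, `send` stands for whatever call your WebSocket library uses to transmit one message, and sending "EOS" as a text frame rather than a binary one is an assumption the specification does not settle. Pacing follows the bytes-per-second meaning of the Python client's `--rate` option.

```python
import time
from typing import Callable, Union

# A WebSocket message: bytes for binary audio frames, str for the text "EOS" marker.
Message = Union[bytes, str]


def stream_audio(
    audio: bytes,
    send: Callable[[Message], None],
    rate: int = 4200,
    chunk_seconds: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send audio as paced binary chunks followed by "EOS"; return the number of audio chunks."""
    chunk_size = max(1, int(rate * chunk_seconds))  # bytes per chunk at `rate` bytes/sec
    chunks_sent = 0
    for start in range(0, len(audio), chunk_size):
        chunk = audio[start:start + chunk_size]
        send(chunk)
        chunks_sent += 1
        sleep(len(chunk) / rate)  # wait as long as this chunk takes at the target byte rate
    # End-of-stream marker. After this, no more audio may be sent on this connection;
    # the caller should keep the socket open and wait for the JSON result.
    send("EOS")
    return chunks_sent
```

Injecting `send` and `sleep` keeps the sketch independent of any particular WebSocket library and lets it be exercised without a server. Each new audio stream needs a fresh connection.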


Python Client

python --uri "wss://" --save-json-filename sample1_language.json --rate 4200 sample1.wav



--save-json-filename: Save the intermediate JSON to the specified file
--rate: Rate, in bytes/sec, at which audio is sent to the server
--uri: Server WebSocket URI
--key: Authentication key
--action: Action to perform, e.g. identify
--file_format: Audio file format (default: 8K_PCM16)
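The client script itself is not named in this document, so the following is only a hypothetical `argparse` parser that mirrors the options listed above, for readers writing their own client. Whether each option is required is not stated; here only the audio file is mandatory, and the sole default is the documented `8K_PCM16` file format.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented Python client options."""
    parser = argparse.ArgumentParser(description="Language and accent identification client (sketch)")
    parser.add_argument("audio_file", help="audio file to send, e.g. sample1.wav")
    parser.add_argument("--uri", help="server WebSocket URI")
    parser.add_argument("--key", help="authentication key")
    parser.add_argument("--action", help="action to perform, e.g. identify")
    parser.add_argument("--save-json-filename", help="file in which to save the JSON result")
    parser.add_argument("--rate", type=int, help="rate in bytes/sec at which audio is sent")
    parser.add_argument("--file_format", default="8K_PCM16", help="audio file format")
    return parser
```

`argparse` stores `--save-json-filename` as `args.save_json_filename`, so the parsed values can be passed straight to the rest of a client.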


For example, a successful response contains the message:

"message": "Language and Accent identification is successful"

The server sends language and accent identification results and other information to the client in JSON format. The response can contain the following fields:

status: response status (integer); see the codes below
message: status message
processing_time: total time spent on the server side processing the audio
identified_language: the language with the maximum identification score in languages_identified
  • score: confidence of the identified language (less than one)
languages_identified: the candidate languages identified (there may be more than one), each with:
  • identification score: confidence score
  • language: language name, e.g. English, Spanish, Hindi
accents_identified: contains accent_identification_score and accent
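Because identified_language is defined as the entry with the maximum identification score, a client can recompute or cross-check it from languages_identified. This is a minimal sketch under stated assumptions: the exact JSON layout is not given here, so treating languages_identified as a list of objects and spelling the score key `identification_score` (the text writes "identification score") are both hypothetical.

```python
import json
from typing import Optional, Tuple

# Assumption: the spec writes "identification score"; the actual key name may differ.
SCORE_KEY = "identification_score"


def top_language(response_text: str) -> Optional[Tuple[str, float]]:
    """Return (language, score) with the highest score, or None if nothing was identified."""
    result = json.loads(response_text)
    if result.get("status") != 0:  # only status 0 (Success) carries a result
        return None
    candidates = result.get("languages_identified") or []
    if not candidates:
        return None
    best = max(candidates, key=lambda c: float(c[SCORE_KEY]))
    return best["language"], float(best[SCORE_KEY])
```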

The following status codes are currently in use:

0 – Success: usually sent with a recognition result
1 – No speech: sent when the incoming audio contains a large portion of silence or non-speech
2 – Aborted: recognition was aborted for some reason
9 – Not Available: sent when all recognizer processes are currently in use and recognition cannot be performed
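On the client side, these codes map naturally onto an enumeration. The sketch below is illustrative: `Status`, `describe_status`, and `is_retryable` are hypothetical names, and treating only Not Available as retryable (it signals busy recognizers rather than a problem with the audio) is a suggested client policy, not part of this specification.

```python
from enum import IntEnum


class Status(IntEnum):
    SUCCESS = 0        # usually sent with a recognition result
    NO_SPEECH = 1      # audio is largely silence or non-speech
    ABORTED = 2        # recognition was aborted
    NOT_AVAILABLE = 9  # all recognizer processes are in use


def describe_status(code: int) -> str:
    """Return the symbolic name of a status code, tolerating codes not listed here."""
    try:
        return Status(code).name
    except ValueError:
        return f"UNKNOWN({code})"


def is_retryable(code: int) -> bool:
    """Suggested policy: retry later only when the recognizers are busy."""
    return code == Status.NOT_AVAILABLE
```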

Supported languages and their codes:

  • English – 0
  • Thai – 1
  • Bengali – 2
  • Hindustani – 3
  • Russian – 4
  • Japanese – 5
  • Chinese – 6
  • Vietnamese – 7
  • Korean – 8
  • Farsi – 9
  • Arabic – 10
  • Spanish – 11
  • Tamil – 12
  • German – 13
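The specification does not say where these numeric codes appear in requests or responses, so the mapping below simply transcribes the list for client-side lookups in either direction. `LANGUAGE_CODES` and `LANGUAGE_BY_NAME` are hypothetical names.

```python
# Transcription of the supported-language list above: code -> language name.
LANGUAGE_CODES = {
    0: "English",
    1: "Thai",
    2: "Bengali",
    3: "Hindustani",
    4: "Russian",
    5: "Japanese",
    6: "Chinese",
    7: "Vietnamese",
    8: "Korean",
    9: "Farsi",
    10: "Arabic",
    11: "Spanish",
    12: "Tamil",
    13: "German",
}

# Reverse lookup: language name -> code.
LANGUAGE_BY_NAME = {name: code for code, name in LANGUAGE_CODES.items()}
```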