Purpose

The purpose of this document is to share the Gender Identification API specification so that potential GoVivace customers can test their integration. The contents of this document are GoVivace proprietary and subject to change.

Introduction

This API exposes gender identification routines as a RESTful web service. The gender identification process assumes that the input audio is an 8 kHz, 16-bit linear PCM file. If the WAV format is used, the first 44 bytes (the WAV header) are simply treated as audio, which has been found to work fine.
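Since the service assumes a fixed input format, it can help to check a WAV file on the client side before uploading it. The following is a minimal sketch using Python's standard wave module; the helper names is_8k_pcm16 and make_silent_wav are hypothetical and not part of the GoVivace API. The channel count is not stated in this document, so the test helper writes mono audio as an assumption.

```python
import io
import wave

EXPECTED_RATE_HZ = 8000    # 8 kHz sampling rate, as described above
EXPECTED_SAMPLE_WIDTH = 2  # 16-bit samples are 2 bytes wide


def is_8k_pcm16(wav_bytes: bytes) -> bool:
    """Return True if the WAV data is uncompressed 8 kHz, 16-bit PCM."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and wav.getcomptype() == "NONE"
        )


def make_silent_wav(rate_hz: int, seconds: float = 0.1) -> bytes:
    """Build a short silent WAV in memory (mono is an assumption)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(EXPECTED_SAMPLE_WIDTH)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    return buf.getvalue()
```

For example, is_8k_pcm16(make_silent_wav(8000)) is true, while a 16 kHz file is rejected and would need resampling before being sent.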

Usage

The gender identification service accepts POST requests with the audio in the body of the message at the specified URI. For example, using the curl command, one could run:

CURL Command

curl --request POST --data-binary "@sample1.wav" "https://services.govivace.com:7684/GenderId?action=identify&format=8K_PCM16&key=xxxxxxxxxxxxxxxxxxxxxx"

Here sample1.wav is a 16-bit linear PCM file with an 8 kHz sampling rate. The body of the POST contains the entire audio file in this format.
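The same request can be sketched in Python with only the standard library. This is an illustration under assumptions, not an official client: the function names build_identify_request and identify_gender are hypothetical, and the 60-second timeout is an arbitrary choice. No Content-Type header is set, so urllib sends its default for POST bodies, which is the same default curl uses with --data-binary.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://services.govivace.com:7684/GenderId"


def build_identify_request(
    audio: bytes, key: str, audio_format: str = "8K_PCM16"
) -> urllib.request.Request:
    """Build a POST request carrying the raw audio bytes as its body."""
    query = urllib.parse.urlencode(
        {"action": "identify", "format": audio_format, "key": key}
    )
    return urllib.request.Request(f"{BASE_URL}?{query}", data=audio, method="POST")


def identify_gender(path: str, key: str) -> dict:
    """Upload an audio file and return the decoded JSON response."""
    with open(path, "rb") as audio_file:
        audio = audio_file.read()
    request = build_identify_request(audio, key)
    # The timeout value is an assumption, not a documented limit.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling identify_gender("sample1.wav", "xxxxxxxxxxxxxxxxxxxxxx") would then correspond to the curl command above.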

Websocket API

wss://services.govivace.com:7684/GenderId?action=identify&format=8K_PCM16&key=xxxxxxxxxxxxxxxxxxxxxx

(For Websocket API)

After the last block of speech data, a special 3-byte ANSI-encoded string “EOS” (“end-of-stream”) needs to be sent to the server. This tells the server that no more speech is coming.

After sending “EOS”, the client has to keep the WebSocket open to receive a result from the server. The server closes the connection itself once the results have been sent to the client. No more audio can be sent over the same WebSocket after “EOS” has been sent. To process a new audio stream, the client has to create a new WebSocket connection.
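The sending side of this protocol can be sketched independently of any particular WebSocket library. In the sketch below, send is a hypothetical callable standing in for the library's send method, and the 0.25-second pacing interval is an assumption chosen for illustration. This document does not say whether “EOS” should go out as a text or a binary frame, so that choice is left to the send callable.

```python
import time
from typing import Callable

EOS_MARKER = b"EOS"  # 3-byte end-of-stream marker described above


def stream_audio(
    send: Callable[[bytes], None],
    audio: bytes,
    rate: int,
    interval: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send audio in paced chunks, then signal end-of-stream.

    rate is in bytes/sec; each chunk holds rate * interval bytes.
    """
    chunk_size = max(1, int(rate * interval))
    for start in range(0, len(audio), chunk_size):
        send(audio[start:start + chunk_size])
        sleep(interval)  # pace the upload to roughly `rate` bytes/sec
    # After the last block of audio, tell the server no more speech is coming.
    send(EOS_MARKER)
```

After stream_audio returns, the client would keep reading from the same connection until the server delivers the result and closes it.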

Python Client

python client.py --uri "wss://services.govivace.com:7684/GenderId?action=identify&format=8K_PCM16&key=xxxxxxxxxxxxxxxxxxxxxx" --save-json-filename sample1_gender.json --rate 4200 sample1.wav

Options
➢--save-json-filename: Save the intermediate JSON to the specified file
➢--rate: Rate in bytes/sec at which audio is sent to the server
➢--uri: Server WebSocket URI
➢--key: Authentication key
➢--action: Action to perform, such as identify
➢--file_format: Audio file format (default value is 8K_PCM16)

Response

{
"message":"Gender identification is successful",
"status":0,
"gender":"female",
"string_confidence":0.98455590009689331,
"processing_time":78.363928999999999,
"input_speech_duration":28.770000457763672
}

The server sends gender identification results and other information to the client in JSON format. The response can contain the following fields:

●status: response status (integer), see codes below
●message: status message
●processing_time: total time spent on the server side processing the audio
●input_speech_duration: total duration of speech found by the server in the audio
●gender: male or female
●string_confidence: a number between 0 and 1

The following status codes are currently in use:
●0: Success. Usually used when a recognition result is sent
●1: No speech. Sent when the incoming audio contains a large portion of silence or non-speech
●2: Aborted. Recognition was aborted for some reason
●9: Not Available. Used when all recognizer processes are currently in use and recognition cannot be performed

The confidence attempts to indicate the system-level expectation that the decision is correct. A small confidence value means the system is less sure about the gender of the speaker in the audio.
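A client might handle the response fields, status codes, and confidence along the following lines. This is a sketch only: the names Status, GenderResult, and parse_response are hypothetical, and the 0.8 confidence threshold is an arbitrary assumption that an integrator would tune for their own data.

```python
import json
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    """Status codes listed above."""
    SUCCESS = 0
    NO_SPEECH = 1
    ABORTED = 2
    NOT_AVAILABLE = 9


@dataclass
class GenderResult:
    gender: str
    confidence: float
    is_confident: bool


def parse_response(payload: str, min_confidence: float = 0.8) -> GenderResult:
    """Decode a JSON response and raise if the status is not success."""
    data = json.loads(payload)
    status = Status(data["status"])  # raises ValueError for unlisted codes
    if status is not Status.SUCCESS:
        raise RuntimeError(f"{status.name}: {data.get('message', '')}")
    confidence = float(data["string_confidence"])
    # min_confidence is an assumed cut-off, not a value from the service.
    return GenderResult(data["gender"], confidence, confidence >= min_confidence)
```

Applied to the sample response shown earlier, parse_response yields a result with gender "female" that passes the assumed threshold; a response with status 1 raises an error naming NO_SPEECH.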