Purpose

This document explains how the GRXML-based grammars work. It describes each grammar and the JSON output it produces.

Introduction

The input audio must be an 8 kHz, 16-bit linear PCM file.
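You can check a file against this format before uploading it by using Python's standard wave module. That module reads only PCM WAV files and raises wave.Error for other encodings. The sketch below is an illustration and is not part of the service. check_input_audio is a hypothetical helper name. It does not check the channel count, because this document does not specify one.

```python
import wave


def check_input_audio(path: str) -> None:
    """Raise ValueError unless `path` is an 8 kHz, 16-bit PCM WAV file.

    wave.open() itself raises wave.Error for non-PCM encodings.
    The channel count is not checked (no requirement is documented here).
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        sample_width = wav.getsampwidth()  # bytes per sample; 2 bytes = 16 bits
    if rate != 8000:
        raise ValueError(f"expected an 8000 Hz sample rate, got {rate} Hz")
    if sample_width != 2:
        raise ValueError(f"expected 16-bit samples, got {8 * sample_width}-bit")
```

A valid file passes silently, so calling check_input_audio("yes.wav") before the upload is enough.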

There are three grammars: answer, location, and duration.

The answer grammar accepts only “yes” and “no”.
The location grammar accepts only “local” and “area”.
The duration grammar accepts “less than” or “more than”, followed by a number and a unit (“day(s)” or “hour(s)”).

Examples:

Less than 24 days
More than 10 hours
Less than 1 day
Less than 1 hour

Usage

01. For answer grammar

curl --request POST --data-binary "@yes.wav" "http://198.199.70.106:49162/answer?key=xxxxxxxxxxxxxxxxxx"

Response

{
  "status": 0,
  "result": {
    "final": true,
    "hypotheses": [
      {
        "transcript": "[noise] yes",
        "likelihood": 1.0,
        "final_output": "rule_ref: yes_no ,output_tag: yes",
        "word-alignment": [
          {
            "word": "[noise]",
            "start": 0.0,
            "length": 0.0099999997764825821,
            "confidence": 1.0
          },
          {
            "word": "yes",
            "start": 1.2899999618530273,
            "length": 0.52999997138977051,
            "confidence": 1.0
          }
        ]
      }
    ]
  },
  "segment": 0.0,
  "segment-start": 0.0,
  "segment-length": 1.8199999332427979,
  "total-length": 3.1695003509521484,
  "id": "03.27.2019_08.36.47_AM_answer_9_59613404"
}

Note: In the grammar response, extract the value of output_tag from the final_output field.

Extraction from the JSON

Input: yes

Output: “yes”
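You can sketch the extraction step in Python with the standard json and re modules. Based only on the examples in this document, the sketch assumes two things: the first entry in hypotheses is the one to use, and an output tag contains no spaces. extract_output_tag is a hypothetical helper name. The location response has the same shape, so the same function returns “area” for it.

```python
import json
import re

# Captures the text after "output_tag:" up to the next whitespace.
OUTPUT_TAG = re.compile(r"output_tag:\s*(\S+)")


def extract_output_tag(response_text: str) -> str:
    """Return the output_tag from the first hypothesis of a grammar response."""
    response = json.loads(response_text)
    if response["status"] != 0:  # 0 means success
        raise RuntimeError(f"request failed with status {response['status']}")
    # Assumption: the first hypothesis is the one to use.
    final_output = response["result"]["hypotheses"][0]["final_output"]
    match = OUTPUT_TAG.search(final_output)
    if match is None:
        raise ValueError(f"no output_tag found in {final_output!r}")
    return match.group(1)
```

For the answer response above, extract_output_tag(response_text) returns "yes".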

02. For location grammar

curl --request POST --data-binary "@area.wav" "http://198.199.70.106:49162/location?key=xxxxxxxxxxxxxxxxxx"

Response

{
  "status": 0,
  "result": {
    "final": true,
    "hypotheses": [
      {
        "transcript": "[noise] area",
        "likelihood": 1.0,
        "final_output": "rule_ref: location ,output_tag: area",
        "word-alignment": [
          {
            "word": "[noise]",
            "start": 0.0,
            "length": 0.0099999997764825821,
            "confidence": 1.0
          },
          {
            "word": "area",
            "start": 0.15999999642372131,
            "length": 0.56999999284744263,
            "confidence": 1.0
          }
        ]
      }
    ]
  },
  "segment": 0.0,
  "segment-start": 0.0,
  "segment-length": 0.72999995946884155,
  "total-length": 5.2586245536804199,
  "id": "03.27.2019_08.39.18_AM_location_9_25370808"
}

Input: area

Output: “area”
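The curl commands for the answer and location grammars differ only in the path segment and the audio file. The sketch below builds the same POST in Python using only the standard library. build_request is a hypothetical helper name, and the key placeholder must be replaced with a real API key.

```python
import urllib.parse
import urllib.request

# Host and port copied from the curl examples in this document.
BASE_URL = "http://198.199.70.106:49162"


def build_request(grammar: str, wav_path: str, key: str) -> urllib.request.Request:
    """Build a POST like `curl --request POST --data-binary @wav_path URL`.

    grammar is the path segment: "answer", "location", or "duration".
    """
    with open(wav_path, "rb") as f:
        body = f.read()  # raw bytes, sent unmodified as with --data-binary
    query = urllib.parse.urlencode({"key": key})
    url = f"{BASE_URL}/{grammar}?{query}"
    return urllib.request.Request(url, data=body, method="POST")
```

Passing the returned request to urllib.request.urlopen() sends it, and the response body is a JSON document like the ones shown above.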

03. For duration grammar

curl --request POST --data-binary "@lessthanonehour.wav" "http://198.199.70.106:49162/duration?key=xxxxxxxxxxxxxxxxxx"

Response

{
  "status": 0,
  "result": {
    "final": true,
    "hypotheses": [
      {
        "speaking_rate": 2.1367521286010742,
        "transcript": "[noise] less than one hour",
        "likelihood": 1.0,
        "final_output": "rule_ref: less_more ,output_tag: LESS_THAN rule_ref: numbers ,output_tag: 1 rule_ref: hour_day ,output_tag: HOUR",
        "word-alignment": [
          {
            "word": "[noise]",
            "start": 0.0,
            "length": 0.0099999997764825821,
            "confidence": 1.0
          },
          {
            "word": "less",
            "start": 1.1899999380111694,
            "length": 0.35999998450279236,
            "confidence": 1.0
          },
          {
            "word": "than",
            "start": 1.5499999523162842,
            "length": 0.22999998927116394,
            "confidence": 1.0
          },
          {
            "word": "one",
            "start": 1.7999999523162842,
            "length": 0.22999998927116394,
            "confidence": 1.0
          },
          {
            "word": "hour",
            "start": 2.0299999713897705,
            "length": 0.31000000238418579,
            "confidence": 1.0
          }
        ]
      }
    ]
  },
  "segment": 0.0,
  "segment-start": 0.0,
  "segment-length": 2.3399999141693115,
  "total-length": 3.5537502765655518,
  "id": "03.27.2019_08.40.19_AM_duration_9_57707448"
}

Input: less than one hour

Output: “LESS_THAN 1 HOUR”
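For the duration grammar, final_output holds one rule_ref/output_tag pair for each part of the phrase. The output above is those tags joined in order. The sketch below shows this step. It assumes the pair layout seen in the example (“rule_ref: <rule> ,output_tag: <tag>”, with tags that contain no spaces). The function names are hypothetical.

```python
import re

# Each pair looks like "rule_ref: <rule> ,output_tag: <tag>".
PAIR = re.compile(r"rule_ref:\s*(\S+)\s*,output_tag:\s*(\S+)")


def parse_final_output(final_output: str) -> list[tuple[str, str]]:
    """Split a final_output string into (rule_ref, output_tag) pairs, in order."""
    return PAIR.findall(final_output)


def duration_output(final_output: str) -> str:
    """Join the output tags in order, e.g. 'LESS_THAN 1 HOUR'."""
    return " ".join(tag for _rule, tag in parse_final_output(final_output))
```

Keeping the rule_ref names, rather than only the joined string, lets a caller look up the numbers value or the hour_day unit by rule instead of by position.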

JSON Fields

01. Status – Indicates whether the request succeeded

0 – successful
Any other value – unsuccessful

02. Result

Final – Indicates whether the result is final or partial
true – final result
false – partial result

03. Hypotheses

speaking_rate – Number of words spoken per second
transcript – Full transcript of the segment
likelihood – Probabilistic likelihood of the hypothesis; used only for debugging
final_output – Grammar output, given as a sequence of rule_ref and output_tag pairs. This field is optional: it is present only in GRXML-based grammar APIs.
word-alignment – List with one entry per recognized word, each containing:
  word – Word from the one-best transcript
  start – Start time of the word, in seconds
  length – Duration of the word, in seconds
  confidence – Scaled probability estimate that the word was identified correctly

04. Segment

segment – Index of the current segment
segment-start – Start time of the current segment, in seconds
segment-length – Length of the current segment, in seconds (with segment-start at 0.0, as in the examples above, this is also the segment's end time)
total-length – Total length of the decoded speech
id – Identifier of the speech request
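These fields work together. Check status and final before trusting a result, then compute each word's end time as start + length. The minimal Python sketch below makes two assumptions that the service does not state: the first hypothesis is the one of interest, and bracketed tokens such as [noise] should be skipped. word_timings is a hypothetical helper name.

```python
import json

Timing = tuple[str, float, float, float]  # (word, start, end, confidence)


def word_timings(response_text: str, skip_bracketed: bool = True) -> list[Timing]:
    """List the aligned words of the first hypothesis with start and end times."""
    response = json.loads(response_text)
    if response["status"] != 0:  # any non-zero status is a failure
        raise RuntimeError(f"request failed with status {response['status']}")
    result = response["result"]
    if not result["final"]:  # false means a partial result
        raise ValueError("result is partial; wait for a final result")
    timings: list[Timing] = []
    for entry in result["hypotheses"][0]["word-alignment"]:
        word = entry["word"]
        # Assumption: bracketed tokens such as "[noise]" are not spoken words.
        if skip_bracketed and word.startswith("[") and word.endswith("]"):
            continue
        start = entry["start"]
        # End time = start + length, both in seconds.
        timings.append((word, start, start + entry["length"], entry["confidence"]))
    return timings
```

For the answer response shown earlier, this returns a single entry for “yes”. It starts at about 1.29 s and ends at about 1.82 s.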