Purpose
This document explains how the GRXML-based grammars work. It describes each grammar and shows the corresponding output in JSON format.
Introduction
The audio input is an 8 kHz, 16-bit linear PCM file.
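A quick way to confirm that a recording matches this format before sending it is to inspect the WAV header. The sketch below uses Python's standard `wave` module; the helper name `check_audio_format` is hypothetical, and it assumes the input is a WAV container, as the `.wav` file names in the usage examples suggest.

```python
import wave


def check_audio_format(path):
    """Return a list of problems; an empty list means 8 kHz, 16-bit PCM."""
    problems = []
    # wave.open raises wave.Error if the file is not a PCM WAV file.
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = 8 * wav.getsampwidth()  # sample width is reported in bytes
        if rate != 8000:
            problems.append(f"sample rate is {rate} Hz, expected 8000 Hz")
        if bits != 16:
            problems.append(f"sample width is {bits} bits, expected 16 bits")
    return problems
```

Calling `check_audio_format("yes.wav")` on a conforming file should return an empty list.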
There are three grammars: answer, location, and duration.
- The answer grammar accepts only the inputs “yes” and “no”.
- The location grammar accepts only the inputs “local” and “area”.
- The duration grammar accepts the inputs “less than”, “more than”, “days”, and “hours”.
Example duration inputs:
- Less than 24 days
- More than 10 hours
- Less than 1 day
- Less than 1 hour
Usage
01. For answer grammar
curl --request POST --data-binary "@yes.wav" "http://198.199.70.106:49162/answer?key=xxxxxxxxxxxxxxxxxx"
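The same request can also be issued from a script. The following is a minimal sketch using only Python's standard library; the function names `build_request` and `recognize` are hypothetical, and the host, path, and `key` parameter are taken from the curl command above. It assumes the service replies with a JSON body.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://198.199.70.106:49162"  # host and port from the curl example


def build_request(audio_bytes, grammar, key, base_url=BASE_URL):
    """Build a POST request for one grammar ("answer", "location", "duration")."""
    query = urllib.parse.urlencode({"key": key})
    url = f"{base_url}/{grammar}?{query}"
    # The raw audio bytes are the request body, like curl's --data-binary.
    return urllib.request.Request(url, data=audio_bytes, method="POST")


def recognize(wav_path, grammar, key):
    """Send a WAV file and return the parsed JSON reply (assumed to be JSON)."""
    with open(wav_path, "rb") as audio:
        request = build_request(audio.read(), grammar, key)
    with urllib.request.urlopen(request, timeout=30) as reply:
        return json.loads(reply.read())
```

For example, `recognize("yes.wav", "answer", "xxxxxxxxxxxxxxxxxx")` would send the same file as the curl command.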
Response
{
  "status":0,
  "result":{
    "final":true,
    "hypotheses":[
      {
        "transcript":"[noise] yes",
        "likelihood":1.0,
        "final_output":"rule_ref: yes_no ,output_tag: yes",
        "word-alignment":[
          {
            "word":"[noise]",
            "start":0.0,
            "length":0.0099999997764825821,
            "confidence":1.0
          },
          {
            "word":"yes",
            "start":1.2899999618530273,
            "length":0.52999997138977051,
            "confidence":1.0
          }
        ]
      }
    ]
  },
  "segment":0.0,
  "segment-start":0.0,
  "segment-length":1.8199999332427979,
  "total-length":3.1695003509521484,
  "id":"03.27.2019_08.36.47_AM_answer_9_59613404"
}
Note: In this grammar's response, extract the output_tag value from the final_output field.
Extraction from the JSON
Input: yes
Output: “yes”
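The extraction described in the note can be sketched in Python as follows. The function name `extract_output_tags` is hypothetical. The sketch assumes that `final_output` always follows the `rule_ref: NAME ,output_tag: VALUE` pattern seen in the sample responses, that tag values contain no whitespace, and that the first hypothesis is the one to use.

```python
import json
import re

# Matches one "rule_ref: NAME ,output_tag: VALUE" pair, as in the sample responses.
PAIR = re.compile(r"rule_ref:\s*(\S+)\s*,\s*output_tag:\s*(\S+)")


def extract_output_tags(response_text):
    """Return the output_tag values from the first hypothesis, in order."""
    response = json.loads(response_text)
    if response.get("status") != 0:
        raise ValueError(f"recognition failed with status {response.get('status')}")
    hypothesis = response["result"]["hypotheses"][0]  # assumption: use the first
    # final_output is optional, so fall back to an empty string.
    final_output = hypothesis.get("final_output", "")
    return [tag for _rule, tag in PAIR.findall(final_output)]


# Trimmed copy of the answer-grammar response shown above.
sample = """
{"status": 0,
 "result": {"final": true,
            "hypotheses": [{"transcript": "[noise] yes",
                            "final_output": "rule_ref: yes_no ,output_tag: yes"}]}}
"""
print(extract_output_tags(sample))  # prints ['yes']
```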
02. For location grammar
curl --request POST --data-binary "@area.wav" "http://198.199.70.106:49162/location?key=xxxxxxxxxxxxxxxxxx"
Response
{
  "status":0,
  "result":{
    "final":true,
    "hypotheses":[
      {
        "transcript":"[noise] area",
        "likelihood":1.0,
        "final_output":"rule_ref: location ,output_tag: area",
        "word-alignment":[
          {
            "word":"[noise]",
            "start":0.0,
            "length":0.0099999997764825821,
            "confidence":1.0
          },
          {
            "word":"area",
            "start":0.15999999642372131,
            "length":0.56999999284744263,
            "confidence":1.0
          }
        ]
      }
    ]
  },
  "segment":0.0,
  "segment-start":0.0,
  "segment-length":0.72999995946884155,
  "total-length":5.2586245536804199,
  "id":"03.27.2019_08.39.18_AM_location_9_25370808"
}
Input: area
Output: “area”
03. For duration grammar
curl --request POST --data-binary "@lessthanonehour.wav" "http://198.199.70.106:49162/duration?key=xxxxxxxxxxxxxxxxxx"
Response
{
  "status":0,
  "result":{
    "final":true,
    "hypotheses":[
      {
        "speaking_rate":2.1367521286010742,
        "transcript":"[noise] less than one hour",
        "likelihood":1.0,
        "final_output":"rule_ref: less_more ,output_tag: LESS_THAN rule_ref: numbers ,output_tag: 1 rule_ref: hour_day ,output_tag: HOUR",
        "word-alignment":[
          {
            "word":"[noise]",
            "start":0.0,
            "length":0.0099999997764825821,
            "confidence":1.0
          },
          {
            "word":"less",
            "start":1.1899999380111694,
            "length":0.35999998450279236,
            "confidence":1.0
          },
          {
            "word":"than",
            "start":1.5499999523162842,
            "length":0.22999998927116394,
            "confidence":1.0
          },
          {
            "word":"one",
            "start":1.7999999523162842,
            "length":0.22999998927116394,
            "confidence":1.0
          },
          {
            "word":"hour",
            "start":2.0299999713897705,
            "length":0.31000000238418579,
            "confidence":1.0
          }
        ]
      }
    ]
  },
  "segment":0.0,
  "segment-start":0.0,
  "segment-length":2.3399999141693115,
  "total-length":3.5537502765655518,
  "id":"03.27.2019_08.40.19_AM_duration_9_57707448"
}
Input: less than one hour
Output: “LESS_THAN 1 HOUR”
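For the duration grammar, `final_output` carries three rule/tag pairs that together form the output. The sketch below turns that string into a small structured value and back into the combined text. The names `Duration`, `parse_duration`, and `format_duration` are hypothetical. The rule names `less_more`, `numbers`, and `hour_day` come from the sample response; the sketch assumes each appears exactly once and that the `numbers` tag is a whole number.

```python
import re
from typing import NamedTuple

# One "rule_ref: NAME ,output_tag: VALUE" pair, as in the sample response.
PAIR = re.compile(r"rule_ref:\s*(\S+)\s*,\s*output_tag:\s*(\S+)")


class Duration(NamedTuple):
    comparator: str  # tag from the less_more rule, e.g. "LESS_THAN"
    amount: int      # tag from the numbers rule
    unit: str        # tag from the hour_day rule, e.g. "HOUR"


def parse_duration(final_output):
    """Parse a duration-grammar final_output string into a Duration."""
    tags = dict(PAIR.findall(final_output))  # maps rule_ref -> output_tag
    # A KeyError here means one of the three expected rules is missing.
    return Duration(tags["less_more"], int(tags["numbers"]), tags["hour_day"])


def format_duration(duration):
    """Join the three tags into the combined output string."""
    return f"{duration.comparator} {duration.amount} {duration.unit}"


final_output = ("rule_ref: less_more ,output_tag: LESS_THAN "
                "rule_ref: numbers ,output_tag: 1 "
                "rule_ref: hour_day ,output_tag: HOUR")
print(format_duration(parse_duration(final_output)))  # prints LESS_THAN 1 HOUR
```

Keeping the three parts separate makes it easier to compare or convert durations later than working with the joined string.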
JSON Fields
01. status – Status of the request
  - 0 – successful
  - any other value – unsuccessful
02. result
  - final – Indicates whether the result is partial or final
    - true – final result
    - false – partial result
03. hypotheses
  - speaking_rate – The number of words spoken per second
  - transcript – The full transcript of the segment
  - likelihood – Probabilistic likelihood of the hypothesis; used only for debugging
  - final_output – The grammar output tags. This field is optional (it is present only in GRXML-based grammar APIs)
  - word-alignment – Information about each word
    - word – The one-best word as it appears in the transcript
    - start – Start time of the word in seconds
    - length – Length of the word in seconds
    - confidence – Scaled probability estimate that the word was identified correctly
04. Segment and request fields
  - segment – Number of the current segment
  - segment-start – Start time of the current segment in seconds
  - segment-length – Length of the current segment in seconds
  - total-length – Total length of the speech decoded
  - id – Identifier of the speech request
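To show how these fields fit together, here is a minimal sketch that checks the status, reads the final flag, and lists each spoken word with its start and end time. The function name `spoken_words` is hypothetical. It assumes the first hypothesis is the one of interest and that bracketed tokens such as `[noise]` mark non-speech and can be skipped.

```python
def spoken_words(response):
    """Return (is_final, words) for the first hypothesis of a parsed response.

    Each word is a dict with its text, start and end time in seconds,
    and confidence.
    """
    if response["status"] != 0:
        raise ValueError(f"recognition failed with status {response['status']}")
    result = response["result"]
    hypothesis = result["hypotheses"][0]  # assumption: use the first hypothesis
    words = []
    for item in hypothesis["word-alignment"]:
        # Assumption: bracketed tokens such as "[noise]" are non-speech markers.
        if item["word"].startswith("["):
            continue
        words.append({
            "word": item["word"],
            "start": item["start"],
            "end": item["start"] + item["length"],  # end = start + length
            "confidence": item["confidence"],
        })
    return result["final"], words


# Trimmed copy of the answer-grammar response, already parsed into a dict.
sample = {
    "status": 0,
    "result": {
        "final": True,
        "hypotheses": [{
            "transcript": "[noise] yes",
            "word-alignment": [
                {"word": "[noise]", "start": 0.0,
                 "length": 0.0099999997764825821, "confidence": 1.0},
                {"word": "yes", "start": 1.2899999618530273,
                 "length": 0.52999997138977051, "confidence": 1.0},
            ],
        }],
    },
}
is_final, words = spoken_words(sample)
for entry in words:
    print(entry["word"], round(entry["start"], 2), round(entry["end"], 2))
```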