Using Pocketsphinx for voice recognition:
-
Installed pocketsphinx following the instructions at http://www.pirobot.org/blog/0022/
-
The CMU lmtool generates a pronunciation dictionary (along with a language model) from a corpus or word file: http://www.speech.cs.cmu.edu/tools/lmtool-new.html
-
Created a grammar file in JSGF format, following the specification at http://www.w3.org/TR/jsgf/
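For reference, a minimal JSGF grammar has a `#JSGF V1.0;` header, a `grammar` name declaration, and rules, at least one of them `public`. The grammar name `robot` and its rules below are hypothetical examples, not the grammar we used; `write_grammar` is a hypothetical helper that just saves the text as a .jsgf file for the conversion step.

```python
from pathlib import Path

# Hypothetical example grammar; replace the rules with your own phrases.
GRAMMAR = """#JSGF V1.0;

grammar robot;

public <command> = <greeting> | <move>;

<greeting> = hello | hi there;
<move> = (go | turn) (left | right);
"""


def write_grammar(path: str, text: str = GRAMMAR) -> Path:
    """Write a JSGF grammar to disk and return its path."""
    out = Path(path)
    out.write_text(text, encoding="utf-8")
    return out
```

Usage: `write_grammar("robot.jsgf")` produces the file that sphinx_jsgf2fsg takes as its `-jsgf` argument.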
-
Converted the grammar file to an .fsg (finite-state grammar) file with this command: sphinx_jsgf2fsg -jsgf <file name>.jsgf -fsg <file name>.fsg
-
Converted the .fsg file to a word list with this command: perl fsg2wlist.pl < <file name>.fsg > <file name>.words
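If perl or fsg2wlist.pl is unavailable, the same idea can be sketched in Python. This is our own reimplementation, not the Perl script: it assumes FSG transition lines have the form `TRANSITION <from> <to> <prob> [<word>]` (or the short form `T`), that transitions without a word are null transitions, and that the goal is a de-duplicated list of words, one per line. `fsg_words` and `fsg_to_wordlist` are hypothetical names.

```python
from typing import List


def fsg_words(fsg_text: str) -> List[str]:
    """Return the sorted, de-duplicated words found on the FSG's transitions."""
    words = set()
    for line in fsg_text.splitlines():
        fields = line.split()
        # Assumed transition syntax: TRANSITION <from> <to> <prob> [<word>]
        if fields and fields[0] in ("TRANSITION", "T") and len(fields) >= 5:
            words.add(fields[4])
    return sorted(words)


def fsg_to_wordlist(fsg_path: str, words_path: str) -> None:
    """Read an .fsg file and write its words, one per line, to words_path."""
    with open(fsg_path, encoding="utf-8") as src:
        words = fsg_words(src.read())
    with open(words_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(words) + "\n")
```

Sorting is not required by lmtool; it just makes the output stable between runs.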
-
The file conversions have two dependencies: fsg2wlist.pl must be in the same directory as your pocketsphinx code, and the sphinx_jsgf2fsg executable must be in /usr/bin (or another directory on your PATH). (See Dependencies at the bottom.)
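A quick pre-flight check for these dependencies could look like the sketch below. It assumes the script runs from (or is given) the directory that should contain fsg2wlist.pl, and it also checks for perl, since the conversion command invokes it. `missing_dependencies` is a hypothetical helper name.

```python
import os
import shutil
from typing import List


def missing_dependencies(code_dir: str = ".") -> List[str]:
    """Return the names of missing conversion dependencies (empty if all present)."""
    missing = []
    # sphinx_jsgf2fsg must be an executable on PATH (e.g. in /usr/bin).
    if shutil.which("sphinx_jsgf2fsg") is None:
        missing.append("sphinx_jsgf2fsg")
    # perl is needed to run fsg2wlist.pl.
    if shutil.which("perl") is None:
        missing.append("perl")
    # fsg2wlist.pl must sit next to the pocketsphinx code.
    if not os.path.isfile(os.path.join(code_dir, "fsg2wlist.pl")):
        missing.append("fsg2wlist.pl")
    return missing
```

Calling it before the conversions gives a clear error instead of a failed shell command halfway through.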
-
Once the jsgf -> fsg -> word conversion is complete, upload the word file to lmtool to generate the dictionary.
-
Running this terminal command starts pocketsphinx, which should then recognize words in the dictionary and phrases from the grammar: pocketsphinx_continuous -inmic yes -jsgf <grammar file name>.jsgf -dict <dictionary name>.dic
Using NetworkX and GraphML with pocketsphinx to create a dialog:
-
Installed NetworkX, a Python package for creating and manipulating graphs, to build the dialog graph: https://networkx.github.io/documentation/development/install.html
-
To decide what the computer should say when pocketsphinx detects human speech, we created a dialog graph in GraphML (yEd can be used to visualize it). In the graph, human speech is stored on the edges and the computer's responses on the nodes. Examples of how to write GraphML are in the primer: http://graphml.graphdrawing.org/primer/graphml-primer.html
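As an illustration of that layout, here is a tiny hypothetical dialog graph in GraphML: node data holds the computer's response and edge data holds the human phrase. The attribute names `response` and `phrase` and the node ids are assumptions for this example. NetworkX can load such a file with `nx.read_graphml`; the stdlib parser below only makes the structure explicit.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical two-node dialog: node data = computer response, edge data = human phrase.
DIALOG_GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="response" for="node" attr.name="response" attr.type="string"/>
  <key id="phrase" for="edge" attr.name="phrase" attr.type="string"/>
  <graph id="dialog" edgedefault="directed">
    <node id="start"><data key="response">Hello, how can I help?</data></node>
    <node id="weather"><data key="response">It looks sunny today.</data></node>
    <edge source="start" target="weather"><data key="phrase">what is the weather</data></edge>
  </graph>
</graphml>
"""


def parse_dialog(text: str) -> Tuple[Dict[str, str], Dict[Tuple[str, str], str]]:
    """Return (node id -> response, (source, phrase) -> target) from GraphML text."""
    root = ET.fromstring(text)
    responses = {}
    for node in root.iterfind(".//g:node", NS):
        data = node.find("g:data[@key='response']", NS)
        responses[node.get("id")] = (data.text or "") if data is not None else ""
    transitions = {}
    for edge in root.iterfind(".//g:edge", NS):
        data = edge.find("g:data[@key='phrase']", NS)
        if data is not None:
            phrase = (data.text or "").strip().lower()
            transitions[(edge.get("source"), phrase)] = edge.get("target")
    return responses, transitions
```

Keying transitions on (source node, phrase) means the same phrase can lead to different responses depending on where the conversation currently is.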
-
Wrote code to send the stdout of pocketsphinx_continuous to a log file. We wrote a second Python program that reads the log file, strips out the unnecessary terminal output, and writes everything else to a word file. The word file contains all the phrases pocketsphinx recognized from the user's speech.
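The cleaning step can be sketched as a line filter. What counts as unnecessary output depends on the pocketsphinx version and settings, so the noise patterns here (`READY`, `Listening`, `INFO:`, blank lines) and the optional leading utterance id such as `000000001: ` are assumptions to check against your own log, not a description of pocketsphinx's exact output. `clean_log` and `clean_log_file` are hypothetical names.

```python
import re
from typing import Iterable, List, Sequence

# Hypothetical noise patterns; verify them against your own log before relying on them.
DEFAULT_NOISE = (r"^READY", r"^Listening", r"^INFO:", r"^$")
UTT_ID = re.compile(r"^\d+:\s*")  # assumed optional "000000001: " prefix


def clean_log(lines: Iterable[str], noise: Sequence[str] = DEFAULT_NOISE) -> List[str]:
    """Return recognized phrases, dropping lines that match any noise pattern."""
    patterns = [re.compile(p) for p in noise]
    phrases = []
    for raw in lines:
        line = raw.strip()
        if any(p.search(line) for p in patterns):
            continue
        phrase = UTT_ID.sub("", line)
        if phrase:  # skip utterances with an empty hypothesis
            phrases.append(phrase)
    return phrases


def clean_log_file(log_path: str, words_path: str) -> None:
    """Filter log_path and write one recognized phrase per line to words_path."""
    with open(log_path, encoding="utf-8", errors="replace") as src:
        phrases = clean_log(src)
    with open(words_path, "w", encoding="utf-8") as dst:
        dst.writelines(p + "\n" for p in phrases)
```

Keeping the patterns in one tuple makes it easy to adjust the filter when the log format differs.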
-
Wrote Python code that imports the dialog GraphML using NetworkX. A variable called ‘current’ stores the node we are currently on. While pocketsphinx_continuous runs in the background, the code compares the contents of the word file with the phrases stored on the edges. If there is a match, ‘current’ becomes the node that edge leads to, and the computer responds with that node's data.
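A minimal sketch of that matching loop, assuming edges carry the human phrase in a `phrase` attribute and nodes carry the reply in a `response` attribute (both hypothetical names), and that only edges leaving ‘current’ are checked. It uses the NetworkX graph API (`out_edges(..., data=True)`, `graph.nodes[...]`); here the response is simply printed.

```python
from typing import Iterable, Optional


def load_dialog(path: str):
    """Load the dialog graph from a GraphML file with NetworkX."""
    import networkx as nx  # only needed for loading; the matching below is plain Python
    return nx.read_graphml(path)


def next_node(graph, current: str, heard: str) -> Optional[str]:
    """Return the target of an out-edge of `current` whose phrase matches `heard`."""
    spoken = heard.strip().lower()
    for _, target, data in graph.out_edges(current, data=True):
        if data.get("phrase", "").strip().lower() == spoken:
            return target
    return None


def follow(graph, current: str, utterances: Iterable[str]) -> str:
    """Advance `current` through the graph for each recognized utterance."""
    for heard in utterances:
        target = next_node(graph, current, heard)
        if target is not None:
            current = target
            # The computer's reply is the new node's data (printed here).
            print(graph.nodes[current].get("response", ""))
    return current
```

Usage, with hypothetical file and node names: `current = follow(load_dialog("dialog.graphml"), "start", open("speech.log"))`. Unmatched utterances leave ‘current’ unchanged.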
-
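Command to run this (stderr goes to the first log file; stdout, the recognized speech, is shown in the terminal and also saved to the second log file by tee): pocketsphinx_continuous -inmic yes -jsgf <grammar file>.jsgf -dict <dictionary>.dic 2>./<stderr log name>.log | tee ./<interpreted speech file>.log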
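The same redirection can be done from Python with subprocess, which is convenient when a script launches pocketsphinx itself. This is a sketch of the shell pipeline's behavior, not our exact script; `pocketsphinx_command` and `run_and_log` are hypothetical helper names, and file names are placeholders.

```python
import subprocess
import sys
from typing import List


def pocketsphinx_command(grammar: str, dictionary: str) -> List[str]:
    """Argument list for the pocketsphinx_continuous invocation in these notes."""
    return ["pocketsphinx_continuous", "-inmic", "yes",
            "-jsgf", grammar, "-dict", dictionary]


def run_and_log(cmd: List[str], stderr_log: str, speech_log: str) -> int:
    """Run cmd with stderr sent to stderr_log and stdout tee'd to the terminal and speech_log."""
    with open(stderr_log, "w") as err, open(speech_log, "w") as out:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=err,
                                text=True, bufsize=1)
        for line in proc.stdout:
            sys.stdout.write(line)  # like tee: echo to the terminal...
            out.write(line)         # ...and save to the speech log
            out.flush()             # so a reader polling the file sees it promptly
        return proc.wait()
```

Usage: `run_and_log(pocketsphinx_command("robot.jsgf", "robot.dic"), "stderr.log", "speech.log")` blocks until pocketsphinx exits, so run it in a background thread or process if the dialog loop must run alongside it.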
Putting everything together using Selenium:
-
Converting the grammar file, running pocketsphinx_continuous, and reading the graph took several separate commands, so we wrote a script that runs them all at once.
-
Installed Selenium, Google Chrome, and ChromeDriver so the script can generate the dictionary from the online lmtool without doing it by hand. Links: https://seleniumhq.github.io/selenium/docs/api/py/api.html, https://sites.google.com/a/chromium.org/chromedriver/
-
Wrote a final script that uses subprocess to run pocketsphinx_continuous and read the graph.
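The final script is not reproduced here. As a minimal sketch of its subprocess side, assuming it runs from the directory that contains fsg2wlist.pl, the two grammar conversion commands can be chained with Python emulating the shell's `<` and `>` redirection. `conversion_steps` and `run_steps` are hypothetical names; the lmtool upload and the pocketsphinx/dialog loop would follow these steps.

```python
import subprocess
from contextlib import ExitStack
from typing import List, Optional, Tuple

# (argv, file to use as stdin or None, file to use as stdout or None)
Step = Tuple[List[str], Optional[str], Optional[str]]


def conversion_steps(name: str) -> List[Step]:
    """The jsgf -> fsg -> word commands from these notes, for <name>.jsgf."""
    return [
        (["sphinx_jsgf2fsg", "-jsgf", f"{name}.jsgf", "-fsg", f"{name}.fsg"], None, None),
        (["perl", "fsg2wlist.pl"], f"{name}.fsg", f"{name}.words"),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step in order, emulating `< in` / `> out`; raise on the first failure."""
    for argv, stdin_path, stdout_path in steps:
        with ExitStack() as stack:
            stdin = stack.enter_context(open(stdin_path)) if stdin_path else None
            stdout = stack.enter_context(open(stdout_path, "w")) if stdout_path else None
            subprocess.run(argv, stdin=stdin, stdout=stdout, check=True)
```

Usage: `run_steps(conversion_steps("robot"))`. With `check=True`, a failed conversion raises `subprocess.CalledProcessError` instead of silently producing an empty word file.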
Dependencies:
NetworkX
Selenium
chromedriver and Google Chrome browser
pocketsphinx_continuous and sphinx_jsgf2fsg
fsg2wlist.pl