Pocketsphinx.js

Speech Recognition in JavaScript and WebAssembly

View project on GitHub

Welcome to Pocketsphinx.js

Pocketsphinx.js is a speech recognition library running entirely in the web browser. It does not require Flash or any browser plug-in and does not do any server-side processing. It makes use of Emscripten to convert PocketSphinx, an open-source speech recognizer written in C, into JavaScript or WebAssembly. Audio is recorded with the getUserMedia JavaScript API and processed through the Web Audio API.
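The capture path described above can be sketched roughly as follows. This is not the recorder shipped with Pocketsphinx.js: `startCapture`, `onSamples`, and the `env` parameter are hypothetical names chosen for illustration, and the sketch uses the older ScriptProcessorNode only because it is compact. The browser calls themselves (`navigator.mediaDevices.getUserMedia`, `AudioContext`, `createMediaStreamSource`, `createScriptProcessor`) are standard.

```javascript
// Sketch only: capture microphone audio and hand raw samples to a callback.
// `startCapture` and `onSamples` are hypothetical names, not Pocketsphinx.js API.
// `env` defaults to the browser globals; it is a parameter only so that the
// function can be exercised outside a browser with stand-in objects.
async function startCapture(onSamples, env = globalThis) {
  // Ask the user for microphone access (getUserMedia).
  const stream = await env.navigator.mediaDevices.getUserMedia({ audio: true });

  // Route the microphone stream into a Web Audio graph.
  const context = new env.AudioContext();
  const source = context.createMediaStreamSource(stream);

  // ScriptProcessorNode is deprecated in favour of AudioWorklet, but it keeps
  // the sketch short: 4096-sample buffers, 1 input channel, 1 output channel.
  const processor = context.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (event) => {
    // Copy the samples, because the browser may reuse the underlying buffer.
    const samples = new Float32Array(event.inputBuffer.getChannelData(0));
    onSamples(samples, context.sampleRate);
  };

  source.connect(processor);
  processor.connect(context.destination);

  // Return a function that stops the capture and releases the microphone.
  return function stop() {
    processor.disconnect();
    source.disconnect();
    stream.getTracks().forEach((track) => track.stop());
    return context.close();
  };
}
```

In a page, this might be called as `startCapture((samples, rate) => { /* forward to the recognizer */ })` from a user gesture such as a button click, since browsers generally require one before audio capture starts.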

Features

The features of Pocketsphinx.js are tightly related to the features of PocketSphinx. There are, however, some specifics related to the browser environment.

  • All-JavaScript API,
  • Calls can be made through a Web Worker or directly from the main thread,
  • Supports all acoustic models supported by PocketSphinx,
  • Supports most of the command-line parameters of PocketSphinx,
  • Support for Finite State Grammars (FSG) input from JavaScript,
  • Support for Statistical Language Models or JSGF grammars input from files,
  • Support for Keyword spotting,
  • Optional audio recording library for real-time recognition.
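To illustrate what supplying a finite state grammar from JavaScript could look like, here is a minimal sketch that describes a grammar as a plain object and checks which word sequences it accepts. The field names (`numStates`, `start`, `end`, `transitions`, `from`, `to`, `word`) and both helper functions are assumptions made for this example; the documentation in the source code repository defines the format the library actually expects.

```javascript
// Hypothetical helper: build a grammar that accepts exactly one word sequence.
// The object layout is an assumption for illustration, not the confirmed
// Pocketsphinx.js format.
function buildLinearGrammar(words) {
  if (words.length === 0) throw new Error("at least one word is required");
  // One transition per word: state i --word--> state i + 1.
  const transitions = words.map((word, i) => ({ from: i, to: i + 1, word }));
  return {
    numStates: words.length + 1,
    start: 0,
    end: words.length,
    transitions,
  };
}

// Hypothetical helper: walk the transitions to test whether the grammar
// accepts a given word sequence.
function accepts(grammar, sequence) {
  let states = new Set([grammar.start]);
  for (const word of sequence) {
    const next = new Set();
    for (const t of grammar.transitions) {
      if (states.has(t.from) && t.word === word) next.add(t.to);
    }
    states = next;
  }
  return states.has(grammar.end);
}

const greeting = buildLinearGrammar(["HELLO", "WORLD"]);
console.log(accepts(greeting, ["HELLO", "WORLD"])); // true
console.log(accepts(greeting, ["WORLD", "HELLO"])); // false
```

A grammar of this kind restricts the recognizer to a small set of phrases, which is usually what a voice-command interface needs.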

Audio Recorder

Pocketsphinx.js comes with an audio recorder that can be used independently in any audio-related web application. It is based on the Web Audio API and WebRTC. Its features include:

  • All-JavaScript API,
  • Works on Chrome and Firefox,
  • Audio resampling inside a web worker, without loading the UI thread.
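The resampling step can be sketched as a pure function that a worker would call on each incoming buffer. This is an illustration, not the recorder's actual code: `downsampleToInt16` is a hypothetical name, the 16 kHz default is an assumption (a common rate for speech models), and averaging neighbouring samples is a deliberately crude stand-in for a proper low-pass filter.

```javascript
// Hypothetical helper: convert Float32 samples at `inputRate` into 16-bit PCM
// at `outputRate`. The 16000 Hz default is an assumption, not a library fact.
function downsampleToInt16(samples, inputRate, outputRate = 16000) {
  if (outputRate > inputRate) {
    throw new RangeError("outputRate must not exceed inputRate");
  }
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Int16Array(outLength);

  for (let i = 0; i < outLength; i++) {
    // Average the input samples that fall into this output slot.
    const start = Math.floor(i * ratio);
    const end = Math.min(samples.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += samples[j];
    const mean = end > start ? sum / (end - start) : 0;

    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, mean));
    out[i] = Math.round(clamped * 32767);
  }
  return out;
}

// Example: 12 samples at 48 kHz become 4 samples at 16 kHz.
const input = Float32Array.from([1, 1, 1, -1, -1, -1, 0, 0, 0, 0.5, 0.5, 0.5]);
console.log(Array.from(downsampleToInt16(input, 48000))); // [32767, -32767, 0, 16384]
```

Inside a web worker, a function like this would typically be invoked from the worker's `onmessage` handler, with the result sent back through `postMessage`, so the UI thread never does the arithmetic.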

If you're interested, take a look at the documentation.

Current status

The library can be tested on the Live Demo page. It provides a simple API which is fully documented in the source code repository. There is also a live demo in Mandarin Chinese and another live demo for keyword spotting.

Using the library for real-time recognition relies on recent Web technologies that are still emerging. A general introduction to these technologies and their current status can be found in this overview of audio in the browser.

Live demo

There is a live demo here, one in Mandarin Chinese here, and a demo of keyword spotting. The demos work on Chrome and Firefox, because live audio capture uses parts of the Web Audio API that are currently available only in these browsers.

License

Pocketsphinx.js is open source under the MIT license (PocketSphinx itself is under a BSD-style license) and is available on GitHub.