My final project for Stanford CS 224S was on subvocal speech recognition. It was my last paper at Stanford, and it draws on everything I learned during a whirlwind trip through CS grad school without a CS undergraduate major. Pol Rosello provided the topic; he and I contributed equally to the paper.
We describe the first approach toward end-to-end, session-independent subvocal automatic speech recognition from involuntary facial and laryngeal muscle movements detected by surface electromyography (sEMG). We leverage character-level recurrent neural networks and the connectionist temporal classification (CTC) loss. We attempt to address the challenges posed by a lack of data, including poor generalization, through data augmentation of the electromyographic signals, a specialized multi-modal architecture, and regularization. We present results indicating qualitatively reasonable performance on test-set utterances, and we describe promising avenues for future work in this direction.
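To make the core modeling idea concrete, here is a minimal PyTorch sketch of a character-level recurrent model trained with the CTC loss over sEMG feature frames. This is an illustration under assumed placeholders, not the paper's actual architecture: the feature dimension, hidden size, character inventory, and `SubvocalCTCModel` name are all hypothetical.

```python
# Minimal sketch: character-level RNN + CTC over sEMG feature frames.
# All dimensions and the character set are illustrative placeholders,
# not the configuration used in the paper.
import torch
import torch.nn as nn

NUM_FEATURES = 40   # per-frame sEMG feature dimension (assumed)
NUM_CHARS = 28      # e.g. 26 letters + space + apostrophe (assumed)
BLANK = NUM_CHARS   # CTC blank symbol gets its own index

class SubvocalCTCModel(nn.Module):
    def __init__(self, hidden_size=128):
        super().__init__()
        # Bidirectional LSTM over the sequence of sEMG frames.
        self.rnn = nn.LSTM(NUM_FEATURES, hidden_size,
                           batch_first=True, bidirectional=True)
        # Project each frame to per-character logits plus the blank.
        self.proj = nn.Linear(2 * hidden_size, NUM_CHARS + 1)

    def forward(self, x):
        # x: (batch, time, NUM_FEATURES)
        out, _ = self.rnn(x)
        # nn.CTCLoss expects (time, batch, classes) log-probabilities.
        return self.proj(out).log_softmax(dim=-1).transpose(0, 1)

model = SubvocalCTCModel()
ctc = nn.CTCLoss(blank=BLANK)

# One synthetic batch: 2 utterances of 100 frames, 12 target characters.
frames = torch.randn(2, 100, NUM_FEATURES)
targets = torch.randint(0, NUM_CHARS, (2, 12))
log_probs = model(frames)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 100, dtype=torch.long),
           target_lengths=torch.full((2,), 12, dtype=torch.long))
loss.backward()
```

CTC is a natural fit for this setting because there is no known frame-level alignment between the muscle-movement signal and the output characters; the loss marginalizes over all possible alignments, so the model can be trained end to end from utterance-level transcripts alone.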