Decoding audio sounds that represent single letters, tested on audio captchas.

This package contains a few Python tools that help decode audio captchas, or any other audio signal made up of separately spoken letters. The tools provide:
  • Segmentation of the audio signal into single-letter sounds (currently a very simple approach based on signal power level).
  • Creation of a descriptive feature vector for each sound using MFCC (with a lot of help from numpy, scipy and a slightly modified talkbox library).
  • Support for training the libsvm SVM classifier: a GUI tool that helps prepare training data and writes it in the correct format.
  • A high-level wrapper around the SVM classifier and the other functions (audio decoding, segmentation, MFCC) that extracts letters from an mp3 sample.
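
The power-based segmentation mentioned above can be sketched roughly as follows. This is only an illustration, not the package's actual code: the function name `segment_by_power`, its parameters, and the default values are assumptions made for the example. The signal is cut into short frames, a frame counts as active when its mean power exceeds a fraction of the loudest frame, and a run of active frames is closed once enough silent frames follow it.

```python
import numpy as np

def segment_by_power(signal, rate, rel_threshold=0.1, frame_ms=10, min_gap_ms=50):
    """Return (start, end) sample ranges of the loud parts of `signal`.

    Hypothetical helper: names and defaults are illustrative only.
    """
    frame = max(1, int(rate * frame_ms / 1000))       # samples per frame
    n_frames = len(signal) // frame
    if n_frames == 0:
        return []
    data = np.asarray(signal[:n_frames * frame], dtype=float)
    power = (data.reshape(n_frames, frame) ** 2).mean(axis=1)
    peak = power.max()
    if peak == 0:
        return []
    active = power > rel_threshold * peak             # relative power level
    min_gap = max(1, int(min_gap_ms / frame_ms))      # silent frames that end a segment

    segments = []
    start = None
    silent = 0
    for i, is_active in enumerate(active):
        if is_active:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= min_gap:
                # the last active frame was i - silent; the end index is exclusive
                segments.append((start * frame, (i - silent + 1) * frame))
                start = None
                silent = 0
    if start is not None:
        segments.append((start * frame, (n_frames - silent) * frame))
    return segments

# Synthetic check: two 0.3 s tone bursts separated by silence, sampled at 8 kHz.
rate = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(2400) / rate)
signal = np.concatenate([np.zeros(1600), tone, np.zeros(1600), tone, np.zeros(800)])
segments = segment_by_power(signal, rate)
print(segments)
```

Each returned range could then be padded or trimmed to a fixed duration before feature extraction, which is why the fixed sound duration and the relative power level need to be the same during training and recognition.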

How does it work

  1. Gather enough sound samples, manually transcribe each one to the correct letters, and build a training set from them.
  2. Train the classifier using its training tools ( will do all the work).
  3. Create a configuration file that refers to the classifier model, the scaling data and a couple of other parameters from the training process (in particular the fixed sound duration for a single letter and the relative power level used for sound segmentation).
  4. Use this configuration to recognize letters in other sound samples.
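
For step 1, libsvm expects training data as plain text with one sample per line: an integer label followed by `index:value` pairs, with indices starting at 1 and in ascending order. A minimal sketch of writing a feature vector in that format is shown below. The function name `to_libsvm_line` and the choice of `ord(letter)` as the class label are assumptions for illustration; the GUI tool in this package may map letters to labels differently.

```python
def to_libsvm_line(label, features):
    """Format one sample as a libsvm training line.

    Hypothetical helper, not part of the package.
    Zero-valued features are skipped, which the sparse format allows.
    """
    parts = ["%d" % label]
    for index, value in enumerate(features, start=1):   # libsvm indices are 1-based
        if value != 0:
            parts.append("%d:%g" % (index, value))
    return " ".join(parts)

# Example: a made-up three-element feature vector transcribed as the letter "a".
# Using ord() for the label is an assumption made for this sketch.
line = to_libsvm_line(ord("a"), [0.5, 0.0, -1.25])
print(line)
```

A file built from such lines is the kind of input that the libsvm command-line tools `svm-scale` and `svm-train` read.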


GUI tool for creating training sets:


Developed and tested on Ubuntu 9.10 64-bit; other platforms are untested. Requires numpy, scipy, pymad, pyao and pyqt4, all of which are available as packages in Ubuntu.


Source (including libsvm and talkbox) is available from Launchpad. It also contains some test and training data. The bundled SVM library and tools are compiled for 64-bit Linux; if that does not match your platform, you need to compile libsvm yourself.
Last site update on 30/08/2013