Audio Text Spotting System

Santosh Singh

Abstract


Audio Text Spotting is a speaker-dependent, isolated-word speech recognition system. Speech recognition is the process of taking spoken words as input to a computer program. It is the technology by which words spoken by a person are converted into electrical signals, which are then transformed into coded patterns to which meanings have been assigned; it is popularly known as "sound recognition". In the present work I have focused on human speech. The difficulty in using speech as input to a computer lies in the fundamental differences between human speech and more traditional forms of computer input. While computer programs are commonly designed to produce a precise and well-defined response upon receiving the proper input, human speech is anything but precise. Every speaker's voice is different, so the same word can sound quite different when spoken by different people. Text spotting approaches can be divided into two classes: "template matching" (speaker dependent) and "feature analysis" (speaker independent). In this system, the electrical signal from the microphone is captured and digitized using a Java program. To determine the meaning of the speech input, the program attempts to match the input against a digitized speech sample, or template, that has a known meaning. The program stores this template and decides whether the actual input matches it using a conditional statement.
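
The template-matching step can be illustrated with a minimal sketch. The abstract does not say which features or distance measure are used, so this example assumes per-frame log energies as the "coded pattern" and dynamic time warping (DTW) as the distance, a common choice for isolated-word template matching but not confirmed by the source. The class TemplateMatcher, its methods and the acceptance threshold are all hypothetical; the final `if`-style comparison plays the role of the conditional statement described above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of speaker-dependent template matching.
 * Assumptions: words are represented by per-frame log energies,
 * and templates are compared with dynamic time warping (DTW).
 */
class TemplateMatcher {

    // Stored reference templates: word label -> feature sequence.
    private final Map<String, double[]> templates = new LinkedHashMap<>();
    // Hypothetical acceptance threshold on the DTW distance.
    private final double threshold;

    TemplateMatcher(double threshold) {
        this.threshold = threshold;
    }

    void addTemplate(String word, double[] features) {
        templates.put(word, features.clone());
    }

    /** Log energy of each non-overlapping frame of frameSize samples. */
    static double[] frameLogEnergies(double[] samples, int frameSize) {
        int frames = samples.length / frameSize;
        double[] out = new double[frames];
        for (int f = 0; f < frames; f++) {
            double energy = 0.0;
            for (int i = f * frameSize; i < (f + 1) * frameSize; i++) {
                energy += samples[i] * samples[i];
            }
            out[f] = Math.log(energy + 1e-10); // offset avoids log(0) on silent frames
        }
        return out;
    }

    /** Classic DTW distance between two 1-D sequences, using absolute difference as local cost. */
    static double dtw(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double[][] d = new double[n + 1][m + 1];
        for (double[] row : d) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        d[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double cost = Math.abs(a[i - 1] - b[j - 1]);
                // Extend the cheapest of the diagonal, vertical and horizontal paths.
                d[i][j] = cost + Math.min(d[i - 1][j - 1], Math.min(d[i - 1][j], d[i][j - 1]));
            }
        }
        return d[n][m];
    }

    /** Returns the closest template's word, or null if none is within the threshold. */
    String recognize(double[] input) {
        String best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : templates.entrySet()) {
            double dist = dtw(input, e.getValue());
            if (dist < bestDist) {
                bestDist = dist;
                best = e.getKey();
            }
        }
        // Conditional decision: accept the best match only if it is close enough.
        return bestDist <= threshold ? best : null;
    }
}
```

DTW lets the same word match its template even when it is spoken faster or slower, which is one way to tolerate the varying intervals the test recordings include; the threshold would have to be tuned on the speaker's own recordings.
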
The main objective of this project is to build a system that takes audio (speech) as input and spots the words spoken. Because no standard test data is available, I generated the test data myself by asking a speaker to speak a set of random words and storing the recordings in the WAVE format. The recording conditions include various ambient noises, different pitch levels, and different intervals between the two parts of each word. The model will then be used for speech recognition. As a first step, I wrote a Java program that takes input (human speech) from the microphone and saves it to secondary storage in the Windows WAVE format.
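A recording program of this kind can be sketched with the standard Java Sound API (javax.sound.sampled). This is not the author's code: the class WaveRecorder, its methods and the audio format (16 kHz, 16-bit, mono, signed little-endian PCM) are assumptions chosen for illustration. Capturing and saving are kept as separate methods so that the WAVE-writing step can be exercised without a microphone.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;

/** Hypothetical helper: capture speech from the default microphone and store it as a WAVE file. */
class WaveRecorder {

    // Assumed format: 16 kHz sample rate, 16-bit, mono, signed, little-endian PCM.
    static final AudioFormat FORMAT = new AudioFormat(16000f, 16, 1, true, false);

    /** Records approximately `millis` milliseconds of audio from the default capture line. */
    static byte[] record(int millis) throws LineUnavailableException {
        TargetDataLine line = AudioSystem.getTargetDataLine(FORMAT);
        line.open(FORMAT);
        line.start();
        try {
            // Compute a whole number of frames so every read length is frame-aligned.
            int frames = (int) (FORMAT.getFrameRate() * millis / 1000);
            int bytesWanted = frames * FORMAT.getFrameSize();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024]; // multiple of the 2-byte frame size
            int total = 0;
            while (total < bytesWanted) {
                int n = line.read(buffer, 0, Math.min(buffer.length, bytesWanted - total));
                if (n <= 0) {
                    break;
                }
                out.write(buffer, 0, n);
                total += n;
            }
            return out.toByteArray();
        } finally {
            line.stop();
            line.close();
        }
    }

    /** Writes raw PCM bytes in the given format to a WAVE file. */
    static void saveAsWave(byte[] pcm, AudioFormat format, File file) throws IOException {
        long frames = pcm.length / format.getFrameSize();
        try (AudioInputStream ais =
                 new AudioInputStream(new ByteArrayInputStream(pcm), format, frames)) {
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, file);
        }
    }

    public static void main(String[] args) throws Exception {
        // Record one spoken word for about three seconds and save it.
        byte[] pcm = record(3000);
        saveAsWave(pcm, FORMAT, new File("word.wav"));
    }
}
```

Running `main` requires a working capture device; on a machine without one, `AudioSystem.getTargetDataLine` throws an exception, while `saveAsWave` still works on any byte array in the stated format.
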

