In today’s blog, we explore a new biometric technology: voice recognition.

The first step in voice recognition is for an individual to produce an actual voice sample.  Voice production is a facet of life that we take for granted every day, yet the actual process is complicated.  The production of sound originates at the vocal cords, which are separated by a gap.

When we attempt to communicate, the muscles that control the vocal cords contract.  As a result, the gap narrows, and as we exhale, the breath passes through the gap, which creates sound.  The unique patterns of an individual’s voice are then produced by the vocal tract.  The vocal tract consists of the laryngeal pharynx, oral pharynx, oral cavity, nasal pharynx, and nasal cavity (1).

It is these unique patterns created by the vocal tract that voice recognition systems use.  Even though people may sound alike to the human ear, everybody has, to some degree, a unique enunciation in their speech.

To ensure a good-quality voice sample, the individual usually recites some sort of text, which can be either a verbal phrase or a series of numbers.  The individual usually has to repeat this a number of times.  The most common devices used to capture an individual’s voice samples are computer microphones, cell (mobile) phones, and landline telephones.

As a result, a key advantage of voice recognition is that it can leverage existing telephony technology, with minimal disruption to an entity’s business processes.  In terms of noise disruption, computer microphones and cell phones introduce the most, and landline telephones the least.

Factors other than the noise introduced by telephony devices can also affect the quality of voice samples.  Examples include mispronounced verbal phrases, different media used for enrollment and verification (for example, using a landline telephone for enrollment but a cell phone for verification), and the emotional and physical condition of the individual.  Finally, the voice samples are converted from an analog format to a digital format for processing.
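
To make the analog-to-digital step concrete, here is a minimal Python sketch of sampling and quantization.  It is only an illustration: the 8,000 samples-per-second rate and 16-bit depth are assumed values typical of telephone-grade audio, and the function name sample_and_quantize and the 440 Hz test tone are hypothetical stand-ins for a real voice signal.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz=8000, bits=16):
    """Sample a continuous signal and round each sample to a signed integer.

    `signal` is a function of time (in seconds) returning an amplitude
    in the range [-1.0, 1.0]. The rate and bit depth are assumed defaults.
    """
    max_level = 2 ** (bits - 1) - 1                  # 32767 for 16-bit audio
    n_samples = int(round(duration_s * sample_rate_hz))
    digital = []
    for n in range(n_samples):
        t = n / sample_rate_hz                       # time of the n-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))   # clip to the valid range
        digital.append(round(amplitude * max_level)) # quantize to an integer
    return digital

def tone(t):
    """Hypothetical stand-in for an analog voice signal: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)

pcm = sample_and_quantize(tone, duration_s=0.01)
print(len(pcm), pcm[:5])
```

A higher sample rate or bit depth preserves more detail of the voice at the cost of more data to process.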

The next steps are unique feature extraction and creation of the template.  The extraction algorithms look for unique patterns in the individual’s voice samples.  To create the template, a “model” of the voice is created.  In voice recognition systems, stochastic models, particularly hidden Markov models, have been used.  With this type of modeling, statistical profiles are created by comparing various voice samples to identify repeating patterns.
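
The idea of building a statistical profile from repeated samples can be sketched in Python as follows.  This is a deliberately simplified stand-in, not a full hidden Markov model: it assumes each enrollment sample has already been reduced to a sequence of feature symbols (the letters A, B, and C here are made up), and it only counts how often one symbol follows another.  A real hidden Markov model adds hidden states and is trained with a more involved procedure.  The name build_profile and the smoothing parameter are hypothetical.

```python
def build_profile(enrollment_sequences, symbols, smoothing=1.0):
    """Estimate how likely each feature symbol is to follow another.

    This is a plain (visible) Markov chain used as a simplified stand-in
    for a hidden Markov model. `smoothing` keeps unseen transitions from
    getting a probability of exactly zero.
    """
    counts = {a: {b: smoothing for b in symbols} for a in symbols}
    for seq in enrollment_sequences:
        # Count every adjacent pair (previous symbol, next symbol).
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    profile = {}
    for a in symbols:
        total = sum(counts[a].values())
        profile[a] = {b: counts[a][b] / total for b in symbols}
    return profile

# Three hypothetical enrollment samples of the same spoken phrase.
enrollment = [list("ABCABC"), list("ABCAB"), list("ABCABCA")]
profile = build_profile(enrollment, symbols="ABC")
print(profile["A"])
```

Patterns that repeat across the enrollment samples, such as A being followed by B, end up with high probabilities in the profile, while transitions that never occur stay close to zero.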

The final step is verification of the individual.  At this stage, the live voice sample submitted for verification is compared to the statistical profiles created, and a probability score is then computed that describes the likelihood that the individual is who he or she claims to be.
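
As a rough illustration of how such a score might be computed, the Python sketch below uses the standard forward algorithm to measure how well a live sample fits a small hidden Markov model, then compares a length-normalized score against a threshold.  Everything specific here is assumed for illustration: the two-state model, its probabilities, the integer-coded observations, the threshold of -1.5, and the function names log_likelihood and verify.  A production system would choose these from real enrollment data.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log of P(observations | model).

    Rescales at each step so long samples do not underflow. Assumes the
    observations have nonzero probability under the model.
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for o in obs[1:]:
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return log_p + math.log(sum(alpha))

def verify(obs, model, threshold):
    """Return (score, accepted) using the average log-likelihood per frame."""
    start, trans, emit = model
    score = log_likelihood(obs, start, trans, emit) / len(obs)
    return score, score >= threshold

# Hypothetical two-state profile for the claimed speaker.
claimed_model = (
    [0.6, 0.4],                           # starting probabilities
    [[0.7, 0.3], [0.4, 0.6]],             # state transition probabilities
    [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],   # probability of each symbol per state
)

live_sample = [0, 1, 2]                   # integer-coded feature frames (made up)
score, accepted = verify(live_sample, claimed_model, threshold=-1.5)
print(round(score, 4), accepted)
```

Dividing by the sample length keeps the score comparable between short and long utterances, and where the threshold is set determines the trade-off between falsely accepting an impostor and falsely rejecting the genuine individual.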

In tomorrow’s blog, we look at some of the applications of Voice Recognition.