How Automatic Speech Recognition Works and Learns From You (Infographic)

Ivan Widjaya·October 21, 2014

Automatic speech recognition (ASR) technology and its complementary system, interactive voice response (IVR) technology, have both been major milestones in improving the way we and our machines communicate with each other.

Both technologies are used in voice recognition systems for computers (particularly for people with disabilities), in more efficient text editing software and, most notably, in many modern smartphones and other mobile devices.

Samsung S-Voice and Apple Siri (photo credit: Mike Lau)

A famous example of this last use of ASR and IVR is the Siri interface found in newer iPhone models from Apple.

Now that we know what we’re talking about (and you’ve almost certainly seen these technologies in use at some point), let’s see how the dual marvels of ASR and IVR manage to work so accurately when we speak to them.

Also, if you want even more information and better context on these fascinating technologies, check out this excellent infographic from the people at West Interactive.

How Automatic Speech Recognition (ASR) works - infographic

The Basics of ASR Technology

The core process by which automatic speech recognition works out what we’re saying and responds in a meaningful way follows these fairly straightforward steps:

You speak to an ASR-enabled device
The device creates a wave from the sounds you make
The ASR software then cleans up background noise and normalizes sound volume
The resulting filtered waveform (a clean sound sequence of what you said) is broken down into what are called phonemes. (These are the basic units of sound that make up spoken words; English has 44 of them.)
Each phoneme is like a single link in a chain and by analyzing them in sequence, your device deduces complete words and then whole sentences that it “understands”.
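
The steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any production recognizer works: the function names, the noise threshold and the tiny phoneme-to-word table are all made up for the example, and real systems rely on statistical acoustic and language models rather than a lookup table.

```python
# Toy sketch of the ASR steps: clean the wave, then chain phonemes into words.
# All names, thresholds and the lexicon below are hypothetical.

def normalize_volume(samples, target_peak=1.0):
    """Scale the wave so its loudest sample reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s * target_peak / peak for s in samples]

def remove_background_noise(samples, threshold=0.1):
    """Crude noise gate: silence any sample quieter than the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]

# Hypothetical lexicon: phoneme sequences (ARPAbet-style symbols) -> words.
LEXICON = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
}

def phonemes_to_words(phonemes):
    """Greedily match the longest known phoneme sequence at each position."""
    words, i = [], 0
    max_len = max(len(key) for key in LEXICON)
    while i < len(phonemes):
        for length in range(min(max_len, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + length])
            if chunk in LEXICON:
                words.append(LEXICON[chunk])
                i += length
                break
        else:
            i += 1  # skip a phoneme that fits no known word
    return words

wave = [0.01, 0.2, -0.4, 0.02, 0.1]
clean = remove_background_noise(normalize_volume(wave))
words = phonemes_to_words(["HH", "AH", "L", "OW", "W", "ER", "L", "D"])
```

Here the quiet samples in `wave` are zeroed out after normalization, and the phoneme chain is read link by link into the words "hello" and "world".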

Examples of How ASR is Applied

There are all sorts of uses for ASR technology, but they fall into two primary categories: “directed dialogue conversations” and “natural language conversations”.

Directed dialogue conversations: These represent a simpler form of ASR/IVR technology in use and are found in situations where a computer voice asks you to select from a limited menu of word choices that it understands. A good example of this that we all know is your typical automated online banking menu.

Natural language conversations: These are a considerably more sophisticated form of ASR: systems we interact with more openly, in a basic type of conversation. Siri on the iPhone is an excellent example of natural language conversation at work.
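
To make the simpler of the two cases concrete, here is a minimal sketch of a directed dialogue, assuming the speech has already been turned into text. The menu words and responses are invented for illustration; the point is only that the system listens for a small, fixed set of choices and re-prompts when it hears none of them.

```python
# Hypothetical directed-dialogue menu: only these few words are understood.
MENU = {
    "balance": "Reading your account balance.",
    "transfer": "Starting a transfer.",
    "agent": "Connecting you to an agent.",
}

def directed_dialogue(utterance):
    """Return the response for the first menu word heard, else re-prompt."""
    for word in utterance.lower().split():
        word = word.strip(".,!?")  # ignore trailing punctuation
        if word in MENU:
            return MENU[word]
    return "Sorry, please say balance, transfer, or agent."

reply = directed_dialogue("I want my balance, please")
```

A natural language system cannot rely on a table this small, which is where the approach described in the next section comes in.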

So How Does Natural Language Work?

Making natural language conversation work effectively is very hard. A typical 60,000-word vocabulary of a natural language ASR program can produce as many as 216 trillion possible three-word combinations (60,000 × 60,000 × 60,000)!
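
The 216 trillion figure matches the number of ordered three-word sequences that a 60,000-word vocabulary allows. That reading is an assumption, since the infographic does not say how the number was derived, but the arithmetic is easy to check:

```python
# Assumption: "combinations" means ordered sequences of three words.
vocabulary_size = 60_000
three_word_sequences = vocabulary_size ** 3

print(f"{three_word_sequences:,}")  # 216,000,000,000,000
```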

So, to work out what you’re trying to say, an ASR program reacts to a preselected list of tagged keywords that give it context for the gist of your request. For example, if you say the word “forecast”, it will deduce that you’re likely also saying “weather” rather than “whether” and thus want a weather forecast.
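
A rough sketch of that keyword idea, using the article's own "forecast" example, might look like the following. The table layout and function name are hypothetical, and a real recognizer would weigh probabilities rather than apply a fixed substitution.

```python
# Hypothetical tagged keywords: each keyword lists the sound-alike words
# it should override and the word it implies instead.
CONTEXT_KEYWORDS = {
    "forecast": {"whether": "weather"},
}

def disambiguate(words):
    """Swap sound-alike words for the ones implied by any keyword present."""
    corrections = {}
    for word in words:
        corrections.update(CONTEXT_KEYWORDS.get(word, {}))
    return [corrections.get(word, word) for word in words]

with_keyword = disambiguate(["whether", "forecast", "today"])
without_keyword = disambiguate(["whether", "or", "not"])
```

With "forecast" in the utterance, "whether" is read as "weather"; without it, the word is left alone.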

This is the essential algorithm of natural language ASR at work, and it becomes more complicated as vocabularies and keyword lists grow, which in turn requires more training.

The Tuning Test: How ASR Is Trained to “Learn” From You

Any ASR system can either be “tuned” (trained) by humans or can be made to learn on its own on the fly through what is called active learning.

Human Tuning: This consists of human programmers manually reviewing the conversation and word logs of an ASR system to identify which new words and phrases have been used most often, and then adding them to its dictionary as a means of “teaching” the ASR.
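
Part of that review could be supported by a small script that surfaces frequent unknown words for a person to approve. The sketch below assumes the logs are plain text lines and the dictionary is a set of known words; the function name and the `min_count` cutoff are hypothetical.

```python
from collections import Counter

def suggest_new_words(logs, dictionary, min_count=2):
    """List unknown words seen at least min_count times, most frequent first."""
    counts = Counter(
        word
        for line in logs
        for word in line.lower().split()
        if word not in dictionary
    )
    return [word for word, n in counts.most_common() if n >= min_count]

logs = ["play podcast", "podcast please", "play music"]
dictionary = {"play", "music", "please"}
candidates = suggest_new_words(logs, dictionary)

# A human reviewer would decide which candidates to add.
dictionary.update(candidates)
```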

Active Learning: This is a more sophisticated learning process in which your device’s ASR/IVR system is programmed to store and analyze data from past conversations and adapt it to new verbal exchanges. Thus, your ASR learns to adapt to your specific speech patterns and interpret them contextually. For example, the system might see that you repeatedly cancel the auto-correct on a certain word and thus learn to interpret that word as “correct” in future conversations.
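
The auto-correct example can be sketched as a small class that counts how often the user rejects a correction and stops applying it after a few rejections. The class name, the `cancel_limit` of three and the sample word are assumptions for illustration; actual systems learn from far richer signals.

```python
from collections import defaultdict

class AutoCorrectLearner:
    """Toy model: drop a correction once the user has cancelled it enough."""

    def __init__(self, corrections, cancel_limit=3):
        self.corrections = dict(corrections)
        self.cancel_limit = cancel_limit  # hypothetical threshold
        self.cancel_counts = defaultdict(int)

    def correct(self, word):
        """Return the corrected word, or the word itself if none applies."""
        return self.corrections.get(word, word)

    def record_cancel(self, word):
        """Note that the user undid the correction for this word."""
        self.cancel_counts[word] += 1
        if self.cancel_counts[word] >= self.cancel_limit:
            # Treat the user's spelling as correct from now on.
            self.corrections.pop(word, None)

learner = AutoCorrectLearner({"gud": "good"})
before = learner.correct("gud")
for _ in range(3):
    learner.record_cancel("gud")
after = learner.correct("gud")
```

Before the cancellations the word is changed to "good"; after three of them it is left as typed.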
