How Automatic Speech Recognition Works and Learns From You (Infographic)

Ivan Widjaya·October 21, 2014

Automatic speech recognition (ASR) and its complementary technology, interactive voice response (IVR), have both been major milestones in improving the way our machines communicate with us and we with them.

Both technologies are used in voice recognition systems for computers (particularly for people with disabilities), in more efficient text editing software and, most notably, in many modern smartphones and other mobile devices.

Samsung S-Voice and Apple Siri (photo credit: Mike Lau)

A famous example of this last use of ASR and IVR is the Siri interface found in newer iPhone models from Apple.

Now that we know what we’re talking about (and you’ve almost certainly seen these technologies in use at least at some point), let’s see just how the dual marvels of ASR and IVR manage to work so accurately when we speak to them.

Also, if you want even more information and better context on these fascinating technologies, check out this excellent infographic from the people at West Interactive.

How Automatic Speech Recognition (ASR) works - infographic

The Basics of ASR Technology

The core process by which automatic speech recognition works out what we’re saying and responds in a meaningful way follows these fairly straightforward steps:

You speak to an ASR-enabled device
The device creates a wave form from the sounds you make
The ASR software then cleans up background noise and normalizes sound volume
The resulting filtered wave form (a clean sound sequence of what you said) is broken down into what are called phonemes. (These are the basic units of sound that distinguish one word from another; English has about 44 of them.)
Each phoneme is like a single link in a chain, and by analyzing them in sequence your device deduces complete words and then whole sentences that it “understands”.
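
To make the steps above a little more concrete, here is a minimal Python sketch of a toy pipeline. It is an illustration under heavy assumptions, not how a production ASR engine works: the noise cleanup is a crude threshold, the phoneme labels and the PRONUNCIATIONS table are hypothetical, and the step that turns a wave form into phonemes is skipped entirely.

```python
# Toy sketch of the ASR steps; every name and value here is illustrative.

def noise_gate(samples, threshold=0.05):
    """Crude noise cleanup: silence any sample quieter than the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def normalize_volume(samples, target_peak=1.0):
    """Scale the wave so that its loudest sample reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s * target_peak / peak for s in samples]


# Hypothetical pronunciation dictionary: phoneme sequence -> word.
PRONUNCIATIONS = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
}


def decode_words(phonemes):
    """Walk the phoneme chain and greedily match the longest known word."""
    words = []
    longest = max(len(key) for key in PRONUNCIATIONS)
    i = 0
    while i < len(phonemes):
        for length in range(min(longest, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + length])
            if chunk in PRONUNCIATIONS:
                words.append(PRONUNCIATIONS[chunk])
                i += length
                break
        else:
            i += 1  # no known word starts here, so skip this phoneme
    return words


cleaned = normalize_volume(noise_gate([0.01, 0.25, -0.5, 0.02]))
print(cleaned)
print(decode_words(["HH", "AH", "L", "OW", "W", "ER", "L", "D"]))
```

Real systems replace the dictionary lookup with statistical models that score many candidate word sequences, but the chain-of-phonemes idea is the same.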

Examples of How ASR is Applied

There are all sorts of uses for ASR technology, but its two primary subdivisions can be labelled “directed dialogue conversations” and “natural language conversations”.

Directed dialogue conversations: These are a simpler form of ASR/IVR technology, found in situations where a computer voice asks you to select from a limited menu of word choices that it understands. A good example that we all know is the typical automated telephone banking menu.
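
A directed dialogue can be sketched in a few lines of Python. This is only an illustration: the MENU entries and the handle_directed_dialogue function are hypothetical, and the sketch assumes the speech has already been converted to text.

```python
# Hypothetical banking menu: the only words this system "understands".
MENU = {
    "balance": "Reading your account balance.",
    "transfer": "Starting a transfer.",
    "agent": "Connecting you to an agent.",
}


def handle_directed_dialogue(recognized_text):
    """Return the response for the first menu word heard, or re-prompt."""
    for word in recognized_text.lower().split():
        word = word.strip(".,!?")
        if word in MENU:
            return MENU[word]
    return "Sorry, please say one of: " + ", ".join(MENU)


print(handle_directed_dialogue("I want my balance, please"))
print(handle_directed_dialogue("What is the weather?"))
```

Because the system only listens for a handful of words, anything outside the menu simply triggers the prompt again.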

Natural language conversations: These are a considerably more sophisticated form of ASR: systems with which we interact more openly, in what amounts to a basic type of conversation. Siri on the iPhone is an excellent example of natural language chats at work.

So How Does Natural Language Work?

Making natural language conversation work effectively is very hard. A typical 60,000-word vocabulary of a natural language ASR program allows as many as 216 trillion possible three-word combinations (60,000 × 60,000 × 60,000)!
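
The 216 trillion figure is consistent with counting every ordered sequence of three words drawn from a 60,000-word vocabulary. That reading is an assumption about how the number was derived, but the arithmetic is easy to check in Python:

```python
vocabulary_size = 60_000
sequence_length = 3  # assumed: the figure counts three-word sequences

combinations = vocabulary_size ** sequence_length
print(f"{combinations:,}")  # 216,000,000,000,000
```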

So, for your ASR program to know what you’re trying to say, it reacts to a preselected list of tagged keywords that give it context for the gist of what you’re asking. For example, if you say the word “forecast”, it will deduce that you’re likely also saying “weather” instead of “whether” and thus want a weather forecast.
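
The weather/whether example can be sketched as a simple keyword vote. This is a minimal Python illustration, assuming a hand-written CONTEXT_KEYWORDS table (hypothetical) rather than anything a real ASR product ships with:

```python
# Hypothetical tagged keywords that hint at each sound-alike word.
CONTEXT_KEYWORDS = {
    "weather": {"forecast", "rain", "sunny", "temperature"},
    "whether": {"or", "not", "if", "decide"},
}


def choose_word(candidates, other_words):
    """Pick the candidate whose keywords overlap most with the other words."""
    heard = {w.lower() for w in other_words}

    def score(candidate):
        return len(CONTEXT_KEYWORDS.get(candidate, set()) & heard)

    # max() keeps the first candidate when scores are tied
    return max(candidates, key=score)


print(choose_word(["whether", "weather"], ["what", "is", "the", "forecast"]))
```

Here "forecast" appears among the keywords for "weather", so that spelling wins even though both words sound the same.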

This is the essential idea behind natural language ASR. It becomes more complicated as vocabularies and keyword lists grow, which in turn requires more training.

The Tuning Test: How ASR is trained to “learn” from you

Any ASR system can either be “tuned” (trained) by humans or can be made to learn on its own on the fly through what is called active learning.

Human Tuning: This consists of human programmers manually reviewing the conversation and word logs of an ASR system to identify which new words and phrases are being used most often, and then adding them to its dictionary as a means of “teaching” the ASR.
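
The log review behind human tuning could be supported by a small script like the following Python sketch. The log lines, the dictionary contents and the frequent_unknown_words helper are all hypothetical; a person would still decide which suggested words to add.

```python
from collections import Counter


def frequent_unknown_words(log_lines, dictionary, min_count=2):
    """List words missing from the dictionary, most frequent first."""
    counts = Counter(
        word
        for line in log_lines
        for word in line.lower().split()
        if word not in dictionary
    )
    return [word for word, count in counts.most_common() if count >= min_count]


logs = ["play some vaporwave", "play vaporwave playlist", "call mom"]
known_words = {"play", "some", "call", "mom"}
print(frequent_unknown_words(logs, known_words))
```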

Active Learning: This is a more sophisticated learning process in which your device’s ASR/IVR system is programmed to store and analyze data from past conversations and apply it to new verbal exchanges. Thus, your ASR learns to adapt to your specific speech patterns and interpret them contextually. For example, the system might see that you repeatedly cancel the auto-correct on a certain word and thus learn to interpret that word as “correct” in future conversations.
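
The auto-correct example can be sketched as a counter that stops correcting a word once you have cancelled the correction often enough. This is a minimal Python illustration; the CorrectionLearner class and its threshold of three are assumptions made for the example, not a description of any particular product.

```python
from collections import defaultdict


class CorrectionLearner:
    """Learns which words to leave alone from cancelled auto-corrections."""

    def __init__(self, threshold=3):
        self.threshold = threshold          # cancellations needed to learn a word
        self.rejections = defaultdict(int)  # word -> times a correction was cancelled
        self.accepted_words = set()         # words now treated as correct

    def record_rejection(self, word):
        """Call this each time the user cancels the auto-correct on a word."""
        self.rejections[word] += 1
        if self.rejections[word] >= self.threshold:
            self.accepted_words.add(word)

    def should_autocorrect(self, word):
        return word not in self.accepted_words


learner = CorrectionLearner()
for _ in range(3):
    learner.record_rejection("noobpreneur")
print(learner.should_autocorrect("noobpreneur"))
```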
