Dynamic IVR Versus Chatbots For Restaurant Order-Taking

The motivation for Dymos

Root motivations: cost savings and efficiency
Background
- Earliest known efforts
- How accurate are chatbot approaches?
Hybrid approaches (machine/human)
Reasons for Sub-par Accuracy with Chatbot Approaches
Who Guides the Conversation: Chatbot or Human User?
Ingredients for Successful Order-taking Dialogues
Introducing Dymos: “Dynamic IVR”
Dymos versus chatbots
Dymos versus humans
Conclusion

Dec 14, 2023

Motivations for Chatbot-Style Ordering for Restaurants

What are the chief reasons for building a human-like AI that can take orders for restaurants? For most restaurant chains, and especially for quick-service restaurants, the top two reasons are probably cost savings and efficiency during busy times. A cloud-based service that provides an effectively unlimited number of AI agents working in parallel can scale in ways that are hard to match with human labor. Data centers provide the infrastructure for such cloud-based services, housing racks of servers that handle many simultaneous spoken dialogues with restaurant customers.

With mobile apps, the customer’s speech can be processed at the “edge” (on the device), with a textual representation sent to the remote cloud service. Intercom systems at fast food restaurant drive-thru lanes can handle customer speech input in a similar way. Telephone order-taking works in much the same way: the telephone services platform (e.g. Twilio) provides speech recognition as well as speech generation, and so acts as a client of the AI order-taking service. In each case a human customer uses speech and language to have a conversation with a cloud-based AI agent, which uses a combination of AI natural language processing techniques and dialogue management intelligence to assist and guide the customer through the process of completing an order for food and drinks.

The cost savings to the service provider (the restaurant or restaurant chain) equal the difference between the labor costs for human order takers and the cost of the cloud service. A one-to-one comparison may show only a small difference, but the cloud service’s advantage grows as customer numbers (demands on the service) increase. (This assessment is based on public cloud platform costs for computational processing and data storage as of 2023.)

If we assume an equivalent quality of order-taking service from the human approach and the machine, the efficiency of AI in the cloud quickly outmatches that of human order-takers as restaurant customer demand increases (e.g. multiple simultaneous incoming phone and/or digital orders). Cloud services are designed for the express purpose of scaling upwards to meet computational demand, whereas retail businesses such as restaurants are hard-pressed to rapidly increase human resources to meet peak demand. Call centers are uniquely capable of scaling upwards in this regard, but only within obvious limits.

Thus, if an equal outcome is assumed, decision-makers at many restaurants and chains share two objectives: cost savings for the company (and ultimately the consumer), and greater effectiveness through serving more customers simultaneously. It is recognized that for some companies the prospect of using machines, the robotization of the order-taking task, is understandably incongruous with other objectives such as preserving the “human touch”.

Background of Voice and Natural Language Understanding for Restaurant Ordering

Voice-activated assistants like Siri, Alexa, and Google Assistant popularized the speech and language modality for many forms of user interaction, but to what extent have AI agents with chatbot capabilities penetrated the domain of ordering at restaurants?

As early as 2014, Domino’s deployed “Dom”, a voice-controlled pizza ordering sidekick for its mobile apps that let customers order using speech and natural language. In 2017, Starbucks debuted voice ordering within its iOS mobile app and for Amazon Alexa.

Lacking sufficient data regarding the order-taking accuracy of the Domino’s and Starbucks initiatives, we fast forward to 2019, when McDonald’s purchased AI company Apprente, incorporating it into McD Tech Labs and field testing the technology at multiple locations in drive-thru lanes to enable AI voice ordering. The reported accuracy for this effort during 2021 was evidently around 80%, and a year or so later had apparently not improved. It is unknown to this author whether or not this initiative is still underway, but the lack of widespread adoption is an indicator that a wildly successful outcome was not achieved.

More recently, a joint effort involving Google and Wendy’s has apparently achieved an 85% order accuracy/completion rate. While notable, this still falls short of a standard that had been set by the CEO of McDonald’s and which most would agree is fairly reasonable – that of a 95% order completion rate as handled independently by the AI service.

Several companies have reported better accuracy/completion rates for voice ordering but to date the author is unaware of any that have achieved industry recognition for having “arrived” with the technology.

The Hybrid Approach: Clear Winner?

For at least three major efforts at separate companies, it has been revealed that the order-taking approach is actually a hybrid: the companies employ a pool of humans who can take over mid-process for orders that are failing, either based on detected criteria or at the customer’s request. Although this hybrid approach is effective, it is not optimal. First, the customer experiences a transfer during the dialogue; second, the need to provide human backup reduces the overall cost savings. Hybrid approaches may thus be viewed as a step along the way towards the ideal of fully automated solutions. The industry needs an automated agent that can independently handle order-taking end-to-end.

Reasons for Sub-par Accuracy with Chatbot Approaches

An AI agent, or chatbot, that can handle 100% of the orders from customers (such that the user stays with the dialogue until the order is completed) may be said to have a 100% completion rate. For simplicity, we use the term “accuracy” for this completion rate. Most such AI agents start the order-taking process with a question along the lines of “Welcome to our restaurant, what can we get for you today?”. Typical user responses are requests for a single item or for multiple items, and the user may qualify the requested items with modifiers such as “large” or “two”.
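
As a small illustration of the completion-rate definition above, the following sketch computes the rate from a list of dialogue outcomes. The DialogueOutcome record and its field names are assumptions made for this example; they do not describe any particular product.

```python
from dataclasses import dataclass


@dataclass
class DialogueOutcome:
    """Hypothetical record of one order-taking dialogue."""
    dialogue_id: str
    completed_by_agent: bool  # True only if the AI agent finished the order unaided


def completion_rate(outcomes):
    """Fraction of dialogues completed by the agent without abandonment or handoff."""
    if not outcomes:
        raise ValueError("at least one dialogue outcome is required")
    completed = sum(1 for outcome in outcomes if outcome.completed_by_agent)
    return completed / len(outcomes)
```

Under this definition, a dialogue that is abandoned or handed to a human counts against the rate, which matches the "handled independently by the AI service" standard discussed earlier.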

Reasons for dialogue abandonment may include:

The user’s initial request is misinterpreted by the chatbot.
The chatbot fails to ask follow-up questions to clarify all modifiers (attributes) for an item. E.g. the user asked for a “diet coke” and the chatbot did not ask the user to choose “small”, “medium”, “large” etc. The result is that the user gets a different size than desired.
Speech recognition errors: the user’s voice input is not recognized with sufficient accuracy to determine the natural language text equivalent (this may also occur when extraneous speech occurs in the immediate vicinity of the customer, as can happen, e.g. with several people speaking at once in a vehicle).
General NLU (natural language understanding) errors: at some turn during the dialogue the user says something that gets misunderstood by the chatbot. This can lead to any of a number of outcomes depending on how the chatbot handles it. For instance, some chatbots would ask a clarifying question, after which the user repeats their input and the dialogue can be salvaged; in other cases the chatbot might misclassify the user input and create an order summary with wrong items or item modifiers.
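
The missing follow-up question in the “diet coke” example can be illustrated with a minimal sketch that checks a requested item against the attributes it requires. The REQUIRED_ATTRIBUTES table and the follow_up_questions function are hypothetical names invented for this illustration, and the menu contents are made up.

```python
# Hypothetical menu fragment: each item maps to the attributes that must be chosen.
REQUIRED_ATTRIBUTES = {
    "diet coke": {"size": ["small", "medium", "large"]},
    "cheeseburger": {},
}


def follow_up_questions(item_name, chosen):
    """Return one clarifying question per required attribute not yet specified."""
    required = REQUIRED_ATTRIBUTES.get(item_name)
    if required is None:
        return [f"Sorry, I could not find '{item_name}' on the menu. Could you repeat that?"]
    questions = []
    for attribute, options in required.items():
        # Ask whenever the attribute is missing or is not one of the valid options.
        if chosen.get(attribute) not in options:
            questions.append(
                f"Which {attribute} would you like for the {item_name}: "
                + ", ".join(options) + "?"
            )
    return questions
```

A request for a “diet coke” with no size yields exactly one question about size, while a request that already names a valid size yields none.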

Due to the unstructured (“free form”) nature of most chatbot dialogues, other reasons for failure are too numerous to mention.

Who Guides the Conversation: Chatbot or Human User?

It is worth noting that such dialogues involve an underlying tension. Initially the chatbot usually asks the user to describe or name whatever they want for the order, and the user responds, usually with a sentence of the form “give me a <first item>, and two <second item>”. The chatbot has effectively “handed control” to the user for that first turn. Subsequently, however, the chatbot must by necessity take back control and guide the conversation for any requested items that need their attributes clarified (e.g. small, medium, etc.). Finally, the chatbot must always resume control as the guide of the conversation at the final steps, e.g. when it asks “would you like anything else?”. Some further follow-up steps may exist and also involve the chatbot guiding the dialogue as it handles customer payment and perhaps details of pickup or delivery.

Ingredients for Successful Order-taking Dialogues

The context of a chat-style approach to order-taking differs from visual approaches most significantly in that the AI service cannot assume that the user or customer has visual access to the menu or other information of the restaurant. This implies that the user may need assistance via prompting by the AI agent regarding available menu items or regarding other information relevant for the order.

The ingredients needed for a successful order-taking dialogue include:

Prompt the user sufficiently so that the user knows what to say at each turn
Elicit input from the customer that names or identifies each main item (along with all sub-items) of the order. E.g. the customer wants a cheeseburger combo meal with a diet Coke and fries.
Elicit additional input from the customer that describes or clarifies all attributes of each food or drink item. Examples include drink sizes, pizza crust type, etc.
Confirm order contents with the user and prompt the user as necessary for additional order items
Prompt the user as needed for order “metadata” such as pickup or delivery details and payment information
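
The ingredients above suggest a small, fixed set of dialogue stages. The sketch below models them as a state machine. The stage names and the next_stage function are assumptions made for illustration and are not a description of how any particular service is implemented.

```python
from enum import Enum, auto


class Stage(Enum):
    ELICIT_ITEM = auto()         # ask the customer to name a main item
    CLARIFY_ATTRIBUTES = auto()  # ask for sizes, crust types, and so on
    CONFIRM_ORDER = auto()       # read back the order, offer to add more
    COLLECT_METADATA = auto()    # pickup or delivery details, payment
    DONE = auto()


def next_stage(stage, *, has_unclarified_items=False, wants_more_items=False):
    """Pick the next stage; the agent, not the user, decides where the dialogue goes."""
    if stage in (Stage.ELICIT_ITEM, Stage.CLARIFY_ATTRIBUTES):
        # Keep clarifying until every requested item is fully specified.
        if has_unclarified_items:
            return Stage.CLARIFY_ATTRIBUTES
        return Stage.CONFIRM_ORDER
    if stage is Stage.CONFIRM_ORDER:
        return Stage.ELICIT_ITEM if wants_more_items else Stage.COLLECT_METADATA
    # COLLECT_METADATA and DONE both end the dialogue.
    return Stage.DONE
```

Because every transition is chosen by the agent, each turn has a known purpose, which is what allows the agent to tell the user what to say next.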

It can be seen that the objectives of restaurant order-taking are fairly consistent and that there is a substantial amount of underlying structure. Indeed, the experience of the author has confirmed this – the most common departures from a structured dialogue experience involve “pop-up” user questions such as “does that come in a large?”. Such questions aside, the problem domain seems well-suited for a conversational approach that is guided by the AI agent service rather than by the human user.

Introducing “Dymos”: Dynamic IVR that Provides a “Guided Ordering Experience” for the Customer

“Dymos”, for “dynamic IVR ordering service”, is the name of our new cloud service for restaurant order taking. Dymos provides a “guided ordering experience”. The guided ordering approach is characterized by turns in which the user is presented with a set of numbered options (“multiple choice questions”) from which a selection must be made. Although numeric options are usually presented to the user, verbal user input can in fact be non-numeric, and in some cases, as with descriptions of delivery addresses, it must include words such as street names and city names.
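
A guided turn of this kind can be sketched as follows. This is an illustrative example only, not the Dymos implementation: build_prompt and resolve_choice are hypothetical names, and the sketch assumes the speech layer has already converted the user’s speech to text, with spoken numbers rendered as digits.

```python
def build_prompt(question, options):
    """Render a question followed by its numbered options, one per line."""
    lines = [question]
    for number, option in enumerate(options, start=1):
        lines.append(f"{number}. {option}")
    return "\n".join(lines)


def resolve_choice(user_input, options):
    """Map numeric or non-numeric input to one option; None means ask again."""
    text = user_input.strip().lower()
    if text.isdecimal():
        index = int(text) - 1
        return options[index] if 0 <= index < len(options) else None
    # Non-numeric input: accept it only when it points at exactly one option.
    matches = [
        option for option in options
        if text and (text in option.lower() or option.lower() in text)
    ]
    return matches[0] if len(matches) == 1 else None
```

Returning None for out-of-range or ambiguous input lets the agent repeat the prompt instead of guessing, which keeps the dialogue on its guided path.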

Dymos, as offered by Software Engineering Concepts, Inc., is the first known embodiment of guided ordering for this area.

Dymos goes beyond traditional IVR techniques through the limited use of classic AI techniques. Fundamentally, Dymos’s reference knowledge is stored in a database. While specifics are beyond the scope of this article, highlights include:

Natural language understanding (NLU) algorithms are used to process non-numeric user input
Reference knowledge about restaurant menus is stored and presented to the user in a helpful format. This reference knowledge is organized using a highly structured approach that represents a menu ontology along with menu item instances.
Advanced dialogue management algorithms allow the Dymos agent to guide the dialogue towards the objectives of each order-taking instance.
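
To give a rough idea of what a structured menu representation might look like, the sketch below separates ontology-level concepts (categories and attributes) from menu item instances. All class and field names here are assumptions made for illustration; the actual Dymos schema is not described in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """Ontology-level concept: a choice the customer must make, with its options."""
    name: str
    options: list


@dataclass
class MenuItem:
    """Instance level: one concrete item on a restaurant's menu."""
    name: str
    category: str
    price_cents: int
    attributes: list = field(default_factory=list)


@dataclass
class Menu:
    categories: list
    items: list = field(default_factory=list)

    def add_item(self, item):
        # Every instance must belong to a category known to the ontology.
        if item.category not in self.categories:
            raise ValueError(f"unknown category: {item.category}")
        self.items.append(item)

    def items_in(self, category):
        """List the items to present when the customer picks a category."""
        return [item for item in self.items if item.category == category]
```

Keeping the attributes attached to each item lets a dialogue manager look up, at any turn, which options still need to be offered to the customer.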

In addition, Dymos solves the “menu-absent” problem that is inherent to conversational order-taking by providing sufficient menu detail to customers at each turn.

Dynamic IVR Versus Chatbots

The reader is probably familiar with takeout-style restaurants such as sub shops and poke restaurants where the restaurant’s servers guide the customer who stands at a walk-up counter. It is understandable that such an approach eliminates any possible confusion regarding the outcome – the finished sub or poke bowl has been prepared before the customer’s eyes, with 100% accuracy.

Dymos is similar: because the customer is guided from start to finish, the expectation of perfect accuracy becomes much easier to fulfill.

Does Dymos concede any ground to competing order-taking chatbot approaches? There are a few possibilities: with chatbot approaches the user is able to provide more information during the first turn by responding with a full sentence that describes multiple menu items with multiple attribute values – this perhaps gives the chatbot approach an advantage with respect to speed of the order-taking process. A second area involves user pop-up questions as mentioned above (e.g. in the middle of an order the user asks “how late are you open?”). Since the Dymos service platform actually has inherent underlying capabilities for such pop-up questions, future versions may include this feature.

Dynamic IVR Versus Humans

Having noted the built-in capability of cloud-based services to scale to meet demand, it can be stated that Dymos has an inherent advantage in its ability to scale to handle multiple customers across multiple restaurants.

Dymos has encyclopedic knowledge of a restaurant’s menu, including any and all dynamic changes to that information. Such updates can be made by a restaurant manager using the Dymos online dashboard. Menu price changes, specials, discounts, etc. are all instantly updated in the reference database used by Dymos. This kind of always-current knowledge is difficult for restaurant employees or call center specialists to match.

Conclusion

Dymos is not for every restaurant: Dymos is better suited to QSRs (“Quick Service Restaurants”), fast casual restaurants, and to pizza shops, sub and sandwich shops and other restaurants that offer takeout/pickup and/or delivery. The underlying Dymos technology is multi-channel (clients can be online/web, app, telephone, tablet, kiosk, or intercom at drive-thru lanes).

Dymos is currently being positioned as a telephone-based solution.

If you think Dymos is a good fit for your restaurant or chain, or would like to learn how the Dymos API may be leveraged for your situation, please contact us at info@softwareengineeringconcepts.com.