Clara’s Promise: Bringing transparency to conversational intelligence

2016 was the year of bots, conversational interfaces, and intelligent assistants. It gave the world a glimpse of the promise of an important upcoming computing paradigm we call conversational intelligence. It also reminded us that building and evaluating conversational intelligences is hard. How are you supposed to know whether you can trust one, and with what tasks? Will a given intelligence respond gracefully, or awkwardly? Who will win user trust and introduce these potentially life-changing interfaces into our daily lives?

At Clara, we know that anything less than near-perfect isn’t good enough for a successful conversational interface. Over the last three years, we’ve defined, built, and delivered on the exceptional level of service that our customers require and deserve. Having now managed several thousand calendars and scheduled more than a million meetings, we’re ready to introduce a public promise based on what we’ve learned matters most to winning user trust. We don’t believe anyone else in the industry is capable of meeting a bar as high as the one we’ve set with Clara, and we don’t believe anyone who fails to meet this bar can succeed.

Publishing quality metrics is standard practice for all sorts of established business services and products. It’s something consumers and businesses have come to expect. What will we do to help consumers evaluate the intelligence of this new category of interface?

Today we’re proud to announce Clara’s Promise: a public commitment to publishing our internal quality metrics on a monthly basis, and the thresholds we strive to uphold. We believe its parameters can be a useful heuristic when assessing your interactions with conversational intelligences of all kinds: Clara (machine-human), human, and machine-only. The spirit of Clara’s Promise is to create transparency into Clara’s performance and set the industry precedent for publishing quality metrics.

Clara’s Quality Metrics

Clara’s system-wide intelligence:

  • Per Response Success Rate: 97% or more

Clara’s system-wide 24/7 response time:

  • Median Response Time: 30 minutes or less
  • 80th Percentile Response Time: 60 minutes or less

Measuring intelligence is challenging. We’ve made a tremendous investment to set the standard for Clara’s performance, and to be able to measure it in the above terms. We are publishing these metrics in the infancy of this industry to drive others to higher quality and showcase our hard-earned expertise. Our ambition is to transition these public metrics to an SLA for our customers in the future. We’re eager to see how quickly the industry moves to a place where transparency on quality metrics of conversational intelligence is standard practice.
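
As a rough illustration of those terms, here is a minimal sketch of how a per-response success rate could be computed from error labels and checked against the 97% threshold. The `Response` structure and the sample data are hypothetical and not our production pipeline.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Response:
    """One outgoing Clara response, with its error label (hypothetical structure)."""
    response_id: str
    has_error: bool  # True if any error was labeled on this response

def per_response_success_rate(responses: List[Response]) -> float:
    """Fraction of responses with no labeled error."""
    if not responses:
        return 1.0
    error_free = sum(1 for r in responses if not r.has_error)
    return error_free / len(responses)

# Hypothetical month: 2 labeled errors out of 100 responses.
month = [Response(response_id=str(i), has_error=(i < 2)) for i in range(100)]
rate = per_response_success_rate(month)
print(f"Per Response Success Rate: {rate:.1%}")  # 98.0%
assert rate >= 0.97  # Clara's Promise threshold
```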

FAQ

Why does having a Promise matter?

Precisely measuring the quality of work of an intelligent agent (person or system) is difficult and isn’t done widely. Known attempts range from the predominantly qualitative performance evaluations of human executive assistants (EAs) to direct measurement of Mechanical Turk task success.

One major advantage of cooperative intelligence over a purely human solution is that quality can be measured at a granular level, and that measurement can and should be exposed to customers. Another advantage is that it sets a standard against which we can compare humans and machines alike, ultimately raising the quality bar for all.

What is an error?

Typically, when you think about an error, you think in binary terms: things work, or they don’t. You clicked the button and got a result, or you got an error. Conversational interfaces are more challenging than most: a huge number of variables must be interpreted correctly at once to produce an intelligent, fully correct output.

As a new company in a new space, we are diligently working to clearly define the full scope of potential errors throughout the scheduling process. This means the list of error categories keeps growing as we expand Clara’s expertise and release additional features. By sharing what we learn, we hope to create transparency into Clara’s performance and set an industry precedent for publishing quality metrics. We will share more details on this in the near future.

How do you know there was an error?

Our Clara Remote Assistants (CRAs) monitor outgoing Clara responses for errors. When an error is found (usually by a CRA, occasionally by a meeting guest who notifies us), a CRA labels the error and its category.

Why do you measure the Success Rate per response?

We realize that success isn’t determined just by a meeting being successfully scheduled. Success means Clara following every one of your preferences, adhering to every grammatical rule, and interacting with your contacts flawlessly every time. If Clara forgets that you only like coffee meetings before 10am, then Clara didn’t deliver. We are setting a high bar for quality and defining quality from your perspective, to ensure we don’t just get a meeting on your calendar but deliver what you need from a trusted partner every time.

What does a greater than 97% Per Response Success Rate mean in practice?

While we aspire to 100% accuracy in the long term, we believe a goal of a 97% per response success rate is a great starting point given our strict error definitions and our focus on your experience as a user. A 97% per response success rate means 1.5 errors out of every 50 responses, so the vast majority of responses are error-free. Moreover, since CRAs are involved in detecting errors, we’re often able to fix them before they affect the scheduled meeting or are even noticed by you, making the service quality you experience even higher.

How does Clara’s performance compare to a human’s?

Humans and software are talented in different and complementary ways. Software delivers consistency of performance, scalability, responsiveness, and availability; human teammates are irreplaceable as creative and emotional partners. Because Clara combines software with human intelligence, Clara performs with a significantly lower error rate and higher speed and availability than you could achieve on your own, or with the help of a human admin. The AI community continues to find that the most capable problem-solvers in the world are humans and machines working together.

How does Clara compare to 100% automated conversational intelligences?

The latest machine learning technology still isn’t ready to take on a task as complex as setting up meetings without significant supervision, because understanding human language remains very difficult for AI/machine learning tools. We believe that 100% automated solutions that attempt to understand human language and act on that understanding fail at a rate that is not acceptable for a trust-based task like representing you in your interactions with others.

Why does Clara ever make errors? Can I do anything to help prevent them?

The best thing you can do to help Clara’s accuracy is to write concisely and professionally, just as you would with a human. Another great tip is to reduce ambiguity by clearly explaining your preferences to Clara, just as you would when onboarding any other type of assistant. Clara is a living, breathing system, and its accuracy is improving every day. Just like humans, Clara learns from its mistakes. That said, we believe Clara is the most accurate solution on the market, and we look forward to learning which benchmarks other conversational intelligences use.

How is Clara’s Response Time measured?

Clara’s response time is measured for every incoming message that requires a response, whether from you or from the meeting participants Clara is interacting with: we subtract the time the message was received from the time Clara’s reply was sent. This means our response time accounts for our AI/machine learning, any human involvement needed to ensure that AI/machine learning is accurate, and our rigorous QA processes.
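
As a rough sketch of that calculation, with made-up timestamps and illustrative field names rather than our actual system: each response time below is the reply’s send time minus the incoming message’s arrival time, and the median and 80th percentile are then taken across those deltas.

```python
import math
from datetime import datetime
from statistics import median

# Hypothetical (received_at, responded_at) pairs for messages that required a reply.
message_times = [
    (datetime(2017, 4, 3, 9, 0),   datetime(2017, 4, 3, 9, 12)),
    (datetime(2017, 4, 3, 10, 5),  datetime(2017, 4, 3, 10, 21)),
    (datetime(2017, 4, 3, 13, 30), datetime(2017, 4, 3, 14, 2)),
    (datetime(2017, 4, 4, 8, 45),  datetime(2017, 4, 4, 9, 40)),
    (datetime(2017, 4, 4, 16, 10), datetime(2017, 4, 4, 16, 28)),
]

# Response time in minutes: reply sent minus message received.
response_minutes = [
    (responded - received).total_seconds() / 60
    for received, responded in message_times
]

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

print(f"Median Response Time: {median(response_minutes):.1f} minutes")                   # target: 30 or less
print(f"80th Percentile Response Time: {percentile(response_minutes, 80):.1f} minutes")  # target: 60 or less
```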

Do these metrics apply to my account specifically?

No, these are system-wide metrics. Your experience in the short term may vary slightly above or below the system-wide metrics. Your experience in the long term will approximate the system-wide metrics.

How do I know if you’re meeting the metrics you strive for? What happens if you don’t meet the metrics?

We will update our quality metrics on a monthly basis via the ‘Historical Tracking’ section at the end of this post throughout 2017. After that period, we will reevaluate Clara’s progress towards meeting these goals, and the industry’s progress overall towards better transparency.

If we don’t meet the system-wide metrics we strive for during the previous month, we will inform you of immediate and long-term actions we are taking to remedy the situation.

The spirit of this Promise is to provide transparency into Clara’s performance. We will not at this time be providing automatic refunds or discounts if we fail to meet the metrics. Our billing terms remain unchanged: if you decide to stop using Clara for any reason, please email support@claralabs.com, and we’ll stop billing you starting on the first day of your next billing cycle.

Are there any exclusions to the metrics?

Yes. The metrics do not apply to any unavailability, suspension or termination of Clara (i) caused by factors outside of our reasonable control, including any major event or Internet access or related problems beyond the demarcation point of Clara (for example, the customer’s email service’s outage); (ii) that result from any actions or inactions of the customer or any third party; (iii) that result from the customer’s equipment, software or other technology and/or third party equipment, software or other technology.

Clara’s Quality Metrics — Historical Tracking

April 2017

  • Per Response Success Rate — 99.2%
  • Median Response Time — 15.2 minutes
  • 80th Percentile Response Time — 40.4 minutes

March 2017

  • Per Response Success Rate — 98.0%
  • Median Response Time — 15.9 minutes
  • 80th Percentile Response Time — 42.1 minutes

February 2017

  • Per Response Success Rate — 97.3%
  • Median Response Time — 17.3 minutes
  • 80th Percentile Response Time — 47.3 minutes

January 2017

  • Per Response Success Rate — 97.2%
  • Median Response Time — 18.5 minutes
  • 80th Percentile Response Time — 47.8 minutes

December 2016

  • Per Response Success Rate — 97.2%
  • Median Response Time — 17.8 minutes
  • 80th Percentile Response Time — 63.8 minutes