This is part of a series of posts on how Clara’s human-in-the-loop system works behind the scenes.


The Clara worker lifecycle. The sandbox block is the subject of this article.


The Clara human-in-the-loop (HITL) platform combines the best of machine and human intelligence to gracefully handle the nuance of natural conversation. Unlike other HITL systems that are geared toward offline data annotation, Clara’s human workers are actually integrated into the real-time product itself. We source, onboard, and maintain the workforce as part of the platform.


While the number of workers on our platform grows more slowly than the number of customers or the volume of email they generate, we nonetheless need to scale the workforce supply over time to handle increasing demand. New workers can take several weeks to spin up, and in the worst case they can negatively impact customer experience. This led us to build what we call the sandbox practice environment.


The sandbox allows Clara’s human workers to:


  • work without ever committing to production, which protects customers from mistakes and leaves workers free to explore our tools


  • compare their work with that of a production worker


  • use the same latest and greatest software we’ve built for production workers


  • practice doing new kinds of tasks we build into our software


In the remainder of this post, we’d like to tell you a little bit about how this system works and our learnings after deploying it into production.




You can think of the software our workers use as a workflow tool that enables consistent scheduling for and from anyone, anywhere. This software structures the data-entry work that must be done to handle a message during a scheduling negotiation. At the same time, our platform also seamlessly integrates machine learning predictions and automation results for review and correction by human workers. As you can imagine, this leads to many specialized tools that a worker must become handy with in order to be effective. Examples include:

  • annotation tools for selecting, labelling, and reviewing predictions


  • automation review


  • meeting context (a message viewer) and customer context (a preference viewer)


  • “override” tools for manual scheduling, such as an enhanced calendar


It takes a worker some time to become efficient with the full suite of tools we’ve built, and a worker without sufficient practice using them will deliver a degraded customer experience. Indeed, the combination of our unique tools with the strong incentive against making mistakes can lead to what we call “freak-out factor” when new workers first join the platform.


The sandbox practice environment


All new workers enter the sandbox after sourcing, testing, and other qualifications. The sandbox, depicted in the diagram below, works as follows.


The sandbox architecture. Tasks are replicated and used for practice by candidate workers.

Before a task is assigned, all of the state required to service the task is replicated. In the most extreme case, this may include parts of the conversation thread and the calendar (at the time of processing) needed for context. The task is then distributed to a production worker as well as to any sandboxers ready to pick it up when it becomes available. We jokingly call the ensemble of possible outcomes that arise from a single state snapshot the “scheduling multiverse.”


After the task is consumed and processed by the sandboxers, the production worker’s results are committed and move on to the next stage (e.g., forming Clara’s final output, being reviewed by another worker, or being combined with other tasks as needed), while the sandboxers’ output is persisted but unused.
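The flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Clara's actual implementation: the names `Task`, `Dispatcher`, and `dispatch` are hypothetical, workers are modeled as plain callables, and everything runs synchronously in one process.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task record: an id plus the state snapshot needed to
    # service it (e.g. parts of the thread and the calendar at processing time).
    task_id: str
    state: dict


@dataclass
class Dispatcher:
    # task_id -> production result (the only output that moves downstream)
    committed: dict = field(default_factory=dict)
    # task_id -> {worker_id: result}, persisted for feedback but never committed
    sandbox_results: dict = field(default_factory=dict)

    def dispatch(self, task, production_worker, sandboxers):
        """Give the same snapshot to one production worker and any sandboxers."""
        # Every worker receives its own deep copy of the snapshot, so no
        # worker can change the state that another worker sees.
        prod_result = production_worker(copy.deepcopy(task.state))
        practice = {
            worker_id: work(copy.deepcopy(task.state))
            for worker_id, work in sandboxers.items()
        }
        self.committed[task.task_id] = prod_result
        self.sandbox_results[task.task_id] = practice
        return prod_result
```

Copying the snapshot per worker is the design choice that matters here: each copy is one branch of the “scheduling multiverse,” and only the production branch is ever committed.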


Feedback is provided to sandboxers in several ways. First, each sandboxer receives a personal Slack notification in real time after they complete a task, telling them the differences between their submission and the production worker’s committed result. Second, sandboxers can see an aggregate score comparing their work to production work. Third, our worker success team helps bring promising workers up to speed by reviewing sessions, surfacing coachable moments, and helping define operating guidelines.
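As a rough illustration of the first two feedback channels, the sketch below computes a per-task diff and an aggregate score. It assumes, purely for illustration, that a task result is a flat dictionary of fields and that the aggregate score is the fraction of exact matches; the function names `diff_submission` and `aggregate_score` are hypothetical, and the real scoring may well be more nuanced.

```python
def diff_submission(sandbox, production):
    """Return {field: (sandbox_value, production_value)} for differing fields.

    A field absent on one side is reported as None, so an explicit None
    and a missing field are treated alike in this sketch.
    """
    keys = sorted(set(sandbox) | set(production))
    return {
        key: (sandbox.get(key), production.get(key))
        for key in keys
        if sandbox.get(key) != production.get(key)
    }


def aggregate_score(pairs):
    """Fraction of (sandbox, production) pairs with no differing fields."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    matches = sum(
        1 for sandbox, production in pairs
        if not diff_submission(sandbox, production)
    )
    return matches / len(pairs)
```

The diff is what a per-task notification could report, while the score summarizes many tasks at once.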


We’ve also used the sandbox to seed “gold” examples with answers known a priori to enable quicker practice and coaching cycles.


System impact


After deploying this practice environment we started to see results immediately. For one, new candidate workers no longer negatively impacted the customer experience while they were learning how to use our tools. At a high level, we found that workers could convert to production two weeks ahead (in terms of accuracy) of the cohorts before them. Here are some other key takeaways.


Reduced freak-out factor

Before we had the sandbox environment, workers would experience freak-out factor from the pressure of learning a new set of tools while simultaneously being cautious about their accuracy on production. After introducing the sandbox we noticed a marked improvement in the key metrics at the top of the worker funnel. The figure depicts the funnel in terms of completed work for fixed sourcing strategies. It is evident that more workers start processing tasks than before we rolled out this feature.



23% better starting accuracy


How much more accurate are new workers once they get on the platform? The following figure shows the percentage difference in mistake rate between sandboxers and non-sandboxers as a function of the number of tasks they’ve processed. The figure shows a 23% improvement in accuracy over their predecessors in the first sessions they process. The downward trend indicates that the performance of the two types of workers converges after many tasks have been processed.



2x higher graduation rate


We say a worker has graduated when they exceed an accuracy threshold after some fixed time on the platform. If workers do not graduate, they are let go. When we compare the graduation rate of sandboxers with the graduation rate before the sandbox (for the same accuracy standards), we find that twice as many workers make it to this final stage of the funnel today.
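The graduation rule can be written down as a small check. This is only a sketch: the function name and the parameters `threshold` and `evaluation_day` are hypothetical, and no default values are given because the post does not state the actual threshold or time window.

```python
from typing import Optional


def graduation_status(
    accuracy: float,
    days_on_platform: int,
    *,
    threshold: float,
    evaluation_day: int,
) -> Optional[bool]:
    """Return None before the evaluation point, else whether the worker graduated."""
    if days_on_platform < evaluation_day:
        # Too early to evaluate: the worker is still in their trial period.
        return None
    # Graduation requires strictly exceeding the accuracy threshold.
    return accuracy > threshold
```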




Ideally the sandbox output could be compared exactly with the output of production workers. In our platform today, this is only true of some of the tasks in our system, like annotation tasks, that are non-subjective in nature. Other tasks may require judgment calls by the workers or may have multiple reasonable solutions. In these cases, the sandbox provides a facility for practice but the impact of the feedback loop is lessened.


Further, spending time practicing in the sandbox is a substantial upfront investment for a worker, so appropriate incentives must be paired with such a strategy.


Clara is growing

We’re excited to grow our platform and team. Find out more about open roles here:

Learn more about our CEO Maran Nelson here:

and read about our recent $7M series A announcement here: