Efficient crowdsourcing with Cost Forecasting - new paper

We have a new paper out:

In this paper, we studied how to make crowdsourcing more efficient when members of the crowd can provide creative input and not merely perform basic, rote work.

What is crowdsourcing?

Crowdsourcing is the process of distributing work to a large number of individuals (the crowd), usually by dividing that work into many small tasks. Labeling images is a classic example: you have millions of photographs of objects that need to be labeled, perhaps to use as training data for an image recognition neural network. Instead of slowly labeling them yourself, you use a crowdsourcing platform such as Amazon Mechanical Turk to send the images out to hundreds or even thousands of people who perform the labeling for you. Generally, each microtask performed (for example, providing a label for a single image) earns a small payment, although not all crowdsourcing schemes reward the crowd with money. Because rewarding the crowd is costly, with or without money, we want our crowdsourcing to be as efficient as possible.

Crowdsourcing drudgery and creativity

Being able to divide a large job into many easy-to-distribute microtasks is one of the classic ingredients of successful crowdsourcing, and clearly defined microtasks are crucial for using efficient crowdsourcing algorithms. But the vast majority of microtask work is rote drudgery: labeling images, transcribing audio recordings, answering true-false or multiple-choice questions, and the like. Asking people to perform only rote work is limiting, because people are capable of creativity in ways that machine learning simply is not. A crowd can provide new and even unexpected information, but if you only ever ask people to perform boring, rote work, you will never have the chance to leverage this innate creative capacity.

There are many examples of creative crowdsourcing; Wikipedia, for instance, is a crowdsourced encyclopedia. But it’s hard to apply algorithms to such problems, because algorithms need clear microtasks and a task like writing an encyclopedia is highly unstructured and open-ended.

So we ask: is it possible to combine creativity with microtasks and get the best of both worlds, creative input from the crowd and the use of efficient algorithms? Yes, we argue it is.

Combining rote and creative crowdsourcing: challenges and solutions

There is a middle ground between completely rote, structured crowdsourcing (labeling images) and completely open-ended crowdsourcing (writing an encyclopedia) that we call the “Reply & Supply” framework. Here, there are two types of tasks:

  1. Reply tasks, where crowd participants simply complete microtasks (for example, answer an existing true-false question); and
  2. Supply tasks, where participants propose new microtasks for other members of the crowd (for example, write a new true-false question).

Although the crowd works within the guardrails of Reply & Supply, participants have more creative input into the overall problem than they would with purely rote tasks.
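To make the two task types concrete, here is a minimal Python sketch of a Reply & Supply task pool. The names (`Microtask`, `ReplySupplyPool`, `supply`, `reply`, `unfinished`) and the "a task needs a fixed number of replies" completion rule are hypothetical, chosen for illustration rather than taken from the paper. The point it shows is structural: `supply` grows the task set while the crowd is still answering it.

```python
from dataclasses import dataclass, field


@dataclass
class Microtask:
    """A true-false question proposed by a crowd member (illustrative)."""
    question: str
    replies: list[bool] = field(default_factory=list)


@dataclass
class ReplySupplyPool:
    """A growing pool of microtasks under a Reply & Supply scheme (hypothetical sketch)."""
    tasks: list[Microtask] = field(default_factory=list)

    def supply(self, question: str) -> int:
        """Supply task: a participant proposes a new microtask; returns its index."""
        self.tasks.append(Microtask(question))
        return len(self.tasks) - 1

    def reply(self, index: int, answer: bool) -> None:
        """Reply task: a participant answers an existing microtask."""
        self.tasks[index].replies.append(answer)

    def unfinished(self, needed: int) -> int:
        """Count tasks with fewer than `needed` replies (hypothetical completion rule)."""
        return sum(len(t.replies) < needed for t in self.tasks)


# Usage: one supplied task gets a reply, then a second task is supplied.
pool = ReplySupplyPool()
first = pool.supply("Is the Pacific the largest ocean?")
pool.reply(first, True)
pool.supply("Do penguins live in the Arctic?")
print(pool.unfinished(needed=3))  # 2
```

Every `supply` call adds a task that still needs replies, which is why an unchecked pool can outgrow the crowd's capacity to answer it.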

But, of course, Reply & Supply has its problems. Most algorithms that efficiently assign microtasks to the crowd need a fixed, known number of microtasks. Moreover, with a growing set of microtasks you can quickly become overrun with unfinished work, so growth needs to be controlled.[2]

Constraining task growth appropriately, to help address these problems, is the focus of our new paper. We propose a method called Cost Forecasting for choosing when to request a new microtask and when to request an answer to an existing one. Combining this decision process with an efficient microtask crowdsourcing algorithm then allows algorithmic crowdsourcing to be used even when the set of microtasks is growing.
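The core decision (ask for a new task only if it is forecast to cost less to complete than the existing ones) can be illustrated with a toy model. This is a simplified sketch of our own, not the forecasting model from the paper. It assumes true-false tasks; a hypothetical stopping rule that a task is finished once one answer leads by `MARGIN` replies; a hypothetical symmetric Beta(0.5, 0.5) prior (`PRIOR`) on each task's probability of a "true" reply; and a hypothetical greedy rule of replying to the cheapest open task. Under those assumptions, a task's remaining cost is the gambler's-ruin expected duration of its vote-difference random walk, averaged numerically over its posterior (`walk_duration`, `forecast_cost`), and `choose_action` supplies a new task only when one supply step plus an untouched task's forecast beats every open task.

```python
import math
from dataclasses import dataclass
from typing import Optional

MARGIN = 3   # hypothetical stopping rule: done once one answer leads by MARGIN replies
PRIOR = 0.5  # hypothetical Beta(PRIOR, PRIOR) prior on a task's probability of a "true" reply
GRID = 2000  # midpoints used to average over that probability numerically


@dataclass
class Task:
    """A true-false microtask and the replies it has received so far."""
    n_true: int = 0
    n_false: int = 0

    @property
    def done(self) -> bool:
        return abs(self.n_true - self.n_false) >= MARGIN


def walk_duration(diff: int, p: float, margin: int = MARGIN) -> float:
    """Expected further replies until |n_true - n_false| reaches `margin`, if P(true) = p.

    Gambler's-ruin expected duration of a +1/-1 random walk that starts at
    `diff` and stops at -margin or +margin.
    """
    k, n = diff + margin, 2 * margin  # shift so the barriers sit at 0 and n
    if k <= 0 or k >= n:
        return 0.0
    q = 1.0 - p
    if abs(p - q) < 1e-12:
        return float(k * (n - k))  # unbiased walk
    r = q / p
    return k / (q - p) - (n / (q - p)) * (1 - r**k) / (1 - r**n)


def forecast_cost(task: Task) -> float:
    """Expected further replies to finish `task`, averaged over its Beta posterior."""
    a, b = PRIOR + task.n_true, PRIOR + task.n_false
    diff = task.n_true - task.n_false
    grid = [(i + 0.5) / GRID for i in range(GRID)]
    # Log of the unnormalized Beta(a, b) density; subtracting the max avoids underflow.
    log_w = [(a - 1) * math.log(p) + (b - 1) * math.log(1 - p) for p in grid]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(w * walk_duration(diff, p) for w, p in zip(weights, grid))
    return total / sum(weights)


def choose_action(tasks: list[Task]) -> tuple[str, Optional[int]]:
    """Return ("supply", None) or ("reply", index of the cheapest open task)."""
    new_cost = 1.0 + forecast_cost(Task())  # one supply step, then an untouched task
    open_costs = [(forecast_cost(t), i) for i, t in enumerate(tasks) if not t.done]
    if not open_costs:
        return "supply", None
    cost, index = min(open_costs)
    if new_cost < cost:
        return "supply", None
    return "reply", index


print(choose_action([Task(20, 20), Task()]))  # ('reply', 1)
print(choose_action([Task(20, 20)]))          # ('supply', None)
```

In this toy model an untouched task always beats a new one, since the new task adds a supply step, while a heavily contested task (20 "true" and 20 "false" replies) is forecast to need more replies than a fresh task, so the sketch asks the crowd for a new one. Averaging over the posterior, rather than plugging in a single estimate of the reply probability, is what makes contested tasks look expensive.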

The abstract:

Allowing members of the crowd to propose novel microtasks for one another is an effective way to combine the efficiencies of traditional microtask work with the inventiveness and hypothesis generation potential of human workers. However, microtask proposal leads to a growing set of tasks that may overwhelm limited crowdsourcer resources. Crowdsourcers can employ methods to utilize their resources efficiently, but algorithmic approaches to efficient crowdsourcing generally require a fixed task set of known size. In this paper, we introduce cost forecasting as a means for a crowdsourcer to use efficient crowdsourcing algorithms with a growing set of microtasks. Cost forecasting allows the crowdsourcer to decide between eliciting new tasks from the crowd or receiving responses to existing tasks based on whether or not new tasks will cost less to complete than existing tasks, efficiently balancing resources as crowdsourcing occurs. Experiments with real and synthetic crowdsourcing data show that cost forecasting leads to improved accuracy. Accuracy and efficiency gains for crowd-generated microtasks hold the promise to further leverage the creativity and wisdom of the crowd, with applications such as generating more informative and diverse training data for machine learning applications and improving the performance of user-generated content and question-answering platforms.

Congrats to Abigail, who was amazing to work with!

Check out the paper for more.

  1. Additional links:

  2. There is also an inherent time bias: tasks proposed early have more opportunities to be addressed by members of the crowd than tasks proposed later.

Jim Bagrow
Associate Professor of Mathematics & Statistics

My research interests include complex networks, computational social science, and data science.