By Yoon Jae Seo
A clutter of languages and overlapping voices echoes across the floor of a typical call center in Noida, India. Employees work continuously, answering queries and resolving issues until the timer signals a short lunch break. Agencies like Innova Communications, one of an estimated 350,000 call centers in the country, manage customer service for a multitude of clients. Call centers like the one in Noida, and their employees, benefit from multinational corporations’ need to optimize the cost and quality of customer service.
Recently, a new type of service has been undergoing rapid expansion in the developing world: data annotation agencies. Data annotation is the manual or semi-automated work of labeling raw data, most often individual images. For instance, one may see a cat in a picture and label it “cat.” It also covers more subjective annotations, such as labeling the emotions conveyed by voices and facial expressions. Data annotation is essential to the development of services and products that utilize AI: in the supervised learning stage, these human-provided labels are fed into the AI system. The machine learning algorithm identifies the patterns that distinguish correct from incorrect outputs and, given enough labeled data, can determine the correct output for new examples without prior labeling.
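To make the supervised learning step concrete, here is a minimal, purely illustrative sketch in Python. The feature vectors, the “cat”/“dog” labels, and the choice of the open-source scikit-learn library are assumptions made for illustration and do not come from this article; a production pipeline would extract features from images or audio at far larger scale.

    # Illustrative sketch: training a classifier on human-provided annotations.
    # The feature vectors and labels below are made up for demonstration.
    from sklearn.linear_model import LogisticRegression

    # Human annotators attach a label ("cat" or "dog") to each raw example.
    annotated_data = [
        ([0.9, 0.1, 0.3], "cat"),
        ([0.8, 0.2, 0.4], "cat"),
        ([0.2, 0.9, 0.7], "dog"),
        ([0.1, 0.8, 0.6], "dog"),
    ]

    # Split the annotations into inputs (features) and targets (labels).
    X = [features for features, label in annotated_data]
    y = [label for features, label in annotated_data]

    # The learning algorithm spots patterns that separate the labeled classes.
    model = LogisticRegression()
    model.fit(X, y)

    # After training, the model predicts labels for data no human has labeled.
    print(model.predict([[0.85, 0.15, 0.35]]))  # expected output: ['cat']

Once a model like this performs well on held-out annotated examples, it can label new data automatically, which is why a steady supply of accurate human annotations matters so much during training.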
Due to the industry-wide implementation of AI technologies, companies both large and small need a workforce that can consistently provide labeled datasets. As data annotation in most industries requires no specialized skills or advanced education, many companies look for the most cost-effective labeling available. Unsurprisingly, firms are turning toward the developing world. To meet this demand, a plethora of data annotation agencies have recently been established, contributing to a forecasted compound annual growth rate (CAGR) of 26.9% for the industry over the next seven years.
Critics of data annotation agencies frequently highlight the imbalance in profits between these agencies and their employees, as well as the industry’s lack of social mobility. Despite these claims, multinationals favor data agencies over freelance platforms for three key reasons. First, freelance platforms enforce homogeneity in data sets, which severely undermines the reliability of the labels. A former annotator interviewed by Vice stated that “If your answers differ a little too much from everybody else, you may get banned.” Bias enforced in this way is a significant issue in AI training, which requires accurate labeling for optimization.
Second, the current sampling of data that serves as the backbone of AI is biased toward specific races, ethnicities, and origins. Joy Buolamwini’s research analyzing bias in the facial recognition software of IBM and Google reported “...Error rates of no more than 1% for lighter-skinned men. For darker-skinned women, the errors soared to 35%.” Concerningly, these systems have also “failed to correctly classify the faces of Oprah Winfrey, Michelle Obama, and Serena Williams.” Such biases result in the removal of content deemed acceptable by users but unacceptable by annotators, for example on streaming platforms.
Lastly, corporations prefer contracting with specific agencies rather than freelancers because doing so significantly reduces the operational costs of collecting data. Gathering labeled data sets from scattered sources and repeatedly reorganizing them is expensive. With an agency, a corporation that needs to alter its plans does not have to communicate individually with every freelancer. Agencies also carry a greater sense of credibility, professionalism, and reliability than freelance platforms.
This is not to say data annotation agencies are perfect. Annotation is a tedious, repetitive, and often mentally draining process. Agencies also undeniably take a disproportionate share of profits compared to the meager wages of their employees. However, the transferable skills developed by annotators are improving. Samasource, a prominent data agency, requires its employees to achieve proficiency in computer use. Other agencies have developed curricula to enhance their employees’ technical abilities. Such training is doubly important given that many annotators have never used a computer prior to their employment; for them, basic digital literacy opens real opportunities for upward mobility.
Regarding income inequality, annotating companies do indeed take a disproportionate amount of income compared to their employees. The question is whether this matters when these agencies evidently provide stability and a life-changing opportunity for many of their workers. A daily wage averaging $8 is typical for Samasource employees in the countries where these agencies operate. For comparison, individuals employed in Kenya, India, and Uganda earn daily wages of $4.98, $5.75, and $2.17, respectively. Considering this, agencies like Samasource provide their workers with a stable and above-average wage (Samasource also claims to have more than quadrupled the incomes of its employees).
Overall, data annotation is being carried out by individuals far removed from the core of technological advancement. Yet corporations will continue to rely on these agencies, rather than freelance platforms, for a continuous supply of annotated data, owing to considerations of accuracy and price. Although underpaid by Western standards, these annotators obtain, in return for their work, a stable income to support their families and transferable skills that open opportunities for upward mobility.