Not everything that appears artificial is artificial. Take “artificial intelligence” (AI) or “machine learning,” for example. Both terms suggest that something is artificial, created by machines – in this case by software. If we ask ourselves about the “non-artificial” factors in AI, we usually think of the inventors – the software developers who get the applications off the ground. We hardly ever take a deeper look behind the scenes of AI. If we did, we would realize that behind many of the applications are people who train the software by feeding it data. Only with this essential human support can the AI, for example, recognize a car or distinguish houses from trees in satellite images. The working conditions of these people – known in the industry as “clickworkers” – are often precarious because, even in the AI business, rich countries are keen to outsource work to poor countries.
Dr. Milagros Miceli’s research takes her right into the inner workings of machine learning, focusing on the working conditions of clickworkers in the countries of the Global South who make our AI look so machine-like and artificial, even though in many places this is far from being the case. Sociologist and computer scientist Milagros Miceli heads the Data, Algorithmic Systems, and Ethics research group at the Weizenbaum Institute. An alumna of TU Berlin, she completed her doctorate at the University in 2022. The unjust treatment of data workers was also the focus of her doctoral thesis, which she wrote at TU Berlin’s Chair of Internet and Society.
At the symposium “Critical Stances Towards AI: For a Critical and Self-Determined Approach to Digital Technology” she offered insights into her research.
In this interview, Milagros Miceli talks about the role of human workers in the AI industry and the research conducted in this field.
What exactly is the work done by data workers?
We coined the term “data work” to refer to what others have called micro work, click work, or ghost work. In my opinion, these other designations convey the idea that this kind of work is small-scale. Data work refers to the work involved in producing and maintaining training data for machine learning. It includes tasks such as data generation, which ranges from collecting publicly available data from the internet to workers writing short texts, recording their own voices, or even uploading videos and pictures of their homes, families, cities, friends, or their own faces. Another task is so-called data annotation, which involves interpreting data and labeling it according to predefined categories. Data workers also verify algorithmic output; this includes, for example, reviewing search engine results or rating machine-generated recommendations for their usefulness or appropriateness. And, last but not least, workers sometimes pose as AI to users. We refer to this as “AI impersonation”: it is often used with chatbots or so-called “smart cameras.”
Is it possible to say how many people work as clickworkers globally, in which countries this work is mostly performed, and how the system is structured? What is so unjust about it?
It is difficult to determine the exact number of people doing data work around the world, since this type of work is often performed online, in various forms, and on different platforms. Platform user numbers tell us only a little about the actual number of workers: although many people register with a platform, only a few are offered regular assignments. Based on research into the number of people working online and the platforms’ user bases, I estimate that around ten million people could be doing data work.
The number of data workers is increasing around the world, as this type of work can be carried out in any country where an internet connection is available. However, countries with a well-developed online infrastructure and a larger pool of internet users tend to have more clickworkers. Countries where clickworking is particularly prevalent include the USA, Venezuela, India, the Philippines, Bangladesh, and Kenya. Clients are constantly on the lookout for low-cost labor. Platforms such as Amazon Mechanical Turk and data companies such as Sama target countries with high unemployment rates – places where economic, environmental, and political crises make workers dependent. Platforms, data companies, and especially clients often benefit from the cheap labor provided by data workers and thereby generate large profits, while the data workers themselves work under poor conditions.
The latter are often remunerated on a “pay per task” basis, with pay varying according to the nature of the task and its complexity. But, most importantly, pay varies according to the workers’ locations. OpenAI, for example, used the company Sama to outsource the data work for ChatGPT to Kenyan workers. The work to be performed was a classification or annotation task: Workers had to read, interpret, and separate text fragments. Many of these text fragments included graphic descriptions of violence, abuse, torture, and murder. Naturally, tasks such as these are not pleasant. OpenAI or Sama paid the Kenyan data workers less than two dollars per hour; and after the workers had worked with this highly disturbing material, the companies failed to offer them any form of psychological support whatsoever.
What exactly does your research involve?
My research is far-reaching and intense, which is certainly due to my training as a sociologist. I find it unacceptable to write about workers in precarious conditions from the comfort of my study. Much of my research has therefore consisted of fieldwork: I visited the places where data work takes place and spent time with the workers. For several years now, since securing funding, I have been able to carry out my studies in close collaboration with the data workers themselves: They are involved as co-researchers, and we jointly develop the research questions, conduct the studies, and finalize the results. We are currently conducting research with data workers in Syria, Argentina, Brazil, Kenya, and Germany.
What is the state of research in this field? And what does the exchange with US researchers, in particular, look like?
I started this research six years ago, when almost no one was talking about the human work behind AI systems – especially not from a computer-science perspective. Today, our research is considered groundbreaking in this field, and numerous press articles have reported on our results. Over the years, I have worked with many researchers from the US and other countries. Collaboration has been particularly close with Julian Posada of Yale University and Adriana Alvarado of IBM Research, both of whom also attended the symposium in New York. In addition to heading a research group at the Weizenbaum Institute, I work at the DAIR Institute, founded by former Google Ethical AI head Timnit Gebru. It is a non-profit institute whose goal is to take a proactive approach to AI: finding ways to use it for the benefit of people where possible, warning of potential harm, and blocking it when it does more harm than good.
What measures must be taken to improve conditions for the workers? What can each and every one of us do?
The crucial point is to understand that data work is not low-skilled work. In most cases, data workers have specific expertise ranging from proficiency in multiple languages to geographic expertise. Many of them have university degrees. All of them are experts on the data they work with. Clients benefit from this specific expertise. Viewing data workers as experts rather than low-skilled workers can help change the narrative. This may influence policymaking, wage negotiations, and even the quality of data sets and AI systems.
In many cases, educational and research institutions are contractors for data work. This means we can all play our part in combating precarious working conditions – for example, by ensuring that our research budgets allow for fair pay and give data workers enough time to complete their tasks. It is also important to keep in mind that a single bad review can cost them their jobs. Data workers are often an essential part of AI research, and the way we treat them should be considered a matter of research ethics. In the same way that institutional ethics committees monitor how researchers treat research participants, they should also monitor how data workers are treated in academic research projects. We can all advocate for this at our own institutions.
Author: Bettina Klotz