
Who Trains the Trainers? Inside the Shadow Workforce of AI

By Stuart Kerr, Technology Correspondent, LiveAIWire

Every time an AI system correctly identifies a pedestrian in a
camera feed, accurately transcribes a difficult accent, or appropriately
refuses a harmful request, a human being made that possible. Somewhere, at
some point, a person looked at an image, listened to a recording, or read a
text and labelled it, rated it, or flagged it. This work is invisible in most
accounts of artificial intelligence, but it is foundational to almost
everything the technology can do.

The global data annotation and content moderation workforce is
estimated to number in the millions, scattered across dozens of countries,
working through platforms that connect task supply to task demand at
industrial scale. These workers are the shadow workforce of AI: indispensable
to the systems that are reshaping the economy, but largely excluded from its
benefits, poorly protected by labour law, and almost entirely absent from
public discourse about artificial intelligence.

What Data Workers Actually Do

The tasks that constitute AI data work are diverse. At the simpler
end, workers draw bounding boxes around objects in images, transcribe audio
clips, or answer questions designed to generate training data for language
models. At the more complex end, they assess the quality of AI outputs, judge
whether a model response is accurate, helpful, and safe, and provide the comparative
rankings used in reinforcement learning from human feedback. The most
psychologically demanding work is content moderation: reviewing material
flagged by automated systems for potential policy violations, including
graphic violence, child sexual abuse material, extremist content, and
self-harm material.

A 2023 Time magazine investigation examined content moderation workers
at a Nairobi facility contracted by a major AI developer. It found them
earning less than two dollars per hour while reviewing some of the most
disturbing content on the internet. The investigation prompted significant
public debate and several legal proceedings, and it remains one of the most
detailed accounts of conditions in this sector.

The Global Geography of AI Labour

AI data work is distributed across the globe in a pattern
reflecting labour cost differentials and the availability of specific
language and cultural competencies. Countries including Kenya, the
Philippines, India, Venezuela, and Pakistan are significant hubs. Platforms
including Scale AI, Appen, and Sama connect this workforce to technology
company clients. The International Labour Organisation’s research on digital
labour platforms documents how precarious this employment is. Tasks are
allocated algorithmically and earnings fluctuate unpredictably. Access to work
can be suspended without appeal. Most workers have no social protection, such
as sick pay, maternity leave, or pensions.

What This Means for You

Every AI product that consumers use in daily life has been shaped
by this workforce. When you ask an AI assistant to help you draft an email
and it declines to assist with a harmful request, that refusal reflects the
work of human raters who assessed similar outputs and trained the model to
behave appropriately. The ethical properties of AI systems are not emergent
from the technology itself; they are the product of human labour applied at
scale. As LiveAIWire has explored in coverage of AI and the gig economy, the
structural features that make AI data work exploitative are features of
platform labour more broadly: algorithmic management, misclassification, and
the transfer of risk from platforms to workers.

The RLHF Problem: Who Are the Humans in the Loop?

One of the most consequential forms of AI data work is supplying the human
judgments behind reinforcement learning from human feedback (RLHF), a technique
widely used to fine-tune large language models. Human raters presented with
pairs of AI outputs judge
which is better across dimensions including accuracy, helpfulness, and
safety. Their judgments reflect their values, their cultural context, and the
economic pressures they operate under. The Algorithmic Justice League
and others have documented how cultural and linguistic assumptions embedded
in training data and human feedback shape AI system outputs in ways that can
disadvantage users from different cultural contexts. These are not random
errors; they are systematic patterns reflecting whose judgments were used to
train the system and whose were not.

Organising for Better Conditions

The shadow workforce of AI is not passive. In Kenya, Uganda, and
the Philippines, workers at data annotation facilities have formed
associations, staged work stoppages, and engaged in legal proceedings to
challenge their employment terms. In the United States, content moderators at
companies including Cognizant and Accenture have organised through unions and
filed lawsuits alleging inadequate mental health support. Technology
companies have responded with improved support services and increased pay at
facilities under public scrutiny, but critics argue these measures are
insufficient relative to the scale of the workforce and the revenues
generated by the systems their labour makes possible.

Toward Visibility and Accountability

The EU AI Act includes requirements for transparency about
training data, and the EU has adopted a Platform Work Directive intended to
extend employee protections to gig workers. A structural parallel with other
communities that generate foundational resources for commercial AI, such as
the farmers whose agricultural data feeds commercial tools, is instructive.
In both agricultural data and content annotation, contributors receive a
fraction of the value they generate and retain minimal control over its use.

The question of how AI development is paid for, in labour as well
as capital, is one of the defining ethical questions of the technology’s
current phase. As LiveAIWire has examined in coverage of AI and data equity in
agriculture, communities that generate the foundational data on which
commercial AI depends deserve governance frameworks that recognise their
contribution. Research from the Data and Society Research Institute continues
to advance understanding of how this shadow economy functions and how policy
might address its inequities. Answering
these questions requires acknowledging that AI systems are not built by
algorithms alone.

What Would Fair AI Labour Look Like?

The contrast between the economic value generated by AI systems
and the labour conditions of the workers who enable them has prompted debate
about what fair AI labour practices would actually require. Researchers,
labour organisations, and some technology companies have proposed frameworks
that include minimum pay floors above those dictated by local market
conditions, portable benefits for workers whose employment is contingent and
variable, limits on the psychological harm associated with content moderation
work, and transparency about how worker data is used by the platforms that
employ them.

Some technology companies have made voluntary commitments in these
areas. Google, Microsoft, and others have published supplier codes of conduct
that specify requirements for data annotation vendors, including provisions
on pay, working hours, and mental health support. The enforceability of these
commitments across complex global supply chains is limited, and verification
depends on auditing processes that critics argue are insufficiently
independent. The gap between published standards and working conditions on
the ground, documented repeatedly by investigative journalism, suggests that
voluntary commitments are not sufficient without external
accountability.

The case for treating AI data workers as a matter of policy
concern rather than purely a private contracting matter rests on the same
logic that justifies labour regulation more broadly: the market, left to
itself, does not adequately account for the costs imposed on workers by poor
conditions, particularly when those workers lack bargaining power and their
situations are not visible to the end consumers who benefit from their
labour. The policy question is not whether standards are needed but what form
they should take and who should enforce them.

About the Author

Stuart Kerr is the Technology Correspondent at LiveAIWire,
covering artificial intelligence across society, policy, and industry.