In Venezuela, María spends her early mornings labeling short video clips of illicit content so that an AI model can learn what not to show you online. Each task could pay less than one cent. She has no idea which model her work trains, only that her accuracy determines whether she will be paid at all.
AI may be the world’s newest export, but the labor behind it looks remarkably old: repetitive, underpaid, and outsourced. Before a model can reason, someone must teach it what to say and how to say it. That “someone” is rarely a well-paid researcher in Silicon Valley. More often, it’s a gig worker in the Majority World, the regions outside wealthy countries such as the US, Canada, and Western Europe, tasked with labeling content for cents per task. Many work invisibly in so-called “data farms,” crowded digital sweatshops where thousands perform the same repetitive tasks, all of them training AI to mimic human reasoning.
When companies outsource annotation work to regions with weak labor protections, the savings come at a price. Outsourcing to third-party regional vendors is a well-known risk-mitigation strategy in training frontier models: instead of complying with gig-work regulation in the country of origin, companies shift the legal burden onto the local vendors themselves. Those vendors typically operate with fewer, or more relaxed, quality-assurance mechanisms. The industry frequently sees poor annotation emerge from them for a range of reasons: harsh working conditions, missing country- and language-specific context, or a default to Western cultural norms driven by leading or minimal instructions.
The global data annotation workforce is highly concentrated in East Africa, South Asia, and Latin America. According to a 2023 World Bank report, online gig work comprises up to 435 million workers, or roughly 12% of the global workforce. At first glance, this distribution reduces costs for AI firms, but it also undermines data quality and ethical standards. Annotators often work without proper training, contracts, or vetting. The unevenness across geopolitical lines magnifies bias: datasets reflect narrow cultural perspectives, and those biases migrate into the systems that shape daily life, from drafting CVs to recommending job candidates.
Industry leaders argue that higher-quality data can be secured through better management and training, without overhauling the global labor model. Centralized platforms, they suggest, are efficient at scaling data collection and ensuring some degree of quality assurance. But efficiency is not the same as equity or safety. As long as incentives favor speed and low cost, annotation practices will prioritize volume over fairness and representation.
A better approach lies in building decentralized data annotation ecosystems. Instead of funneling tasks through centralized platforms, communities can participate directly in the creation and governance of datasets, with knowledge of what kind of models their work trains. Mechanisms for consent, local oversight, and equitable pay can embed diverse cultural and linguistic perspectives at the point of data creation, countering the dominance of narrow, English-centric Western viewpoints.
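To make the idea concrete, here is a minimal sketch in Python of what a single annotation record might carry in such an ecosystem. All names and fields here are hypothetical illustrations, not any existing platform’s schema: the point is that consent, model disclosure, provenance, and pay travel with the data, and that a simple local agreement check can serve as community-visible quality oversight.

```python
from dataclasses import dataclass
from collections import Counter

# Hypothetical schema: consent, provenance, and pay stay attached to
# the annotation instead of being stripped away by an intermediary.
@dataclass
class AnnotationRecord:
    task_id: str
    label: str
    annotator_region: str   # e.g. "VE"; preserves cultural/linguistic context
    language: str           # language the annotator worked in
    model_disclosure: str   # which model or model family this work trains
    consent_given: bool     # explicit, revocable consent on record
    pay_usd: float          # per-task pay, auditable by the community

def majority_label(records: list[AnnotationRecord]) -> tuple[str, float]:
    """Local quality check: return the consensus label and the share of
    annotators who agreed, rather than silently discarding dissent."""
    counts = Counter(r.label for r in records)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(records)

# Usage: three annotators label the same clip; the community sees both
# the consensus and how contested it was.
records = [
    AnnotationRecord("clip-001", "unsafe", "VE", "es", "demo-model", True, 0.04),
    AnnotationRecord("clip-001", "unsafe", "KE", "sw", "demo-model", True, 0.04),
    AnnotationRecord("clip-001", "safe",   "IN", "hi", "demo-model", True, 0.04),
]
label, agreement = majority_label(records)
print(f"consensus: {label} (agreement {agreement:.0%})")
```

Keeping disagreement visible, rather than treating a minority label as an “error” that voids payment, is precisely the kind of design choice a co-governed ecosystem could make and a centralized platform typically does not.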
At the regional level, the European Union has moved furthest toward binding rules with its 2024 Platform Work Directive, which creates a presumption of employment for many platform workers and requires algorithmic transparency in task allocation. No equivalent regulatory framework exists in Latin America, where fragmented labor codes often exclude digital freelancers. The Comunidad Andina (CAN) and MERCOSUR trade blocs have yet to establish common standards on digital platform labor, even as they negotiate transatlantic trade agreements with the EU, leaving gig workers exposed to the dynamics of global labor arbitrage.
International regulatory bodies are likewise poised to play a stronger role. The International Labour Organization (ILO) has recognized these risks through its AI for Fair Work campaign, which spotlights an action-research methodology “designed to hold companies within the AI production network accountable.” More recently, on June 5, 2025, the ILO convened to draft a binding agreement on what counts as “fair work” in the platform economy. With robust cross-border standards, the international community can begin to shift the system away from gray-area, low-oversight, low-pay models and toward practices that are safer for workers and safer for society.
Making the labor that sustains artificial intelligence visible requires more than acknowledgment. Regulation is one step, but reform must also reach the data itself. Rather than relying on opaque, centralized pipelines, data ecosystems can be designed for co-governance, democratizing how annotation work is valued and rewarded. Transparent allocation, shared oversight, and local quality checks can embed equity at the point of data creation. Recognizing this labor, and building the infrastructure to value it, is essential if AI is to reflect the world it claims to model.
The views expressed herein are my own and do not necessarily reflect those of my employer.