Artificial intelligence has always cycled between promise and collapse: the so-called AI “springs” and “winters.” Each wave begins with technical breakthroughs and ends when the limits of data and human supervision become clear. Today, as companies race to scale model training, the same tension is resurfacing. The new frontier isn’t algorithmic at all—it’s human. Who controls and understands the labor behind the data may determine whether this era of AI sustains its spring or slides back into another winter.

These days, it seems like everyone is entering the data annotation space. Earlier this year, Meta acquired a 49 percent stake in Scale AI in an effort to internalize dataset production for large language model training, a move widely interpreted as a bid to control the Data Annotation Gold Rush. Google soon followed by reducing its reliance on Scale as a vendor, citing concerns about vendor neutrality and data provenance. Amazon, meanwhile, is folding MTurk work into “SageMaker Ground Truth,” its managed service for human-in-the-loop and automated data labeling. Just last week, xAI laid off roughly 500 generalist data annotators working on Grok, shifting toward specialized AI tutor teams. Most recently, Uber entered the field with “Scaled Solutions,” a division devoted to annotation that piloted in India in 2024 and is now slated to expand into the US.

Each of these developments reflects a broader shift: the “gigification” of human expertise. In today’s AI economy, annotation is treated as endlessly scalable yet ultimately disposable. By framing it as piecework for drivers or delivery workers to pick up between shifts, companies trivialize a task that demands precision, cultural context, and sustained judgment. In chasing efficiency, the industry risks undermining the very human intelligence its models depend on.

The problem with framing annotation as gig work is that the task itself is rarely simple. Annotation is cognitively demanding, culturally specific, and dependent on rigorous quality-control pipelines, the kind of processes that take even the most mature firms years to refine. As annotation expands into domains such as computer vision, robotics, and 3D/4D modeling, the expertise required to perform the work rises dramatically. Training models to understand movement, depth, and spatial relationships is far more complex than labeling a static image. Many annotators in these domains hold advanced degrees, possess specialized software skills, and can command rates exceeding $60 per hour, sometimes much more.

Take three-dimensional annotation, for example. Unlike its 2D counterpart, 3D annotation captures depth, distance, and volume, key inputs for autonomous vehicles and robotics systems such as Symbotic’s warehouse robots, Tesla’s Robotaxi, and May Mobility’s autonomous shuttles for last-mile transit. Integrating 4D perception, which tracks object movement over time, adds yet another level of complexity. These models also rely on multimodal data streams, such as weather conditions, map APIs, and sensor inputs, each requiring precise synchronization and interpretation during fine-tuning.
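To make that jump in complexity concrete, here is a minimal, hypothetical sketch in Python of what a single 3D/4D label might carry. The schema, the Cuboid3D and TrackedObject names, and every field are illustrative assumptions, not any vendor’s actual format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Cuboid3D:
    """One 3D box in a single LiDAR frame: position, size, and heading."""
    center_xyz: Tuple[float, float, float]  # meters, in the sensor's coordinate frame
    size_lwh: Tuple[float, float, float]    # length, width, height in meters
    yaw_rad: float                          # heading around the vertical axis
    label: str                              # e.g. "pedestrian", "forklift"

@dataclass
class TrackedObject:
    """A 4D annotation: the same physical object followed across timestamped frames."""
    track_id: str
    frames: List[Tuple[float, Cuboid3D]] = field(default_factory=list)  # (timestamp_s, box)

    def add_frame(self, timestamp_s: float, box: Cuboid3D) -> None:
        # Timestamps are what allow this label to be synchronized with the other
        # streams mentioned above (camera frames, weather feeds, map lookups).
        self.frames.append((timestamp_s, box))

    def speed_mps(self) -> float:
        """Rough speed estimate from the first and last annotated frames."""
        if len(self.frames) < 2:
            return 0.0
        (t0, b0), (t1, b1) = self.frames[0], self.frames[-1]
        dx, dy, dz = (b1.center_xyz[i] - b0.center_xyz[i] for i in range(3))
        return (dx * dx + dy * dy + dz * dz) ** 0.5 / max(t1 - t0, 1e-6)

# A pedestrian annotated across two frames, 0.1 s apart.
person = TrackedObject(track_id="ped_0042")
person.add_frame(12.40, Cuboid3D((4.1, 1.9, 0.9), (0.6, 0.6, 1.7), 0.0, "pedestrian"))
person.add_frame(12.50, Cuboid3D((4.2, 2.0, 0.9), (0.6, 0.6, 1.7), 0.0, "pedestrian"))
print(round(person.speed_mps(), 2))  # ~1.41 m/s, walking pace
```

Even in this toy form, the point stands: every field has to stay consistent across sensors, frames, and annotators, which is exactly where skill and judgment come in.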

Annotation for these systems takes place in specialized platforms such as CVAT, Label Studio, and Dataloop, which are increasingly extending beyond 2D into 3D and 4D labeling. Developing and maintaining such platforms requires significant engineering effort and capital investment. Despite projected market growth, with the data annotation market estimated to reach $6.2 billion by 2030 at a CAGR of 22.2%, annotation tooling remains rudimentary, especially for higher-dimensional data.

The bottleneck lies in quality assurance. Sustaining continuous data pipelines demands robust validation, feedback, and reputation systems. Some companies have begun experimenting with blockchain-based reputation tracking to secure reliability and provenance across distributed workforces. Yet even advanced tools struggle with frame-loading performance issues, limited automation, and inconsistent user experience.
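To give a rough sense of what such a validation-and-reputation loop might look like, here is a simplified, hypothetical sketch; the tolerance value, scoring rules, and function names are all assumptions rather than any vendor’s pipeline.

```python
import math
from collections import defaultdict

# Hypothetical tolerance: if two annotators place the same object's center more
# than this many meters apart, the item is escalated to a human reviewer.
CENTER_TOLERANCE_M = 0.25

reputation = defaultdict(lambda: 1.0)  # annotator_id -> trust score, starts neutral

def centers_agree(center_a, center_b, tol=CENTER_TOLERANCE_M):
    """True if two annotated 3D centers fall within `tol` meters of each other."""
    return math.dist(center_a, center_b) <= tol

def review(ann_a, center_a, ann_b, center_b, reviewer_pick=None):
    """Compare two annotations of the same object and update reputations.

    Agreement rewards both annotators slightly; on disagreement, reviewer_pick
    ("a" or "b") records whose label the human reviewer accepted.
    """
    if centers_agree(center_a, center_b):
        reputation[ann_a] += 0.01
        reputation[ann_b] += 0.01
        return "auto-accepted"
    if reviewer_pick == "a":
        reputation[ann_a] += 0.05
        reputation[ann_b] -= 0.05
    elif reviewer_pick == "b":
        reputation[ann_b] += 0.05
        reputation[ann_a] -= 0.05
    return "escalated"

# Two annotators label the same pedestrian; the first pair agrees, the second does not.
print(review("ann_17", (4.10, 1.92, 0.88), "ann_42", (4.12, 1.95, 0.90)))
print(review("ann_17", (4.10, 1.92, 0.88), "ann_42", (6.40, 1.95, 0.90), reviewer_pick="a"))
print(dict(reputation))
```

A production pipeline would likely replace the naive center-distance check with something like 3D IoU, weight votes by annotator reputation, and audit the reviewers themselves, but the shape of the loop stays the same: compare, escalate, and feed the outcome back into who gets trusted with the next batch.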

If we look at the trajectory of the field, low-complexity tasks such as image tagging, text ranking, and simple bounding boxes are quickly becoming relics of the early era of AI. The next frontier of model fine-tuning depends not on scale alone but on cultivating skilled, fairly compensated human contributors who understand what models cannot yet see.

The last AI winter came from overestimating algorithms; the next will come from underestimating people. The race to scale data labeling risks forgetting that human comprehension is what makes intelligence possible. The difference between another AI spring and a new AI winter will hinge on whether we treat annotators as gig workers or as partners in intelligence.

The views expressed herein are my own and do not necessarily reflect those of my employer.
