The Small-Town Youth Labeling Big AI Models
Author | Sleepy.md
In Datong, Shanxi, a city once sustained by coal that has now shaken off the coal dust, the pickaxes have left the coal seams and are swinging at another, invisible mine.
Inside the office building of Jinmao International Center in Pingcheng District, there are no longer mine shafts or coal trucks. Instead, there are thousands of closely arranged computer workstations. Shanghai Runxun Cloud Sonic Valley Big Data Smart Service Center occupies several floors, with thousands of young employees wearing headphones, staring at screens, clicking, dragging, and selecting.
According to official data, as of November 2025, Datong had 745,000 servers in operation, had brought in 69 call-center and data-labeling enterprises, and had created jobs for more than 30,000 people, with an output value of 750 million yuan. In this digital mine, 94% of the workers are locals.
It's not just Datong. The first batch of data labeling bases identified by the National Bureau of Statistics includes counties and cities in the western region such as Yonghe County in Shanxi, Bijie in Guizhou, and Mengzi in Yunnan. In the data labeling base in Yonghe County, 80% of the employees are women. Most of them are stay-at-home mothers from rural areas or rural young people who cannot find suitable jobs.
A hundred years ago, Manchester's textile factories in the UK were crowded with landless farmers. Today, in these remote county towns, young people who cannot find a place in the real economy sit in front of computer screens.
They are engaged in a futuristic yet extremely primitive piece-rate work, producing the necessary data feed for the AI giants in Beijing, Shenzhen, and Silicon Valley.
No one sees any problem with this.
A New Assembly Line on the Loess Plateau
The essence of data labeling is to teach machines about the world.
Autonomous driving needs to recognize traffic lights and pedestrians, and large models need to distinguish between cats and dogs. Machines themselves have no common sense and must have a human draw a box on the image to tell them "this is a pedestrian" before they can learn to recognize it after digesting millions of images.
This job does not require a high education level, only patience, and a finger that can click incessantly.
During the golden age of 2017, a simple 2D box could pay more than ten cents, and some companies even offered as much as half a yuan. A fast-clicking labeler could earn five to six hundred yuan by working ten hours a day. In a county town, that counted as a well-paid and respectable job.
But as large models evolved, the harsh reality of this pipeline began to emerge.
By 2023, the unit price of simple image annotation had been driven down to 3 to 4 cents, a drop of over 90% from the peak. The harder jobs fared little better. In 3D point cloud images, the points are so dense that edges can be made out only by zooming far in, and annotators must meticulously draw a three-dimensional box, with length, width, height, and orientation angle, that wraps tightly around a vehicle or pedestrian. Yet such a complex 3D box pays only 5 cents.

The direct consequence of this price plunge is a dramatic increase in labor intensity. To hold onto a monthly salary of two to three thousand yuan, annotators must keep pushing their speed higher.
This is by no means an easy white-collar job. In many annotation centers, the management is so strict that it's suffocating; employees are not allowed to answer phone calls during work, and mobile phones must be locked in storage compartments. The system meticulously records every employee's mouse movements and idle time, and if there is a break of more than three minutes, a backend warning will strike like a whip.
Even more frustrating is the accuracy requirement. The industry's passing grade is usually above 95%, with some companies even requiring 98% to 99%. Under the strictest of these rules, drawing 100 boxes and getting 2 wrong is enough for the entire image to be sent back for rework.
Video is labeled frame by frame, and when a vehicle changes lanes and is obscured, annotators have to use their imagination to pick it out in each frame; in 3D point cloud images, any object with more than 10 points must be boxed. In a complex parking-space project, if a line is drawn too long or something is missed, quality inspection will always find fault. It's common for an image to be reworked four or five times. In the end, an hour's work may earn only a few cents.
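The 2-in-100 example can be checked against each threshold mentioned above. The sketch below assumes accuracy is computed per image as correct boxes divided by total boxes, and that an image passes when its accuracy is at least the required percentage; the article does not spell out either rule, and the function name is hypothetical.

```python
def passes_inspection(total_boxes: int, wrong_boxes: int, required_percent: int) -> bool:
    """Return True if the image meets the required accuracy.

    Assumption: accuracy = correct / total, and "at least" counts as passing.
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    correct = total_boxes - wrong_boxes
    return correct * 100 >= required_percent * total_boxes


# The article's example: 100 boxes drawn, 2 of them wrong.
for required in (95, 98, 99):
    verdict = "pass" if passes_inspection(100, 2, required) else "rework"
    print(f"required {required}%: {verdict}")
```

Under these assumptions, two errors in a hundred clear a 95% bar, sit exactly on a 98% bar, and fail a 99% one, so at the strict end the margin is one or two boxes per image.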
An annotator in Hunan province posted her settlement statement on social media, showing that after a day's work, she drew over 700 boxes at a rate of 4 cents each, earning a total of 30.2 yuan.
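The arithmetic behind that settlement statement can be worked through in a few lines. This is a rough sketch: it reads "4 cents" as 0.04 yuan, and the shift lengths are assumptions borrowed from the eight-to-ten-hour days described elsewhere in the article, since the statement itself gives no hours. The function names are hypothetical.

```python
from decimal import Decimal

RATE_YUAN_PER_BOX = Decimal("0.04")  # "4 cents each", read as 0.04 yuan
DAY_TOTAL_YUAN = Decimal("30.2")     # total shown on the settlement statement


def implied_boxes(total_yuan: Decimal, rate: Decimal) -> Decimal:
    """Number of accepted boxes implied by a day's total at a flat piece rate."""
    return total_yuan / rate


def hourly_pay(total_yuan: Decimal, hours: int) -> Decimal:
    """Average pay per hour for an assumed shift length."""
    return total_yuan / Decimal(hours)


boxes = implied_boxes(DAY_TOTAL_YUAN, RATE_YUAN_PER_BOX)
print(f"implied boxes: {boxes}")

# Shift lengths are assumptions, not figures from the statement.
for hours in (8, 10):
    print(f"{hours}-hour day: {hourly_pay(DAY_TOTAL_YUAN, hours)} yuan per hour")
```

At a flat 0.04 yuan per box, 30.2 yuan corresponds to 755 accepted boxes, consistent with "over 700", and to roughly 3 to 4 yuan per hour over a shift of that length.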
This is a deeply divided picture.
On one side are the shiny tech giants at conferences discussing how AGI will liberate humanity; on the other side, in county towns on the Loess Plateau and in the mountains of the southwest, young people stare at screens for eight to ten hours a day, mechanically drawing boxes by the thousands and tens of thousands, until even in their dreams at night their fingers trace lane lines in the air.
Someone once said that the facade of artificial intelligence is a roaring luxury car, but when you open the door, you'll find a hundred people pedaling bicycles inside, gritting their teeth and pedaling hard.
No one thinks there's anything wrong with this.
The Piecework Craftsman Teaching Machines "How to Love"
After breaking through the bottleneck of image recognition, large models have undergone a deeper evolution, needing to learn to think, converse, and even show "empathy" like humans.
This has given rise to the most critical and expensive part of large model training: RLHF (Reinforcement Learning from Human Feedback).
In simple terms, it involves having real people score AI-generated responses, telling the model which answers are better and more aligned with human values and emotional preferences.
ChatGPT seems "human-like" because behind it, countless RLHF annotators are teaching it.
On crowdsourcing platforms, such annotation tasks are often clearly priced at 3 to 7 RMB per item. Annotators must give highly subjective emotional scores to AI responses, judging whether a response is "warm," "empathetic," or "considerate of the user's emotions."
Someone earning a mere couple thousand RMB per month, struggling in the mud of reality, barely able to attend to their own emotions, is now required in the system to act as AI's emotional mentor and arbiter of values.

They have to forcibly break down warmth, empathy, and other highly complex, subtle human emotions into cold scores ranging from 1 to 5. If their scores do not align with the system's predefined correct answers, their accuracy is deemed insufficient, and deductions are taken from their meager piecework wages.
This is a cognitive drain. Human emotions, morals, and compassion, so intricate and nuanced, are forced through the algorithm's funnel and, on the ice-cold scales of quantification and standardization, drained of their last bit of warmth. On the screen, you marvel at a cyber behemoth that has learned to write poetry, compose music, show care, and even put on a skin of melancholy sensitivity; off-screen, that group of once lively humans has, through daily mechanical judgments, regressed into emotionless scoring machines.
This is the most secretive side of the entire industry chain, never appearing in any funding news or tech whitepapers.
No one thinks there's anything wrong with this.
985 Master's Degree Holder vs. Small-town Youth
As low-level assembly-line work is crushed under AI's treads, this cybernetic conveyor belt is spreading upward and beginning to engulf higher-order mental labor.
The appetite of large models has changed. No longer satisfied with chewing on basic common sense, they now need to devour human expertise and advanced logic.
On major recruitment platforms, new kinds of part-time jobs have begun to appear frequently, such as "Large Model Logical Reasoning Annotation" and "AI Humanities Trainer." These jobs have an extremely high threshold, often requiring a "master's degree or above from Project 985/Project 211 universities" and covering professional fields such as law, medicine, philosophy, and literature.

Many graduate students from prestigious universities are drawn in and join the outsourcing groups of these tech giants. However, they quickly realize that this is not some easy mental exercise but rather a form of mental torture.
Before formally taking on tasks, they must read through dozens of pages of documents on scoring dimensions and evaluation criteria and complete two to three rounds of trial annotation. Even after they qualify, if their accuracy during formal annotation falls below the average level, they lose their qualification and are kicked out of the group chat.
Most suffocating of all, these standards are not fixed at all. Applying the same reasoning to similar questions and answers may yield completely opposite results. It's like working on a never-ending exam paper with no standard answer. Accuracy cannot be improved through effort or study; one can only spin in place endlessly, depleting both mental and physical energy.
This is the new form of exploitation in the era of large models—class folding.
Knowledge, once seen as a golden ladder to break barriers and climb upwards, has now become a more complex digital fodder offered to algorithms for chewing. In the face of the absolute power of algorithms and systems, the master's students from elite universities in their ivory towers and the young people from small towns on the Loess Plateau have embarked on the most bizarre convergence path.
Together, they plummet into this bottomless cyber-mining pit, stripped of their halos, their differences erased, all turned into cheap gears on the conveyor belt that can be replaced at any time.
It's the same overseas. In 2024, Apple laid off a 121-member AI voice annotation team in San Diego outright. These employees were responsible for improving Siri's multilingual processing capabilities. They once thought they stood on the edge of a tech giant's core business, only to be plunged overnight into the abyss of unemployment.
In the eyes of tech giants, whether it's a middle-aged lady running a grocery store in a small county or a logic trainer with a prestigious education, fundamentally, they are all "consumables" that can be replaced at any time.
No one thinks there is anything wrong with this.
A Trillion-Dollar Tower of Babel, Built with a Few Cents of Exploitation
According to data released by the China Academy of Information and Communications Technology, China's data annotation market reached 6.08 billion yuan in 2023 and is expected to reach 20 to 30 billion yuan by 2025. By 2030, global sales in the data annotation and services market are predicted to soar to 117.1 billion yuan.
Behind these numbers are tech giants such as OpenAI, Microsoft, and ByteDance, with valuations reaching the trillions of dollars.
However, this sky-high wealth has not flowed to those who truly "feed" AI.
China's data labeling industry has a typical inverted-pyramid outsourcing structure. At the top are the tech giants tightly holding the core algorithms; the second level consists of large data service providers; the third level comprises data labeling centers and small to medium-sized outsourcing companies scattered across the country; only at the bottom do we find the piece-rate foot soldiers, the labeling workers.
Each outsourcing layer takes a hefty cut. When a big tech company offers a unit price of 0.5 RMB, after layer upon layer of skimming, what ends up in the hands of a labeling worker in a county town may be less than 0.05 RMB.
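To get a feel for how steep those cuts are, the two prices can be turned into a retained share and an implied per-layer cut. This is only an illustration: it assumes two intermediary layers (the large data service providers and the labeling centers) and assumes each keeps the same fraction, neither of which the article states. The function names are hypothetical.

```python
def retained_share(top_price: float, bottom_price: float) -> float:
    """Fraction of the original unit price that reaches the worker."""
    return bottom_price / top_price


def uniform_cut(top_price: float, bottom_price: float, layers: int) -> float:
    """Cut c per layer such that top_price * (1 - c) ** layers == bottom_price.

    Assumption: every intermediary layer keeps the same fraction.
    """
    return 1 - (bottom_price / top_price) ** (1 / layers)


TOP_PRICE = 0.5      # RMB per item offered at the top
BOTTOM_PRICE = 0.05  # RMB per item reaching the worker (an upper bound in the article)
LAYERS = 2           # assumed: service provider, then labeling center

share = retained_share(TOP_PRICE, BOTTOM_PRICE)
cut = uniform_cut(TOP_PRICE, BOTTOM_PRICE, LAYERS)
print(f"worker keeps about {share:.0%} of the top price")
print(f"implied cut per layer: about {cut:.0%}")
```

Under these assumptions the worker keeps about a tenth of the original price, which works out to each of the two layers keeping roughly two thirds of what passes through it. Since the article says the worker may get even less than 0.05 RMB, the real cuts could be larger.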
In his book "Techno-Feudalism," former Greek Finance Minister Yanis Varoufakis put forth a penetrating viewpoint: today's tech giants are no longer capitalists in the traditional sense but "Cloudalists."
They do not own factories and machinery but algorithms, platforms, and computing power, the digital territories of the cyber era. In this new feudal system, users are not consumers but digital serfs. Every like, comment, and browse on social media is free labor supplying data to the Cloudalists.
Meanwhile, the data labeling workers in emerging markets are the lowest-tier digital serfs in this system. They not only have to produce data but also clean, categorize, and rate massive raw data, transforming it into high-quality feed that large models can digest.
This is a secretive cognitive enclosure movement. Similar to how the Enclosure Acts of 19th century England drove farmers into textile factories, today's AI wave is pushing young people who cannot find a place in the physical economy in front of screens.
AI has not flattened the class divide; instead, it has established a "Data and Blood-Sweat Conveyor Belt" from small counties in central and western China directly to the headquarters of tech giants in Beijing, Shanghai, Guangzhou, and Shenzhen. The narrative of technological revolution is always grand and magnificent, but its foundation is forever the scaled consumption of cheap labor.
No one seems to think there's anything wrong with this.
A Tomorrow Without the Need for Humans
The most brutal conclusion is approaching faster and faster.
As large models grow more capable, tasks that once required human labor day and night are being taken over by AI itself.
In April 2023, Li Xiang, the founder of Li Auto, revealed at a forum that the company used to manually label approximately 10 million frames of autonomous driving images a year, with outsourcing costs close to 100 million yuan. After it began using large models for automated labeling, what used to take a year could be done in about 3 hours.
That is 1,000 times the efficiency of humans, and it was achieved as early as 2023. Just this past March, Li Auto released its next-generation MindVLA-o1 automatic annotation engine.
A grimly accurate, self-mocking saying circulates in the industry: "The more intelligence, the more artificial." But now tech giants' outsourcing of data annotation has fallen off a cliff, dropping by 40% to 50%.
Those young people from small towns who have sat in front of computers for countless days and nights, their eyes bloodshot from the strain, have personally raised a behemoth. And now, this behemoth is turning around, shattering their rice bowls.
As night falls, the office buildings in Datong's Pingcheng District remain as bright as day. The young people on shift silently exchange their weary shells in the elevator lobby. In this folded space imprisoned by innumerable polygons, no one cares about the epic leap of the Transformer architecture on the other side of the ocean, nor does anyone understand the roar of computing power behind the hundred billion parameters.
Their gaze is welded to the red and green progress bars in the backend that mark the "passing line," as they calculate whether the meager piecework numbers can patch together a decent life by the end of the month.
On one side, the Nasdaq's closing bell rings and tech media run wall-to-wall coverage as the giants raise their glasses to the advent of AGI; on the other, these digital serfs who have fed AI with their own flesh and blood can only wait nervously, in aching sleep, for the behemoth they raised with their own hands to casually kick away their rice bowls on some ordinary morning.
No one thinks there's anything wrong with this.