A huge data breach containing information on approximately one billion Chinese residents could be one of the biggest breaches of personal information in history.
Portions of the leaked data surfaced last week on a well-known cybercrime forum. They were posted by someone selling the cache for 10 bitcoin, or about $200,000, and were allegedly stolen from a Shanghai police database stored in Alibaba’s cloud.
Although details of the breach remain scarce, parts of the data have been verified, suggesting that at least some of it is genuine. Where the data came from, and how it ended up in the hands of an underground seller whose motives are not known, remain unclear.
News of the alleged breach went largely unreported in mainland China, where speech and expression are tightly controlled and internet access is censored and severely restricted.
The breach, if genuine, raises questions about the vast scale of China’s surveillance state, the largest and most expansive in the world, and Beijing’s ability to keep such data secure.
Here’s what we’ve learned so far.
How did the data leak?
In a since-deleted post on the cybercrime forum, the seller claimed to have downloaded the data from a cloud storage server hosted by Aliyun, the cloud computing arm of Chinese e-commerce giant Alibaba. When contacted by TechCrunch on Monday, Alibaba said it was looking into the allegations.
Exactly how the data leaked is unclear, but experts say the database may have been misconfigured through human error and left exposed since April 2021 before it was discovered. This would seem to rule out a claim that the database’s credentials were inadvertently published in a technical blog post on a Chinese developer site in 2020 and later used to siphon off the billion police records, since no password was required to access them.
Bob Diachenko, a Ukrainian security researcher, told TechCrunch that his own monitoring records show the database was also exposed in late April through a Kibana dashboard, web-based software used to view and search large Elasticsearch databases. If the database did not require a password, as believed, anyone who knew its web address could have accessed the data.
Security researchers frequently scan the internet for inadvertently exposed databases and other sensitive data, often to collect bounties offered by the companies they help secure. But threat actors run the same scans, often with the goal of copying the data from an exposed database, deleting it, and offering it back for a ransom payment, an extortion tactic that criminals have used increasingly in recent years. Diachenko said that is what happened in this case: a malicious actor found the exposed database, copied and deleted its contents, and left a ransom note demanding 10 bitcoin for their return.
“My guess here is that the ransom note didn’t work and the threat actor decided to get money elsewhere. Or another malicious actor stumbled across the data and decided to put it up for sale,” Diachenko said.
Little is known about the seller or why the data was put up for sale. It is not uncommon to see large amounts of personal data for sale on cybercrime forums and the dark web, but it is rare to see data this sensitive or in such quantity.
What does the data look like?
TechCrunch examined a larger sample of data uploaded by the seller: three files totaling about 500 megabytes, each containing 250,000 individual records.
The data itself is formatted as JSON, the document format Elasticsearch uses to store and return records, which makes it easy to read and analyze. The structure of the database suggests that it was meticulously built and maintained, rather than created by simply aggregating information from multiple data sources, a common technique used by information sellers and data brokers. However, some of the data may have come from outside sources, such as food delivery orders.
The sheer size of the dataset also suggests it is likely genuine, as does its level of detail, which would be difficult, though not impossible, to fake.
TechCrunch translated the police records, which were written in Chinese, and redacted personally identifiable information.
The files appear to contain detailed police reports dating from 1995 to 2019, including names, addresses, phone numbers, ID numbers and gender, as well as the reason the police were called. Records viewed by TechCrunch include granular location details of where incidents occurred or where police reports were made, which match the precise addresses also listed in each record, along with the names of the informants who made the reports and the race and ethnicity of the individuals involved. (The Chinese government has incarcerated more than a million of its own citizens, mostly from minority Muslim ethnic groups, including Uyghurs and Kazakhs, in what the Biden administration has declared a “genocide.”)
The files contain criminal complaints and allegations ranging from violent crimes to relatively mundane offenses, such as credit card fraud, internet scams and gambling, which is illegal in China. Several records seen by TechCrunch show police cracking down on the use of VPNs, or virtual private networks, which are used to access sites blocked by China’s censorship system and are banned in China as a result. One record showed a Shanghai resident accused of using a VPN to post remarks critical of the government on Twitter, which is also banned in China. It is not known what happened to the individual afterward.
The data also contained full web addresses of photos stored on the same server. None of the photos were accessible at the time of writing, but the associated data often indicates what was uploaded, such as a person’s residency documents or their passport when leaving the country. These web addresses are formatted in a manner consistent with how Alibaba’s cloud service stores files.
Many of the records we reviewed appeared to contain information about children, based on the birthdates and ages listed in the data.
Without confirmation from the Chinese government, which is unlikely to come, it is unclear whether the seller’s claims are genuine and whether the data was obtained from the Shanghai police, as claimed. The Wall Street Journal, The New York Times and CNN have verified parts of the data by calling people whose information was found in the database, lending weight to its authenticity.
What is the impact?
If found to be legitimate, the alleged breach could be very damaging to Beijing, and it raises questions about the government’s cybersecurity measures and the impact the breach will have on the individuals whose data was exposed.
It comes at a time when China is strengthening its protections for personal data. In August 2021, China passed the Personal Information Protection Law, its first comprehensive privacy and data protection legislation, widely seen as China’s equivalent to Europe’s GDPR privacy rules. The law restricts how companies can collect personal data and is expected to have a dramatic effect on the advertising businesses of the nation’s biggest tech giants, but it allows broad exceptions for the government agencies and departments that make up China’s vast surveillance apparatus.
Beijing has reportedly already censored information about the alleged breach, with Chinese social media apps WeChat and Weibo blocking posts and messages that mention terms such as “data leak” and “database breach”. The Chinese government has yet to comment on the breach.
This isn’t the first security breach involving a massive dataset of Chinese residents left exposed to the internet without a password. In 2019, TechCrunch reported that a smart city installation in China was leaking the contents of a facial recognition database of nearby residents.
You can reach this reporter on Signal and WhatsApp at +1 646-755-8849 or email email@example.com.