Too Late, The Data Has Been Compromised
The New Data Overlords I
This could have been another Journal of Data & Society style article bashing algorithms and data, but I did not dare to construct anything too coherent or too long, so it reads more like a package of thoughts. Maybe if I had emphasised structure, or leaned more on Zotero and Grammarly, I would have sent it off to viXra, where the cranks always nod along; in its current state, though, this piece should rather stay well ‘hidden’ on email servers.
Data is the new fuel for growth in multiple industries, from manufacturing to retail to financial services. Unlike other assets, though, it doesn’t necessarily fuel job growth so much as profit growth. What should be clear is that tech platform companies are not the only ones in the digital surveillance business. Data brokers such as credit bureaus, healthcare firms, and credit card companies collect troves of data from customers, often without their knowledge.
Of course, it is no secret that the internet has led to the proliferation of data. The Web, which runs on top of the older internet, in fact started in the “data acquisition and control” group at the European Organization for Nuclear Research (CERN), in Geneva, Switzerland. It began with a computer programmer who had a brilliant idea for a new software project. In December of 1990, to facilitate the sharing of knowledge, Tim Berners-Lee started a non-profit software project that he called “WorldWideWeb”.
Data is now, and has always been, uniquely suited to “grey-market activities because they need not carry any trace of where they have come from or where they have been along the way”. Some refer to this as data laundering; I prefer data trafficking. Either way, it has been well underway for some time now. Since the early days of the internet, spyware and spam companies would trade or “sell questionably derived data to middlemen, who then add it to the databases powering the marketing campaigns of major corporations”.
Privacy is not always up to the user to defend. It needs regulation; the responsibility can’t, in any sense, be offloaded to the user. If you are one of those purists who value their privacy and have never uploaded their contacts to a site, you will soon come to see that it doesn’t matter. Any one of your hundreds of contacts who has your phone number can upload it to Facebook without your consent. Nothing you do really matters: your profile is waiting for you to appear and register, your personal information already backfilled through your friends with your associated email and phone number.
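To make the backfilling mechanism concrete, here is a minimal sketch of how contact uploads alone could assemble a profile for someone who never registered. Everything in it is an assumption made for illustration: the function name `build_shadow_profiles`, the data shapes, and the sample names and numbers are hypothetical, and it does not describe how any real platform actually stores or processes contacts.

```python
from collections import defaultdict


def build_shadow_profiles(uploads, registered):
    """Aggregate contact details for people who never signed up.

    uploads:    mapping of uploader -> list of (name, phone) address-book entries
    registered: set of phone numbers that already belong to an account
    (Both shapes are hypothetical, chosen only for this sketch.)
    """
    profiles = defaultdict(lambda: {"names": set(), "known_by": set()})
    for uploader, contacts in uploads.items():
        for name, phone in contacts:
            if phone in registered:
                continue  # already a user, nothing to backfill
            profiles[phone]["names"].add(name)
            profiles[phone]["known_by"].add(uploader)
    return dict(profiles)


# Two users upload their address books; the owner of +15550001 never signed up.
uploads = {
    "alice": [("Sam P.", "+15550001"), ("Bob", "+15550002")],
    "carol": [("Samuel Price", "+15550001")],
}
shadow = build_shadow_profiles(uploads, registered={"+15550002"})
```

In this toy run the non-user ends up with two name variants and a partial friend list attached to their number, without ever having consented to anything.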
After reading about the subject for some years, I have become slightly conspiratorial. Every so often, it feels like the ‘hacks’ we experience are window-dressing for large-scale data laundering, behind which these large companies can claim plausible deniability. Not long after the Equifax breach came the Capital One breach, with around 100 million accounts illegally accessed after a misconfiguration in the company’s Amazon Web Services setup allowed the attacker to assume an IAM role and read its S3 buckets.
Without being overly sardonic, there is some truth to the idea that after the Equifax and Facebook breaches we have already lost the most important data that citizens have entrusted to companies. These companies’ share prices are still soaring, which really disincentivises any investment in infosec. According to the IRS, hackers also stole personal information from 104,000 taxpayers. These are just a few of the most apparent examples of the last few years. It leads us to question the incentive structure embedded in the acquisition, processing, and sharing of data.
If data is the new oil, these ‘hacks’ are the oil spills that led to the EPA. We are still waiting on our EPA and, like the original, I think it will arrive a little too late. Most people in security research readily accept that everything either has been hacked or will be hacked, and that not much can be done to stop it. There is a famous quote, attributed both to former FBI Director Robert Mueller and to former Cisco CEO John Chambers: “There are two types of companies: those that have been hacked, and those who don’t know they have been hacked.” My concern is not so much with the hacking, which will not stop given the lack of organised incentive structures to prevent it. Instead, my problem is with the aggregators of data. Companies love the line “We cannot find any evidence that the stolen information has been misused”. Sure, wait 24 hours; I can assure you it was not stolen by an academic.
All (or at least a lot) of our data has been leaked; someone just has to collect and merge it. Just last year, hundreds of millions of official documents were leaked from one insurance company in the Fortune 500. These things now almost never make the news, and rarely appear ahead of an article on the latest Banksy masterpiece.
Similar to hedge funds with convoluted models, Facebook’s corporate structure, contracts, and shareholder incentives have become a behemoth that can’t easily be tamed. Every week these founders and CEOs question whether they are doing the right thing, only to be validated by increases in the stock price. It’s even more concerning when we draw the analogy and realise that the sector that has experienced the most meteoric rise in stock price over the most prolonged period is tobacco.
Again, take note: given the way the system of incentives has been set up, you have no way to control your own information. Even something as private as DNA is no longer in your own control. When my siblings signed themselves up for 23andMe, a large part of who I am appeared online. My entire family tree can now be identified to the person. Let’s also say you’re a privacy purist and a believer in caveat emptor. You might recommend that your friend drop MasterCard, which knowingly sells their data to hedge funds and the government, and instead opt for a card from Privacy.com that uses pseudonymous billing. The problem is that this comes at a cost: you won’t, for example, be able to obtain rebates and compensation for faulty products. Moreover, if your details are compromised, good luck getting your funds back.
It is ironic that Google and other data-mining companies are getting the most pushback, because they arguably have better security than most Fortune 500 companies. The reason is that they are data-first companies, so we should expect more of these concerns to emerge as more and more companies take this data-first approach. These new data-mining companies have immense power, even monopolistic power in some domains. Their customers and users are always kept on the back foot. Anecdotally, if you don’t pay for the Yelp premium service, your restaurant might magically attract a lot of 1-star reviews, which clear up as soon as you pay the extortion fee. Glassdoor is purported to play the same game.
An even crazier trend is afoot: companies have now started to jump on the security bandwagon, only to collect yet more datapoints. It is known that two-factor authentication has been used to collect phone numbers for targeted advertising. Of course, the apology soon follows these events, but by that time it is too late. We don’t change our phone numbers that often, fewer than 1% of the population would read the article concerning the leak, and all their data has by that time been sucked up into a PyTorch model.
Tim Cook, as a way to distance himself from these (other) giants, has referred to their activities as the “data industrial complex”. This won’t stop until we take strong regulatory action. We don’t really put technical expertise first in politics: Japan’s cyber-security minister admitted in 2018, for example, that he had never even used a computer. The problem also extends further than the internet. Satellite imagery with machine learning enhancements will see higher-fidelity images piling in over the next decade. That ship sailed when SpaceX got FCC approval for around 7,500 satellites back in 2018. The number of satellites will probably grow the way CCTV cameras did, as long as they serve a purpose for someone somewhere. Today it is satellites; tomorrow it is surveillance drones.
A few years ago, I gave Facebook the benefit of the doubt about their care for users’ rights, ethics, morality and all that jazz. I put the bad press down to corporate guerrilla tactics, and to the fact that the media hated what Facebook had done to their industry. Since around 2016, my opinion has completely shifted.
Facebook gets hacked more often than not, and at this point, it seems that they are losing their grip on the behemoth. In 2018 a few hundred thousand users had around 18 datapoints stolen from their profiles, including their name, email, phone number, friend lists, group memberships, education, website, religion, hometown, the ten most recent locations they were tagged in, and their 15 most recent search bar queries. The data was only reported as stolen because Facebook didn’t receive any payment in kind (tongue in cheek).
These datasets quickly lose their value because of the velocity with which they can be duplicated to reach the far corners of our interconnected earth. These hackers, probably some analytics or marketing company, participated in Facebook’s business model and didn’t end up paying for the users whose information they stole. If they had been a little less cheap, we wouldn’t even know of the breach.
It is usually around this point that people say: yeah, but why is it so bad for my data to be out there in the open? Well, let me tell you, the world is turning into a village of privilege with multiple doors guarded by digital gatekeepers. Your social media and a long list of other data is verifiably being used by credit rating agencies, banks, loan providers, marketers, and recruiters. When you get turned down for a house, a service, a car loan, a student loan, a job, or anything else of substance, you will often not even realise why it is happening and will just receive the “Dear Johnny… we’re sorry” email. For example, Australia’s Department of Human Services established a “robodebt” system that was said to be extremely punitive towards those already in poverty. The officials were able to raise their hands and say: we are not the gatekeeper, it’s simply the way the robot has been programmed. However, this intangible program led to real suicides and other harm.
As a quaint example reported by the New York Times, we have scores that determine how long we wait on a call before being put in touch with a company representative (based on income, spending, age etc.), and scores that determine whether we can return an item to a store. A company called Sift tracks a whopping 16,000 factors, using data from partnerships with the likes of Airbnb and OkCupid, to decide whether or not you can be trusted.
“Sift does have a file on you, which it can produce upon request. I got mine, and I found it shocking: More than 400 pages long, it contained all the messages I’d ever sent to hosts on Airbnb; years of Yelp delivery orders; a log of every time I’d opened the Coinbase app on my iPhone. Many entries included detailed information about the device I used to do these things, including my IP address at the time. Sift has this data because the company has been hired by Airbnb, Yelp, and Coinbase to identify stolen credit cards and help spot identity thieves and abusive behavior.”
If you don’t have an Airbnb profile with a mobile number or email attached, you are most likely at a disadvantage, and could wait twice as long as someone whose profile has good reviews and amiable text message exchanges with hosts. The corporate world is filled with “secret scores”; the Chinese social credit system is happening in the West too, but behind closed proprietary doors. We might not like the ‘executions’ happening out in public, but as long as we don’t see the panopticon, we are at ease.
When your credit score functions as a loyalty reward programme, there might be some issues. One of the many social credit scoring systems coming out of China is Ant Financial’s Sesame Credit score, which essentially works that way. Ant’s is a private scoring system, whereas local governments run their own versions. Participants with high scores earn privileges like renting a bike without leaving a deposit or deferring payment for medical expenses. The data is almost certainly being hoovered up by the government to be used for more official purposes. Among others, Sesame draws on datapoints like how much time you spend gaming (bad) and whether you are a parent (good). As far back as 2015 it also linked up with partners such as dating sites, allowing users to assess not just each other’s superficial appeal but also their social credit score.
Data has also become a gatekeeper in deciding who will, or at least could, win elections, as recent evidence has shown. It has also turned into a tool to actually win the election. The infamous Cambridge Analytica scandal has led to many more wealthy individuals wanting to play noopolitics. As Brad Parscale, who ran the Trump campaign’s digital operation, showed, social media marketing does not just work for selling products and fashion but also for selling moods and ideologies. All it takes is simple iterative A/B tests; we are, in the end, creatures in Skinner’s box. If data is the new oil, customers are the dead trees, platforms the oil rigs, alternative data companies the wells, and corporate America the faithful combustor.
Every large US carrier sells customers’ location data without consent. The data is known to be used by law enforcement, and the price of a lookup has dropped significantly: you can now get one for $300, and all you need is the phone number. You have the right to request the data that these companies hold on you; for instructions, go to their privacy policies and Ctrl-F for “request”. These companies have started honouring such requests because of the California Consumer Privacy Act, which took effect in 2020.
You should also consider checking into your LexisNexis file. It’s astounding how much information they collect about you, from medical and insurance claims to driving record, credit files, criminal records, known associates and the names of people who have used your social security number. They mostly, or maybe exclusively, collect this data from public files. An anecdote: “I had rented a house with another person who moved out after three months, and he was listed as a ‘known associate’. I didn’t know him. The rub is when the information they collect is wrong. I was in a situation where I was repeatedly turned down for jobs I was perfectly qualified for. If I hadn’t interviewed with a friend who asked me about a criminal charge that I had no knowledge of and had never committed, I would have never known about it and never been able to get it removed. And that removal process would be the subject of another story. I’m currently having the same issue in a beef with my cable company, and having inaccurate information removed from a credit file is a Herculean task. Most people just give up.”
Many say the lack of concern is due to there being no strong federal laws in the US that concern themselves with user privacy. Hospitals can share patient data as long as they follow federal privacy laws, which contain limited consumer protections; it seems that “[t]he data belongs to whoever has it.” Microsoft, IBM, and Amazon each independently have access to millions of health records as part of pilots and programmes to prove the power of their algorithms, which are really just expensive ads to lure hospitals onto their cloud systems. The fact is that the data is out there, and it took something as simple as “test your data on our algorithm” to wiggle it loose. And look, these hospitals are in dire straits; they clamour for these lucrative opportunities.
Amazon, Google, IBM and Microsoft are vying for hospitals’ business in the cloud storage market, in part by offering algorithms and technology features. To create and launch algorithms, tech companies are striking separate deals for access to medical-record data for research, development, and product pilots. Of course, that’s what the marketing teams say, but they could be up to something more nefarious. In late 2019 a whistleblower working on Project Nightingale said the personal records of more than 50 million Americans had been transferred from Ascension to Google. This data includes full personal details, including name and medical history. This was two years after the UK’s data watchdog barked at the transfer of 1.6mn records from the Royal Free Hospital in London to DeepMind. In Google’s case, the data will probably be used 4-5 years in the future, at which point the connection between the trafficked data and the new product won’t easily be made. They do want to buy Fitbit, so we might see something interesting on the horizon. One data broker, IQVIA, for example, says it has more than 800mn patient records.
“Healthcare providers can legally sell their data to a now-dizzyingly vast spread of companies, who can use it to make decisions, from designing new drugs to pricing your insurance rates to developing highly targeted advertising. Chances are, at least one of you is being monitored by a third party like data analytics giant Optum, which is owned by UnitedHealth Group, Inc. Since 1993, it’s captured medical data—lab results, diagnoses, prescriptions, and more—from 150 million Americans. That’s almost half of the U.S. population. It’s written in the fine print: You don’t own your medical records. Well, except if you live in New Hampshire. It’s the only state that mandates its residents own their medical data.”
Many of these companies are now moving into offering left-field services to grow their moats. Facebook has suggested that it’s interested in providing transaction alert tools and other services via Messenger. Some apps, like Mint and Quicken, already do this. These are financial service companies, however, so one might expect it from them, but Facebook? Once you have reached Facebook’s level of market saturation, all you can do is find new ways to collect more data. We already know that FB purchases “offline” third-party data from data brokers. We have to imagine what the implications would be of banks that also run as insurance companies (or partner with some) sharing transaction information with the insurance guys to help them identify whether a person participates in some sinful behaviour, like drinking, that might have health consequences. What should concern us about this type of digital conglomerate?
At the end of the day, banks are nothing more than product ledgers with attached identities; they realise that they are becoming artifacts and are hence forced to innovate. Nothing but antitrust regulation stops Facebook from becoming a bank. In the meantime, banks are trying to become Facebook. All banks sell your data; maybe credit unions don’t, I am not sure, I don’t have the inside scoop. But a quick check will show that many bank websites have around 30 trackers.
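One rough way to run that kind of quick check yourself is to count the distinct outside hosts a page pulls scripts, images and iframes from. The sketch below does this with the Python standard library on a made-up HTML snippet; the helper names, the `bank.example` domain and the tracker hosts are all hypothetical, and dedicated tracker-blocking tools rely on curated blocklists rather than this crude "anything off-domain" heuristic.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ThirdPartyCollector(HTMLParser):
    """Collect hosts of embedded resources that are not on the site's own domain."""

    def __init__(self, first_party_host):
        super().__init__()
        self.first_party_host = first_party_host
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "img", "iframe"):
            return
        src = dict(attrs).get("src")
        if not src:
            return
        host = urlparse(src).hostname  # None for relative URLs
        if host is None or host == self.first_party_host:
            return
        if host.endswith("." + self.first_party_host):
            return  # subdomain of the site itself
        self.hosts.add(host)


def third_party_hosts(html, first_party_host):
    collector = ThirdPartyCollector(first_party_host)
    collector.feed(html)
    return sorted(collector.hosts)


# Made-up page source for a hypothetical bank at bank.example.
SAMPLE_HTML = """
<script src="/static/app.js"></script>
<script src="https://cdn.bank.example/lib.js"></script>
<script src="https://analytics.tracker-one.example/t.js"></script>
<img src="https://pixel.adnet.example/p.gif">
"""

hosts = third_party_hosts(SAMPLE_HTML, "bank.example")
print(len(hosts), hosts)
```

Not every third-party host is a tracker, and trackers loaded dynamically by other scripts will not show up in static HTML, so a count like this is only a loose indicator.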
Facebook committed to ending its third-party merchant data integration with advertisers. Now they are partnering directly with the banks to gather this data and more, with the stated purpose of offering products based on it. What are the repercussions of a leak? Could it, for example, trigger a “data run”, where customers remove their money and data en masse? Facebook said it wouldn’t use the bank data for ad-targeting purposes or share it with third parties. “We don’t use purchase data from banks or credit card companies for ads,” said spokeswoman Elisabeth Diana. Companies like Facebook are worried about competing with services like Venmo, which might tomorrow hire a developer to add Messenger-like chat functionality that could curtail Facebook’s main revenue streams.
Google recently made an app that lets creditors lock you out of your financed phone if you don’t make payments. They have also developed the ability to add your bank account to Google Pay, which would allow them to track your spending habits, as nothing in the terms of service precludes them from doing so. The problem is that these are the same techno-utopians who want you to keep your driver’s license on your phone, and now they are also working on technology that could get you locked out of your bank account and your driver’s license.
We also live in a society that is increasingly forcing users to adopt digital means of payment, and even then, to switch from cards to mobile phones that can add more metadata to a transaction in real time. In the future, we might circle back to cash, realising that the initial state was the best state. Of course, there is movement in the crypto space, but it always seems to teeter between being too insecure and too decentralised. We might find ourselves in a Futurama episode where Bender discovers a phone booth and realises that he doesn’t have to carry his phone with him. As a user on HN puts it, “A society without cash is a society in which every person has no choice but to get the permission of someone they don’t know and will never meet each and every time they seek to obtain food, water, shelter, or transportation, and that permission can be revoked instantly, silently, and invisibly at any time.”
“Our own information, from the everyday to the deeply personal, is being weaponised against us with military efficiency,” warned Cook. “These scraps of data, each one harmless enough on its own, are carefully assembled, synthesised, traded and sold. Taken to the extreme, this process creates an enduring digital profile and lets companies know you better than you may know yourself. Your profile is then run through algorithms that can serve up increasingly extreme content, pounding our harmless preferences into hardened convictions.” “We shouldn’t sugar-coat the consequences. This is surveillance,” he added. I share the sentiment, but also revel in the hypocrisy of Apple taking billions from Google, probably the most data-hungry company of all, to make it the default search engine, or of Apple expanding into China. What he really means to say is that privacy is their comparative advantage. We have a bleak future ahead, as the war for data has turned very hot.