The algorithm used in the Facebook data breach trawled through personal data for information on sexual orientation, race, gender and even intelligence and childhood trauma
The algorithm at the heart of the Facebook data breach sounds almost too dystopian to be real. It trawls through the most apparently trivial, throwaway postings, the "likes" users dole out as they browse the site, to gather sensitive personal information about sexual orientation, race, gender, even intelligence and childhood trauma.
A few dozen likes can give a strong prediction of which party a user will vote for, reveal their gender and whether their partner is likely to be a man or woman, provide powerful clues about whether their parents stayed together throughout their childhood and predict their vulnerability to substance abuse. And it can do all this without delving into personal messages, posts, status updates, photos or all the other information Facebook holds.
Some results may sound more like the product of updated online sleuthing than sophisticated data analysis; liking a political campaign's page is little different from pinning a poster in a window.
But five years ago psychology researchers showed that far more complex traits could be deduced from patterns invisible to a human observer scanning through profiles. Just a few apparently random likes could form the basis for disturbingly complex character assessments.
When users liked curly fries and Sephora cosmetics, this was said to give clues to intelligence; Hello Kitty likes indicated political views; being confused after waking up from naps was linked to sexuality. These were just some of the unexpected but consistent correlations noted in a paper published in the Proceedings of the National Academy of Sciences in 2013. Few users were associated with likes that explicitly revealed their attributes. For example, fewer than 5% of users labelled as gay were connected with explicitly gay groups, such as the No H8 Campaign, the peer-reviewed research found.
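In broad terms, predictors of this kind work by compressing a huge, sparse table of users and their likes into a handful of components and fitting a simple regression model to those components. The sketch below is purely illustrative, using made-up data and standard Python libraries rather than anything from the study itself:

```python
# Illustrative sketch only: one way traits can be predicted from likes.
# All data here is randomly generated.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Rows are users, columns are pages; a 1 means the user liked that page.
likes = csr_matrix(rng.integers(0, 2, size=(1000, 500)))

# A binary trait the training users reported themselves (e.g. party preference).
trait = rng.integers(0, 2, size=1000)

# Compress hundreds of sparse like columns into a few dense components,
# then fit a simple classifier on those components.
model = make_pipeline(
    TruncatedSVD(n_components=50, random_state=0),
    LogisticRegression(max_iter=1000),
)
model.fit(likes, trait)

# Estimate the trait for a user who never reported it, from likes alone.
new_user_likes = csr_matrix(rng.integers(0, 2, size=(1, 500)))
print(model.predict_proba(new_user_likes))
```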
The researchers, Michal Kosinski, David Stillwell and Thore Graepel, saw the dystopian potential of the study and raised privacy concerns. At the time Facebook likes were public by default.
"The predictability of individual attributes from digital records of behaviour may have considerable negative implications, because it can easily be applied to large numbers of people without their individual consent and without them noticing," they said.
"Commercial companies, governmental institutions, or even your Facebook friends could use software to infer attributes such as intelligence, sexual orientation or political views that an individual may not have intended to share."
To some, that may have sounded like a business opportunity. By early 2014, Cambridge Analytica's CEO, Alexander Nix, had signed a deal with one of Kosinski's Cambridge colleagues, the lecturer Aleksandr Kogan, for a private commercial venture, separate from Kogan's duties at the university, but echoing Kosinski's work.
The academic had developed a Facebook app which featured a personality quiz, and Cambridge Analytica paid for people to take it, advertising on platforms such as Amazon's Mechanical Turk.
The app recorded the results of each quiz, collected data from the taker's Facebook account and, crucially, extracted the data of their Facebook friends as well.
The results were paired with each quiz-taker's Facebook data to seek out patterns and build an algorithm to predict results for other Facebook users. Their friends' profiles provided a testing ground for the formula and, more crucially, a resource that would make the algorithm politically valuable.
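The data flow described here can be sketched roughly as follows; the names and the toy predictor are hypothetical, not anything from the actual app. Quiz-takers contribute both their likes and their quiz results, which train the model, while their friends contribute only likes and have their results inferred:

```python
# Rough, hypothetical sketch of the described data flow; not Kogan's code.
from dataclasses import dataclass

@dataclass
class Profile:
    user_id: str
    likes: set[str]            # pages the user has liked
    quiz_score: float | None   # None for friends who never took the quiz

def build_training_set(profiles: list[Profile]):
    """Pair each quiz-taker's likes with the result they reported."""
    return [(p.likes, p.quiz_score) for p in profiles if p.quiz_score is not None]

def score_friends(profiles: list[Profile], predict):
    """Apply a fitted predictor to friends who only contributed likes."""
    return {p.user_id: predict(p.likes) for p in profiles if p.quiz_score is None}

# Toy usage: one quiz-taker and one of their friends.
people = [
    Profile("quiz_taker_1", {"curly fries", "No H8 Campaign"}, quiz_score=0.8),
    Profile("friend_of_1", {"Hello Kitty"}, quiz_score=None),
]
training = build_training_set(people)
predicted = score_friends(people, predict=lambda likes: 0.5)  # placeholder predictor
print(training, predicted)
```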