Last week, AOL deliberately made available online the private search histories of half a million of its users. Not surprisingly, we think, the move caused uproar and resulted in the company pulling the data over the weekend.
Among the worries that have been expressed are:
* How easy it might be to identify some of the users, even though the information was supposed to be anonymous
* The possibility that Google's recent successful court battle with the US government to keep such information private may be undermined
In addition, there's the strong likelihood that spammers and other bad hats will make illicit use of the information. And that fear remains very real even though AOL removed the data because, surprise, surprise, it's still readily available after a whole bunch of mirror-download sites sprang up!
In a piece headlined, AOL Proudly Releases Massive Amounts of Private Data, TechCrunch gives AOL a thoroughly-deserved roasting, saying,
The utter stupidity of
this is staggering. AOL has released very private data about its users
without their permission. While the AOL username has been changed to a
random ID number, the ability to analyze all searches by a single user
will often lead people to easily determine who the user is, and what
they are up to. The data includes personal names, addresses, social
security numbers and everything else someone might type into a search
box.
The most serious problem
is the fact that many people often search on their own name, or those
of their friends and family, to see what information is available about
them on the net. Combine these ego searches with porn queries and you
have a serious embarrassment. Combine them with “buy
ecstasy” and you have evidence of a crime. Combine it with an
address, social security number, etc., and you have an identity theft
waiting to happen. The possibilities are endless.
The Paradigm Shift says that the data contains,
...hundreds of searches
from people looking to kill themselves and even more scary are searches
from users that seem to be looking to commit murder.
A blog at Caltech reckons it knows what motivated AOL's action,
In their desperation to
gain recognition from the research community, AOL decided they would
compromise their integrity to provide a data set that might become
often-cited in research papers: "Please reference the following
publication when using this collection..." is the message before the
download.
The first indicator of what AOL had done looks to have come on the Geeking with Greg blog, subtitled, Exploring the Future of Personalised Information. But the author, Greg Linden, clearly hadn't considered the implications of AOL's actions. His Friday blog, A chance to play with big data, says simply,
...the new AOL Research
site has posted a list of APIs and data collections from AOL.
Of most interest to me is data set of "500k User Queries Sampled Over 3 Months" that apparently includes {UserID, Query, QueryTime, ClickedRank, DestinationDomainUrl} for each of 20M queries. Drool, drool!
You know, just the other day, I was watching a Google Tech Talk where a researcher was lamenting the difficulty of getting access to big data. It is exciting to see two of the giants, Google and AOL, making this kind of data available.
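To see why swapping a username for a random ID offers so little protection, here is a minimal Python sketch. It assumes records shaped like the fields Linden lists; the sample rows, the user IDs and the helper name group_queries_by_user are all invented for illustration and are not taken from AOL's data.

```python
from collections import defaultdict

# Hypothetical rows: the field names follow the schema quoted above,
# but every value here is made up for illustration.
records = [
    {"UserID": 101, "Query": "jane example", "QueryTime": "2006-03-01 10:00:00",
     "ClickedRank": 1, "DestinationDomainUrl": "http://example.com"},
    {"UserID": 202, "Query": "cheap flights", "QueryTime": "2006-03-01 10:05:00",
     "ClickedRank": 2, "DestinationDomainUrl": "http://example.org"},
    {"UserID": 101, "Query": "plumbers in exampleville", "QueryTime": "2006-03-02 09:30:00",
     "ClickedRank": 1, "DestinationDomainUrl": "http://example.net"},
]


def group_queries_by_user(rows):
    """Collect every query under the ID that issued it, in input order."""
    profiles = defaultdict(list)
    for row in rows:
        profiles[row["UserID"]].append(row["Query"])
    return dict(profiles)


profiles = group_queries_by_user(records)

# Each ID now carries a full search history, which is what makes
# guessing the person behind it feasible.
for user_id, queries in profiles.items():
    print(user_id, queries)
```

Because the same ID is attached to every query a person made, a name search and a local-business search end up side by side, which is exactly the combination TechCrunch warns about.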
Perhaps the most succinct take we've read so far came from our own webmaster. His email about this matter contains only a single link (to TechCrunch) and this one-liner,
Glad I'm not an AOL customer...
Thoughts? Share them with us in this thread in the HEXUS.community.
HEXUS.links
HEXUS.community - discussion thread about this news brief
TechCrunch - AOL Proudly Releases Massive Amounts of Private Data
Geeking With Greg - A chance to play with big data
The Paradigm Shift - AOL Search Data Shows Users Planning to Commit Murder
Caltech Blog - AOL Releases Search Logs from 500,000 Users
HEXUS.lifestyle.headline - New AOL video portal promises 45-plus channels