review 2017-09-21 22:11
Everybody Lies by Seth Stephens-Davidowitz
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are - Seth Stephens-Davidowitz

This is an engaging and informative book about the huge amount of data available online and what it tells us about society. I read it alongside Dataclysm and found Everybody Lies to be by far the better of the two, presenting a wealth of information in a cohesive fashion and making fewer unfounded assumptions. The author was a data scientist at Google, and draws in large part on the searches people make on the site, along with information from sites including Facebook and Pornhub.

There’s a lot of interesting stuff in the data, from the rate of racist searches in the rust belt predicting the rise of Donald Trump, to common body anxieties and whether they actually matter to the opposite sex, to an estimate of how many men are gay and whether that varies by geography (it appears not), to rates of self-induced abortions. This is a great book to read if you love unusual factoids, whether on sexual proclivities or how sports fans are made.

The author also writes in a compelling way about the uses of Big Data itself, and while he waxes evangelical about it (evidently preferring to spend all his time immersed in statistically significant data, he finds novels and biographies “too individual” and unrepresentative), there are certainly a lot of possibilities there. In health, for instance, compiling early searches about symptoms with later searches for how to handle a diagnosis can help doctors detect pancreatic cancer at an earlier stage, while epidemics can be tracked through symptom searches. The author is also interested in how applying data can revolutionize a field, discussing at length the data that predicted the success of the racehorse American Pharoah.

Overall, the writing is engaging and the book hangs together well, being informative while mostly resisting the urge to speculate. But the author does make a couple of assumptions worth pointing out. One is that people’s Google searches are made in earnest and for personal reasons. Certainly, you might search for “depression symptoms” out of concern that you or someone you know is depressed. But you also might want to be prepared in advance to identify warning signs, or might have encountered something in the media that sparked your interest, or you might be a student writing a paper on the topic. On the other hand, if you’re intimately familiar with depression already, you’re unlikely to google the symptoms. None of this means the author’s finding of a 40% difference in rates of depression symptom searches between Chicago and Hawaii isn’t relevant, but data that’s both over- and under-inclusive serves better as a starting point for research than as a definitive conclusion.

The other assumption is that everybody lies: the book insists on it, based largely on the fact that typically rosy social media posts fail to reflect all those unhappy or hateful searches. Selectively sharing information doesn’t necessarily seem to me to be lying, but the author appears invested in proving the book’s title. For instance, he discusses a particular type of tax fraud: in areas where few tax professionals or people eligible for the scheme live, only a tiny percentage of the people who could benefit from the lie actually tell it, while in areas with high concentrations of both, the rate of cheating is around 30%. The author concludes that “the key isn’t determining who is honest and who is dishonest. It is determining who knows how to cheat and who doesn’t.” This bleak view of the world fails to account for the 70% who don’t cheat even in areas with high levels of knowledge; finding that significant numbers of people cheat if they know how is a far cry from finding that everyone does.

So, like the author of Dataclysm, Stephens-Davidowitz is probably a better statistician than sociologist. But if you’re interested in Big Data, or in getting a peek at the thoughts and anxieties people ask Google about because they’re not comfortable sharing with others, this is the book I recommend. You’ll certainly get a lot of interesting tidbits from it, along with perhaps new inhibitions about typing things into Google!

review 2017-05-05 00:00
Dear Data
Dear Data - Giorgia Lupi, Stefanie Posavec, Maria Popova

If you didn't know by now, data visualization is beautiful. The correspondence of Giorgia and Stefanie is even more beautiful, and colorful, and significant. In a world of fast internet and social media, data brings people closer and makes their relationships more meaningful.
review 2017-02-26 16:28
You'll never think about WMD the same way again
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy - Cathy O'Neil

Weapons of Math Destruction is a fascinating – and terrifying – read. I love titles that work on many levels, and this one is brilliant: a clever pun, yet incredibly relevant once you start reading about the damage these mathematical models are doing to society in all areas, from the justice system to education to consumer profiling.


These models, used in all walks of life, supposedly to streamline or enhance an area, are built with inherent biases, resulting in incredible damage to many people, often the most marginalized among us. And because there is no transparency in how these models are determined or even, in many cases, an awareness that people are being modelled, there are no opportunities to push back. Even those who do demand accountability tend to be dismissed because, well, the model has spoken, and is taken as the word of God.


I had no idea how widely or how perniciously these models are being employed. This is a very important book that I think all of us should be reading. I appreciate how O’Neil lays these concepts out in an easy-to-follow manner, with plenty of real-world examples. Plus, she gets into the ethics of this kind of data mining and possible ways it could be used for good. It makes for an engaging read. I don’t read non-fiction all that often, but this was one of those books, like Freakonomics or The Brain That Changes Itself, that will stay with me for a long time.
