
A recent research paper from the MIT Senseable City Lab, called "Tasty Data," found that restaurant data alone can be used to accurately predict location-based factors such as daytime population, nighttime population, number of businesses, and overall consumer spending within a specific geography.
The researchers started by pulling restaurant data from Dianping (the Chinese equivalent of Yelp) for nine Chinese cities: Baoding, Beijing, Chengdu, Hengyang, Kunming, Shenyang, Shenzhen, Yueyang, and Zhengzhou. They then paired the Dianping data with other available data (such as aggregated mobile phone data) and used machine learning to search for correlations.
Below is a diagram of "nighttime population" in Beijing, mapped on a 3 km² grid.

If you're a regular reader of this blog, you'll know that I like these kinds of studies. By 2020, an estimated 1.7 MB of data will be created every second for every person on Earth. The numbers are staggering. And yet "official" data sources, such as census data, remain slow and fairly limited. Studies like this one continue to show us what's next.
Image: MIT Senseable City Lab


Economists at Facebook, Harvard, Princeton, and NYU recently analyzed anonymized Facebook data to study our social connectedness. The New York Times’ Upshot wrote about it, and the piece is a must-read.
There are a number of interesting takeaways from the study. One of them is that geography, distance, and political boundaries actually matter a great deal when it comes to our connectedness.
In other words, Americans are more likely to be connected to someone nearby, within county or state boundaries, than to someone farther away who may be far more similar to them. This may seem somewhat intuitive.
But having a geographically dispersed network is also associated with other outcomes. Here’s the relationship they discovered:
These networks are important in part because of other patterns that are correlated with them. Counties with more dispersed networks — where a smaller share of Facebook friends are located nearby, or among the nearest 50 million people — are on average richer, more educated and have longer life expectancies. Places that are more closely connected to one another also have more migration, trade and patent citations between them.
Counties that are more geographically isolated in the index are more likely to have lower labor force participation and economic mobility, and they have higher rates of teenage births. Some of the most economically distressed parts of the country appear to be the most disconnected: Among the 10 U.S. counties with the highest share of friends within 50 miles, six are in Kentucky.
Again, it is worth checking out the full article. There’s also an interactive map to play around with.


“A NEW commodity spawns a lucrative, fast-growing industry, prompting antitrust regulators to step in to restrain those who control its flow. A century ago, the resource in question was oil. Now similar concerns are being raised by the giants that deal in data, the oil of the digital era.”
The Economist recently published an interesting piece arguing that the world’s most valuable resource is no longer oil, but data. That helps explain why the five most valuable publicly traded companies in the world are all tech/data companies.
But the real point is that current antitrust remedies are poorly suited to this new commodity. For example, regulators today need to consider not just a firm’s size, but also the extent of its data collection.
There’s a reason firms with no meaningful revenue get acquired for huge sums. Yes, sometimes it’s just for the talent. But it’s also because of the data they control and the competitive threat they may pose.
So much of what we do today leaves a digital trace. And those traces are hugely valuable. I suspect we will be hearing more about this as the data economy continues to spawn tech giants.