Data Science & Urban Planning

Here I describe some inklings of ideas around Data Science & urban planning based on recent conversations* I’ve had, and casual reading I’ve been doing. I will touch on Las Vegas, Brooklyn, the Hubway visualization competition, and Foursquare.

Metric: Return on Community
This weekend’s NYT magazine has an article by Timothy Pratt: “‘If You Fix Cities, You Kind of Fix the World’”. That sounded like a potential Data Science problem to me. The subtitle had me thinking for a bit: “Tony Hsieh’s plan to make Las Vegas more like Brooklyn.” “More like Brooklyn” = “fixing the world”? I love Brooklyn, but still! Is our ideal that the world be one big Brooklyn? Interesting thought experiment.

[FYI: Tony Hsieh is the chief executive of Zappos.]
[Note: I read the print edition. The online version has a different title.]

In any case, it’s a worthwhile mission. Here’s a quote:

“Every factory in the world is doing everything to maximize R.O.I”–return on investment–Hsieh said. “We’re doing everything to maximize R.O.C.”

“What’s R.O.C?” [Jake] Bronstein asked.

“Return on community,” Hsieh answered.

So that’s the metric he’s optimizing for. I wonder about what data you could collect to measure that. The article isn’t really about that, but it’s the sort of thing that gets my mind going. And so that seed is planted.

Hubway Visualization Competition
In the meantime, a couple of days ago Kaz (of Data Science for Change fame) drew my attention to this data visualization competition: http://hubwaydatachallenge.org. Boston has a bike-sharing system called Hubway where people can check bikes in and out of subway stations. (New York is getting one of these bike programs too.)

Hubway’s released a data set which seems very rich. Here I’m grabbing info from their page because I think the dataset is sufficiently interesting that I want you to see it. As you read it, think about what kinds of problems you could solve with it; what kinds of questions you would ask of it (they actually give some examples):

The Hubway trip history data

Every time a Hubway user checks a bike out from a station, the system records basic information about the trip. Those anonymous data points have been exported into the spreadsheet. Please note, all private data including member names have been removed from these files.

What can the data tell us?

The CSV file contains data for every Hubway trip from the system launch on July 28th, 2011, through the end of September, 2012. The file contains the data points listed below for each trip. We’ve also posed some of the questions you could answer with this dataset – we’re sure you’ll have lots more of your own.

  • Duration – Duration of trip. What’s the average trip duration for annual members vs. casual users?
  • Start date – Includes start date and time. What are the peak Hubway hours?
  • End date – Includes end date and time. Which days of the week get the most Hubway traffic?
  • Start station – Includes starting station name and number. Which stations are most popular? Which stations make up the most popular origin/destination pairs?
  • End station – Includes ending station name and number. Which stations are the most asymmetric – more trips start there than end there, or vice versa? Are they all at the top of hills?
  • Bike Nr – Includes ID number of bike used for the trip. What does a year in the life of one Hubway bike look like?
  • Member Type – Lists whether user was an Annual or Casual (1 or 3 day) member. Which stations get the most tourist traffic, and which get the most commuters?
  • Zip code – Lists the zip code for annual members only. How far does Hubway really reach? Which community should be the next to get Hubway stations?
  • Birthdate – Lists the year in which annual members were born. Are all of the Hubway rentals at 2:00am by people under 25?
  • Gender – Lists gender for annual members only. Are there different top stations for male vs. female Hubway members?
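To make a couple of those questions concrete, here is a minimal sketch of how you might start on "average trip duration for annual vs. casual users" and "peak Hubway hours". I have not checked it against the real file: the column names (duration, start_date, member_type), the date format, and the sample rows are all assumptions made up for illustration, so adjust them to whatever the actual CSV uses.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

# Tiny made-up sample standing in for the real Hubway CSV.
# Column names and the date format are assumptions, not the official schema.
SAMPLE = """duration,start_date,member_type
600,2012-07-02 08:15:00,Annual
900,2012-07-02 08:40:00,Annual
2400,2012-07-07 14:05:00,Casual
1800,2012-07-07 14:30:00,Casual
300,2012-07-09 08:05:00,Annual
"""


def summarize(csv_file):
    """Return (average duration in seconds per member type, trips per start hour)."""
    durations = defaultdict(list)
    trips_per_hour = Counter()
    for row in csv.DictReader(csv_file):
        durations[row["member_type"]].append(float(row["duration"]))
        start = datetime.strptime(row["start_date"], "%Y-%m-%d %H:%M:%S")
        trips_per_hour[start.hour] += 1
    averages = {kind: sum(v) / len(v) for kind, v in durations.items()}
    return averages, trips_per_hour


averages, trips_per_hour = summarize(io.StringIO(SAMPLE))
# The hour of day with the most trip starts, and how many trips started then.
peak_hour, peak_count = trips_per_hour.most_common(1)[0]
```

To run it on a real download, you would pass an open file to summarize() instead of the in-memory sample.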

Time and Space Data
We’ve talked about time-stamped event data in the past. We now add the dimension of space (location). Expand your thinking now to {user, action, timestamp, location}, where in the case of Hubway, location is a station in a subway system. But location could also be recorded as latitude and longitude, for example, if a user searched on Google Maps from a mobile device.
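One way to picture the {user, action, timestamp, location} record is as a small data structure. This is only a sketch under my own assumptions: the field names, the user IDs, and the coordinates are hypothetical, and location is allowed to be either a named station or a (latitude, longitude) pair.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One time-stamped, located event: {user, action, timestamp, location}."""
    user: str
    action: str
    timestamp: datetime
    station: Optional[str] = None                   # a named place, e.g. a Hubway station
    lat_lon: Optional[Tuple[float, float]] = None   # (latitude, longitude) pair


# Hypothetical examples: a bike checkout at a station and a map search from a phone.
checkout = Event("rider-17", "checkout", datetime(2012, 7, 2, 8, 15), station="Station A")
search = Event("user-42", "map_search", datetime(2012, 7, 2, 8, 16), lat_lon=(42.36, -71.06))
```

Keeping both kinds of location in one record type means station-based and coordinate-based events can sit in the same event log.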

Now, of course, this data set is even richer and provides user-level information such as the gender, zip code and birthdate of the user, as well as bike information. It also shows start and end stations, which means that frequent routes can be captured. (Of course, we don’t know what path a biker took to go from point A to point B, but it’s a good approximation, and may not matter initially. If you think it does matter, think why and what additional info you’d want). What kinds of important problems could be solved by analyzing or visualizing this data set?
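As a starting point for the "frequent routes" idea, here is a rough sketch that counts origin/destination pairs and computes how asymmetric each station is (trips started minus trips ended). The station names and trips are made up for illustration; with the real data you would build the list of pairs from the start-station and end-station columns.

```python
from collections import Counter

# Made-up (start_station, end_station) pairs; real trips would come from the CSV.
trips = [
    ("Hilltop", "Harbor"),
    ("Hilltop", "Harbor"),
    ("Hilltop", "Square"),
    ("Square", "Harbor"),
    ("Harbor", "Square"),
]

pair_counts = Counter(trips)                       # how often each route is ridden
starts = Counter(start for start, _ in trips)      # trips beginning at each station
ends = Counter(end for _, end in trips)            # trips ending at each station

# Net outflow: positive means more trips start at the station than end there.
stations = set(starts) | set(ends)
net_outflow = {s: starts[s] - ends[s] for s in stations}

top_pair, top_count = pair_counts.most_common(1)[0]
most_asymmetric = max(net_outflow, key=lambda s: abs(net_outflow[s]))
```

A station with a large positive net outflow is one that somebody has to keep restocking with bikes, which is exactly the "top of the hill" question Hubway raises.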

Foursquare
Another nice example of time-stamped spatial data comes from Foursquare. Foursquare is a NYC-based company that allows users to “check-in” to locations around the city. Did Foursquare invent the notion of “checking in” to an app? I’m not sure. But this is the terminology that GetGlue then used to discuss “checking in” to a TV show. Blake Shaw, who has his PhD from Columbia in Computer Science, is a data scientist at Foursquare, and he recently gave a talk at DataGotham where he showed some really nice visualizations. He is also scheduled to be at Strata in NYC this week. Analysis of Foursquare data also makes its way into the academic literature. See, for example, this recent paper from Cambridge University: Talking Places, Modeling and Analyzing Linguistic Content in Foursquare.

Urban Planning & ROC
Let me tie things back to urban planning or ask you to do so: what can or should urban planners do with this type of data? The visualizations and analysis can be beautiful as well as insightful. What other data sets would be useful for urban planners? Could any of these be used to maximize ROC (return on community)? How would you even define ROC?

Students– if you want to visualize this Hubway dataset for question 1 of the homework due on Wednesday, October 24th, instead of the Flowing Data exercises, you’re welcome to do so. Though you already have a lot of other stuff on your plate.

*Recent conversations = with Professor Mark Hansen, next week’s guest speaker; Kaz Sakomoto (urban planning Masters student); and Michael Waite (a PhD candidate in mechanical engineering, who specializes in energy efficiency-related problems); not-so-recent conversation = Blake Shaw, Foursquare data scientist.

42 comments

  1. This goes to the heart of one of the questions that I have been wondering. As government budgets shrink for city planning projects, are corporations going to take on the burden? I think that Zappos and other like-minded companies are setting the tone for possible private-public partnerships in the future of city planning.

    The private and public sectors have had different opinions and expectations for as long as they have existed (ROI vs. ROC, perhaps?), and companies have a long history of planning communities, such as the company-town model of the industrializing US past, e.g. Pullman. In Japan, privatization of the rail lines has created some of the most well-maintained and dependable commutes, albeit at much higher costs than what we are used to in NYC.

    There have been studies showing positive correlations between happy workers and productivity, so it is no surprise that progressive companies like Zappos want to ensure happiness among those they employ. Creating a work environment that promotes conviviality is one aspect, but doing so on a citywide scale brings in a whole slew of variables. The Towne Terrace example can be seen as an oversight, and I’m sure it will not be the last.

    Yes, the road to building vibrant communities is not easy, and I’m not saying that Zappos should stop. Perhaps planning communities can be seen as a high-dimensional problem? The only problem is that, unlike models that operate on integers or strings, these models directly affect humans every time they are run.

    As we move forward, I still wonder whether there is enough data for a training set to plan model communities. How should we weight location and other features? I think we are getting closer to understanding how to build communities, but I do tip my hat to Hsieh.

  2. Eurry Kim

    This post reminded me of two really cool social research studies:
    1. William H. Whyte’s “The Social Life of Small Urban Spaces,” which you can watch here: http://vimeo.com/6821934
    2. A Radiolab segment on the study of cities by the Santa Fe Institute:

    http://www.radiolab.org/2010/oct/08/

    Whyte’s piece was a 1979 observational study of the utilization of public spaces (parks, plazas). He studies human behavior in the context of the structure of the public space. He tracks sun beams, height of sitting spaces, width of sitting spaces, tree coverage, mobility of sitting spaces, access to food, presence of water (e.g., fountain, pond), time of usage, … Some of the findings from his research were integrated into the regulations set by the New York City Planning Commission. Whyte was essentially studying the factors that make up Community. In light of Whyte’s study, I think Community refers to human presence. It’s this presence of human beings within a space that allows for interaction, “friction,” and serendipitous encounters. Think “Field of Dreams”: “build it and they will come.” To me, for a city to increase its ROC, then it needs to provide the types of spaces that are conducive to people being SOCIAL. Present. Outside of their gated communities. I’m reminded of my college town, Irvine. Irvine was uber-planned and you can see that the planners heavily weighted the importance of the SOCIAL in the layout of the city. Several neighborhoods surround social “centers,” which include grocery stores, shops, and always a Starbucks. So, the residents of the surrounding neighborhoods would have no choice but to travel to the center to buy groceries, watch movies, and grab grande no-foam beverages. Though, it’s a bit ironic that there is somewhat of an “Irvine Bubble,” in which people feel SO comfortable in their space that they never want to leave Irvine nor do they care much about what is going on outside of Irvine. So, while the micro-ROC is doing well, macro-ROC is not so hot.

    Meanwhile, the Santa Fe Institute is studying the characteristics of people living in cities — how many patents do they have, how much energy do they consume, how quickly do they walk/talk? And the Radiolab segment is colored with the fact that cities rarely ever die. So, there’s something about the organic population growth of a city that perpetually renews it and keeps it relevant. And while utilization of technology is probably concentrated within large cities (quick adoption of new technology -> city relevance), Community is so much more than FourSquare check-ins and Facebook “I was here” posts. I would argue that these “check-ins” actually inhibit ROC. I think we’ve all been in the situation in which a friend is updating his/her status on Facebook while he/she is at a restaurant with friends. Is this being social? Is this building community? I don’t think so. Hmm… it might be more interesting to study restaurants. How many people in each party? What time was the reservation/walk-in? What was the overall turnover? How much revenue per day? Eating is such a social activity — is there a better measure of Community?

    1. Eurry, I enjoyed reading your post. I was initially thinking of the datasets NYC provides, but they miss that crucial component you talk about. Very few datasets provide social interaction data, and those that do are limited (calendar of events, service requests).

      But as you say, colocation is not the best metric of community. In that first video, Whyte follows people in an open public space but comments on how there is not much “mixing”. Data from restaurants and public places where people meet are probably the best places to start, but I’d imagine what people are saying about the place or event (either on Facebook or Twitter or Yelp) is far more useful and much more interesting as data, where we essentially have recorded conversations that can be mined in many different ways.

  3. James McNiece

    I think Foursquare-type data will prove to be an immensely powerful tool for urban planners. It allows them to see how people actually use their communities, like which routes they follow to go about their daily business, and which services & establishments are the most in-demand. This could then be used to zone space more effectively so that roads were in the right place and the supply of public parks, restaurants & bars, retail establishments, and commercial space matched a community’s demand for them.

    To illustrate this by way of a similar example, I read an interesting article a couple of years ago in which the author used Google Earth images of a de-urbanizing Detroit to speculate on good locations for roads. He used the images to find places where citizens had worn down the grass so much that their tracks were visible from the air. Here’s a link to the article: http://www.sweet-juniper.com/2009/06/streets-with-no-name.html.

    However, even though I do believe that Foursquare-type data would make urban planning more effective, I think a similar improvement could be achieved at lower cost by loosening restrictions on development and relaxing zoning rules. This would allow the choices of citizens and entrepreneurs to determine the landscape of the city directly, as over time only those service providers that received the financial support of members of the community (or its frequent visitors) would remain in business.

  4. Yegor (it2206)

    Adding a spatial dimension to data offers obvious advantages to urban planners. For example, the ability to visualize traffic statistics and track people’s movements based on cell-phone data empowers urban planners to understand where demand lies and thus create an optimally designed transportation system. Spatial data can help businesses pick the right location and therefore grow, nourishing whole communities. Spatial data can also be used to understand people’s behavior and create optimal action plans for emergency situations, as in Sandy’s case.

    Examples of spatial-data-based visualizations that might benefit cities were also brought up by Mark Hansen during his presentation on October 24. One example was the piece of work by Spatial Information Design Lab here at Columbia University, which visualized neighborhoods and even specific blocks, from which the existing prison population comes. As it turned out, sometimes government is paying millions of dollars to keep inhabitants of some specific blocks in prisons. Such a visualization works as a call to action, stimulating us to think on how to deal with crime in general [http://www.justiceatlas.org/]. But what makes this approach especially beautiful is the way it can be extended to other areas. For example, similar data could be used to visualize where homeless people mostly come from.

    All in all, I feel that urban planners will be faced with 2 main challenges: deciding (1) what data to collect and (2) how to use data creatively in order to get useful insights. Answers to both of these will change the face of the cities in the coming years. Answering both of them will require an education that could teach urban planners how to deal with data intelligently.

    1. Well summarized. I’ll add my two cents on top of his. As is commonly said, more data usually trumps better algorithms, and I believe this idea also holds in the field of urban planning. Spatial and temporal data sets like the one from Hubway can allow planners to “visualize” answers to all sorts of interesting questions about traffic, movement, and so on. Well-designed questions, with the help of appropriate data, usually lead to sensible solutions, which lead to insights, which can then help lawmakers make the right decisions for a better society.

      Now, as Yegor mentioned, deciding what data to use to answer questions is yet another interesting subproblem. Data like Foursquare’s are volunteer-based, while data like Hubway’s are passively obtained. Depending on the questions you’re trying to answer, choosing the “right” data is important.

      Speaking of other types of data that might be useful for urban planners, I like Eurry’s idea about using restaurant data as a measure of community. But again, one should be cautious, as not everyone is a frequent restaurant-goer.

      1. An important point that Albert mentions is the difference between “volunteer-based data” and “passively collected data.” Personally, I have always struggled with interpreting volunteer-based data like Foursquare check-ins. Having been a very regular Foursquare user for about eight months before stopping, I know that my check-in behavior was influenced by what I wanted to show up on my profile. People check in at cool and interesting places where there is good 3G reception. I think Foursquare data can certainly be built into products that serve individual users of Foursquare (recommendation engines, mostly), but I still struggle to understand how a small portion of the population (Foursquare users) can really serve as a reliable proxy to help planners understand how people use different spaces and the built environment.

  5. The connection between data science and urban planning shocks me again. I have never thought about that issue before. But it is true that the return on community should be emphasized.

    With the development of society, urban planning has shifted from a focus on function to a focus on people. How urban planning can support a better life experience has been brought to the fore. In less developed cities in China, urban planning largely depends on the governor’s ideas and serves economic development. Even in big cities, urban planning has given in to the real estate business. I think China is still in the function-focused phase. Hence, if a plan has no obvious economic benefit, the government will not consider it. So what I am saying is that urban planning differs around the world. A New York planner faced with crowded traffic cannot suggest widening the road, which in China is the normal method.

    Back to our topic for this assignment: how can urban planners better utilize the data? I really like check-in apps such as Yelp. Their data could show the flow of people efficiently. For example, taxi drivers can find business in crowded areas, and urban planners can guide taxi drivers according to the data, which helps people get a cab and helps drivers get business. Data that comes directly from people is the best way for urban planners to understand their needs. With a human focus in mind, urban planners can use such data analysis to improve people’s urban life in some small ways.

    I would like to see the day when urban planners in China have a more harmonious concept. I would like to see the details in urban planning, in streets, on corners, and in parks, and to see respect for people. It may take a long time before governors have the time to bring people-related data into urban planning. But I will wait and see.

  6. The idea that corporations will increasingly drive city planning and development makes me think of the familiar concept of company towns, the difference being that urbanism is now highly desirable and the jobs are specialized and high income. The NY Times article mentions shrinking government funding of development, at which point companies like Zappos step in. As the Brooklyn lifestyle is now popular, it makes sense for a company to attempt to create a city-community atmosphere while seeking out a location that is inexpensive and not under tight control by the existing communities. There seem to be clear parallels in that they are/were declining urban areas meant to be rebuilt into thriving communities. However, I wonder if New York’s building stock, transit system, and forced density caused by high housing prices may be important missing factors in the Las Vegas neighborhood.

    Boston’s Hubway and Foursquare seem to be two great examples of how cities are particularly conducive to creating and maintaining datasets that may allow for optimized predictions and strategically driven development. I find it very exciting how Foursquare is incorporating social network info, human behavior, location characteristics, and other time-related variables in providing better activity recommendations. Both Hubway and Foursquare have the benefit of having metrics for an individual’s behavior across time. If cities are to be more effective in using data science in driving community development, planners may need greater access to individual location data across time.

  7. Jennifer Shin

    There is a difference between working with data and understanding data (as an aside, the word data can be replaced with any number of different forms of knowledge and the statement would still hold). This difference is especially important to keep in mind when we think about complex problems. In this case, perhaps we should take a moment to ask ourselves: does it make sense to consider ROC in place of ROI or to even compare ROC against ROI?

    For instance, consider the philosophy of an individual and a business. A person can choose their personal philosophy based on experience and perspective, but a business or government agency has much less of a choice. For instance, a business cannot risk choosing any philosophy that puts the business at risk of bankruptcy and the government cannot take the same liberties as a private business because it is representative of a larger organization.

    Tony Hsieh achieved success at a young age as the founder of Zappos. This provided him with both resources and opportunity to develop, test, and refine his ROC perspective. Regardless of whether or not ROC can be implemented on a grander scale, this is even less likely to occur if we oversimplify ROC into simply “an idea” with no acknowledgement of the time and attention Hsieh must have invested to get it to a point where he can express it so succinctly.

  8. Using Hubway data as a resource for urban planning is worth trying. However, even with this data in hand, I believe we still need to know more than the data itself tells us. What percentage of the whole population uses the rental system? Is there any other population stratification? Where are the community members who don’t use this system? In what ways can we integrate this data with data on other public transportation? Even though there are tons of data points in the dataset, there is still much more that needs to be taken into consideration before drawing any conclusion.

    Also, I am with Kim regarding micro and macro ROC. As they say on Radiolab, “There’s no scientific metric for measuring a city’s personality.” There should be a tradeoff when using data to help urban planning because it relates to human society. The way we use data depends on the result we want to get from it.

  9. I read this blog post two times, and it meant something totally different to me each time! Once on October 22nd, when I just thought about everything Rachel had said, and once today (which, for historical reference, is just a few days after Hurricane Sandy hit New York, and also the first day of my trip to Chicago!) and this time around, I was mostly reading between the lines, if you know what I mean.

    Chicago’s famous for its history of being burnt down and then rebuilt. A byproduct of this very unfortunate event, known as the Great Chicago Fire, was that the city was rebuilt using modern urban planning best practices. Like New York, Chicago is also famous for its skyscrapers (mainly because it was the first US city to have several such buildings). Unlike New York, these skyscrapers are not located at random locations, are not connected to each other by narrow, oddly bent streets, and do not depend on a 100 year old subway system, a several hundred year old power system, a multi hundred year old sewage system, … and you can see where I’m going with this!

    The question to which I don’t have an answer is, what would happen to Chicago if it was subject to the same conditions New York experienced during Hurricane Sandy?

    On one hand, I blame New York’s ugly urban design for many of the problems we experienced during Hurricane Sandy, from the (quite ridiculous) loss of power in the most populated part of Manhattan (why on earth should we populate an island like this, if we don’t have reliable contingency plans for providing fundamental resources to it), to the transportation issue (isn’t it true that out of three major tunnels leading to Manhattan, the one which was least affected — Lincoln Tunnel — is the one which was designed better, i.e. with both of its entrances away from the shoreline?), etc. Part of me thinks better urban planning could mean less impact from natural disasters.

    On the other hand, I’m not sure if we could really learn anything, just by looking at data, that would help us plan New York differently in such a way that the impact of Hurricane Sandy would be less, or it would take less time/money/lives to get back to a normal lifecycle in that city. As a friend of mine recently said (paraphrased), “it is not like New York City is built without a good plan in mind; in fact, they try to optimize its urban structure based on the data they collect ALL the time; problem is, they optimize it for the 10000 days without a Hurricane, not for that one day out of 30 years where a hurricane hits!!” That sounds like cold truth to me! If you take an “optimization” approach to your data, it makes sense not to “waste” money, if you will, to make a fortress out of a city which is probably hit by a major natural disaster only once every few decades.

    I’d like to take Jennifer’s opening statement to another level and say “there is a difference between working on a problem and understanding the problem”. There is so much you can learn from the past, and if all you want to do is based on experience, you will never be prepared for the unpredictable. Remember our talk in the class about whether using a machine to score essays will kill creativity? I’m afraid relying on observational data for critical matters suffers from the same problem: it kills your creativity, as an urban planner, to prepare for the unpredictable.

    I am not trying to underrate the importance of using data in such a technical field as urban planning; all I’m trying to say is this is a hammer, but not everything is a nail. There is a reason why observational data is considered low quality in life-and-death disciplines like healthcare.

    PS: Rachel said in the above that she was asking herself if “more like Brooklyn” = “fixing the world”. Having heard about the problems people living in Brooklyn are struggling with these days, I’m sure that equation does not hold. ;)

    1. Your thoughts about Chicago reminded me of Brett Goldstein, who is the Chief Data Officer for the city of Chicago. The spatial component of statistical analysis is especially important when we talk about the relationship between data science and urban planning. Goldstein got his reputation as a police officer by using 911 calls to predict locations likely to have a murder in the next twenty four hours.

      I was at a lecture in Chicago where he described the linear regression method that he used. The police department used his model to deploy more forces in a given area and at a given time based on historical 911 call data. By his measurements the approach was successful at reducing the number of murders in those areas where the officers were stationed.

      His example is a really good case for the positive impact that data science can have for urban planning. This article gives you more details about Goldstein’s story:

      http://magazine.uchicago.edu/1102/arts_sciences/byte-cop.shtml

    2. I like your idea about how urban planning could lead to a better outcome after Sandy. I just read the news about the loss, due to the damage from Sandy, of experimental mice at the NYU hospital that some PhD students had been caring for for several years for their PhD theses, and I felt horrible for them. If we can make good use of massive amounts of data and make good plans to utilize resources during these disaster periods, life will be much easier. For example, it would be great if we could show people, maybe by message, where in the city there is electricity for charging, internet to work with, and water and food that they can purchase during that time.

      Connecting to your topic about Chicago, I was born in Shanghai and lived there for 18 years. As it is a busy city crowded with people, I feel that urban planning using data analysis would be very useful there and is something the government should be thinking about. With urban planning, people can solve problems like everyday traffic jams, subway crowds, and long lines for everything.

      On the business side, I feel that Starbucks is doing a really good job of utilizing data to find good locations in crowded areas. Every time I was waiting for a bus somewhere in the city, I could always find a Starbucks for a coffee or so.

      Return on community is an interesting but somewhat ambiguous concept to me. I think of it as building a supportive system for people. I am really interested in what it means to different people and how it works in different areas.

  10. Locke and Demosthenes, qua advocatii diaboli

    I don’t like the term “Return on Community”. It’s something like an aggregated version of social capital and problematic for one of the same reasons: the disanalogy between it and the term it mimics.

    Investment is something you put in, and by means of some function, returns something you get back. But you don’t put in a community (unless of course you’re importing techies from Brooklyn); if anything, you select the community-to-returns function. Even this is ambiguous, because the community is both the input population and the interaction-environment-whatever function. It’s also ambiguous (again, pretty much exactly like social capital) what the outcomes are, and whether they are distinct from the measurement of the inputs.

    Finally, there’s no clear owner of community, so no-one to whom returns accrue. One reason the measures can’t be easily calculated is that they take different forms and are obtained by different people. On what basis should we believe that the kind of community that benefits a tech startup also benefits an established automobile manufacturer, or a public sector employer, or a hipster coffee shop?

  11. Zappos’s project in Las Vegas shows potential for the application of data science at its worst: as an instrument used by “upwardly mobile, innovative professionals” (to take a phrase from the NYT article) to create an even nicer life for other “upwardly mobile, innovative professionals,” while ignoring the plight of the disadvantaged. In Zappos’s narrow and misleading understanding of community building, two important aspects are misrepresented.

    First, it would be more appropriate to speak not of community building but of community transfer, since all of those 10,000 inhabitants projected for Zappos’s new neighborhood are already part of some community; most probably of a community at a place in the US where living is less burdensome for the environment than in Las Vegas with its water scarcity and need to run air conditioning 24/7 most of the year.

    Second, what Zappos calls community building comprises, first and foremost, community destruction, since “downtown Las Vegas has for decades been a magnet for low-rent apartments and the homeless.” That Zappos nevertheless described the area as empty is a sign of social ignorance if not of lack of humanity. Even worse is Hsieh’s offhand statement, brushing the problem away: “We made some bad assumptions. Next time, we’ll dig deeper.”

    This terrible lack of compassion also crystallizes in the term “ROC.” If we follow the analogy back to ROI, we find a concept that privileges the interests of shareholders while ignoring those of other stakeholders and the environment. In a similar vein, ROC, as far as one can tell from the project in Las Vegas, seems to privilege those who already have access to a well-developed community over those who do not.

    The real problem for America’s cities is not to make life even more agreeable for “upwardly mobile, innovative professionals,” but to eradicate the pockets of poverty and urban plight. In his recently published book “Great American City,” Harvard sociologist Robert J. Sampson shows how bad neighborhoods can keep their inhabitants hostage in vicious circles of poverty and despair. Conventional policy efforts have done little to solve this problem, and it is worthwhile thinking about how data science might contribute. What is sure, however, is that the analysis of bike data from better-off neighborhoods won’t help. The best it could achieve would be a band-aid on the gaping wound of American cities.

    1. P, you really hit the nail on the head. The NYT, every so often, feels the need to spout off articles from this real estate developer perspective that are just ridiculous. (If you are wondering why, check out the number of real estate ads in the Saturday paper.)

      This is an issue that ties together the definition of community, how data science can be used to track the framework and interaction of communities, and who is left empowered or disenfranchised by data science.

      There are opposing views on ROC. From the perspective of the real estate developer or city government, ROC is largely a factor of property value. This isn’t an entirely bad thing. Higher property values often go hand-in-hand with infrastructure development, increased public space, and new business creation.

      On the other side, there is a localized view of ROC that is very different. In this view, ROC would be largely a function of length of stay and network diameter. A long average length of stay and a short network diameter imply a deep-rooted bond within a community – even if that community is not necessarily massively revenue generating for the city or developers.

      Often – as you mentioned in Las Vegas – cities destroy communities with a high level of the latter kind of ROC in order to encourage the creation of communities with a high level of the former. It is funny that the article uses Brooklyn as a model. Of course, the Brooklyn the author of the article is talking about is really the few coastal neighborhoods that have boomed in recent years at the expense of the prior residents – places like Park Slope, Downtown Brooklyn, Fort Greene, and Williamsburg.

      Ever since the Supreme Court case Kelo v. City of New London in 2005, cities have been able to exercise eminent domain as long as the community enjoyed economic growth as a result. This means that cities can now exercise eminent domain without taking into consideration the second ROC view – and indeed NYC has repeatedly exercised it in conjunction with Forest City Ratner in Downtown Brooklyn. Basically, it works by NYC kicking out the former (read: poorer) residents and allowing development companies to build high-rises in their place.

      Finally, data science ties into all this by empowering those who are inputting data into the system. That is – it empowers the economically well off who use credit cards and smart phones, who have bicycle rental stations in their neighborhoods, who use the internet every day, etc. We see the data of the well off, while the poor disappear even more than before. Thus when we analyze the data, we ignore those who are already being disenfranchised by it.
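
A minimal sketch of the localized view of ROC described in this reply, assuming we had a table of who knows whom and how long each resident has lived in the neighborhood. The residents, the ties, and the function names are all made up for illustration; pairing average length of stay with network diameter is only the rough idea from the comment, not an established metric.

```python
from collections import deque


def network_diameter(adjacency):
    """Longest shortest path (in hops) between any two reachable residents."""
    diameter = 0
    for source in adjacency:
        # Breadth-first search gives hop counts from `source` to everyone reachable.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        diameter = max(diameter, max(dist.values()))
    return diameter


def mean_length_of_stay(years_by_resident):
    """Average number of years residents have lived in the community."""
    return sum(years_by_resident.values()) / len(years_by_resident)


# Hypothetical toy neighborhood: who knows whom, and years of residence.
ties = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "cy"],
    "cy": ["ana", "ben", "dee"],
    "dee": ["cy"],
}
years = {"ana": 12, "ben": 8, "cy": 20, "dee": 4}

print(network_diameter(ties), mean_length_of_stay(years))
```

On this reading, a long average stay together with a small diameter would suggest a tightly knit community, though collecting such data for the residents who leave few digital traces is exactly the hard part raised above.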

  12. As humans begin to interact with their environment, both real and virtual, there are some similarities in interaction that allow us to understand human behavior at scale. Architecture influences human behavior, and there are examples from the physical and online worlds to suggest that this is true. The success of the Stackoverflow Q&A engine (over other Q&A sites) in bringing together a community ranging from experts to novices who gather to ask and answer questions is an important lesson in the design of user centric systems.

    To quote Joel Spolsky of Stackoverflow from a Google Tech Talk delivered in 2009 titled, “Learning from Stackoverflow”, (http://www.youtube.com/watch?v=NWHfY_lvKIQ, the first fifteen minutes)

    “And the story is about how, when you have a group of people and you give them an environment, you don’t even have to have people, you just create an environment. Those people will come into the environment and behave according to what you built. In certain very, very subtle ways that you probably didn’t think about.”

    In the physical world, environment influences behavior, as observed in the video (http://www.youtube.com/watch?v=2lXh2n0aPyw), which shows how to get people to take the stairs. The website, http://www.thefuntheory.com/, further goes on to demonstrate that even in the real world, something as simple as fun is the easiest way to change people’s behavior for the better and this can be controlled by designing systems that encourage a particular response.

    The successful adoption of gamification techniques has been an interesting development in the evolution of social systems. Gamification is the use of game design techniques and mechanics to solve problems and engage audiences and works by taking advantage of humans’ psychological predisposition to engage in gaming. The technique can encourage people to perform chores that they ordinarily consider boring, such as completing surveys, shopping, or reading web sites. Social platforms like Foursquare and Stackoverflow have demonstrated that social systems can be useful and successful by incorporating user centric design and gamification techniques.

    In the future, data scientists and urban planners can collaborate to design living and community spaces from user generated data. Just as the data scientist’s role is to study user engagement with a product, detect patterns and weave this knowledge back into product design, I see the role of data driven urban planners as one of community building, where the community is the “product”, by placing the emphasis on human capital before laying down structures. A positive return on community, perhaps?

  13. Bianca Rahill-Marier

    I read the NYTimes article a while back, but before reading this post hadn’t really thought of it in the data science context. I think Tony Hsieh’s idea of R.O.C. is interesting and I’m not necessarily opposed to it. I don’t think many would disagree with the idea that good communities have intangible benefits for those who live in them. I’d like to pose a slightly different question: how do communities interact with other communities and with those passing through (i.e. tourists, day commuters, etc.), in addition to their permanent populations? Reading the NYTimes article, I couldn’t help but wonder how the new residents (all theoretically employees of tech and other start-ups) would interact with the existing population. More generally, how can/does/should R.O.C. incorporate both those who ‘belong’ and those who don’t? In my mind, a good community makes as little distinction as possible between the two, or it tends to be excessively isolated and exclusive, to the detriment of R.O.C. in my opinion (though I admit to the subjectivity of this statement).
    To bring this back to data science, I think it would be interesting to consider how different amenities serve different populations and how their location/implementation can be optimized to maximize R.O.C. for everyone. I think it is to the detriment of communities to be isolated bubbles, and I think excessive intentional planning can sometimes contribute to this; I can imagine that it will take a very long time for Hsieh’s Las Vegas community to blend naturally with its surroundings (or perhaps it never will and will simply overtake them?). While social media tools such as Facebook and FourSquare certainly identify the preferences of a certain population, they are far from representative of an entire community. Getting statistics on how many members of a community check in to places on FourSquare or Facebook would be a great start to deciding whether this is really a good way of measuring return on community. Like another student mentioned, I’d be more interested in measuring the flow in and out of restaurants or other establishments and the size of the parties attending. This of course targets a certain population who can afford to frequent restaurants on a regular basis, or might be more specific to tourists in the area. My favorite idea for measuring community is to measure the use of parks and attendance at community events. Both are public and free, and for a community to be successful both as a micro-environment and a macro-one (I stole this distinction from an earlier post), only such environments allow fully open interactions, both for those who spend most of their time inside the community and those who are just visiting. A magnetic strip placed over the entrance to a park could easily measure traffic without impinging on the privacy of citizens. More specific information would likely have to be provided voluntarily by users.
    Regardless, I think there are a variety of ways to measure and promote R.O.C., and I’d be interested to see them focus not only on permanent members of a community but also on the benefit the community provides to others who pass through it or even to those outside it (i.e. how much community service does a community do as a whole? etc.).

  14. When data carries not only a timestamp but also a location, the task becomes an operations research problem, which involves mathematical optimization. With this dataset, the most significant thing I can imagine doing is indeed “optimization”, because in our daily lives we meet tons of problems in scheduling buses and subways, traffic jams, etc. Take the subway as an example, which is very similar to scheduling bikes: some NYC subway lines share the same tracks, but they are not equally important in terms of the number of passengers they carry at different stations and at different times. To maximize the utility of the subway system, we could optimize it by looking at the historical data. Bikes are the same; the hardest part is to consider not only time but location as well, and to integrate and analyze them dynamically.

    1. Agreed. Location data literally adds another dimension to the possibilities of what we can do with information. Not only can it provide interesting and fun insights, it provides absolutely necessary ones. I think, with the rise in the amount of data we have, we have a huge responsibility to actually do something with it to benefit communities, cities, etc. Extending the subway example you spoke about, and that hs posted about earlier, there is a gigantic opportunity present to use data in a meaningful way. We could argue the benefits and drawbacks of how New York City and its subways were planned endlessly, but regardless, the storm that just happened shows some of the obvious ramifications of poor choices. The data we have (or, someone has) on NYC’s commuters should be considered vital to restoration efforts. Even when considering power outages we could look at similar things. Cuomo has recently been chastising the major local utility providers for their slow responses to the storm. Their planning should include any and all data they have available to analyze where the biggest outages are, and best optimize their responses.

  15. When it comes to urban planning, I came across some great research conducted by Dr. Yu Zheng, a lead researcher at Microsoft Research Asia. His web page has many visualizations of urban planning results, which I find clear and interesting.

    http://research.microsoft.com/en-us/projects/urbancomputing/default.aspx

    After looking through his research results, I think using data to analyze and make decisions about urban planning is quite feasible and useful. As he mentions in ‘Constructing Popular Routes from User Check-in Data’, we can get data from the GPS of automobiles (taxis especially) or from people’s mobile phone signals, and use these data to find ‘popular routes’ that help optimize street design. So I’m fairly confident that, for urban planning, the goal of serving people and cities (or, as Zappos puts it, ROC) can be achieved.

    But after I saw the comments on “What Happens in Brooklyn Moves to Vegas” in the New York Times Magazine, and the comment by our classmate hs, who gave his opinion after Hurricane Sandy hit New York and compared it with the situation in Chicago, I started to wonder about the definition of ‘serving’ or ‘return’. Does the data correctly and truthfully reflect which route citizens want to take, rather than which route they have to take? I’m not sure. The three comments on the article that I saw, all from people living in Las Vegas, take a negative view of the article; one of them in particular mentioned her beautiful and comfortable life in Vegas. And as hs mentioned in his comments, please allow me to quote, “all I’m trying to say is this is a hammer, but not everything is a nail.”

    The other thing I always maintain, even though I have no knowledge of urban planning and regardless of our topic, is the importance of cultural character, especially when I see the title of the article: “What Happens in Brooklyn Moves to Vegas”. I think that’s terrible, especially if this idea is applied to the whole world! Different countries have different cultures and different characters. What if New York, London, Tokyo, Paris and Beijing all looked the same? I mean that if planners are concerned only with what the data show and ignore culture, something that could make a city special might be lost. Actually, after I read the whole piece, I felt the title was a little misleading and that there are some good ideas in it. But I was really against the article when I had seen only the title.

  16. I enjoyed reading many of the posts. I am going to use a bit of marketing jargon to explain my discussion points.
    I am assuming that we have a pre-designed urban structure, on the basis of which we have developed many habits that have become the norm for shopping, entertainment, etc. Let’s consider the basic distinction between loyalty and satisfaction in marketing. Much has been written about the relationship between the two. Some studies find them to be strongly correlated, while others disconfirm the relationship or at least show intermediary factors affecting it. There has been much debate about these two and also about the concept of retention. Let me make this clear with a very simple example of product purchase behavior. There are many reasons that one might repeatedly purchase a specific product X. In one case, one is in love with product X; in another, X might best fit one’s budget (even though one might like Y better); in yet another, the preference for and satisfaction with product Y might be outweighed by the distance to the store offering it, if product X is sold much closer to where one lives, so one won’t bother to travel a long way to buy Y. There could be dozens of other reasons relating to behavioral factors. Economists would say we do whatever makes us happy in general. Thus, retention might exist even with little or no satisfaction.
    Now I would like to connect this example with urban planning and the kind of data to collect. Collecting Foursquare-type data is always useful, but might not be enough. In an established urban environment, many of the behaviors we capture through Foursquare-like data are not representative of what we need to improve: they might not reflect the best solution, because the structure intrinsically encourages or forces people to do whatever is consistent with it. They mostly represent the behavior established so far. Using this kind of data to improve urban planning might lead to only sparse improvements in efficiency. In the first and second sessions of the class, we talked about data science explaining data science. This might be a peripheral curse caused by that.
    What I think would be important, beyond time-stamped and spatial data, is to look at data of a semantic nature, as in reviews and data that users consciously generate. What might be hard or time-consuming is turning those semantics into understandable and usable data.

  17. The data that services like Foursquare provide really helps us as analysts better understand people in different communities, or so-called neighborhoods. It allows for easy tracking of where people tend to go, or even where people avoid. This is a very powerful tool for urban planning in the sense that an ideal location becomes easier to pinpoint as more data is collected. Places where traffic is most congested may turn out to be ideal spots for certain stores. Pinpointing one’s location through constantly updated apps has become almost habitual for most people, which generates data throughout the day. Someone may decide to “check in” to numerous different places every day, which can raise other users’ interest in those places. The use of this kind of data is of immediate interest to me, since I have been working on a startup project that utilizes it (it is still a little premature to explain in detail). Although I may not be able to say much about this project, it is immensely useful and powerful to be able to use time-stamped data along with a spatial dimension when trying to determine where a certain person may or may not want to go. It is in this area that communities are formed, where people with common interests or people living in the same area are able to bond and share their common likes and dislikes. Not only can we identify the popular areas that people visit, we can also pinpoint the time of day when each is most crowded, or which days are the most ideal for different people.

  18. Alexandra Boghosian

    We often take for granted the way space shapes our daily lives and social interactions. Designers, architects, and urban planners use this powerful tool to their advantage (and hopefully ours) every day. Steve Jobs leveraged it in Pixar’s buildings to get people to interact. Tony Hsieh’s plan for Las Vegas uses this tool for purposes of good; he ultimately wants to build a community. This stands in stark contrast to the urban renewal plans of the 1950s and 60s. But the means remain the same, right down to basing metrics on Jane Jacobs’ ideas for fostering creativity.

    The difference now is that the relationship between space and society can finally be quantified and tested. In fact, this has been done. There is a certain configuration of chairs in a bank that yields the most transactions. This was found by analyzing the surveillance records.

    Hsieh has a good metric for success: whether people stay. It’s easy to measure, and probably represents residents’ satisfaction with the city decently well. But this will take time to measure. I would like to see Hsieh try to generate some more real-time data in the process, to see how these relatively specific actions impact people. Maybe using Foursquare to see, for example, how people change their behavior as the city gets each new pizza joint.

    To touch on a slightly unrelated note, I fear that by trying to build a community, Hsieh will belittle the community that already exists. He seems aware of this problem, but once a space is created, the culture that it promotes quickly snowballs.

  19. I agree with the general idea that we might be able to create greater R.O.C. by cobbling together otherwise innocuous data. However, in addition to gathering data such as length of rental, distance from original rental, and peak hours for rentals, other extraneous variables should also be taken into consideration, such as the safety of the neighborhoods the rider will most likely traverse, whether or not there are bike lanes, the time of day, and the weather. Of course, this list of extraneous variables is not exhaustive, but I believe the more variables we look at outside of the data collected, the better we will understand riders’ motivations. Furthermore, in order to increase our R.O.C., we must determine our community’s demographic. Here the glaring issue would be whether the people renting the bicycles are truly members of the immediate community or simply commuters. Identifying the community will make the data collected more useful to our specific aims, whatever they may be.

  20. Companies have been involved in Cause-Related Marketing (CRM) for years. From McDonald’s efforts to raise awareness and funds for Ronald McDonald House (supporting families of children with diseases) to American Express’s 1983 campaign to dedicate funds to restoring the Statue of Liberty, companies have played a significant role in ROC efforts while still benefiting their ROI. Now, with the advent of technology and data, what appears to be most useful for a ROC effort can be tracked and fine-tuned with data. Having a world of Data Scientists can be the missing tool in this data-research-to-company-investment relationship.

    In the case of the data collected by Boston’s transportation system, there appears to be a great opportunity for Companies to invest in issues that have an immediate and possibly monetize-able solution. By overlaying Bus Routes and Non-Subway Public Transportation options we can see if the population is being underserved. If the Buses follow the same Point-A to Point-B of the cyclists, then the questions arise: Is there a reason they do not take the bus? Is there a timing/frequency of service issue with the number of Buses on that route?

    Beyond the effort to make public transportation efficient, the End-Point and Time-of-Day at which cyclists arrive, crossed with City crime activity records at the End-Point location, could drive a discussion about a need for police and/or street-lighting. In addition, the ability to recognize Parks & Rec Centers at the End-Points could provide evidence in support of greater funding.

    The metrics to capture ROC could be in the realm of sharing these correlations with City Agencies and helping the city spend its dollars in a more effective way. The number of policies put in place that lead to less traffic congestion, fewer/more buses on a route, and/or a decrease in crime rate could serve as a tool for Companies to say “We care about our city”. While holding on to their money (unlike in past CRM efforts), these organizations can play a large Data Scientist role in building up a community.

  21. The time-stamped event data discussion reminds me of one of the most useful inventions of the 21st century: the electronic map. Google Maps made a stunning launch in 2005 and has kept adding features and capabilities since, such as shortest-path directions, flexible transportation mode selection, and customizable route preferences. Recently, after Hurricane Sandy cut the power supply to millions of homes, people have been having trouble finding open gas stations in the affected states. Instead of the traditional approach of locating open gas stations with a spreadsheet that is updated daily, Google Maps can instantly show the locations of gas stations (and of course other facilities of your choice) along with a status indicating whether each is open. Needless to say, this improves our lives a great deal. But thinking beyond its convenience, what made all those features possible? The answer would be the evolution of Data Science. We used to look back on historical data and try to draw inferences from it; then we added a time component and started making forward-looking predictions; and then we went on to add the dimension of location. In recent years, with high-speed Internet connections and widely used smartphones, instant data is no longer just a dream of data scientists.

    As for the Urban Planning and Data Science topic, I think instant data availability will open up new opportunities in Urban Planning. An urban plan used to be fixed once it had been designed and completed. It was resistant to change, which made sense in the old days because any change was costly. However, instant data may add some flexibility. An example would be to monitor the traffic on roads, at crossroads, and at toll stations in real time and adjust the timing of traffic lights or the number of open toll booths. The opportunities with instant time-stamped event data are huge, and we as data scientists should learn to draw good inferences from every single piece.

    1. I agree completely with your statement on the electronic map. Another example is years ago when Katrina hit. Overlays of satellite imagery were used to help coordinate volunteer and rescue efforts, which was really key in places where roads were destroyed and such. At the time, this was a HUGE improvement over what could have been used in the past and a really big deal. (http://www.cs.cmu.edu/~globalconn/katrina.html, http://googleblog.blogspot.com/2006/07/google-earth-and-katrina-help.html)

      But at the same time, I think further use of data for urban planning can lead to ethical and validity issues, as others have pointed out. Hubway is a rental system, and while the data can probably be useful, it only shows the activity of bicycle renters. Generalizing from renters to all bikers is a pretty dangerous assumption, given that the bikes need to be borrowed and returned and people who bike frequently may choose to buy instead of rent. I’m sure this data can be used to help make Hubway’s operations more effective and help the company respond to users’ and potential users’ preferences and in its own marketing efforts, but using it to influence broader-scale political decisions seems risky because the subset involved is, as Jennifer pointed out, extremely self-selected.

    2. Jianyu W

      Yes, I agree with the opinion that data science has been shifting its focus from
      historical data to a more forward-looking orientation. The Hubway trip history
      data gives us a big picture of what the bike rental business looks like in
      Boston. We can infer from the data when the peak hour is, which station is
      the busiest, etc. However, this information is somewhat static. It’s just a
      reflection of the truth in a certain period of time. With the rapid change of the
      business world, no business stays constant for a long time. There are seasonal
      fluctuations in the tourist population, event-related shifts of people, and other
      factors that change the business daily or over a short period of time. We should
      now not only consider the trend from past data but also look forward and set up
      a plan for the future. Then it becomes true Urban “Planning”.
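
The two questions raised in this reply (when is the peak hour, which is the busiest station) can be sketched in a few lines. The records below are invented and the two-field layout is an assumption made for illustration, not the actual Hubway schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records: (start time, start station). Not real Hubway data.
trips = [
    ("2012-07-02 08:15", "Station A"),
    ("2012-07-02 08:40", "Station B"),
    ("2012-07-02 08:55", "Station A"),
    ("2012-07-02 17:30", "Station A"),
    ("2012-07-02 17:45", "Station C"),
]


def peak_hour(trips):
    """Hour of day (0-23) in which the most trips start."""
    hours = Counter(
        datetime.strptime(start, "%Y-%m-%d %H:%M").hour for start, _ in trips
    )
    return hours.most_common(1)[0][0]


def busiest_station(trips):
    """Station from which the most trips start."""
    stations = Counter(station for _, station in trips)
    return stations.most_common(1)[0][0]


print(peak_hour(trips), busiest_station(trips))
```

As the reply notes, counts like these only describe one past period; repeating them per season or per event would be a first step toward the forward-looking use.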

  22. It is clear that data science plays a vital role in urban planning. By analyzing the Hubway trip history data, urban planners can grasp the key factors for further modification of their original ideas and plans. From this perspective, I believe that data science will also be key in the market research and analysis industries.

    When reading this article, I strongly felt that urban planning shares various features with market analysis. Market research and analysis need to predict future buying behavior or consumer insights along the same sort of data-oriented analytical path, which mostly depends on data science. And market analysts have to deal with tons of consumer behavior data sets every day, just like the urban planners described in this article. However, although it is the new generation of powerful data-analysis skills, data science is not yet popular in the market analysis industry, and this situation needs to change. Data science will certainly add great value to traditional market research, just as it has in urban planning. All we need to do is discard the traditional bias against machine learning and simplify the process of conducting a specific data analysis so that market analysts can acquire this skill quickly.

  23. Anonymous

    2.) Data Science can definitely be used in the planning of mass transit. A simple example is the NYC subway. During peak hours, the MTA runs more trains, and during other hours the trains are more spaced out. This of course is subject to limits in the system: there can only be so many trains on a track during peak hours, and there is a desire to space out trains only so much in the early hours of the morning. Not only is this used to optimize the train system to get the maximum number of people on a train without it being too crowded, but it can also help the average commuter. Apps such as embarkNYC help commuters find out when the next train is coming. This is especially important early in the morning, when trains can be as much as 30 minutes apart, and it helps reduce the commuter’s wait. This is great for existing systems, but it is much harder for planning new ones such as Hubway in Boston. One overlooked fact is that pre-Hubway, Boston developed biking areas heavily, to the extent that a lane of car traffic on the heavily used Beacon Street was removed to make a bike lane. This had the effect of forcing people to ride a bike, thus inflating the number of users.

    1. In addition to apps that simply report wait times, such as embarkNYC, more advanced services like HopStop have actually recommended alternative routes while I am on the subway (assuming I manage to get some precious cell signal). I think this is a great use of optimization. I am able to get to my desired location and cause less congestion.

      In terms of biking as transportation, the addition of bike lanes has taken a while to reach positive consensus. However, many do think it is a good idea now, and I think that this will help in bringing more people to biking as a way to get to work. Both Boston and New York started their bike lane programs at around the same time, but Boston seems to have a more developed biking network and atmosphere. There are most likely multiple factors for explaining the differences, but I think it would be fair to consider the bike share program (or lack thereof) as a possible factor.

      http://www.nytimes.com/2012/08/22/nyregion/most-new-yorkers-say-bike-lanes-are-a-good-idea.html

      http://www.cityofboston.gov/bikes/statistics.asp

  24. I would organize the possible uses of the Hubway data into two groups. First, I think there are questions that can be answered from the data to increase the operational efficiency and effectiveness of the program. Using this data from a systems perspective, we can see the flow of bikes from one point to another on different days, predict the number of runs needed to transport bikes back and forth, and even out the location of docked bikes to minimize the time during which there are no bikes at a given port. As others have mentioned, we can also cross-reference the bike numbers with repair records to better understand which routes are harder on bikes and more likely to result in breakage or flat tires needing repair. The second group of uses for the data is planning expansions of bikeways. Though the data does not indicate the actual paths that the bikes traveled, it can be used to assemble a spatial network that shows end points and start points as nodes and the trips as edges. The edges that cut across areas with the fewest roads and paths safe for bikes can be identified as ideal candidates for expansion of bikeways.
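
A rough sketch of the spatial network idea above: stations as nodes, trips as edges weighted by how often they occur. The trips and the `safe_paths` lookup (number of bike-safe routes between two stations) are hypothetical; in practice that second table would have to come from a separate street or bike-lane dataset.

```python
from collections import Counter

# Hypothetical (start station, end station) pairs. Not real Hubway data.
trips = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "B"), ("C", "D")]

# Hypothetical count of bike-safe routes between each pair of stations.
safe_paths = {("A", "B"): 0, ("A", "C"): 3, ("C", "D"): 0}


def trip_edges(trips):
    """Count trips per station pair, ignoring direction of travel."""
    return Counter(tuple(sorted(pair)) for pair in trips)


def expansion_candidates(edge_counts, safe_paths):
    """Order edges by fewest safe paths first, then by most trips."""
    return sorted(edge_counts, key=lambda e: (safe_paths[e], -edge_counts[e]))


edges = trip_edges(trips)
print(expansion_candidates(edges, safe_paths))
```

Sorting by safe-path count first and trip volume second is just one possible ranking; a weighted score combining the two would be an equally reasonable choice.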

    Something else that this discussion makes me think of is the opportunity that data scientists have to actually collect data. In my experience with this class and others, and more generally in the data science space, there is an emphasis on hacking on existing data. The result is that in things like Kaggle competitions and hackathons, there are lots of people working on one dataset when there is a huge opportunity to take a question-driven approach to data science rather than an I-have-this-one-data-set-in-front-of-me-what-are-the-questions-that-this-can-help-me-answer approach. There are many opportunities to easily design methods of collecting information by using cheap webcams and vision software, learning what freedom of information requests to file, or using cheap RFID tags.

    This, I think, points to the importance of experimentation, which we have briefly touched on as a skill that’s part of data science. From what I gather from this class, it seems that experimentation skills get lumped into either domain expertise (knowing what to explore or test for) or statistics (having a robust understanding of how to design statistically valid experimental procedures).
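
The station network described in comment 24 can be sketched in a few lines. This is only a rough illustration, assuming each trip is reduced to a (start station, end station) pair; the function name `build_trip_network` and the sample station names are made up for the example and are not part of the Hubway data set.

```python
from collections import Counter

def build_trip_network(trips):
    """Build a weighted directed network from (start_station, end_station) pairs.

    Returns:
      edge_counts: number of trips observed on each directed edge
      net_outflow: departures minus arrivals per station; a large positive
                   value suggests a station that tends to run out of bikes
    """
    edge_counts = Counter(trips)
    net_outflow = Counter()
    for (start, end), n in edge_counts.items():
        net_outflow[start] += n  # bikes leaving the start station
        net_outflow[end] -= n    # bikes arriving at the end station
    return edge_counts, dict(net_outflow)

# Hypothetical sample trips between three made-up stations
sample_trips = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "C"), ("C", "A")]
edges, net = build_trip_network(sample_trips)
print(edges.most_common(1))  # busiest directed edge
print(net)                   # per-station imbalance
```

Stations with a strongly positive net outflow would be candidates for extra deliveries of bikes, while heavily used edges could be compared against a map of existing bike lanes.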

  25. This blog (and time & space data as used in cities in general) made me think of several things, a little disjointed at this point but many of which have been touched upon in previous posts. The main common thread is that in urban spaces resources are frequently limited and infrastructures strained such that wherever ameliorations are possible for people both as consumers and investors the situations are frequently win-win, and data science has its important part to play.
    Firstly, the topic at once reminded me of the article ‘The laws of the city’ that appeared in The Economist this summer, in which the director of CASA sees a prospect for a ‘science of the city’, or systems science. Whilst the focus is frequently on infrastructural improvements and organisational effectiveness, much of the aim is to improve the lives of the urban population rather than to provide a return on investment. The unprecedented concentration of populations (both labour and consumer markets) in large cities and, increasingly – with a wave of growth that is shifting the world’s economic balance east to the emerging world – in megacities, means that the allocation of (limited) resources increases in importance, be it people, housing (land), financial capital or natural resources. While this poses challenges to societies and businesses, solutions may be found in the data generated by these centres of economic and demographic gravity. http://www.economist.com/node/21557313
    On a different note, the Hubway data is similar to the Barclays bikes data available in London. This, in combination with data available from Oyster cards – the British electronic equivalent of the NY MetroCard that stores all journeys and times electronically, as well as personal information – has recently provided much more information on how Londoners and tourists travel around the capital.
    Lastly, in the context of recent discussions on who is empowered by data science and of Hurricane Sandy last week, this post and the value of time and space data made me think of crisis mapping – an emerging field that brings disparate bodies, disciplines and backgrounds together around a community-led endeavour – which is another example of an attempt at ROC. See this paper for more details on crisis mapping: http://www.tandfonline.com/doi/pdf/10.1080/15420353.2012.662471

  26. What can or should urban planners do with this type of data? What other data sets would be useful for urban planners? Could any of these be used to maximize ROC (return on community)? I believe urban planners can take advantage of this type of data throughout the whole policy development process for improving communities. According to the Association of College & University Policy Administrators (ACUPA), there are 3 steps in the process: Predevelopment, Development, and Maintenance. In Predevelopment, the priority is to first identify issues and then conduct analysis. For example, in the Hubway case, the dataset can be used to identify existing conditions such as which stations have the most traffic in different time periods. This allows urban planners to perform Predevelopment more specifically for each location. Rather than addressing the system as a whole without specific data, they are now able to see nuances in usage. Following Predevelopment, and before implementation of Development, policy decisions are made using the data.

    Development and Maintenance include agreeing on common definitions and terms, obtaining approval of involved parties, planning communication and education, and, most significantly given our data-driven society, putting accessible relevant data online, developing plans of active maintenance and review, encouraging users to provide feedback, and measuring outcomes by monitoring or testing. Data regarding community development can be critical in avoiding misunderstandings about scope, timing, responsibilities and ownership when communicating with the public. In this respect, it is vital to make online data accessible, interactive and real-time for the public. For instance, if an urban planner is creating a pedestrian zone, he or she can share data online regarding decision making and, at the same time, receive interactive feedback from a variety of social networks on a regular basis. This encourages him or her to interact with the public concerning the current situation and improve the policy. To be specific, having an easy and visible way to invite feedback will assist in the maintenance process. In other words, user involvement will help communicate to users that their help is welcomed and that they have an opportunity to offer suggestions for improvement. Thus, I believe having policy data online is the most effective way to make data available. Lastly, urban planners should consider developing a measure to quantify the usefulness of the policies, such as the number of hits on the web site or a log of phone calls with questions or suggestions for improvement.

    All in all, I think data have a significant influence on the whole policy development process for improving communities: the results of one cycle can feed into the next Predevelopment phase, leading to a desirable ecosystem of better policy development and ultimately maximizing ROC (return on community).

  27. As data scientists, we should always ask ourselves, “What kinds of important problems could be solved by analyzing or visualizing this data set?” Keeping, visualizing and analyzing data shouldn’t be done just because it’s cool.
    In this case, I think we should keep track of the bike route, rental duration, ID, start and end point, and reason for using the bike. By analyzing these data, we can solve the problem of how the bikes could be distributed so that they are used more efficiently, for example by encouraging people to borrow from nearby stations where more bikes are returned.

  28. I think the challenge of shaping our cities was taken up long ago by engineers. However, using time-stamped data to analyze the components that would go into the making of an ideal city is an exciting challenge. An important question to ask in the whole exercise is whether the expected R.O.C should be adjusted according to the neighborhood distribution of the community. That is, is it fair to expect similar standards of R.O.C for regions with varying levels of infrastructural capabilities?

    With respect to data science in this regard, it is worth remembering that the Institute of Data Science established at Columbia University has designing smart cities as one of its focus areas. This goes to show that big data is catching on in academic urban planning and engineering too. Hubway is an excellent example of how data could be used to make an optimal plan for cities. However, in my opinion, the issue of R.O.C goes beyond just transportation optimization problems. As a metric for R.O.C we should also assess other parameters, for example how much a community is contributing towards minimizing pollution, reducing greenhouse gas emissions, etc.

  29. Yige Wang (yw2511)

    Reading the above post, which covers several topics, prompted a few thoughts.

    First of all, it is an interesting idea that we could take advantage of the power of data science to perfect urban planning, although it seems that we are not exactly sure how we could put this thought into action. However, before answering the question “what data you could collect to measure return on community”, I think it is crucial to answer, “what is R.O.C”? Does it refer to the development and improvement of the hardware facilities of a community, or does it refer to the overall well-being of the population that resides within the community? Obviously, what type of data we should collect largely depends on this definition of R.O.C. If we stick with the latter definition, data such as the amount of planted area, access to various facilities (stores, gyms, parking lots), noise pollution, and individuals’ ratings of the community should all be taken into consideration.

    In terms of data visualization of urban planning, I think it could be done in a more vivid way. This reminds me of what Prof. Mark Hansen showed us in his lecture: using the vegetation of trees in different areas to visualize the urban/environmental condition.

    Secondly, some thoughts about the Hubway trip historical data. I think we could use the “duration” metric to distinguish user groups: who are the heavy bicycle riders, the medium riders and the light riders. Based on that, it might be possible to find a way to turn light riders into medium riders, and medium riders into heavy riders. To achieve the latter goal, more data needs to be collected, such as people’s preferences and perceptions of the bicycle vs. other vehicles.

    Furthermore, besides using “start station” and “end station” to see which stations make up the most popular origin/destination pairs, we can also conduct a deeper analysis of the specific features of the sights along the trip between those popular pairs. We might then learn for which particular types of tour travelers have a higher demand for travelling by bike, and develop more corresponding routes.
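
Both ideas in comment 29 (grouping riders by duration and ranking origin/destination pairs) can be sketched roughly as follows. This is a minimal illustration under loud assumptions: the `rider_id` field is hypothetical (the comment does not say the trip records identify individual riders), and the one-hour and four-hour thresholds are arbitrary placeholders rather than values taken from the data.

```python
from collections import Counter, defaultdict

def segment_riders(trips, light_max=3600, medium_max=4 * 3600):
    """Label each rider light/medium/heavy by total riding time in seconds.

    trips: iterable of (rider_id, duration_seconds, start_station, end_station).
    rider_id is a hypothetical field; the thresholds are arbitrary.
    """
    totals = defaultdict(int)
    for rider_id, duration_s, _start, _end in trips:
        totals[rider_id] += duration_s

    segments = {}
    for rider_id, total in totals.items():
        if total <= light_max:
            segments[rider_id] = "light"
        elif total <= medium_max:
            segments[rider_id] = "medium"
        else:
            segments[rider_id] = "heavy"
    return segments

def top_pairs(trips, k=3):
    """Return the k most frequent (start_station, end_station) pairs."""
    pairs = Counter((start, end) for _rider, _dur, start, end in trips)
    return pairs.most_common(k)

# Made-up sample records for illustration only
sample = [
    ("r1", 600, "A", "B"),
    ("r1", 900, "B", "A"),
    ("r2", 3000, "A", "B"),
    ("r2", 3000, "A", "B"),
    ("r3", 20000, "C", "A"),
]
segments = segment_riders(sample)
busiest = top_pairs(sample, k=1)
print(segments)
print(busiest)
```

In practice the thresholds would more sensibly come from the observed distribution of durations (quantiles, for instance) than from fixed cut-offs.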

  30. I think one area in which data science could be utilized is when analyzing the impact of the built environment on health outcomes. For instance, neighborhoods and even regions within neighborhoods can differ in how “walkable” they are. Some of the attributes of a neighborhood which can impact this are access to public transit, the presence of sidewalks, population density, or the land use of surrounding buildings. Neighborhoods which are more walkable have been found to have lower rates of obesity and other positive health effects. Some studies have found that even more micro-level attributes may impact the health of a neighborhood’s inhabitants. Some examples are the cleanliness of sidewalks, the presence of sidewalk cafes, and the speed limit.

    When urban settings are constructed or modified, measures like these should be taken into account and optimized. If a new public transit station is to be built, in addition to its impact on traffic flow, its impact on public health should be considered as well. The same could be said when adjusting speed limits. When issuing permits for outdoor cafes, a look at the numbers may suggest that adding an outdoor cafe to a certain street may not just be a good business move, but could also get people in the area to walk more often. Thousands of similar decisions both big and small could use the input of data scientists to help estimate the impact they could have on health.

  31. Though I want to leave aside the ridiculous title altogether, it’s indicative of and consistent with the issue with the Downtown Project’s core stated principle. Just as the article has nothing to do with Vegas being like Brooklyn, the Return on Community (R.O.C.) has little to do with community. Both are marketing tools. Hsieh’s supposed rejection of R.O.I. is an attempt to improve his R.O.I. Strengthening a community does not include allowing an existing community to exist for one more year, then replacing it. But this could go on ad nauseam and is a digression from what is of interest in this course.

    Pratt’s piece does note the importance of “unplanned interactions”, consistent with the literature on this subject, including in urban planning and other sociological research. How to monitor unplanned interactions? Pedestrian traffic could be monitored similarly to vehicular traffic, which is counted using cables laid across the road. Perhaps piezoelectric sensors could be embedded in sidewalks to measure vibrations and map temporal and spatial patterns of pedestrians.

    As for developing community infrastructure: Hsieh’s Post-It note practice made me think of ways to identify what a community wants and where it wants it. Perhaps information on Yelp-type searches for businesses (actual search item and position of user at the time of the search) would be useful in identifying what people are looking for. Even without personal information, inhabitants of the community could be separated from visitors by identifying those who perform regular searches from somewhat stationary positions in residential buildings.

    A sense of community is likely to come from the expressions of the inhabitants themselves. It seems to me this can’t be understood without surveying and more active means of monitoring.

  32. Luyao Zhao

    To improve ROC in the case of Hubway, in my opinion, is to try to make sure that everyone who needs a bike at any time can have access to one, and at the same time to try to raise the utilization rate of bikes (i.e. reduce the waste of idle bikes). Therefore, urban planners need the data to decide the optimal total number of bikes and to optimize the allocation of bikes to different stations.

    I think it is important to track the check-in time, check-out time, peak time and number of bikes around peak time at the busy Hubway stations. In some crowded cities in China, we have similar bike systems, and I found that a large group of the people who use these bikes are those who take the subway to work every day. For example, Jennifer lives in an apartment near Station A (see below) and she doesn’t own a car. So every day she takes the subway from Station A to Station B and then rides a bike to her office. After work, she rides the bike back to Station B, returns the bike and takes the subway home. Therefore, if there is a CBD (or there are a lot of office buildings) within riding distance of a subway station (marked as the orange circle below), the demand for bikes there is usually high. Urban planners can track the number of bikes at these busy stations as well as the number of bikes at stations that are not busy. In this way we can optimize the number of bikes at each station to ensure everyone who needs a bike has access to one and raise the utilization rate of bikes, thus improving the ROC.

    Another large group of Hubway bike users are those who live near a subway station and use the bike to go to places not far away. For example, Matt lives near Station B (see below). He usually borrows a bike from Station B to ride to his office, as Jennifer does. Besides, he also uses the bike when he wants to go to a nearby supermarket, a gym, a post office, etc. Therefore, if there are a lot of people living in one area and there are many amenities (like grocery stores, gyms, supermarkets, post offices) within riding distance of a nearby subway station (marked as the orange circle below), this area should also be equipped with more bikes.
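
One simple way to turn the check-in and check-out tracking suggested in comment 32 into a stocking number is sketched below. It assumes a station's activity can be written as a time-ordered list of +1 (bike returned) and -1 (bike checked out); the function name `min_starting_bikes` and the sample morning are hypothetical, and the sketch ignores dock capacity and mid-day rebalancing.

```python
def min_starting_bikes(events):
    """Smallest opening stock so that no checkout ever finds the station empty.

    events: time-ordered sequence of +1 (bike returned) and -1 (bike checked out).
    The answer is the depth of the lowest running balance, or 0 if the
    running balance never drops below zero.
    """
    stock = 0
    lowest = 0
    for change in events:
        stock += change
        lowest = min(lowest, stock)
    return -lowest

# Hypothetical morning rush at a station near offices:
# three checkouts, one return, then one more checkout
morning = [-1, -1, -1, +1, -1]
needed = min_starting_bikes(morning)
print(needed)
```

Running this over each station's daily event log would give a per-station lower bound on opening stock, which planners could then compare against the number of bikes actually docked there.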
