The CEO of RealDirect.com, Doug Perlson, visited on Wednesday so you could ask him questions to help better inform your (hypothetical) data strategy for RealDirect in your (hypothetical) capacity as Chief Data Scientist for HW #1, question 2. I really appreciate (non-hypothetical) Doug taking his time to come talk to us! Here are questions you asked me, followed by some I am asking you:
Were the types of questions we (the students) asked at the level you (Rachel) were expecting? Were they technical enough?
To begin with, I was pleased that you were engaged, asking questions, demonstrating that you had given thought to the problem. Given Doug is the CEO/Founder, and not a data scientist, the level of conversation was where I expected it would be, meaning he is concerned with the big picture and strategy, and not the details of how to implement a specific algorithm, nor writing up the code to log user interactions with the site, for example. Talking to a CEO/Founder of a start-up is a good way to get the overall vision. They’re almost universally enthusiastic about their business because it’s “their baby” and something they have been building from the ground up. It remains to be seen whether the information you gathered during this stage was technical enough for you to propose a detailed data strategy, but you also are forming your own opinion in other ways (exploring the site, proposing logging, exploring the NYC Housing Data). Part of the purpose of talking to the CEO is to gain intuition about the product. In practice, if you actually were the data scientist, this would be an iterative process and you’d also be talking to engineers. (Sometimes the CEO is an engineer, which can change things.)
Who are the other speakers coming?
The other speakers coming are in the syllabus. We’ll give you more info about each of them before they arrive. They are all data scientists, and so the conversations you have with them will be at a technical level involving implementation details and best practices. Occasionally we may have other visitors, depending on how things unfold.
Please reflect on the following (in your head or as a discussion below):
— Being the “data scientist” often involves speaking to people who aren’t also data scientists so it would be ideal to have a set of communication strategies for getting to the information you need about the data. Can you think of any?
— Most of you are not “domain experts” in real estate or online businesses. Does stepping out of your comfort zone and figuring out how you would go about “collecting data” in a different setting give you insight into how you do it in your own field?
— Sometimes “domain experts” have their own set of vocabulary. Did Doug use vocabulary specific to his domain that you didn’t understand? (“comps”, “open houses”, “CPC”). Sometimes if you don’t understand vocabulary that an expert is using, it can prevent you from understanding the problem. It’s good to get in the habit of asking questions because eventually you will get to something you do understand. This involves persistence and is a habit to cultivate.
— Doug mentioned they didn’t necessarily have a data strategy. There is no industry standard for creating one. As you work through this assignment, think about whether there is a set of “best practices” you would recommend with respect to developing a “data strategy” for an online business, or in your own domain.
I think everyone from the class has something to offer on these so I invite you all to answer. I’m hoping to hear the perspective of business or journalism students. (Don’t journalists have to figure out how to ask the right questions to get to the bottom of things? Don’t business people like to think about strategies?)
From my perspective, getting the information you would need to form a data strategy is basically the same process you would go through when designing an information system. There’s been a bit of research in requirements gathering (http://en.wikipedia.org/wiki/Requirements_elicitation) for software, and the best practices include things like use cases and wireframes, which show the client what you think they want and let you work out any issues in communication before actually developing the system. I think this could translate well to forming a data strategy.
One thing I find very helpful as a market research analyst, when I am working with data from an outside source and don’t understand something, is making a point of asking more general, open-ended questions (‘how do you get this data?’ rather than ‘is this a real time feed?’). It tends to result in more useful answers and helps reduce confusion (e.g., the client and I having different definitions of ‘real time’).
We also normally summarize the client data we were sent and share the summary with them. (This is basically EDA: charts and tables of activity by year. I think it’s important to note that this EDA is meant to help someone else understand the data, so it needs to be client-friendly.) We ask them to look at it and make sure it makes sense to them before proceeding with any model building. This helps catch things like missing data and data processing errors, and it draws out comments from clients like ‘that thing that looks weird in the data is correct, but it’s because of this thing you should find a way to control for’, which can become really useful later on.
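For anyone who wants to try this kind of client-facing summary, here is a minimal sketch of a "rows and missing values by year" table in plain Python. The field names (`year`, `sale_price`, `neighborhood`) and the sample records are made up for illustration; they are not from RealDirect or from any real client data, and a real summary would likely include charts as well.

```python
def summarize_by_year(records, year_field, value_fields):
    """Count rows per year and, for each field, how many values are missing.

    `records` is a list of dicts. None or "" counts as missing.
    Rows with no year are grouped under the key "unknown".
    """
    summary = {}
    for rec in records:
        year = rec.get(year_field)
        key = "unknown" if year in (None, "") else year
        entry = summary.setdefault(
            key, {"rows": 0, "missing": {f: 0 for f in value_fields}}
        )
        entry["rows"] += 1
        for f in value_fields:
            if rec.get(f) in (None, ""):
                entry["missing"][f] += 1
    return summary


def format_summary(summary, value_fields):
    """Render the summary as a tab-separated table a client can read."""
    header = ["year", "rows"] + [f"missing_{f}" for f in value_fields]
    lines = ["\t".join(header)]
    # Sort on the string form so the "unknown" group can sit next to numeric years.
    for key in sorted(summary, key=str):
        entry = summary[key]
        row = [str(key), str(entry["rows"])]
        row += [str(entry["missing"][f]) for f in value_fields]
        lines.append("\t".join(row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample records, invented for this example.
    sample = [
        {"year": 2011, "sale_price": 450000, "neighborhood": "Chelsea"},
        {"year": 2011, "sale_price": None, "neighborhood": "Chelsea"},
        {"year": 2012, "sale_price": 610000, "neighborhood": ""},
        {"year": None, "sale_price": 300000, "neighborhood": "Harlem"},
    ]
    fields = ["sale_price", "neighborhood"]
    print(format_summary(summarize_by_year(sample, "year", fields), fields))
```

The point of keeping the output this simple is that the client can scan it and say "we had no sales recorded in that year for a reason" or "that field should never be blank" before any modeling starts.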