
Data Products, Dashboards, and Rapid Prototyping (Transcript)

[This article was first published on DataCamp Community - r programming, and kindly contributed to R-bloggers].

Here is a link to the podcast.

Introducing Tanya Cashorali

Hugo: Hi there, Tanya and welcome to Data Framed.

Tanya: Thank you, Hugo. Glad to be here.

Hugo: It’s great to have you on the show and I’m really excited to talk about your work in data consulting, data products, and especially this idea of rapid prototyping that I know you’re a huge proponent of. But before all of that, I’d like to find out a bit about you. So maybe you could start by telling us what you’re known for in the data community?

Tanya: Sure. What I’m known for in the data community. It’s definitely a loaded question and it’s one of those things where … Have you ever done, what do people think you do and what do you actually do at work?

Hugo: Exactly.

Tanya: I think it’s one of those. So I think some people know I was a pretty early R user in the community. I got a little bit lucky in that I was an undergrad student at Northeastern and I worked on a co-op which is when you work six months and then you go to school the other six months. And the PI at the lab I was working at said, "You’re gonna learn this thing called R." And this was back in 2005 or ‘6. And I had no clue what it was and there was no RStudio, there was no DataCamp which makes it incredibly easy now to learn it. But I printed out the entire CRAN documentation of just like, I don’t know 250 pages at the time or something and brought it home over my summer break and just read it and thought, "I’m in deep trouble."

Tanya: But sure enough, it worked out. I had some really cool mentors that were just super smart and I got to publish a paper on some of the work I did learning R, just basically diving in headfirst, blind, and it turned out to be a good thing to learn.

What do you do?

Hugo: Great. And what do you do now?

Tanya: Yeah so the other thing is that I started a consulting firm back in 2015 after working at mostly startups. I worked at one large company, Biogen. And I think both were very different and fun, wild experiences but I always wanted to do my own thing and so after seeing so many different use cases and vendor pitches and just what can be done out there and how much opportunity there is, it just became a no-brainer. As I was consulting on the side, it became too much to have a full-time job and consult, so I just made the leap and here we are three years later.

Hugo: Fantastic. And so you’ve given us a bit of insight in there of how you actually got into data science but maybe you can tell us a bit more about your history.

Tanya: Sure. I always knew I liked computers and technology. I was a huge sports nerd. I basically played every sport under the sun. So between basketball and video games, computer games and somehow stumbling into QBasic in sixth grade, I was definitely on my way to knowing I wanted to pursue doing something with technology, something with computers.

Tanya: So ended up at Northeastern University and dual majored in computer science and I got interested in biology actually which is why I ended up in bioinformatics and biotech ultimately when I graduated. But bioinformatics was really not even well known back in 2005. There were no degree programs like there are now for it, just like there weren’t any data science programs and now there are.

Hugo: Exactly. And I was gonna say, the greater Boston area is a really great place for biotech as well, right?

Tanya: Oh, the best. Probably the best in the world. I mean between all of the universities, the hospitals, there’s teaching hospitals. Yeah I got really fortunate to be able to work at Children’s Hospital, for the Harvard-MIT division, for Dana-Farber Cancer Institute. It’s one of the best in the world and that was really great.

Tanya: So networking early on and meeting a lot of just incredibly talented people was definitely a key factor to my decision to go to Boston for school.

Hugo: Great. And now you really think about business questions outside the realm of healthcare and bioinformatics and that type of stuff, right?

Tanya: Yeah. So we still have a large focus in healthcare. It was kind of funny. I sort of full circle came back to healthcare after exploring some other industries. But we do, yeah. We work in healthcare, life sciences, but also sports, some retail and consumer packaged goods, telecom. So really every industry now is generating data so it … For me it spans multiple verticals.

What industries do you see data science having the largest impact on?

Hugo: And which verticals or industries do you see data science having the most serious impact on currently? And through the lens of your experience as a consultant.

Tanya: Yeah. I mean any of the industries that are generating data as a result of just day-to-day business operations are always good ones and just ripe for opportunity. Because not only can you optimize their internal benchmarks, things like how can we save time on this operation we’re doing? Or how can we optimize what we’re selling based on targeting audiences? And I think any time you have transactional data which is obviously very large in quantity and longitudinal and typically pretty structured, those are also really great opportunities.

Tanya: But healthcare in general just has so much different data across the spectrum to improve outcomes for so many people and patients across the world that that’s where my passion lies in terms of which vertical I enjoy the most.

Hugo: And what type of questions are you interested in answering, or are most relevant or is data science kind of most well equipped to answer in life sciences and healthcare and that type of stuff?

Tanya: Yeah. I think it really always comes back to the data. Is it … I started out analyzing gene expression data and genetic data. And back when I was doing it, the data was still noisy. There was a lot of noise to signal where we want to obviously have more signal to noise. So we actually started also looking at electronic medical record data and claims data which is cleaner. I mean it’s still messy but it was much easier to have an impact sooner. So when you go on the research side with the genetics and genomics, there’s just so much that has to get done even if you make a finding, you publish, "Hey we have a potential new biomarker." But then what? You have to go through clinical trials and we know that those take up to 12 years and hundreds of millions of dollars.

Tanya: Whereas, if we can say, "With these patients, we know that if you intervene this way, a phone call versus a text. This person’s gonna respond to a text and take their medication when they should." And that is something that is not expensive and is something we can do immediately to have a positive effect on patient outcomes. So it’s two very different worlds and two very different sort of questions that we’re trying to answer, at least in healthcare.

Hugo: Great. And I’m sure you get data in a whole variety of different forms. So is this a challenge thinking about the amount and heterogeneity of the data and how to even convey insight from that to non-technical stakeholders?

Tanya: Yeah. I think people still don’t understand the messiness of data and how much time gets spent on cleaning it and standardizing it and joining it and what can go wrong in that process. You make one wrong join and you’ve completely inflated the number of sales or you accidentally find and replace the wrong pattern. It’s just there’s so many different things that can go wrong in that process that I think conveying that is still very tricky and it’s … The statistic is that it’s 80% or so of any data analysis project is just getting the data ready for analysis. So I think the more we can get people hands on and dirty with the data themselves, the more they’ll start to understand.
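
As an aside on the "one wrong join" point: the following is a minimal Python sketch, not something from the interview, of how a duplicated key on one side of a join can inflate a sales total. The tables, column names, and the `inner_join` helper are all made up for illustration.

```python
# Hypothetical data: each order appears exactly once, but the customer
# table accidentally contains a duplicate row for customer 1.
orders = [
    {"order_id": 101, "customer_id": 1, "amount": 50.0},
    {"order_id": 102, "customer_id": 2, "amount": 30.0},
]
customers = [
    {"customer_id": 1, "region": "East"},
    {"customer_id": 1, "region": "East"},  # duplicate key
    {"customer_id": 2, "region": "West"},
]


def inner_join(left, right, key):
    """Naive inner join: one output row per matching (left, right) pair."""
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if lrow[key] == rrow[key]
    ]


# Total computed directly from the orders table.
true_total = sum(order["amount"] for order in orders)

# Total computed after joining: order 101 now appears twice.
joined = inner_join(orders, customers, "customer_id")
joined_total = sum(row["amount"] for row in joined)

# A cheap guard: joining orders to a lookup table should not add rows.
row_count_ok = len(joined) == len(orders)

print(true_total, joined_total, row_count_ok)
```

Comparing row counts before and after a join, as in the last check, is one simple way to catch this kind of fan-out early.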

Tanya: And I think everyone at any level, C-level, entry level, should be looking and diving into data the same way that you were expected to start using email 20 years ago.

Hugo: Absolutely. And I think one example that I’ve heard from several people who work in analytics in health is … The great example is, if you have a patient record or something like that, having a doctor’s scribbles in the margins, something hand-written. And we all know how horrible doctor’s handwriting can be as well so …

Tanya: Yes. Yeah, I mean the other problem is, we’ve created all these forms that they’re supposed to go in and fill out. You know like Epic has a massive software solution that doctors can go in and check the boxes, if they’re a smoker or not and check how many packs. But typically because they’re trying to save time and they’re seeing a lot of patients, they go to the bottom comment box and just write it all in free text. And they type it at least so we don’t have to read their handwriting but now you’ve got doctors saying things in different ways and it turned into natural language processing which becomes significantly harder than obviously just having that structured form of data.

Hugo: So you mentioned in passing there are other sectors like retail, consumer goods, telecommunications, that maybe aren’t quite as mature in how they think about data science but there are steps being made there. So maybe you can tell us a bit about that.

Tanya: Yeah. A couple interesting problems that we worked on was actually … This was a while ago. One of these I find interesting because it was, I think, early on for this type of work. They were conducting surveys. They were a famous gum manufacturer. And they had user surveys that queried people about the gum. How do you like the taste? How long does it last? The flavor. And we were using that data to try to predict new product launch successes. So using old product surveys that were really successful at launch, we then tried to predict. They gave, I think thousands, a couple thousand people different types of gum to try that they were gonna then bring to the market. That was an interesting one. I don’t think we ever got to see whether our predictions panned out or not which is one of the Holy Grail problems in some of these fields. But that was one.

Tanya: And another more recent one was with a massive consumer goods supplier who sells products online and they just have every single thing documented on what’s happening on their website. If a user clicks a page, if they click a link, if they add something to their cart which most eCommerce sites are now collecting that data and storing it. But they’re all … So many of them are at this place where they don’t know what to do with it next. So we came in to help them try to optimize product placement on their site. Figure out when people were leaving and abandoning their cart and try to make changes to resolve those issues, essentially and sell more products.

Hugo: Yeah. And of course that’s one of the reasons why, in what has now become modern data science, a lot of the techniques did emerge in tech, right? Because we were able to … Once you have the data foundation set up as a tech company, to actually get all the data and start to work with it.

Tanya: Yeah, I think companies have been collecting data for a while now and they’ve caught on. Some are just starting now but we’re definitely at a tipping point where now they need to figure out what to do with it.

Hugo: Yeah, absolutely.

Hugo: I want to just pop back to this gum example. I find it very intriguing because I’m a gum guy as well and I’ve got a lot of questions around gum which we probably shouldn’t delve into too much. I was wondering, you said you may not have found out whether the predictions had panned out, but were there actionables from the work that you’d done that the business could take from the insight that you got out of the data?

Tanya: Yeah. I believe they took several. We ranked them essentially and I think they took the top three or so and did a more focused study on those. So you’re always trying to figure out where do we spend our money? Where do we spend our marketing budget? And so on. And so they took those and conducted a more focused study on those products and then decided from there which ones they were gonna try to mass produce.

Tanya: So definitely action items. They took it and ran with it and who knows? I may have tried the gum when it came out.

Hugo: So I do want to say two things about gum. Firstly, I feel like gum flavor has decreased in duration as I’ve gotten older but it may actually just be my olfactory system being less sensitive.

Hugo: Secondly, I was in Schiphol airport. I’ve got no idea how to pronounce it. The airport in Amsterdam a while ago. Tried to buy gum. They don’t sell gum in the airport at all. And I asked, "Why?" And they said, "It’s because people will put it under chairs and on the ground." So they just stopped selling it. And I said, "But people can bring it in." And they said, "Yeah, but we’re minimizing it in a way we can." And I was like, "Okay, deli dude. That’s fine."

Tanya: I mean I kind of respect that.

Hugo: Yeah, no. It’s incredible. And he was so upfront about it as well. It’s hyper rational and logical. We don’t only want to go from data to insight all the time. A lot of the time we want to make a decision based around that insight, right?

Tanya: Yes. And that’s what you always want to be doing. I mean there’s some cases where insight may be valuable depending on what it is. But ultimately, I want to drive people to make decisions that are measurable so we can determine, yes there was a success. Or no, there wasn’t.

Hugo: And is the role of the data scientist, and this may be a provocative or ill-formed question. Is the role of the data scientist to make that decision or provide insight into making that decision, or to just provide results from the data?

Tanya: It’ll depend on the seniority level of the person. How much they know about the domain or the company but I think they’re in a position to, sure. Because they probably know the data better than anyone in the company. What management does with it or if you’re a co-founder or chief data officer, you can probably make those decisions yourself. But I think ultimately we’re building … We’re not replacing humans and our natural instincts but we’re building decision recommendation engines in a way. So we’re trying to guide and optimize the decision making process rather than put your finger in the air and just take a guess.

Hugo: Awesome, I love it. And something you mentioned earlier was that people who aren’t necessarily data scientists will more and more hopefully be able to get into the weeds with data whether it be data exploration, or basic statistical modeling, or cleaning data and understanding how messy data can be. And how do you see that with people you do consulting work for? I mean people at C-level or whatever it may be. Would you like to see these types of people become more data literate in the next five to ten years?

Tanya: Yeah, for sure. It does everything: it alleviates unreasonable expectations about how long something actually takes to get done. It helps them really position the value prop from the analysis better, in a smarter way. It helps them realize what’s actually possible. So oftentimes, as we know, there can be empty promises made. Maybe by a salesperson and you absolutely cannot do what they said because either you don’t have the data or the data is just too … There’s not enough of it or it’s too messy. So absolutely, I mean it will help in a number of ways.

Dashboards

Hugo: I’d be right in saying you’re a huge fan of dashboards, right?

Tanya: Yes. Yeah.

Hugo: So maybe you can give us your take on dashboards and data products in general. Tell us what they are to you and then let us know, kind of what their role is in your work and data science for business in general.

Tanya: Sure. So a dashboard is really just a great way for people to consume and interact with data quickly. And an ideal dashboard in my opinion, and I think also maybe Edward Tufte’s, is that you should be able to look at something, a visualization or a dashboard and take away insight or an actionable item within 10 seconds or less. And if you didn’t accomplish that, you need to go back to the drawing board, redesign it, make it simpler. Because otherwise, you’re basically better off just looking at a big Excel sheet of numbers and try to weed through it and figure out what your answers are. So dashboards are great for that, for distilling a bunch of different data down into something consumable.

Tanya: The reason I like them for data products … And what is a data product? It’s either something that your company sells, it could be your entire product. I’ve worked at companies that were just data driven product companies. So as an example, I worked for a telecom company that had this proprietary data on customers switching. So I don’t know what your providers are in Australia but here we have T-Mobile, Sprint, Verizon. You could imagine that if you switched from T-Mobile to Verizon, which people do because nowadays it’s a poaching game, right? Because everyone has a cell phone. We had all those switches in the country. The date it happened, the phone number, and there were about 50 thousand of those happening a day. And obviously you can imagine, a CMO at Sprint would love to know where customers that are almost out of contract, when contracts existed, high value customers that are coming off contract for maybe Verizon, are densely located in Southern California. So now Sprint says, "We’re gonna run a marketing campaign there."

Tanya: So we built a product around that. It was put in the hands of these providers as a competitive insights tool. And that was essentially our entire business model. But the data was our product. It had to be correct. It had to be clean. It had to scale. The product needed to be user friendly and completely usable. And I think that’s a huge use case at a lot of companies now, especially startups that are building essentially their profits from data itself.

Hugo: For sure. And in that case, and maybe in general, these data products tend to abstract over the data in order to show the insights as immediately as possible?

Tanya: Yeah, exactly.

Interactivity in Data Products

Hugo: And what’s the role of interactivity in data products, do you think?

Tanya: Yeah. Especially in rapid prototyping which I think we’ll talk about in a little bit, it’s really a way to make something real and tangible to someone. So you come up with an idea, you go through a whole bunch of requirements planning, and then you get the data and you build something and you realize, "Oh, crap. I didn’t even think of this." Because of the complexity of data and how many unexpected things can come up, you’re never just gonna be able to document and know ahead of time what’s gonna be in there. So we build something quickly to allow the user to interact with it and get an idea for how this might work, put it in the hands of a customer, get their feedback before spending a ton of time and money on building a fully fledged product.

Hugo: This is a really nice segue into this idea of rapid prototyping. And I’d love to hear more about your approach to the trade-off between rapid prototyping of data products and dashboards and building fully mature data products.

Tanya: Sure. Yeah. So there’s the old sort of waterfall technique they call it where you spend six months building out your first phase of the product. I always liked the analogy that you could end up building a Ford truck when your client really wanted just a Prius.

Hugo: I love it.

Tanya: So you have to essentially not over complicate things. You want to keep it simple. You want to ship quickly and iterate and get that feedback from the customer. Because I’ve been in situations where a client or a company has over engineered something and made it so complicated the user was overwhelmed. It just sat on the shelf and collected dust and was never used because we never tested the market. That’s what you’re doing with data product. You need to test your market before you go get funding and build a company. You should be doing that for data products as well.

Hugo: Yeah and you want to demonstrate at least potential value ASAP so that everyone’s in. Or at least some key stakeholders are in, right?

Tanya: Yeah. And it seems like common sense but you’d be amazed. I still run into this where it’s just over-engineered and not what the customer actually wanted. A lot of people also like to think that maybe they know their domain and their clients better than they do, but at the end of the day, sometimes they just don’t. And there could be something unforeseen and sometimes a client doesn’t even know what they want. So let’s put something in front of them, get them to react to it, and then start to define the requirements and iterate that way. And that’s an agile prototyping method that I swear by. And it’s been successful.

Tanya: There are drawbacks and one of them being, let’s say you put something in front of a client quickly, you don’t always have time to fully QA it. You don’t always have time to catch every single edge case. And people sometimes misconstrue that as, "Well this is completely wrong. It’s a broken product and we don’t trust this data anymore." But there needs to be, I think, a shift in thinking.

Tanya: You’re always building. There’s never final versions. Everything is a draft because otherwise, you spend two years developing something and it either never gets out the door or you built the wrong thing.

Hugo: For sure. And it sounds like essentially what we need is this type of rapid prototyping but hand in hand with some serious management of expectations around it as well.

Tanya: For sure. Managing expectations and also getting the client highly involved in the QA process if possible. So I always try to ask for a helping hand whether they have a junior analyst or anyone that can come in and get some extra eyes on what we’re building just because sometimes we don’t have a dedicated QA team on stuff. Sometimes we need the extra help and they know their data better than us. So we try to bake that into our projects now, is we want you guys to look at it. Make sure it doesn’t look crazy because you know your data better than we do at this point.

Hugo: Absolutely. Do any illustrative examples spring to mind?

Tanya: Of catching wrong data?

Hugo: Yeah, or no just of the power of rapid prototyping before building out fully mature products?

Tanya: Yeah, for sure. So the telecom example that I talked about. We had an idea and it was called Voice of the Customer. So we knew that people were switching from, say Verizon to T-Mobile in vast quantities. But we didn’t know the why. So we wanted to build some sort of social media monitoring tool, start monitoring events that were happening like iPhone releases. And it was just me. Well it was a 12 person company but it was just me in charge of this product. And I had just hired, I think two fresh grads out of college who were just learning R and everything about data. And so we got together and we literally just went to Twitter and started searching to see if people were talking about switching. Turns out they were. People love to go to Twitter to complain about things.

Tanya: So you’d see a lot of things like, "I’m switching to T-Mobile because Verizon service sucks." And from that tweet, there’s so much information there, right? You’d have, they’re talking about the reason they’re leaving Verizon. They’re talking about who they’re switching to. And so we started to quantify that using just basic language processing. And built a prototype. The first one was just in Ruby on Rails at the time. And then we had actually built some R Shiny apps as well. So we were doing a bunch of different things in parallel. Testing out different designs and ideas. And when we realized, "Hey, there’s a viable product here." Because we put it in front of a client. They were very interested.
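
To illustrate the kind of "basic language processing" described here, the following is a minimal Python sketch that pulls the old carrier, the new carrier, and the stated reason out of a tweet with regular expressions. It is not the team's actual prototype (which was built in Ruby on Rails and R Shiny); the function names and patterns are assumptions, and apart from the first tweet, which is quoted from the interview, the sample tweets are invented.

```python
import re
from collections import Counter

# Carriers named in the interview; a real list would be longer.
CARRIERS = ["T-Mobile", "Verizon", "Sprint"]
_CANONICAL = {name.lower(): name for name in CARRIERS}
_NAMES = "|".join(re.escape(name) for name in CARRIERS)

# "switching to X", "switched to X", "switch to X"
SWITCH_TO = re.compile(rf"switch(?:ing|ed)?\s+to\s+({_NAMES})", re.IGNORECASE)
ANY_CARRIER = re.compile(rf"({_NAMES})", re.IGNORECASE)
REASON = re.compile(r"\bbecause\b\s+(.*)", re.IGNORECASE)


def parse_switch(tweet):
    """Return (from_carrier, to_carrier, reason), or None if no switch is mentioned."""
    match = SWITCH_TO.search(tweet)
    if match is None:
        return None
    to_carrier = _CANONICAL[match.group(1).lower()]
    mentioned = [_CANONICAL[name.lower()] for name in ANY_CARRIER.findall(tweet)]
    # Heuristic: the first other carrier mentioned is the one being left.
    from_carrier = next((c for c in mentioned if c != to_carrier), None)
    reason_match = REASON.search(tweet)
    reason = reason_match.group(1).strip().rstrip(".") if reason_match else None
    return from_carrier, to_carrier, reason


tweets = [
    "I'm switching to T-Mobile because Verizon service sucks.",  # from the interview
    "Switched to Verizon, Sprint kept dropping my calls",        # invented
    "Great weather today",                                        # invented
]

parsed = [parse_switch(t) for t in tweets]
# Count (from, to) pairs across the tweets that mention a switch.
switch_counts = Counter((p[0], p[1]) for p in parsed if p is not None)
print(parsed)
print(switch_counts)
```

A heuristic like this misses sarcasm, misspellings, and indirect phrasing, which is part of why a quick prototype on a small sample is useful before paying for more data.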

Tanya: And then we decided, "Well let’s go to …" it was called Gnip at the time. They were the Twitter Firehose provider. I believe they got acquired by Twitter. And we decided to purchase data in bulk. So initially we were just scraping whatever tweets we could get which is a very small percentage of the Twitter Firehose. But there’s no reason to go and buy the Twitter Firehose before, it just wouldn’t make sense. You need to make sure that the data is there and that it’s viable and then you cast your net wider.

Hugo: Yeah and it’s what they want as well, whatever the management is that you’re dealing with and working with.

Tanya: Yeah, yeah. Exactly.

Learning R

Hugo: Incredible. So I love that you’ve mentioned R and Shiny. Now we’re not gonna get too much into the programming language R. When I say R, because of my accent, it doesn’t sound like an R. So it sounds like what the doctor –

Tanya: It sounds like you’re at the doctor, yeah.

Hugo: Yeah, exactly. And Shiny which, for our listeners out there, Shiny is a wonderful technology for rapidly prototyping dashboards. We’ve got great courses on Shiny at DataCamp. RStudio has a lot of great resources. But my question for you, Tanya, is these technologies have evolved and are evolving so quickly. So I’m wondering how your ability to actually do the work you do has evolved as a function of the tech and open source software development since 2005 when you first learned R by reading the entire CRAN documentation.

Tanya: Ah, no way. I didn’t read the whole thing. I’m sure. But yeah, I was crying myself to sleep, no. It’s so crazy how it’s changed.

Tanya: But yeah, I constantly feel like I can’t keep up, right? And that goes back to probably good old imposter syndrome. But if you don’t have a little bit of imposter syndrome I think that you’re probably doing something wrong because if you’re not just aware of how much there is out there to know. Because I know what I don’t know. And there’s a lot that I don’t know. But I try to … What’s nice is having clients kind of dictate what I need to know and learn.

Tanya: So if I’m gonna need a columnar database because I know this client’s gonna have a ton of data, it needs to be HIPAA compliant maybe for healthcare. Then I brush up on my AWS and Redshift.

Tanya: If my client is gonna be doing large batch processing jobs, maybe we look at Spark or Hadoop. Spark is definitely outpacing Hadoop now but luckily, R has built all … There’s been all of these packages that really awesome people have made to make my life easier, and so I always thought for example, I wanted to learn D3. Because I saw those really sexy visualizations in The New York Times. And I thought, "Crap. Now I have to learn D3 and JavaScript and all these other things." But then Ramnath came along and built rCharts and I could basically build D3 charts just knowing R. And similarly with Shiny. I can build websites now with web applications just using R.

Tanya: So I’ve been fortunate and I think the R community is a big reason why I’m able to be successful in my current consultancy.

Hugo: Great. So there’s literally stuff that you do now that’s part of your daily bread and butter that you wouldn’t have been able to do before this tech was developed?

Tanya: Absolutely, yeah.

Data Scientist Hiring Process

Hugo: So I also know that something you’re very interested in and passionate about is kind of thinking about the data scientist hiring process.

Tanya: Yes.

Hugo: And I hesitate to use the word opinionated but I think you have very good, strong opinions on it. So maybe you could tell us a bit about how you feel about all this.

Tanya: I do. It’s a pretty broken process. I’ve been through some interviews myself that I thought were just horrendous experiences and I’ve been through solid, pretty good interviews. And I mean the interview process is the first experience that someone has with your company. You want it to be a good experience. And so, I hate whiteboarding. I call it the … It’s the waterboarding, essentially, of interviewing.

Hugo: Never heard that before. That’s incredible.

Tanya: It puts people in this just weird state of trying to almost dehumanize them in some kind of situation that would never happen. It’s like you’d be hacking in the movie Swordfish with a gun to your head or something.

Tanya: But yeah. I gave a talk on this at Strata and I have formed a lot of ideas around it and honestly though, the reason I started getting interested in it was because I was working at startups where I was in charge of hiring a lot of people and interviewing a lot of people. And it’s just such a time sink for everybody involved.

Tanya: So I wanted to just streamline it for myself, honestly, and for the candidates as well. So I came up with a small test and people have opinions about this too. Like you shouldn’t expect them to spend their time doing something, however this is maybe a couple of hours, two to four hours. You should never expect more than four hours from anyone. And they take it on their own time. We give them some hints in what they should … Some questions they should answer. There’s really no right answer, it’s just you evaluate their thinking process and how they document their code and everything. And they come in and present.

Tanya: And it might be an hour to your stakeholders. You put them in front of maybe your business stakeholders and see how they convey technical concepts or if you want them to be very technical-focused people and client facing or facing the engineers or statisticians in your business and have them present to them.

Tanya: So it’s still better than an all day gauntlet which I like to call it. The eight hours of just going through and meeting everyone in the company. But those are kind of the big no-nos, I think: the whiteboarding and just having someone spend an entire day at your office.

Hugo: Absolutely. And we’ll put the link to your Strata talk which is online in the show notes as well.

Tanya: Oh cool. Okay.

Data Science Skills Gap

Hugo: And something I think definitely coupled with this line of questioning is … You know something you’re also speaking to is you don’t want someone who can do everything necessarily. We’re not looking for the data science unicorn. And I’m wondering what your thoughts are on the data science skills gap, essentially, and how as educators we can approach arming future data scientists with the skills that they’ll need.

Tanya: Sure, yeah. There’s a huge skills gap but it’s not as unattainable as some people think. There’s a lot of hype around machine learning and now AI and advanced statistical modeling but at least in my experience, that’s really only about maybe 20% of the projects I’ve worked on. Probably also 20% of the companies I’ve worked in.

Tanya: There’s a ton of need to just be able to take the step from Excel to basic data munging in R and Python. And I’ve taught courses to non-coders, non-technical people on the basics of R. We’ve had great success with it. People enjoy it and they feel empowered to now go and do something a little bit more sophisticated than what they were previously able to do in Excel. And there’s a lot of need for that. Just literally taking different sheets of data, joining them up, doing some basic QC, looking for missing data and being able to just do basic summary statistics. Like aggregating and finally maybe putting it in a Tableau dashboard. And we’ve taught courses that are part-time over eight weeks that get you to that point.
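
The workflow described here (join a couple of sheets, run basic QC for missing data, then aggregate) can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch rather than material from the courses mentioned; the two "sheets", their column names, and their values are all invented.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Two hypothetical "sheets", inlined as CSV text so the example is self-contained.
SALES_CSV = """store_id,month,revenue
1,Jan,100
1,Feb,
2,Jan,80
2,Feb,120
3,Jan,60
"""
STORES_CSV = """store_id,region
1,East
2,West
"""

sales = list(csv.DictReader(io.StringIO(SALES_CSV)))
stores = {
    row["store_id"]: row["region"]
    for row in csv.DictReader(io.StringIO(STORES_CSV))
}

# Basic QC: rows with missing revenue, and sales rows whose store has no match.
missing_revenue = [row for row in sales if row["revenue"] == ""]
unmatched = [row for row in sales if row["store_id"] not in stores]

# Join the sheets on store_id and collect revenue by region,
# skipping the rows flagged by the QC checks above.
by_region = defaultdict(list)
for row in sales:
    if row["revenue"] != "" and row["store_id"] in stores:
        by_region[stores[row["store_id"]]].append(float(row["revenue"]))

# Basic summary statistics per region.
summary = {
    region: {"n": len(values), "total": sum(values), "mean": mean(values)}
    for region, values in by_region.items()
}

print(len(missing_revenue), "row(s) with missing revenue")
print(len(unmatched), "row(s) with no matching store")
print(summary)
```

Reporting how many rows were dropped and why, rather than silently skipping them, keeps the summary honest about what it covers.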

Tanya: So people get intimidated and I can see why because the field hasn’t been super welcoming to some extent, where you’re not a real data scientist if you don’t know every single sort algorithm in existence and can write it on a whiteboard while someone stares at you. Or you’re not a real data scientist if you don’t have a PhD in statistics.

Tanya: So we need to start to compartmentalize and understand what we actually need into different skillsets.

Hugo: Yeah, and I do think this also speaks to a general statistical literacy and data literacy which, as you say, the people who are using … You know the tens of millions or perhaps more people who use spreadsheets once a week or more, getting them up to speed on a bit more of the robust programming and statistical concepts that will help them do their job.

Tanya: Yep, exactly.

Hugo: So this leads nicely into my next question because we’re really talking about the future now. I’m wondering what the future of data science looks like to you? And this is a prediction problem, right?

Tanya: Yeah. Let me build a model for you real quick.

Hugo: Yeah, great. Build me a dashboard.

Tanya: Wow. I mean I think we will start to see more advanced cases for predictive modeling, machine learning but like we said before, companies are just now starting to ask questions of their data. Seeing what they have, realizing it’s dirty. The prediction part always comes later. So I do think you’re gonna see an increase in that even though some people are starting to say the big data and machine learning hype is … The wave has passed. But I think it’s still just beginning. We were kind of ahead of our time when we were doing this type of work at a biotech company I worked at back in 2008 or so.

Tanya: But people are starting to get it. The more success stories you see with it, the more it’s gonna catch on. I think you’re gonna start seeing more uses across industries we might not have expected. In government, in small research labs, and even the army is starting to use some techniques. You’re gonna see just a lot of different applications of using data to improve outcomes for any case you can really think of. I mean you can improve employee retention. You can improve … You can try to cure cancer. There’s a million different ways that we can use it more effectively.

Hugo: Absolutely. So in that sense, the prediction then is that we’re gonna see it move laterally into kind of all facets of life and society and business?

Tanya: Yeah. I just think it’s gonna be more adopted, widespread and I think a big key indicator of that has actually been its involvement in the recent election and in sports. With sports gambling actually being legalized here pretty soon, people are starting to get it. That’s a big way I teach it too is we use sports data so that people have a little bit of fun and they understand it. And once you use analogies like that, it starts to click and people realize, "Oh, wow. I could use this in accounting or to sell more inventory."

Hugo: Yeah. That’s one of the most important things for education and learning in general is making it relevant to the learner.

Tanya: Yeah. We always try to use relevant data sets at corporate training so that people get it and they understand the why. And when we do just these part-time, kind of fun ones, we’ll use a fun data set and then we ask the students, "Well how can you see this applying to your current work environment?" And it’s pretty interesting. ‘Cause they can just see how it applies laterally once they’ve done … You know you calculate Tom Brady’s touchdown conversion percentage and then you go and calculate percentage of budget used.
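
The "applies laterally" point can be shown with a tiny sketch: the same percentage calculation works whether the numbers come from a box score or a budget sheet. The function name `percentage` and all of the numbers below are made up for illustration only; they are not real statistics or figures from the episode.

```python
def percentage(part, whole):
    """Return part/whole as a percentage, or None when whole is zero."""
    if whole == 0:
        return None
    return 100.0 * part / whole


# Made-up numbers, for illustration only.
conversion_pct = percentage(3, 4)              # e.g. scoring drives out of attempts
budget_used_pct = percentage(45_000, 60_000)   # e.g. dollars spent out of budget

print(conversion_pct, budget_used_pct)
```

Only the meaning of the inputs changes between the two calls, which is roughly the realization Tanya says students have when they move from the fun data set to their own work.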

Hugo: Yeah, that’s fantastic.

Favorite Data Science Technique

Hugo: So what’s one of your favorite data science-y techniques or methodologies? Something you enjoy doing.

Tanya: I love Shiny. I mentioned it before and I just think it’s given data scientists such a cool way to put their work out there and express themselves. I used to just build R scripts in isolation that maybe generated a CSV file. But now, with R Markdown and Shiny, we can really showcase what we’ve done and make it something real that people can touch and see, and that is one of my favorite things to do.

Tanya: I mean I build fantasy sports dashboards for fun. So it’s definitely my favorite part of it I think is visualizing and putting together those dashboards.

Hugo: Great. And I don’t know if … I recall Mara Averick who’s also in the Boston area. She’s a big fantasy sports fan as well, right?

Tanya: She is. We worked together actually on a project. It was a sports API that we built an R wrapper for and she’s a huge hoops nerd. She knows more about basketball than probably anyone I know. And yeah, I keep trying to team up with her to take over the sports betting market so we’ll see.

Hugo: Fantastic. Well let me know how that pans out.

Tanya: I will.

Call to Action

Hugo: So Tanya, my final question is, do you have a final call to action for our listeners out there?

Tanya: Final call to action is just dive in, don’t be afraid. R is not scary. Python’s not even scary. But I honestly love R. I use both but just find a data set that you might be interested in to … It could be anything. It could be about food. It could be about sports, movies. And look up some tutorials. DataCamp is a great place to start. I always plug you guys and –

Hugo: Awesome. I do too.

Tanya: Yeah. I’m not paid for this promotion, I swear. But yeah. It’s just dive in and get your hands dirty. And the R community’s very friendly and helpful. There’s lots of places to go. There’s Slack channels you can join. So definitely just get started if you’re interested in playing with data.

Hugo: Absolutely. And I always tell people, you know the #rstats hashtag on Twitter is a great way to get involved. You’ll ask something and most of the time you’ll get an answer really quickly, in all honesty.

Tanya: Yeah. And RStudio also just launched a community section where people ask questions and get answers. So if Stack Overflow isn’t working out or you just don’t even want to face the wrath of some Stack Overflow-ers, you could go to RStudio Community.

Hugo: And what we’ll also do Tanya is post links to a whole bunch of pieces you’ve written. There’s a great one on rapid prototyping. Another one on hiring in the data science space. We’ll post all of these in the show notes as well.

Tanya: Cool, that’d be great. And I’m always happy to answer questions if people want to reach out over Twitter, or you can find my email on our website, tcbanalytics.com, as well.

Hugo: Fantastic. Thank you so much for coming on the show, Tanya. It’s been an absolute pleasure.

Tanya: Thank you, Hugo.

To leave a comment for the author, please follow the link and comment on their blog: DataCamp Community - r programming.
