Critical Thinking in Data Science

[This article was first published on DataCamp Community - r programming, and kindly contributed to R-bloggers].

Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Debbie Berebichez, a physicist, TV host, and data scientist who is currently the Chief Data Scientist at Metis in NY.

Introducing Debbie Berebichez

Hugo: Hi there, Debbie, and welcome to DataFramed.

Debbie: Hi, Hugo. It’s a pleasure of mine to be here.

Hugo: It is such a pleasure to have you on this show, and I’m really excited to be here today to talk in particular about critical thinking in data science, and what that actually means, and as we know, to get critical about critical thinking and to see what aspects of data science in the space, what ways we are being critical, where we can actually improve aspects of critical thinking, particularly with respect to data thinking in general. But before we get into that, I’d love to know a bit about you. So, could you start off by telling us what you’re known for in the data community?

Debbie: Sure. Thank you. Well, I’m not sure I’m that well known in the data science community, but if I am, I would say it’s because I’m a big promoter of both critical thinking and of getting minorities, such as women, and especially Hispanic women, to enter the fields of STEM, including data science, and I’ve promoted and have started a bunch of initiatives geared towards getting more women to get into science, technology, and engineering. The second reason could be because I cohost a TV show for the Discovery Channel called Outrageous Acts of Science, so I know that a lot of people know me from there.

Hugo: Right, and you also are at Metis, aren’t you, the data science bootcamp?

Debbie: Absolutely. I was gonna say that next. So, I’m the chief data scientist at Metis. Metis is a data science training company that is part of Kaplan, the large education company, and we basically have two modes of teaching data science. One is through bootcamps, which we host in person at four locations, in New York, Chicago, San Francisco, and Seattle, and the second mode of teaching is through corporate training and other products. So, we teach live online intro to data science as a pre-bootcamp course, but we also customize various courses for corporations that need either visualization courses or Python programming or big data techniques and whatnot, and we’ve had quite a bit of success with that.

Hugo: That’s great, and I look forward later on to talking about kind of the relationship to your work at Metis, bootcamps in general, how they can prepare people for a job market where … In the job market, in some respects, coding skills are at the forefront and not critical thinking skills, and how to deal with that trade-off in the education space, which is something we think a lot about.

Debbie: Absolutely.

Hugo: On top of that, though, you mentioned you’re a big promoter of women, in particular, Hispanic women in the space, and correct me if I’m wrong, I may have messed this up completely, but you were the first Mexican woman to get a PhD from Stanford in physics?

Debbie: Wow. You didn’t get it wrong.

Hugo: I got that right?

Debbie: Yes.

Hugo: Fantastic.

Debbie: That’s right, and I think it’s an important statistic not so much to brag about it, but to show that examples like mine, of persevering and working really hard and making your dream come true exist out there, and they’re so important to talk about because they really serve as inspiration for people who sometimes think that their particular minority group or so is not suited for a career in data science or STEM.

Hugo: So, is this how you got interested in data science and computation initially, through a physics PhD?

Debbie: Yeah, yeah. I have kind of a, I guess, not so atypical background for data science. I did my PhD in physics at Stanford, like you said, and I did theoretical physics. I did a lot of computational work the last two years, and so I learned about models and programming and working with data. Then I moved to New York to do two postdocs at Columbia University and at the Courant Institute, part of NYU, after which I decided, like a lot of physicists, to work in Wall Street for a few years as what is sometimes derogatorily called a quant. I was involved in creating risk models, and I did a lot of data analysis, and that’s when I realized that my skills in math and programming had other alternative ways of being applied, not just in physics.

Debbie: So then, after Wall Street, I thought that that was not the field for me because I didn’t really care about just making money, even though making money is nice, but I had bigger aspirations, and I wanted to do data and ethics and help the world and change the world in many ways, and so I’d heard about this new field, sort of new field for me at least, called data science about 10 years ago, and I took a course. It was kind of like a bootcamp. I had the skills, but I didn’t know how to translate them into the different techniques and algorithms that are typical of data science. So, after taking that course, I jumped ship and I started my career in data science.

Hugo: Awesome. That’s a really interesting trajectory, and I just want to step back a bit, and if you don’t want to talk about this, we don’t have to, but I’m just wondering, coming from where you were in Mexico, did you have kind of a social, cultural, and even familial or parental support to go down this path?

Debbie: No, I didn’t, and that is precisely why I care so much about inspiring and helping other young women who, like myself, feel attracted to a career in science or engineering, but who for some reason, whether it be financial or social, feel that they cannot achieve their dreams. From a very young age growing up in Mexico City, I was discouraged from pursuing a career in physics and math because I was a girl, and I was told by friends and parents and teachers in school that I better pick something more feminine, and that to do physics I had to practically be a genius, which I knew I wasn’t, and so they really discouraged me so much that I became insecure about my math skills and about my ability to conquer and study the field.

Debbie: So, years later when it came to go to university, I picked philosophy as an undergrad because I thought that that was something similar to physics. It had a lot of questions, and you could use your imagination to ask yourself why are we here, and all kinds of things that had to do with objects that surround us and their meaning and whatnot, but I realized, Hugo, that the more I tried to hide my love for physics and math, the more that this inner voice telling me to go for it and to study it was screaming at me, until two years into the bachelor’s program in Mexico, I decided behind everyone’s back to apply to schools in the US as a transfer student, and it was difficult because in Mexico we were paying an eighth of what universities cost in the US, and especially as a foreign student, it’s very hard to find scholarships and financial help, but I was extremely lucky that I got full scholarship offered to me by Brandeis University in Massachusetts, and so in the middle of my BA in philosophy, I transferred to Brandeis in the winter. I hadn’t seen the snow before, and I picked up philosophy courses, but right in my first semester I had the courage to take my first intro to physics class. It was a very large classroom with a hundred students, and the class was astronomy 101.

Debbie: In that class, I realized that my passion and my love for physics was not gonna go away, and I befriended the teaching assistant in the classroom who was a graduate student by the name of Rupesh, who came from India. He came from Darjeeling, a town in the Himalayas, and Rupesh and I became friends, and we would meet all the time, and he was the first person who truly believed in me, and he told me that I wasn’t the typical student that just wanted to get an A in the homework, that my questions were just so curious, and I was so inquisitive, and that I really, really cared about knowing about the planets and quantum mechanics and statistical mechanics, and all kinds of things, and so he really encouraged me to try to do physics, until one day, we were walking in Harvard Square in Cambridge, and we sat under a tree, and I looked at Rupesh with tears in my eyes, and I said, “Rupesh, I just don’t want to die without trying. I don’t want to die without trying to do physics.”

Debbie: He got up, and we didn’t have cellphones at the time, but he called his advisor who was the head of the physics department at Brandeis, Dr. Wardle, who was the professor in my astronomy class, and he said, “I have a student here who has a scholarship for only two years because she’s a transfer student, and I know that BA in physics takes normally four years to complete, but she’s really, really passionate. What can we do about it?” So, Dr. Wardle called me into his office, and we had a conversation, and he basically told me, me and Rupesh, who was there with me, he said, “Believe it or not, there’s somebody else who’s done this in the past at Brandeis. His name is Ed Witten. He is-”

Hugo: Wow …

Debbie: I know. For those people who know physics and know who he is, he’s basically the father of string theory, so he definitely qualifies as a genius, and so I thought he was pulling my leg, like okay, Ed Witten, there’s no way I could achieve this. But he said, "Ed switched at Brandeis from history to physics, and he did it in only two years," because I couldn’t ask my family to pay for another extra two years to stay there, and so what Dr. Wardle offered is he gave me a book called Div, Grad and Curl, which is vector calculus in three dimensions, and basically, he said to me, "If by the end of the summer you’re able to master this material," and Hugo, I didn’t even remember algebra at this point-

Hugo: And of course, there’s a whole bunch of linear algebra, which goes into this vector calculus. Right?

Debbie: Of course. There’s so much background you have to know to even get into studying this book. So, he said, "If in two months," because this was in the month of May, "you’re able to master this material, we’ll give you a test, and we’ll let you skip through the first two years of the physics major, so you can basically finish the whole BA in only two years." So, Rupesh looked at me, and he said, "We’re gonna do this," and he decided, incredibly, to devote his entire summer from mid June to end of August to teaching me and mentoring me, and basically covering all the subjects that I needed to master in order to enter the third year of physics in September.

Debbie: It was amazing because I was so incredibly hardworking and passionate that I didn’t move from my desk. Every day, Rupesh taught me from 9:00 in the morning till 9:00 p.m. We didn’t have much time, so it was just practical, knowing how to solve derivatives on Saturday. Sunday, we’ll do integrals. Monday, first three chapters of classical mechanics, and you get the idea. So, at the end of the summer I presented the test, and I passed. I tried to not burn too many capacitors in my first electronics lab at the time, and I remember how incredibly grateful I was to Rupesh, this person that absolutely changed the course of my life.

Debbie: I tell this story every time I have an opportunity because it’s incredible to me what Rupesh told me. I basically always wanted to pay him for all that he dedicated to me and all the effort he put into tutoring me, and he said to me that when he was growing up in India, in Darjeeling, there was an old man who used to climb up to his little town in this mountainous terrain, and used to teach him and his sisters the tabla, the musical instrument, math and English, and every time the family wanted to compensate this old man, he said, "No. The only way you could ever pay me back is if you do this with someone else in the world."

Debbie: That beautiful story is how my mission in life began, and Rupesh passed the torch of knowledge to me to inspire, help, and encourage other minorities who, like myself, dream of becoming scientists or engineers, but who for some reason lack the confidence or the skills at the time, and that has really informed my career. It has been the passion that connects everything that I’ve done, and I’m incredibly grateful to that pay-it-forward story. So, after graduating with highest honors from Brandeis is when I went to Stanford, and I reconnected with Rupesh only about seven years after that, because he had gone to the South Pole to be a submillimeter astronomer, and we connected, and he was incredibly proud that I managed to graduate and do my research with a Nobel Prize winner at Stanford, and it was a great story.

Critical Thinking and Data Science

Hugo: Firstly, Debbie, thank you so much for sharing that beautiful story. Secondly, I wish I had a box of tissues with me right now, and thirdly, I feel like I was sitting there under that tree with you and Rupesh solving all the vector calculus challenges, and I want to give Rupesh a big hug and a bunch of cash right now as well, but of course, I’ll do exactly what I’m trying to do and what we need to be doing, which is paying it forward, and I think that actually provides a great segue into talking about critical thinking and data science, how we think about critical thinking as educators, being critical of critical thinking, and maybe I want to frame this conversation by saying there’s just a lot of talk around the skills aspiring data scientists, data analysts, data-fluent, data-literate people need to know, and sometimes to me, anyway, the conversation around this seems to be a little bit superficial, and I was wondering, firstly, if that’s the case for you, and secondly, if it is, what seems superficial about it?

Debbie: Yes. I’m so glad you’re asking this question, Hugo. I can’t tell you how many times I have visited programs where I’ve been a mentor for high school students, and I’ll give you one example. One of these afternoon programs was receiving quite a bit of funding, and there were three groups of young girls from high school working in data science, and they had been taught SQL, so they were masters at it, much more than I was ever proficient at their age, so I was like, “Wow. These girls are really impressive.” There were three groups. They were working at a museum, and so one of them was working with a data set that was about birds in the museum, and they were trying to find patterns by looking at their demographics of the birds and their flying patterns and all this kind of information.

Debbie: Another group was looking at astronomical objects, and a third group was working with turtles because the museum had a whole bunch of turtles in an exhibit. So, I went to the third group that was working with turtles, and I looked at the data that they were working with, and one of the columns said weight, so the weight of the turtles, and so I said, “Oh, wow. So, just out of curiosity, how big are the turtles that you’re working with? Have you ever seen them?” They said, “Oh, yeah, we have. They’re about the size of the palm of my hand.” I said, “Oh, cute. I’d love to see those turtles.” I said, “Okay. So, is the weight here that you have in the column … You don’t have any units for it because you just have the number, and the numbers are around 150 and 200 and 300. So, is this weight in pounds? Is it in kilograms? What is this weight in? What are the units?”

Debbie: All of a sudden, these six girls in the group got all quiet, and none of them ventured to answer until one of them raised her hand and said, “Oh, I think it’s in pounds,” and I said, “Oh, wow. Let’s see. I’m about five-foot-three, and I weigh probably about 120 pounds, so this is interesting because a turtle that’s the size of my hand, basically, you’re telling me it weighs double the amount of pounds that I do. Does that make sense?” Then they all laughed and said, “Oh, yeah. You’re right. It doesn’t make sense,” and we had this very nice conversation, and we went back and forth. It turns out, after an hour, we finally found a teacher who knew for certain and gave us the information that the weight was actually in grams.

Hugo: Wow.

Debbie: So, the girls were surprised, and that story really caught my attention because I had been visiting a lot of schools and programs that are trying to teach coding in a very kind of fast and superficial way, just to be able to say, "Our students know how to code," and I realized that in an effort to get more and more people to know the skills for data science and for data analysis in a world that’s going way too fast where we need to prepare our students for jobs in AI and machine learning and whatnot, we are forgetting what all of this is for. Coding and analyzing data has a purpose. It’s not an end in itself. The purpose is to be able to solve problems and to have insights about what the data is telling us.

Debbie: If we’re not taught to ask the right questions and to think critically about where the data comes from, why is it being used or collected in a certain way, what other data could help or hurt my dataset, what biases are being introduced by this dataset, if we’re not teaching our kids to think what’s behind these techniques, then we’re basically failing, because we’re just making them like robots who can only perform a simple task if, and only if, the next dataset they see is similar in scope and structure to the one that they’re learning to work with.

Debbie: It was a very moving, and in a way, also painful experience to see, because I realized how needed are those critical skills, and not only in the education at the high school level, but how many projects haven’t we seen at companies, at very large companies and advanced data science groups where there’s a significant bias being introduced because no one bothered to include a certain minority but important group in the statistical sample, or bias was introduced because people didn’t bother to check what some outliers in the dataset were describing et cetera. So, I’m very, very passionate about teaching the critical thinking skills that are behind our why for why we do data science.

Collecting Data

Hugo: You’ve spoken to so many essential points there. The overarching one is critical thinking, and what I like to think of, data thinking or data understanding before even … There’s a movement to put data into models and throw models at data before even looking at, as you say, units or important features, or really getting to know your data, getting to understand it, and performing that type of exploratory data analysis, and a related point that underlay a lot of what you were discussing there is thinking about the data collection process as well, and if you’re collecting data in a certain way, what are you leaving out? What are your instruments not picking up? Is your data censored for any of these reasons? Are you leaving out certain demographics because they don’t use a particular part of your service?

Debbie: Mm-hmm (affirmative). Exactly. Exactly, and I think I see a lot of companies that don’t really know what data science is about, because it has become this buzzword, and everyone wants to be in it, but nobody really knows exactly what you can get out of it, and what’s happening is a lot of companies are investing significant dollar amounts in big data and solving big problems because they have collected so much data, they just build a huge infrastructure and try to find insights, but without really knowing, first of all, whether those insights are important for the company, and second of all, if they find them, would they be able to use them for something and enact policies or something that’s actually gonna be helpful for the goals of the company? I always remind them with this kind of simple example. One of my heroes in physics is Tycho Brahe, who was a very famous Danish astronomer. Basically, he was locked up in a tower in an island in Denmark, which I actually had the opportunity to visit last summer.

Hugo: Oh, really?

Debbie: Yes.

Hugo: Wow.

Debbie: He lived in the 1500s, an amazing man, but he also had a … Apparently, he was a nobleman. He had an awful personality, and he lost his nose in a duel.

Hugo: They say he replaced it with a golden bridge, I think.

Debbie: With a bronze-

Hugo: Bronze. Yeah. Okay, great.

Debbie: I think that has been discredited a bit. That’s what they told me in the museum. But anyways, yeah, this very interesting character, but the amazing thing about him is that he looked at the sky without any telescope. He basically had created these sophisticated instruments, but in the 1500s, it took him years, and he created a catalog of only about a thousand stars. That’s it. So, that’s a very, very small dataset by today’s standards, but from only those thousand data points, I think it was like 1,800 or so, to be more accurate, but he helped the theories that were later created by Kepler and Copernicus, and where the laws of planetary motion were derived.

Debbie: Basically, Kepler used that, and then Isaac Newton used it as the basis for the law of gravity. So, from those thousand data points came universal theories that we’re still using today, that are incredibly powerful and deep, and that is a good example to say that sometimes we can put a lot of investment into huge datasets, but when we’re talking about data literacy, large datasets also have a lot of noise, and you have to start by teaching that the most important thing is the insight that you’re going to derive from that dataset and not its size.

Big Data

Hugo: I’d like to speak to this idea of the focus on big data and the fact that a lot of us are collecting as much data as possible, thinking that all the information we need will be contained in there, even before asking critical questions, which is very dangerous, but before that, I just want to say tangentially, Tycho Brahe and Kepler’s story is so wild. I haven’t looked into it in a while, but if I recall correctly, Kepler wanted to unlock the secrets of planetary motion and figure out what was happening, and he realized that Tycho had the data. So, this is a story of someone realizing someone else has this data, and he went to work with him in Tycho Brahe’s, I think, final years, and Tycho didn’t even give him all the data at that point. He was actually very secretive about the data he had, and even when Brahe died, Kepler had to struggle with Brahe’s family in order to get the data. So, there were all types of data secrecy and data privacy issues at that point as well.

Debbie: Also, data ownership, because what-

Hugo: Exactly. That’s what I meant. Yeah.

Debbie: Most people know who Kepler was, but if you ask people about Tycho Brahe, very few non-science people know, and that’s because a lot of the credit went to Kepler, and some people argue that the one that did all the meticulous observations and had theories about it was Tycho, and so he deserved more credit. So, it was kind of a crazy time, and lots of fights about data were happening.

Hugo: Of course, we’re talking about a decoupling or a separation of, let’s say, humans into the people who are fantastic at collecting data and the people who are fantastic at analyzing it as well. This is a division in a lot of ways.

Debbie: Yeah, absolutely.

Hugo: But this focus on big data, the fact that even a lot of companies’ valuations are based around the fact that they have so much data, and it must be useful in the future, right? This is incredibly dangerous for practitioners, but also for society.

Debbie: Absolutely. I mean, we did have a tipping point in that we had the hope in the ’70s of AI and changing the landscape of our society, and it didn’t quite deliver in its promise because we didn’t have the capacity to analyze very, very large datasets like we do now, and there was a tipping point where now we are able to analyze these much, much larger datasets. I mean, I think every day in the world, we produce 10 to the 18 bytes of data, like 3 exabytes of data, something like that, that we generate. So, obviously these are enormous scales, but what’s important is not that we now have this capacity to analyze it, but are we really getting a significant marginal insight, or are the insights that we’re getting commensurate with the ones that we were getting when we didn’t have such large datasets?

Debbie: I think that question’s still out there. We haven’t been able to answer it because, as you know, the real important applications of AI are still being created and worked on. A lot of the AI things that we see out there are still simplistic in that they don’t use all of the incredible and deep capacities that AI has to solve problems. So, dimensionality of the data matters. It matters a lot, and probably for certain problems, it’s going to be hugely important. But my point is more about when you’re educating people or when you’re a company investing in certain technology, you have to be able to walk before you run, so start analyzing the smaller datasets, come up with strategies that are based more on critical thinking, and the questions that you’re trying to solve rather than the size of your dataset, and the size of the infrastructure that you’ve built.

Top 3 Critical Thinking Skills to Learn

Hugo: Great. So, I’ve got a thought experiment for you, which may happen all the time. I have no idea. But a student, an aspiring data scientist data analyst comes to you and says, "I need to learn some data thinking skills, some critical thinking skills to work with data. What are the top three critical thinking skills that you think I should learn, Debbie?"

Debbie: Thanks for that question, Hugo. I think the first one is you have to be a skeptic about data. You have to always … Just like when you read a scientific paper, you have to know who paid for this research. Was it the drug company that is sponsoring a paper that says their drug is the only and best drug in the world? Clearly, I’m not gonna trust that paper. So, a healthy skepticism about the team that collected the data, what biases could have been introduced, where was this data taken, how was it collected, what things were left out, what variables would be important in the future, et cetera. All those questions I think are super important. So, if you don’t ask them before even doing exploratory data analysis, it means your thinking about the data, and your relationship with the data, is gonna be limited.

Debbie: The second one, and this one, I came up with it from another famous physicist, Richard Feynman, who said, "The ability to not fool oneself is one of the hardest and most important skills one can acquire in life," because it’s very easy … Sometimes we think, oh, I wouldn’t be fooled by anyone, not any marketing campaign, not any government is gonna fool me, but we fool ourselves much more often than the people interpreting the data out there. So, the ability to not fall in love with what we think our data should be telling us, that is what I call fooling yourself, that is super important.

Debbie: The third skill is connecting the code and the algorithms to the real world, like my example with high school girls that were working with the data. To be working with a database for three months and forgetting that behind the data are actual turtles, in this example, that’s a big mistake, the same way when Facebook is incredible at doing face recognition and analyzing relationships between groups and people, but if they’re forgetting that behind those connections are real people with real lives and real consequences, then we’re failing. We need to really connect our analysis to the world out there.

Hugo: I agree, and I just want to go through those again, because I’m sure our listeners are scribbling away trying to remember all of this. So, the first one was a healthy skepticism about data, the second, the ability to not fool yourself, and the third, connecting the code and the real world and all the stakeholders that actually exist on the ground.

Debbie: Correct. Thank you, Hugo.

Bias

Hugo: So, I just want to build slightly on the ability not to fool yourself. I mean, all of these are incredibly important, but there’s a paper called, I hope I get this right, Many Analysts, One Dataset, that we’ve discussed once or twice on the podcast before, and it gives a whole bunch of statisticians and domain experts a dataset, separates them into teams, and gives them the same dataset and asks … It’s a dataset of, I think, either yellow or red cards given to football players in football or soccer matches, and the question is, are these decisions to give cards, is there some sort of ethnic bias or a racial bias in these decisions?

Hugo: The fact is, what happened was 70% of the teams said one thing, 30% said the other thing, either yes or no, and then when they got to see everyone else’s results, nearly all the teams were even more sure of their own techniques and their own results. There are a lot of reasons for this, but one of the points is that people go in with a certain bias already, and if you have a bias going into a dataset, you make all these micro-decisions as an analyst, which helps you get to the place that you already thought you were going, right?

Debbie: Yeah. You reminded me, funnily, of a paper that I discussed. I don’t even think you could consider it a scientific, sophisticated paper, but it was a paper done for the astrology, not astronomy, but Astrology Association in India years ago, and I talked about it at a conference because they first decided the hypothesis is that through some astrological charts that tell you certain characteristics about some kids, if these people that were the gurus and the chart readers and predictors were able to guess, I think that they gave themselves a pretty low score. They said, "If we are able to guess 60% of the outcomes," and I think the question was whether these students were intellectually gifted or just going to be average students in school, based just on their astrological chart, "and if we’re able to get 60% of them right, then that means we are gurus, and astrology is true, and we are able to predict this with very high confidence." That was their confidence level.

Debbie: The funny thing is even though they did slightly worse than a coin toss, that is they got 49% of them right, and anybody in their right mind would be able to say, "Well, clearly they did even worse than chance, a toss of a coin would’ve done better," but they themselves patted themselves on the back saying, "You see? We got 49% right. We can do this." So, it’s a very funny paper, and I encourage people to read it because it’s so easy to fool ourselves.
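
To make the comparison with a coin toss concrete, here is a minimal sketch of the arithmetic behind it. The interview does not say how many students were in the study, so the sample size of 100 below is purely hypothetical, and `binomial_tail` is a helper name invented for this example. It computes how likely it is to get at least a given number of binary guesses right by flipping a fair coin.

```python
from math import comb


def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more correct guesses."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_students = 100  # hypothetical sample size; the interview does not give one
threshold = 60    # the bar the astrologers set for themselves: 60% correct
observed = 49     # the result they reportedly achieved: 49% correct

# Probability of reaching each mark by pure guessing on a two-way question.
p_threshold = binomial_tail(n_students, threshold)
p_observed = binomial_tail(n_students, observed)

print(f"P(at least {threshold} right by guessing) = {p_threshold:.4f}")
print(f"P(at least {observed} right by guessing) = {p_observed:.4f}")
```

With this assumed sample size, clearing the 60% bar by luck would be unusual, while a 49% result is what guessing produces more often than not, so it gives no evidence of predictive skill.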

Hugo: Absolutely, and the best thing about doing worse than a coin toss is you could actually just switch all your decisions and do better. So, we’ve been talking about critical thinking at an individual and societal level. I’m wondering how you think about the needs for all these skills, critical thinking skills, how they should be spread through organizations, and what I mean is, what type of critical thinking and data thinking skills will be needed and are needed for people who don’t even work directly with data themselves, but in jobs impacted by data?

Debbie: Yes. That’s an excellent question because I think the more that our field of data science grows, the more that we get different dependencies in companies, different groups needing insights or even having contact with the data, and not everybody’s going to be a data scientist. We’re gonna have people just interpret visualizations that come from the data, others using APIs and having to interpret what the algorithms come up with and whatnot. So, I think it’s essential that we spread the critical thinking message across organizations, and it has to start early in school because the ability to ask the right questions in an industry setting is incredibly important, and I don’t think we’re putting enough emphasis on it. So, I think everybody in an organization has to be trained about things such as data ethics. How is the data being collected? Are we using it for the right purpose? Data ownership, data privacy, data security, all kinds of issues that impact the manipulation of data, and so that’s part of the critical thinking process.

Hugo: Hopefully, this aspect of understanding on the part of people in society and other working professionals who aren’t data scientists will result in less burden on the data scientists. What I really mean by that is … Well, there are a few ways to frame it. The first way is I think it was probably Nate Silver who said this. Any quotation where I don’t know who said it, I’ll just say it’s Nate Silver, generally. But it was probably Nate Silver who said something like, "When a data scientist gets something right, they’re thought of as a god, and when they get something wrong, they’re thought of as having made the worst mistake ever," as opposed to a job in which sometimes you get it right, and sometimes you get it wrong.

Hugo: Another way to frame it is that data scientists are kind of viewed by people without data skills like, "I have no idea how to deal with this, so this is what you’re going to do. You’re a prophet, or you’re the holder of divine knowledge, or the high priest of data science," as I like to call them, and the question is whether it will actually help bridge this gap in a lot of ways as people who aren’t data scientists develop more data skills. So, how do you think about these types of issues and challenges when building data science curricula at Metis and elsewhere?

Debbie: Yeah. It’s very important for me to learn … I’m not an expert in the field of learning science, but it’s very important to me to learn how to best build curriculum that optimizes these critical thinking principles and questions that I’m talking about, and so it really depends on the curriculum. So, for example, with a team that included Cathy O’Neil, who I know you’ve interviewed before, who I love, and a group of others, seven executive women, with the funding from Moody’s Analytics and the help of Girls, Inc., we developed the first data science curriculum for high school girls of under-served backgrounds, and we deployed it in New York in several high schools.

Debbie: So, I think it was just this amazing experience because we try to emphasize focusing on the topic and what the consequences were of every single step in the process, from data collecting, to choosing the algorithm, to knowing how to measure the accuracy, the recall, the precision, everything that we were doing, where it comes from, how to choose the metric that was right for the problem at hand, et cetera, and so the intention was very conscious to be about how to get the most insight about the limitations and the successes of the challenge or the problem at hand.
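To make the point about choosing the right metric concrete, here is a minimal sketch of how accuracy, precision, and recall can disagree on the same problem. It assumes 0/1 labels and an imbalanced, made-up dataset; the helper names `confusion_counts` and `metrics` are hypothetical and are not part of any curriculum mentioned here.

```python
def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    return tp, fp, fn, tn


def metrics(predicted, actual):
    """Return accuracy, precision, and recall as a dictionary."""
    tp, fp, fn, tn = confusion_counts(predicted, actual)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Precision: of the cases flagged positive, how many really were?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall: of the real positives, how many were flagged?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical, imbalanced data: 10 positives among 100 cases.
actual = [1] * 10 + [0] * 90

# A "model" that always answers negative.
always_negative = [0] * 100

# A "model" that finds 8 of the 10 positives but raises 12 false alarms.
cautious = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 78

lazy_scores = metrics(always_negative, actual)
cautious_scores = metrics(cautious, actual)

print("always negative:", lazy_scores)
print("cautious:", cautious_scores)
```

The always-negative model wins on accuracy while finding none of the positives, which is why the metric has to be chosen with the problem at hand in mind.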

Debbie: When I build curriculum for the Metis bootcamp currently in my position, I want the students to have a pretty broad set of tools with which they can crack really hard problems. So, I may not focus on getting every single clustering algorithm there is in the curriculum, but I will focus on how to analyze the results of the clustering algorithms that we will see, and how to know if we’re using the right algorithms for the problem at hand, and how to be able to ask that question of our colleagues, of our communities, et cetera, because we all have limitations to our knowledge.

Metis

Hugo: Yeah. There are two things there I want to focus on. The first is, as you said, at Metis, thinking about the actual problems, and thinking about the question at hand before even getting coding I think is incredibly important, and also, educating people through questions that really pertain to them and are interesting to them. So, students will ask me, "If I want to embark upon my first data science project, what would you suggest I do?" I say, "Well, what are you interested in," and if they have a fitness tracker, for example, I say, "Maybe you could analyze your own fitness data. If you’re a foodie, scrape Yelp reviews of restaurants and work with that type of stuff. If you love movies, if you’re a cinephile, the OMDB has a fantastic API."

Debbie: That’s exactly what we do at Metis. We have our students in the bootcamp use their own dataset, and they create their own project. So, it’s really cool. I encourage people to go to madeatmetis.com, and it’s a site where we have some of our greatest projects, and it’s incredible because you see people that had very basic math and programming skills coming in, and in three months they’re able to analyze contamination sources in the ocean, or some healthcare-related thing, or an app that helps you choose the best restaurant for crepes that evening, and stuff like … It’s really, really cool what you can do.

Hugo: Yeah, and I’ll build on that by saying I’ve been to several of Metis’s graduation presentations. What do you call them?

Debbie: Career Day.

Hugo: Yeah. They’re incredible, and seeing all the students there present the work they’ve done is amazing, and I know that … For example, you know I’ve had Emily Robinson on the podcast. I work with her now at DataCamp, and she completed Metis, and I think she went to Etsy straight from Metis. I could be wrong there.

Debbie: Yes. We love Emily.

Future of Data Science

Hugo: Yeah, incredible. So, we’re gonna wrap up in a few minutes, but I’d like to … We’ve talked about the state of play of critical thinking today, but I’d like to … It’s a prediction problem. So, what does the future of data science look like to you, Debbie?

Debbie: To me, it’s going to merge with the industry of IOT or the internet of things. That is, as we see the ubiquitous sensors, that these sensors are simply everywhere, from medical devices, to buildings that are smart buildings testing our comfort level, to apps that match our behavior, it’s-

Hugo: I mean, you’re right. We wear them, and we carry them in our pockets, right?

Debbie: Exactly, and just like the personal computer came to revolutionize the information technology field, the same way, IOT is going to revolutionize, and we’re gonna see a new paradigm where we’re going to collect substantially larger amounts of data about ourselves, our behaviors, our connections, and so issues that have to do with data privacy, data ownership, security, analysis, insights are going to become ever more important. So, what I predict is that with more automation, we’re gonna have more need for people who are not necessarily the data scientists working with the data, but are working in the field to analyze the ethical consequences of it, to act as peer reviewing committees to see if there should be policies or regulations that should be enforced around certain applications, et cetera. So, that’s what I see for a future, more and more need for sort of adjacent professions that help with the data analysis process.

Hugo: Yeah, I think you’re right in terms of defining it anyway or describing it as a merging between data science and IOT and automation. I can’t quite remember, did you give a talk on the internet of things at the NYR… Jared’s conference, a few years ago?

Debbie: Yes, I did at the R… Yep. Yeah.

What is your favorite data science technique?

Hugo: Okay, great. Well, I loved that talk, and Jared puts all those talks up online, so I’ll find a link for that and put that in the show notes as well, if anyone’s interested. So, I want to get a bit technical. I’m wondering what one of your favorite data sciencey techniques or methodologies is, just something you love to do.

Debbie: I actually really, really love singular value decomposition, SVD. I’ve always loved linear algebra, and just the thought of being able to reduce the dimensionality of a problem is so sexy to me. In physics, we deal with it all the time, and my first encounter with it was when I worked briefly with David Botstein, who’s … This is many, many years ago at Stanford. He’s one of the creators of Genentech, the biotech company, and we were analyzing the data coming from DNA microarrays, which basically compare a sample of healthy DNA with a sample that came from a patient in order to conclude whether the patient had cancer, and in the case of a positive answer, what type of breast cancer it was.

Debbie: So, it was really, really interesting because, obviously, there are so many genes in our genome that the dimension of the problem was humongous, and so to apply SVD and be able to reduce it to the dimensions that were most important enabled them to come up with pretty customized drugs that I have heard, because I have since stopped working on that topic, but I’ve heard are working quite well for different types of breast cancer. So, the applications of SVD are incredible, and so I don’t know, I just really like that conceptually, and anything that has to do with that, even NLP and, I don’t know, just seeing what you can get by sacrificing a bit of information is just really interesting to me.

Hugo: Well, I’m sold. I mean, you’ve motivated it through linear algebra, which I also love, and then you gave some incredibly important examples of its use, and for those of you out there who know of PCA, I’d definitely suggest you check out SVD as well.

Debbie: Yeah.
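The idea of reducing dimensionality by "sacrificing a bit of information" can be sketched in a few lines. This is a minimal illustration only, assuming a tiny made-up matrix (`samples`) and keeping just the single most important dimension; it finds the top singular value and vectors by power iteration in pure Python. It is not the method used in the microarray work described above, and in practice one would normally call a library routine such as `numpy.linalg.svd`. PCA corresponds to applying SVD to mean-centered data; this sketch skips the centering step.

```python
import math


def top_singular_triplet(matrix, iterations=100):
    """Approximate the largest singular value and its vectors by power iteration."""
    n_cols = len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # arbitrary unit starting vector
    for _ in range(iterations):
        # One power-iteration step on A^T A: w = A^T (A v), then normalize.
        av = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        w = [sum(row[j] * av[i] for i, row in enumerate(matrix))
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = [sum(a * x for a, x in zip(row, v)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v


# Hypothetical data: 4 samples, 3 features, almost all variation in one direction.
samples = [
    [1.1, 2.0, 2.0],
    [2.0, 3.9, 4.0],
    [3.0, 6.0, 6.1],
    [3.9, 8.0, 8.0],
]

sigma, u, v = top_singular_triplet(samples)

# Reduced representation: one coordinate per sample instead of three features.
coordinates = [sigma * ui for ui in u]

# Rank-1 reconstruction and the information lost by keeping one dimension.
approx = [[c * vj for vj in v] for c in coordinates]
error = math.sqrt(sum((a - b) ** 2
                      for row_a, row_b in zip(samples, approx)
                      for a, b in zip(row_a, row_b)))

print("direction kept:", [round(x, 3) for x in v])
print("1-D coordinates:", [round(c, 3) for c in coordinates])
print("reconstruction error:", round(error, 3))
```

The reconstruction error stays small compared with the size of the entries, which is the sense in which most of the information survives the reduction from three numbers per sample to one.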

Call to Action

Hugo: I’ve got one final question for you. Do you have a final call to action for our listeners out there?

Debbie: Yes, I do. I’ll repeat, Hugo, what I said in my Grace Hopper Celebration keynote speech a little over a year ago. Think deeply, be bold, and help others.

Hugo: I think that’s fantastic, Debbie, and what we’ll do is we’ll link to your Grace Hopper talk as well, because I think the way you explained in that talk all of these things, why it’s important to think deeply, be bold, and help others, which you’ve kind of gone through in this talk as well, I think that talk can provide more context there also.

Debbie: Wonderful. This has been such an awesome conversation, Hugo. Thank you.

Hugo: Thank you so much, Debbie. It’s been an absolute pleasure having you on the show.
