With 2020 almost upon us, those of us involved in the production, management and utilisation of data are set for yet another fascinating year. The volume of data produced will continue its inexorable rise, meaning that those who understand how to harness the commercial value of data will only become more important to organisations. So I look to the future with optimism! But as I do so, there is one topic I hope those involved with data will talk less about next year, and one of many important areas that I hope people will spend more time talking about.
Let’s start with the topic I want to hear less about in 2020: data governance. Sure, governance is essential to creating an organisation that uses data to power its business, but I view a governance strategy as ‘table stakes.’ To give some context to my point, Caroline Carruthers and Peter Jackson’s fantastic book “The Chief Data Officer’s Playbook” does a great job of defining three generations of Chief Data Officers (CDOs):
- First Generation CDOs (FCDOs) – those who put in place the building blocks for future success, with a focus on governance, architecture, engagement and delivering quick wins
- Second Generation CDOs (SCDOs) – use the foundations created by the FCDOs to “make the vehicle sing”, producing repeatable value for the organisation to show how data can be used at the core of business
- Third Generation CDOs (TCDOs) – support the transition of a data-first approach into a “business as usual” state across the organisation
Now, as someone who works in the data industry and advises organisations on their data and analytic strategy, I go to a lot of “data” conferences and I’ve noticed something peculiar: most “data leader” type conferences these days are badged as having primarily “Second or Third Generation” themes (you know, the exciting stuff!), but when you get there, they are just talking about governance…again. I feel that the conversation hasn’t really moved on from “first generation” topics yet, and I believe the industry is crying out to focus on topics such as generating value through AI, rather than another rehash of data governance strategies. So, whilst I agree that data governance is important, I’d really like to talk about something else in 2020!
One of the many areas I’d like to hear talked about MORE in 2020 is Analytic Ethics. To explain, earlier this month I read an excellent post by Ryan den Rooijen on this subject. I’m in complete agreement with Ryan that we need to start talking about ‘Analytic Ethics’, because by not doing so there is a real chance we could do serious harm to the perception of data and to its inherent value.
For me there are 3 “ethical” considerations when using data and analytics to drive decision making:
- Data Ethics – this is a mature conversation aligned to recent regulations that talks about our responsible handling of data and the limitations of use
- Analytic Ethics – Ethics that guide the conversion of data into wisdom to inform some decision
- Business Ethics – the framework that governs our business practices
Ryan’s post talks, quite rightly, about the Transparency, Accountability and Morality of analytics. However, I’m also concerned about the impact that analytics will have on business ethics, and I’m just not sure we’re ready for it. Let me use a simple example to illustrate (loosely based on a real-life scenario that was, thankfully, narrowly avoided). Let’s imagine a company wants to use its data to market to a large group of people. Let’s assume they absolutely have the rights to use the data in this way. The idea is that they will call people on this huge list and offer them a product. However, based on reading an article, the Head of Customer Experience realises that analytics could be used to prioritise the list (good old propensity modelling, lift charts etc). So, their Junior Data Scientist goes away and fits a model to the data, creating an algorithm that is nicely embedded in the call centre’s system. The call centre staff get prompted to phone people in a certain order, which results in a significant lift over just randomly calling people. Everyone is happy – another great use of analytics.
However, what really happened is that the list contained a number of vulnerable people who, when contacted, will buy anything out of fear, potentially getting themselves into financial difficulty and causing themselves a lot of stress. The model had simply learned to put this subgroup near the top of the list.
In this example, how could this outcome have been avoided? Where is the “liability”? Does it lie with the data scientist, who doesn’t yet have the experience to understand why their model prioritised this subgroup? Or with the business, which doesn’t understand modelling and just sees this as a “better” way of working? It’s a thorny and complex topic (as much moral as technical in scope) and one that we, as an industry, have to, yes, talk about in 2020. It’ll be a challenge I’m sure, but I’m confident that, with the right conversations, we’ll come to the right conclusions.
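To make the scenario a little more concrete, here is a minimal sketch of one check that might have surfaced the issue: comparing how often a subgroup appears at the top of the ranked call list with how often it appears in the list overall. It is an illustration only, not a description of what happened in the real-life case. The function name, the toy scores and the vulnerability flag are all hypothetical, and in practice such a flag may simply not exist in the data, which is part of why the problem is hard.

```python
def subgroup_share_in_top(scores, flags, top_fraction=0.1):
    """Compare a subgroup's share of the top-ranked slice with its overall share.

    scores: model propensity scores, one per person (hypothetical data).
    flags:  True where a person belongs to the subgroup of concern
            (a hypothetical vulnerability flag; assumed to be available).
    Returns (share in top slice, share in whole list).
    """
    if not scores or len(scores) != len(flags):
        raise ValueError("scores and flags must be non-empty and the same length")

    # Rank people from highest to lowest score, as the call centre would see them.
    ranked = sorted(zip(scores, flags), key=lambda pair: pair[0], reverse=True)

    # Size of the top slice; always look at one person at minimum.
    top_n = max(1, int(len(ranked) * top_fraction))
    top_slice = ranked[:top_n]

    top_share = sum(1 for _, flagged in top_slice if flagged) / top_n
    overall_share = sum(1 for flagged in flags if flagged) / len(flags)
    return top_share, overall_share


# Toy data: ten people, two of whom carry the (hypothetical) flag.
scores = [0.95, 0.90, 0.85, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10]
flags = [True, True, False, False, False, False, False, False, False, False]

top_share, overall_share = subgroup_share_in_top(scores, flags, top_fraction=0.2)
print(f"Share in top 20% of list: {top_share:.0%}; share overall: {overall_share:.0%}")
```

A large gap between the two shares does not settle the question of liability, but it is the kind of signal that should prompt a conversation between the data scientist and the business before the model goes live.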
Looking forward again to 2020, I return to my sense of optimism about the future: the possibilities for driving value from data are only going to increase, and our ability to do so will be limited only by our intellect and imagination. Here’s to a fantastic next 12 months!
The post The world of Data in 2020 – what we need to STOP talking about and what we need to START talking about! appeared first on Mango Solutions.