Becoming a master-chef: the logic of programming

gdickens

[This article was first published on R – Policy Analysis Lab, and kindly contributed to R-bloggers].

Summary: In this post, Giles explains the logic behind using code to achieve a set of analysis tasks with a focus on the R programming language.

I’d spent quite a bit of time working with data before being introduced to R, but this was almost always through a ‘point and click’ interface.

I’d have a question, a set of data related to it and I would click through a series of menus to come to an answer.

I remember finding tools like Microsoft Excel confusing when learning statistics, but in a sense they mirrored how we’re taught to solve problems: you have a goal, ingredients, a limited set of tools and a step-by-step recipe. To bake a cake, we need to break the eggs, combine them with the sugar and flour and mix them together. To get the average age of a group of people, we select the relevant data and apply Excel’s AVERAGE() function.

	Baking	Point and click analysis
Goal	Make a (terrible) cake	Determine the average age of the group
Ingredients	Eggs, milk, flour, sugar (etc).	Four data points.
Recipe	Specifying how to combine ingredients and use tools from your kitchen.	Apply the AVERAGE() function to the relevant age data.
Tools	Whisk, bowls, cake tin, spoons and oven.	Microsoft Excel’s AVERAGE() function.

This feedback loop is also familiar to us: we take action and observe the consequences of our actions at each stage of a recipe. If there are eggshells in our mixture we’ll hopefully notice this when mixing the ingredients together. If we’ve accidentally selected MAX() instead of AVERAGE() we’ll probably notice our results are too high.

When coding, we lose that direct connection with our work. Rather than handling data ourselves, we’re writing instructions for a computer to follow. This indirect approach means we don’t always see the immediate impact of our choices, and troubleshooting can become more complex without clear signals about what went wrong.

Yet for those who work with data regularly, learning to code is invaluable. It unlocks powerful analytical tools and can dramatically speed up your workflow by allowing you to automate repetitive tasks. But there is a trade-off: you must become adept at writing the recipe – mapping out each step of the analysis and translating it into a precise set of instructions that a computer knows how to execute.

Source: @rogierK, Twitter (link no longer available).

Dreaming of Electronic Sheep

Writing code is like writing a recipe for a robot chef that can’t cook. Robots are great at math and precisely following instructions, but have no idea what food tastes like or how to use ingredients to make it. Give a typical cake recipe to a human and they’ll probably be able to figure it out. Give the same recipe to a robot and you’ll be lucky if they don’t set the kitchen on fire.

This is because computers are incredibly fast, but purely literal, machines. They excel at processing vast amounts of data and performing calculations at speeds far beyond human capability. However, they lack the human ability to recognize context or read between the lines. If somebody gives them a stupid instruction they won’t challenge it, but will faithfully hum away in the background doing stupid things as quickly as they can (or fail when something isn’t clear). Writing for computers therefore requires leaving nothing to chance. We need to give the computer exactly the ingredients it needs, in exactly the right format, and tell it precisely how to put everything together, so that we have no doubt the final result will be what we asked for.

Not your first rodeo

Newcomers to programming can find this incredibly daunting: not only do you need to master a new language, you also need to become comfortable with the arcane set of rules and procedures that computers rely on.

The good news is this isn’t your first rodeo as you’ve learned a language before (after all, you’re reading this post!).

None of us are born knowing how to speak our native language. Instead, we learn by watching and mimicking those around us. Maybe we start by trying to say the word “ball” when we see that round thing on the lawn. Next, we might start to realize that the word “throw” is frequently used when we hurl this “ball” thing around. Eventually we might notice that this ball thing is more likely to be thrown at us when we use the word “me”.

Code is similar. We spend time mimicking others, troubleshooting errors and practicing the basics until we’re armed with a large enough repertoire of words to write the recipes we need. We also get a sense of what makes a good recipe: what should our ingredients (data) look like, how should they be prepared and how can we monitor whether everything is on track throughout the analysis pipeline?

As a simple example of this, consider how R would calculate the average age from earlier. We have to provide R with the data in a precise format: a collection of numbers separated by ‘,’. We enclose the ages in c() and use ‘<-’ to save the numbers in a parcel of data called ‘dta_ages’. We then send the data to the mean() function to return the same result as our Excel example above.
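The steps just described can be sketched in a few lines of R. The four ages themselves are not listed in this post, so the values here are made up purely for illustration; only c(), ‘<-’, mean() and the name dta_ages come from the text.

```r
# Hypothetical ages for a group of four people (illustrative values only)
dta_ages <- c(34, 27, 45, 52)

# Send the ages to mean() to calculate the average age
avg_age <- mean(dta_ages)

print(avg_age)
# [1] 39.5
```

With these made-up numbers, Excel’s AVERAGE() applied to the same four cells would give the same 39.5.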

Give R the data in a format it doesn’t understand, incorrectly spell dta_ages or call the MEAN() function instead of mean() and the code will fail. And while this might feel frustrating when you first start learning how to code, having an opinionated programming language obnoxiously warn you when a problem exists is always better than it being hidden from view. These error messages, though initially intimidating, are actually valuable learning tools. They help you develop good habits and catch mistakes early, before they cascade into bigger issues downstream in your analysis.
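As a rough sketch of what those failures look like, the snippet below deliberately triggers each of the three mistakes and captures the error message instead of halting. The ages are again made up, and try_it() is a hypothetical helper written for this illustration, not part of R itself. The exact wording of each message depends on your R version and language settings, so it isn’t reproduced here.

```r
# Hypothetical ages (illustrative values only)
dta_ages <- c(34, 27, 45, 52)

# Hypothetical helper: run an expression and, if it fails,
# return the error message as text rather than stopping
try_it <- function(expr) {
  tryCatch(expr, error = function(e) conditionMessage(e))
}

# 1. Data in a format R doesn't understand: the commas are missing,
#    so R cannot even read the line (parse() lets us catch that)
msg_format <- try_it(parse(text = "c(34 27 45 52)"))

# 2. Misspelled name: there is no object called dta_age
msg_typo <- try_it(mean(dta_age))

# 3. Wrong case: R is case-sensitive, so MEAN() is not mean()
msg_case <- try_it(MEAN(dta_ages))

# The correctly written call still works
ok <- try_it(mean(dta_ages))

print(msg_format)
print(msg_typo)
print(msg_case)
print(ok)
```

Each of the first three calls returns an error message explaining what R could not make sense of, while the last returns the average as expected.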

The key is to remember that every programmer, no matter how experienced, started exactly where you are now – learning the syntax, dealing with error messages, and gradually building their confidence. What seems like a foreign language today will eventually become second nature, allowing you to focus less on the mechanics of coding and more on solving the actual problems at hand.

A note on how AI was used: the majority of this post was written by the author. AI tools were used to workshop alternative approaches for more clearly communicating key ideas and concepts.
