Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
So Steve (SQL), Marsha (C), Bob (Python), and I (R) are at this party. We have TOTALLY cleared the room, especially now that Steve and I are deep into a debate about saving native data objects to disk versus storing data in a database.
I see Monica enter from the kitchen, carrying a bowl full of punch. It’s an awkward task and the fruity, sticky liquid is sloshing on the floor. Monica does data science, so I’m hoping she’ll come to my assist. Sure enough, she places the punch bowl on the table and joins us. She’s about to say something when the front door swings open.
Guenter walks in; he just got off a plane from Germany, so he looks a bit jet-lagged. Since the room is filled with a bunch of people talking SQL, he assumes database debates are the theme of the party.
“I think I have already written an article in this context,” Guenter begins.
Before he can say anything more, Helen speaks up. “Perhaps talking about programming is an attempt to get everyone to leave the house at the end of the party so you can go to bed?” Where Helen appeared from is a mystery.
Monica listens for a minute, then interrupts the pointless debate between Steve and I. “People who are math aficionados” she says, “are a lot more comfortable generating datasets on-the-fly. People like me enjoy relying on the safety and reliability of importing a structured dataset we checked earlier!”
Steve is happy to hear someone is on his side. Steve thinks I’m a knucklehead. There are many people who agree.
“Sure, but there are advantages to not messing around with unnecessary overhead,” I say. “Let’s play with an example.”
I get out a new napkin and sketch out some R code…
recentRainfall <- sum(waterByZone["rainfall", (yearDay - 1):(yearDay + 1)])
If I assume ROWID can be used in a similar fashion to yearDay, the equivalent SQL code might look something like this…
CREATE TABLE "waterByZone" ( "recentRainfall" INTEGER, "neededInFront" INTEGER, "neededInRear" INTEGER, "wateredInFront" INTEGER, "wateredInRear" INTEGER, "secondsWateredInFront" INTEGER, "secondsWateredInRear" INTEGER, "evapotranspiration" INTEGER ); SELECT SUM(rainfall) FROM waterByZone WHERE ROWID BETWEEN :start_day AND :end_day;
Steve gags at my SQL. I’m sloppy, but he needs to give me some leeway. After all, I’m working on a napkin with a dull pencil.
“So I’m not sure where there is much of an advantage between SQL and a data object, at least in this context.” I point out.
“You are correct,” Monica says. “The question is technically two sides of the same coin.”
Guenter picks up the napkin. “When it comes to manageable data sizes, I even save them as a CSV file for later use (I know, I know, …). Guenter shrugs his shoulders and continues. “When I “collect” data, the database system SQLite is actually my first choice on the Pi!”
Helen laughs, “I’m trying to imagine you adjusting the water levels for optimal tomato growth.”
It’s my chance to show off more code, and I take it, “I’m pretty happy with how the data object allows me to access fields. The R code to calculate how much water to put on zone one and zone two looks like this,” I say, and begin work on a new napkin…
howMuchToWater <- function(waterByZone, yearDay) { # get the sum rainfall for yesterday, today, and tomorrow recentRainfall <- sum(waterByZone["rainfall", (yearDay - 1):(yearDay + 1)]) recentEVOTRP <- sum(waterByZone["evapotranspiration", (yearDay - 1):(yearDay + 1)]) # calculate needed water for front zone neededRain <- waterByZone["neededInFront", yearDay] - recentRainfall + recentEVOTRP waterByZone["wateredInFront", yearDay] <- ifelse(neededRain <= 0, 0, neededRain) # calculate needed water for rear zone neededRain <- waterByZone["neededInRear", yearDay] - recentRainfall + recentEVOTRP waterByZone["wateredInRear", yearDay] <- ifelse(neededRain <= 0, 0, neededRain) return(waterByZone) }
Guenter looks at the napkin with all of our code scratchings, “I collect data on the Raspberry Pico W using MicroPython, send this data to the Raspberry Pi 4B via MQTT and analyze it on the Pi 4B using an R function (which runs in a loop) and save it in a SQLite database.” he says. “Then I access the database from my laptop via SSH and perform a post analysis with R. The whole thing is a mixture of MicroPython and R.”
At the mention of Python, Bob rejoins the conversation.
How about you?
The post Party Buzz Kill: modifying data appeared first on Mark Niemann-Ross.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.