2023

R {targets}: How to Make Reproducible Pipelines for Data Science and Machine Learning

June 27, 2023 | Dario Radečić

The R {targets} package is a pipeline tool for statistics, data science, and machine learning in R. The package allows you to write and maintain reproducible R workflows in pipelines that run only when necessary (i.e., when either the data or the code has changed). The best part is that you’ll learn ...

[Read more...]

Tidy Tuesday: US Populated Places

June 26, 2023 | Louise E. Sinks

Today’s TidyTuesday is about place names as recorded by the US Board on Geographic Names. The dataset has been cleaned to include only populated places. This week will involve more libraries than normal, since I am going to play with mapping. library(tidyverse) # who doesn't want to be tidy? ...

[Read more...]

The ave() Function in R

June 26, 2023 | Steven P. Sanderson II, MPH

Introduction
In the world of data analysis and statistics, grouping data based on certain criteria is a common task. Whether you’re working with large datasets or analyzing trends within smaller subsets, having a reliable and efficient tool for ... [Read more...]
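A minimal sketch of ave() on a hypothetical toy data frame. ave() returns a vector the same length as its input, with FUN applied within each group, so the result can be attached directly as a new column. The column names and values are illustrative only.

```r
# Hypothetical toy data
df <- data.frame(group = c("a", "a", "b", "b", "b"),
                 value = c(1, 3, 2, 4, 6))

# FUN defaults to mean: every row receives its group's mean
df$group_mean <- ave(df$value, df$group)

# FUN must be passed by name because it follows `...` in ave()'s signature
df$group_max <- ave(df$value, df$group, FUN = max)

# A running index within each group
df$row_in_group <- ave(df$value, df$group, FUN = seq_along)

df
```

Unlike aggregate() or tapply(), which return one value per group, ave() keeps the original row order and length, which is what makes it convenient for adding group-level summaries next to the raw data.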

ggplotting power curves from simr package

June 26, 2023 | R on Pablo Bernabeu

The R package simr has greatly facilitated power analysis for mixed-effects models using Monte Carlo simulation (i.e., hundreds or thousands of tests under slight variations of the data). The powerCurve function is used to estimate the statistical power for various sample sizes in one go. Since it runs serially, ...

[Read more...]
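simr itself is not reproduced here. As a base-R illustration of the Monte Carlo idea the excerpt describes (run the same test on many simulated datasets and count how often it is significant), the sketch below estimates power for a simple two-group t-test at several sample sizes, not for a mixed-effects model. The helper simulate_power(), the effect size of 0.5 SD, and the sample sizes are assumptions chosen for illustration.

```r
set.seed(42)

# Hypothetical helper: share of simulated experiments with p < alpha
simulate_power <- function(n, effect = 0.5, nsim = 500, alpha = 0.05) {
  p_values <- replicate(nsim, {
    a <- rnorm(n, mean = 0)       # control group
    b <- rnorm(n, mean = effect)  # treatment group shifted by `effect` SDs
    t.test(a, b)$p.value
  })
  mean(p_values < alpha)
}

sample_sizes <- c(10, 20, 40, 80)
power <- vapply(sample_sizes, simulate_power, numeric(1))

# One row per sample size: the shape you would pass to ggplot2
power_curve <- data.frame(n = sample_sizes, power = power)
power_curve
```

Because each sample size is simulated independently, the calls could also be parallelized, which is the motivation the excerpt hints at for simr's serial powerCurve.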

%dofuture% – a Better foreach() Parallelization Operator than %dopar%

June 26, 2023 | JottR on R

[Read more...]

R for Predictive Modeling and Data Visualization in Turkey

June 26, 2023 | R Consortium

Mustafa Cavus, organizer of the Eskisehir R User Group in Turkey, discussed the diverse and thriving R community in Eskisehir. He shared the details of a 4-day event hosted by... The post R for Predictive Modeling and Data Visualization in Turkey appeared first on R Consortium.

[Read more...]

Tidy Freedom Index as an R Package

June 25, 2023 | pacha.dev/blog

R and Shiny Training: If you find this blog to be interesting, please note that I offer personalized and group-based training sessions that may be reserved through Buy me a Coffee. Additionally, I provide training services in the Spanish language ... [Read more...]

Visualization in R: Unleashing the Power of the abline() Function

June 25, 2023 | Steven P. Sanderson II, MPH

Introduction
Welcome to the world of data visualization in R! In this blog post, we will explore the abline() function, a versatile tool that allows you to add straight lines to your plots effortlessly. Whether you’re a beginner or an experience...

[Read more...]
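A short sketch of the main ways abline() can be called, on simulated (hypothetical) data: with a fitted lm object, with h = or v = for horizontal and vertical reference lines, and with an explicit intercept a and slope b.

```r
set.seed(123)

# Simulated data with a known linear trend
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)

plot(x, y, pch = 19, main = "abline() examples")
abline(fit, col = "blue", lwd = 2)      # regression line from the lm coefficients
abline(h = mean(y), lty = 2)            # horizontal line at the mean of y
abline(v = 5, lty = 3, col = "grey40")  # vertical line at x = 5
abline(a = 0, b = 1, col = "red")       # line with intercept 0 and slope 1
```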

The differences of left join in SQL and R

June 25, 2023 | Aster Hu

Recently, I encountered a situation where I needed to translate an Access SQL query to R, and I noticed the contrasting behaviors of these two languages when it comes to handling NA/NULL values in left joins. The impact of NA/NULL values on joins... [Read more...]
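The excerpt does not show the post's code. One well-known source of the difference, sketched here in base R on hypothetical toy tables: in SQL, NULL never equals NULL, so a NULL key finds no match, whereas base R's merge() matches NA keys to each other by default. Declaring NA as incomparable reproduces the SQL behavior. (dplyr's joins also match NA keys by default; their na_matches = "never" argument switches that off.)

```r
# Hypothetical toy tables, each with a missing key
x <- data.frame(id = c(1, 2, NA), val_x = c("a", "b", "c"))
y <- data.frame(id = c(1, NA), val_y = c("A", "Z"))

# Default R behavior: the NA key in x matches the NA key in y
r_style <- merge(x, y, by = "id", all.x = TRUE)

# SQL-like behavior: NA keys never match, so val_y stays NA for that row
sql_style <- merge(x, y, by = "id", all.x = TRUE, incomparables = NA)

r_style
sql_style
```

Both results keep all three rows of x (it is a left join); they differ only in whether the NA-keyed row picks up "Z" from y.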

Order Constraints in Bayes Models (with brms)

June 25, 2023 | Stat's What It's All About

Over a year ago, while listening to a very not-at-all-statistical podcast, I discovered that Bayesian modeling has been widely used in archaeology since the mid-1990s to calibrate carbon dating.¹ Carbon dating is a scientific method used to determine the a...

[Read more...]

How to break down colour variable in sjPlot::plot_model into equally-sized bins

June 24, 2023 | R on Pablo Bernabeu

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot (Lüdecke, 2022). For instance, using the plot_model function, I plotted the interaction between two continuous variables.

library(lme4)
#> Loading required package: Matrix
library(sjPlot)
#> Learn more about sjPlot with 'browseVignettes("sjPlot")'.
library(ggplot2)

theme_set(theme_sjplot())

# Create data partially based on code by Ben Bolker  
# from https://stackoverflow.com/a/38296264/7050882

set.seed(101)

spin = runif(800, 1, 24)

trait = rep(1:40, each = 20)

ID = rep(1:80, each = 10)

testdata <- data.frame(spin, trait, ID)

testdata$fatigue <- 
  testdata$spin * testdata$trait / 
  rnorm(800, mean = 6, sd = 2)

# Model
fit = lmer(fatigue ~ spin * trait + (1|ID),
           data = testdata, REML = TRUE)
#> boundary (singular) fit: see help('isSingular')

plot_model(fit, type = 'pred', terms = c('spin', 'trait'))
#> Warning: Ignoring unknown parameters: linewidth

...

[Read more...]
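The excerpt stops before the solution, so this is only one possible approach, not necessarily the author's. With type = 'pred', plot_model picks a few representative values of the second term for the colour lines. Assuming the bracket syntax for explicit values in terms (e.g. 'trait [1, 20, 40]') that sjPlot passes on to ggeffects, you can compute those values yourself. The hypothetical helper below splits the moderator's range into equal-width bins and returns their midpoints as a terms string.

```r
# Hypothetical helper: split the range of a continuous moderator into
# `n_bins` equal-width bins and return a `terms` string with the bin midpoints
bin_midpoints_term <- function(x, name, n_bins = 3) {
  breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  midpoints <- (head(breaks, -1) + tail(breaks, -1)) / 2
  paste0(name, " [", paste(round(midpoints, 2), collapse = ", "), "]")
}

trait <- rep(1:40, each = 20)  # same moderator as in the post's test data
term <- bin_midpoints_term(trait, "trait", n_bins = 3)
term
#> [1] "trait [7.5, 20.5, 33.5]"

# The string could then be used as, for example:
# plot_model(fit, type = 'pred', terms = c('spin', term))
```

For bins with equal numbers of observations rather than equal widths, the breaks could come from quantile(x, probs = seq(0, 1, length.out = n_bins + 1)) instead of seq().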

How to break up colour variable in sjPlot into equally-sized bins

June 24, 2023 | R on Pablo Bernabeu

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot (Lüdecke, 2022). For instance, using the plot_model function, I plotted the interaction between two continuous variables.

library(lme4)
#> Loading required package: Matrix
library(sjPlot)
library(ggplot2)

theme_set(theme_sjplot())

# Create data using code by Ben Bolker from 
# https://stackoverflow.com/a/38296264/7050882

set.seed(101)
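spin = runif(600, 1, 24)
reg = runif(600, 1, 15)
# 10 participant IDs, repeated across the 600 rows
ID = rep(as.character(1:10), times = 60)
# 30 days of 10 observations each, repeated twice across the 600 rows
day = rep(1:30, each = 10, times = 2)
testdata <- data.frame(spin, reg, ID, day)
# 30 random multipliers, repeated 20 times across the 600 rows
testdata$fatigue <- testdata$spin * testdata$reg/10 * rep(rnorm(30, mean=3, sd=2), times = 20)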

fit = lmer(fatigue ~ spin * reg + (1|ID),
           data = testdata, REML = TRUE)

plot_model(fit, type = 'pred', terms = c('spin', 'reg'))
#> Warning: Ignoring unknown parameters: linewidth

...

[Read more...]

How to map more informative values onto fill argument of sjPlot::plot_model

June 24, 2023 | R on Pablo Bernabeu

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot (Lüdecke, 2022). For instance, using the plot_model function, I plotted the interaction between a continuous variable and ...

[Read more...]

How to visually assess the convergence of a mixed-effects model by plotting various optimizers

June 24, 2023 | R on Pablo Bernabeu

To assess whether convergence warnings render the results invalid or, on the contrary, whether the results can be deemed valid despite the warnings, Bates et al. (2023) suggest refitting the affected models with a variety of optimizers. The authors argue that, if the different optimizers produce practically equivalent results, ...

[Read more...]
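In lme4, refitting a fitted model with every available optimizer is what allFit() automates. To show the underlying logic without a mixed model, here is a base-R analogue (a swapped-in, simpler example, not the post's code): the same normal-distribution likelihood is maximized with three optim() methods on simulated data, and the estimates are compared with each other and with the closed-form maximum-likelihood values.

```r
set.seed(1)
y <- rnorm(200, mean = 5, sd = 2)  # simulated data

# Negative log-likelihood of a normal model; sd is estimated on the log scale
negloglik <- function(par) {
  -sum(dnorm(y, mean = par[1], sd = exp(par[2]), log = TRUE))
}

optimizers <- c("Nelder-Mead", "BFGS", "L-BFGS-B")
fits <- lapply(optimizers, function(method) {
  optim(c(1, 1), negloglik, method = method, control = list(maxit = 1000))
})

# One row per optimizer: estimates and convergence code (0 = converged)
estimates <- data.frame(
  optimizer = optimizers,
  mean = vapply(fits, function(f) f$par[1], numeric(1)),
  sd = vapply(fits, function(f) exp(f$par[2]), numeric(1)),
  convergence = vapply(fits, function(f) as.integer(f$convergence), integer(1))
)
estimates

# Largest disagreement between optimizers for each parameter
apply(estimates[, c("mean", "sd")], 2, function(col) diff(range(col)))
```

If the spread between optimizers is negligible relative to the precision you care about, the fits can be treated as practically equivalent, which is the criterion the excerpt attributes to Bates et al. (2023).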

Student’s t-test explained with R and Pokemon

June 23, 2023 | pacha.dev/blog

R and Shiny Training: If you find this blog to be interesting, please note that I offer personalized and group-based training sessions that may be reserved through Buy me a Coffee. Additionally, I provide training services in the Spanish language ...

$H_1: \mu^{\text{electric}} \neq \mu^{\text{water}}$

[Read more...]
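The Pokemon data are not part of the excerpt, so the sketch below uses simulated stand-in values (hypothetical, not real Pokemon statistics) to test the two-sided alternative $H_1$ above. Student's t-test assumes equal variances, hence var.equal = TRUE; R's default (var.equal = FALSE) runs Welch's test instead. The pooled-variance statistic is also computed by hand as a cross-check.

```r
set.seed(2023)

# Hypothetical, simulated values standing in for two Pokemon types
electric <- rnorm(30, mean = 60, sd = 10)
water <- rnorm(30, mean = 55, sd = 10)

# Student's t-test with the two-sided alternative
result <- t.test(electric, water, alternative = "two.sided", var.equal = TRUE)
result

# The same statistic by hand, using the pooled variance
n1 <- length(electric)
n2 <- length(water)
pooled_var <- ((n1 - 1) * var(electric) + (n2 - 1) * var(water)) / (n1 + n2 - 2)
t_manual <- (mean(electric) - mean(water)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
t_manual
```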

(Update) How to install RStudio, RStudio Server and Quarto with ‘apt install’

June 23, 2023 | pacha.dev/blog

R and Shiny Training: If you find this blog to be interesting, please note that I offer personalized and group-based training sessions that may be reserved through Buy me a Coffee. Additionally, I provide training services in the Spanish language ... [Read more...]

Table joins with conditional “fuzzy” string matching in R

June 23, 2023 | R on Pablo Bernabeu

Here’s an example of fuzzy-matching strings in R that I shared on StackOverflow. In stringdist_join, the max_dist argument is used to constrain the degree of fuzziness.

library(fuzzyjoin)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(knitr)


small_tab = data.frame(Food.Name = c('Corn', 'Squash', 'Peppers'), 
                       Food.Code = c(NA, NA, NA))


large_tab = data.frame(Food.Name = c('Sweet Corn', 'Red Corn', 'Baby Corns', 
                                     'Squash', 'Long Squash', 'Red Pepper', 
                                     'Green Pepper', 'Red Peppers'), 
                       Food.Code = c(532, 532, 944, 111, 123, 654, 655, 654))

joined_tab = stringdist_join(small_tab, large_tab, by = 'Food.Name',
                             ignore_case = TRUE, method = 'cosine', 
                             max_dist = 0.5, distance_col = 'dist') %>%
  
  # Tidy columns 
  select(Food.Name = Food.Name.x, Food.Code = Food.Code.y) %>%
  
  # Only keep most frequent food code per food name
  group_by(Food.Name) %>% count(Food.Name, Food.Code) %>% 
  slice(which.max(n)) %>% select(-n) %>%
  
  # Order food names as in the small table
  arrange(factor(Food.Name, levels = small_tab$Food.Name))

# Show table with columns renamed
joined_tab %>%
  rename('Food Name' = Food.Name, 
         'Food Code' = Food.Code) %>%
  kable()

|Food Name | Food Code|
|:---------|---------:|
|Corn      |       532|
|Squash    |       111|
|Peppers   |       654|

Created on 2023-05-31 with reprex v2.0.2 [Read more...]
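To make the role of max_dist concrete, here is a dependency-free sketch. It assumes that method = 'cosine' in stringdist compares single-character (q = 1) count profiles, which is our understanding of its default, and it reimplements that distance in base R. The function name cosine_dist() is hypothetical. For each short name, candidates within max_dist = 0.5 are kept and their most frequent code is chosen; on the post's tables this should reproduce the codes shown above (532, 111, 654).

```r
# Hypothetical helper: cosine distance between single-character count profiles
cosine_dist <- function(a, b) {
  chars_a <- strsplit(tolower(a), "")[[1]]  # lower-casing mirrors ignore_case = TRUE
  chars_b <- strsplit(tolower(b), "")[[1]]
  alphabet <- union(chars_a, chars_b)
  count_a <- as.numeric(table(factor(chars_a, levels = alphabet)))
  count_b <- as.numeric(table(factor(chars_b, levels = alphabet)))
  1 - sum(count_a * count_b) / (sqrt(sum(count_a^2)) * sqrt(sum(count_b^2)))
}

small_names <- c('Corn', 'Squash', 'Peppers')
large_tab <- data.frame(Food.Name = c('Sweet Corn', 'Red Corn', 'Baby Corns',
                                      'Squash', 'Long Squash', 'Red Pepper',
                                      'Green Pepper', 'Red Peppers'),
                        Food.Code = c(532, 532, 944, 111, 123, 654, 655, 654))
max_dist <- 0.5

# For each name, keep candidates within max_dist and take the most frequent code
best_code <- sapply(small_names, function(name) {
  d <- vapply(large_tab$Food.Name, cosine_dist, numeric(1), b = name)
  codes <- large_tab$Food.Code[d <= max_dist]
  if (length(codes) == 0) return(NA_real_)
  counts <- table(codes)
  as.numeric(names(counts)[which.max(counts)])
})
best_code
```

Lowering max_dist makes the match stricter: for instance, 'Peppers' versus 'Sweet Corn' has a distance of about 0.55 under this definition, so it is excluded at 0.5 but would be admitted at 0.6.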

Weighted versus unweighted percentiles by @ellis2013nz

June 23, 2023 | free range statistics - R

This interesting paper came out recently: A test of the predictive validity of relative versus absolute income for self-reported health and well-being in the United States, by David Brady, Michaela Curran and Richard Carpiano. It uses a large sample of... [Read more...]
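The excerpt does not describe the post's method, so this is only a minimal illustration of why weights change percentiles. Base R's quantile() has no weights argument. The sketch below assumes the simple inverse-CDF definition: the smallest value whose cumulative weight share reaches p. The function name, the values, and the weights are all hypothetical.

```r
# Hypothetical helper: smallest x whose cumulative weight share reaches p
weighted_percentile <- function(x, w, p) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w)
  x[which(cum_share >= p)[1]]
}

income <- c(10, 20, 30, 40)  # hypothetical values
equal_w <- c(1, 1, 1, 1)     # unweighted case
skewed_w <- c(1, 1, 1, 5)    # hypothetical survey weights

weighted_percentile(income, equal_w, 0.5)   # 20, same as quantile(income, 0.5, type = 1)
weighted_percentile(income, skewed_w, 0.5)  # 40: the heavily weighted value pulls the median up
```

With equal weights this definition coincides with quantile(..., type = 1); R's default type = 7 interpolates between order statistics, so unweighted results can differ slightly depending on the definition used.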
