2023

Tidy Tuesday: US Populated Places

June 26, 2023 | Louise E. Sinks

Today’s TidyTuesday is about place names as recorded by the US Board on Geographic Names. The dataset has been cleaned to include only populated places. This week will involve more libraries than normal, since I am going to play with mapping.
library(tidyverse) # who doesn't want to be tidy?
...
[Read more...]

The ave() Function in R

June 26, 2023 | Steven P. Sanderson II, MPH

Introduction: In the world of data analysis and statistics, grouping data based on certain criteria is a common task. Whether you’re working with large datasets or analyzing trends within smaller subsets, having a reliable and efficient tool for ... [Read more...]
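As a rough illustration of the kind of grouped calculation the teaser refers to, here is a minimal base R sketch of ave(). The `sales` data frame and its column names are made up for this example and do not come from the post.

```r
# Hypothetical data, invented for illustration only
sales <- data.frame(
  region = c("east", "east", "west", "west", "west"),
  units  = c(10, 20, 6, 12, 30)
)

# ave() returns a vector as long as its input, holding each row's group
# summary. With no FUN supplied, the summary is the group mean.
sales$region_mean <- ave(sales$units, sales$region)

# FUN must be passed by name, because unnamed arguments after the first
# are treated as grouping variables.
sales$region_max <- ave(sales$units, sales$region, FUN = max)

print(sales)
```

Because the result lines up row by row with the original data, it can be assigned straight to a new column without a separate aggregate-and-merge step.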

ggplotting power curves from simr package

June 26, 2023 | R on Pablo Bernabeu

The R package simr has greatly facilitated power analysis for mixed-effects models using Monte Carlo simulation (i.e., hundreds or thousands of tests under slight variations of the data). The powerCurve function is used to estimate the statistical power for various sample sizes in one go. Since it runs serially, ...
[Read more...]

Tidy Freedom Index as an R Package

June 25, 2023 | pacha.dev/blog

[Read more...]

The differences of left join in SQL and R

June 25, 2023 | Aster Hu

Recently, I encountered a situation where I needed to translate an Access SQL query to R, and I noticed the contrasting behaviors of these two languages when it comes to handling NA/NULL values in left joins. The impact of NA/NULL values on joins... [Read more...]
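The contrast described here can be sketched with dplyr, assuming that is the join implementation in question (the teaser does not say). The two tables below are hypothetical. By default, dplyr's left_join() treats NA keys as equal to each other, whereas in SQL a NULL key never matches another NULL. The `na_matches = "never"` argument reproduces the SQL-like behavior.

```r
library(dplyr)

# Hypothetical tables, invented for illustration only
orders    <- data.frame(customer_id = c(1, 2, NA), amount = c(10, 20, 30))
customers <- data.frame(customer_id = c(1, NA), name = c("Ana", "Unknown"))

# Default (na_matches = "na"): the NA key in `orders` matches the NA key
# in `customers`, so the third order picks up the name "Unknown".
r_default <- left_join(orders, customers, by = "customer_id")

# na_matches = "never": NA keys never match, so the third order
# gets NA for `name`, as a NULL key would in SQL.
sql_like <- left_join(orders, customers, by = "customer_id",
                      na_matches = "never")

print(r_default)
print(sql_like)
```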

How to break down colour variable in sjPlot::plot_model into equally-sized bins

June 24, 2023 | R on Pablo Bernabeu

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot (Lüdecke, 2022). For instance, using the plot_model function, I plotted the interaction between two continuous variables.
library(lme4)
#> Loading required package: Matrix
library(sjPlot)
#> Learn more about sjPlot with 'browseVignettes("sjPlot")'.
library(ggplot2)

theme_set(theme_sjplot())

# Create data partially based on code by Ben Bolker  
# from https://stackoverflow.com/a/38296264/7050882

set.seed(101)

spin = runif(800, 1, 24)

trait = rep(1:40, each = 20)

ID = rep(1:80, each = 10)

testdata <- data.frame(spin, trait, ID)

testdata$fatigue <- 
  testdata$spin * testdata$trait / 
  rnorm(800, mean = 6, sd = 2)

# Model
fit = lmer(fatigue ~ spin * trait + (1|ID),
           data = testdata, REML = TRUE)
#> boundary (singular) fit: see help('isSingular')

plot_model(fit, type = 'pred', terms = c('spin', 'trait'))
#> Warning: Ignoring unknown parameters: linewidth
...
[Read more...]

How to break up colour variable in sjPlot into equally-sized bins

June 24, 2023 | R on Pablo Bernabeu

Whereas the direction of main effects can be interpreted from the sign of the estimate, the interpretation of interaction effects often requires plots. This task is facilitated by the R package sjPlot (Lüdecke, 2022). For instance, using the plot_model function, I plotted the interaction between two continuous variables.
library(lme4)
#> Loading required package: Matrix
library(sjPlot)
library(ggplot2)

theme_set(theme_sjplot())

# Create data using code by Ben Bolker from 
# https://stackoverflow.com/a/38296264/7050882

set.seed(101)
spin = runif(600, 1, 24)
reg = runif(600, 1, 15)
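# 10 participant IDs, repeated to fill all 600 rows
ID = rep(as.character(1:10), times = 60)
# 30 days with 10 rows each, repeated twice to fill all 600 rows
day = rep(1:30, each = 10, times = 2)
testdata <- data.frame(spin, reg, ID, day)
# The 30 draws from rnorm() are recycled across the 600 rows
testdata$fatigue <- testdata$spin * testdata$reg/10 * rnorm(30, mean=3, sd=2)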

fit = lmer(fatigue ~ spin * reg + (1|ID),
           data = testdata, REML = TRUE)

plot_model(fit, type = 'pred', terms = c('spin', 'reg'))
#> Warning: Ignoring unknown parameters: linewidth
...
[Read more...]

Table joins with conditional “fuzzy” string matching in R

June 23, 2023 | R on Pablo Bernabeu

Here’s an example of fuzzy-matching strings in R that I shared on StackOverflow. In stringdist_join, the max_dist argument is used to constrain the degree of fuzziness.
library(fuzzyjoin)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(knitr)


small_tab = data.frame(Food.Name = c('Corn', 'Squash', 'Peppers'), 
                       Food.Code = c(NA, NA, NA))


large_tab = data.frame(Food.Name = c('Sweet Corn', 'Red Corn', 'Baby Corns', 
                                     'Squash', 'Long Squash', 'Red Pepper', 
                                     'Green Pepper', 'Red Peppers'), 
                       Food.Code = c(532, 532, 944, 111, 123, 654, 655, 654))

joined_tab = stringdist_join(small_tab, large_tab, by = 'Food.Name',
                             ignore_case = TRUE, method = 'cosine', 
                             max_dist = 0.5, distance_col = 'dist') %>%
  
  # Tidy columns 
  select(Food.Name = Food.Name.x, -Food.Name.y, 
         Food.Code = Food.Code.y, -dist) %>%
  
  # Only keep most frequent food code per food name
  group_by(Food.Name) %>% count(Food.Name, Food.Code) %>% 
  slice(which.max(n)) %>% select(-n) %>%
  
  # Order food names as in the small table
  arrange(factor(Food.Name, levels = small_tab$Food.Name))

# Show table with columns renamed
joined_tab %>%
  rename('Food Name' = Food.Name, 
         'Food Code' = Food.Code) %>%
  kable()

Food Name   Food Code
Corn        532
Squash      111
Peppers     654

Created on 2023-05-31 with reprex v2.0.2
[Read more...]