Systematic Sampling in R with Base R

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers.]

Introduction

In this post, we will explore systematic sampling in R using base R functions. Systematic sampling is a technique where you select every $k^{th}$ element from a list or dataset. The method is straightforward, and when the order of the list is unrelated to what you are measuring, it gives a representative sample without the complexity of more advanced sampling techniques.

Let’s dive into an example to understand how it works.

What is Systematic Sampling?

Systematic sampling involves selecting every $k^{th}$ element from a dataset after a random start. The value of $k$ is calculated as:

$$k = \frac{N}{n}$$

where $N$ is the population size and $n$ is the sample size.
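
As a quick worked check of the formula with the numbers used below ($N = 1000$, $n = 100$), the interval is $k = 10$, and a start $r$ selects positions $r, r + k, \dots, r + (n - 1)k$. The sketch below fixes $r = 3$ purely for illustration (the variable names are my own) and confirms that writing out the formula gives the same positions as `seq()`.

```r
# Worked example of k = N / n with the numbers used in this post
N <- 1000          # population size
n <- 100           # desired sample size
k <- N / n         # sampling interval: 1000 / 100 = 10

# For a start r (fixed to 3 here for illustration), the selected
# positions are r, r + k, r + 2k, ..., r + (n - 1)k
r <- 3
idx_formula <- r + (0:(n - 1)) * k
idx_seq     <- seq(r, N, by = k)

all(idx_formula == idx_seq)   # both describe the same 100 positions
```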

Example: Sampling a Dataset

Imagine we have a dataset of 1000 elements, and we want to select a sample of 100 elements using systematic sampling.

  1. Generate a Dataset

First, let’s create a dataset with 1000 elements.

set.seed(123)  # Setting seed for reproducibility: it fixes the random
               # start point drawn later with sample()
population <- 1:1000

Here, population is a sequence of numbers from 1 to 1000.

  2. Define Sample Size

Define the number of elements you want to sample.

sample_size <- 100

  3. Calculate Interval $k$

Calculate the interval $k$ as the ratio of the population size to the sample size.

k <- length(population) / sample_size
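
Here 1000 / 100 divides evenly, so $k = 10$. When $N$ is not a multiple of $n$, $k$ is fractional and `sample(1:k, 1)` no longer makes sense. One common workaround, sketched below as an assumption rather than as part of this post's recipe, keeps the fractional interval, draws a continuous start in $(0, k)$ with `runif()`, and rounds each position up with `ceiling()`. The population size of 1050 is a made-up example.

```r
# Sketch: systematic sampling when N / n is not a whole number.
# Assumption: fractional interval k = N / n, continuous random start
# in (0, k), and each position rounded up with ceiling().
# The population size 1050 is a made-up example.
set.seed(123)
population <- 1:1050
sample_size <- 100

k <- length(population) / sample_size    # 10.5, not an integer
u <- runif(1, min = 0, max = k)          # random start in (0, k)
positions <- ceiling(u + (0:(sample_size - 1)) * k)
positions <- pmin(positions, length(population))  # guard against float fuzz

frac_sample <- population[positions]
length(frac_sample)   # exactly sample_size elements
```

With this approach the gaps between selected elements alternate between 10 and 11, and the sample always has exactly `sample_size` distinct elements.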

  4. Random Start Point

Choose a random starting point, an integer between 1 and $k$.

start <- sample(1:k, 1)
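
To see why a single random start is enough, it helps to list every sample the procedure could produce. With $k = 10$ there are exactly 10 possible starts, and the sketch below (an illustration, not part of the original steps) shows that the 10 resulting samples are disjoint and together cover the whole population. Since `sample(1:k, 1)` picks one of them with probability $1/k$, every element has the same chance of selection, $1/k = n/N = 0.1$.

```r
# Enumerate every possible systematic sample for N = 1000, n = 100 (k = 10).
# Each start 1..k yields one candidate sample; sample(1:k, 1) picks one
# of them with probability 1/k.
population <- 1:1000
k <- 10
all_samples <- lapply(1:k, function(s) population[seq(s, length(population), by = k)])

# The k candidate samples are disjoint and together cover the population,
# so every element is selected with probability 1/k = n/N = 0.1.
covered <- sort(unlist(all_samples))
identical(covered, population)
lengths(all_samples)
```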

  5. Select Every $k^{th}$ Element

Use `seq()` to select every $k^{th}$ element, beginning at the chosen start point.

systematic_sample <- population[seq(start, length(population), by = k)]
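
The same selection can also be written as a logical index using the modulo operator `%%`. This is shown only as an equivalent alternative: a position `i` is kept when it is at or after `start` and `i - start` is a multiple of $k$.

```r
set.seed(123)
population <- 1:1000
k <- length(population) / 100
start <- sample(1:k, 1)

# seq() version, as in the post
by_seq <- population[seq(start, length(population), by = k)]

# Modulo version: keep positions i >= start where (i - start) is a multiple of k
i <- seq_along(population)
by_mod <- population[i >= start & (i - start) %% k == 0]

identical(by_seq, by_mod)
```

The `seq()` form builds only the 100 indices it needs, while the logical form tests all 1000 positions; both return the same elements.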

  6. Check the Sample

Print the first few elements of the sample to check.

head(systematic_sample)
[1]  3 13 23 33 43 53
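
Printing the first few values is a quick visual check. A slightly stronger check, sketched below, confirms that the sample has the requested size and that every gap between consecutive elements equals $k$. The code repeats the earlier steps so that it runs on its own.

```r
# Sanity checks on the systematic sample (repeats the steps above)
set.seed(123)
population <- 1:1000
sample_size <- 100
k <- length(population) / sample_size
start <- sample(1:k, 1)
systematic_sample <- population[seq(start, length(population), by = k)]

length(systematic_sample)           # should equal sample_size
unique(diff(systematic_sample))     # every gap should equal k
range(systematic_sample)            # first element is the random start
```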

Here is the complete code in one block:

# Step 1: Generate a Dataset
set.seed(123)  # Setting seed for reproducibility
population <- 1:1000

# Step 2: Define Sample Size
sample_size <- 100

# Step 3: Calculate Interval k
k <- length(population) / sample_size

# Step 4: Random Start Point
start <- sample(1:k, 1)

# Step 5: Select Every k-th Element
systematic_sample <- population[seq(start, length(population), by = k)]

# Step 6: Check the Sample
head(systematic_sample)
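
Real datasets usually live in data frames, so the same steps can be wrapped in a small function that samples rows. The function name `systematic_sample_rows` is hypothetical, and to keep it simple the sketch assumes the number of rows is an exact multiple of `n`, stopping with an error otherwise. It uses `sample.int(k, 1)`, which draws a random start from `1:k` in the same way as `sample(1:k, 1)`.

```r
# Hypothetical helper: systematic sample of rows from a data frame,
# following the same steps as the post. For simplicity it assumes
# nrow(df) is an exact multiple of n and stops otherwise.
systematic_sample_rows <- function(df, n) {
  N <- nrow(df)
  if (N %% n != 0) {
    stop("nrow(df) must be a multiple of n for this simple version")
  }
  k <- N %/% n                      # whole-number sampling interval
  start <- sample.int(k, 1)         # random start in 1..k
  df[seq(start, N, by = k), , drop = FALSE]
}

# Usage with the built-in iris data (150 rows): n = 15 gives k = 10
set.seed(42)
iris_sys <- systematic_sample_rows(iris, 15)
nrow(iris_sys)
rownames(iris_sys)   # original row numbers, spaced 10 apart
```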

Try It Yourself!

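Systematic sampling is a simple yet effective technique. By following the steps above, you can apply it to your own datasets. Experiment with different sample sizes and starting points to see how the samples vary. Because it needs only one random number (the start), the method is especially convenient when records are processed in order, for example when drawing every 10th entry from a long list. One caution: if the data follow a repeating pattern whose cycle length matches $k$, every systematic sample lands on the same point of the cycle and can be far from representative.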
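
One way to run that experiment is to compute a statistic for every possible start. The sketch below compares sample means across all 10 starts for two populations: the ordered `1:1000` from this post, and a made-up population that repeats with a cycle of length 10, matching $k$. The helper `sys_mean` is a hypothetical name introduced only for this illustration.

```r
# Compare the sample mean for every possible start (k = 10) on two populations.
k <- 10
starts <- 1:k
sys_mean <- function(x, s) mean(x[seq(s, length(x), by = k)])

# 1) The ordered population used in the post
population <- 1:1000
sapply(starts, function(s) sys_mean(population, s))   # s + 495 for each start
mean(population)                                       # 500.5

# 2) A made-up population with a cycle of length 10 (matching k):
#    each systematic sample lands on the same phase of the cycle
periodic <- rep(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 100), times = 100)
sapply(starts, function(s) sys_mean(periodic, s))
mean(periodic)
```

For the ordered population, every start gives a mean within 5 of the true value. For the periodic one, each sample consists of a single repeated value, so its mean is either 1 or 100, never close to the population mean.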

Give it a go and see how systematic sampling can be a handy tool in your data analysis toolkit!


Happy Coding!
