Applications of randomNames (an R-Package): avoid real names!

[This article was first published on R Stories, and kindly contributed to R-bloggers].

Why avoid real names?

Using real names (or other personal identities) in research or related activities such as publishing on the web or in journals, presentations, and/or demonstrations can be problematic due to privacy concerns, existing data protection laws and regulations, and ethical obligations. Examples of highly sensitive data include student academic records, health records, and financial records.

To protect the safety and privacy of human subjects, many universities and other institutions establish oversight committees known as Institutional Review Boards (IRBs), which are tasked with minimizing risk to participants. Anyone using data collected from institutional participants is typically required to submit a signed proposal describing how the data will be used. Among other things, the proposal must declare that anonymous or pseudonymous names will be used in place of real names.

The ‘randomNames’ R-Package

One of my favorite R packages is ‘randomNames’, which is simple and easy to use. It has a single function, randomNames(), which generates random first and last names.

Here are a few simple usage examples.

Generate 6 random names as “Last, First”

library(randomNames)
randomNames(6) 
## [1] "Montoya, Jesus" "Gonzales, Seidy" "Galvan, Clint" "el-Baig, Misbaah"
## [5] "Juniel, Jasmine" "Martel, Katelyn"
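
Since the names are drawn at random, running the same call again will generally return different names from the ones shown here. If a repeatable result is needed, one option is to fix the seed before each call. This is a minimal sketch, assuming the package draws through R's standard random number generator (the one that set.seed() controls); the seed value 2024 is arbitrary.

```r
library(randomNames)

# Fix the random number generator state so the draw can be repeated
set.seed(2024)
first_draw <- randomNames(6)

# Resetting the same seed before the second call should repeat the draw
set.seed(2024)
second_draw <- randomNames(6)

# Compare the two draws element by element
identical(first_draw, second_draw)
```

Fixing the seed is mainly useful when the anonymized data have to be regenerated later with the same replacement names.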

Generate 6 random female names as “Last, First”

randomNames(6, gender = 1) 
## [1] "el-Mohammed, Ruqayya" "Fischer, Taylor" "al-Shariff, Afeefa"
## [4] "Wall, Patricia" "Guha, Judy" "al-Murad, Ruqayya"

Generate 6 random names, the first three male and the last three female, as “Last, First”

randomNames(6, gender = c(0,0,0,1,1,1)) 
## [1] "Cooper, William" "Leewaye, Dustin" "Miller, Nathan" "Lee, Janna"
## [5] "Hawley, Aijah" "Duncan, Alyssa"

Generate 6 random names, the first three African American and the last three White (not Hispanic), as “Last, First”

randomNames(6, ethnicity = c(3,3,3,5,5,5)) 
## [1] "Kirkman, Damien" "Salas, Sierra" "Lawson, Jasmine" "Harris, Joshua"
## [5] "Hoffman, Logan" "Potter, Tyler"
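
The gender and ethnicity arguments can also be combined in one call, and the package has a few formatting arguments (which.names, name.order, name.sep) that the examples above do not use. The following is a small sketch of how they might be used. The argument values reflect my reading of the package documentation, so it is worth checking ?randomNames on your own installation. The object names are only for illustration.

```r
library(randomNames)

# Gender and ethnicity vectors are matched position by position:
# genders alternate 0, 1, 0, 1, 0, 1 while the ethnicity codes are 3, 3, 3, 5, 5, 5
mixed <- randomNames(6,
                     gender = rep(c(0, 1), times = 3),
                     ethnicity = rep(c(3, 5), each = 3))

# First names only
first_only <- randomNames(6, which.names = "first")

# "First Last" instead of the default "Last, First"
first_last <- randomNames(6, name.order = "first.last", name.sep = " ")

mixed
first_only
first_last
```

Using rep() to build the gender and ethnicity vectors scales better than typing the codes by hand when many names are needed.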

Another example

The next example uses a file made publicly available by X-University: a semester class schedule generated by software from a company called Ellucian. The data include variables such as enrollment (ENRLD), instructor name (INSTRUCTOR), and credit hours (HRS). Many higher education institutions use the same software to generate class schedules and therefore produce similar outputs. Examples of similar outputs include the Savannah State University Spring 2024 Class Schedule, the Fort Valley State University Class Schedule, and the Benedict College Class Schedule Fall 2022. Some of them allow users to download the schedule as an Excel or CSV file. One can also copy and paste the data.

For this demonstration, we load the spring 2024 class schedule of X-University and replace the instructor names in it with random names.

library(dplyr)
schedule <- read.csv("X-College_Spring_2024_Classes_Schedule.csv")
schedule <- subset(schedule, INSTRUCTOR != "") # skip classes with no instructor
schedule <- filter(schedule, P.of.T == 1) # Session 1 classes only
head(schedule)
##   P.of.T SUBJ NUMB                       TITLE HRS ENRLD MAXENRL       TIMES
## 1      1 ACCT 2101  PRINCIPLES OF ACCOUNTING I   3    28      30
## 2      1 ACCT 2101  PRINCIPLES OF ACCOUNTING I   3    30      30 11:00-11:50
## 3      1 ACCT 2102 PRINCIPLES OF ACCOUNTING II   3    16      30
## 4      1 ACCT 2102 PRINCIPLES OF ACCOUNTING II   3    15      30 12:30-01:45
## 5      1 ACCT 3103         INTERM ACCOUNTING I   3     7      30 03:30-04:45
## 6      1 ACCT 4123             COST ACCOUNTING   3     9      30 05:15-06:30
##    DAYS        INSTRUCTOR
## 1          Haile, Brandie
## 2 M W F  Lopez, Elizabeth
## 3          Haile, Brandie
## 4   T R  Lopez, Elizabeth
## 5   T R    Haile, Brandie
## 6   M W el-Hassan, Nawaar

Instructor names appear under the variable ‘INSTRUCTOR’, and we want to replace them with random names while keeping everything else in the data unchanged.

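original_names <- unique(schedule$INSTRUCTOR)
random_names <- randomNames(length(original_names)) # random names to replace original names
# match() finds each row's instructor among the original names, so every
# instructor gets the same random name wherever they appear in the schedule
schedule$INSTRUCTOR <- random_names[match(schedule$INSTRUCTOR, original_names)]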
head(schedule)
##   P.of.T SUBJ NUMB                       TITLE HRS ENRLD MAXENRL       TIMES
## 1      1 ACCT 2101  PRINCIPLES OF ACCOUNTING I   3    28      30
## 2      1 ACCT 2101  PRINCIPLES OF ACCOUNTING I   3    30      30 11:00-11:50
## 3      1 ACCT 2102 PRINCIPLES OF ACCOUNTING II   3    16      30
## 4      1 ACCT 2102 PRINCIPLES OF ACCOUNTING II   3    15      30 12:30-01:45
## 5      1 ACCT 3103         INTERM ACCOUNTING I   3     7      30 03:30-04:45
## 6      1 ACCT 4123             COST ACCOUNTING   3     9      30 05:15-06:30
##    DAYS      INSTRUCTOR
## 1       Morgan, Hershel
## 2 M W F   Perkins, Kara
## 3       Morgan, Hershel
## 4   T R   Perkins, Kara
## 5   T R Morgan, Hershel
## 6   M W    Hong, Prachi

There we have it: all instructor names under “INSTRUCTOR” in the data have been replaced with random names!
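
Two things are worth checking after a replacement like this: that no real name survives, and that two different instructors did not happen to receive the same random name (random draws can repeat). Below is a minimal sketch on a small made-up data frame. The data frame `toy`, its instructor names, and the helper `draw_distinct_names()` are hypothetical and introduced only for illustration; they are not part of the randomNames package.

```r
library(randomNames)

# Toy schedule; these instructor names are invented for the example
toy <- data.frame(
  SUBJ = c("ACCT", "ACCT", "BIOL", "BIOL", "CHEM"),
  INSTRUCTOR = c("Doe, Jane", "Roe, Richard", "Doe, Jane", "Poe, Pat", "Roe, Richard"),
  stringsAsFactors = FALSE
)

original_names <- unique(toy$INSTRUCTOR)

# Hypothetical helper: keep drawing until there are n distinct names,
# none of which coincides with a name we are trying to hide
draw_distinct_names <- function(n, avoid = character(0)) {
  pool <- character(0)
  while (length(pool) < n) {
    candidates <- randomNames(n)
    pool <- unique(c(pool, setdiff(candidates, avoid)))
  }
  pool[seq_len(n)]
}

random_names <- draw_distinct_names(length(original_names), avoid = original_names)

# Keep the original column aside only so the result can be verified
before <- toy$INSTRUCTOR
toy$INSTRUCTOR <- random_names[match(before, original_names)]

# Check 1: no original name is left in the column
stopifnot(!any(toy$INSTRUCTOR %in% original_names))
# Check 2: rows that shared an instructor before still share one, and
# rows with different instructors still differ
stopifnot(identical(match(before, original_names),
                    match(toy$INSTRUCTOR, random_names)))

toy
```

Both checks stop with an error if something went wrong, so they can be left in a script as a safeguard before the data are shared.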

References

[1] The randomNames package was created by Damian Betebenner; its repository is available on GitHub (version 1.6-0.0).
