[This article was first published on Data by John, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I saw a question on reddit today which can be answered using {patientcounter}
The asker wants to know how to work out how many folk are in a hotel, at any given time, based on check in and check out dates.
Here’s the setup:
check_in_date <- c('2010-01-01', '2010-01-02' ,'2010-01-01', '2010-01-08', '2010-01-08', '2010-01-15', '2010-01-15', '2010-01-16', '2010-01-19', '2010-01-22') check_out_date <- c('2010-01-07', '2010-01-04' ,'2010-01-09', '2010-01-21', '2010-01-11', '2010-01-22', NA, '2010-01-20', '2010-01-25', '2010-01-29') #NB - the original question has a string instead of the NA value. # patientcounter can handle the NA, but not the string. Person = c("John", "Smith", "Alex", "Peter", "Will", "Matt", "Tim", "Kevin", "Tom", "Adam")
Make the dataframe:
checkin <- as.POSIXct(as.Date(check_in_date)) # make it a datetime checkout <- as.POSIXct(as.Date(check_out_date)) # make it a datetime hotel <- data.frame(checkin, checkout, Person) hotel ## checkin checkout Person ## 1 2010-01-01 2010-01-07 John ## 2 2010-01-02 2010-01-04 Smith ## 3 2010-01-01 2010-01-09 Alex ## 4 2010-01-08 2010-01-21 Peter ## 5 2010-01-08 2010-01-11 Will ## 6 2010-01-15 2010-01-22 Matt ## 7 2010-01-15 <NA> Tim ## 8 2010-01-16 2010-01-20 Kevin ## 9 2010-01-19 2010-01-25 Tom ## 10 2010-01-22 2010-01-29 Adam
Now for patientcounter to do it’s thing…it’s designed for this exact task – getting a census, or count, of people / goods/ items in or at a place at a point in time.
The interval_census
function does the work:
#remotes::install_github("johnmackintosh/patientcounter") library(patientcounter) # calculate the number of occupants per day occupancy <- interval_census(hotel, identifier = "Person", admit = "checkin", discharge = "checkout", time_unit = '1 day', results = 'total') # grab the date and count columns: occupancy[,.(base_date, N)] ## base_date N ## 1: 2010-01-01 2 ## 2: 2010-01-02 3 ## 3: 2010-01-03 3 ## 4: 2010-01-04 2 ## 5: 2010-01-05 2 ## 6: 2010-01-06 2 ## 7: 2010-01-07 1 ## 8: 2010-01-08 3 ## 9: 2010-01-09 2 ## 10: 2010-01-10 2 ## 11: 2010-01-11 1 ## 12: 2010-01-12 1 ## 13: 2010-01-13 1 ## 14: 2010-01-14 1 ## 15: 2010-01-15 3 ## 16: 2010-01-16 4 ## 17: 2010-01-17 4 ## 18: 2010-01-18 4 ## 19: 2010-01-19 5 ## 20: 2010-01-20 4 ## 21: 2010-01-21 3 ## 22: 2010-01-22 3 ## 23: 2010-01-23 3 ## 24: 2010-01-24 3 ## 25: 2010-01-25 2 ## 26: 2010-01-26 2 ## 27: 2010-01-27 2 ## 28: 2010-01-28 2 ## base_date N
Job Done 🙂
There were several answers, and maybe a package is overkill, but this function is fast and scales well. It’s also pretty flexible and has applications outside of healthcare.
To leave a comment for the author, please follow the link and comment on their blog: Data by John.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.