The 3.8.0 release of simmer, the Discrete-Event Simulator for R, hit CRAN almost a week ago, and Windows binaries are already available. This version includes two highly requested new features that justify this second consecutive minor release.
Attachment of precomputed data
Until v3.7.0, the generator was the only means to attach data to trajectories, and it was primarily intended for dynamic generation of arrivals:
    library(simmer)

    set.seed(42)

    hello_sayer <- trajectory() %>%
      log_("hello!")

    simmer() %>%
      add_generator("dummy", hello_sayer, function() rexp(1, 1)) %>%
      run(until=2)
    ## 0.198337: dummy0: hello!
    ## 0.859232: dummy1: hello!
    ## 1.14272: dummy2: hello!
    ## 1.18091: dummy3: hello!
    ## 1.65409: dummy4: hello!
    ## simmer environment: anonymous | now: 2 | next: 3.11771876826972
    ## { Monitor: in memory }
    ## { Source: dummy | monitored: 1 | n_generated: 6 }
A generator may also be used to attach precomputed data, especially with the at() adaptor:
    simmer() %>%
      add_generator("dummy", hello_sayer, at(seq(0, 10, 0.5))) %>%
      run(until=2)
    ## 0: dummy0: hello!
    ## 0.5: dummy1: hello!
    ## 1: dummy2: hello!
    ## 1.5: dummy3: hello!
    ## simmer environment: anonymous | now: 2 | next: 2
    ## { Monitor: in memory }
    ## { Source: dummy | monitored: 1 | n_generated: 21 }
Now, let’s say that we want to attach some empirical data, and our observations not only include arrival times, but also priorities and some attributes (e.g., measured service times), as in this question on StackOverflow:
    myData <- data.frame(
      time = c(1:10,1:5),
      priority = 1:3,
      duration = rnorm(15, 50, 5)) %>%
      dplyr::arrange(time)
This is indeed possible using generators, but it requires some trickery; more specifically, the clever usage of a consumer function as follows:
    consume <- function(x, prio=FALSE) {
      i <- 0
      function() {
        i <<- i + 1
        if (prio) c(x[[i]], x[[i]], FALSE)
        else x[[i]]
      }
    }

    activityTraj <- trajectory() %>%
      seize("worker") %>%
      timeout_from_attribute("duration") %>%
      release("worker")

    initialization <- trajectory() %>%
      set_prioritization(consume(myData$priority, TRUE)) %>%
      set_attribute("duration", consume(myData$duration)) %>%
      join(activityTraj)

    arrivals_gen <- simmer() %>%
      add_resource("worker", 2, preemptive=TRUE) %>%
      add_generator("dummy_", initialization, at(myData$time)) %>%
      run() %>%
      get_mon_arrivals()

    # check the resulting duration times
    activity_time <- arrivals_gen %>%
      tidyr::separate(name, c("prefix", "n"), convert=TRUE) %>%
      dplyr::arrange(n) %>%
      dplyr::pull(activity_time)

    all(activity_time == myData$duration)
    ## [1] TRUE
As of v3.8.0, the new data source add_dataframe greatly simplifies this process:
    arrivals_df <- simmer() %>%
      add_resource("worker", 2, preemptive=TRUE) %>%
      add_dataframe("dummy_", activityTraj, myData, time="absolute") %>%
      run() %>%
      get_mon_arrivals()

    identical(arrivals_gen, arrivals_df)
    ## [1] TRUE
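When the column names in the data do not match the defaults, the mapping can be stated explicitly. The following is a minimal sketch, assuming the col_time and col_attributes arguments as documented for add_dataframe; the data frame obs and its column names gap and service are made up for illustration.

```r
library(simmer)

# Hypothetical observations: interarrival gaps, a priority, and a service time.
obs <- data.frame(
  gap      = c(1, 2, 1, 3),
  priority = c(0, 1, 0, 1),
  service  = c(5, 2, 4, 1)
)

# Each arrival holds the worker for the time stored in its "service" attribute.
serve <- trajectory() %>%
  seize("worker") %>%
  timeout_from_attribute("service") %>%
  release("worker")

arrivals <- simmer() %>%
  add_resource("worker", 1) %>%
  # "gap" holds interarrival times; only "service" is attached as an attribute.
  # The "priority" column is picked up under its default name.
  add_dataframe("obs_", serve, obs,
                col_time = "gap", time = "interarrival",
                col_attributes = "service") %>%
  run() %>%
  get_mon_arrivals()
```

Each arrival's activity_time should then equal the service value of the row it was built from, which is the same check as in the comparison above.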
On-disk monitoring
As some users noted (see 1, 2), the default in-memory monitoring can become problematic for very long simulations. To address this issue, the simmer() constructor gains a new argument, mon, to provide different types of monitors. Monitoring is still performed in-memory by default, but as of v3.8.0, it can be offloaded to disk through monitor_delim() and monitor_csv(), which produce flat delimited files.
    mon <- monitor_csv()
    mon
    ## simmer monitor: to disk (delimited files)
    ## { arrivals: /tmp/RtmpAlQH2g/file6933ce99281_arrivals.csv }
    ## { releases: /tmp/RtmpAlQH2g/file6933ce99281_releases.csv }
    ## { attributes: /tmp/RtmpAlQH2g/file6933ce99281_attributes.csv }
    ## { resources: /tmp/RtmpAlQH2g/file6933ce99281_resources.csv }

    env <- simmer(mon=mon) %>%
      add_generator("dummy", hello_sayer, function() rexp(1, 1)) %>%
      run(until=2)
    ## 0.26309: dummy0: hello!
    ## 0.982183: dummy1: hello!

    env
    ## simmer environment: anonymous | now: 2 | next: 2.29067480322535
    ## { Monitor: to disk (delimited files) }
    ## { arrivals: /tmp/RtmpAlQH2g/file6933ce99281_arrivals.csv }
    ## { releases: /tmp/RtmpAlQH2g/file6933ce99281_releases.csv }
    ## { attributes: /tmp/RtmpAlQH2g/file6933ce99281_attributes.csv }
    ## { resources: /tmp/RtmpAlQH2g/file6933ce99281_resources.csv }
    ## { Source: dummy | monitored: 1 | n_generated: 3 }

    read.csv(mon$handlers["arrivals"]) # direct access
    ##     name start_time  end_time activity_time finished
    ## 1 dummy0  0.2630904 0.2630904             0        1
    ## 2 dummy1  0.9821828 0.9821828             0        1

    get_mon_arrivals(env)              # adds the "replication" column
    ##     name start_time  end_time activity_time finished replication
    ## 1 dummy0  0.2630904 0.2630904             0        1           1
    ## 2 dummy1  0.9821828 0.9821828             0        1           1
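The more general monitor_delim() can be used in the same way when a different separator is preferred. This is a sketch only, assuming monitor_delim() accepts path, sep and ext arguments as in its documentation; the tab separator and the .tsv extension are arbitrary choices for illustration.

```r
library(simmer)

# Assumed arguments: output directory, field separator, file extension.
mon <- monitor_delim(path = tempdir(), sep = "\t", ext = ".tsv")

# A trivial trajectory: every arrival just spends one time unit.
wait_one <- trajectory() %>%
  timeout(1)

env <- simmer(mon = mon) %>%
  add_generator("dummy", wait_one, at(0, 1, 2)) %>%
  run()

# The usual getters read the delimited files back from disk.
arrivals <- get_mon_arrivals(env)
```

The getters stay the same regardless of the backend, so switching from in-memory to on-disk monitoring should not require changes to the analysis code.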
See below for a comprehensive list of changes.
New features:
- New data source add_dataframe enables the attachment of precomputed data, in the form of a data frame, to a trajectory. It can be used instead of (or along with) add_generator. The most notable advantage over the latter is that add_dataframe is able to automatically set attributes and prioritisation values per arrival based on columns of the provided data frame (#140 closing #123).
- New set_source activity deprecates set_distribution(). It works both for generators and data sources (275a09c, as part of #140).
- New monitoring interface allows for disk offloading. The simmer() constructor gains a new argument mon to provide different types of monitors. By default, monitoring is performed in-memory, as usual. Additionally, monitoring can be offloaded to disk through monitor_delim and monitor_csv, which produce flat delimited files. But more importantly, the C++ interface has been refactored to enable the development of new monitoring backends (#146 closing #119).
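As an illustration of set_source, here is a minimal sketch in which the first arrival replaces the distribution of its own generator. The arguments are passed positionally (source name, then the new object); set_trajectory is an existing activity used here only so that later arrivals do not repeat the change. Exactly when the new distribution takes effect depends on how the source schedules its arrivals, so no output is shown.

```r
library(simmer)

# Plain trajectory followed by arrivals once the switch has happened.
plain <- trajectory() %>%
  timeout(1)

# The first arrival changes the generator's distribution (interarrival 1
# instead of 2) and redirects subsequent arrivals to the plain trajectory.
switcher <- trajectory() %>%
  set_source("dummy", function() 1) %>%
  set_trajectory("dummy", plain) %>%
  timeout(2)

arrivals <- simmer() %>%
  add_generator("dummy", switcher, function() 2) %>%
  run(until = 6) %>%
  get_mon_arrivals()
```

For a data source created with add_dataframe, the object passed to set_source would be a data frame instead of a function.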
Minor changes and fixes:
- Some documentation improvements (1e14ed7, 194ed05).
- New default until=Inf for the run method (3e6aae9, as part of #140).
- branch and clone now accept lists of trajectories, in the same way as join, so that there is no need to use do.call (#142).
- The argument continue (present in seize and branch) is recycled if only one value is provided but several sub-trajectories are defined (#143).
- Fix process reset: sources are reset in strict order of creation (e7d909b).
- Fix infinite timeouts (#144).
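Several of these changes can be seen together in a short sketch: a list of sub-trajectories is passed directly to branch, a single continue value is recycled across them, and run() is called without until. The helper round_robin and the sub-trajectories are made up for illustration.

```r
library(simmer)

# Hypothetical helper: returns a function that cycles through 1..n.
round_robin <- function(n) {
  i <- 0
  function() {
    i <<- i %% n + 1
    i
  }
}

# Sub-trajectories built programmatically: the i-th one waits i time units.
subs <- lapply(1:3, function(i) trajectory() %>% timeout(i))

router <- trajectory() %>%
  # A list of trajectories, no do.call needed; continue=TRUE is recycled.
  branch(round_robin(3), continue = TRUE, subs)

arrivals <- simmer() %>%
  add_generator("dummy", router, at(0, 1, 2)) %>%
  run() %>%   # runs until no events are left (until=Inf by default)
  get_mon_arrivals()
```

With three arrivals and three sub-trajectories, each arrival should take a different branch, so the activity times are 1, 2 and 3.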