Update of overviewR with new functions!

[This article was first published on R-post | Cosima Meyer, and kindly contributed to R-bloggers].

We have updated (and extended) overviewR with three new functions:

  • overview_plot
  • overview_heat
  • overview_na

A detailed overview of all functions is also available in the package's CheatSheet.

overview_plot

overview_plot visualizes the information generated by overview_table as a ggplot graphic. Each scope object (e.g., a country) is listed on the y-axis, and horizontal lines indicate its coverage across the entire time frame of the data (x-axis). This makes it easy to spot gaps in the data for specific scope objects and to see when they occur.

library(overviewR)  # provides toydata and the overview_* functions

data(toydata)
overview_plot(dat = toydata, id = ccode, time = year)
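
To make the idea of "gaps in coverage" concrete, here is a minimal base R sketch of the information such a plot encodes. It is not how overviewR computes anything internally; the data frame `panel` and the helper `coverage_runs` are hypothetical names used only for illustration.

```r
# Hypothetical toy panel (not the package's toydata): two countries, yearly rows
panel <- data.frame(
  ccode = c("AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  year  = c(1990, 1991, 1994, 1995, 1990, 1991, 1992)
)

# Hypothetical helper: collapse the observed years of one id into
# consecutive runs, each described by its start and end year
coverage_runs <- function(years) {
  years  <- sort(unique(years))
  # A new run starts whenever the step from the previous year exceeds 1
  run_id <- cumsum(c(TRUE, diff(years) > 1))
  data.frame(
    start = as.vector(tapply(years, run_id, min)),
    end   = as.vector(tapply(years, run_id, max))
  )
}

# One data frame of runs per country
runs <- lapply(split(panel$year, panel$ccode), coverage_runs)
runs
```

Each row of a result corresponds to one horizontal line segment in a coverage plot; an id with more than one row (here "AAA") has a gap between its segments.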

overview_heat

overview_heat takes a closer look at the time and scope conditions by visualizing the data coverage for each time and scope combination in a ggplot heat map. This function is best explained using an example. Suppose you have a dataset with monthly data for different countries and want to know if data is available for each country in every month. overview_heat shows this by plotting a heat map where each cell indicates the coverage for that specific combination of time and scope (e.g., country-year). As illustrated below, the darker the cell is, the more coverage it has. The plot also indicates the relative or absolute coverage of each cell. For instance, Angola ("AGO") in 1991 shows a coverage of 75%. This means that only 9 of the 12 possible months in that year are covered.

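# Drop 64 randomly chosen rows to create gaps in coverage
toydata_red <- toydata[-sample(seq_len(nrow(toydata)), 64), ]

# perc = TRUE shows relative coverage; exp_total = 12 is the expected
# number of observations per country-year (12 months)
overview_heat(toydata_red,
              ccode,
              year,
              perc = TRUE,
              exp_total = 12)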
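
The 75% figure is simple arithmetic: observed rows per country-year divided by the expected total. The following base R sketch reproduces that calculation on a small, hypothetical data frame (`obs` is made up for illustration and is not the package's toydata); it only mirrors the numbers a heat map cell would display, not the package's implementation.

```r
# Hypothetical monthly panel: "AGO" has 9 of 12 months in 1991 and all 12 in 1992
obs <- data.frame(
  ccode = rep("AGO", 21),
  year  = c(rep(1991, 9), rep(1992, 12)),
  month = c(month.abb[1:9], month.abb)
)

exp_total <- 12  # expected number of observations per country-year

# Count the observed rows in each country-year cell
cells <- as.data.frame(
  table(ccode = obs$ccode, year = obs$year),
  responseName = "n_obs",
  stringsAsFactors = FALSE
)

# Relative coverage of each cell in percent
cells$coverage_pct <- 100 * cells$n_obs / exp_total
cells
```

Here the 1991 cell has 9 observed months out of 12 expected, which gives 75%, while the 1992 cell is fully covered.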

overview_na

overview_na is a simple function that provides information about the content of all variables in your data, not only the time and scope conditions. It returns a horizontal ggplot bar plot that indicates the amount of missing data (NAs) for each variable (on the y-axis). You can choose whether to display the share of NAs for each variable as a percentage (the default) or the total number of NAs.

library(dplyr)  # provides %>% and mutate()

# Introduce NAs into three variables of toydata
toydata_with_na <- toydata %>%
  dplyr::mutate(year = ifelse(year < 1992, NA, year),
                month = ifelse(month %in% c("Jan", "Jun", "Aug"), NA, month),
                gdp = ifelse(gdp < 20000, NA, gdp))

overview_na(toydata_with_na)

overview_na(toydata_with_na, perc = FALSE)
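
The two quantities behind such a bar plot, the share and the count of NAs per variable, can also be computed directly in base R. This is a minimal sketch on a hypothetical data frame `df`, shown only to clarify what the bars represent; it is not the package's own code.

```r
# Hypothetical data frame with some missing values
df <- data.frame(
  ccode = c("RWA", "RWA", "AGO", "AGO"),
  year  = c(1990, NA, 1990, 1991),
  gdp   = c(NA, 24180, NA, NA)
)

na_total <- colSums(is.na(df))         # absolute number of NAs per variable
na_pct   <- 100 * colMeans(is.na(df))  # share of NAs per variable, in percent

na_summary <- data.frame(
  variable  = names(df),
  na_total  = na_total,
  na_pct    = na_pct,
  row.names = NULL
)

# Variables with the most missing values first
na_summary[order(-na_summary$na_total), ]
```

The `na_pct` column corresponds to the relative view and `na_total` to the absolute view described above.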
