Update of overviewR with new functions!
We have updated (and extended) overviewR with three new functions:
overview_plot
overview_heat
overview_na
A detailed overview of all functions is also available in the CheatSheet.
overview_plot
overview_plot illustrates the information generated by overview_table as a ggplot graphic. All scope objects (e.g., countries) are listed on the y-axis, and horizontal lines indicate their coverage across the entire time frame of the data (x-axis). This makes it easy to spot gaps in the data for specific scope objects and to see at what point in time they occur.
library(overviewR)
# Load the example data that ships with the package
data(toydata)
overview_plot(dat = toydata, id = ccode, time = year)
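To make the idea of "horizontal lines with gaps" concrete, here is a minimal base R sketch that collapses each country's observed years into consecutive runs, which is roughly the information each line segment conveys. It is not the package's internal code; the data frame `panel` and the helper `coverage_runs()` are hypothetical and made up for illustration.

```r
# Hypothetical panel (not the package's toydata): AGO is missing 1992-1993
panel <- data.frame(
  ccode = c("AGO", "AGO", "AGO", "AGO", "BEN", "BEN", "BEN"),
  year  = c(1990, 1991, 1994, 1995, 1990, 1991, 1992)
)

# Hypothetical helper: collapse each id's observed time points into
# consecutive runs, one row per uninterrupted stretch of coverage
coverage_runs <- function(dat, id, time) {
  pieces <- lapply(split(dat[[time]], dat[[id]]), function(yrs) {
    yrs <- sort(unique(yrs))
    # Start a new run whenever the step to the next time point is not 1
    run <- cumsum(c(1, diff(yrs) != 1))
    data.frame(
      start = as.vector(tapply(yrs, run, min)),
      end   = as.vector(tapply(yrs, run, max))
    )
  })
  out <- do.call(rbind, pieces)
  out[[id]] <- rep(names(pieces), vapply(pieces, nrow, integer(1)))
  rownames(out) <- NULL
  out[, c(id, "start", "end")]
}

runs <- coverage_runs(panel, "ccode", "year")
print(runs)
```

An id with more than one row in `runs` has at least one gap, and the gap lies between the `end` of one row and the `start` of the next.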
overview_heat
overview_heat takes a closer look at the time and scope conditions by visualizing the data coverage for each time and scope combination in a ggplot heat map. This function is best explained using an example. Suppose you have a dataset with monthly data for different countries and want to know whether data is available for each country in every month. overview_heat shows this by plotting a heat map in which each cell indicates the coverage for one combination of time and scope (e.g., country-year). The darker the cell, the higher its coverage. The plot also labels each cell with its relative or absolute coverage. For instance, Angola ("AGO") in 1991 shows a coverage of 75%. This means that of the 12 possible months in that year, only 9 are covered.
# Randomly drop 64 rows so that some country-years are only partly covered
toydata_red <- toydata[-sample(seq_len(nrow(toydata)), 64), ]
overview_heat(toydata_red, ccode, year, perc = TRUE, exp_total = 12)
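The 75% figure is simply the number of observed rows in a cell divided by the expected total. The following base R sketch reproduces that arithmetic on a small, made-up data frame (`obs` is hypothetical, and `exp_total` is used here as an ordinary variable that mirrors the argument name above). It is meant to illustrate the calculation, not to show how overview_heat is implemented.

```r
# Hypothetical monthly records: 9 of 12 months for AGO in 1991, all 12 for BEN
obs <- data.frame(
  ccode = c(rep("AGO", 9), rep("BEN", 12)),
  year  = 1991,
  month = c(month.abb[1:9], month.abb)
)

exp_total <- 12  # expected number of observations per country-year

# Count the rows in each country-year cell, then express them as a share
cells <- aggregate(month ~ ccode + year, data = obs, FUN = length)
names(cells)[names(cells) == "month"] <- "n_obs"
cells$coverage_pct <- 100 * cells$n_obs / exp_total
print(cells)
```

Here AGO ends up with 9 / 12 = 75% coverage and BEN with 100%.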
overview_na
overview_na is a simple function that provides information about the content of all variables in your data, not only the time and scope conditions. It returns a horizontal ggplot bar plot that indicates the amount of missing data (NAs) for each variable (on the y-axis). You can choose whether to display the relative amount of NAs for each variable as a percentage (the default) or the total number of NAs.
library(dplyr)  # attaches the %>% pipe used below
# Introduce missing values into three variables
toydata_with_na <- toydata %>%
  dplyr::mutate(
    year = ifelse(year < 1992, NA, year),
    month = ifelse(month %in% c("Jan", "Jun", "Aug"), NA, month),
    gdp = ifelse(gdp < 20000, NA, gdp)
  )
overview_na(toydata_with_na)
overview_na(toydata_with_na, perc = FALSE)
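If you want the numbers behind such a bar plot, the per-variable totals and percentages of NAs can be computed in base R. This is a sketch on a small hypothetical data frame (`df` and `na_summary` are illustrative names), not the package's own implementation.

```r
# Hypothetical data with a different number of NAs in each variable
df <- data.frame(
  year  = c(1990, NA, 1992, 1993),
  month = c("Jan", NA, NA, "Apr"),
  gdp   = c(NA, NA, NA, 25000)
)

# Total and relative number of NAs per variable
na_summary <- data.frame(
  variable = names(df),
  na_total = unname(colSums(is.na(df))),
  na_pct   = unname(100 * colMeans(is.na(df)))
)
print(na_summary)
```

The `na_pct` column corresponds to the relative view and `na_total` to the absolute view selected with perc = FALSE.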