Site icon R-bloggers

Waffle Geoms & Other Miscellaneous In-Development Package Updates

[This article was first published on R – rud.is, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

More than just sergeant has been hacked on recently, so here’s a run-down of various updates:

waffle

The square pie chart generating waffle package now contains a nascent geom_waffle() so you can do things like this:

library(hrbrthemes)
library(waffle)
library(tidyverse)

tibble(
  parts = factor(rep(month.abb[1:3], 3), levels=month.abb[1:3]),
  values = c(10, 20, 30, 6, 14, 40, 30, 20, 10),
  fct = c(rep("Thing 1", 3), rep("Thing 2", 3), rep("Thing 3", 3))
) -> xdf

ggplot(xdf, aes(fill=parts, values=values)) +
  geom_waffle(color = "white", size=1.125, n_rows = 6) +
  facet_wrap(~fct, ncol=1) +
  scale_x_discrete(expand=c(0,0)) +
  scale_y_discrete(expand=c(0,0)) +
  ggthemes::scale_fill_tableau(name=NULL) +
  coord_equal() +
  labs(
    title = "Faceted Waffle Geoms"
  ) +
  theme_ipsum_rc(grid="") +
  theme_enhance_waffle()

and get:

It’s super brand new so pls file issues (wherev you like besides blog comments as they’re not conducive to package triaging) if anything breaks or you need more aesthetic configuration options. NOTE: You need to use the 1.0.0 branch as noted in the master branch README.

markdowntemplates

I had to take a quick peek at markdowntemplates due to a question from a blog reader about the Jupyter notebook generation functionality. While I was in the code I added two new bits to the knit: markdowntemplates::to_jupyter code. First is the option to specify a run: parameter in the YAML header so you can just knit the document to a Jupyter notebook without executing the chunks:

---
title: "ggplot2 example"
knit: markdowntemplates::to_jupyter
run: false
--- 

If run is not present it defaults to true.

The other add is a bit of intelligence to whether it should include %load_ext rpy2.ipython (the Jupyter “magic” that lets it execute R chunks). If no R code chunks are present, rpy2.ipython will not be loaded.

securitytrails

SecurityTrails is a service for cybersecurity researchers & defenders that provides tools and an API to aid in querying for all sorts of current and historical information on domains and IP addresses. It now (finally) has a mostly-complete R package securitytrails. They’re research partners of $DAYJOB and their API is so give it a spin if you are looking to broaden your threat-y API collection.

astools

Keeping the cyber theme going for a bit, next up is astools) which are “Tools to Work With Autonomous System (‘AS’) Network and Organization Data”. Autonomous Systems (AS) are at the core of the internet (we all live in one) and this package provides tools to fetch AS data/metadata from various sources and work with it in R. For instance, we can grab the latest RouteViews data:

(rv_df <- routeviews_latest())
## # A tibble: 786,035 x 6
##    cidr         asn   minimum_ip maximum_ip  min_numeric max_numeric
##    <chr>        <chr> <chr>      <chr>             <dbl>       <dbl>
##  1 1.0.0.0/24   13335 1.0.0.0    1.0.0.255      16777216    16777471
##  2 1.0.4.0/22   56203 1.0.4.0    1.0.7.255      16778240    16779263
##  3 1.0.4.0/24   56203 1.0.4.0    1.0.4.255      16778240    16778495
##  4 1.0.5.0/24   56203 1.0.5.0    1.0.5.255      16778496    16778751
##  5 1.0.6.0/24   56203 1.0.6.0    1.0.6.255      16778752    16779007
##  6 1.0.7.0/24   56203 1.0.7.0    1.0.7.255      16779008    16779263
##  7 1.0.16.0/24  2519  1.0.16.0   1.0.16.255     16781312    16781567
##  8 1.0.64.0/18  18144 1.0.64.0   1.0.127.255    16793600    16809983
##  9 1.0.128.0/17 23969 1.0.128.0  1.0.255.255    16809984    16842751
## 10 1.0.128.0/18 23969 1.0.128.0  1.0.191.255    16809984    16826367
## # ... with 786,025 more rows

That, in turn, can work with iptools::ip_to_asn() so we can figure out which AS an IP address lives in:

rv_trie <- as_asntrie(rv_df)

iptools::ip_to_asn(rv_trie, "174.62.167.97")
## [1] "7922"

It can also fetch AS name info:

asnames_current()
## # A tibble: 63,453 x 4
##    asn   handle       asinfo                                                iso2c
##    <chr> <chr>        <chr>                                                 <chr>
##  1 1     LVLT-1       Level 3 Parent, LLC                                   US   
##  2 2     UDEL-DCN     University of Delaware                                US   
##  3 3     MIT-GATEWAYS Massachusetts Institute of Technology                 US   
##  4 4     ISI-AS       University of Southern California                     US   
##  5 5     SYMBOLICS    Symbolics, Inc.                                       US   
##  6 6     BULL-HN      Bull HN Information Systems Inc.                      US   
##  7 7     DSTL         DSTL                                                  GB   
##  8 8     RICE-AS      Rice University                                       US   
##  9 9     CMU-ROUTER   Carnegie Mellon University                            US   
## 10 10    CSNET-EXT-AS CSNET Coordination and Information Center (CSNET-CIC) US   
## # ... with 63,443 more rows

which we can use for further enrichment:

routeviews_latest() %>% 
  left_join(asnames_current())
## Joining, by = "asn"

## # A tibble: 786,035 x 9
##    cidr         asn   minimum_ip maximum_ip  min_numeric max_numeric handle            asinfo                     iso2c
##    <chr>        <chr> <chr>      <chr>             <dbl>       <dbl> <chr>             <chr>                      <chr>
##  1 1.0.0.0/24   13335 1.0.0.0    1.0.0.255      16777216    16777471 CLOUDFLARENET     Cloudflare, Inc.           US   
##  2 1.0.4.0/22   56203 1.0.4.0    1.0.7.255      16778240    16779263 GTELECOM-AUSTRAL… Gtelecom-AUSTRALIA         AU   
##  3 1.0.4.0/24   56203 1.0.4.0    1.0.4.255      16778240    16778495 GTELECOM-AUSTRAL… Gtelecom-AUSTRALIA         AU   
##  4 1.0.5.0/24   56203 1.0.5.0    1.0.5.255      16778496    16778751 GTELECOM-AUSTRAL… Gtelecom-AUSTRALIA         AU   
##  5 1.0.6.0/24   56203 1.0.6.0    1.0.6.255      16778752    16779007 GTELECOM-AUSTRAL… Gtelecom-AUSTRALIA         AU   
##  6 1.0.7.0/24   56203 1.0.7.0    1.0.7.255      16779008    16779263 GTELECOM-AUSTRAL… Gtelecom-AUSTRALIA         AU   
##  7 1.0.16.0/24  2519  1.0.16.0   1.0.16.255     16781312    16781567 VECTANT           ARTERIA Networks Corporat… JP   
##  8 1.0.64.0/18  18144 1.0.64.0   1.0.127.255    16793600    16809983 AS-ENECOM         Energia Communications,In… JP   
##  9 1.0.128.0/17 23969 1.0.128.0  1.0.255.255    16809984    16842751 TOT-NET           TOT Public Company Limited TH   
## 10 1.0.128.0/18 23969 1.0.128.0  1.0.191.255    16809984    16826367 TOT-NET           TOT Public Company Limited TH   
## # ... with 786,025 more rows

Note that routeviews_latest() and asnames_current() cache the data so there is no re-downloading unless you clear the local cache.

docxtractr

The docxtractr package recently got a CRAN push due to some changes in the tibble but it also include a new feature that lets you accept or reject “tracked changes” before trying to extract tables/comments from a document without harming/changing the original document.

ednstest

DNS Flag Day is fast approaching. What is “DNS Flag Day”? It’s a day when yet-another cabal of large-scale DNS providers and tech heavy hitters decided that they know what’s best for the internet and are mandating compliance with RFC 6891 (EDNS). Honestly, there’s no good reason to run crappy DNS servers and no good reason not to support EDNS.

You could just go to the flag day site and test your provider (by entering your domain name, if you have one). But, you can also load the package, and run it locally (it still calls their API since it’s open and provides a very detailed results page if your DNS server isn’t compliant). You can just run it to get compact output and an auto-load of the report page in your browser or save off the returned object and inspect it to see what tests failed.

I ran it on a few domains that are likely familiar to readers and this is what it showed:

edns_test("rud.is")
## EDNS compliance test for [rud.is] has ✔ PASSED!
## Report URL: https://ednscomp.isc.org/ednscomp/60049cb032

edns_test("rstudio.com")
## EDNS compliance test for [rstudio.com] has ✖ FAILED
## Report URL: https://ednscomp.isc.org/ednscomp/54e2057229

edns_test("r-project.org")
## EDNS compliance test for [r-project.org] has ✔ PASSED!
## Report URL: https://ednscomp.isc.org/ednscomp/839ee9c9af

The print() function in the package also has some minimal cli and crayon usage in it if you’re looking to jazz up your R console output.

ulid

Finally, there’s ulid which is a package to make “Universally Unique Lexicographically Sortable Identifiers in R”. These ULIDs have the following features:

They’re made up of

 01AN4Z07BY      79KA1307SR9X4MV3

|----------|    |----------------|
 Timestamp          Randomness
   48bits             80bits

The timestamp is a 48 bit integer representing UNIX-time in milliseconds and the randomness is an 80 bit cryptographically secure source of randomness (where possible). Read more in the full specification.

You can get one ULID easily:

ulid::ULIDgenerate()
## [1] "0001E2ERKHVPKZJ6FA6ZWHH1KS"

Generate a whole bunch of ’em:

(u <- ulid::ULIDgenerate(20))
##  [1] "0001E2ERKHVX5QF5D59SX2E65T" "0001E2ERKHKD6MHKYB1G8JHN5X" "0001E2ERKHTK0XEHVV2G5877K9" "0001E2ERKHKFGG5NPN24PC1N0W"
##  [5] "0001E2ERKH3F48CAKJCVMSCBKS" "0001E2ERKHF3N0B94VK05GTXCW" "0001E2ERKH24GCJ2CT3Z5WM1FD" "0001E2ERKH381RJ232KK7SMWQW"
##  [9] "0001E2ERKH7NAZ1T4HR4ZRQRND" "0001E2ERKHSATC17G2QAPYXE0C" "0001E2ERKH76R83NFST3MZNW84" "0001E2ERKHFKS52SD8WJ8FHXMV"
## [13] "0001E2ERKHQM6VBM5JB235JJ1W" "0001E2ERKHXG2KNYWHHFS8X69Z" "0001E2ERKHQW821KPRM4GQFANJ" "0001E2ERKHD5KWTM5S345A3RP4"
## [17] "0001E2ERKH0D901W6KX66B1BHE" "0001E2ERKHKPHZBFSC16FC7FFC" "0001E2ERKHQQH7315GMY8HRYXV" "0001E2ERKH016YBAJAB7K9777T"

and “unmarshal” them (which gets you the timestamp back):

unmarshal(u)
##                     ts              rnd
## 1  2018-12-29 07:02:57 VX5QF5D59SX2E65T
## 2  2018-12-29 07:02:57 KD6MHKYB1G8JHN5X
## 3  2018-12-29 07:02:57 TK0XEHVV2G5877K9
## 4  2018-12-29 07:02:57 KFGG5NPN24PC1N0W
## 5  2018-12-29 07:02:57 3F48CAKJCVMSCBKS
## 6  2018-12-29 07:02:57 F3N0B94VK05GTXCW
## 7  2018-12-29 07:02:57 24GCJ2CT3Z5WM1FD
## 8  2018-12-29 07:02:57 381RJ232KK7SMWQW
## 9  2018-12-29 07:02:57 7NAZ1T4HR4ZRQRND
## 10 2018-12-29 07:02:57 SATC17G2QAPYXE0C
## 11 2018-12-29 07:02:57 76R83NFST3MZNW84
## 12 2018-12-29 07:02:57 FKS52SD8WJ8FHXMV
## 13 2018-12-29 07:02:57 QM6VBM5JB235JJ1W
## 14 2018-12-29 07:02:57 XG2KNYWHHFS8X69Z
## 15 2018-12-29 07:02:57 QW821KPRM4GQFANJ
## 16 2018-12-29 07:02:57 D5KWTM5S345A3RP4
## 17 2018-12-29 07:02:57 0D901W6KX66B1BHE
## 18 2018-12-29 07:02:57 KPHZBFSC16FC7FFC
## 19 2018-12-29 07:02:57 QQH7315GMY8HRYXV
## 20 2018-12-29 07:02:57 016YBAJAB7K9777T

and can even supply your own timestamp:

(ut <- ts_generate(as.POSIXct("2017-11-01 15:00:00", origin="1970-01-01")))
## [1] "0001CZM6DGE66RJEY4N05F5R95"

unmarshal(ut)
##                    ts              rnd
## 1 2017-11-01 15:00:00 E66RJEY4N05F5R95

FIN

Kick the tyres & file issues/PRs as needed and definitely give sr.ht a spin for your code-hosting needs. It’s 100% free and open source software made up of mini-services that let you use only what you need. Zero javacript on site and no tracking/adverts. Plus, no evil giant megacorps doing heaven knows what with your browser, repos, habits and intellectual property.

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.