Extracting all links from my slidedeck
Last week after my useR! talk, someone I had met at the R-Ladies dinner asked me for a list of all the links in my slides. I said I’d prepare it, not because I’m a nice person, but because I knew it’d be a use case where the great tinkr package would shine! 😈
What is tinkr?
tinkr is an R package I created, and that its current maintainer Zhian Kamvar took much further than I ever would have. tinkr can transform Markdown into XML and back.
Under the hood, tinkr uses
- commonmark for the Markdown-to-XML conversion. CommonMark, in the form of its cmark implementation, is the C library that GitHub for instance uses to display your Markdown comments as HTML. The commonmark package is also what powers Markdown support in roxygen2.
- xslt for the XML-to-Markdown conversion. XSLT is a templating language for XML.
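To make the Markdown-to-XML step concrete, here is a minimal sketch that calls commonmark and xml2 directly rather than going through tinkr. The Markdown snippet and the example.org URL are made up for illustration; only the namespace URL comes from the yarn printout further down.

```r
# A tiny, made-up Markdown snippet (not taken from the slides).
md <- "See the [docs](https://example.org/docs) for details."

# commonmark turns Markdown text into an XML string;
# xml2 parses that string into a document we can query.
xml_string <- commonmark::markdown_xml(md)
doc <- xml2::read_xml(xml_string)

# CommonMark XML lives in a default namespace,
# so XPath queries need a prefix bound to it.
ns <- c(md = "http://commonmark.org/xml/1.0")

link <- xml2::xml_find_first(doc, ".//md:link", ns = ns)
xml2::xml_attr(link, "destination")
#> [1] "https://example.org/docs"
xml2::xml_text(link)
#> [1] "docs"
```

The link target ends up in the `destination` attribute and the link text in a child node, which is what the extraction below relies on.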
Anyway, enough said, let’s go back to today’s use case.
Extract and format links from index.qmd
With tinkr I can use XPath, the query language for XML or HTML, to extract links from my slidedeck source. Then I will format them as a list.
First, I create a yarn object from my slidedeck source.
talk_yarn <- tinkr::yarn$new("/home/maelle/Documents/conferences/user2024/index.qmd")
talk_yarn
#> <yarn>
#> Public:
#> add_md: function (md, where = 0L)
#> body: xml_document, xml_node
#> clone: function (deep = FALSE)
#> get_protected: function (type = NULL)
#> head: function (n = 6L, stylesheet_path = stylesheet())
#> initialize: function (path = NULL, encoding = "UTF-8", sourcepos = FALSE,
#> md_vec: function (xpath = NULL, stylesheet_path = stylesheet())
#> ns: http://commonmark.org/xml/1.0
#> path: /home/maelle/Documents/conferences/user2024/index.qmd
#> protect_curly: function ()
#> protect_math: function ()
#> protect_unescaped: function ()
#> reset: function ()
#> show: function (lines = TRUE, stylesheet_path = stylesheet())
#> tail: function (n = 6L, stylesheet_path = stylesheet())
#> write: function (path = NULL, stylesheet_path = stylesheet())
#> yaml: --- format: revealjs: highlight-style: a11y ...
#> Private:
#> encoding: UTF-8
#> md_lines: function (path = NULL, stylesheet = NULL)
#> sourcepos: FALSE
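Since the path above only exists on my machine, here is a small sketch of the same step on a throwaway file. The file content, the `demo_yarn` name and the example.org URL are hypothetical; the `body`, `ns` and `write()` members are the ones listed in the printout above.

```r
# Write a tiny, made-up Quarto document to a temporary file.
path <- tempfile(fileext = ".qmd")
writeLines(
  c(
    "---",
    "title: Demo",
    "---",
    "",
    "## A slide",
    "",
    "See [the docs](https://example.org/docs)."
  ),
  path
)

# Read it as a yarn object: the Markdown body becomes an XML document.
demo_yarn <- tinkr::yarn$new(path)
inherits(demo_yarn$body, "xml_document")

# The body can be queried with xml2, using the namespace stored in the object.
demo_links <- xml2::xml_find_all(demo_yarn$body, ".//md:link", ns = demo_yarn$ns)
xml2::xml_attr(demo_links, "destination")

# And the XML can be written back to Markdown.
out <- tempfile(fileext = ".qmd")
demo_yarn$write(out)
round_trip <- readLines(out)
```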
Then I extract all links.
links <- xml2::xml_find_all(
  talk_yarn$body,
  xpath = ".//md:link",
  ns = talk_yarn$ns
)
head(links)
#> {xml_nodeset (6)}
#> [1] <link destination="https://user-maelle.netlify.app" title="">\n <text xm ...
#> [2] <link destination="https://www.pexels.com/photo/old-cargo-ship-on-sea-207 ...
#> [3] <link destination="https://www.pexels.com/photo/the-word-louise-is-spelle ...
#> [4] <link destination="https://www.pexels.com/photo/gray-rotary-telephone-on- ...
#> [5] <link destination="https://www.pexels.com/photo/close-up-photography-of-y ...
#> [6] <link destination="https://www.r-consortium.org/all-projects/call-for-pro ...
I then throw away the links to the great website Pexels, because these are image credits rather than information useful to do R stuff.
links <- purrr::discard(
  links,
  \(x) startsWith(xml2::xml_attr(x, "destination"), "https://www.pexels")
)
head(links)
#> {xml_nodeset (6)}
#> [1] <link destination="https://user-maelle.netlify.app" title="">\n <text xm ...
#> [2] <link destination="https://www.r-consortium.org/all-projects/call-for-pro ...
#> [3] <link destination="https://www.r-consortium.org/all-projects/call-for-pro ...
#> [4] <link destination="https://www.heltweg.org/posts/who-wrote-this-shit/" ti ...
#> [5] <link destination="https://fosstodon.org/@hadleywickham/11202130903588421 ...
#> [6] <link destination="https://nostarch.com/kill-it-fire" title="">\n <text ...
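As an aside, the same filtering could probably be pushed into the XPath query itself, with a predicate on the `destination` attribute. This is a sketch of that alternative on a made-up two-link document, not what I ran for the slides; the Markdown and URLs are invented for the example.

```r
# Made-up Markdown: one image credit, one useful link.
md <- paste(
  "Photo by [someone](https://www.pexels.com/photo/example-123/).",
  "",
  "Docs: [xml2](https://example.org/xml2).",
  sep = "\n"
)
doc <- xml2::read_xml(commonmark::markdown_xml(md))
ns <- c(md = "http://commonmark.org/xml/1.0")

# XPath 1.0 has starts-with() and not(), so the predicate
# keeps only links whose destination is not on Pexels.
xpath <- paste0(
  ".//md:link",
  "[not(starts-with(@destination, 'https://www.pexels'))]"
)
kept <- xml2::xml_find_all(doc, xpath, ns = ns)
xml2::xml_attr(kept, "destination")
#> [1] "https://example.org/xml2"
```

Filtering in R with `purrr::discard()` is arguably easier to read; the XPath version saves one step when the condition is simple.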
After that I can format the links and display them here using an “asis” chunk. Yes, my slidedeck uses Quarto but this blog is still powered by R Markdown, hugodown to be precise.
I’m using the formatting as an opportunity to only keep distinct links: sometimes I had very similar slides in a row, with repeated information.
format_link <- function(link) {
  url <- xml2::xml_attr(link, "destination")
  text <- xml2::xml_text(link)
  sprintf("* [%s](%s)", text, url)
}
formatted_links <- purrr::map_chr(links, format_link)
formatted_links <- unique(formatted_links)
formatted_links |>
  paste(collapse = "\n") |>
  cat()
- https://user-maelle.netlify.app
- R Consortium ISC
- https://www.heltweg.org/posts/who-wrote-this-shit/
- https://fosstodon.org/@hadleywickham/112021309035884210
- https://nostarch.com/kill-it-fire
- “Refactoring Pro-Tip: Easiest Nearest Owwie First”
- https://styler.r-lib.org/
- https://masalmon.eu/2024/05/23/refactoring-tests/
- {lintr} itself
- reference index
- continuous integration
- https://masalmon.eu/2024/05/15/refactoring-xml/
- Tidyteam code review principles
- The Code Review Anxiety Workbook
- General science lifecycle
- Statistical software
- now
- then
- Happy Git and GitHub for the useR
- “Oh shit, Git!"
- “How Git works”
- Why you need small, informative Git commits
- The two phases of commits in a Git branch
- Hack your way to a good Git history
- {saperlipopette}
- Oh shit, Git!
- No Maintenance Intended
- What Does It Mean to Maintain a Package?
- Three currencies of payment for our work
- Package maintainer cheatsheet
- dev guide
- blog
- Package Development Corner
- paths of participation
- Monthly newsletter
- Blog
- R-universe
- https://ropensci.org/help-wanted
- https://ropensci.org/news
- https://devguide.ropensci.org/maintenance_evolution.html#archivalguidance
- 2021 community call
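The formatting and deduplication step behind the list above can be tried on a toy document. This sketch is a variant rather than the exact code I ran: it relies on `xml2::xml_text()` and `xml2::xml_attr()` being vectorised over a nodeset, so no explicit loop is needed. The Markdown and the example.org URLs are made up.

```r
# Made-up Markdown with a repeated link and a bare URL (autolink).
md <- paste(
  "Read about [tinkr](https://example.org/tinkr).",
  "",
  "Again: [tinkr](https://example.org/tinkr) and <https://example.org/xml2>.",
  sep = "\n"
)
doc <- xml2::read_xml(commonmark::markdown_xml(md))
ns <- c(md = "http://commonmark.org/xml/1.0")
toy_links <- xml2::xml_find_all(doc, ".//md:link", ns = ns)

# xml_text() and xml_attr() return one value per node,
# and sprintf() recycles the template over both vectors.
formatted <- sprintf(
  "* [%s](%s)",
  xml2::xml_text(toy_links),
  xml2::xml_attr(toy_links, "destination")
)

# unique() drops the second, identical tinkr entry.
formatted <- unique(formatted)
cat(formatted, sep = "\n")
#> * [tinkr](https://example.org/tinkr)
#> * [https://example.org/xml2](https://example.org/xml2)
```

The bare URL shows why some entries in my list have the URL as their text: for an autolink, the link text is the URL itself.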
Conclusion
Using tinkr, XPath and sprintf(), I was able to create a list of all the links shared in my useR! slidedeck. Some of them have no text, meaning that the URL is used as text for the link; others have text that only makes sense in the context of the paragraph they were part of; others are slightly more informative. But at least none of them is a “click here” link. 😅