
How to Scrape Word Documents with R

[This article was first published on business-science.io, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks.

Today we discuss an awesome skill for automating data collection from Word documents.


Here’s a common situation: your company has LOTS OF WORD FILES.

They contain tables of information that look like this:


Thinking like a programmer, you can extract this data using officer:
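The original code image is not reproduced here, so what follows is only a minimal sketch of how the extraction step might look with officer. The sample data frame, its column names, and the temporary file are all made up for illustration; in practice you would point read_docx() at one of your own Word files.

```r
library(officer)

# Hypothetical sample data standing in for a table inside a real Word file.
# Values are stored as text because that is how Word content comes back.
courses <- data.frame(
  course   = c("101", "201", "301"),
  students = c("40", "25", "30"),
  lessons  = c("120", "200", "90"),
  stringsAsFactors = FALSE
)

# Write a throwaway .docx so the example is self-contained
path <- tempfile(fileext = ".docx")
doc_out <- read_docx()
doc_out <- body_add_par(doc_out, "Course activity report")
doc_out <- body_add_table(doc_out, value = courses)
print(doc_out, target = path)

# Read the document back and list its content:
# one row per paragraph or table cell
doc     <- read_docx(path)
content <- docx_summary(doc)

# Keep only the rows that came from table cells
table_cells <- content[content$content_type == "table cell", ]
head(table_cells[, c("row_id", "cell_id", "text")])
```

docx_summary() returns everything in long format, so each table cell is its own row, identified by row_id and cell_id. Reshaping that into a regular table is the data wrangling step.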


With a little bit of data wrangling with the tidyverse, you’ve got your table extracted & formatted:
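Here is one way the wrangling could go, sketched with dplyr and tidyr. It assumes the document holds a single table whose first row is the header; the sample document and the column names (course, students, lessons) are hypothetical, not taken from the original tutorial.

```r
library(officer)
library(dplyr)
library(tidyr)

# Build a small sample .docx (hypothetical data) so the example runs on its own
courses <- data.frame(
  course   = c("101", "201", "301"),
  students = c("40", "25", "30"),
  lessons  = c("120", "200", "90"),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".docx")
doc_out <- body_add_table(read_docx(), value = courses)
print(doc_out, target = path)

# Long format: one row per table cell
table_cells <- docx_summary(read_docx(path)) %>%
  filter(content_type == "table cell") %>%
  select(row_id, cell_id, text)

# Wide format: one row per table row, one column per cell position
wide <- table_cells %>%
  pivot_wider(names_from = cell_id, values_from = text, names_prefix = "col_") %>%
  arrange(row_id)

# Assumption: the first table row holds the column names
header <- wide %>%
  filter(row_id == min(row_id)) %>%
  select(-row_id) %>%
  unlist(use.names = FALSE)

courses_tbl <- wide %>%
  filter(row_id > min(row_id)) %>%
  select(-row_id) %>%
  setNames(header) %>%
  mutate(across(c(students, lessons), as.numeric))  # text -> numbers

courses_tbl
```

If a document contains several tables, the row and cell ids will repeat, so you would need to group by a table identifier (for example doc_index) before pivoting.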


Then you use ggplot2 to make a sweet plot:
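The original plot is not shown here, so this is just a simple sketch of what a ggplot2 chart of the extracted table might look like. The data frame below is hypothetical and stands in for the cleaned table from the previous step.

```r
library(ggplot2)

# Hypothetical cleaned table (stand-in for the data extracted from Word)
courses_tbl <- data.frame(
  course   = c("101", "201", "301"),
  students = c(40, 25, 30),
  lessons  = c(120, 200, 90)
)

# Bar chart of lessons completed per course
p <- ggplot(courses_tbl, aes(x = course, y = lessons)) +
  geom_col() +
  labs(
    title = "Lessons completed by course",
    x     = "Course",
    y     = "Lessons completed"
  ) +
  theme_minimal()

print(p)
```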


Whoa, look at 201! It has a high “Activity Ratio”, which is the ratio of lessons completed to number of students enrolled:
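The ratio itself is a one-line calculation. A small sketch with dplyr follows. The numbers are made up, and treating "201" as a course code is an assumption; they are chosen only so that 201 comes out on top, as in the article.

```r
library(dplyr)

# Hypothetical cleaned table (not the data from the original tutorial)
courses_tbl <- data.frame(
  course   = c("101", "201", "301"),
  students = c(40, 25, 30),
  lessons  = c(120, 200, 90)
)

# Activity Ratio = lessons completed / students enrolled
activity <- courses_tbl %>%
  mutate(activity_ratio = lessons / students) %>%
  arrange(desc(activity_ratio))

activity
```

With these sample numbers, 201 has a ratio of 200 / 25 = 8, while the other two courses both come out at 3.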


You’ve just automated extracting Word tables in R. BOOM!


SETUP R-TIPS WEEKLY PROJECT

  1. Sign Up to Get the R-Tips Weekly (You’ll get email notifications of NEW R-Tips as they are released): https://mailchi.mp/business-science/r-tips-newsletter

  2. Set Up the GitHub Repo: https://github.com/business-science/free_r_tips

  3. Check out the setup video (https://youtu.be/F7aYV0RPyD0). Or hit Pull in the Git menu to get the R-Tips code

Once you take these actions, you’ll be set up to receive R-Tips with Code every week. =)


