The R-Podcast Episode 10: Adventures in Data Munging Part 2
I’m happy to present episode 10 of the R-Podcast! Season 1 of the R-Podcast concludes with part 2 of my series on data munging, in which I discuss the issues involved in importing data sets contained in HTML tables. I share how I used the XML and RCurl packages to validate and import data from hockey-reference.com for storage in a MySQL database. Our listener feedback segment contains another installment of the Pitfalls of R, contributed by listener Frans. I want to thank everyone who has provided such positive feedback throughout the season, and I’m looking forward to providing some exciting new content for season 2. I hope you enjoy the episode, and check out our new contact page if you would like to provide any feedback. Thanks for listening!
The following resources are mentioned in this episode:
- New additions to the RStudio team: http://blog.rstudio.org/2012/08/20/welcome-hadley-winston-and-garrett/
- Over 4,000 packages on CRAN: http://blog.revolutionanalytics.com/2012/08/two-r-community-milestones.html
- NHL Analysis web-scraping scripts on GitHub: https://github.com/thercast/nhl_analysis/tree/master/web-scraping
- XML package: http://cran.r-project.org/web/packages/XML/index.html
- RCurl package: http://cran.r-project.org/web/packages/RCurl/
- Hockey-Reference data: http://www.hockey-reference.com
- Using R for Scraping Data Presentation at UseR! 2012: http://www.slideshare.net/rtelmore/user-2012-talk
- Using RMySQL tutorial: http://playingwithr.blogspot.com/2011/05/accessing-mysql-through-r.html
- Jeroen Ooms’ lme4 web application: http://www.stat.ucla.edu/~jeroen/lme4.html
- Coursera Course on R: https://www.coursera.org/course/compdata
- RPubs: http://rpubs.com/
- Theme music provided by WillRock from the Return All Robots Remix Album at ocremix.org
- The closing theme is entitled “The Way” and provided by Jewbei from the Wild Arms: ARMed and Dangerous album at ocremix.org
Episode 10 Time Stamps
- 00:00 The R-Podcast #010 Adventures in Data Munging Part 2
- 00:33 Introduction
- 01:50 Wrapping up season 1 ... wait, what?
- 03:30 RStudio team expands
- 05:41 R Community milestone
- 07:53 Discovering hockey-reference.com
- 10:54 Tips for readHTMLTable
- 21:10 Checking for valid data first
- 29:23 Minor processing needed
- 35:18 Saving data to MySQL database
- 45:26 Listener Feedback: Andrew
- 54:58 Frans: Pitfalls of R segment 2
- 63:40 Wrapping up: subscribe to the podcast, [email protected], +1-269-849-9780, Twitter @theRcast
- 69:14 End