Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
When you need data from the internet, you do not have to retype it: if you know the web URL, you can extract (scrape) the data directly.
This is possible thanks to the XML package for R, which provides the readHTMLTable() function.
As a case study, I want to scrape two tables:
A. Scraping US Airline Customer Score table from
http://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Airlines
Code:
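library(XML)
airline = 'http://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Airlines'
# which=1 selects the first table on the page; header=TRUE uses its first row as column names
airline.table = readHTMLTable(airline, header=TRUE, which=1, stringsAsFactors=FALSE)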
Result:
B. Scraping World Top Chess players (Men) table from http://ratings.fide.com/top.phtml?list=men
Code:
library(XML)
chess = 'http://ratings.fide.com/top.phtml?list=men'
# which=5 selects the fifth table on the page, which holds the ratings list
chess.table = readHTMLTable(chess, header=TRUE, which=5, stringsAsFactors=FALSE)
Result:
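The two examples use different `which` values (1 and 5) because the data sits in a different position on each page. One way to find the right index is to read every table first and compare their sizes. The sketch below does this on a small inline HTML string rather than a live page, so the page content, the `Player A`/`Player B` rows and the variable names are all made up for illustration.

```r
library(XML)

# Hypothetical page with two tables, standing in for a real web page.
html <- paste0(
  "<html><body>",
  "<table><thead><tr><th>Menu</th></tr></thead>",
  "<tbody><tr><td>Home</td></tr></tbody></table>",
  "<table><thead><tr><th>Name</th><th>Rating</th></tr></thead>",
  "<tbody><tr><td>Player A</td><td>2800</td></tr>",
  "<tr><td>Player B</td><td>2790</td></tr></tbody></table>",
  "</body></html>"
)

# Parse the string as HTML instead of fetching a URL.
doc <- htmlParse(html, asText = TRUE)

# Without 'which', readHTMLTable() returns a list with one entry per <table>.
all.tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# Compare rows and columns to spot the table that holds the data.
table.dims <- t(sapply(all.tables, dim))
print(table.dims)

# Here the second table is the one of interest, so which = 2.
ratings <- readHTMLTable(doc, header = TRUE, which = 2, stringsAsFactors = FALSE)
print(ratings)
```

On a real page the same approach should apply by passing the URL in place of `doc`, although layout tables and navigation menus often add many small entries to the list.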
Done. You have successfully scraped data from a web page with CloudStat.
You can get the full version of this case study (code and result) at Scraping table from html web.
Then you can analyze the data as usual. Great! No more retyping the data. Enjoy!
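Before analyzing a scraped table, it usually helps to check the column types: with stringsAsFactors=FALSE the columns typically come back as character strings, so numeric columns need converting first. Here is a minimal sketch using a hand-built data frame as a stand-in for a scraped table; the names and values are hypothetical, not taken from the pages above.

```r
# Hypothetical stand-in for a scraped table: every column is character.
scraped <- data.frame(
  Name   = c("Player A", "Player B", "Player C"),
  Rating = c("2,800", "2790", "n/a"),
  stringsAsFactors = FALSE
)

# Strip thousands separators, then convert; non-numeric cells become NA.
scraped$Rating <- suppressWarnings(as.numeric(gsub(",", "", scraped$Rating)))

# Inspect the resulting column types.
str(scraped)

# Summaries now work once missing values are skipped.
mean.rating <- mean(scraped$Rating, na.rm = TRUE)
print(mean.rating)
```

Wrapping the conversion in suppressWarnings() hides the "NAs introduced by coercion" warning, so it is worth counting the NA values afterwards to see how many cells failed to convert.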