The R Backpages 2
by Joseph Rickert
In this roundup of R-related news: Domino enables data science collaboration; Plotly adds an R graphics gallery; Revolution Analytics R user group sponsorship applications are open; and Quandl adds new data sets.
San Francisco startup takes on collaborative Data Science
Domino, a San Francisco-based startup, is inviting users to sign up to beta test its vision of online data science collaboration. The site is slick, and the vision of cloud computing infrastructure integrated with an easy-to-use collaboration interface and automatic code revisioning is compelling. Moreover, it is delightfully easy to get started with Domino. After you fill out the new account form, a well-thought-out series of screens walks you through downloading the client software, running a script (R, MATLAB or Python) and viewing the results online. The Domino software creates a quick-start directory on your PC where it looks for scripts to run. After the installation is complete, it is just a matter of firing up a command window to run scripts in the cloud with:
C:\Users\…\quick-start> domino run myScript.R
The process of accessing the EC2 cluster is completely hidden from the user. This is painless cloud computing. The results are accessible from your browser or may be downloaded to your PC. Once you have some results up in the cloud to look at, you can invite others to collaborate. The following screenshot shows a plot created by running an R script out of John Fox’s regression book, as well as the command shell used to submit the script.
Here is the code to produce the plot.
install.packages("car",repos='http://cran.us.r-project.org')
# Regression plot from
# John Fox's book
# An R and S-Plus Companion to Applied Regression
library(car)
attach(Duncan)
pairs(cbind(prestige,income,education),  # the pairs function produces a matrix of scatter plots
  panel=function(x,y){                   # define a function panel for the content of the matrix
    points(x,y)                          # plot the points
    abline(lm(y~x), lty=2,col="blue")    # add a linear regression
    lines(lowess(x,y),col="red")         # add a nonlinear regression
  },
  diag.panel=function(x){                # define a new panel for the diagonals
    par(new=TRUE)
    hist(x,main="",axes=FALSE,nclass=12) # put a histogram on each diagonal
  }
)
(Note that this is exactly the R code you would run locally, except that the first line has to contain the code to install the car package and select a mirror.) This next screenshot shows the record of all of my runs, both the successes and the failures. Users have access to the code as well as the results from their previous runs. This is definitely a small victory for reproducible research.
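The install-and-select-a-mirror step can be made a little more defensive. The sketch below is not from Domino or from the original script: `ensure_package` is a hypothetical helper name, and the mirror URL is simply the one used in the script above. It assumes the remote session is non-interactive, so the mirror has to be passed explicitly, and it skips the download when the package is already present.

```r
# Hypothetical helper (an illustration, not part of Domino or the original post):
# install a package only when it is missing, using an explicit CRAN mirror.
ensure_package <- function(pkg, repos = "http://cran.us.r-project.org") {
  # requireNamespace() returns FALSE rather than raising an error
  # when the package cannot be loaded
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Assumption: the session cannot prompt for a mirror, so name one here
    install.packages(pkg, repos = repos)
  }
  # Report whether the package is usable after the attempt
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never touches the network
ok <- ensure_package("stats")
```

With this helper, the first line of the script would become `ensure_package("car")`, followed by `library(car)` as before.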
Plotly adds a new R graphics gallery
Plotly, the Montréal startup that we have previously featured (and that is also building R-based data collaboration and visualization tools), has added a new gallery of interactive R plots with sample code. Here is a screenshot of the golden spiral plot.
Plotly provides a lot more than just a base R plot. If you click on “view in plotly” from your browser, Plotly gives you the opportunity to change the style and layout, annotate your plot, fit additional data and more.
If you are in the San Francisco Bay Area on December 10th, you can see Chris Palmer, one of Plotly's founders, show off what they have at the Bay Area useR Group (BARUG) meeting.
Revolution Analytics 2014 Sponsorship Program is underway!!
Revolution is accepting requests for 2014 R user group sponsorship. Once again, R user groups worldwide are invited to apply for support at the Vector ($100), Matrix ($500) and Array ($1,000) levels. In addition to the cash grants, each sponsored group will receive a box of “goodies” that includes T-shirts, flying monkeys and more. So, if you are starting a new group, or think that sponsorship will help you grow your existing R user group, please have a look at the requirements for sponsorship on our website and apply. The deadline for applications at the Matrix and Array sponsorship levels is March 31, 2014. We will be accepting applications for Vector level sponsorship until September 20, 2014.
Quandl exceeds 7.8 million datasets!
Quandl continues its mission to seek out and make available the world's financial and econometric data. Recently added data sets include:
- ALFRED: 10,000 vintage economic datasets from the Federal Reserve's archival site.
- Commodity Futures Trading Commission: Over 8,000 datasets with commitment of traders information, for futures and options, both new and legacy formats.
- PsychSignal: Bullish/bearish sentiment index and volume ratios for 6,000+ stocks.
- Penn World Table 8.0: 5,000 datasets from the latest Penn World Table (version 8.0).
- US Treasury: Marketable debt statistics, including average maturity of issuance.
- FDIC: Banking statistics including assets, liabilities, failures and deposit insurance.
- Center for Advanced Studies on Applied Economics: Brazilian agricultural price indexes.
- London Platinum and Palladium Market: Platinum and Palladium prices.
- Federal Reserve Bank of Philadelphia: Philly Fed's new GDP+ indicator.
- Swiss Exchange: EOD data for over 250 stocks.
- Stock Exchange of Thailand: 6 market indexes.
- Liv-ex: Fine wine price indexes.
- Beta Arbitrage: Minimum variance portfolios and beta portfolios.
- International Securities Exchange: ISE sentiment indexes for equities and ETFs.
- Renaissance Capital: Monthly US IPO statistics.
- Osaka University: Japanese equity volatility indexes.
- Nikkei: 15 daily indexes published by the Nikkei group.
- UK Office for Budget Responsibility: Economic indicator forecasts up to 2063.
- Federal Reserve Economic Data: Added 1000s of new indicators from FRED.
- Wall Street Journal: Over 100 commodity spot prices added.
- Bureau of Labor Statistics: In progress, currently over 180,000 datasets imported.
Quandl has also made some changes to improve the usability of its site. There is a Futures topic page with 200+ contracts from 10+ exchanges. A new API Resources page makes using the API easier by providing downloadable lists of stock tickers, futures symbols, country and currency codes, and nomenclature rules for various data sources. There is also a new Data Request page that lists all the datasets that users have requested. If you are new to Quandl, you may find the mini tutorial on accessing Quandl data from R useful.