RObservations #51: Download Kaggle Datasets into the R Console with {RKaggle}

[This article was first published on r – bensstats, and kindly contributed to R-bloggers.]

I recently found some R code in the TidyTuesday repository that pulled data from Kaggle directly into the R console, and I thought the idea was incredible! After looking around and seeing that there were no packages that already did this, I was inspired to create the {RKaggle} package, which lets users download datasets from Kaggle directly into the R console.

Installation

Presently, {RKaggle} can only be installed from GitHub (there are plans to publish it to CRAN). You can install it from GitHub with either the {devtools} package or the {remotes} package.

# Install devtools and/or remotes if you haven't already
# install.packages(c("devtools", "remotes"))
# Use devtools
devtools::install_github("benyamindsmith/RKaggle")
# Or use remotes
# remotes::install_github("benyamindsmith/RKaggle")

Basic Usage

At the moment, {RKaggle} only retrieves datasets (not competitions). Here is some example code for downloading a dataset:

> library(RKaggle)
> # Download and read the "canadian-prime-ministers" dataset from Kaggle
> canadian_prime_ministers <- RKaggle::get_dataset("benjaminsmith/canadian-prime-ministers")
> canadian_prime_ministers
# A tibble: 29 × 5
   No.        Name                `Political Party`     `Term Start`     `Term End`      
   <chr>      <chr>               <chr>                 <chr>            <chr>           
 1 1 (1 of 2) John A. Macdonald   Liberal-Conservative  1 July 1867      5 November 1873 
 2 2          Alexander Mackenzie Liberal               7 November 1873  8 October 1878  
 3 1 (2 of 2) John A. Macdonald   Liberal-Conservative  17 October 1878  6 June 1891     
 4 3          John Abbott         Liberal-Conservative  16 June 1891     24 November 1891
 5 4          John Thompson       Liberal-Conservative  5 December 1892  12 December 1894
 6 5          Mackenzie Bowell    Conservative          21 December 1894 27 April 1896   
 7 6          Charles Tupper      Conservative          1 May 1896       8 July 1896     
 8 7          Wilfrid Laurier     Liberal               11 July 1896     10/6/1911       
 9 8          Robert Borden       Government (Unionist) 10/10/1911       7/10/1920       
10 9 (1 of 2) Arthur Meighen      Conservative          7/10/1920        12/29/1921      
# ℹ 19 more rows
# ℹ Use `print(n = ...)` to see more rows
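
The printed output shows that the term columns arrive as character vectors and mix two date styles ("1 July 1867" and "10/6/1911"). Below is a minimal base-R sketch of one way to turn such a column into proper dates. The helper name `parse_term_date` is hypothetical and not part of {RKaggle}. The sketch assumes the numeric form is month/day/year (consistent with "12/29/1921" above) and that the session uses an English locale, since `%B` matches month names in the current locale.

```r
# Hypothetical helper (not part of {RKaggle}): parse dates written either as
# "1 July 1867" or as "10/6/1911" (assumed to be month/day/year).
parse_term_date <- function(x) {
  # Long form; entries that do not match this format become NA
  long_form <- as.Date(x, format = "%d %B %Y")
  # Numeric month/day/year form; entries that do not match become NA
  numeric_form <- as.Date(x, format = "%m/%d/%Y")
  # Keep the long-form result where it parsed, otherwise use the numeric one
  out <- long_form
  missing <- is.na(out)
  out[missing] <- numeric_form[missing]
  out
}

# A few values copied from the printed tibble above
term_start <- c("1 July 1867", "7 November 1873", "10/10/1911")
parsed_start <- parse_term_date(term_start)
parsed_start
```

Values matching neither pattern stay `NA`, which makes them easy to spot before any further analysis.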

Present Scope

{RKaggle} currently supports downloading and loading only the following file formats:

  • .csv
  • .tsv
  • .xlsx
  • .json
  • .rds
  • .parquet
  • .ods
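
To give a feel for what handling several formats involves, here is a minimal sketch that picks a reader based on the file extension. This is an illustration only and is not how {RKaggle} is necessarily implemented. The function name `read_by_extension` is hypothetical, and only formats readable with base R (.csv, .tsv, .rds) are wired in. The remaining formats in the list would need add-on packages such as {readxl} or {jsonlite}.

```r
# Hypothetical helper (not {RKaggle}'s actual code): choose a reader from the
# file extension. Only base-R readers are wired in for this sketch.
read_by_extension <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path),
    tsv = utils::read.delim(path),
    rds = readRDS(path),
    # Default branch: anything else is reported as unsupported
    stop("No reader wired in for '.", ext, "' files in this sketch")
  )
}

# Round-trip a small data frame through a temporary CSV file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, party = c("A", "B", "A")), tmp, row.names = FALSE)
example_data <- read_by_extension(tmp)
example_data
```

Dispatching on the extension keeps each format's reading logic in one place, so supporting a new format amounts to adding one more branch.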

There are already some issues open for accommodating other file formats. If you want to get involved, feel free to submit a pull request!
