twitteR now supports database persistence


For a long time now I’ve wanted to add the ability to store data from twitteR in an RDBMS. In the past I’ve done things by concatenating new results onto old results, which simply becomes unwieldy. I know that many people have doctored up their own solutions for this, but it seemed useful to have it baked in. Unfortunately I never had the time or energy to do this, so the idea languished. But then dplyr happened – it provides some revolutionary tools for interacting with data stored in a database backend. I figured I’d kill two birds with one stone by finally implementing this project, which in turn would give me a lot of data to play with. This is all checked into master on GitHub.

This is still a work in progress, so please let me know if you have any comments, particularly around making it more seamless to use.

 First, some basics:

  • While theoretically any DBI-based backend should work, currently only RMySQL and RSQLite are supported.
  • The only types of data that can be persisted are status (tweet) objects and user objects. Granted, this likely covers 95%+ of use cases.
  • Data can be retrieved either as a list of the appropriate objects or as a data.frame representing the table. Only the entire table will be retrieved – my expectation is that it will be simpler for users to interact with the data via things like dplyr.
To get started, you must register your database backend. You can either create a DBI connection from one of the supported packages or call one of the available convenience methods (which will return the connection as well as register it with twitteR):

register_db_backend(dbi_connection)
# or create a sqlite connection
register_sqlite_backend("/path/to/sqlite/file")
# or create a mysql connection
register_mysql_backend("my_database", "hostname", "username", "password")

To continue, suppose we have a list of tweets we want to persist. Simply call store_tweets_db() with your list and they’ll be persisted into your database. By default they will be persisted to the table tweets, but you can change this with the table_name argument.

tweets = searchTwitter("#rstats", n=500)
store_tweets_db(tweets)
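
If you want to keep different searches in separate tables, the table_name argument mentioned above lets you choose where the tweets go. A minimal sketch (the table name here is purely an illustration):

# persist the same tweets to a custom table instead of the default "tweets"
# (the table name is just an example)
store_tweets_db(tweets, table_name = "rstats_tweets")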

Finally, to retrieve your tweets from the database, use load_tweets_db(). By default this returns a list of the appropriate objects, although by specifying as.data.frame=TRUE the result will be a data.frame mirroring the actual table. As with store_tweets_db(), there is a table_name argument.

from_db = load_tweets_db()
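
Since the whole table comes back, dplyr is a natural fit for slicing it up afterwards. A minimal sketch, assuming dplyr is installed and that the table exposes twitteR’s usual screenName column (an assumption about the table layout):

library(dplyr)

# pull the whole table back as a data.frame rather than a list of status objects
tweets_df = load_tweets_db(as.data.frame = TRUE)

# count tweets per user with ordinary dplyr verbs
# (screenName is assumed to be one of the table's columns)
tweets_df %>%
  group_by(screenName) %>%
  summarise(n = n()) %>%
  arrange(desc(n))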

Note that for user data there is a mirror set of functions, store_users_db() and load_users_db(), and the default table name is users.
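
For completeness, a quick sketch of the user side (the lookupUsers() call and the screen names are just placeholders for however you obtain user objects):

# fetch a few user objects and persist them to the default "users" table
# (the screen names below are placeholders)
some_users = lookupUsers(c("some_user", "another_user"))
store_users_db(some_users)

# load them back later
users_from_db = load_users_db()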
