[This article was first published on Juuso's blog on Open Data Science and R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I was exploring ways to visualize my Twitter history, and ended up creating this interactive streamgraph of my 20 most used hashtags in Twitter:
The graph shows how my Twitter activity has varied a lot. The top three hashtags are #datascience, #rstats and #opendata (no surprises there). There are also event-related hashtags that show up only once, such as #tomorrow2015 and #iccss2015, and annually repeating ones, such as #apps4finland.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
How this was made?
Twitter has quite a strict policy for obtaining data, but they do allow one to download the full personal Twitter history, i.e. all tweets as a convenient csv file (instructions here), so that’s what I did. The visualization was created with the streamgraph R package that uses the great htmlwidgets framework for easy creation of javascript visualizations from R. The plots are designed for daily data, but this ended up being too messy, so I aggregated the data on monthly level instead. Embedding the streamgraph htmlwidget into this Jekyll blog required a bit of hazzle. As pointed out in the comments here, the widget must be first created as a standalone html file and then embedded as an iframe. Hopefully there will be a more straightforward way to include htmlwidgets to Jekyll blogs in the future! Some problems:- The size of the widget has to be fixed when creating, so it will not scale automatically. This could possibly be fixed in the streamgraph package following this.
- Font size of the graph is very small, but I could not find a way to change it, even in the javascript source.
read_chunk()
. See more details from the rmarkdown source for this post.
# Script for producing a streamgraph of tweet hashtags # Load packages library("readr") library("dplyr") library("lubridate") library("streamgraph") library("htmlwidgets") # Read my tweets tweets_df <- read_csv("files/R/tweets.csv") %>% select(timestamp, text) %>% mutate(text = tolower(text)) # Pick hashtags with regexp hashtags_list <- regmatches(tweets_df$text, gregexpr("#[[:alnum:]]+", tweets_df$text)) # Create a new data_frame with (timestamp, hashtag) -pairs hashtags_df <- data_frame() for (i in which(sapply(hashtags_list, length) > 0)) { hashtags_df <- bind_rows(hashtags_df, data_frame(timestamp = tweets_df$timestamp[i], hashtag = hashtags_list[[i]])) } # Process data for plotting hashtags_df <- hashtags_df %>% # Pick top 20 hashtags filter(hashtag %in% names(sort(table(hashtag), decreasing=TRUE))[1:20]) %>% # Group by year-month (daily is too messy) # Need to add '-01' to make it a valid date for streamgraph mutate(yearmonth = paste0(format(as.Date(timestamp), format="%Y-%m"), "-01")) %>% group_by(yearmonth, hashtag) %>% summarise(value = n()) # Create streamgraph sg <- streamgraph(data = hashtags_df, key = "hashtag", value = "value", date = "yearmonth", offset = "silhouette", interpolate = "cardinal", width = "700", height = "400") %>% sg_legend(TRUE, "hashtag: ") %>% sg_axis_x(tick_interval = 1, tick_units = "year", tick_format = "%Y") # Save it for viewing in the blog post # For some reason I can not save it to files/R/ direclty so need to use file.rename() saveWidget(sg, file="twitter_streamgraph.html", selfcontained = TRUE) file.rename("twitter_streamgraph.html", "files/R/twitter_streamgraph.html")
To leave a comment for the author, please follow the link and comment on their blog: Juuso's blog on Open Data Science and R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.