Any Time At All: tweet frequency around the clock

quantixed


[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers].

Please consider this a “supplementary analysis” to my previous post looking at the frequency of tweets from my personal account over the last 12 years.

I was curious about what times I was active on Twitter (measured by when I tweeted). Others might be interested in a solution to look at this in R.

The code

As in the previous post, we need to get the data into R and then make sure we have a date-time object to work with. The data comes from the tweets.js file that is included in the Twitter data download, which you can request for your account. The code below assumes the file is in a directory called Data in the working directory.
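One possible snag, offered as an assumption rather than something this post states: in some Twitter archives, tweets.js is a JavaScript file that begins with an assignment such as `window.YTD.tweets.part0 = ` ahead of the JSON array, and fromJSON() cannot parse that directly. If you hit a parse error, a minimal base R sketch like the one below keeps only the text from the first `[` onwards. The name `strip_js_prefix()` is hypothetical and not part of jsonlite, and the example lines are made up.

```r
# Hypothetical helper: keep only the JSON array from a tweets.js-style file.
# Assumes the file looks like `window.YTD.tweets.part0 = [ ... ]`.
strip_js_prefix <- function(lines) {
  txt <- paste(lines, collapse = "\n")
  # position of the first "[" (the start of the JSON array)
  start <- regexpr("[", txt, fixed = TRUE)
  if (start < 1) stop("no JSON array found")
  substring(txt, start)
}

# toy stand-in for readLines("Data/tweets.js"); not real data
example_lines <- c('window.YTD.tweets.part0 = [',
                   '  {"tweet": {"created_at": "Wed Oct 10 20:19:24 +0000 2018"}}',
                   ']')
json_txt <- strip_js_prefix(example_lines)
```

The cleaned string can then be passed to fromJSON(txt = json_txt, flatten = TRUE) in place of the file path.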

library(jsonlite)
library(lubridate)
library(ggplot2)
library(dplyr)
library(timetk)

json_file <- "Data/tweets.js"
json_data <- fromJSON(txt = json_file, flatten = TRUE)

# make date/time column
json_data$tweet_created_at <- as.POSIXct(json_data$tweet.created_at, format="%a %b %d %H:%M:%S %z %Y")

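# collapse timestamps to the hour, pull out hour, weekday and year,
# then count tweets in each year/weekday/hour cell
df_hour <- json_data %>% 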
  summarize_by_time(.date_var = tweet_created_at,
                    .by = "hour",
                    hh = hour(tweet_created_at),
                    dd = weekdays(tweet_created_at),
                    yy = year(tweet_created_at)) %>%
  group_by(yy, dd, hh) %>%
  summarize(nn = n())

p4 <- ggplot(df_hour, aes(x = hh, y = nn)) +
  geom_col() +
  theme_bw() +
  lims(x = c(0,24)) +
  labs(x = "Hour", y = "Tweets") +
  facet_grid(yy ~ factor(dd, levels=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")))
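# make sure the output folder exists before saving
dir.create("Output/Plots", recursive = TRUE, showWarnings = FALSE)
ggsave("Output/Plots/tweet_timeOfDay.png", p4)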

Like last time, the magic comes from the timetk function summarize_by_time().

This function allows us to collapse the tweets by hour. We then group and summarise them in order to generate the plots, and use facet_grid to lay them out with one row per year and one column per weekday. Note that the days (dd) will appear in alphabetical order unless you convert them to a factor with the levels given in calendar order.
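To see what the grouping step produces without the full archive, here is a minimal base R sketch on four made-up timestamps (invented for illustration, not taken from my data). It counts tweets per weekday and hour, and shows why the explicit `levels` argument matters. Setting `LC_TIME` to `"C"` is an assumption to make sure English day and month names are used for parsing.

```r
# Made-up timestamps in the Twitter date format; not real data.
stamps <- c("Mon Jun 01 09:15:00 +0000 2015",
            "Mon Jun 01 09:45:00 +0000 2015",
            "Wed Jun 03 18:05:00 +0000 2015",
            "Fri Jun 05 09:30:00 +0000 2015")

invisible(Sys.setlocale("LC_TIME", "C"))  # English day/month names
tt <- as.POSIXct(stamps, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

hh <- as.integer(format(tt, "%H"))  # hour of day
dd <- weekdays(tt)                  # weekday name

day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# default factor: levels are sorted alphabetically
alpha_levels <- levels(factor(dd))  # "Friday" "Monday" "Wednesday"

# explicit levels: calendar order, which is what the facets need
dd_ordered <- factor(dd, levels = day_levels)

# tweets per weekday and hour, long format, empty cells dropped
counts <- as.data.frame(table(dd = dd_ordered, hh = hh),
                        stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]
```

Here the two Monday tweets in the 9 o'clock hour end up as a single row with a count of 2, which is the shape of data that geom_col() needs.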

As in the previous post, it’s clear how my tweeting declined from a high in 2015 and 2016. You can also see that I didn’t tweet as much at the weekend as during the week. But what about the times?

Pretty much I tweeted from 6 am through to 10 pm each day. Basically, those are my waking hours. Hmmm, not very healthy.
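A caveat worth checking before reading too much into the clock times: as.POSIXct() stores an instant, and the hour you get back depends on the time zone attached to the object, which defaults to the time zone of the R session. The sketch below uses a made-up timestamp and an example zone (`Europe/London` is an assumption chosen for illustration) to show the same tweet landing in different hours.

```r
invisible(Sys.setlocale("LC_TIME", "C"))   # English day/month names
stamp <- "Wed Jul 01 12:00:00 +0000 2015"  # made-up example, noon UTC
t_utc <- as.POSIXct(stamp, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# same instant, reported in two different time zones
hour_utc    <- as.integer(format(t_utc, "%H", tz = "UTC"))
hour_london <- as.integer(format(t_utc, "%H", tz = "Europe/London"))
```

With lubridate, calling with_tz() on the date column with the zone you care about, before extracting hour(), should give the equivalent result.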

I had expected to see some trends, e.g. tweeting more first thing than later, or maybe tweeting more around lunchtime. But the distributions are pretty flat, or at least show no consistent patterns.

We can drill down a little more and look at tweeting times per day of the week for 2015, the year when I was tweeting the most. In this year there is a slight trend from Tuesday to Friday for tweets to build up until 10 am and then tail off. Let’s break this year down further by month.

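# same approach as before, with month added; keep 2015 only
# and count tweets in each month/weekday/hour cell
df_2015_hour <- json_data %>% 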
  summarize_by_time(.date_var = tweet_created_at,
                    .by = "hour",
                    hh = hour(tweet_created_at),
                    mm = month(tweet_created_at, label = TRUE),
                    dd = weekdays(tweet_created_at),
                    yy = year(tweet_created_at)) %>%
  filter(yy == 2015) %>%
  group_by(mm, dd, hh) %>%
  summarize(nn = n())

p5 <- ggplot(df_2015_hour, aes(x = hh, y = nn, group = mm)) +
  geom_col() +
  theme_bw() +
  lims(x = c(0,24)) +
  labs(x = "Hour", y = "Tweets") +
  facet_grid(mm ~ factor(dd, levels=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")))
ggsave("Output/Plots/tweet_2015timeOfDay.png", p5)

While it’s possible to drill down this far, the data gets noisy with only 4 or 5 days per month to collate for the distributions.
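The "4 or 5 days" figure is easy to confirm, because each facet in the monthly plot pools a single weekday within a single month. A quick base R check, independent of the tweet data:

```r
invisible(Sys.setlocale("LC_TIME", "C"))  # English weekday names

# every calendar day in 2015
days_2015 <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")

# how many times each weekday occurs in each month
per_month <- table(month = format(days_2015, "%m"),
                   weekday = weekdays(days_2015))

range(per_month)  # smallest and largest number of days in any facet
```

So each bar in the monthly plot is built from at most five days of tweeting, which is why the distributions look so ragged.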

Summary

This post was more about showing how to interrogate a dataset than about the results. The wonderful thing about R and the libraries used here is how easy it is to quickly spin up some plots to explore a dataset. We didn’t get much insight beyond the fact that I used Twitter far too much!

—

The post title comes from “Any Time At All” by The Beatles from their “A Hard Day’s Night” album.
