Twitcher II: tweet frequency and top tweets

quantixed

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers].

Another post looking at Twitter data in R. It follows this one and this one.

I wanted to look again at my tweeting frequency over my 12 years on Twitter, but this time in a calendar view. Something like a GitHub commit calendar would be perfect. I have used a library for this in the past, but here, for no particular reason, I used a function that I found in this post (code at the end).

To generate these plots, it was a case of loading in the data (as described previously). I am analysing data from @clathrin and not from my quantixed Twitter account.

library(jsonlite)
library(lubridate)
library(ggplot2)
library(dplyr)
library(timetk)
library(viridis)

json_file <- "Data/tweets.js"
json_data <- fromJSON(txt = json_file, flatten = TRUE)

# make date/time column
json_data$tweet_created_at <- as.POSIXct(json_data$tweet.created_at, format="%a %b %d %H:%M:%S %z %Y")

# make factor for Tweet, Reply, RT
json_data$tweet_type <- as.factor(ifelse(
  grepl("^RT ",json_data$tweet.full_text), "RT", ifelse(
    is.na(json_data$tweet.in_reply_to_user_id), "Tweet", "Reply")))
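The code above passes tweets.js straight to fromJSON() and relies on the preparation described in the earlier post. If you start from a raw Twitter archive instead, tweets.js is typically JavaScript rather than pure JSON, because it opens with an assignment along the lines of `window.YTD.tweets.part0 = [`. The sketch below assumes that prefix and strips it so that jsonlite can parse the rest. `read_archive_js()` is a hypothetical helper and is not part of jsonlite.

```r
# Hypothetical helper (not part of jsonlite): return the JSON text inside a
# Twitter-archive .js file. Assumes the file opens with an assignment such as
# "window.YTD.tweets.part0 = [", which is stripped; plain JSON passes through.
read_archive_js <- function(path) {
  txt <- paste(readLines(path, warn = FALSE, encoding = "UTF-8"), collapse = "\n")
  # Remove only a leading "window.YTD.<name> =" so any "=" inside tweet text is untouched
  sub("^[[:space:]]*window\\.YTD\\.[A-Za-z0-9_.]+[[:space:]]*=[[:space:]]*", "", txt)
}

# Usage with the post's file path (requires jsonlite):
# json_data <- jsonlite::fromJSON(txt = read_archive_js("Data/tweets.js"), flatten = TRUE)
```

The pattern is anchored to the `window.YTD.` prefix rather than to the first `=` in the file. A looser pattern would also eat the start of the data whenever a tweet contained an equals sign.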

Next, we generate the plots by summarising the data by day and passing the daily counts to the calendarHeatmap() function described below.

# Tweet calendar
# all
df_day <- json_data %>%
  summarize_by_time(.date_var = tweet_created_at,
                    .by = "day",
                    n = n())
p <- calendarHeatmap(as.Date(df_day$tweet_created_at), df_day$n, title = "All tweets", subtitle = "@clathrin")
ggsave("Output/Plots/all_calendar.png", p, width = 6, height = 18)

# tweets only
df_day <- json_data %>%
  filter(tweet_type == "Tweet") %>%
  summarize_by_time(.date_var = tweet_created_at,
                    .by = "day",
                    n = n())
p <- calendarHeatmap(as.Date(df_day$tweet_created_at), df_day$n, title = "Tweets", subtitle = "@clathrin")
ggsave("Output/Plots/tweets_calendar.png", p, width = 6, height = 18)

# RTs only
df_day <- json_data %>%
  filter(tweet_type == "RT") %>%
  summarize_by_time(.date_var = tweet_created_at,
                    .by = "day",
                    n = n())
p <- calendarHeatmap(as.Date(df_day$tweet_created_at), df_day$n, title = "RTs", subtitle = "@clathrin")
ggsave("Output/Plots/rts_calendar.png", p, width = 6, height = 18)

# replies only
df_day <- json_data %>%
  filter(tweet_type == "Reply") %>%
  summarize_by_time(.date_var = tweet_created_at,
                    .by = "day",
                    n = n())
p <- calendarHeatmap(as.Date(df_day$tweet_created_at), df_day$n, title = "Replies", subtitle = "@clathrin")
ggsave("Output/Plots/replies_calendar.png", p, width = 6, height = 18)
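With `.by = "day"`, summarize_by_time() collapses the timestamps into one row per day that has at least one tweet. For anyone without timetk, the base-R sketch below performs the same counting step. `count_per_day()` is a hypothetical helper, and it assumes days are defined in UTC; pass a different `tz` if you want local days. Days with no tweets do not appear in the result. calendarHeatmap() leaves those days as NA and draws them in white.

```r
# Hypothetical helper: one row per day that has at least one tweet, with its count.
# Days are taken in UTC by default; days with no tweets are absent.
count_per_day <- function(times, tz = "UTC") {
  days <- as.Date(times, tz = tz)          # POSIXct -> calendar day in `tz`
  counts <- table(days)                    # tabulate; levels are sorted dates
  data.frame(date = as.Date(names(counts)),
             n    = as.integer(counts))
}

# Toy timestamps (not the post's data)
times <- as.POSIXct(c("2022-06-30 13:56:54", "2022-06-30 18:00:00",
                      "2022-07-01 09:00:00"), tz = "UTC")
res <- count_per_day(times)
res$n   # 2 1
```

The result has the shape the plotting call expects, for example `calendarHeatmap(res$date, res$n, title = "All tweets")`.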

As before, my waning tweet frequency is visible. I can also see that my busiest days for tweeting were the ones when I replied to lots of people who had congratulated me on achievements. This view gives day-level granularity, which was missing from the previous visualisations, but overall I think 12 years is too much for a calendar view. It works better for spans of up to 3-4 years.
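If a shorter span reads better, one option is to trim the daily data before plotting. The sketch below keeps only the most recent few calendar years. `last_n_years()` is a hypothetical helper, and four years is simply the upper end of the span suggested above.

```r
# Hypothetical helper: keep only the most recent `n_years` calendar years of
# daily data, so calendarHeatmap() draws fewer year panels.
last_n_years <- function(dates, values, n_years = 4) {
  yrs <- as.integer(format(dates, "%Y"))
  keep <- yrs > max(yrs) - n_years
  list(dates = dates[keep], values = values[keep])
}

# Toy dates spanning 2011-2022 (not the post's data)
dates <- seq(as.Date("2011-01-01"), as.Date("2022-12-31"), by = "month")
recent <- last_n_years(dates, seq_along(dates), n_years = 4)
range(format(recent$dates, "%Y"))   # "2019" "2022"
```

With the post's data, this would look like `recent <- last_n_years(as.Date(df_day$tweet_created_at), df_day$n)` followed by `calendarHeatmap(recent$dates, recent$values, ...)`. You would probably also want a smaller `height` in ggsave(), because there are fewer year panels to stack.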

Let’s dig a bit more into the data

Changing tack: the json_data data frame imported from tweets.js contains the full text of every tweet. Inspired by this, I decided to look at my most successful tweets over this period.

# Top tweets
# likes and RTs stored as character, convert to numeric
json_data$tweet.favorite_count <- as.numeric(json_data$tweet.favorite_count)
json_data$tweet.retweet_count <- as.numeric(json_data$tweet.retweet_count)
# subset data frame for those tweets that had "interest"
df_interest <- json_data[json_data$tweet.favorite_count >= 1 | json_data$tweet.retweet_count >= 1, ]
nrow(df_interest) / nrow(json_data)

p5 <- ggplot(df_interest, aes(x = tweet.favorite_count, y = tweet.retweet_count)) +
  geom_point(alpha = 0.2) +
  theme_bw() +
  scale_x_log10(limits = c(1,10000)) +
  scale_y_log10(limits = c(1,1000)) +
  labs(x = "Stars", y = "Retweets")
ggsave("Output/Plots/tweet_RTs_Stars.png", p5)

About 30% of my tweets (a mixture of tweets, RTs and replies) received at least one like or one RT. As expected, there’s a correlation between likes and RTs, with the lower-effort Like scoring higher than the RT.
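The correlation can be quantified directly. Like and RT counts are heavily skewed, so a rank-based (Spearman) coefficient is a reasonable choice here. The sketch below runs on toy numbers and is not a result from the post. `engagement_summary()` is a hypothetical helper.

```r
# Sketch: the share of tweets with any engagement, plus a Spearman rank
# correlation between likes and RTs among those tweets.
engagement_summary <- function(likes, rts) {
  interest <- likes >= 1 | rts >= 1        # same rule as the df_interest subset
  list(frac_interest = mean(interest),
       spearman = cor(likes[interest], rts[interest], method = "spearman"))
}

# Toy numbers (not the post's data)
res <- engagement_summary(likes = c(0, 0, 1, 5, 10), rts = c(0, 0, 0, 2, 4))
res$frac_interest   # 0.6
res$spearman        # 1
```

Run this on `json_data$tweet.favorite_count` and `json_data$tweet.retweet_count` after the numeric conversion. `frac_interest` should then equal the nrow() ratio computed earlier.

The scatter plot has a related quirk. Tweets with likes but no RTs have a y-value of zero, and log10(0) is -Inf. With the log-scale limits used in the plot, ggplot2 should therefore drop those points with a warning rather than plot them.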

Let’s have a look at the best performing tweets.

top_tweets <- json_data[json_data$tweet.favorite_count > 100 | json_data$tweet.retweet_count > 100, ]
top_tweets <- data.frame(Date = top_tweets$tweet_created_at,
                         Tweet = top_tweets$tweet.full_text,
                         Stars = top_tweets$tweet.favorite_count,
                         RTs = top_tweets$tweet.retweet_count)
top_tweets <- top_tweets[order(top_tweets$Stars, decreasing = TRUE),]
write.table(top_tweets, "Output/Data/tt.txt",sep = "\r", row.names = FALSE)
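The tweet text in the archive includes the t.co short links that Twitter substitutes for URLs and attached pictures. The post does not show how these were removed for the list that follows. One possible approach is a regular expression, as sketched below. `strip_links()` is a hypothetical helper, and the pattern assumes links take the form `https://t.co/<code>`.

```r
# Hypothetical helper: remove t.co short links from tweet text, then tidy the
# runs of spaces they leave behind (newlines inside tweets are kept).
strip_links <- function(text) {
  out <- gsub("https?://t\\.co/[A-Za-z0-9]+", "", text)
  out <- gsub(" {2,}", " ", out)   # collapse multiple spaces only
  trimws(out)
}

strip_links("Out now @eLife https://t.co/abc123 We show")
# "Out now @eLife We show"
```

To apply it before writing the file, use `top_tweets$Tweet <- strip_links(top_tweets$Tweet)`.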

The top tweets in (almost) all their glory

Here are the top tweets. I have removed the links and pictures (the URLs for pictures are in top_tweets but I have not included them here).

2022-06-30 13:56:54

“Just had a discussion about breaking free of Adobe Illustrator and avoiding BioRender. We have found Inkscape and Bioicons to be very useful open alternatives.”

Likes 1431 RTs 252

2019-01-25 20:16:49

“Returning from a grant panel. I was struck how mainstream preprints are nowadays. Several proposals received support from the panel because a preprint was available. Flipside is that “submitted to Journal X” doesn’t cut it – the panel now ask “why is it not on bioRxiv?”. 1/2”

Likes 847 RTs 373

2019-12-03 16:34:00

“#ASCBEMBO19 Remember a fabric poster is for life, not just for the meeting. Turn yours into a lovely cushion or two after the meeting and get some comfort from your data. #recycling”

Likes 774 RTs 141

2021-10-22 09:57:08

“I spoke to a postdoc recently who said they don’t tweet because “no-one would care what I have to say”.

Can I just say that the number 1 thing that would improve my feed enormously is if ECRs would tweet more (about science, lab life, anything).”

Likes 466 RTs 54

2019-09-19 11:43:46

“It’s real! I just received an advance copy of my book “The Digital Cell”. Official release is 1st Dec.”

Likes 378 RTs 57

2019-05-31 18:56:52

“Confirmed by youngest lab member today: young people don’t use full stops at the end of text messages, and it is seen as rude to do so. However if the message is from a grown-up (his words, not mine) then they are not offended because they know we don’t know the etiquette.”

Likes 334 RTs 37

2019-04-05 07:00:41

“Love this photo of Sydney Brenner and the rest of the LMB Governing Board in 1967. He’s dressed down and trying to conceal a cigarette from the camera.”

Likes 303 RTs 60

2019-10-08 16:47:31

“I love doing science in 2019. Saw this bioRxiv preprint via twitter. Downloaded the data, integrated it with a dataset we have. Discussed next steps with the lab on Slack. Experiments planned. All in a day, and the paper is unpublished.”

Likes 231 RTs 46

2022-08-01 16:07:30

“Please take a look at our new paper on making clathrin-coated pits inside living cells. Out now @eLife We show that it’s possible to make CCPs using minimal machinery and that we can even do this on the mitochondria (yes, you read that correctly)”

Likes 221 RTs 48

2022-06-06 17:36:04

“After 7 years of #NotTheCover, my lab finally #GotTheCover! Congrats again to @Nufediaz Laura and George for their work on this fantastic paper. Huge thank you to @JCellBiol (not just for picking us for the cover) who were a pleasure to deal with.”

Likes 199 RTs 12

2019-03-27 09:38:28

“Thanks to everyone for their congrats by tweet, DM, email or phone.

And yes, that is a hand-drawn unicorn emoji… love the folks in my lab.”

Likes 179 RTs 3

2022-08-05 22:49:38

“What I think about when I read tweets about CSN papers…”

Likes 175 RTs 18

2019-12-08 12:37:46

“There’s a kerfuffle on Twitter about whether biologists should learn to code. Well, of course they should and – shameless plug time – I am here at #ASCBEMBO19 to launch my book “The Digital Cell” which is a handbook to help cell biologists use computers in their research. 1/n”

Likes 164 RTs 39

2017-12-04 07:11:53

“Kim Nasmyth: You have to do science because you want to know, not because you want to get recognition. If you do what it takes to please other people, you’ll lose your moral compass. #BreakthroughPrize”

Likes 147 RTs 57

2016-01-05 10:19:23

“Citation distributions for 22 journals including Cell, Nature and Science | quantixed”

Likes 140 RTs 201

2020-07-02 08:22:04

“NEW PREPRINT from @roylelab looking at interactions between mitotic spindle associated proteins in live cells.

A brief explainer 1/7”

Likes 139 RTs 52

2020-08-20 10:00:00

“NEW PREPRINT: Intracellular nanovesicles mediate integrin trafficking during cell migration

It’s the work of @GabrielleLaroc6 (while in @roylelab) with help from @PenelopeLaBorde, @BeverleyJWilson, Nick Clarke, Daniel Moore & @integrintraffic

1/n”

Likes 129 RTs 41

2019-07-03 08:49:06

“I might have tweeted this before, but the images in this paper from 2006 are amazing. Membrane organisation in mammalian cells is ”

Likes 128 RTs 20

2019-08-05 21:08:27

“Hmmm. NSMB asked 16 researchers for their views on the current and future of synaptic vesicle fusion research. All 16 are male – where are the women?”

Likes 118 RTs 34

2019-01-25 20:16:50

“This makes most difference for early career people where track record is not so clear. Makes sense: we were there to evaluate the science and preprints enable that, but I was surprised by how widespread this view was on the panel. 2/2”

Likes 117 RTs 20

2022-09-27 11:05:00

“Can anyone suggest a cell line that is:

Human origin
Easy to culture & transfect
Great for microscopy (membrane traffic)
Near diploid and/or stable karyotype

The lines we use @roylelab do not tick all boxes and we’re wondering what we’re missing!”

Likes 116 RTs 43

2022-10-06 20:04:35

“Golgi holds the front page. Happy now @CellBiol_MRCLMB ?”

Likes 113 RTs 13

2022-03-23 08:31:54

“NEW PREPRINT from @roylelab on intracellular clathrin-coated vesicle formation. It is the work of @pixycus with the help of @SittewelleM, @GabrielleLaroc6 and @MiguelHdez_3 (). Link: A brief explainer 1/n”

Likes 104 RTs 29

2017-05-17 16:33:40

“JOIN US. 3 year Postdoc position @roylelab Membrane traffic, cell migration, imaging. Funded by MRC. Please RT”

Likes 86 RTs 164

2018-03-13 16:33:16

“Cell biologists! I have a vacancy for a post-doc to work on endocytosis @roylelab @Warwick_CMCB

Closing date is next week. Contact me with any questions. RTs would be nice!”

Likes 52 RTs 162

Conclusion

It is mind-boggling to me that more than 1,400 people would like a tweet about open-source science tools. Compared with some people I know on Twitter, these engagement numbers are pretty meagre. I have never tried to write tweets with the intention of “going viral”; I just tried to tweet about interesting stuff. Gratifyingly, many of these “top tweets” were related to my science and to the lab.

Despite many years on the platform, I always found it very hard to predict which tweets would be well received. Initially, the success of a tweet came down to the time of day, the day of the week and how many followers you had. Latterly, it was driven algorithmically. I found that some of my highest-performing tweets had incredibly far reach as a result, and that, it turns out, is not entirely positive. Maybe more on that another time.

This post is my 30th for 2022 and thereby fulfils my posting goal for the year. I hope to keep on posting and nerding out in 2023. Thanks for reading!

Code for calendar heatmap

Note: this is the work of Paul Bleicher. I have edited it for viridis-style colouring and to correct an off-by-one error.

#' Calendar Heatmap
#' 
#' Creates a colour coded calendar visualising time series data
#' 
#' @param dates A vector containing the dates in `Date` format.
#' @param values A vector containing the corresponding values as numeric.
#' @param title Main plot title (optional).
#' @param subtitle Main plot subtitle (optional).
#' @param legendtitle Legend title (optional).
#'   
#' @return ggplot object
calendarHeatmap <- function(dates, values, title = "", subtitle = "", legendtitle = ""){
  
  # Parameter checks
  if(missing(dates)){
    stop("Need to specify a dates vector.")
  }
  if(missing(values)){
    stop("Need to specify a values vector.")
  }
  # inherits() is base R, so this check does not depend on lubridate's is.Date()
  if(!inherits(dates, "Date")){
    stop("dates vector needs to be in Date format.")
  }
  if(length(dates) != length(values)){
    stop("dates and values need to have the same length.")
  }
  
  
  # load required packages
  require(ggplot2)
  
  my_theme <- function() {
    
    # Colors
    color.background = "white"
    color.text = "#22211d"
    
    # Begin construction of chart
    theme_bw(base_size=15) +
        
        # Format background colors
        theme(panel.background = element_rect(fill=color.background, color=color.background)) +
        theme(plot.background  = element_rect(fill=color.background, color=color.background)) +
        theme(panel.border     = element_rect(color=color.background)) +
        theme(strip.background = element_rect(fill=color.background, color=color.background)) +
        
        # Format the grid
        theme(panel.grid.major = element_blank()) +
        theme(panel.grid.minor = element_blank()) +
        theme(axis.ticks       = element_blank()) +
        
        # Format the legend
        theme(legend.position = "bottom") +
        theme(legend.text = element_text(size = 8, color = color.text)) +
        theme(legend.title = element_text(size = 10, face = "bold", color = color.text)) +
        
        # Format title and axis labels
        theme(plot.title       = element_text(color=color.text, size=20, face = "bold")) +
        theme(axis.text.x      = element_text(size=12, color="black")) +
        theme(axis.text.y      = element_text(size=12, color="black")) +
        theme(axis.title.x     = element_text(size=14, color="black", face = "bold")) +
        theme(axis.title.y     = element_text(size=14, color="black", vjust=1.25)) +
        theme(axis.text.x      = element_text(size=10, hjust = 0, color = color.text)) +
        theme(axis.text.y      = element_text(size=10, color = color.text)) +
        theme(strip.text       = element_text(face = "bold")) + 
        
        # Plot margins
        theme(plot.margin = unit(c(0.35, 0.2, 0.3, 0.35), "cm"))
  }
  
  # create empty calendar
  min.date <- as.Date(paste(format(min(dates), "%Y"),"-1-1",sep = ""))
  max.date <- as.Date(paste(format(max(dates), "%Y"),"-12-31", sep = ""))
  df <- data.frame(date = seq(min.date, max.date, by="days"), value = NA)
  
  # fill in values
  df$value[match(dates, df$date)] <- values
  
  df$year  <-  as.factor(format(df$date, "%Y"))
  df$month <- as.numeric(format(df$date, "%m"))
  df$doy   <- as.numeric(format(df$date, "%j"))
  #df$dow  <- as.numeric(format(df$date, "%u"))
  #df$woy  <- as.numeric(format(df$date, "%W"))
  df$dow <- as.numeric(format(df$date, "%w"))
  df$woy <- as.numeric(format(df$date, "%U")) + 1
  
  df$dowmapped <- ordered(df$dow, levels = 6:0)
  levels(df$dowmapped) <- rev(c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"))
  
  g <- ggplot(df, aes(woy, dowmapped, fill = value)) + 
    geom_tile(colour = "darkgrey") + 
    facet_wrap(~year, ncol = 1) + # Facet for years
    coord_equal(xlim = c(2.5,54)) + # square tiles
    scale_x_continuous(breaks = 53/12*(1:12)-1.5, labels = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) + 
    my_theme() +
    scale_fill_gradientn(colours = c("#3b528b","#21918c","#5ec962","#fde725"), na.value = "white",
                         name = legendtitle,
                         guide = guide_colorbar(
                           direction = "horizontal",
                           barheight = unit(2, units = "mm"),
                           barwidth = unit(75, units = "mm"),
                           title.position = 'top',
                           title.hjust = 0.5
                         )) +
    labs(x = NULL, 
         y = NULL, 
         title = title, 
         subtitle = subtitle)
  
  my.lines<-data.frame(x=numeric(), 
                       y=numeric(), 
                       xend=numeric(), 
                       yend=numeric(), 
                       year=character())
  
  for(years in levels(df$year)){
    df.subset <- df[df$year == years,]
    
    y.start <- df.subset$dow[1]
    x.start <- df.subset$woy[1]
    
    x.top.left <- ifelse(y.start == 0, x.start - 0.5, x.start + 0.5)
    y.top.left <- 7.5
    x.top.right <- df.subset$woy[nrow(df.subset)] + 0.5
    y.top.right <- 7.5
    
    x.mid.left01 <- x.start - 0.5
    y.mid.left01 <- 7.5 - y.start
    x.mid.left02 <- x.start + 0.5
    y.mid.left02 <- 7.5 - y.start
    
    x.bottom.left <- x.start - 0.5
    y.bottom.left <- 0.5
    x.bottom.right <- ifelse(y.start == 6, df.subset$woy[nrow(df.subset)] + 0.5, df.subset$woy[nrow(df.subset)] - 0.5)
    y.bottom.right <- 0.5
    
    my.lines<-rbind(my.lines,
                    data.frame(x    = c(x.top.left, x.bottom.left, x.mid.left01, x.top.left, x.bottom.left), 
                               y    = c(y.top.left, y.bottom.left, y.mid.left01, y.top.left, y.bottom.left),
                               xend = c(x.top.right, x.bottom.right, x.mid.left02, x.mid.left02, x.mid.left01), 
                               yend = c(y.top.right, y.bottom.right, y.mid.left02, y.mid.left02, y.mid.left01), 
                               year = years))
    
    # lines to separate months
    for (j in 1:12)  {
      df.subset.month <- max(df.subset$doy[df.subset$month == j])
      x.month <- df.subset$woy[df.subset.month]
      y.month <- df.subset$dow[df.subset.month]
      
      x.top.mid <- x.month + 0.5
      y.top.mid <- 7.5
      
      x.mid.mid01 <- x.month - 0.5
      y.mid.mid01 <- 7.5 - y.month - 1
      x.mid.mid02 <- x.month + 0.5
      y.mid.mid02 <- 7.5 - y.month - 1
      
      x.bottom.mid <- ifelse(y.month == 6, x.month + 0.5, x.month - 0.5)
      y.bottom.mid <- 0.5
      
      my.lines<-rbind(my.lines,
                      data.frame(x    = c(x.top.mid, x.mid.mid01, x.mid.mid01), 
                                 y    = c(y.top.mid, y.mid.mid01, y.mid.mid01),
                                 xend = c(x.mid.mid02, x.mid.mid02, x.bottom.mid), 
                                 yend = c(y.mid.mid02, y.mid.mid02, y.bottom.mid), 
                                 year = years))
      
    }
    
  }
  
  # add lines
  g <- g + geom_segment(data=my.lines, aes(x,y,xend=xend, yend=yend), lineend = "square", color = "black", inherit.aes=FALSE)
  
  return(g)
}
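The calendar layout rests on two format codes. The row comes from `%w` (0 = Sunday through 6 = Saturday). The column comes from `%U` plus one, where `%U` is the week of the year, weeks start on Sunday, and days before the first Sunday fall in week 0. The short sketch below reuses the same expressions as the function above in a hypothetical helper, to show where a few 2022 dates land. 1 January 2022 was a Saturday in week 0, so it lands in column 1. The following Sunday starts column 2, and 31 December lands in column 53. This is why the x-axis runs to about 54.

```r
# Sketch: the grid position calendarHeatmap() assigns to a date.
calendar_position <- function(date) {
  data.frame(date = date,
             dow  = as.numeric(format(date, "%w")),      # 0 = Sunday ... 6 = Saturday
             woy  = as.numeric(format(date, "%U")) + 1)  # week 0 becomes column 1
}

pos <- calendar_position(as.Date(c("2022-01-01", "2022-01-02", "2022-12-31")))
pos$dow   # 6 0 6
pos$woy   # 1 2 53
```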

—

The post title comes from “Twitcher” by Scorn, the first track on the 1997 album Zander.
