Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Another post looking at Twitter data in R. It follows this one and this one.
I wanted to look again at my tweeting frequency over my 12 years on Twitter, but this time in a calendar view. Something like a GitHub commit calendar would be perfect. I have used a library for this in the past. Here, for no particular reason, I used a function that I found in this post (code at the end).
To generate these plots, it was a case of loading in the data (as described previously). I am analysing data from @clathrin and not from my quantixed Twitter account.
library(jsonlite)
library(lubridate)
library(ggplot2)
library(dplyr)
library(timetk)
library(viridis)

json_file <- "Data/tweets.js"
json_data <- fromJSON(txt = json_file, flatten = TRUE)

# make date/time column
json_data$tweet_created_at <- as.POSIXct(json_data$tweet.created_at,
                                         format = "%a %b %d %H:%M:%S %z %Y")

# make factor for Tweet, Reply, RT
json_data$tweet_type <- as.factor(ifelse(
  grepl("^RT ", json_data$tweet.full_text), "RT",
  ifelse(is.na(json_data$tweet.in_reply_to_user_id), "Tweet", "Reply")))
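The date parsing and classification step can be tried on a toy data frame before pointing it at a real archive. This is a minimal sketch: the `toy` data frame and its rows are invented for illustration, and only the column names follow the code above. It also sets the time locale to "C", on the assumption that the session may not be using English day and month names, because `%a` and `%b` are locale-dependent.

```r
# Toy stand-in for json_data; the rows are invented for illustration
toy <- data.frame(
  tweet.created_at = c("Thu Jun 30 13:56:54 +0000 2022",
                       "Fri Jan 25 20:16:49 +0000 2019",
                       "Tue Dec 03 16:34:00 +0000 2019"),
  tweet.full_text = c("Just had a discussion",
                      "RT @someone: hello",
                      "@someone thanks!"),
  tweet.in_reply_to_user_id = c(NA, NA, "12345"),
  stringsAsFactors = FALSE
)

# %a and %b are parsed using the current locale, so use English names
Sys.setlocale("LC_TIME", "C")

# same format string as above, with an explicit time zone
toy$tweet_created_at <- as.POSIXct(toy$tweet.created_at,
                                   format = "%a %b %d %H:%M:%S %z %Y",
                                   tz = "UTC")

# RTs start with "RT "; otherwise a reply has an in_reply_to_user_id
toy$tweet_type <- as.factor(ifelse(
  grepl("^RT ", toy$tweet.full_text), "RT",
  ifelse(is.na(toy$tweet.in_reply_to_user_id), "Tweet", "Reply")))

print(toy[, c("tweet_created_at", "tweet_type")])
```

If the parsed column comes back as `NA`, the locale or the format string is the first thing to check.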
Then we need to generate the plots by summarising the data by day and using the calendarHeatmap() function described below.
# Tweet calendar
# all
df_day <- json_data %>%
  summarize_by_time(.date_var = tweet_created_at, .by = "day", n = n())
p <- calendarHeatmap(as.Date(df_day$tweet_created_at), df_day$n,
                     title = "All tweets", subtitle = "@clathrin")
ggsave("Output/Plots/all_calendar.png", p, width = 6, height = 18)

# tweets only
df_day <- json_data %>%
  filter(tweet_type == "Tweet") %>%
  summarize_by_time(.date_var = tweet_created_at, .by = "day", n = n())
p <- calendarHeatmap(as.Date(df_day$tweet_created_at), df_day$n,
                     title = "Tweets", subtitle = "@clathrin")
ggsave("Output/Plots/tweets_calendar.png", p, width = 6, height = 18)

# RTs only
df_day <- json_data %>%
  filter(tweet_type == "RT") %>%
  summarize_by_time(.date_var = tweet_created_at, .by = "day", n = n())
p <- calendarHeatmap(as.Date(df_day$tweet_created_at), df_day$n,
                     title = "RTs", subtitle = "@clathrin")
ggsave("Output/Plots/rts_calendar.png", p, width = 6, height = 18)

# replies only
df_day <- json_data %>%
  filter(tweet_type == "Reply") %>%
  summarize_by_time(.date_var = tweet_created_at, .by = "day", n = n())
p <- calendarHeatmap(as.Date(df_day$tweet_created_at), df_day$n,
                     title = "Replies", subtitle = "@clathrin")
ggsave("Output/Plots/replies_calendar.png", p, width = 6, height = 18)
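The four blocks above repeat the same filter-then-count pattern, so it could be wrapped in a small helper. The sketch below is one possible way to do that in base R rather than with `summarize_by_time()`; the function name `daily_counts()` and the toy data are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: count tweets per calendar day, optionally for one type
daily_counts <- function(df, type = NULL) {
  if (!is.null(type)) {
    df <- df[df$tweet_type == type, , drop = FALSE]
  }
  # collapse timestamps to dates (UTC assumed), then tabulate
  days <- as.Date(df$tweet_created_at, tz = "UTC")
  counts <- table(days)
  data.frame(date = as.Date(names(counts)),
             n = as.integer(counts))
}

# Toy data, invented for illustration
toy <- data.frame(
  tweet_created_at = as.POSIXct(c("2022-06-30 13:56:54",
                                  "2022-06-30 18:00:00",
                                  "2022-07-01 09:00:00"), tz = "UTC"),
  tweet_type = factor(c("Tweet", "Reply", "Tweet"))
)

all_days   <- daily_counts(toy)
tweet_days <- daily_counts(toy, type = "Tweet")
print(all_days)
print(tweet_days)
```

The resulting `date` and `n` columns have the shape that `calendarHeatmap()` expects for its first two arguments.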
As before, it’s possible to see my waning tweet frequency. I can also see that my hot days for tweeting were due to me sending lots of replies to people wishing me well on achievements. This type of view is great for that granularity – which was missing in the previous visualisations – but overall I think 12 years is too much for this calendar view and it works better in spans of up to 3-4 years.
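Since the calendar reads best over a span of a few years, one option is to subset the daily counts to a date window before plotting. This is a minimal sketch with invented counts; the window boundaries `start` and `end` are arbitrary choices for illustration.

```r
# Toy daily counts; in the post this would be df_day from summarize_by_time()
df_day <- data.frame(
  tweet_created_at = as.Date(c("2016-01-05", "2019-01-25",
                               "2019-12-03", "2022-06-30")),
  n = c(3, 12, 7, 20)
)

# Hypothetical window: keep 2019 to 2021 inclusive
start <- as.Date("2019-01-01")
end   <- as.Date("2021-12-31")
df_window <- df_day[df_day$tweet_created_at >= start &
                      df_day$tweet_created_at <= end, ]

print(df_window)

# With calendarHeatmap() defined, the plot call would then be, e.g.
# p <- calendarHeatmap(df_window$tweet_created_at, df_window$n,
#                      title = "All tweets", subtitle = "2019-2021")
```

A shorter span also allows a smaller `height` in `ggsave()`, since each year is drawn as one facet.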
Let’s dig a bit more into the data
Changing tack, the json_data data frame imported from tweets.js has the full text of all tweets. Inspired by this, I decided to have a look at what my most successful tweets were over this period.
# Top tweets
# likes and RTs stored as character, convert to numeric
json_data$tweet.favorite_count <- as.numeric(json_data$tweet.favorite_count)
json_data$tweet.retweet_count <- as.numeric(json_data$tweet.retweet_count)

# subset data frame for those tweets that had "interest"
df_interest <- json_data[json_data$tweet.favorite_count >= 1 | json_data$tweet.retweet_count >= 1, ]
nrow(df_interest) / nrow(json_data)

p5 <- ggplot(df_interest, aes(x = tweet.favorite_count, y = tweet.retweet_count)) +
  geom_point(alpha = 0.2) +
  theme_bw() +
  scale_x_log10(limits = c(1,10000)) +
  scale_y_log10(limits = c(1,1000)) +
  labs(x = "Stars", y = "Retweets")
ggsave("Output/Plots/tweet_RTs_Stars.png", p5)
About 30% of my tweets (a mixture of tweets, RTs and replies) received at least one like or one RT. As expected, there’s a correlation between likes and RTs of my tweets, with the lower-energy like scoring higher than the RT.
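The relationship in the scatter plot could also be put into numbers. The sketch below uses invented counts, not the real data, and the choice of a Spearman rank correlation is an assumption on my part, picked because like and RT counts are heavily skewed.

```r
# Invented like and retweet counts, purely for illustration
likes    <- c(1, 5, 20, 100, 1431)
retweets <- c(1, 1, 4, 20, 252)

# Rank-based correlation is unaffected by the heavy right skew of the counts
rho <- cor(likes, retweets, method = "spearman")

# Fraction of tweets where likes outnumber retweets
frac_likes_higher <- mean(likes > retweets)

print(rho)
print(frac_likes_higher)
```

On the real data, the same two lines would be applied to `df_interest$tweet.favorite_count` and `df_interest$tweet.retweet_count`.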
Let’s have a look at the best performing tweets.
top_tweets <- json_data[json_data$tweet.favorite_count > 100 | json_data$tweet.retweet_count > 100, ]
top_tweets <- data.frame(Date = top_tweets$tweet_created_at,
                         Tweet = top_tweets$tweet.full_text,
                         Stars = top_tweets$tweet.favorite_count,
                         RTs = top_tweets$tweet.retweet_count)
top_tweets <- top_tweets[order(top_tweets$Stars, decreasing = TRUE),]
write.table(top_tweets, "Output/Data/tt.txt", sep = "\r", row.names = FALSE)
The top tweets in (almost) all their glory
Here are the top tweets. I have removed the links and pictures (the URLs for pictures are in top_tweets but I have not included them here).
2022-06-30 13:56:54
“Just had a discussion about breaking free of Adobe Illustrator and avoiding BioRender. We have found Inkscape and Bioicons to be very useful open alternatives.”
Likes 1431 RTs 252
2019-01-25 20:16:49
“Returning from a grant panel. I was struck how mainstream preprints are nowadays. Several proposals received support from the panel because a preprint was available. Flipside is that “submitted to Journal X” doesn’t cut it – the panel now ask “why is it not on bioRxiv?”. 1/2”
Likes 847 RTs 373
2019-12-03 16:34:00
“#ASCBEMBO19 Remember a fabric poster is for life, not just for the meeting. Turn yours into a lovely cushion or two after the meeting and get some comfort from your data. #recycling”
Likes 774 RTs 141
2021-10-22 09:57:08
“I spoke to a postdoc recently who said they don’t tweet because “no-one would care what I have to say”.
Can I just say that the number 1 thing that would improve my feed enormously is if ECRs would tweet more (about science, lab life, anything).”
Likes 466 RTs 54
2019-09-19 11:43:46
“It’s real! I just received an advance copy of my book “The Digital Cell”. Official release is 1st Dec.”
Likes 378 RTs 57
2019-05-31 18:56:52
“Confirmed by youngest lab member today: young people don’t use full stops at the end of text messages, and it is seen as rude to do so. However if the message is from a grown-up (his words, not mine) then they are not offended because they know we don’t know the etiquette.”
Likes 334 RTs 37
2019-04-05 07:00:41
“Love this photo of Sydney Brenner and the rest of the LMB Governing Board in 1967. He’s dressed down and trying to conceal a cigarette from the camera.”
Likes 303 RTs 60
2019-10-08 16:47:31
“I love doing science in 2019. Saw this bioRxiv preprint via twitter. Downloaded the data, integrated it with a dataset we have. Discussed next steps with the lab on Slack. Experiments planned. All in a day, and the paper is unpublished.”
Likes 231 RTs 46
2022-08-01 16:07:30
“Please take a look at our new paper on making clathrin-coated pits inside living cells. Out now @eLife We show that it’s possible to make CCPs using minimal machinery and that we can even do this on the mitochondria (yes, you read that correctly)”
Likes 221 RTs 48
2022-06-06 17:36:04
“After 7 years of #NotTheCover, my lab finally #GotTheCover! Congrats again to @Nufediaz Laura and George for their work on this fantastic paper. Huge thank you to @JCellBiol (not just for picking us for the cover) who were a pleasure to deal with.”
Likes 199 RTs 12
2019-03-27 09:38:28
“Thanks to everyone for their congrats by tweet, DM, email or phone.
And yes, that is a hand-drawn unicorn emoji… love the folks in my lab.”
Likes 179 RTs 3
2022-08-05 22:49:38
“What I think about when I read tweets about CSN papers…”
Likes 175 RTs 18
2019-12-08 12:37:46
“There’s a kerfuffle on Twitter about whether biologists should learn to code. Well, of course they should and – shameless plug time – I am here at #ASCBEMBO19 to launch my book “The Digital Cell” which is a handbook to help cell biologists use computers in their research. 1/n”
Likes 164 RTs 39
2017-12-04 07:11:53
“Kim Nasmyth: You have to do science because you want to know, not because you want to get recognition. If you do what it takes to please other people, you’ll lose your moral compass. #BreakthroughPrize”
Likes 147 RTs 57
2016-01-05 10:19:23
“Citation distributions for 22 journals including Cell, Nature and Science | quantixed”
Likes 140 RTs 201
2020-07-02 08:22:04
“NEW PREPRINT from @roylelab looking at interactions between mitotic spindle associated proteins in live cells.
A brief explainer 1/7”
Likes 139 RTs 52
2020-08-20 10:00:00
“NEW PREPRINT: Intracellular nanovesicles mediate integrin trafficking during cell migration
It’s the work of @GabrielleLaroc6 (while in @roylelab) with help from @PenelopeLaBorde, @BeverleyJWilson, Nick Clarke, Daniel Moore & @integrintraffic
1/n”
Likes 129 RTs 41
2019-07-03 08:49:06
“I might have tweeted this before, but the images in this paper from 2006 are amazing. Membrane organisation in mammalian cells is
Likes 128 RTs 20
2019-08-05 21:08:27
“Hmmm. NSMB asked 16 researchers for their views on the current and future of synaptic vesicle fusion research. All 16 are male – where are the women?”
Likes 118 RTs 34
2019-01-25 20:16:50
“This makes most difference for early career people where track record is not so clear. Makes sense: we were there to evaluate the science and preprints enable that, but I was surprised by how widespread this view was on the panel. 2/2”
Likes 117 RTs 20
2022-09-27 11:05:00
“Can anyone suggest a cell line that is:
- Human origin
- Easy to culture & transfect
- Great for microscopy (membrane traffic)
- Near diploid and/or stable karyotype
The lines we use @roylelab do not tick all boxes and we’re wondering what we’re missing!”
Likes 116 RTs 43
2022-10-06 20:04:35
“Golgi holds the front page. Happy now @CellBiol_MRCLMB ?”
Likes 113 RTs 13
2022-03-23 08:31:54
“NEW PREPRINT from @roylelab on intracellular clathrin-coated vesicle formation. It is the work of @pixycus with the help of @SittewelleM, @GabrielleLaroc6 and @MiguelHdez_3 (
Likes 104 RTs 29
2017-05-17 16:33:40
“JOIN US. 3 year Postdoc position @roylelab Membrane traffic, cell migration, imaging. Funded by MRC. Please RT”
Likes 86 RTs 164
2018-03-13 16:33:16
“Cell biologists! I have a vacancy for a post-doc to work on endocytosis @roylelab @Warwick_CMCB
Closing date is next week. Contact me with any questions. RTs would be nice!”
Likes 52 RTs 162
Conclusion
It is mind-boggling to me that 1.5K people would like a tweet about open source science tools. Compared to some people I know on Twitter, these numbers for engagement are pretty meagre. I have never tried to write tweets with the intention of “going viral”. I just tried to tweet about interesting stuff. Gratifyingly, many of these “top tweets” were related to my science and to the lab.
Despite many years on the platform, I always found it very hard to predict which tweets would be well received. Initially, the success of a tweet was down to the time of day, the day of the week and how many followers you had. Latterly, it was driven algorithmically. I found that some of the very highest tweets had incredibly far reach as a result, and that – it turns out – is not entirely positive. Maybe more on that another time.
This post is my 30th for 2022 and thereby fulfils my posting goal for 2022. I hope to keep on posting and nerding out in 2023. Thanks for reading!
Code for calendar heatmap
Note, this is the work of Paul Bleicher. It is edited for viridis style colouring and to correct an off-by-one error.
#' Calendar Heatmap
#'
#' Creates a colour coded calendar visualising time series data
#'
#' @param dates A vector containing the dates in `Date` format.
#' @param values A vector containing the corresponding values as numeric.
#' @param title Main plot title (optional).
#' @param subtitle Main plot subtitle (optional).
#' @param legendtitle Legend title (optional).
#'
#' @return ggplot object
calendarHeatmap <- function(dates, values, title = "", subtitle = "", legendtitle = ""){

  # Parameter checks
  if(missing(dates)){
    stop("Need to specify a dates vector.")
  }
  if(missing(values)){
    stop("Need to specify a values vector.")
  }
  # note: is.Date() comes from lubridate, which must already be loaded
  if(!is.Date(dates)){
    stop("dates vector need to be in Date format.")
  }
  if(length(dates) != length(values)){
    stop("dates and values need to have the same length.")
  }

  # load required packages
  require(ggplot2)

  my_theme <- function() {

    # Colors
    color.background = "white"
    color.text = "#22211d"

    # Begin construction of chart
    theme_bw(base_size=15) +

      # Format background colors
      theme(panel.background = element_rect(fill=color.background, color=color.background)) +
      theme(plot.background = element_rect(fill=color.background, color=color.background)) +
      theme(panel.border = element_rect(color=color.background)) +
      theme(strip.background = element_rect(fill=color.background, color=color.background)) +

      # Format the grid
      theme(panel.grid.major = element_blank()) +
      theme(panel.grid.minor = element_blank()) +
      theme(axis.ticks = element_blank()) +

      # Format the legend
      theme(legend.position = "bottom") +
      theme(legend.text = element_text(size = 8, color = color.text)) +
      theme(legend.title = element_text(size = 10, face = "bold", color = color.text)) +

      # Format title and axis labels
      theme(plot.title = element_text(color=color.text, size=20, face = "bold")) +
      theme(axis.text.x = element_text(size=12, color="black")) +
      theme(axis.text.y = element_text(size=12, color="black")) +
      theme(axis.title.x = element_text(size=14, color="black", face = "bold")) +
      theme(axis.title.y = element_text(size=14, color="black", vjust=1.25)) +
      theme(axis.text.x = element_text(size=10, hjust = 0, color = color.text)) +
      theme(axis.text.y = element_text(size=10, color = color.text)) +
      theme(strip.text = element_text(face = "bold")) +

      # Plot margins
      theme(plot.margin = unit(c(0.35, 0.2, 0.3, 0.35), "cm"))
  }

  # create empty calendar
  min.date <- as.Date(paste(format(min(dates), "%Y"),"-1-1",sep = ""))
  max.date <- as.Date(paste(format(max(dates), "%Y"),"-12-31", sep = ""))
  df <- data.frame(date = seq(min.date, max.date, by="days"), value = NA)

  # fill in values
  df$value[match(dates, df$date)] <- values

  df$year <- as.factor(format(df$date, "%Y"))
  df$month <- as.numeric(format(df$date, "%m"))
  df$doy <- as.numeric(format(df$date, "%j"))
  #df$dow <- as.numeric(format(df$date, "%u"))
  #df$woy <- as.numeric(format(df$date, "%W"))
  df$dow <- as.numeric(format(df$date, "%w"))
  df$woy <- as.numeric(format(df$date, "%U")) + 1

  df$dowmapped <- ordered(df$dow, levels = 6:0)
  levels(df$dowmapped) <- rev(c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"))

  g <- ggplot(df, aes(woy, dowmapped, fill = value)) +
    geom_tile(colour = "darkgrey") +
    facet_wrap(~year, ncol = 1) + # Facet for years
    coord_equal(xlim = c(2.5,54)) + # square tiles
    scale_x_continuous(breaks = 53/12*(1:12)-1.5,
                       labels = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
    my_theme() +
    scale_fill_gradientn(colours = c("#3b528b","#21918c","#5ec962","#fde725"),
                         na.value = "white",
                         name = legendtitle,
                         guide = guide_colorbar(
                           direction = "horizontal",
                           barheight = unit(2, units = "mm"),
                           barwidth = unit(75, units = "mm"),
                           title.position = 'top',
                           title.hjust = 0.5
                         )) +
    labs(x = NULL,
         y = NULL,
         title = title,
         subtitle = subtitle)

  my.lines<-data.frame(x=numeric(),
                       y=numeric(),
                       xend=numeric(),
                       yend=numeric(),
                       year=character())

  for(years in levels(df$year)){
    df.subset <- df[df$year == years,]

    y.start <- df.subset$dow[1]
    x.start <- df.subset$woy[1]

    x.top.left <- ifelse(y.start == 0, x.start - 0.5, x.start + 0.5)
    y.top.left <- 7.5
    x.top.right <- df.subset$woy[nrow(df.subset)] + 0.5
    y.top.right <- 7.5

    x.mid.left01 <- x.start - 0.5
    y.mid.left01 <- 7.5 - y.start
    x.mid.left02 <- x.start + 0.5
    y.mid.left02 <- 7.5 - y.start

    x.bottom.left <- x.start - 0.5
    y.bottom.left <- 0.5
    x.bottom.right <- ifelse(y.start == 6, df.subset$woy[nrow(df.subset)] + 0.5, df.subset$woy[nrow(df.subset)] - 0.5)
    y.bottom.right <- 0.5

    my.lines<-rbind(my.lines,
                    data.frame(x = c(x.top.left, x.bottom.left, x.mid.left01, x.top.left, x.bottom.left),
                               y = c(y.top.left, y.bottom.left, y.mid.left01, y.top.left, y.bottom.left),
                               xend = c(x.top.right, x.bottom.right, x.mid.left02, x.mid.left02, x.mid.left01),
                               yend = c(y.top.right, y.bottom.right, y.mid.left02, y.mid.left02, y.mid.left01),
                               year = years))

    # lines to separate months
    for (j in 1:12) {
      df.subset.month <- max(df.subset$doy[df.subset$month == j])
      x.month <- df.subset$woy[df.subset.month]
      y.month <- df.subset$dow[df.subset.month]

      x.top.mid <- x.month + 0.5
      y.top.mid <- 7.5

      x.mid.mid01 <- x.month - 0.5
      y.mid.mid01 <- 7.5 - y.month - 1
      x.mid.mid02 <- x.month + 0.5
      y.mid.mid02 <- 7.5 - y.month - 1

      x.bottom.mid <- ifelse(y.month == 6, x.month + 0.5, x.month - 0.5)
      y.bottom.mid <- 0.5

      my.lines<-rbind(my.lines,
                      data.frame(x = c(x.top.mid, x.mid.mid01, x.mid.mid01),
                                 y = c(y.top.mid, y.mid.mid01, y.mid.mid01),
                                 xend = c(x.mid.mid02, x.mid.mid02, x.bottom.mid),
                                 yend = c(y.mid.mid02, y.mid.mid02, y.bottom.mid),
                                 year = years))
    }
  }

  # add lines
  g <- g + geom_segment(data=my.lines, aes(x,y,xend=xend, yend=yend), lineend = "square", color = "black", inherit.aes=FALSE)

  return(g)
}
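To see how the function places each day on the grid, it helps to look at the two format codes it relies on: `%w` gives the day of the week (0 for Sunday) and `%U` gives the week of the year counted from the first Sunday, starting at 0. The sketch below is only an illustration of that mapping for three hand-picked dates in 2022, using the same `+ 1` shift as the function.

```r
# A few dates from 2022 to show where each one lands on the grid
d <- as.Date(c("2022-01-01", "2022-01-02", "2022-12-31"))

grid <- data.frame(
  date = d,
  dow  = as.numeric(format(d, "%w")),      # 0 = Sunday ... 6 = Saturday
  woy  = as.numeric(format(d, "%U")) + 1   # %U starts at 0, so shift to start at 1
)

print(grid)
```

The 1st of January 2022 was a Saturday, so it sits alone in column 1, and the last day of the year lands in column 53.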
—
The post title comes from “Twitcher” by Scorn, the first track off the 1997 album Zander.