Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In previous Monitoring Productivity Experiment post I looked into the hours I spent in computer, now will look into the hours Others spend in computer, which is far more interesting 🙂 To find things like what day people spend more time on computer, how many hours they work, and general activity patterns.
Collecting data
In osx, is possible to use growl to display a message when a skype user signs in. So I configured growl to log the sign-in’s and sign-out’s of my skype contacts.
Like so:
> touch ~/Desktop/growl.log > defaults write com.Growl.GrowlHelperApp GrowlLoggingEnabled -bool YES > defaults write com.Growl.GrowlHelperApp GrowlLogType 1 > defaults write com.Growl.GrowlHelperApp "Custom log 1" ~/Desktop/growl.log
Instructions here.
And then I left my skype signed in for a few weeks, while I was on vacations. (not the best energy saving approach i know…)
Parsing data
Read the log file and create a semicolon separated file:
#!/usr/bin/env ruby puts "timestamp;user;status" File.open(ARGV[0]).each_line do |l| if l.include? "online" or l.include? "offline" date = l.split('Skype')[0].strip user = l.scan(/Skype:([^\(]*)/)[0][0].strip status = l.include?("online") ? "online" : "offline" puts "#{date};#{user};#{status}" end end
Load it into R
data = read.csv("/my/proj/skype-growl/log.csv", sep=";", header=TRUE) # parse dates "Aug 24, 2011 3:58:01 PM" data$date = as.POSIXct(strptime(data$timestamp,"%b %d, %Y %I:%M:%S %p")) # DateTime data$hour = format(data$date, format="%H:%M:%S") # string data$time = as.POSIXct(data$hour, format = "%H:%M:%S") # DateTime data$day = format(data$date, format="%m/%d/%y") # string data$weekday = format(data$date, format="%A") # string # filter for complete days of data data = sqldf("select * from data where day >= '08/25/2011' and day <= '09/21/2011'") sqldf("select count(distinct(day)) from data")
27 days of data.
The sign-in’s and sign-out’s of a random person
randomperson = sqldf("select user from data group by random() limit 1") d = sqldf(sprintf("select * from data where user = '%s' and day >= '09/04/2011' and day <= '09/12/2011'", randomperson[1,1])) ggplot(data=d, aes(y=time, x=date)) + geom_point(aes(color=status), alpha=0.6) + scale_x_datetime(major = "1 days") + scale_y_datetime(major = "1 hours")
- 10-Sep is Saturday and 11-Sep is Sunday, means skype was off on the weekend
- start of workday between 9h-11h
- end of workday between 18h-19h
- skype is offline after working hours, except on night of Monday 05-Sep
Online Activity Patterns
Plotting all sign-in’s and sign-out’s over each weekday we can get a feeling for overall online activity:
ggplot(data, aes(x=time,..density..)) + geom_histogram() + facet_grid(weekday ~ .)
- Night time has less activity, and gets progressively smaller as night goes by
- Around 9am activity spikes (people start work?)
- Around 17h/18h activity spikes (people ending work?)
- The 9h & 17h spikes are not so well formed in the weekend, thus very likelly connected to work
- Sundays after dinner time(or so) people seems to start get online again before going to sleep
- On weekends computer gets more use later in the day
How many hours people work?
More tricky to accurately measure but we can have a guess:
- assuming that people are working during working hours of workdays
- assuming that nobody start works before 6am, and nobody ends work after 21pm
Then, the first activity after 6am is start of work, and the last activity change before 21pm is the end of work.
d = sqldf("select user, day, weekday, min(hour) as start, max(hour) as end from data where hour >= '06:00:00' and hour <= '21:00:00' and weekday <> 'Saturday' and weekday <> 'Sunday' group by user, day") d$totalhours = difftime(as.POSIXct(d$end, format = "%H:%M:%S"), as.POSIXct(d$start, format = "%H:%M:%S")) d$totalhours = as.numeric(d$totalhours, units="hours") # excude less than 2 hours/day, means bots, vacations, etc... dt = sqldf("select * from d where totalhours > 2") al3x.load() # my own collection of R functions al3x.hist(dt, "totalhours")
Workday total hours are mostly between 6 and 12 hours, most common being the 8.5 hours/day.
Which day people spend more time in computer?
We can try counting the amount of sign-in’s/sign-out’s changes per day, means people are more likely to be in computer.
d = sqldf("select weekday, count(status) as amount from data group by weekday order by sum(time) DESC") ggplot(d, aes(x=weekday,y=amount)) + geom_bar(stat="identity")
As the above could be biased in a number of ways lets use another way to measure it and if the results match then original estimate should be ok.
For example, way to go about it is to sum up the total working hours for each day:
d = sqldf("select user, day, weekday, min(hour) as start, max(hour) as end from data where hour >= '06:00:00' and hour <= '21:00:00' group by user, day") d$totalhours = difftime(as.POSIXct(d$end, format = "%H:%M:%S"), as.POSIXct(d$start, format = "%H:%M:%S")) d$totalhours = as.numeric(d$totalhours, units="hours") sqldf("select weekday, sum(totalhours) as amount from d group by weekday order by sum(totalhours) DESC")
Getting:
weekday amount 1 Tuesday 15404.471 2 Wednesday 15191.946 3 Monday 14298.472 4 Friday 12426.091 5 Thursday 11638.443 6 Saturday 5222.874 7 Sunday 5198.367
Almost same results, great.
Thus Tuesday is the day people spend more time in computer, and in decreasing order:
Tuesday > Wednesday > Monday > Friday > Thursday > (Saturday or Sunday)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.