In this post, we will take a look back at 2020, and analyze my step count data to understand some of the impacts that the COVID-19 crisis had on my walking behavior during that crazy year.
The Data
Step Counts & Measurement Devices
The step count data for 2020 come from two sources. I had a Fitbit for the first part of the year, but it died at the beginning of August. I then switched to the Mi Band 5, which recorded my steps from the second half of August through the end of the year. There was a period of about 2.5 weeks, between the time my Fitbit died and the time I got the Mi Band, when my step counts were not recorded. In total, we have 345 observations of daily total step counts from 2020.
Both step count data sources are accessible (with a little work) via R: you can see my write up of how to access data from Fitbit here and my post on how to access data from the Mi Band here.
Time Period: Pre-Covid vs. Covid
The major event this past year that re-organized nearly all aspects of our lives was the COVID-19 pandemic. The pandemic and related rules and regulations shifted my movement quite a bit. Since March of 2020, for example, I have been working from home, and most of my activities have been done on foot, rather than by car.
In order to understand the differences between the pre-COVID and COVID periods, we will look at differences in step counts before the beginning of the first shutdowns of schools, restaurants, and public assembly, which occurred on March 14th, 2020 in the country where I live. All observations which occurred before this date are considered as the pre-COVID period, while all observations on or after this date are considered as the COVID period.
You can find the data and all the code from this blog post on Github here.
The head of the dataset (named daily_data) looks like this:
date | daily_total | dow | week_weekend | device | month | time_period |
---|---|---|---|---|---|---|
2020-01-01 | 16903 | Wed | Weekday | Fitbit | 1 | pre_covid |
2020-01-02 | 16707 | Thu | Weekday | Fitbit | 1 | pre_covid |
2020-01-03 | 18046 | Fri | Weekday | Fitbit | 1 | pre_covid |
2020-01-04 | 18262 | Sat | Weekend | Fitbit | 1 | pre_covid |
2020-01-05 | 16172 | Sun | Weekend | Fitbit | 1 | pre_covid |
2020-01-06 | 12009 | Mon | Weekday | Fitbit | 1 | pre_covid |
2020-01-07 | 16923 | Tue | Weekday | Fitbit | 1 | pre_covid |
2020-01-08 | 11248 | Wed | Weekday | Fitbit | 1 | pre_covid |
2020-01-09 | 18335 | Thu | Weekday | Fitbit | 1 | pre_covid |
2020-01-10 | 12539 | Fri | Weekday | Fitbit | 1 | pre_covid |
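The derived columns in this table can be rebuilt from the date alone. The following is a minimal base R sketch, not the original preparation code: the toy dates and step values are made up, and the label "covid" for the later period is an assumption (only "pre_covid" appears in the table above).

```r
# Toy rows standing in for the real step-count export (values are made up)
daily_data <- data.frame(
  date = as.Date(c("2020-03-13", "2020-03-14", "2020-03-15", "2020-09-01")),
  daily_total = c(15000, 9000, 12000, 11000)
)

# Date of the first shutdowns, as given in the text
lockdown_date <- as.Date("2020-03-14")

# Day of week as a number (0 = Sunday ... 6 = Saturday), independent of locale
wday_num <- as.POSIXlt(daily_data$date)$wday

daily_data$dow <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")[wday_num + 1]
daily_data$week_weekend <- ifelse(wday_num %in% c(0, 6), "Weekend", "Weekday")
daily_data$month <- as.POSIXlt(daily_data$date)$mon + 1

# Observations on or after the lockdown date belong to the COVID period
# ("covid" is an assumed label for that period)
daily_data$time_period <- ifelse(daily_data$date < lockdown_date, "pre_covid", "covid")

daily_data
```

Using a strict `<` comparison puts March 14th itself in the COVID period, which matches the "on or after this date" rule described above.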
Average Daily Step Count Per Week Across 2020
One of the complicated things about visualizing a year’s worth of step count data is that there are a lot of data points – too many to plot individually and still extract high-level takeaways from the data. Therefore, my first analysis was of the average daily step counts per week.
The chart below shows the average daily step counts for each week of the year. I first group the data by week (automatically extracted from the date column using the lubridate package). For each week, I calculate the average number of steps per day, and also determine the month in which the week started. I then make a bar chart, displaying the averages per week across the course of the year. I color the bars according to month, and add a dashed vertical line at the week of the first COVID lockdown.
The code to produce this plot looks like this:
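A minimal sketch of the aggregation step described above, under stated assumptions: the data are synthetic, and base R stands in for the lubridate-based grouping (the week index below follows the same definition as `lubridate::week()`, i.e. complete seven-day periods since January 1st, plus one). The plotting layer is left out of the sketch.

```r
set.seed(1)  # toy data only; the real daily_data is not reproduced here
daily_data <- data.frame(
  date = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
)
daily_data$daily_total <- round(rnorm(nrow(daily_data), mean = 15000, sd = 4500))

# Week index: complete 7-day periods since January 1st, plus one
lt <- as.POSIXlt(daily_data$date)
daily_data$week <- lt$yday %/% 7 + 1
daily_data$month <- lt$mon + 1

# Average number of steps per day within each week
weekly <- aggregate(daily_total ~ week, data = daily_data, FUN = mean)
names(weekly)[2] <- "avg_daily_steps"

# Month in which each week starts (match() returns the first day of each week)
weekly$month <- daily_data$month[match(weekly$week, daily_data$week)]

# Week containing the first lockdown (March 14th), for the dashed vertical line
lockdown_week <- as.POSIXlt(as.Date("2020-03-14"))$yday %/% 7 + 1

head(weekly)
```

The `weekly` data frame has one row per week, so it can be handed to a bar chart (for example `geom_col()` in ggplot2, filled by `month`) with a `geom_vline()` at `lockdown_week`.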
And produces the following plot:
It looks like the week after the lockdown, I was walking a lot less than the previous couple of weeks. However, from the second week of the COVID period, my average step counts increased quite a bit. This matches my memory of this time period – staying inside for a week, but then going a bit stir crazy and getting outside to move around as much as possible. By this point, I was no longer commuting to work, and so it was easier to make time to get outside. The days were getting longer and the weather was nicer than normal, and I seem to have taken advantage of this in April and May.
There is a gap of 2.5 weeks at the beginning of August. As I note above, this was during the period after my Fitbit died, but before my Mi Band 5 had arrived. The step counts for September, October and November (the first months with the Mi Band) appear to be lower than those of the previous months (where the step counts were measured via Fitbit).
A Simple Model of Daily Step Counts in 2020
In order to disentangle the impact of these factors, I made a simple regression model of my daily step counts. The predictors were the various factors we have discussed so far: time period (pre-COVID vs. COVID), measurement device (Fitbit vs. Mi Band), and whether the day was a weekday or a weekend (I know from previous analyses of my steps that the patterns are quite different on weekdays and weekends).
We can run this model and request the results with the following code:
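A minimal sketch of such a model with `lm()`, fitted to synthetic data rather than the real step counts. The device switch date (2020-08-15), the simulated effect sizes, and the object name `step_model` are assumptions made for illustration.

```r
set.seed(42)  # synthetic stand-in for daily_data; not the real measurements
dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
daily_data <- data.frame(
  date = dates,
  time_period = ifelse(dates < as.Date("2020-03-14"), "pre_covid", "covid"),
  device = ifelse(dates < as.Date("2020-08-15"), "Fitbit", "MiBand"),  # assumed switch date
  week_weekend = ifelse(as.POSIXlt(dates)$wday %in% c(0, 6), "Weekend", "Weekday")
)
daily_data$daily_total <- round(
  15000 + 3000 * (daily_data$week_weekend == "Weekend") +
    rnorm(nrow(daily_data), sd = 4500)
)

# Character predictors become factors whose first (alphabetical) level is the
# reference category: covid, Fitbit, Weekday
step_model <- lm(daily_total ~ time_period + device + week_weekend, data = daily_data)

summary(step_model)$coefficients
```

Because the reference levels are chosen alphabetically, the coefficient names come out as `time_periodpre_covid`, `deviceMiBand` and `week_weekendWeekend`, the same names that appear in the summary table.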
Which returns this summary table:
Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
(Intercept) | 16150.9 | 417.85 | 38.65 | 0 |
time_periodpre_covid | -1654.35 | 661.48 | -2.5 | 0.01 |
deviceMiBand | -2797.26 | 556.67 | -5.03 | 0 |
week_weekendWeekend | 3014.81 | 547.69 | 5.5 | 0 |
Observations | 345 | | | |
R2 / R2 adjusted | 0.142 / 0.134 | | | |
Lots of information to unpack here! Let’s go through the coefficients and interpret them to understand my walking patterns in 2020.
- Intercept: The intercept value is 16150.9. This is my average daily step count when all the other variables in the model are at their reference categories (i.e. days during COVID, recorded with the Fitbit, on weekdays). Another way of saying this is that, on average, I walked 16150.9 steps during weekdays in the COVID period when I was measuring steps with the Fitbit.
- time_periodpre_covid: This is a dummy variable that represents the comparison between the pre-COVID and the COVID periods. The value given in the table represents the average difference in daily steps between the pre-COVID period and the COVID period. In other words, keeping all the other variables in the model constant, I walked 1654.4 fewer steps per day during the pre-COVID period than during the COVID period. In short, in 2020, I walked more during COVID than I did before the pandemic!
- deviceMiBand: This is a dummy variable that represents the comparison between measurement devices (i.e. Fitbit vs. Mi Band). The value of -2797.3 means that, keeping all the other variables in the model constant, I walked on average 2797.3 fewer steps per day when the data were recorded with the Mi Band. It is unclear whether this is due to measurement differences between the fitness trackers, or whether something changed in my walking behavior during the period when I got the Mi Band. Note that the difference between the time periods (COVID vs. pre-COVID) is +1654 steps, whereas the difference between the devices (Mi Band vs. Fitbit) is -2797 steps. This means that the bump seen during COVID (+1654) is more than offset once I change tracking devices, because the Mi Band average is 2797 steps lower than the Fitbit average.
- week_weekendWeekend: This is a dummy variable that represents the comparison between weekdays and weekends. The value of 3014.8 means that, keeping all the other variables in the model constant, I walked on average 3014.8 steps more per day on weekends vs. weekdays. We saw this same pattern in my previous blog post about extracting data from the Mi Band.
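The interpretation above amounts to simple arithmetic: a predicted value is the intercept plus the coefficient of every dummy variable that is switched on. A small sketch using the estimates from the summary table (the helper name `predict_steps` is hypothetical):

```r
# Coefficients copied from the summary table above
b <- c(intercept = 16150.90, pre_covid = -1654.35, mi_band = -2797.26, weekend = 3014.81)

# Predicted steps = intercept + coefficient of each dummy that equals 1
predict_steps <- function(pre_covid, mi_band, weekend) {
  unname(
    b["intercept"] +
      pre_covid * b["pre_covid"] +
      mi_band * b["mi_band"] +
      weekend * b["weekend"]
  )
}

predict_steps(pre_covid = 0, mi_band = 0, weekend = 0)  # COVID, Fitbit, weekday: 16150.90
predict_steps(pre_covid = 1, mi_band = 0, weekend = 0)  # pre-COVID, Fitbit, weekday: 14496.55
predict_steps(pre_covid = 1, mi_band = 0, weekend = 1)  # pre-COVID, Fitbit, weekend: 17511.36
predict_steps(pre_covid = 0, mi_band = 1, weekend = 0)  # COVID, Mi Band, weekday: 13353.64
```

The last two weekday values make the device effect concrete: a COVID-period weekday recorded with the Mi Band (13353.64) is predicted to be lower than a pre-COVID weekday recorded with the Fitbit (14496.55). Values computed this way can differ in the last decimal from model output, because the table coefficients are rounded.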
Note that the R squared values for this model are not very high – there is clearly a great deal of variation in my step counts that is not explained by the few predictors in our model. Let’s calculate some basic model error metrics using a function I described in a previous post:
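The two error metrics used here are straightforward to compute. The helper below is a hypothetical sketch (the name `compute_model_metrics` and its interface are assumptions, not the function from the earlier post), shown with made-up numbers.

```r
# Hypothetical helper: root mean squared error and mean absolute error
compute_model_metrics <- function(actual, predicted) {
  errors <- actual - predicted
  c(
    rmse = sqrt(mean(errors^2)),  # penalizes large misses more heavily
    mae  = mean(abs(errors))      # average size of a miss, in steps
  )
}

# Tiny worked example with made-up step counts
compute_model_metrics(
  actual    = c(10000, 12000, 20000),
  predicted = c(11000, 12000, 16000)
)
```

For a fitted `lm` object, the same helper could be called with the observed step counts and `fitted(model)`. The RMSE is never smaller than the MAE, and a wide gap between the two points to a few large misses.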
This code returns the following to the console:
Our root mean squared error is 4560.47 and our mean absolute error is 3289.29. Not huge values in an absolute sense, but it must be noted that the MAE is around 20% of the intercept value from our above linear model (i.e. 3289/16151 is about 20%).
To get a better sense of the model performance, let’s plot out the daily observations and the model predictions on the same graph.
We first need to add the predictions to our data set, which we can do like this:
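One way to do this in base R is `predict()` with an interval, which returns the `fit`, `lwr` and `upr` columns. The sketch below uses synthetic data; the assumed device switch date and the choice of `interval = "confidence"` (rather than a prediction interval) are assumptions, not confirmed details of the original code.

```r
set.seed(7)  # synthetic stand-in for daily_data; not the real measurements
dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
daily_data <- data.frame(
  date = dates,
  time_period = ifelse(dates < as.Date("2020-03-14"), "pre_covid", "covid"),
  device = ifelse(dates < as.Date("2020-08-15"), "Fitbit", "MiBand"),  # assumed switch date
  week_weekend = ifelse(as.POSIXlt(dates)$wday %in% c(0, 6), "Weekend", "Weekday")
)
daily_data$daily_total <- round(15000 + rnorm(nrow(daily_data), sd = 4500))

step_model <- lm(daily_total ~ time_period + device + week_weekend, data = daily_data)

# predict() with an interval returns a matrix with columns fit, lwr, upr
predictions <- predict(step_model, newdata = daily_data, interval = "confidence")

# Bind the three prediction columns onto the original data
daily_data_predict <- cbind(daily_data, predictions)

head(daily_data_predict)
```

Since all three predictors are categorical, every day with the same combination of period, device and type of day receives the same fitted value, which is why the `fit` column repeats from row to row.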
Our new data frame, called daily_data_predict, looks like this:
date | daily_total | dow | week_weekend | device | month | time_period | fit | lwr | upr |
---|---|---|---|---|---|---|---|---|---|
2020-01-01 | 16903 | Wed | Weekday | Fitbit | 1 | pre_covid | 14496.54 | 13400.06 | 15593.03 |
2020-01-02 | 16707 | Thu | Weekday | Fitbit | 1 | pre_covid | 14496.54 | 13400.06 | 15593.03 |
2020-01-03 | 18046 | Fri | Weekday | Fitbit | 1 | pre_covid | 14496.54 | 13400.06 | 15593.03 |
2020-01-04 | 18262 | Sat | Weekend | Fitbit | 1 | pre_covid | 17511.36 | 16197.24 | 18825.47 |
2020-01-05 | 16172 | Sun | Weekend | Fitbit | 1 | pre_covid | 17511.36 | 16197.24 | 18825.47 |
2020-01-06 | 12009 | Mon | Weekday | Fitbit | 1 | pre_covid | 14496.54 | 13400.06 | 15593.03 |
2020-01-07 | 16923 | Tue | Weekday | Fitbit | 1 | pre_covid | 14496.54 | 13400.06 | 15593.03 |
2020-01-08 | 11248 | Wed | Weekday | Fitbit | 1 | pre_covid | 14496.54 | 13400.06 | 15593.03 |
2020-01-09 | 18335 | Thu | Weekday | Fitbit | 1 | pre_covid | 14496.54 | 13400.06 | 15593.03 |
2020-01-10 | 12539 | Fri | Weekday | Fitbit | 1 | pre_covid | 14496.54 | 13400.06 | 15593.03 |
We can now make our plot. We will plot the daily step count totals for all 345 days in our dataset, which is a lot of points – too many, in my opinion, to easily pick up the patterns revealed by the regression model we calculated above. However, we can plot the result of the model predictions on top of the points, which will show us visually what the coefficients in our above table mean. Furthermore, by comparing the distance between the points and the regression line, we can get a visual sense for the predictive performance of the model.
We can make the plot with the following code:
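A minimal ggplot2 sketch of a plot of this kind, built on synthetic data. The use of ggplot2, the assumed device switch date, and the styling are all assumptions; the colors of the original figure are not reproduced.

```r
library(ggplot2)  # assumed plotting package

set.seed(7)  # synthetic stand-in for daily_data; not the real measurements
dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
d <- data.frame(
  date = dates,
  time_period = ifelse(dates < as.Date("2020-03-14"), "pre_covid", "covid"),
  device = ifelse(dates < as.Date("2020-08-15"), "Fitbit", "MiBand"),  # assumed switch date
  week_weekend = ifelse(as.POSIXlt(dates)$wday %in% c(0, 6), "Weekend", "Weekday")
)
d$daily_total <- round(15000 + rnorm(nrow(d), sd = 4500))

step_model <- lm(daily_total ~ time_period + device + week_weekend, data = d)
d$fit <- predict(step_model, newdata = d)

p <- ggplot(d, aes(x = date, y = daily_total)) +
  geom_point(alpha = 0.3) +                         # one point per observed day
  geom_line(aes(y = fit, color = week_weekend)) +   # one prediction line per type of day
  geom_vline(xintercept = as.numeric(as.Date("2020-03-14")),
             linetype = "dashed", color = "red") +  # first lockdown
  labs(x = "Date", y = "Daily steps", color = "Type of day")
```

Calling `print(p)` draws the figure. Mapping `color` to `week_weekend` produces separate weekday and weekend prediction lines, which step up or down wherever the time period or the device changes.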
Which returns the following plot:
This plot is a nice complement to the regression table above. We see the impact of all of our predictor variables quite clearly. The predicted daily step count increases after the first COVID lockdown (shown, as above, with a vertical dashed red line), the predicted daily step counts for the weekends (upper line, maroon color) are higher than the predicted step counts for the weekdays (lower line, green color), and the predicted daily step counts for the period where I had the Mi Band (from the end of August until the end of the year) are lower than the predicted step counts for the period where I had the Fitbit (January until July).
Furthermore, this plot gives some perspective on the model performance. The plot shows clearly the basic patterns mentioned above, but also shows there is a great deal of variation in my daily step counts that is not explained by the variables in the model (indeed, the adjusted R2 in the table above indicates that the regression model explains only about 13% of the variance in step counts). On average, our predictions are off by 3289 steps; the plot gives a visual representation of the scale of the differences between the actual step counts and the predictions across the entire year.
Summary and Conclusion
In this post, we used data from two different step trackers (Fitbit and Mi Band) in order to understand my walking patterns in 2020. We first looked at the daily average step counts per week across the entire year and saw indications that I walked more during the pandemic than before it. We then made a basic regression model to quantify the differences across time periods (pre-COVID vs. COVID), trackers (Fitbit vs. Mi Band) and type of day (weekday vs. weekend).
My main takeaways from this analysis are:
- The COVID-19 pandemic seems to have had an impact on my walking behavior, such that once the pandemic started, I ended up walking more. Indeed, after the first lockdown, I stopped commuting to work and did most of my activities by foot rather than by car. Furthermore, at many points during the pandemic and resulting lockdowns, physical exercise was one of the only valid reasons to leave the house. I definitely took advantage of this and ended up walking more as a result.
- My daily step counts were lower when I was wearing the Mi Band vs. the Fitbit. However, it’s not quite certain why. This article from Wirecutter (a review site run by the New York Times) shows the results of tests that suggest that the Fitbit I had (Fitbit HR) counts more steps than the Mi Band, so part of the difference could be due to this. However, I started wearing the Mi Band towards the end of the summer holiday period, which was followed by a return to work and the re-opening of the schools, and it’s possible that my schedule changed in such a way that I walked fewer steps during this time.
- I walk more during the weekends than during the weekdays. I saw this in a previous analysis of some of these data, but this analysis of the entire year of 2020 confirms this fact. As I mentioned last time, this is a reversal from my walking patterns 4 years ago. Given that I worked from home for much of 2020, it makes sense that I walked less during the weekdays. It’s hard to accumulate many steps when your desk is 50 steps from your bedroom!
Coming Up Next
In the next post, we will analyze data from my music collection, and examine song tempos across the course of an album. How do artists sequence their albums – with fast or slow songs at the beginning, middle, or end? Is this sequencing the same or different across different music genres? We’ll explore these questions and more in the next post.
Stay tuned!