Site icon R-bloggers

When Less is More: Data Tables That Make a Difference

[This article was first published on Rstats – OUseful.Info, the blog…, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the previous post, From Visual Impressions to Visual Opinions, I gave various examples of charts that express opinions. In this post, I’ll share a few examples of how we can take a simple data table and derive multiple views from it that each provide a different take on the same story (or does that mean, tells different stories from the same set of "facts"?)

Here’s the original, base table, showing the recorded split times from a single rally stage. The time is the accumulated stage time to each split point (i.e. the elapsed stage time you see for a driver as they reach each split point):

From this, we immediately note the ordering (more on this in another post) which seems not useful. It is, in fact, the road order (i.e. the order in which each driver started the stage).

We also note that the final split is not the actual final stage time: the final split in this case was a kilometer or so before the stage end. So from the table, we can’t actually determine who won the stage.

Making a Difference

The times presented are the actual split times. But one thing we may be more interested in is the differences to see how far ahead or behind one driver another driver was at a particular point. We can subtract one driver’s time from anothers to find this difference. For example, how did the times at each split compare to first on road Ogier’s (OGI)?

Note that we can “rebase” the table relative to any driver by subtracting the required driver’s row from every other row in the original table.

From this “rebased” table, which has fewer digits (less ink) in it than the original, we can perhaps more easily see who was in the lead at each split, specifically, the person with the minimum relative time. The minimum value is trivially the most negative value in a column (i.e. at each split), or, if there are no negative values, the minimum zero value.

As well a subtracting one row from every other row to find the differences realative to a specified driver, we can also subtract the first column from the second, the second from the third etc to find the time it took to get from one split point to the next (we subtract 0 from the first split point time since the elapsed time into stage at the start of the stage is 0 seconds).

The above table shows the time taken to traverse the distance from one split point to the next; the extra split_N column is based on the final stage time. Once again, we could subtract one row from all the other rows to rebase these times relative to a particular driver to see the difference in time it took each driver to traverse a split section, relative to a specified driver.

As well as rebasing relative to an actual driver, we can also rebase relative to variously defined “ultimate” drivers. For example, if we find the minimum of each of the “split traverse” table columns, we create a dummy driver whose split section times represent the ultimate quickest times taken to get from one split to the next. We can then subtract this dumny row from every row of the split section times table:

In this case, the 0 in the first split tells us who got to the first split first, but then we lose information (withiut further calculation) about anything other than relative performance on each split section traverse. Zeroes in the other columns tell us who completed that particular split section traverse in the quickest time.

Another class of ultimate time dummy driver is the accumulated ultimate section time driver. That is, take the ultimate split sections then find the cumulative sum of them. These times then represent the dummy elapsed stage times of an ultimate driver who completed each split in the fastest split section time. If we rebase against that dummy driver:

In this case, there may be only a single 0, specifically at the first split.

A third possible ultimate dummy driver is the one who “as if” recorded the minimum actual elapsed time at each split. Again, we can rebase according to that driver:

In this case, will be at least one zero in each column (for the driver who recorded that particular elapsed time at each split).

Visualising the Difference

Viewing the above tables as purely numerical tables is fine as far as it goes, but we can also add visual cues to help us spot patterns, and different stories, more readily.

For example, looking at times rebased to the ultimate split section dummy driver, we get the following:

We see that SOL was flying from the second split onwards, getting from one split to another in pretty much the fastest time after a relatively poor start.

The variation in columns may also have something interesting to say. SOL somehow made time against pretty much every between split 4 and 5, but in the other sections (apart from the short last section to finish), there is quite a lot of variability. Checking this view against a split sectioned route map might help us understand whether there were particular features of the route that might explain these differences.

How about if we visualise the accumulated ultimate split section time dummy driver?

Here, we see that TAN was recording the best time compared the ultimate time as calculated against the sum of best split section times, but was still off the ultimate pace: it was his first split that made the difference.

How about if we rebase against the dummy driver that represents the driver with the fastest actual recorded accumulated time at each split:

Here, we see that TAN led the stage at each split point based on actual accumulated time.

Remember, all these stories were available in the original data table, but sometimes it takes a bit of differencing to see them clearly…

Based on Visualising WRC Rally Timing and Results Data, Chapter 7: Working With Split Times (includes R code). Source repo: RallyDataJunkie/visualising-wrc-rally-results

To leave a comment for the author, please follow the link and comment on their blog: Rstats – OUseful.Info, the blog….

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.