Adventures in Excel: Filtering and grouping data

[This article was first published on Adam’s Software Lab, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

This started off as a simple experiment and mushroomed (a little). I have a card account, and I wanted to know how much I owed for the previous and current month. Simple enough. And it was, until I started to think about different possible ways to do this. In this post I describe five different approaches that answer the question: “How much do I owe in May, Jun, Jul, etc.?” The spreadsheet can be downloaded from here: AccountQueries.xlsm

The data

The data consists of a list of card transactions with the date and some details. It is defined as an Excel table named ‘TableTransactions’. This makes it easy to reference in formulas and functions requiring a named range/table. The columns of interest here are the ‘Date’ (specifically the month) and the (sum of) the ‘Withdrawal’ column: i.e. how much is owed for that month.

TableTransactions

To answer the question “How much do I owe?” we could adopt a manual approach and simply select the cells we want to sum (as in the screenshot). This works, but is not reusable and not great when you have more data. I was looking for approaches that would work as the data is extended. This is what I came up with.

1. An Excel Formula

Using an Excel formula is the simplest and most obvious approach. Despite this simplicity, the formula is a little involved: =SUM(CHOOSECOLS(FILTER(TableTransactions,TEXT(TableTransactions[Date],"mmm")=B3, 0),4)).

ExcelFormula

Starting with the innermost formula and working outwards: TEXT(TableTransactions[Date],"mmm") takes the date and extracts the (short) month part of the date as text. This is compared to the input month (ignoring months where we haven’t spent anything). This condition is part of the FILTER formula, which returns all the rows from TableTransactions where the condition is met. We only want the ‘Withdrawal’ column, so we restrict the filtered results using CHOOSECOLS to select column 4 as the column of interest. Finally, we use SUM to aggregate the amounts in this column. There are several variations of this approach using other Excel formulas (e.g. XLOOKUP, INDEX, and MATCH). However, this formula answers the question we wanted.

2. A Lambda Expression

This is an extension of the previous approach. We can write an expression using LAMBDA which takes a single parameter – the short month name – and passes it to the SUM(CHOOSECOLS(...)) formula and evaluates it. Furthermore, we can give this a name like AmountOwedLambda(...) using the Name Manager.

Name Manager

Now we have a reusable named function, which seems quite neat. However, it is still rather opaque when you view the formula in the Name Manager.

3. A Pivot Table

In some ways using a pivot table seems the most natural Excel-way to summarise the data. It is very simple to create. Select Insert > PivotTable > From Table/Range. For the rows, we select Months and for the Values we sum the ‘Withdrawal’ column – which we rename to ‘Amount Owed’.

PivotTable

That is all there is to it. This is really simple and extends well for other months (and other dates).

4. A Power Query

Rather than use a PivotTable to summarise the data, we can use PowerQuery. There are (at least) two variations on this. The first is simply to specify the filter/group clauses in PowerQuery. This is done in ‘Monthly Summary I’. The exact steps are recorded in M-language statements. In this case, we clean up the Date column to produce a column with a short month name, and then group by that month. In a custom column we sum by the ‘Withdrawal’ column. Once we have finished defining the steps, we get back a table in a worksheet.

PowerQuery

The results are more or less the same as with the Pivot Table. Any updates to the original transactions table are reflected in the ouput of the PowerQuery after doing a refresh.

The second approach is to prepare a filtered query. This is somewhat fiddly. First you need a table (or a named range) representing the input parameter. Once this is defined in PowerQuery, put it to one side. Then define the PowerQuery like the basic table but filtered with a hardcoded month. Finally, edit the M language query to use the input parameter in the query instead of the hardcoded month.

FilteredQuery

In the worksheet, we just enter a month and press refresh, and the results are returned. Full details about how this is set up are given in the references section of the worksheet.

5. A VBA Function

One final approach is to write a VBA function. For this we need to create a VBA project, add a module that contains the code, and save the file as a macro-enabled workbook. This may seem like overkill for what I orginally wanted. However, there are some advantages. It is flexible, allowing us to decide how to treat the input – in this case converting a short month name to a month number – and it allows us to break down the parts of the formula into a loop (for better or worse).

VBACode

It is also useful in that the function (the code) is reasonably well-documented.

Wrap up

So there it is. Five ways to achieve pretty much the same thing in Excel: Excel formulas, a custom Lambda, a pivot table, PowerQuery and finally VBA. Each has advantages and drawbacks. But it doesn’t stop there. We could just as easily use SQL, or perhaps even better, PowerPivot and a DAX query. And for a really over-the-top solution, we could consider using the excellent Excel-DNA library to write a user-defined function in C#.

To leave a comment for the author, please follow the link and comment on their blog: Adam’s Software Lab.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)