Donor analysis in R – Smith for Congress
In a previous post I introduced the Smith for Congress data set. The data is 49k contributions made by individuals to a congressional campaign for the 2006-2010 electoral cycles. Smith for Congress is not the name of the actual campaign.
Individual contributions are not required to be disclosed by a campaign unless the individual donates more than $200 during a single electoral cycle. The Smith for Congress campaign has, for their own reasons, published every individual contribution. This disclosure allows us an unprecedented look into how a modern campaign raises money. I’ve collected and scrubbed these contributions and published them for research use. In this post I will perform a detailed donor analysis with R to better understand how the Smith for Congress campaign financed its 2010 election. Full code and graphs can be found in the simple-analysis GitHub repository for this post.
Preparation
We need to download the data and load it into R. The latest data can always be downloaded from: Smith for Congress Latest
    # latest Smith for Congress data as of this writing is March 23 2011
    cd <- read.csv("smithforcongress-03232011.csv")

    # clean up the date variable before subsetting, so cd0 inherits real dates
    cd$contribution_date <- as.Date(cd$contribution_date, format = "%m/%d/%Y")

    # subset the data to just the 2010 cycle
    cd0 <- cd[cd$cycle == 2010, ]

    # drop amounts < $1; unlike -which(...), this is safe when no rows match
    cd0 <- cd0[which(cd0$amount >= 1), ]
After this cleanup, data for the 2010 electoral cycle consists of 11,717 contributions made by 6,949 individuals, totaling over $770,000. Here is a sample:
personid | amount | ctd_aggregate | contribution_date | cycle |
---|---|---|---|---|
9zvlnzw1qj9bvq7k1x47v486a | 10 | 20 | 2009-04-01 | 2010 |
iy8xcopedihv9vwqpg3iwmal | 15 | 35 | 2009-04-01 | 2010 |
1f0lct995ckygk6y4vaxk2q44 | 20 | 20 | 2009-04-01 | 2010 |
bf2d43vdjdg07pgfmph6ghy7o | 20 | 20 | 2009-04-01 | 2010 |
7sj05z74r8y10fcctvx4a38pn | 20 | 20 | 2009-04-01 | 2010 |
Data Summary
Since the number of individual donors (6,949) is so much lower than the number of contributions (11,717), we can guess that a good portion of those donors gave multiple times. The long-form contribution data is somewhat difficult to work with when looking at multiple contributions from the same person. We’ll generate a summary data frame to help with our analysis. The following variables will be captured per individual donor:
- Date of first contribution
- The total value of all contributions by this individual
- The total number of contributions by this individual
- The amounts of the first three contributions (am1 to am3). NA if they have made fewer than 3 contributions.
- The number of days between consecutive contributions, for the first three gaps (dt1 to dt3). NA where the donor has too few contributions for that gap to exist.
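    library(plyr)  # provides ddply()

    summarize.contributions <- function(x) {
      # sort this donor's contributions from oldest to newest
      xo <- x[order(x$contribution_date), ]
      # days between consecutive contributions, taken from the sorted rows
      dtx <- as.integer(diff(xo$contribution_date))
      return(data.frame(
        first.contribution = xo$contribution_date[1],
        num.contributions = nrow(xo),
        dt1 = dtx[1], dt2 = dtx[2], dt3 = dtx[3],
        am1 = xo$amount[1], am2 = xo$amount[2], am3 = xo$amount[3],
        total.value = sum(x$amount)
      ))
    }

    # one summary row per donor
    cd0s <- ddply(cd0, "personid", summarize.contributions)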
Now the cd0s data frame holds our summary table, which looks like this:
personid | first.contribution | num.contributions | dt1 | dt2 | dt3 | am1 | am2 | am3 | total.value |
---|---|---|---|---|---|---|---|---|---|
1023ryaqqbvz76kh3yq0r2ngq | 2010-10-18 | 1 | NA | NA | NA | 25 | NA | NA | 25 |
1036lg58hd4skceuyqrr2peb4 | 2010-03-25 | 2 | 166 | NA | NA | 35 | 25 | NA | 60 |
106f366ysq6xe9ci731wejh0k | 2009-12-11 | 4 | 91 | 185 | 63 | 50 | 50 | 50 | 250 |
1081wyujzkgninrt1srf79tbo | 2009-08-27 | 3 | 58 | 114 | NA | 25 | 30 | 10 | 65 |
1094yhx62fcdx3c012mlpxnex | 2009-10-15 | 1 | NA | NA | NA | 1000 | NA | NA | 1000 |
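As a quick check of the summary logic, the sketch below runs the same kind of per-donor calculation on a tiny made-up data frame using only base R. The `toy` data and the `summarize_one` helper are hypothetical and exist only for illustration; the real analysis uses `ddply` on `cd0`.

```r
# Toy data (made up): donor "a" gives three times, donor "b" once.
toy <- data.frame(
  personid = c("a", "b", "a", "a"),
  amount = c(25, 100, 10, 30),
  contribution_date = as.Date(c("2009-10-24", "2009-10-15",
                                "2009-08-27", "2010-02-15")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarize one donor's contributions.
summarize_one <- function(x) {
  xo <- x[order(x$contribution_date), ]           # oldest contribution first
  gaps <- as.integer(diff(xo$contribution_date))  # days between consecutive gifts
  data.frame(
    personid = xo$personid[1],
    first.contribution = xo$contribution_date[1],
    num.contributions = nrow(xo),
    dt1 = gaps[1], dt2 = gaps[2],                 # NA when the gap does not exist
    am1 = xo$amount[1], am2 = xo$amount[2],
    total.value = sum(xo$amount)
  )
}

# Split by donor, summarize each piece, and stack the results.
toy_summary <- do.call(rbind, lapply(split(toy, toy$personid), summarize_one))
print(toy_summary)
```

The rows for donor "a" are deliberately given out of date order, which is why the gaps have to be computed from the sorted rows rather than from the raw input.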
Giving Levels
With detailed giving levels we can infer a lot of information about a campaign, and about how the fundraisers are doing their jobs. If most of the giving was in the $15-20 range we can assume they focus on small donors and maybe online contributions. If most of the giving is in the $100-250 range then maybe the campaign throws lots of medium-sized dinners. If most of the donations are close to the legal maximum of $4,800 then the campaign is focused on major donors, and might be ignoring smaller donors altogether.
Plotting a histogram of total donation amount per individual will give us better insight into the giving levels.
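    library(ggplot2)  # provides qplot()

    # histogram of total giving per donor, in $50 bins
    qplot(total.value, data = cd0s, geom = "histogram", binwidth = 50)

    # share of individual contributions under $250
    nrow(cd0[cd0$amount < 250, ]) / nrow(cd0)

    # five-number summary (plus mean) of total giving per donor
    summary(cd0s$total.value)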
Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
---|---|---|---|---|---|
1 | 25 | 50 | 111 | 100 | 4800 |
In 2010, 75% of contributors gave $100 or less total to the campaign. The summary table shows us the median total value donated was $50, while the overall average was $111. The maximum was $4800, which is also the maximum allowed by law for 2010. We can infer that while there was certainly some major-donor solicitation, the fundraisers were focused on much smaller donors.
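The link between the quartiles and the "share of donors at or below $100" can be sketched on a small made-up vector. The `totals` values below are invented for illustration and are not taken from the campaign data; on the real data the same two calls would be applied to `cd0s$total.value`.

```r
# Made-up donor totals, for illustration only.
totals <- c(10, 25, 25, 50, 50, 75, 100, 250, 1000, 4800)

# Share of donors whose total giving is at or below $100.
# The mean of a logical vector is the fraction of TRUE values.
share_small <- mean(totals <= 100)

# Quartiles, the same statistics summary() reports.
q <- quantile(totals, probs = c(0.25, 0.5, 0.75))

print(share_small)
print(q)
```

Whenever the third quartile equals $100, at least 75% of donors gave $100 or less, which is the reading applied to the summary table above.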
Repeat donors
Now that we know more about giving levels, it would be helpful to better understand giving frequency. The amount of repeat giving may give us insight into how involved the fundraisers are getting, and maybe even how often they are asking for money.
We’ll use a histogram and a cross-tab of the total number of contributions by individuals to help us with this analysis:
    # histogram of the number of contributions per donor
    qplot(num.contributions, data = cd0s, geom = "histogram", binwidth = 1)

    # the same counts as a cross-tab
    table(cd0s$num.contributions)
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 13 | 14 | 18 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4242 | 1599 | 621 | 256 | 120 | 60 | 28 | 7 | 7 | 5 | 1 | 1 | 1 | 1 |
Our plot and table show that about three fifths (61%, 4,242) of the contributors to Smith for Congress gave only one time, leaving 2,707 people who gave more than once. Most of the people who gave more than once gave twice, but there were still several hundred people who gave 3 or 4 times each.
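These headline figures can be re-derived directly from the frequency table. The sketch below hard-codes the counts from the table above; the variable names are my own and are not part of the original analysis.

```r
# Donor counts by number of contributions, copied from the table above.
freq <- c("1" = 4242, "2" = 1599, "3" = 621, "4" = 256, "5" = 120,
          "6" = 60, "7" = 28, "8" = 7, "9" = 7, "10" = 5,
          "13" = 1, "14" = 1, "18" = 1, "20" = 1)

donors <- sum(freq)                                   # all donors
repeat_donors <- donors - freq[["1"]]                 # gave more than once
contributions <- sum(as.integer(names(freq)) * freq)  # total number of gifts
one_time_pct <- 100 * freq[["1"]] / donors            # share who gave once

cat("donors:", donors, "\n")
cat("repeat donors:", repeat_donors, "\n")
cat("contributions:", contributions, "\n")
cat("one-time share (%):", round(one_time_pct, 1), "\n")
```

The counts add back up to the 6,949 donors and 11,717 contributions quoted earlier, which is a useful consistency check on the summary table.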
To understand how important repeat giving might be we need more detailed information. We need to look at the total amount donated by each group of contributors; we’ll also include the cumulative total, cumulative percentage, and individual percentage of total for each group.
    # total raised and donor count for each giving frequency
    gft <- ddply(cd0s, "num.contributions", function(x) {
      data.frame(total = sum(x$total.value), n = nrow(x))
    })

    # share of the total, running total, and running share
    gft$percent <- gft$total / sum(gft$total) * 100
    gft$running.total <- cumsum(gft$total)
    gft$running.percent <- gft$running.total / sum(gft$total) * 100
Our gft data frame looks like this:
num.contributions | total | n | percent | running.total | running.percent |
---|---|---|---|---|---|
1 | 284043 | 4242 | 36.821 | 284043 | 37 |
2 | 212697 | 1599 | 27.572 | 496740 | 64 |
3 | 118998 | 621 | 15.426 | 615738 | 80 |
4 | 72197 | 256 | 9.359 | 687935 | 89 |
5 | 43513 | 120 | 5.641 | 731448 | 95 |
6 | 24428 | 60 | 3.167 | 755876 | 98 |
7 | 4825 | 28 | 0.625 | 760701 | 99 |
8 | 3988 | 7 | 0.517 | 764689 | 99 |
9 | 4340 | 7 | 0.563 | 769029 | 100 |
10 | 990 | 5 | 0.128 | 770019 | 100 |
13 | 167 | 1 | 0.022 | 770186 | 100 |
14 | 675 | 1 | 0.088 | 770861 | 100 |
18 | 360 | 1 | 0.047 | 771221 | 100 |
20 | 200 | 1 | 0.026 | 771421 | 100 |
We see the campaign raised about $284,000 (36.8% of the total raised) from the 4,242 contributors that gave only once, and about $213,000 (27.6% of the total raised) from the 1,599 contributors who gave two times. We also see the campaign raised $487,378 from 2,707 repeat donors; that is about 63% of the total value raised for the entire cycle from individuals. It is obvious the Smith for Congress campaign is good at attracting small-dollar donors, about 39% of whom gave more than once. This is a pretty impressive repeat donor rate.
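The repeat-donor dollar figure follows from the table by subtracting the one-time donors' total from the grand total. A minimal sketch, with the totals hard-coded from the gft table above and variable names chosen only for this illustration:

```r
# Total raised at each giving frequency, copied from the gft table above
# (frequencies 1, 2, ..., 10, 13, 14, 18, 20, in that order).
total <- c(284043, 212697, 118998, 72197, 43513, 24428, 4825,
           3988, 4340, 990, 167, 675, 360, 200)

grand_total <- sum(total)                # everything raised from individuals
repeat_total <- grand_total - total[1]   # raised from donors who gave 2+ times
repeat_share <- 100 * repeat_total / grand_total

cat("grand total:", grand_total, "\n")
cat("from repeat donors:", repeat_total, "\n")
cat("repeat share (%):", round(repeat_share, 1), "\n")
```

This gives $487,378 from repeat donors, or roughly 63% of the $771,421 raised.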
Finally I’d like to look at what kind of donations make up each level of giving. We know repeat donors gave $487,000, but we don’t know if that was mostly in $50 donations or in $250 donations. We can use a box and whisker plot to break down each giving level. I’m leaving off donors with 8 or more contributions since giving was so sparse at those levels. We’ll draw this box plot with a log transform on the y axis, since a few very large values would otherwise skew the graph and render it mostly useless. I used a trick from this Stack Overflow thread to get the formatting correct on the y axis:
    # turn log10 axis values back into dollar labels
    formatBack <- function(x) paste(round(10^x, 2), "$", sep = ' ')

    qplot(factor(num.contributions), log10(total.value),
          data = cd0s[cd0s$num.contributions < 8, ],
          geom = "boxplot",
          ylab = "Total Value (log)", xlab = "Giving Frequency",
          main = "Giving Levels by Giving Frequency, Smith for Congress 2010") +
      # ggplot2 0.9.0 and later take the label function via `labels`;
      # older versions used `formatter`
      scale_y_continuous(labels = formatBack)

    # same data, but in table format
    ddply(cd0s, "num.contributions", function(x) {
      data.frame(total = sum(x$total.value), n = nrow(x),
                 min = min(x$total.value), mean = mean(x$total.value),
                 median = median(x$total.value), std = sd(x$total.value),
                 max = max(x$total.value))
    })
num.contributions | total | n | min | mean | median | std | max |
---|---|---|---|---|---|---|---|
1 | 284043 | 4242 | 1 | 67 | 35 | 149 | 2400 |
2 | 212697 | 1599 | 2 | 133 | 70 | 280 | 4800 |
3 | 118998 | 621 | 4 | 192 | 105 | 299 | 3800 |
4 | 72197 | 256 | 20 | 282 | 144 | 443 | 3800 |
5 | 43513 | 120 | 5 | 363 | 175 | 616 | 4129 |
6 | 24428 | 60 | 30 | 407 | 168 | 749 | 4700 |
7 | 4825 | 28 | 33 | 172 | 175 | 103 | 475 |
8 | 3988 | 7 | 80 | 570 | 160 | 1094 | 3048 |
9 | 4340 | 7 | 90 | 620 | 225 | 627 | 1450 |
10 | 990 | 5 | 100 | 198 | 200 | 72 | 280 |
13 | 167 | 1 | 167 | 167 | 167 | NA | 167 |
14 | 675 | 1 | 675 | 675 | 675 | NA | 675 |
18 | 360 | 1 | 360 | 360 | 360 | NA | 360 |
20 | 200 | 1 | 200 | 200 | 200 | NA | 200 |
This latest plot and table are both incredibly text heavy, but this is the critical intelligence required to start a fundraising plan.
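One way to read the table is to divide the mean total at each giving level by the number of contributions at that level, which gives a rough implied size for a single contribution. The sketch below does this for levels 1 to 6 using the rounded means copied from the table above; it is only an approximation, since it works from rounded group means rather than the underlying contributions.

```r
# Mean total per donor for giving levels 1 to 6, copied from the table above.
num_contributions <- 1:6
mean_total <- c(67, 133, 192, 282, 363, 407)

# Implied average size of a single contribution at each level.
per_gift <- mean_total / num_contributions

print(data.frame(num_contributions, mean_total, per_gift = round(per_gift, 1)))
```

The implied per-contribution amount stays in a fairly narrow band, roughly $64 to $73, across these levels.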
We see the average total contribution increases with the giving frequency, which makes sense. The average increases in an approximately linear fashion, which suggests the individual contribution amounts are staying constant. This may be a function of some campaign fundraising tactic, like “donate $35 now for a free T-shirt.” We can also get a sense of how much success the Smith for Congress major donor program enjoys. An individual can legally donate $2,400 each for the primary and the general election, or $4,800 per cycle. We can count how many individuals have maxed out at $4,800 and measure how much impact the major donors have on the total amounts raised:
    # how many individuals gave the max for one election
    nrow(cd0s[cd0s$total.value == 2400, ])

    # how many individuals gave the max for the whole cycle
    nrow(cd0s[cd0s$total.value == 4800, ])
We see 7 individuals who gave the maximum for one election, and only 2 individuals who maxed out for the entire cycle. The maxed-out donors make up only 1.2% of total giving, which is very low compared with the average campaign. This tells us major donors aren’t the most important segment to Smith for Congress, but it could also mean that the campaign isn’t able or isn’t willing to ask the max amount from large donors.
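The 1.2% figure is simple arithmetic on numbers already reported in this post. A short sketch, with all inputs hard-coded from the text and tables above and variable names that are my own:

```r
# Inputs taken from the post: 2 maxed-out donors, a $4,800 cycle limit,
# $771,421 raised from 6,949 individual donors.
max_donors <- 2
cycle_limit <- 4800
grand_total <- 771421
donors <- 6949

# Share of all dollars that came from maxed-out donors.
max_dollar_share <- 100 * max_donors * cycle_limit / grand_total

# Share of all donors who maxed out.
max_donor_share <- 100 * max_donors / donors

cat("share of dollars (%):", round(max_dollar_share, 1), "\n")
cat("share of donors (%):", round(max_donor_share, 3), "\n")
```

So roughly 0.029% of donors account for about 1.2% of the money raised.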
Take Away
We can take away the following facts from our analysis:
- About 39% of individual donors gave more than once to Smith for Congress
- At least 75% of donors gave $100 or less to the campaign
- Repeat donors gave $487,000 total to the campaign
- Two out of 6,949 donors (0.029 percent) gave the maximum amount allowable by law, accounting for 1.2% of the total amount raised
From all this we can infer that Smith for Congress is running a very strong repeat donor program, and isn’t focused on only high-dollar donors. This information could be very useful in a number of different ways. A treasurer for Smith for Congress could use this information to design a 2012 fundraising plan and campaign budget. A candidate similar to Smith, or running in a similar district, could use this same information to plan their own campaign. Or a rival campaign could use this during opposition research and financial planning. Or researchers could use this to build better generic models of US House individual fundraising. I hope this shows that detailed campaign finance analysis is pretty simple when you’ve got access to the relevant data, which unfortunately is very uncommon.
Thanks for reading; questions or comments are always appreciated.