Using xBalance with MatchIt
In a previous post, I demonstrated how to create a propensity score match, test its balance, and analyze the outcome variable using the optmatch and RItools packages. The same strategy can be used with other matching algorithms, for example the various methods included in the MatchIt package.
I’ll use the same basic question and data from my previous article. The MatchIt package wraps optmatch to provide its “full” and “optimal” matching methods, so I will use the “full” option to maintain consistency with my previous article. The first step is loading the packages and the data:
> library(MatchIt)
> library(optmatch)
> library(RItools)
> data(nuclearplants)
The interface for MatchIt is similar to that of optmatch for propensity score matches. The main difference is that the matchit() function compresses the process into a single step, specifying the propensity formula and producing the match together, while fullmatch() takes the distance as a separate input, so a user can supply whatever distance matrix they choose. As with the previous article, I match on a subset of the covariates.
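> example.formula <- formula(pr ~ t1 + t2 + cap)
> # optmatch: fit the propensity model, build the distance, then match.
> # Note: newer optmatch releases replace mdist() with match_on().
> match.opt <- fullmatch(
    mdist(glm(example.formula,
              data = nuclearplants,
              family = binomial())))
> # MatchIt: the same formula, with model and match in a single call.
> all.mit <- matchit(example.formula,
                     data = nuclearplants,
                     method = "full")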
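To make the two-step route more concrete, here is a minimal base R sketch of what "fit a propensity model, then build a treated-by-control distance" can look like. It runs on simulated data, and every name in it (toy, z, x1, x2, dist.mat) is hypothetical. It is a conceptual illustration only: the distance optmatch actually computes from a glm may be scaled differently.

```r
# Toy data; all names are hypothetical and not taken from nuclearplants.
set.seed(1)
n <- 20
toy <- data.frame(
  z  = rep(c(1, 0), times = c(6, 14)),  # treatment indicator
  x1 = rnorm(n),
  x2 = rnorm(n)
)
rownames(toy) <- paste0("u", seq_len(n))

# Step 1: fit the propensity model with logistic regression.
ps.model <- glm(z ~ x1 + x2, data = toy, family = binomial())

# Step 2: extract the scores on the logit (linear predictor) scale.
logit.ps <- predict(ps.model, type = "link")

# Step 3: treated-by-control matrix of absolute score differences.
# (Assumption: a plain absolute difference, with no rescaling.)
treated <- toy$z == 1
dist.mat <- abs(outer(logit.ps[treated], logit.ps[!treated], "-"))

dim(dist.mat)  # rows are treated units, columns are controls
```

A matrix of this shape is the kind of input a matching routine works from; matchit() simply hides these steps behind the formula.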
The all.mit object contains (among other items) a vector indicating each unit’s matched set. For compatibility, save it as a factor:
> match.mit <- as.factor(all.mit$subclass)
Unsurprisingly, since MatchIt uses optmatch, the two matches are identical: the set labels differ, but each set contains the same plants.
> lapply(split(nuclearplants, match.opt), rownames)
$m.1
[1] "N" "Z" "a"
$m.10
[1] "I" "G"
$m.2
[1] "A" "B" "D" "V" "F" "b"
$m.5
[1] "U" "c"
$m.6
[1] "H" "K" "L" "M" "C" "P" "R" "Y" "e" "f"
$m.8
[1] "J" "O" "Q" "S" "T" "E" "W" "X" "d"
> lapply(split(nuclearplants, match.mit), rownames)
$`1`
[1] "N" "Z" "a"
$`2`
[1] "I" "G"
$`3`
[1] "A" "B" "D" "V" "F" "b"
$`4`
[1] "U" "c"
$`5`
[1] "H" "K" "L" "M" "C" "P" "R" "Y" "e" "f"
$`6`
[1] "J" "O" "Q" "S" "T" "E" "W" "X" "d"
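Comparing the two listings by eye works for 32 plants, but a programmatic check scales better. The following is a small sketch of a helper, same_partition(), which is a hypothetical name and not part of any of the packages used here. It tests whether two label vectors define the same grouping regardless of how the groups are named. The label vectors in the demonstration are made up.

```r
# Hypothetical helper: TRUE when two labelings induce the same partition.
same_partition <- function(a, b) {
  # Cross-tabulate as character so unused factor levels are ignored.
  # Note: table() drops NA by default, so units unmatched in either
  # labeling are left out of the comparison.
  tab <- table(as.character(a), as.character(b))
  # The partitions agree when every row and every column of the
  # cross-tabulation has exactly one non-empty cell.
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Made-up labels: same grouping, different names.
opt.labels <- factor(c("m.1", "m.1", "m.10", "m.10", "m.2"))
mit.labels <- factor(c("1", "1", "2", "2", "3"))
same_partition(opt.labels, mit.labels)

# Made-up labels: the second unit has moved to another set.
mit.moved <- factor(c("1", "2", "2", "2", "3"))
same_partition(opt.labels, mit.moved)
```

With the objects from this article in the workspace, the equivalent call would be same_partition(match.opt, match.mit).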
Now that I have a factor listing the groups, I can run xBalance to assess the balance properties of the match:
> xBalance(pr ~ . - (cost + pr),
data = nuclearplants,
strata = match.mit,
report = "chisquare.test")
---Overall Test---
chisquare df p.value
strat 5.1 9 0.82
---
Signif. codes: 0 ‘***’ 0.001 ‘** ’ 0.01 ‘* ’ 0.05 ‘. ’ 0.1 ‘ ’ 1
With a reported p-value of 0.82, there is little evidence against the null of balance, so we would fail to reject it.
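As a quick sanity check on the arithmetic, the reported p-value can be approximately recovered from the printed statistic and degrees of freedom. This sketch assumes the overall test refers the statistic to a chi-square distribution with the printed df, and it uses the rounded values from the output above, so the result should land close to the printed 0.82 rather than reproduce it exactly.

```r
# Values copied from the printed xBalance output (already rounded).
chisq.stat <- 5.1
chisq.df   <- 9

# Upper-tail probability under the chi-square reference distribution.
p.approx <- pchisq(chisq.stat, df = chisq.df, lower.tail = FALSE)
p.approx
```

A statistic well below its degrees of freedom, as here, is what one would expect to see when the matched sets are well balanced on the covariates tested.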
This walkthrough used the “full” method for matchit(), but the same techniques will work with other matchit() methods, such as coarsened exact matching or nearest neighbor. If you are reasonably confident that you wish to use optimal matching, you should consider using the optmatch package directly, instead of using it through MatchIt. In future posts I will demonstrate important techniques to speed up the matching process (which can be a great benefit with large datasets) and show how to create matches that incorporate more subject matter information than can be included in a simple logit model.