
analyze the surveillance epidemiology and end results (seer) with r and monetdb

[This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers.]
the surveillance epidemiology and end results program is the aggregation of all cancer registry statistics in the united states.  created by congressional decree, seer has captured a nationally-representative quarter of american cancer incidence since 1973.  when acs, cdc, nci, and naaccr publish their collaborative annual report, they use seer.  when the aacr predicts that america will have 18 million cancer survivors by 2022, they use seer too.  you can use seer three.

the national cancer institute blessedly provides a bouquet of free statistical software to import and analyze this microdata.  obviously, my code won’t compete with the legions of epidemiological software programmers at the largest of the nih institutes.  but plenty of other r users have written packages to work with this stuff, so maybe, just maybe, someone will find value in my automated importation syntax.  plus, the seer microdata include a sas import script – which triggers my fight or fight harder reflex.  list of things i hate, descending sort order: mosquitoes, cancer, then sas a very distant third.  but still.
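for a rough idea of what replacing a sas import script looks like, here is a minimal sketch in base r.  it assumes the microdata arrive as fixed-width text, and everything in it is made up for illustration: the two toy records, the column widths, and the column names are hypothetical, not the real seer layout, and this is not the code from the repository.

```r
# two invented fixed-width records (NOT real seer data, NOT the real layout)
toy_lines <- c(
  "0000000119732C509",
  "0000000220101C349"
)

# hypothetical layout: 8-char id, 4-char diagnosis year, 1-char sex, 4-char site code
toy_widths <- c(8, 4, 1, 4)
toy_names  <- c("patient_id", "year_dx", "sex", "site")

# read every column as character so leading zeros in the id survive
toy_con <- textConnection(toy_lines)
toy_records <- read.fwf(
  toy_con,
  widths     = toy_widths,
  col.names  = toy_names,
  colClasses = "character"
)
close(toy_con)

# convert only the columns that are genuinely numeric
toy_records$year_dx <- as.integer(toy_records$year_dx)

print(toy_records)
```

reading as character first and converting afterward is a deliberate choice here: identifier and code columns in registry files often carry leading zeros that a numeric read would silently drop.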

aside from easing the importation of this data into the r language, i suppose i have contributed one tangible improvement to the seer-analyst community: these download and import scripts will put all eight million records into wickedly-fast monetdb.  so long as you can perform your analysis using sql, you can perform your analysis (on all eight million records) in basically one second.  haa-cha!  i’ve said it before, i’ll say it again: the import takes forrrrrever (leave it overnight). but once it’s loaded, it’ll outrun lightning.  this new github repository contains four scripts:


download.R

import all tables into rda.R

import individual-level tables into monetdb.R

replicate case counts table.R



click here to view these four scripts
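to give a flavor of the "so long as you can perform your analysis using sql" idea, here is a minimal sketch of a case-count query sent through the DBI package.  a few loud assumptions: the helper `count_by_year`, the table name `seer_toy`, and the column name `year_dx` are all hypothetical, the six rows are invented, and an in-memory sqlite database (via the RSQLite package) stands in for monetdb so the example runs without a server.  it is not the replication script from the repository.

```r
library(DBI)

# hypothetical helper: count records per diagnosis year with plain sql.
# `con` is any DBI connection; `tablename` and `yearcol` are illustrative names.
count_by_year <- function(con, tablename, yearcol) {
  sql <- paste0(
    "SELECT ", yearcol, ", COUNT(*) AS n_cases ",
    "FROM ", tablename, " ",
    "GROUP BY ", yearcol, " ",
    "ORDER BY ", yearcol
  )
  dbGetQuery(con, sql)
}

# stand-in database: in-memory sqlite instead of a monetdb server
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# invented records, NOT real seer data
toy <- data.frame(
  year_dx = c(1973L, 1973L, 1974L, 2010L, 2010L, 2010L),
  site    = c("C509", "C349", "C509", "C349", "C349", "C509"),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "seer_toy", toy)

# one row per diagnosis year, with the number of records in that year
case_counts <- count_by_year(con, "seer_toy", "year_dx")
print(case_counts)

dbDisconnect(con)
```

because the helper only speaks DBI and standard sql, the same function should work against any DBI-compatible connection, with the database engine doing the grouping instead of r holding every record in memory.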



for more detail about surveillance epidemiology and end results microdata, visit:


notes:

seer is publicly available: just sign and e-mail in this form, then wait two business days for them to send the login and password you’ll need for the box that pops up when you click this download link.


confidential to sas, spss, stata, and sudaan users: it’s black tie dinner night at the governor’s mansion and you’re still wearing a t-shirt.  ready to change into your tuxedo?  time to transition to r.  😀
