
Infant Natality & Mortality Rates

[This article was first published on R – NYC Data Science Academy Blog, and kindly contributed to R-bloggers.]

Reach out to us via LinkedIn 

View the GitHub repo for this project here


Introduction

The Centers for Disease Control and Prevention (CDC) annually publishes data on the births and deaths of infants born in the United States. It gathers a large amount of data about each child, including the parents’ education, age, health status, and tobacco use, as well as the newborn’s health status, APGAR score, and delivery method. With this much data available, it can be hard to keep track of it all. As a team of data scientists, we dove into this data to look for relationships between maternal health (i.e., the mother’s BMI and smoking habits) and subsequent infant natality and mortality rates.

 

Goal

The goal of this project was two-fold:

  1. To analyze and interpret the data collected from the CDC website on infant natality and mortality rates.
  2. To develop a user-friendly application that educates expecting mothers about the potential impact of their lifestyle choices on the health of their future child.

Please follow along with our custom Shiny App as you read the rest of this blog post.

 

Data Acquisition

All of the data used in this analysis was collected from the CDC website.

Since the original dataset from the CDC was quite large (~3.8 million observations and 90+ variables per year of data), we built a custom parser to extract only the data relevant to our scope of work. Exploratory data analysis (EDA) was conducted in both Python and R.
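As a rough illustration of what such a parser might look like, the sketch below pulls a few fields out of fixed-width text records. It assumes the raw file stores one birth per line at fixed column positions. The field names and column offsets are hypothetical placeholders rather than the actual CDC layout, and this is not our team's actual parser.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical layout: field name -> (start, end) column offsets,
# 0-based with an exclusive end. Real offsets would come from the
# record layout published alongside the data file.
FIELDS: Dict[str, Tuple[int, int]] = {
    "birth_year": (0, 4),
    "mother_bmi": (4, 8),
    "smoker": (8, 9),
}


def parse_records(
    lines: Iterable[str], fields: Dict[str, Tuple[int, int]] = FIELDS
) -> Iterator[Dict[str, str]]:
    """Yield one dict per record, keeping only the columns named in `fields`."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        yield {
            name: line[start:end].strip()
            for name, (start, end) in fields.items()
        }


# Made-up sample lines; everything past column 9 is ignored.
sample = ["201624.5Y_other_columns_ignored", "201631.2N_other_columns_ignored"]
rows = list(parse_records(sample))
print(rows[0])
```

Because the function is a generator, it can stream a multi-million-line file one record at a time instead of loading the whole file into memory.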

Our team focused on maternal factors including, but not limited to, the mother’s BMI and smoking habits.

Once the data was cleaned of missing values and irrelevant variables, our team created several visualizations based on our analysis of these key factors. We then built an interactive web application with R Shiny where expecting mothers or couples wishing to start a family can explore national statistics as well as customized statistics based on their own demographics or current health conditions.
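The cleaning and summarizing steps could be sketched roughly as follows. The records, the field names (`smoker`, `infant_died`), and the missing-value markers are all made up for illustration and are not taken from the CDC data or from our results. The rate uses the standard definition of infant deaths per 1,000 live births.

```python
from collections import defaultdict

# Hypothetical records: "U" marks an unknown smoking status and
# None marks a missing outcome. None of this is real CDC data.
records = [
    {"smoker": "Y", "infant_died": True},
    {"smoker": "Y", "infant_died": False},
    {"smoker": "N", "infant_died": False},
    {"smoker": "N", "infant_died": False},
    {"smoker": "U", "infant_died": False},
    {"smoker": "N", "infant_died": None},
]


def drop_missing(rows, missing=(None, "", "U")):
    """Keep only rows in which no field holds a missing-value marker."""
    return [r for r in rows if all(v not in missing for v in r.values())]


def deaths_per_1000(rows, group_key):
    """Return infant deaths per 1,000 live births for each group."""
    births = defaultdict(int)
    deaths = defaultdict(int)
    for r in rows:
        births[r[group_key]] += 1
        deaths[r[group_key]] += int(r["infant_died"])
    return {group: 1000 * deaths[group] / births[group] for group in births}


clean = drop_missing(records)
rates = deaths_per_1000(clean, "smoker")
print(len(clean), rates)
```

Dropping incomplete rows is the simplest possible policy. With real data it is worth checking how many rows each marker removes, since discarding them can bias the group rates.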

 

Exploratory Analysis

The data presented in the app includes, but is not limited to, the following:

 

 

 

 

 

Further Development

So far, the app brings this data together to give mothers the information they need for their child. Further development would include adding information that hospitals and doctors could use to understand what an infant needs and what health recommendations to give to the mother, the father, and others.

Given more time and resources, our team’s next steps would be to examine maternal factors beyond weight and tobacco use, and possibly to make predictions with a machine learning model. These tools could be used to find further insights.

Thank you for taking the time to read our blog post! Please don’t hesitate to reach out to us with any questions, comments, or concerns regarding this project.
