Unveiling the Power of get_cms_meta_data() in healthyR.data

Steven P. Sanderson II, MPH

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers].

Introduction

Hey, R users! 🌟 Today, we’re going to look at a great new addition to the healthyR.data package: the get_cms_meta_data() function! This function retrieves metadata about CMS (Centers for Medicare & Medicaid Services) datasets so you can search and analyze it in R. Whether you’re a healthcare analyst, data scientist, or R programming fan, you’ll find this function very useful. Let’s break it down and explore how it works.

Overview of `get_cms_meta_data()`

The get_cms_meta_data() function lets you retrieve metadata from CMS datasets easily. You can customize your search using various parameters, ensuring you get precisely the data you need. Here’s the syntax:

get_cms_meta_data(
  .title = NULL,
  .modified_date = NULL,
  .keyword = NULL,
  .identifier = NULL,
  .data_version = "current",
  .media_type = "all"
)

Arguments:

.title: Search by title.
.modified_date: Search by modified date (format: “YYYY-MM-DD”).
.keyword: Search by keyword.
.identifier: Search by identifier.
.data_version: Choose between “current”, “archive”, or “all”. Default is “current”.
.media_type: Filter by media type (“all”, “csv”, “API”, “other”). Default is “all”.
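
Since .modified_date expects a “YYYY-MM-DD” string, it can help to check a date before passing it in. The helper below is only a minimal sketch: `is_iso_date()` is a hypothetical name used for illustration and is not part of healthyR.data.

```r
# Hypothetical helper (not part of healthyR.data): TRUE only for a single
# string that is a real calendar date written as YYYY-MM-DD.
is_iso_date <- function(x) {
  is.character(x) &&
    length(x) == 1L &&
    grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) &&
    !is.na(as.Date(x, format = "%Y-%m-%d"))
}

is_iso_date("2024-01-29")  # well-formed date
is_iso_date("01/29/2024")  # wrong format
is_iso_date("2024-02-30")  # right shape, but not a real calendar date

# With a valid string in hand, a call could look like this (needs network):
# get_cms_meta_data(.modified_date = "2024-01-29", .media_type = "csv")
```

The commented call at the end uses only the arguments documented above.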

Return Value:

A tibble containing data links and relevant metadata about the datasets.

Details:

The function fetches JSON data from the CMS data URL and extracts relevant fields to create a tidy tibble. It selects specific columns, handles nested lists by unnesting them, cleans column names, and processes dates and media types to make the data more useful for analysis. The columns in the returned tibble include:

title
description
landing_page
modified
keyword
described_by
fn
has_email
identifier
start
end
references
distribution_description
distribution_title
distribution_modified
distribution_start
distribution_end
media_type
data_link
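
To make the steps described in the details concrete, here is a rough sketch of the same idea on a tiny inline JSON string. It is not the package’s actual implementation: the use of jsonlite, the JSON field names, the example.org links, and the `to_snake()` helper are all assumptions made for illustration.

```r
library(jsonlite)

# Illustrative stand-in for a catalog JSON; field names and values are made up
json_txt <- '{
  "dataset": [
    {
      "title": "Example Dataset A",
      "modified": "2024-01-29",
      "distribution": [
        {"title": "Example Dataset A", "mediaType": "API",
         "accessURL": "https://example.org/api/a"},
        {"title": "Example Dataset A", "mediaType": "csv",
         "accessURL": "https://example.org/files/a.csv"}
      ]
    },
    {
      "title": "Example Dataset B",
      "modified": "2024-04-23",
      "distribution": [
        {"title": "Example Dataset B", "mediaType": "API",
         "accessURL": "https://example.org/api/b"}
      ]
    }
  ]
}'

# Keep the nested list structure so each distribution can be walked explicitly
catalog <- fromJSON(json_txt, simplifyVector = FALSE)

# One output row per (dataset, distribution) pair: the "unnesting" step
rows <- lapply(catalog$dataset, function(ds) {
  do.call(rbind, lapply(ds$distribution, function(d) {
    data.frame(
      title = ds$title,
      modified = ds$modified,
      distributionTitle = d$title,
      mediaType = d$mediaType,
      dataLink = d$accessURL,
      stringsAsFactors = FALSE
    )
  }))
})
meta <- do.call(rbind, rows)

# Clean column names: camelCase -> snake_case (hypothetical helper)
to_snake <- function(x) {
  tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", x, perl = TRUE))
}
names(meta) <- to_snake(names(meta))

# Parse the date strings into Date objects
meta$modified <- as.Date(meta$modified)

str(meta)
```

Dataset A has two distributions, so it contributes two rows. This is why the real tibble can hold more rows than there are datasets.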

Practical Examples

Let’s see the get_cms_meta_data() function in action with a couple of examples.

Example 1: Basic Usage

First, we’ll load the necessary libraries and fetch some metadata:

# Library Loads
library(healthyR.data)
library(dplyr)

# Get data
cms_data <- get_cms_meta_data()
glimpse(cms_data)

Rows: 107
Columns: 19
$ title                    <chr> "Accountable Care Organization Participants",…
$ description              <chr> "The Accountable Care Organization Participan…
$ landing_page             <chr> "https://data.cms.gov/medicare-shared-savings…
$ modified                 <date> 2024-01-29, 2024-04-23, 2024-01-12, 2024-01-…
$ keyword                  <list> <"Medicare", "Value-Based Care", "Coordinate…
$ described_by             <chr> "https://data.cms.gov/resources/accountable-c…
$ fn                       <chr> "Shared Savings Program - CM", "Shared Saving…
$ has_email                <chr> "SharedSavingsProgram@cms.hhs.gov", "SharedSa…
$ identifier               <chr> "https://data.cms.gov/data-api/v1/dataset/976…
$ start                    <date> 2014-01-01, 2017-01-01, 2021-01-01, 2021-01-…
$ end                      <date> 2024-12-31, 2024-12-31, 2021-12-31, 2021-12-…
$ references               <chr> "https://data.cms.gov/resources/acos-aco-part…
$ distribution_description <chr> "latest", "latest", "latest", "latest", "late…
$ distribution_title       <chr> "Accountable Care Organization Participants",…
$ distribution_modified    <date> 2024-01-29, 2024-04-23, 2024-01-12, 2024-01-…
$ distribution_start       <date> 2024-01-01, 2024-01-01, 2021-01-01, 2021-01-…
$ distribution_end         <date> 2024-12-31, 2024-12-31, 2021-12-31, 2021-12-…
$ media_type               <chr> "API", "API", "API", "API", "API", "API", "AP…
$ data_link                <chr> "https://data.cms.gov/data-api/v1/dataset/976…

# Attributes
atb <- attributes(cms_data)
atb$names

 [1] "title"                    "description"             
 [3] "landing_page"             "modified"                
 [5] "keyword"                  "described_by"            
 [7] "fn"                       "has_email"               
 [9] "identifier"               "start"                   
[11] "end"                      "references"              
[13] "distribution_description" "distribution_title"      
[15] "distribution_modified"    "distribution_start"      
[17] "distribution_end"         "media_type"              
[19] "data_link"

atb$class

[1] "cms_meta_data" "tbl_df"        "tbl"           "data.frame"

atb$url

[1] "https://data.cms.gov/data.json"

atb$date_retrieved

[1] "2024-05-28 10:20:18 EDT"

atb$parameters

$.data_version
[1] "current"

$.media_type
[1] "all"

$.title
NULL

$.modified_date
NULL

$.keyword
NULL

$.identifier
NULL

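In this example, we call get_cms_meta_data() with no arguments, so it uses the defaults (.data_version = "current" and .media_type = "all") and returns 107 rows and 19 columns of metadata. The glimpse() function from the dplyr package gives a quick overview of the structure, and attributes() shows that the tibble also records its source URL, the retrieval time, and the parameters used for the call.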
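
Because keyword is a list column, each row holds a whole vector of keywords. One way to explore it is to flatten the column and count how often each keyword occurs. The sketch below uses base R on a small made-up data frame standing in for cms_data; the titles and keyword groupings are placeholders.

```r
# Stand-in for a result of get_cms_meta_data(); titles and keywords are made up
meta <- data.frame(
  title = c("Dataset A", "Dataset B", "Dataset C"),
  stringsAsFactors = FALSE
)
meta$keyword <- list(
  c("Medicare", "Value-Based Care"),
  c("Medicare", "National"),
  c("Medicaid")
)

# Flatten the list column and count how often each keyword appears
keyword_counts <- sort(table(unlist(meta$keyword)), decreasing = TRUE)
keyword_counts

# Number of keywords attached to each dataset
meta$n_keywords <- lengths(meta$keyword)
meta[, c("title", "n_keywords")]
```

The same two steps should apply to the real tibble by swapping meta for cms_data.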

Example 2: Custom Search by Keyword and Title

Now, let’s refine our search by specifying a keyword and title:

get_cms_meta_data(
  .keyword = "nation",
  .title = "Market Saturation & Utilization State-County"
) |>
  glimpse()

Rows: 1
Columns: 19
$ title                    <chr> "Market Saturation & Utilization State-County"
$ description              <chr> "The Market Saturation and Utilization State-…
$ landing_page             <chr> "https://data.cms.gov/summary-statistics-on-u…
$ modified                 <date> 2024-04-02
$ keyword                  <list> <"National", "States & Territories", "Countie…
$ described_by             <chr> "https://data.cms.gov/resources/market-satur…
$ fn                       <chr> "Market Saturation - CPI"
$ has_email                <chr> "MarketSaturation@cms.hhs.gov"
$ identifier               <chr> "https://data.cms.gov/data-api/v1/dataset/89…
$ start                    <date> 2023-10-01
$ end                      <date> 2023-12-31
$ references               <chr> "https://data.cms.gov/resources/market-satura…
$ distribution_description <chr> "latest"
$ distribution_title       <chr> "Market Saturation & Utilization StateCounty"
$ distribution_modified    <date> 2024-04-02
$ distribution_start       <date> 2023-10-01
$ distribution_end         <date> 2023-12-31
$ media_type               <chr> "API"
$ data_link                <chr> "https://data.cms.gov/data-api/v1/dataset/890…

In this example, we filter the metadata by the keyword “nation” and the title “Market Saturation & Utilization State-County”. The pipe operator (|>) is used to pass the result directly into the glimpse() function for a quick preview.
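
In the output above, the search term “nation” matched a dataset whose keywords include “National”, which suggests the matching is partial and case-insensitive. The sketch below shows one way such a filter could work in base R. It is an assumption about the behavior, not the package’s actual code, and `has_keyword()` is a hypothetical helper.

```r
# Hypothetical helper: case-insensitive substring test against every keyword
# attached to one dataset
has_keyword <- function(keywords, pattern) {
  any(grepl(tolower(pattern), tolower(keywords), fixed = TRUE))
}

# Small stand-in using titles and keywords that appear in the outputs above
meta <- data.frame(
  title = c(
    "Market Saturation & Utilization State-County",
    "Accountable Care Organization Participants"
  ),
  stringsAsFactors = FALSE
)
meta$keyword <- list(
  c("National", "States & Territories"),
  c("Medicare", "Value-Based Care")
)

# A row is kept only if both the keyword and the title conditions hold
keep <- vapply(meta$keyword, has_keyword, logical(1), pattern = "nation") &
  grepl(tolower("Market Saturation"), tolower(meta$title), fixed = TRUE)

meta$title[keep]
```

Using fixed = TRUE treats the search term as literal text, so characters such as “&” in a title are not interpreted as part of a regular expression.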

Breaking Down the Code

Let’s break down the code blocks to understand what they’re doing:

Basic Usage

Load Libraries:
```
library(healthyR.data)
library(dplyr)
```
We load the healthyR.data package to access the get_cms_meta_data() function and the dplyr package for glimpse().

Fetch Metadata:
```
cms_data <- get_cms_meta_data()
```
We call get_cms_meta_data() without any arguments, so the defaults apply: the current data version and all media types.

Preview Data:
```
glimpse(cms_data)
```
The glimpse() function gives us a quick look at the structure and contents of the fetched metadata.

Custom Search

Custom Search Call:
```
get_cms_meta_data(
  .keyword = "nation",
  .title = "Market Saturation & Utilization State-County"
) |>
  glimpse()
```
Here, we call get_cms_meta_data() with specific parameters for keyword and title to narrow down our search. The result is passed to glimpse() using the pipe operator for an immediate preview.
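
Once a search is narrowed down, a natural next step is to pull the data_link values for the media type you want. The sketch below does this in base R on a made-up data frame. The example.org links are placeholders rather than real CMS URLs, and reading a file with read.csv() is only one possible follow-up.

```r
# Stand-in for a filtered result; the links are placeholders, not real CMS URLs
meta <- data.frame(
  title = c("Dataset A", "Dataset A", "Dataset B"),
  media_type = c("API", "csv", "API"),
  data_link = c(
    "https://example.org/api/a",
    "https://example.org/files/a.csv",
    "https://example.org/api/b"
  ),
  stringsAsFactors = FALSE
)

# Keep only the CSV distributions and pull out their links
csv_links <- meta$data_link[meta$media_type == "csv"]
csv_links

# With a real link, the file could then be read directly (needs network):
# dataset_a <- read.csv(csv_links[1])
```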

Conclusion

The get_cms_meta_data() function is a flexible tool for finding CMS datasets and their data links from within R. Whether you’re looking for a specific dataset or just exploring the available metadata, this function has got you covered.

Try out get_cms_meta_data() in your next R project and explore the potential of CMS data with ease! Happy coding! 🚀
