Taking a Subset of a Data Frame in R
I just wrote a new chapter for my students describing how to subset a data frame in R. The full text is available at https://docs.google.com/document/d/1K5U11-IKRkxNmitu_lS71Z6uLTQW_fp6QNbOMMwA5J8/edit?usp=sharing but here’s a preview:
Let’s load in ChickWeight, one of R’s built-in datasets. This contains the weights of little chickens measured at up to 12 different times throughout their lives. The chickens are on different diets, numbered 1, 2, 3, and 4. Using the str command, we find that there are 578 observations of 4 variables in this data frame, and that two of those variables are categorical: Chick and Diet.
> data(ChickWeight)
> head(ChickWeight)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     76    8     1    1
6     93   10     1    1
> str(ChickWeight)
Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame': 578 obs. of 4 variables:
 $ weight: num 42 51 59 64 76 93 106 125 149 171 ...
 $ Time  : num 0 2 4 6 8 10 12 14 16 18 ...
 $ Chick : Ord.factor w/ 50 levels "18"<"16"<"15"<..: 15 15 15 15 15 15 15 15 15 15 ...
 $ Diet  : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "formula")=Class 'formula' length 3 weight ~ Time | Chick
  .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
 - attr(*, "outer")=Class 'formula' length 2 ~Diet
  .. ..- attr(*, ".Environment")=<environment: R_EmptyEnv>
 - attr(*, "labels")=List of 2
  ..$ x: chr "Time"
  ..$ y: chr "Body weight"
 - attr(*, "units")=List of 2
  ..$ x: chr "(days)"
  ..$ y: chr "(gm)"
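If you would rather confirm those counts directly than read them off the str output, a quick check along these lines should work. This is only a sketch; the variable names n_obs, n_times, diets, and factor_cols are made up for illustration and are not part of the dataset.

```r
# ChickWeight ships with R's built-in datasets package, so nothing to install.
data(ChickWeight)

n_obs   <- nrow(ChickWeight)                 # number of observations (rows)
n_times <- length(unique(ChickWeight$Time))  # distinct measurement times
diets   <- levels(ChickWeight$Diet)          # diet labels, stored as strings

# Which columns are categorical (factors)?
factor_cols <- names(ChickWeight)[sapply(ChickWeight, is.factor)]

print(n_obs)
print(n_times)
print(diets)
print(factor_cols)
```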
Get One Column: Now that we have a data frame named ChickWeight loaded into R, we can take subsets of these 578 observations. First, let’s assume we just want to pull out the column of weights. There are two ways we can do this: specifying the column by name, or specifying the column by its order of appearance. The general form for pulling information from data frames is data.frame[rows,columns] so you can get the first column in either of these two ways:
ChickWeight[,1]            # get all rows, but only the first column
ChickWeight[,c("weight")]  # get all rows, and only the column named "weight"
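One detail worth knowing about the single-column case: in a fresh R session (assuming no extra packages such as nlme are attached), pulling out one column gives you back a plain vector rather than a data frame. The sketch below compares the two forms above, adds the `$` shorthand, and shows the `drop = FALSE` argument for keeping a one-column data frame. The variable names are illustrative only.

```r
data(ChickWeight)

by_position <- ChickWeight[, 1]         # column chosen by position
by_name     <- ChickWeight[, "weight"]  # column chosen by name
by_dollar   <- ChickWeight$weight       # `$` shorthand for a single column

# Compare the three results; each should be the same numeric vector.
print(identical(by_position, by_name))
print(identical(by_name, by_dollar))

# A single column is simplified to a vector by default...
print(is.data.frame(by_name))

# ...unless you ask R not to drop the data frame structure.
one_col_df <- ChickWeight[, "weight", drop = FALSE]
print(is.data.frame(one_col_df))
print(dim(one_col_df))
```

Whether you want the vector or the one-column data frame depends on what you plan to pass it to next; functions like mean() expect the vector form.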
Get Multiple Columns: If you want more than one column, you can specify the column numbers or the names of the variables that you want to extract. If you want to get the weight and diet columns, you would do this:
ChickWeight[,c(1,4)]              # get all rows, but only 1st and 4th columns
ChickWeight[,c("weight","Diet")]  # get all rows, only "weight" & "Diet" columns
If you want more than one column and those columns are next to each other, you can do this:
ChickWeight[,c(1:3)]  # get all rows, but only columns 1 through 3
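To convince yourself that positions and names pick out the same columns, you can store each result and compare them. This is a small sketch with made-up variable names; note that with more than one column the result stays a data frame.

```r
data(ChickWeight)

by_position <- ChickWeight[, c(1, 4)]               # 1st and 4th columns
by_name     <- ChickWeight[, c("weight", "Diet")]   # same columns, by name

# Both selections should carry the same column names, in the same order.
print(names(by_position))
print(names(by_name))

# The c() wrapper around 1:3 is optional; 1:3 is already a vector.
first_three <- ChickWeight[, 1:3]
print(names(first_three))
print(is.data.frame(first_three))
```

Selecting by name is usually the safer habit, since it keeps working if the column order ever changes.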
Get One Row: You can get the first row the same way you got the first column, except that the index goes before the comma instead of after it. Any other row works the same way:
ChickWeight[1,]   # get first row, and all columns
ChickWeight[82,]  # get 82nd row, and all columns
Get Multiple Rows: If you want more than one row, you can specify the row numbers you want like this:
> ChickWeight[c(1:6,15,18,27),]
   weight Time Chick Diet
1      42    0     1    1
2      51    2     1    1
3      59    4     1    1
4      64    6     1    1
5      76    8     1    1
6      93   10     1    1
15     58    4     2    1
18    103   10     2    1
27     55    4     3    1
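Row subsets behave a little differently from the single-column case: even one row comes back as a data frame, because its columns have different types. A short sketch of both row examples follows; the names one_row and picked are just for illustration.

```r
data(ChickWeight)

# A single row keeps all four columns and stays a data frame.
one_row <- ChickWeight[1, ]
print(is.data.frame(one_row))
print(dim(one_row))

# Several rows at once: 1 through 6, plus rows 15, 18, and 27.
picked <- ChickWeight[c(1:6, 15, 18, 27), ]
print(nrow(picked))

# The original row labels travel with the subset, which is why the
# printed output above shows 15, 18, and 27 rather than 7, 8, and 9.
print(rownames(picked))
```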