Sample and Population Variance in R

finnstats

3 months ago

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers].

The variance is a measure of how dispersed data values are around the mean.

In probability theory and statistics, variance is the expected value of the squared deviation of a random variable from its mean. Informally, it indicates how far a set of values is spread out from their mean.

The formula for a population’s variance is

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where $\mu$ is the population mean, $x_i$ is the ith population element, $N$ is the population size, and $\Sigma$ denotes summation.

To determine a sample’s variance, use the following formula:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the sample mean, $x_i$ is the sample’s ith element, and $n$ is the sample size.
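As a minimal sketch, both formulas can be evaluated by hand in R. The vector x below is made-up toy data chosen for easy arithmetic, and the variable names are only illustrative.

```r
# Toy data (hypothetical, not from the article); its mean is exactly 5
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sum of squared deviations from the mean, shared by both formulas
ss <- sum((x - mean(x))^2)

pop_variance    <- ss / n        # population formula: divide by N
sample_variance <- ss / (n - 1)  # sample formula: divide by n - 1

pop_variance     # 4
sample_variance  # 4.571429
```

The only difference between the two results is the divisor, which is why the sample value is slightly larger.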

Calculate Sample & Population Variance in R

Assume we have the following values stored in an R vector named data1:

data1 <- c(12, 84, 5, 17, 18, 11, 13, 19, 69, 92, 15, 10, 55)

The var() function in R calculates the sample variance:

var(data1)
957.8974

The population variance can be calculated by multiplying the sample variance by (n-1)/n.

First, find the number of observations in data1:

n <- length(data1)
n
13

Now we can find the population variance:

var(data1) * (n-1)/n
884.213
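If this calculation is needed often, it could be wrapped in a small helper. The sketch below assumes a function named pop_var(), which is a hypothetical name and not part of base R. It applies the population formula directly and then checks the result against the rescaled var() value.

```r
# pop_var() is a hypothetical helper, not a base R function
pop_var <- function(x) {
  # Mean of squared deviations, i.e. the sum divided by N
  mean((x - mean(x))^2)
}

data1 <- c(12, 84, 5, 17, 18, 11, 13, 19, 69, 92, 15, 10, 55)
n <- length(data1)

pop_var(data1)  # 884.213

# The direct formula agrees with rescaling the sample variance
all.equal(pop_var(data1), var(data1) * (n - 1) / n)  # TRUE
```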

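It’s important to remember that, for the same data, the population formula never gives a larger value than the sample formula, because it divides by n rather than n-1. The two are equal only when every value is identical, in which case both are zero.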
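A short derivation makes the relationship explicit. Here $\hat{\sigma}^2$ is notation introduced only for this sketch: it denotes the population formula applied to the n sample values, with $\bar{x}$ in place of $\mu$.

```latex
\hat{\sigma}^2
  = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
  = \frac{n-1}{n} \cdot \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
  = \frac{n-1}{n} \, s^2
```

Since (n-1)/n is less than 1 for any n of at least 2, the result is smaller than $s^2$ whenever $s^2$ is positive. For data1, the factor is 12/13.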

In practice, we usually calculate sample variances, because collecting data for a whole population is uncommon.

Example: Calculate the Sample Variance of Multiple Columns

Let’s create the following R data frame:

data2 <- data.frame(X=c(12, 35, 55, 48, 54, 12, 8, 10),
                   Y=c(12, 24, 33, 77, 5, 46, 71, 106),
                   Z=c(1, 2, 63, 8, 12, 77, 92, 102))
data2
   X   Y   Z
1 12  12   1
2 35  24   2
3 55  33  63
4 48  77   8
5 54   5  12
6 12  46  77
7  8  71  92
8 10 106 102

To determine the sample variance of each column in the data frame, we can use the sapply() function:

sapply(data2, var)
        X         Y         Z
 439.6429 1238.7857 1863.9821
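The same pattern extends to population variances. As a sketch, an anonymous function passed to sapply() can rescale each column’s sample variance by (n-1)/n, assuming every column has the same number of non-missing rows. The name pop_vars is only illustrative.

```r
data2 <- data.frame(X = c(12, 35, 55, 48, 54, 12, 8, 10),
                    Y = c(12, 24, 33, 77, 5, 46, 71, 106),
                    Z = c(1, 2, 63, 8, 12, 77, 92, 102))

n <- nrow(data2)  # number of observations per column

# Rescale each column's sample variance to get its population variance
pop_vars <- sapply(data2, function(col) var(col) * (n - 1) / n)
pop_vars
```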

The sample standard deviation is the square root of the sample variance. We can compute it for each column with sd():

sapply(data2, sd)
       X        Y        Z
20.96766 35.19639 43.17386
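A quick way to confirm the link between the two measures is to compare sd() with the square root of var() column by column. This is a small check; col_sd and col_var are illustrative names.

```r
data2 <- data.frame(X = c(12, 35, 55, 48, 54, 12, 8, 10),
                    Y = c(12, 24, 33, 77, 5, 46, 71, 106),
                    Z = c(1, 2, 63, 8, 12, 77, 92, 102))

col_sd  <- sapply(data2, sd)   # sample standard deviation per column
col_var <- sapply(data2, var)  # sample variance per column

# sd() should match the square root of var() for every column
all.equal(col_sd, sqrt(col_var))  # TRUE
```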

sapply() is a handy function for data analysis, because it applies a function to every column of a data frame in a single call.
