How do confidence intervals work?
The post How do confidence intervals work? appeared first on Data Science Tutorials
In statistics, we’re frequently interested in estimating population parameters: numbers that capture some aspect of a population as a whole.
The two most common population parameters are:
- Population mean: the average value of a population’s variable (e.g. the mean height of males in the U.S.)
- Population proportion: the fraction of a population that has a particular characteristic (e.g. the proportion of residents in a county who support a certain law)
Even though we’re interested in these parameters, it’s typically too expensive and time-consuming to collect data on every individual in a population in order to calculate them exactly.
As an alternative, we usually select a random sample from the entire population and estimate the population parameter using the data from the sample.
Suppose we want to estimate the average weight of a particular species of cow in India. Weighing every individual cow would take a lot of time and money, since there are thousands of them.
Instead, we could just randomly select 50 cows, and then estimate the true population mean using the weight of the cows in this sample.
The issue is that there is no assurance that the mean weight of cows in the sample will exactly match the mean weight of cows in the entire population. For instance, we might unintentionally choose a sample that has mostly light or mostly heavy cows.
We can construct a confidence interval to capture this uncertainty. A confidence interval is a range of values that is likely to contain a population parameter with a particular degree of confidence. The general formula used to compute it is as follows.
Confidence Interval = (point estimate) +/- (critical value)*(standard error)
This formula produces a lower bound and an upper bound around the point estimate. For a population mean, it takes the following form:
Confidence Interval = x +/- z*(s/√n)
where:
x: sample mean
z: the chosen z-value
s: sample standard deviation
n: sample size
The confidence level you select determines the z-value you use. The z-values that correspond to the most widely used confidence levels are:
- 90% confidence level: z = 1.645
- 95% confidence level: z = 1.96
- 99% confidence level: z = 2.576
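The z-values for common confidence levels can also be computed rather than looked up. The following is a minimal sketch in Python using only the standard library; the helper name `z_critical` is made up for this illustration.

```python
from statistics import NormalDist


def z_critical(confidence_level: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    # Split the leftover probability equally between the two tails.
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_critical(level):.3f}")
```

Rounded to three decimals, this gives 1.645, 1.960 and 2.576 for the 90%, 95% and 99% levels.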
Consider the following scenario: we randomly select a sample of cows and record the following data:
Sample size: n = 25
Sample mean weight: x = 400 kg
Sample standard deviation: s = 20 kg
The 90% confidence interval for the true population mean weight can be calculated as follows.
90% Confidence Interval: 400 +/- 1.645*(20/√25) = [393.42, 406.58]
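The same arithmetic can be checked with a few lines of Python. This is only a sketch: the function name `mean_confidence_interval` is an illustrative choice, and the inputs are the sample values from the cow example.

```python
import math


def mean_confidence_interval(sample_mean, sample_sd, n, z):
    """Return (lower, upper) for x +/- z*(s/sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)  # s / sqrt(n)
    margin = z * standard_error                # critical value * standard error
    return sample_mean - margin, sample_mean + margin


# Values from the example: x = 400, s = 20, n = 25, z = 1.645 (90% level).
lower, upper = mean_confidence_interval(400, 20, 25, 1.645)
print(f"[{lower:.2f}, {upper:.2f}]")  # [393.42, 406.58]
```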
This confidence interval is interpreted as follows:
We are 90% confident that the population mean weight of the cows is contained within the interval [393.42, 406.58].
To put it another way, if we repeated this sampling procedure many times, about 90% of the intervals built this way would contain the true population mean.
The remaining 10% of such intervals would miss it; for our interval, that would mean the true population mean weight is larger than 406.58 kg or less than 393.42 kg.
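One way to see what the 90% refers to is to simulate repeated sampling from a population whose mean is known and count how often the interval contains it. The sketch below assumes a hypothetical normally distributed population with mean 400 and standard deviation 20; these values, the sample size of 50, and the function name `coverage_rate` are illustrative assumptions, not data from the example.

```python
import random
import statistics


def coverage_rate(true_mean, true_sd, n, z, trials, seed=0):
    """Share of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one random sample and build x +/- z*(s/sqrt(n)) from it.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = statistics.mean(sample)
        s = statistics.stdev(sample)
        margin = z * s / n ** 0.5
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits / trials


rate = coverage_rate(true_mean=400, true_sd=20, n=50, z=1.645, trials=2000)
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The share should land close to 0.90. It tends to come out slightly below that, because the z-value is paired with a standard deviation estimated from each sample.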
The width of a confidence interval is influenced by two numbers:
- The sample size: the larger the sample size, the narrower (more precise) the confidence interval.
- The confidence level: the higher the confidence level, the wider the confidence interval.
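Both effects can be seen by computing the interval width, 2*z*(s/√n), for a few settings. This is a small sketch assuming a sample standard deviation of 20, as in the cow example; `interval_width` is a name chosen for this illustration.

```python
from statistics import NormalDist


def interval_width(sample_sd, n, confidence_level):
    """Full width of x +/- z*(s/sqrt(n)), i.e. twice the margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return 2 * z * sample_sd / n ** 0.5


# Larger samples give narrower intervals at a fixed confidence level.
for n in (25, 100, 400):
    print(f"n = {n:>3}, 90% level: width = {interval_width(20, n, 0.90):.2f}")

# Higher confidence levels give wider intervals at a fixed sample size.
for level in (0.90, 0.95, 0.99):
    print(f"n =  25, {level:.0%} level: width = {interval_width(20, 25, level):.2f}")
```

Because the width scales with 1/√n, quadrupling the sample size halves the width.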
Different Confidence Interval Types
Confidence intervals can take many different forms. The most widely used ones are listed here:
Confidence Interval for a Mean
A range of values that, with a certain degree of confidence, is likely to include the population mean is known as a confidence interval for a mean. Here is the formula to determine this interval:
Confidence Interval = x +/- z*(s/√n)
Confidence Interval for the Difference Between Means
A range of values that, with a certain degree of confidence, is likely to contain the true difference between two population means is known as a confidence interval (C.I.) for a difference between means.
Here is the formula to determine this interval:
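Confidence Interval = (x1 - x2) +/- t*√((sp²/n1) + (sp²/n2))
where:
x1, x2: sample means
t: the critical t-value for the chosen confidence level, with n1 + n2 - 2 degrees of freedom
sp²: pooled variance, ((n1 - 1)*s1² + (n2 - 1)*s2²) / (n1 + n2 - 2)
n1, n2: sample sizes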
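As a rough sketch of how this interval could be computed, the Python below assumes the standard pooled variance estimate, ((n1 - 1)*s1² + (n2 - 1)*s2²) / (n1 + n2 - 2), for the pooled variance term in the formula above. The sample numbers are made up for illustration, and the critical t-value is passed in by hand (2.048 is the table value for a 95% level with 28 degrees of freedom), since the Python standard library has no t-distribution.

```python
import math


def diff_means_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """Pooled-variance interval for the difference between two means."""
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var / n1 + pooled_var / n2)
    margin = t_crit * standard_error
    diff = x1 - x2
    return diff - margin, diff + margin


# Hypothetical samples: two groups of 15 observations each.
lower, upper = diff_means_interval(
    x1=310, s1=18.5, n1=15,
    x2=300, s2=16.4, n2=15,
    t_crit=2.048,  # 95% level, n1 + n2 - 2 = 28 degrees of freedom
)
print(f"[{lower:.2f}, {upper:.2f}]")
```

An interval that contains 0, as this one does, is consistent with there being no difference between the two population means.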
Confidence Interval for a Proportion
A range of values that, with a particular level of confidence, is likely to contain a population proportion is known as a confidence interval for a proportion.
Here is the formula to determine this interval:
Confidence Interval = p +/- z*√(p(1-p)/n)
where:
p: sample proportion
z: the chosen z-value
n: sample size
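A short Python sketch of this calculation follows, with the square root taken over the whole quantity p(1-p)/n. The survey numbers (56 of 100 residents in favor of a law) are hypothetical, and `proportion_interval` is a name chosen for this illustration.

```python
import math


def proportion_interval(p, n, z):
    """Return (lower, upper) for p +/- z*sqrt(p*(1-p)/n)."""
    standard_error = math.sqrt(p * (1 - p) / n)
    margin = z * standard_error
    return p - margin, p + margin


# Hypothetical survey: 56 of 100 residents support the law, 95% level.
lower, upper = proportion_interval(p=56 / 100, n=100, z=1.96)
print(f"[{lower:.4f}, {upper:.4f}]")
```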
Confidence Interval for the Difference in Proportions
A range of values that, with a particular level of confidence, is likely to contain the true difference between two population proportions is known as a confidence interval for the difference in proportions.
Here is the formula to determine this interval:
Confidence interval = (p1–p2) +/- z*√(p1(1-p1)/n1 + p2(1-p2)/n2)
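The formula above translates directly into code. The following Python sketch uses made-up survey results from two hypothetical counties; the function name `diff_proportions_interval` is likewise an illustrative choice.

```python
import math


def diff_proportions_interval(p1, n1, p2, n2, z):
    """Return (lower, upper) for (p1 - p2) +/- z*sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)."""
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * standard_error
    diff = p1 - p2
    return diff - margin, diff + margin


# Hypothetical data: 62% support in county A, 46% in county B,
# 100 residents sampled in each, 95% confidence level.
lower, upper = diff_proportions_interval(p1=0.62, n1=100, p2=0.46, n2=100, z=1.96)
print(f"[{lower:.4f}, {upper:.4f}]")
```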
Further Resources
How do augmented analytics work? – Data Science Tutorials
How to compare variances in R – Data Science Tutorials
Two Sample Proportions test in R-Complete Guide – Data Science Tutorials