
Creating Pareto Charts in R with the qcc Package

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers.]

Introduction

A Pareto chart is a type of bar chart that shows the frequency of different categories in a dataset, ordered by frequency from highest to lowest. It is often used to identify the most common problems or causes of a problem, so that resources can be focused on addressing them.

To create a Pareto chart in R, we can use the qcc package. The qcc package provides a number of functions for quality control, including the pareto.chart() function for creating Pareto charts.


Examples


Example 1: Creating a Pareto chart from a data frame

The following code shows how to create a Pareto chart from a data frame:

library(qcc)

# Create a data frame with the product and its count
df <- data.frame(
  product = c("Office desks", "Chairs", "Filing cabinets", "Bookcases"),
  count = c(100, 80, 70, 60)
)

# Create the Pareto chart
pareto.chart(df$count, main = "Pareto Chart of Product Sales")

   
Pareto chart analysis for df$count
    Frequency Cum.Freq. Percentage Cum.Percent.
  A 100.00000 100.00000   32.25806     32.25806
  B  80.00000 180.00000   25.80645     58.06452
  C  70.00000 250.00000   22.58065     80.64516
  D  60.00000 310.00000   19.35484    100.00000

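This code creates a Pareto chart of the product sales, with the bars ordered from the highest count on the left to the lowest on the right. Because df$count is an unnamed vector, the bars and the rows of the printed table are labeled A through D rather than with the product names; here A corresponds to office desks and D to bookcases. The cumulative percentage line is also plotted. It shows the running share of total sales as each category is added, reaching 100% at the last bar.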
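To get product names on the chart instead of letters, one option is to attach the names to the counts before plotting. The following is a minimal sketch that assumes pareto.chart() uses the names of the vector it receives as bar labels; the variable name sales is introduced here for illustration only.

```r
library(qcc)

# Same data as in Example 1
df <- data.frame(
  product = c("Office desks", "Chairs", "Filing cabinets", "Bookcases"),
  count = c(100, 80, 70, 60)
)

# Attach the product names to the counts (sales is an illustrative name)
sales <- setNames(df$count, df$product)

# The bars and the printed table should now be labeled by product
pareto.chart(sales, main = "Pareto Chart of Product Sales")
```

Naming the vector keeps the labels tied to their counts, so they stay correct even if the input is not already sorted.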


Example 2: Creating a Pareto chart from a vector

We can also create a Pareto chart directly from a standalone vector, without first building a data frame. The following code shows how to create a Pareto chart of the number of defects found in a manufacturing process:

# Create a vector with the number of defects found in each category
defects <- c(10, 8, 7, 6, 5)

# Create the Pareto chart
pareto.chart(defects, main = "Pareto Chart of Defects")

   
Pareto chart analysis for defects
    Frequency Cum.Freq. Percentage Cum.Percent.
  A  10.00000  10.00000   27.77778     27.77778
  B   8.00000  18.00000   22.22222     50.00000
  C   7.00000  25.00000   19.44444     69.44444
  D   6.00000  31.00000   16.66667     86.11111
  E   5.00000  36.00000   13.88889    100.00000

This code creates a Pareto chart of the number of defects found, with the most common defect category on the left and the least common on the right. As before, the categories are labeled A through E because the vector has no names. The cumulative percentage line shows the running share of total defects as each category is added; for example, the two most common categories together account for 50% of all defects.
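The numbers in the printed table can be reproduced by hand, which helps when checking what the chart is showing. This is a base R sketch of the arithmetic only, not the qcc implementation; the column names in pareto_table are chosen here for illustration.

```r
# Defect counts from Example 2
defects <- c(10, 8, 7, 6, 5)

# Sort from most to least frequent, as a Pareto chart does
freq <- sort(defects, decreasing = TRUE)

# Running totals, and each value as a percentage of the overall total
cum_freq <- cumsum(freq)
pct <- 100 * freq / sum(freq)
cum_pct <- 100 * cum_freq / sum(freq)

# Collect the four columns in one table (illustrative column names)
pareto_table <- data.frame(
  frequency = freq,
  cum_freq = cum_freq,
  percentage = pct,
  cum_percent = cum_pct
)
print(round(pareto_table, 2))
```

The cum_percent column is what the cumulative percentage line plots: 27.78, 50.00, 69.44, 86.11, and 100.00 for this data.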


Customizing the Pareto chart

We can customize the appearance of the Pareto chart by passing additional arguments to the pareto.chart() function. For example, we can change the title of the chart, the labels of the x- and y-axes, the colors of the bars, and the line width.

The following code shows how to customize the Pareto chart from the first example:

# Create a data frame with the product and its count
df <- data.frame(
  product = c("Office desks", "Chairs", "Filing cabinets", "Bookcases"),
  count = c(100, 80, 70, 60)
)

# Create the Pareto chart
pareto.chart(
  df$count,
  main = "Pareto Chart of Product Sales",
  xlab = "Product",
  ylab = "Count",
  col = heat.colors(length(df$count)),
  lwd = 2
)

   
Pareto chart analysis for df$count
    Frequency Cum.Freq. Percentage Cum.Percent.
  A 100.00000 100.00000   32.25806     32.25806
  B  80.00000 180.00000   25.80645     58.06452
  C  70.00000 250.00000   22.58065     80.64516
  D  60.00000 310.00000   19.35484    100.00000

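This code creates a Pareto chart with the title “Pareto Chart of Product Sales”, an x-axis label of “Product”, a y-axis label of “Count”, bar colors taken from the heat.colors() palette, and a line width (lwd) of 2. The printed table is unchanged from Example 1, since these arguments affect only the appearance of the plot.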
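The right-hand cumulative percentage axis can be adjusted as well. The sketch below assumes the installed version of qcc accepts the ylab2 and cumperc arguments described in its pareto.chart() documentation; if it does not, check ?pareto.chart for the equivalent options. The names sales and pct_ticks are illustrative.

```r
library(qcc)

# Named counts, so the bars are labeled by product
sales <- c("Office desks" = 100, "Chairs" = 80,
           "Filing cabinets" = 70, "Bookcases" = 60)

# Tick marks for the cumulative percentage axis, one every 20%
pct_ticks <- seq(0, 100, by = 20)

pareto.chart(
  sales,
  main = "Pareto Chart of Product Sales",
  ylab = "Count",
  ylab2 = "Cumulative share of sales (%)",  # label for the right-hand axis
  cumperc = pct_ticks                       # where to draw percentage ticks
)
```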


Conclusion

The qcc package provides a convenient way to create Pareto charts in R. Pareto charts can be used to identify the most common problems or causes of a problem, so that resources can be focused on addressing them.


Encouragement

I encourage readers to try creating their own Pareto charts in R. You can use the examples in this blog post as a starting point. You can also find more examples and documentation for the qcc package on the CRAN website.

Here are some ideas for Pareto charts that you could create:

Once you have created a Pareto chart, you can use the insights that you gain from it to improve your processes or products.
