A box plot is a graphical summary of a data set that shows the median, the quartiles, and the smallest and largest values. It gives a quick visual impression of the shape and spread of the data distribution.
The following example shows a simple boxplot of the given data.
data <- c(1,5,6,7,4,5,6,9,2,12)
boxplot(data)
This plot tells us several things about the distribution of the example data.
- Median: The large black line in the middle shows the median of the data.
- Quartiles: The box around the median spans from the lower to the upper quartile. 25% of the data lies between the lower quartile and the median, and another 25% between the median and the upper quartile, so the box contains the middle 50% of the data.
- Whiskers: The lines below and above the box extend to the minimum and maximum values of the data, excluding outliers. By default, an outlier is a value that lies more than 1.5 times the interquartile range (the height of the box) beyond the lower or upper quartile.
- Outliers: Outliers are drawn as individual points beyond the whiskers.
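The values behind these elements can be reproduced by hand. The following is a minimal sketch, assuming the default factor of 1.5; the variable names (five, box_height, lower_limit, upper_limit, outliers) are only chosen for illustration. Note that boxplot uses the hinges returned by fivenum, which can differ slightly from the quartiles returned by quantile.

```r
data <- c(1, 5, 6, 7, 4, 5, 6, 9, 2, 12)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(data)

# Height of the box (distance between the two hinges)
box_height <- five[4] - five[2]

# Limits beyond which a value is drawn as an outlier (default factor 1.5)
lower_limit <- five[2] - 1.5 * box_height
upper_limit <- five[4] + 1.5 * box_height

# Values that fall outside these limits
outliers <- data[data < lower_limit | data > upper_limit]

print(five)
print(c(lower_limit, upper_limit))
print(outliers)
```

For the example data the upper limit is 11.5, which is why the value 12 is drawn as a separate point.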
In the example above, the value 12 lies outside the default whisker range, which reaches 1.5 times the interquartile range beyond the box, so it is drawn as an outlier. You can change the whisker range with the range parameter. The default is 1.5; a value of 0 extends the whiskers to the data extremes.
If we increase the range for the previous example, the value 12 falls inside the whiskers.
data <- c(1,5,6,7,4,5,6,9,2,12)
boxplot(data, range=2)
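To check the effect of range without looking at the plot, the statistics can be computed directly with boxplot.stats, whose coef argument plays the same role as range in boxplot. This is only a sketch; the variable names are chosen for illustration.

```r
data <- c(1, 5, 6, 7, 4, 5, 6, 9, 2, 12)

# Default factor 1.5: the value 12 is reported as an outlier
default_stats <- boxplot.stats(data)

# Factor 2: the upper whisker now reaches 12 and no outliers remain
wide_stats <- boxplot.stats(data, coef = 2)

# Factor 0: whiskers always extend to the data extremes
full_stats <- boxplot.stats(data, coef = 0)

print(default_stats$out)
print(wide_stats$stats)
print(full_stats$stats)
```

The stats element holds the lower whisker, lower hinge, median, upper hinge and upper whisker, in that order.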
Multiple plots
If we have a dataset to analyze, we can also draw several boxes in a single plot. The following examples use the built-in dataset mtcars to create some more boxplots.
First, let’s have a look at this dataset.
print(mtcars)
As you can see, it is a table of 32 cars with 11 characteristics each, such as miles per gallon (mpg), number of cylinders (cyl) and horsepower (hp).
To create a plot with several boxes, we show the horsepower of the cars grouped by the number of cylinders, using the formula hp~cyl.
boxplot(hp~cyl, data=mtcars)
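The numbers behind each box can be inspected as well. As a small sketch, calling boxplot with plot=FALSE returns the computed statistics instead of drawing anything; the variable name groups is only chosen for illustration.

```r
# Compute the per-group statistics without drawing the plot
groups <- boxplot(hp ~ cyl, data = mtcars, plot = FALSE)

# Group labels (the distinct cylinder counts)
print(groups$names)

# Number of cars in each group
print(groups$n)

# One column per group: lower whisker, lower hinge, median,
# upper hinge, upper whisker
print(groups$stats)

# Values drawn as outliers and the index of the group they belong to
print(groups$out)
print(groups$group)
```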
Boxplot parameters
Of course, the boxplot function offers many parameters for creating a good-looking plot. In this example we set range to 0 so that the whiskers extend to the data extremes and no points are drawn as outliers, label the axes, set some colors, and switch to horizontal boxes instead of vertical ones. This creates the following plot.
boxplot(hp~cyl, data=mtcars, range=0, ylab='Number of Cylinders', xlab='Horsepower', col=c('gold','lightblue','lightgreen'), border='brown', horizontal=TRUE)
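A few more parameters are often useful. The following sketch adds a title with main, custom group labels with names, and box widths that reflect the group sizes with varwidth. The title and label texts are made up for this illustration.

```r
# Horizontal boxplot with a title, custom group labels and
# box widths proportional to the square root of the group sizes
result <- boxplot(hp ~ cyl, data = mtcars,
                  horizontal = TRUE,
                  main = "Horsepower by number of cylinders",
                  xlab = "Horsepower",
                  ylab = "Number of cylinders",
                  names = c("4 cyl", "6 cyl", "8 cyl"),
                  varwidth = TRUE,
                  col = c("gold", "lightblue", "lightgreen"))

# The custom labels are also stored in the returned object
print(result$names)
```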
Combine boxplot and stripchart
In this final example we combine a boxplot with a strip chart to show the individual values of the data set on top of the boxplot. Setting range to 0 prevents the boxplot from drawing outlier points, so each value appears only once.
data <- c(1,5,6,7,4,5,6,9,2,12)
boxplot(data, col='gold', range=0)
stripchart(data, method='jitter', jitter=0.01, vertical=TRUE, add=TRUE, pch=16, col='blue')
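The same combination also works for grouped data. As a sketch, the following overlays the individual horsepower values on the boxes of the mtcars example, passing the same formula to both functions so that the points line up with their boxes. The jitter amount of 0.1 is just a value picked for this illustration.

```r
# Boxes per cylinder group, with whiskers extended to the extremes
bp <- boxplot(hp ~ cyl, data = mtcars, col = "gold", range = 0,
              xlab = "Number of cylinders", ylab = "Horsepower")

# Individual values on top of the boxes, slightly jittered
# so that identical values do not hide each other
stripchart(hp ~ cyl, data = mtcars, method = "jitter", jitter = 0.1,
           vertical = TRUE, add = TRUE, pch = 16, col = "blue")
```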