Data visualization in R: Boxplot

The box plot is a graphical representation of a data set that shows its median, its quartiles and its smallest and largest values. It gives a compact picture of how the data are distributed.

The following example shows a simple boxplot of the given data.

data <- c(1,5,6,7,4,5,6,9,2,12)

boxplot(data)

[Figure r32_a: box plot of the example data]

This plot tells us several things about the distribution of the example data.

  • Median: The thick black line inside the box marks the median of the data.
  • Quartiles: The box around the median spans from the lower to the upper quartile. A quarter of the data lies between the lower quartile and the median, and another quarter between the median and the upper quartile, so the box contains the middle 50% of the data.
  • Whiskers: The lines below and above the box extend to the smallest and largest values that are not outliers. By default, a value counts as an outlier if it lies more than 1.5 times the box length (the interquartile range) beyond the box.
  • Outliers: Outliers are drawn as individual points beyond the whiskers.
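The numbers behind these elements can be inspected without drawing anything. The following is a minimal sketch using boxplot.stats() from base R, applied to the example data from above; the variable name s is only chosen for illustration.

```r
data <- c(1, 5, 6, 7, 4, 5, 6, 9, 2, 12)

# boxplot.stats() computes the values that boxplot() draws
s <- boxplot.stats(data)

s$stats  # lower whisker, lower box edge, median, upper box edge, upper whisker
s$out    # values that would be drawn as individual outlier points
```

Note that R computes the box edges as "hinges", which can differ slightly from the quartiles returned by quantile() for small data sets.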

In the example above, the value 12 lies outside the default whisker range, which reaches 1.5 times the box length (the interquartile range) beyond the box. You can change this limit with the range parameter. The default is 1.5; a value of 0 extends the whiskers to the data extremes.

If we increase the range for the previous example, the value 12 now lies within the whiskers.

data <- c(1,5,6,7,4,5,6,9,2,12)

boxplot(data, range=2)

[Figure r32_b: box plot of the example data with range=2]
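To see why 12 changes from an outlier to a regular value, the whisker limit can be calculated by hand. This is only a sketch: the helper upper_limit and the names h and box_length are made up for this illustration and are not part of R.

```r
data <- c(1, 5, 6, 7, 4, 5, 6, 9, 2, 12)

h <- fivenum(data)          # minimum, lower hinge, median, upper hinge, maximum
box_length <- h[4] - h[2]   # distance between the two box edges

# Largest value a whisker may reach for a given range setting (hypothetical helper)
upper_limit <- function(k) h[4] + k * box_length

upper_limit(1.5)  # 12 lies above this limit, so it is drawn as an outlier
upper_limit(2)    # 12 lies below this limit, so the whisker extends to it
```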

Multiple plots

If we have a dataset to analyze, we can also draw several boxes in a single plot. In the following examples we will use the built-in dataset mtcars to create some more boxplots.

First, let's have a look at this dataset.

print(mtcars)

As you can see, it is a table with one row per car and one column per characteristic.
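Printing the whole table can be a bit much. As an alternative sketch, the following base R calls show only the parts that matter for the plots below, namely the columns hp (horsepower) and cyl (number of cylinders).

```r
head(mtcars)                   # first six rows only
str(mtcars[, c("hp", "cyl")])  # the two columns used in the following plots
table(mtcars$cyl)              # number of cars for each cylinder count
```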

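To create a plot with several boxes, we will show the horsepower of the cars grouped by the number of cylinders. The formula hp~cyl tells boxplot to draw one box of hp values for each distinct value of cyl.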

boxplot(hp~cyl, data=mtcars)

[Figure r32_c: box plots of horsepower by number of cylinders]
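The grouping done by the formula can be checked numerically. A small sketch, assuming we only want the median per group and the statistics that boxplot computes; the names med and b are chosen for illustration.

```r
# Median horsepower for each cylinder count, i.e. the thick line in each box
med <- tapply(mtcars$hp, mtcars$cyl, median)
med

# The same grouping without drawing: one column of statistics per box
b <- boxplot(hp ~ cyl, data = mtcars, plot = FALSE)
b$names  # group labels in plotting order
b$stats  # rows: lower whisker, lower box edge, median, upper box edge, upper whisker
```

With plot = FALSE the function only returns the computed values, which is handy for checking a plot against the data.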

Boxplot parameters

The boxplot function offers many parameters for customizing the plot. In this example we set range to 0 so that the whiskers extend to the data extremes and no value is drawn as an outlier, label the axes, set some colors and draw horizontal boxes instead of vertical ones. This creates the following plot.

boxplot(hp~cyl, data=mtcars, range=0, ylab='Number of Cylinders', xlab='Horsepower', col=c('gold','lightblue','lightgreen'), border='brown', horizontal=TRUE)

[Figure r32_d: horizontal, colored box plots of horsepower by number of cylinders]
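The effect of range=0 can be verified without drawing. The sketch below compares the outliers reported with the default range and with range=0; b_default and b_full are illustrative names.

```r
# Default range (1.5): values beyond the whisker limit are reported in $out
b_default <- boxplot(hp ~ cyl, data = mtcars, plot = FALSE)
b_default$out

# range = 0: the whiskers reach the extremes, so nothing is reported as an outlier
b_full <- boxplot(hp ~ cyl, data = mtcars, range = 0, plot = FALSE)
b_full$out
```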

 
Combine boxplot and stripchart

In this final example we combine the boxplot with a stripchart to show the individual values of the data set on top of the boxplot.

data <- c(1,5,6,7,4,5,6,9,2,12)

boxplot(data, col='gold', range=0)

stripchart(data, method='jitter', jitter=0.01, vertical=TRUE, add=TRUE, pch=16, col='blue')

[Figure r32_e: box plot of the example data with the individual values overlaid]
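The same combination should also work for grouped data, since stripchart accepts a formula just like boxplot. The following sketch applies it to the mtcars example; the jitter amount of 0.1 is an arbitrary choice for readability.

```r
# Boxes first, with range = 0 so that no value is drawn twice as an outlier point
boxplot(hp ~ cyl, data = mtcars, col = "gold", range = 0,
        xlab = "Number of Cylinders", ylab = "Horsepower")

# Overlay one point per car on top of the matching box
stripchart(hp ~ cyl, data = mtcars, method = "jitter", jitter = 0.1,
           vertical = TRUE, add = TRUE, pch = 16, col = "blue")
```

The boxes must be drawn first, because add=TRUE places the points into the existing plot.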
