Data visualization in R: Histogram

A histogram shows the frequency distribution of a quantitative variable as a series of vertical bars. Histograms are a great way to get to know your data: they let you see at a glance where most of the values lie and where values are sparse. The histogram consists of an x-axis and a y-axis, where the y-axis shows how frequently the values on the x-axis occur in the data. Each bar covers a range of values on the x-axis. These ranges are called bins, and the height of a bar is the number of values that fall into its bin.

The following example shows some data and the corresponding histogram.
 
values <- c(29,33,91,88,77,110,157,185,138,189,201,205,187,177,168,201,222,278,270,285)
hist(values)

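To see what the bars stand for, the binning can be reproduced by hand. The following is a minimal sketch: the bin edges (width 50, from 0 to 300) are an assumption chosen for illustration so that they cover the data, and cut and table do the counting.

```r
values <- c(29,33,91,88,77,110,157,185,138,189,201,205,187,177,168,201,222,278,270,285)

# Assumed bin edges for illustration: width 50, covering the whole data range
edges <- seq(0, 300, by = 50)

# Assign each value to an interval such as (0,50], then count the values per interval
bins <- cut(values, breaks = edges)
bin_counts <- table(bins)
print(bin_counts)
```

Each entry of the resulting table corresponds to the height of one bar in a histogram drawn with the same bin edges.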

Get histogram information
R makes a number of decisions by default when it creates this histogram. To understand them, you can look at the values it calculated: save the histogram as an object and print it. The printed object lists, among other things, the breakpoints (breaks), the number of values per bin (counts), the density and the bin midpoints (mids). This is helpful because it shows how R has decided to break up your data by default.

values <- c(29,33,91,88,77,110,157,185,138,189,201,205,187,177,168,201,222,278,270,285)
histinfo <- hist(values)
histinfo

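Instead of printing the whole object, the individual components can be read with the $ operator. The sketch below uses plot = FALSE so that hist only computes the values and draws nothing; the variable name histinfo follows the example above.

```r
values <- c(29,33,91,88,77,110,157,185,138,189,201,205,187,177,168,201,222,278,270,285)

# Compute the histogram without drawing it
histinfo <- hist(values, plot = FALSE)

histinfo$breaks   # boundaries of the bins
histinfo$counts   # number of values in each bin
histinfo$mids     # midpoint of each bin
histinfo$density  # counts rescaled so that the bar areas sum to 1
```

There is always one more breakpoint than there are bins, because each bin needs a lower and an upper boundary.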

Set number of breaks to change number of bins
R chooses how to split up your data by using an algorithm (Sturges' formula by default). If you want to change this default behavior, you can define your own number or range of bins. Passing a single number to the breaks parameter sets the number of cells for the histogram.

values <- c(29,33,91,88,77,110,157,185,138,189,201,205,187,177,168,201,222,278,270,285)
par(mfrow=c(1,3))
hist(values, main="default")
hist(values, breaks=3, main="3 breaks")
hist(values, breaks=1, main="1 break")

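A quick way to check how many bins R actually creates is to compare the requested number with the length of the counts vector. This is only a sketch; the requested values 1, 3 and 10 are arbitrary examples.

```r
values <- c(29,33,91,88,77,110,157,185,138,189,201,205,187,177,168,201,222,278,270,285)

requested <- c(1, 3, 10)

# For each requested number, compute the histogram without drawing it
# and count how many bins it really contains
bins_actual <- sapply(requested, function(n) {
  length(hist(values, breaks = n, plot = FALSE)$counts)
})

print(data.frame(requested = requested, actual = bins_actual))
```

The two columns need not agree, since R is free to adjust the requested number when it picks the breakpoints.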

The number of bins does not always match the number you put in: R treats a single number as a suggestion only and rounds the breakpoints to "pretty" values. If you want more control over the breakpoints between bins, you can pass a vector of breakpoints to the breaks parameter.

values <- c(29,33,91,88,77,110,157,185,138,189,201,205,187,177,168,201,222,278,270,285)
hist(values, breaks=c(0,60,120,180,240,300))

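For evenly spaced bins the breakpoint vector does not have to be typed out; seq can generate it. The sketch below also checks that the breakpoints span the data, because hist stops with an error when a value falls outside them. The name edges is only illustrative.

```r
values <- c(29,33,91,88,77,110,157,185,138,189,201,205,187,177,168,201,222,278,270,285)

# Same breakpoints as c(0, 60, 120, 180, 240, 300)
edges <- seq(0, 300, by = 60)

# Make sure every value lies within the breakpoints before plotting
stopifnot(min(values) >= min(edges), max(values) <= max(edges))

h <- hist(values, breaks = edges)
h$counts
```

With an explicit vector, R uses exactly the breakpoints given, so the number of bins is always one less than the length of the vector.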

Colors
To colorize the histogram you may define a color palette and pass it to the col parameter of the histogram.

values <- c(29,33,91,88,77,110,157,185,138,189,201,205,187,177,168,201,222,278,270,285)
colors <- c("red","blue","yellow","orange","green","cyan")
hist(values, col=colors)

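The colors are assigned to the bars from left to right, so a color vector can also be built from the histogram itself. As a small sketch, the following highlights the tallest bin; the names h and bar_colors and the chosen colors are illustrative assumptions.

```r
values <- c(29,33,91,88,77,110,157,185,138,189,201,205,187,177,168,201,222,278,270,285)

# Compute the bins first, without drawing
h <- hist(values, plot = FALSE)

# One color per bar: red for the bin with the most values, grey for the rest
bar_colors <- ifelse(h$counts == max(h$counts), "red", "grey80")

# Draw the precomputed histogram with the derived colors
plot(h, col = bar_colors, main = "Tallest bin highlighted")
```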

Show density and density curve

By default the histogram shows the frequency of the data. You can change this behavior and show the probability density instead. The bars are then scaled so that their total area is 1, and the area of each bar is the proportion of values that fall into its bin. To do this, set the freq argument to FALSE or the prob argument to TRUE. On such a density plot you can additionally draw a density curve over the histogram by using the lines function.

values <- c(29,33,91,88,77,110,157,185,138,189,201,205,187,177,168,201,222,278,270,285)
colors <- c("red","blue","yellow","orange","green","cyan")
hist(values, col=colors, prob=TRUE)
lines(density(values), lwd=2)

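That the density scale describes proportions can be checked numerically. In this sketch the area of each bar is its density multiplied by its bin width; the names bar_areas and total_area are illustrative.

```r
values <- c(29,33,91,88,77,110,157,185,138,189,201,205,187,177,168,201,222,278,270,285)

# The density component is always computed, even without drawing
h <- hist(values, plot = FALSE)

# Area of each bar = height (density) * width of the bin
bar_areas <- h$density * diff(h$breaks)
total_area <- sum(bar_areas)

print(bar_areas)   # proportion of the values that fall into each bin
print(total_area)  # all bar areas together
```

Because the bar areas are proportions, the histogram and the density curve share the same y-axis scale and can be drawn in one plot.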
