## Data visualization in R: Histogram

A histogram uses vertical bars to show the frequency distribution of a quantitative variable. Histograms are a great way to get to know your data, because they let you see at a glance where much of the data lies and where little of it does. A histogram has an x-axis and a y-axis: the x-axis is divided into adjacent ranges of values, and the y-axis shows how frequently values from each range occur in the data. Each bar groups one such range of values; these ranges (and their bars) are often called bins.
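
To make the idea of bins concrete, here is a minimal sketch that counts the values per bin by hand with `cut()` and `table()`. The six bins of width 50 from 0 to 300 are an assumption chosen for illustration; each resulting count is the height of one bar in a frequency histogram with these bin edges.

```r
# Example data (the same vector used throughout this post)
values <- c(29, 33, 91, 88, 77, 110, 157, 185, 138, 189,
            201, 205, 187, 177, 168, 201, 222, 278, 270, 285)

# Assumed bin edges: six bins of width 50 covering 0 to 300
bin_edges <- seq(0, 300, by = 50)

# cut() assigns each value to a right-closed interval such as (50,100];
# table() then counts how many values fall into each interval
bin_counts <- table(cut(values, breaks = bin_edges))
print(bin_counts)
```

For this data the counts are 2, 3, 2, 6, 4 and 3, which add up to the 20 observations.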

The following example shows some data and the corresponding histogram.

```r
values <- c(29, 33, 91, 88, 77, 110, 157, 185, 138, 189,
            201, 205, 187, 177, 168, 201, 222, 278, 270, 285)
hist(values)
```

### Get histogram information

There are a number of things that R does by default when creating this histogram. To understand the histogram parameters, we can look at the values R calculated. You can do this by saving the histogram as an object and then printing it. This is helpful because it shows how R has decided to break up your data by default.
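
The saved object is a list of class `"histogram"`, so its components can also be accessed individually. As a minimal sketch, passing `plot = FALSE` makes `hist()` compute the bins without drawing anything; the object name `h` is only illustrative.

```r
values <- c(29, 33, 91, 88, 77, 110, 157, 185, 138, 189,
            201, 205, 187, 177, 168, 201, 222, 278, 270, 285)

# Compute the histogram without drawing it
h <- hist(values, plot = FALSE)

print(h$breaks)   # bin boundaries chosen by R
print(h$counts)   # number of values in each bin
print(h$density)  # counts rescaled so that the total bar area is 1
print(h$mids)     # midpoint of each bin

# There is one count per bin, i.e. one fewer count than breakpoints
print(length(h$counts) == length(h$breaks) - 1)
```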

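```r
values <- c(29, 33, 91, 88, 77, 110, 157, 185, 138, 189,
            201, 205, 187, 177, 168, 201, 222, 278, 270, 285)
# Save the histogram as an object (it is still drawn)
histinfo <- hist(values)
# Print the calculated values: breaks, counts, density, mids, ...
histinfo
```

### Set number of breaks to change number of bins
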
R chooses how to split up your data by using an algorithm. If you want to change this default behavior, you can define your own number or range of bins. Use the breaks parameter to suggest the number of cells (bins) for the histogram.
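
By default, `hist()` uses Sturges' rule (`breaks = "Sturges"`) to suggest a number of bins. The breaks argument also accepts `"Scott"` and `"FD"` (Freedman-Diaconis) for alternative rules, and the grDevices functions `nclass.Sturges()`, `nclass.scott()` and `nclass.FD()` return the number of classes each rule suggests. A minimal sketch comparing the suggestions with the bins `hist()` actually produces:

```r
values <- c(29, 33, 91, 88, 77, 110, 157, 185, 138, 189,
            201, 205, 187, 177, 168, 201, 222, 278, 270, 285)

# Number of classes suggested by each rule (grDevices is attached by default)
suggested <- c(
  Sturges = nclass.Sturges(values),
  Scott   = nclass.scott(values),
  FD      = nclass.FD(values)
)
print(suggested)

# Number of bins hist() actually produces with each rule; R rounds the
# breakpoints to "pretty" values, so this can differ from the suggestion
actual <- sapply(c("Sturges", "Scott", "FD"), function(rule) {
  length(hist(values, breaks = rule, plot = FALSE)$counts)
})
print(actual)
```

Sturges' rule computes `ceiling(log2(n) + 1)`, which suggests 6 classes for these 20 values.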

```r
values <- c(29, 33, 91, 88, 77, 110, 157, 185, 138, 189,
            201, 205, 187, 177, 168, 201, 222, 278, 270, 285)
# Draw three histograms side by side and keep the old settings
old_par <- par(mfrow = c(1, 3))
hist(values, main = "default")
hist(values, breaks = 3, main = "3 breaks")
hist(values, breaks = 1, main = "1 break")
# Restore the previous layout
par(old_par)
```

The number of bins doesn't correspond exactly to the number you put in, because R treats that number only as a suggestion and rounds the breakpoints to "pretty" values. If you want more control over the breakpoints between bins, you can pass a vector of breakpoints to the breaks parameter.
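
Such a vector can also be computed from the data. For example, one way to get exactly n bins of equal width is to build the breakpoints with `seq()`. A minimal sketch, assuming you want 4 bins spanning the range of the data (`n_bins` is an illustrative name):

```r
values <- c(29, 33, 91, 88, 77, 110, 157, 185, 138, 189,
            201, 205, 187, 177, 168, 201, 222, 278, 270, 285)

n_bins <- 4  # desired number of bins (illustrative choice)

# n_bins + 1 equally spaced breakpoints from the minimum to the maximum,
# so the breakpoints span the whole range of the data
equal_breaks <- seq(min(values), max(values), length.out = n_bins + 1)

h <- hist(values, breaks = equal_breaks, main = "4 equal-width bins")
print(h$breaks)  # 29 93 157 221 285
print(h$counts)  # 5 3 8 4
```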

```r
values <- c(29, 33, 91, 88, 77, 110, 157, 185, 138, 189,
            201, 205, 187, 177, 168, 201, 222, 278, 270, 285)
# The breakpoints must cover the full range of the data
hist(values, breaks = c(0, 60, 120, 180, 240, 300))
```

### Colors

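To color the histogram, you can define a vector of colors and pass it to the col parameter of the histogram. The colors are applied to the bars from left to right and are recycled if there are more bars than colors.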
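
Instead of typing the colors by hand, you can generate a palette whose length matches the number of bins, or pick colors based on the bins themselves. A minimal sketch, assuming the `rainbow()` palette from grDevices and an illustrative highlight threshold of 200; the bins are computed first with `plot = FALSE` so the number of bars is known before coloring.

```r
values <- c(29, 33, 91, 88, 77, 110, 157, 185, 138, 189,
            201, 205, 187, 177, 168, 201, 222, 278, 270, 285)

# Compute the bins first (without plotting) to know how many bars there are
h <- hist(values, plot = FALSE)
n_bars <- length(h$counts)

# One color per bar, generated by a built-in palette function
bar_colors <- rainbow(n_bars)
plot(h, col = bar_colors, main = "One color per bin")

# Alternatively, highlight bins whose midpoint lies above 200 (illustrative threshold)
highlight <- ifelse(h$mids > 200, "tomato", "grey80")
plot(h, col = highlight, main = "Bins above 200 highlighted")
```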

```r
values <- c(29, 33, 91, 88, 77, 110, 157, 185, 138, 189,
            201, 205, 187, 177, 168, 201, 222, 278, 270, 285)
# Colors are applied to the bars from left to right
colors <- c("red", "blue", "yellow", "orange", "green", "cyan")
hist(values, col = colors)
```

### Show density and density curve

By default, the histogram shows the frequency (count) of the data. You can change this behavior and show the probability density of the data instead. The area of each bar then equals the proportion of observations that fall into that interval, which lets you see how likely values from each interval of the x-axis are. To do this, set the freq argument to FALSE or the probability argument (abbreviated to prob in the example below) to TRUE. If you create such a density histogram, you can additionally overlay a density curve for your data with the lines function.
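
For a bin of width w containing c of the n observations, R computes the density as c / (n * w). The bar area (density times width) is therefore the share of observations in that bin, and all areas add up to 1. A minimal sketch that recomputes the densities by hand and checks both properties:

```r
values <- c(29, 33, 91, 88, 77, 110, 157, 185, 138, 189,
            201, 205, 187, 177, 168, 201, 222, 278, 270, 285)

h <- hist(values, plot = FALSE)
bin_widths <- diff(h$breaks)
n <- length(values)

# Density = count / (number of observations * bin width)
manual_density <- h$counts / (n * bin_widths)
print(all.equal(manual_density, h$density))

# Bar area = density * width = share of observations in the bin
bar_areas <- h$density * bin_widths
print(bar_areas)
print(sum(bar_areas))  # the areas add up to 1
```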

```r
values <- c(29, 33, 91, 88, 77, 110, 157, 185, 138, 189,
            201, 205, 187, 177, 168, 201, 222, 278, 270, 285)
colors <- c("red", "blue", "yellow", "orange", "green", "cyan")
# Plot densities instead of counts
hist(values, col = colors, prob = TRUE)
# Overlay a kernel density estimate of the data
lines(density(values), lwd = 2)
```
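
The density curve can rise above the tallest bar or extend beyond the outer bins, because `density()` smooths the data and by default evaluates the estimate up to three bandwidths past the smallest and largest values. Since the axis limits are set by the histogram alone, parts of the curve may then be cut off. A minimal sketch that widens the limits so both the bars and the curve fit (the fill color is an arbitrary choice):

```r
values <- c(29, 33, 91, 88, 77, 110, 157, 185, 138, 189,
            201, 205, 187, 177, 168, 201, 222, 278, 270, 285)

h <- hist(values, plot = FALSE)
d <- density(values)  # kernel density estimate with the default bandwidth

# Axis limits large enough for both the histogram and the density curve
x_limits <- range(h$breaks, d$x)
y_limits <- c(0, max(h$density, d$y))

hist(values, freq = FALSE, xlim = x_limits, ylim = y_limits,
     col = "lightgray", main = "Histogram with density curve")
lines(d, lwd = 2)
```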