Adding Error Bars

Problem

You want to add error bars to a graph.

Solution

Use geom_errorbar() and map variables to ymin and ymax. Error bars are added the same way for bar graphs and line graphs, as shown in Figure 7.14 (notice, though, that the default y range differs between bars and lines):
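
A minimal sketch of both versions, assuming ggplot2 is installed. The data frame ce_mod is a hypothetical stand-in for a one-cultivar subset of the example data: its values and Date labels are invented for illustration, and only the column names Date, Weight, and se follow the text.

```r
library(ggplot2)

# Hypothetical stand-in for a one-cultivar subset of the example data;
# the numbers and Date labels are invented for illustration only.
ce_mod <- data.frame(
  Date   = c("d16", "d20", "d21"),
  Weight = c(3.2, 2.8, 2.7),
  se     = c(0.30, 0.09, 0.31)
)

# Bar graph with error bars
p_bar <- ggplot(ce_mod, aes(x = Date, y = Weight)) +
  geom_col(fill = "white", colour = "black") +
  geom_errorbar(aes(ymin = Weight - se, ymax = Weight + se), width = .2)

# Line graph with error bars; group = 1 connects the points
# across the discrete x values
p_line <- ggplot(ce_mod, aes(x = Date, y = Weight)) +
  geom_line(aes(group = 1)) +
  geom_point(size = 4) +
  geom_errorbar(aes(ymin = Weight - se, ymax = Weight + se), width = .2)

p_bar
p_line
```

The geom_errorbar() call is identical in both plots; only the layers that draw the data itself differ.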

Figure 7.14: Error bars on a bar graph (left); on a line graph (right)

Discussion

In this example, the data already has values for the standard error of the mean (se), which we'll use for the error bars (it also has values for the standard deviation, sd, but we're not using that here):

To get the values for ymax and ymin, we took the y variable, Weight, and added/subtracted se.
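
The arithmetic behind those limits can be sketched in base R. The table below is hypothetical: the values are invented, and the n column (sample size per row) is an assumption added here so that se can be derived from sd using the usual formula sd / sqrt(n).

```r
# Hypothetical summary table using the column names from the text;
# the values, and the n column, are invented for illustration.
ce_mod <- data.frame(
  Date   = c("d16", "d20", "d21"),
  Weight = c(3.2, 2.8, 2.7),
  sd     = c(0.95, 0.28, 0.98),
  n      = c(10, 10, 10)
)

# Standard error of the mean: sd / sqrt(n)
ce_mod$se <- ce_mod$sd / sqrt(ce_mod$n)

# The limits that aes(ymin = Weight - se, ymax = Weight + se)
# evaluates for each row
ce_mod$ymin <- ce_mod$Weight - ce_mod$se
ce_mod$ymax <- ce_mod$Weight + ce_mod$se

print(ce_mod)
```

Writing the expressions inside aes() avoids storing ymin and ymax as extra columns, but precomputing them as above works equally well if you map ymin = ymin and ymax = ymax.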

We also specified the width of the ends of the error bars, with width = .2. It's best to experiment with this value to find one that looks good. If you don't set the width, the error bars will be very wide, spanning all the space between items on the x-axis.
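
One way to compare a few widths side by side is sketched below, assuming ggplot2 is installed. The data values are invented, and with_caps() is a hypothetical helper written for this illustration, not a ggplot2 function.

```r
library(ggplot2)

# Invented values standing in for the example data
ce_mod <- data.frame(
  Date   = c("d16", "d20", "d21"),
  Weight = c(3.2, 2.8, 2.7),
  se     = c(0.30, 0.09, 0.31)
)

p_base <- ggplot(ce_mod, aes(x = Date, y = Weight)) +
  geom_col(fill = "white", colour = "black")

# Hypothetical helper: add error bars whose end caps have width w
with_caps <- function(w) {
  p_base +
    geom_errorbar(aes(ymin = Weight - se, ymax = Weight + se), width = w)
}

p_narrow <- with_caps(0.1)
p_medium <- with_caps(0.2)
p_wide   <- with_caps(0.5)

p_narrow
p_medium
p_wide
```

The width is measured in x-axis units, so on a discrete axis, where items are one unit apart, a value of .2 makes each cap span a fifth of the distance between neighbouring items.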

For a bar graph with groups of bars, the error bars must also be dodged; otherwise, they'll have the exact same x coordinate and won't line up with the bars. (See Recipe 3.2 for more information about grouped bars and dodging.)

We'll work with the full cabbage_exp data set this time:

The default dodge width for geom_bar() is 0.9, and you'll have to tell the error bars to be dodged by the same width. If you don't specify the dodge width, it will default to dodging by the width of the error bars, which is usually less than the width of the bars (Figure 7.15):

    # Bad: dodge width not specified
    ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
      geom_col(position = "dodge") +
      geom_errorbar(aes(ymin = Weight - se, ymax = Weight + se),
                    position = "dodge", width = .2)

    # Good: dodge width set to same as bar width (0.9)
    ggplot(cabbage_exp, aes(x = Date, y = Weight, fill = Cultivar)) +
      geom_col(position = "dodge") +
      geom_errorbar(aes(ymin = Weight - se, ymax = Weight + se),
                    position = position_dodge(0.9), width = .2)

Figure 7.15: Error bars on a grouped bar graph without dodging width specified (left); with dodging width specified (right)

Note

Notice that we used position = "dodge", which is shorthand for position = position_dodge(), in the first version. But to pass a specific value, we have to spell it out, as in position_dodge(0.9).
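
The difference between the two forms can be inspected directly. This is a small sketch, assuming ggplot2 is installed; reading the width field of the returned position object is used here purely for illustration.

```r
library(ggplot2)

pd_default <- position_dodge()     # what position = "dodge" stands for
pd_bars    <- position_dodge(0.9)  # explicit dodge width matching the bars

# The first argument of position_dodge() is the dodge width
pd_default$width  # no width stored, so it is worked out from the layer
pd_bars$width     # the value passed in above
```

Saving the position object in a variable, as above, is also a convenient way to reuse the same dodge spec across several layers.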

For line graphs, if the error bars are a different color than the lines and points, you should draw the error bars first, so that they are underneath the points and lines. Otherwise the error bars will be drawn on top of the points and lines, which won't look right.

Additionally, you should dodge all the geometric elements so that they will align with the error bars, as shown in Figure 7.16:

    pd <- position_dodge(.3)  # Save the dodge spec because we use it repeatedly

    ggplot(cabbage_exp, aes(x = Date, y = Weight, colour = Cultivar, group = Cultivar)) +
      geom_errorbar(
        aes(ymin = Weight - se, ymax = Weight + se),
        width = .2,
        size = 0.25,
        colour = "black",
        position = pd
      ) +
      geom_line(position = pd) +
      geom_point(position = pd, size = 2.5)

    # Thinner error bar lines with size = 0.25, and larger points with size = 2.5

Figure 7.16: Error bars on a line graph, dodged so they don't overlap

Notice that we set colour = "black" to make the error bars black; otherwise, they would inherit the colour mapped to Cultivar. We also made sure that Cultivar was used as a grouping variable by mapping it to group.

When a discrete variable is mapped to an aesthetic like colour or fill (as in the case of the bars), that variable is used for grouping the data. But by setting the colour of the error bars, we made it so that the variable for colour was not used for grouping, and we needed some other way to inform ggplot that the two data entries at each x were in different groups so that they would be dodged.
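
The effect on grouping can be checked by inspecting the data ggplot builds for the error bar layer. The sketch below assumes ggplot2 is installed and uses a hypothetical data frame, cabbage_like, with invented values and placeholder Cultivar and Date labels standing in for the example data.

```r
library(ggplot2)

# Invented values standing in for the example data:
# two cultivars measured at each of three dates
cabbage_like <- data.frame(
  Cultivar = rep(c("c39", "c52"), each = 3),
  Date     = rep(c("d16", "d20", "d21"), times = 2),
  Weight   = c(3.2, 2.8, 2.7, 2.3, 3.1, 1.5),
  se       = c(0.30, 0.09, 0.31, 0.14, 0.25, 0.07)
)

pd <- position_dodge(.3)

# colour is mapped to Cultivar but overridden in the error bar layer,
# and no group is given
p_no_group <- ggplot(cabbage_like, aes(x = Date, y = Weight, colour = Cultivar)) +
  geom_errorbar(aes(ymin = Weight - se, ymax = Weight + se),
                width = .2, colour = "black", position = pd)

# Same plot, with Cultivar also mapped to group
p_group <- ggplot(cabbage_like,
                  aes(x = Date, y = Weight, colour = Cultivar, group = Cultivar)) +
  geom_errorbar(aes(ymin = Weight - se, ymax = Weight + se),
                width = .2, colour = "black", position = pd)

# Compare the group ids and x positions assigned to the error bars
layer_data(p_no_group, 1)[, c("x", "group")]
layer_data(p_group, 1)[, c("x", "group")]
```

In the second plot the two entries at each date belong to different groups, so their x positions are shifted apart; in the first they share a group and sit at the same x.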

See Also

See Recipe 3.2 for more about creating grouped bar graphs, and Recipe 4.3 for more about creating line graphs with multiple lines.

See Recipe 15.18 for calculating summaries with means, standard deviations, standard errors, and confidence intervals.

See Recipe 4.9 for adding a confidence region when the data has a higher density along the x-axis.