Thursday 26 June 2014

ggpie: pie graphs in ggplot2

How does one make a pie chart in R using ggplot2? (If you're impatient: see the final code here).

I know, I know, pie charts are often not very good ways of displaying data. It is hard to visually compare the relative sizes of slices (particularly the smaller ones and if they are scattered with larger ones in between), and you get no sense of scale.

However, sometimes a good ol' pie chart can convey your point in a way that many of the general public will find easy to understand. For example, this is how I've spent my "work" hours this week so far:

(I manually inserted the line break into "marking exams" so it wouldn't get cut off - not sure how to automatically do this).

Terrible, I know. But the plot conveys quite clearly the point I want to make: I wasted away almost half (!) of my week so far playing nethack when I should have been doing my PhD, which is much more time than I have spent on my PhD itself!

(Aside: nethack is a most excellent ASCII game that can be obtained here, or from your repositories on most Linux systems. In my defence, during the month of June the Junethack tournament is run and clan overcaffeinated (me) is in a deadly battle with clan demilichens to win the "most unique deaths" trophy. And this is the last week of June).

How to do it

My data looks like this (I've been using hamster time tracker, though I keep forgetting to track things.):

How I've spent my PhD hours this week
activity time
Nethack 15.2
PhD 7.4
Marking exams 4
Meetings 2.3
Lunch 2
Writing this post 1.5

My first attempt at building a pie chart of this data follows the ggplot2 documentation for coord_polar and this excellent post on r-chart. There are also a number of relevant questions on StackOverflow.

First we construct a stacked bar chart, coloured by the activity type (fill=activity). Note that x=1 is a dummy variable purely so that ggplot has an x variable to plot by (we will remove the label later).

(Aside: if x is a factor (e.g. factor("dummy")) for some reason the bar does not get full width, and the resulting pie chart has a funny little hole in the middle).

p <- ggplot(df, aes(x=1, y=time, fill=activity)) +
        geom_bar(stat="identity") +
        ggtitle("How I've spent my PhD hours this week")
print(p)

Then we use coord_polar() to turn it into a pie chart:

p <- p + coord_polar(theta='y')
print(p)

(Aside: does anyone find it weird that although my x axis was 1 and my y axis was time, on the resultant graph the x axis is now 'time' and the y axis is now 1?)

Let's tweak this so it doesn't look so bad.

First, some pure aesthetics: let's outline each slice of the pie in black using color='black' in geom_bar(). This causes there to be an ugly black line and outline on each square of the legend, so we remove that.

# We have to start the plot again because `color='black'` goes into geom_bar
p <- ggplot(df, aes(x=1, y=time, fill=activity)) +
        ggtitle("How I've spent my PhD hours this week") +
        coord_polar(theta='y')
p <- p +
        # black border around pie slices
        geom_bar(stat="identity", color='black') +
        # remove black diagonal line from legend
        guides(fill=guide_legend(override.aes=list(colour=NA)))
print(p)

Now, remove the axes:

  • both the axis labels ("time", "1"),
  • the tick marks on the vertical axis,
  • the tick labels on the vertical axis (0.75, 1.00, 1.25).

Note - for some reason, although 1 was our x variable, you remove its tick label by setting axis.text.y...

p <- p +
    theme(axis.ticks=element_blank(),  # the axis ticks
          axis.title=element_blank(),  # the axis labels
          axis.text.y=element_blank()) # the 0.75, 1.00, 1.25 labels.
print(p)

Now, I should also remove the '0', '5', '10', '15' tick marks/labels, being the cumulative number of hours spent so far, since they're not meaningful.

However, I want to label each slice of the pie, and it is convenient to put my labels in place of the '0', '5', etc.

In terms of their position, they should be located at the midpoint of each pie slice. Think back to the stacked bar chart I produced at the start, and recall that the y axis (time) shows cumulative hours spent.

This means the y coordinate of the end of each slice is given by cumsum(df$time) (cumulative sum of time spent so far), and so the coordinate of the midpoint of each slice is given by:

y.breaks <- cumsum(df$time) - df$time/2
y.breaks
## [1]  7.60 18.90 24.60 27.75 29.90 31.65

In order to implement this in the pie chart, we use scale_y_continuous with the breaks argument being the coordinates we've just calculated, and the labels argument being the activity name.

p <- p +
    # prettiness: make the labels black
    theme(axis.text.x=element_text(color='black')) +
    scale_y_continuous(
        breaks=y.breaks,   # where to place the labels
        labels=df$activity # the labels
    )
print(p)

Altogether now:

p <- ggplot(df, aes(x=1, y=time, fill=activity)) +
        ggtitle("How I've spent my PhD hours this week") +
        # black border around pie slices
        geom_bar(stat="identity", color='black') +
        # remove black diagonal line from legend
        guides(fill=guide_legend(override.aes=list(colour=NA))) +
        # polar coordinates
        coord_polar(theta='y') +
        # label aesthetics
        theme(axis.ticks=element_blank(),  # the axis ticks
              axis.title=element_blank(),  # the axis labels
              axis.text.y=element_blank(), # the 0.75, 1.00, 1.25 labels
              axis.text.x=element_text(color='black')) +
        # pie slice labels
        scale_y_continuous(
            breaks=cumsum(df$time) - df$time/2,
            labels=df$activity
        )

TL;DR

I've tied it together into a function ggpie:

library(ggplot2)
# ggpie: draws a pie chart.
# give it:
# * `dat`: your dataframe
# * `by` {character}: the name of the fill column (factor)
# * `totals` {character}: the name of the column that tracks
#    the time spent per level of `by` (percentages work too).
# returns: a plot object.
ggpie <- function (dat, by, totals) {
    ggplot(dat, aes_string(x=factor(1), y=totals, fill=by)) +
        geom_bar(stat='identity', color='black') +
        guides(fill=guide_legend(override.aes=list(colour=NA))) + # removes black borders from legend
        coord_polar(theta='y') +
        theme(axis.ticks=element_blank(),
            axis.text.y=element_blank(),
            axis.text.x=element_text(colour='black'),
            axis.title=element_blank()) +
    scale_y_continuous(breaks=cumsum(dat[[totals]]) - dat[[totals]] / 2, labels=dat[[by]])    
}

For example:

library(grid) # for `unit`
ggpie(df, by='activity', totals='time') +
    ggtitle("A fun but wasteful week.") +
    theme(axis.ticks.margin=unit(0,"lines"),
          plot.margin=rep(unit(0, "lines"),4))

(Note: for some reason the pie plots have an unreasonably large amount of white space, and the theme(*.margin) settings are an attempt to control that. However, I still get a lot of vertical space that I'm not sure how to compress).

Clearly the labels could do with more work (if they are too long they go out of the plot boundary, or they bump into the pie chart), but not bad for an hour and a half's work :)