Data Storytelling Example

Visuals

Storytelling

2022

Published

April 7, 2022

Code

if(!require(easypackages)){install.packages("easypackages")}
library(easypackages)
packages("tidyverse", "patchwork", "thematic","ggchicklet", "ggtext", 
         prompt = FALSE)

#install.packages("ggchicklet", repos = "https://cinc.rud.is")

Recreating the Storytelling with Data look with ggplot

I found a great video from Storytelling with Data (SWD). In this video, a data storyteller demonstrates how a dataviz that does not demonstrate a clear story can be improved. Let’s take a look at the dataviz but, first, here’s the data.

Code

dat <- tibble(id = 1:19, 
              fulfilled = c(803, 865, 795, 683, 566, 586, 510, 436, 418, 364, 379, 372, 374, 278, 286, 327, 225, 222, 200),
              accuracy = c(86, 80, 84, 82, 86, 80, 80, 93, 88, 87, 85, 85, 83, 94, 86, 78, 89, 88, 91),
              error = c(10, 14, 10, 14, 10, 16, 15, 6, 11, 7, 12, 13, 8, 4, 12, 12, 7, 10, 7),
              null = 100 - accuracy - error) %>% 
  mutate(across(accuracy:null, ~. / 100))
dat

# A tibble: 19 × 5
      id fulfilled accuracy error  null
   <int>     <dbl>    <dbl> <dbl> <dbl>
 1     1       803     0.86  0.1   0.04
 2     2       865     0.8   0.14  0.06
 3     3       795     0.84  0.1   0.06
 4     4       683     0.82  0.14  0.04
 5     5       566     0.86  0.1   0.04
 6     6       586     0.8   0.16  0.04
 7     7       510     0.8   0.15  0.05
 8     8       436     0.93  0.06  0.01
 9     9       418     0.88  0.11  0.01
10    10       364     0.87  0.07  0.06
11    11       379     0.85  0.12  0.03
12    12       372     0.85  0.13  0.02
13    13       374     0.83  0.08  0.09
14    14       278     0.94  0.04  0.02
15    15       286     0.86  0.12  0.02
16    16       327     0.78  0.12  0.1 
17    17       225     0.89  0.07  0.04
18    18       222     0.88  0.1   0.02
19    19       200     0.91  0.07  0.02

This data set contains a lot of accuracy and error rates from different (anonymous) warehouses. Additionally, there are “null rates”. These are likely related to data quality issues. Furthermore, this data set is apparently taken from a client the data storytellers helped. In any case, here is a ggplot2 recreation of the client’s initial plot. Note that the plot does not match exactly but it’s close enough to get the gist.

Code

dat_long <- dat %>% pivot_longer(cols = accuracy:null, names_to = 'type', values_to = 'percent')
head(dat_long)

# A tibble: 6 × 4
     id fulfilled type     percent
  <int>     <dbl> <chr>      <dbl>
1     1       803 accuracy    0.86
2     1       803 error       0.1 
3     1       803 null        0.04
4     2       865 accuracy    0.8 
5     2       865 error       0.14
6     2       865 null        0.06

Code

theme_set(theme_minimal())
dat_long %>% ggplot(aes(id, percent, fill = factor(type, levels = c('null', 'accuracy', 'error')))) +
  geom_col() + labs(title = 'Warehouse Accuracy Rates',
                    x = 'Warehouse ID',
                    y = '% of total orders',
                    fill = element_blank()) +
  scale_y_continuous(labels = ~scales::percent(., accuracy = 1), breaks = seq(0, 1, 0.1))

As it is right know, the plot shows data. But what is the message of this dataviz? To make the message more explicit, the plot is transformed during the course of the video. Take a look at what story the exact same data can tell.

Code

# out.width = "300px", echo = FALSE}
knitr::include_graphics("story1.png")

From reading the SWD book, I know that some of the techniques that were used in this picture can be used in many settings. Therefore, I decided to document the steps I took to recreate the dataviz with ggplot.

I tried to make this documentation as accessible as possible. Consequently, if you are already quite familiar with how to customize a ggplot’s details, then some of the explanations or references may be superfluous. Feel free to skip them. That being said, let’s transform the plot.

Flip the axes for long names

Although it is not really an issue here, warehouses or other places might be more identifiable by a (long) name rather than an ID. To make sure that these names are legible, show them on the y-axes. When I first learned ggplot, there was the layer coord_flip() to do that job for us. Nowadays, though, you can often avoid coord_flip() because a lot of geoms already understand what you mean, when you map categorical data to the y-aesthetic. But make sure that ggplot will know that you mean categorical data (especially if the labels are numerical like here).

Code

categorial_dat <- dat_long %>% mutate(id = as.character(id))

categorial_dat %>% ggplot(aes(x = percent, y = id)) +
  geom_col(aes(group = factor(type, levels = c('error', 'null', 'accuracy'))),
           col = 'white') # set color to distinguish bars better)

Notice that I used the group- instead of fill-aesthetic because I only need grouping. Also, it is always a good idea to avoid excessive use of colors. This will allow us to emphasize parts of our story with colors later on.

Add reference points

Another good idea it to put your data into perspective. To do so, include a reference point. This can be a summary statistic like the average error rate. For more great demonstration of reference points you can also check out the evolution of a ggplot by Cédric Scherer.

Code

averages <- dat_long %>% group_by(type) %>% 
  summarise(percent = mean(percent)) %>% mutate(id = 'ALL')

Code

dat_with_summary <- categorial_dat %>% bind_rows(averages)
head(dat_with_summary)

# A tibble: 6 × 4
  id    fulfilled type     percent
  <chr>     <dbl> <chr>      <dbl>
1 1           803 accuracy    0.86
2 1           803 error       0.1 
3 1           803 null        0.04
4 2           865 accuracy    0.8 
5 2           865 error       0.14
6 2           865 null        0.06

Code

dat_with_summary %>% ggplot(aes(x = percent, y = id)) +
  geom_col(aes(group = factor(type, levels = c('error', 'null', 'accuracy'))),
           col = 'white')

Order your data

To allow your reader to gain a quick overview, put your data into some form of sensible ordering. This eases the burden of having to make sense of what the visual shows. Also, notice that we already did part of that. See, with the order of the levels in the group aesthetic, we influenced the ordering of the stacked bars. Here, we made sure that important quantities start at the left resp. right edges.

Why is that helpful, you ask? Well, the bars that start on the left all start at the same reference point. Therefore comparisons are quite easy for these bars. The same holds true for the right edge. Consequently, it is best that we reserve these vip seats for the important data. Check out what happens if I were to put the accuracy in the middle.

Code

dat_with_summary %>% ggplot(aes(x = percent, y = id)) +
  geom_col(aes(group = factor(type, levels = c('error', 'accuracy', 'null'))),
           col = 'white')

Now, we can’t really make out which warehouses have a higher accuracy. Given that the accuracy is likely something we care about, this is bad. But we can change the order even more. For instance, we can also order the bars by error rate. Here, fct_reorder() is our friend.

Code

ordered_dat <- dat_with_summary %>% 
  mutate(type = factor(type, levels = c('error', 'null', 'accuracy')),
         id = fct_reorder(id, percent, .desc = T)) 
head(ordered_dat)

# A tibble: 6 × 4
  id    fulfilled type     percent
  <fct>     <dbl> <fct>      <dbl>
1 1           803 accuracy    0.86
2 1           803 error       0.1 
3 1           803 null        0.04
4 2           865 accuracy    0.8 
5 2           865 error       0.14
6 2           865 null        0.06

Code

ordered_dat %>% ggplot(aes(x = percent, y = id)) + geom_col(aes(group = type),
                                                            col = 'white')

Highlight your story points

Next, it’s time to highlight your story points. This can be done with the gghighlight. Alternatively, we can set the colors manually. The latter approach gave me the best results in this case, so we’ll go with that. But I am still a big fan of gghighlight, so don’t discard its power just yet.

Code

# Set colors as variable for easy change later
unhighlighted_color <- 'grey80'
highlighted_color <- '#E69F00'
avg_error <- 'black'
avg_rest <- 'grey40'

# Compute new column with colors of each bar
colored_dat <- ordered_dat %>% 
  mutate(custom_colors = case_when(
    id == 'ALL' & type == 'error' ~ avg_error,
    id == 'ALL' ~ avg_rest,
    type == 'error' & percent > 0.1 ~ highlighted_color,
    T ~  unhighlighted_color))

head(colored_dat)

# A tibble: 6 × 5
  id    fulfilled type     percent custom_colors
  <fct>     <dbl> <fct>      <dbl> <chr>        
1 1           803 accuracy    0.86 grey80       
2 1           803 error       0.1  grey80       
3 1           803 null        0.04 grey80       
4 2           865 accuracy    0.8  grey80       
5 2           865 error       0.14 #E69F00      
6 2           865 null        0.06 grey80

Code

p <- colored_dat %>% 
  ggplot(aes(x = percent, y = id)) + geom_col(aes(group = type),
                                              col = 'white',
                                              fill = colored_dat$custom_colors)
p

Notice how your eyes are immediately drawn to the intended region. That’s the power of colors! Also, note that setting the colors manually like this worked because fill in geom_col() is vectorized. This is not always the case.

Remove axes expansion and allow drawing outside of grid

Did you notice that there is still some clutter in the plot? Removing clutter from a plot is a central element of the SWD look. Personally, I like this approach. So, let’s get down to the essentials and remove what does not need to be there. In this case, there are still (faint) horizontal lines behind each bar. Furthermore, this causes the warehouse IDs to be slightly removed from the bars. We change that through formatting the coordinate system with coord_cartesian().

Code

p <- p +
  coord_cartesian(xlim = c(0, 1), ylim = c(0.5, 20.5), 
                  expand = F, # removes white spaces at edge of plot
                  clip = 'off') # allows drawing outside of panel
p

Here, we turned off the expansion to avoid wasting white space. Now, the IDs are at their designated place and we do not see lines from their names to the bars anymore. If you want even more power on the space expansion you can leave expand = T and modify the expansion for each axis with scale_*_continuous() and the expansion() function. Check out Christian Burkhart’s neat cheatsheet that teaches you everything you need to understand expansions.

On an unrelated note, you may wonder why I set clip = 'off'. This little secret will be revealed soon. For now, just know that it allows you to draw geoms outside the regular panel.

Move and format axes

You may have noticed that the x-axis in the finished plot is at the top of the panel rather than at the bottom. While that is unusual, it helps the reader to get straight to the point as the data is in view earlier. This assumes that the eyes of a typical dataviz reader will first look at the top left corner and then zigzag downwards.

In ggplot2, moving the axes and setting the break points happens in a scale layer. It is here where we use the scales::percent() function to transform the axes labels. Additionally, changing labels happens in labs() and the remaining axes and text changes happen in theme().

Code

unhighlighed_col_darker <- 'grey60'

p <- p + scale_x_continuous(breaks = seq(0, 1, 0.2),
                            labels = scales::percent,
                            position = 'top') +
  labs(title = 'Accuracy rates for highest volume warehouses',
       y = 'WAREHOUSE ID',
       x = '% OF TOTAL ORDERS FULFILLED') +
  theme(axis.line.x = element_line(colour = unhighlighed_col_darker),
        axis.ticks.x = element_line(colour = unhighlighed_col_darker),
        axis.text = element_text(colour = unhighlighed_col_darker),
        text = element_text(colour = unhighlighed_col_darker),
        plot.title = element_text(colour = 'black'))
p

Notice that we have customized the theme elements via element_*() functions. Basically, each geom type like “line”, “rect”, “text”, etc. has their own element_*() function. The theme() function expects attributes to be changed using these.

Align labels

Aligning plot elements, e.g. labels, to form clean lines is another major aspect of the SWD look. Before I read about it, I did not even notice it but once you see it you cannot go back. Basically, plots feel “more harmonious” if there are clear (not necessarily drawn) lines like with the left and right edge of the stacked bars. But this concept does not stop with the bars and can be used for the labels too. Let’s demonstrate that by moving the labels with more of theme().

Code

p <- p + theme(axis.title.x = element_text(hjust = 0),
               axis.title.y = element_text(hjust = 1),
               plot.title.position = 'plot')
    # aligns the title to the whole plot and not the (inner) panel
p

Once again, the design enforces that important information like what’s on an axis is in the top left corner. This was done by changing hjust. In this case hjust = 0 corresponds to left-justified whereas hjust = 1 corresponds to right-justified. Of course, vjust works similarly. For more details w.r.t. hjust and vjust, check out this stackoverflow answer that gives you everything that you need in one visual. For your convenience, here is a slightly changed form of that visual.

Code

# out.width = "300px", echo = FALSE}
knitr::include_graphics("story2.png")

But once you start aligning the axes titles, you notice that the 0% and 100% labels fall outside the grid. We could try to set hjust of axis.text.x in theme() but sadly this is not vectorized. Subsequently, all hjust values must be the same. That’s not bueno. Therefore, I drew the axes labels manually with annotate() but make sure that you remove the current labels in scale_x_continuous(). Also, now you know why we had to set clip = 'off' earlier. The axes labels are outside of the regular panel.

Code

p <- p +
  # Overwriting previous scale will generate a warning but that's ok
  scale_x_continuous(breaks = seq(0, 1, 0.2),
                     # We still want the axes ticks
                     labels = rep('', 6), 
                     # Empty strings as labels
                     position = 'top') +
  annotate('text', x = seq(0, 1, 0.2), y = 20.75,
           label = scales::percent(seq(0, 1, 0.2), accuracy = 1),
           size = 3,
           hjust = c(0, rep(0.5, 4), 1),
           # individual hjust here
           vjust = 0, 
           col = unhighlighed_col_darker) +
  theme(axis.title.x = element_text(hjust = 0, vjust = 0))

Scale for x is already present.
Adding another scale for x, which will replace the existing scale.

Code

        # change vjust to avoid overplotting
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
p

Add text labels

The same trick can be used to add the category description (accuracy, null, error) to the right top corner and label the highlighted bars. For the latter part, we simply extract the corresponding rows from our data and use that in conjunction with geom_text().

Code

text_labels <- colored_dat %>% 
  filter(type == 'error', percent > 0.1) %>% 
  mutate(percent = scales::percent(percent, accuracy = 1))

p <- p + geom_text(data = text_labels, 
                   aes(x = 1, label = percent), 
                   hjust = 1.1,
                   col = 'white',
                   size = 4)
p

Notice that I used a hjust value greater than 1 here to add some white space on the right side of the labels. Otherwise, the percent sign will be too close to the bar’s edge.

Next, we add the category descriptions. This is a bit more tricky, though, because we want to highlight a word too, So, we will add a richtext.

Code

library(ggtext)
p <- p + 
  annotate('richtext',
           x = 1,
           y = 21.25,
           label = "ACCURATE | NULL | <span style = 'color:#E69F00'>ERROR</span>",
           hjust = 1,
           vjust = 0, 
           col = unhighlighed_col_darker, 
           size = 4,
           label.colour = NA,
           fill = NA)
p

Add story text

Now that the bar plot is finished we can work on the story text. For that, we create another plot that contains only the text. Later on, we will combine both of our plots with the patchwork package. There are no really knew techniques here, so let’s get straight to the code.

Code

# Save text data in a tibble
tib_summary_text <- tibble(x = 0, 
                           y = c(1.65, 0.5), 
                           label = c("<span style = 'color:grey60'>OVERALL:</span> **The error rate is 10% across all<br>66 warehouses**. <span style = 'color:grey60'>The good news is that<br>the accuracy rate is 85% so we\'re hitting<br>the mark in nearly all our centers due to<br>the quality initiatives implemented last year.</span>",
  "<span style = 'color:#E69F00'>OPPORTUNITY TO IMPROVE:</span> <span style = 'color:grey60'>10 centers<br>have higher than average error rates of<br>10%-16%.</span> <span style = 'color:#E69F00'>We recommend investigating<br>specific details and **scheduling meetings<br>with operations managers to<br>determine what's driving this.**</span>"))

# Create text plot with geom_richtext() and theme_void()
text_plot <- tib_summary_text %>% 
  ggplot() +  geom_richtext(aes(x, y, label = label),
                            size = 3,
                            hjust = 0,
                            vjust = 0,
                            label.colour = NA) +
  coord_cartesian(xlim = c(0, 1), ylim = c(0, 2), clip = 'off') +
  # clip = 'off' is important for putting it together later.
  theme_void()

text_plot

Add main message as new title and subtitle

As I said before, we will put the two plots together with patchwork. Putting the plots together gives us another opportunity: We can now set additional titles and subtitles of the whole plot. Use these to add the main message of your plot.

But make sure that there is enough white space around them by setting the title margins in theme(). Otherwise, your plot will feel “too full”. Adding spacing is achieved through a margin() function in element_text(). Though, in this case we use element_markdown() which works exactly the same but enables Markdown syntax like using asterisks for bold texts.

library(patchwork)

Code

# Save texts as variables for better code legibility
# Here I used Markdown syntax
# To enable its rendering, use element_markdown() in theme
title_text <- "**Action needed:** 10 warehouses have <span style = 'color:#E69F00'>high error rates</span>"
subtitle_text <- "<span style = 'color:#E69F00'>DISCUSS:</span> what are <span style = 'color:#E69F00'>**next steps to improve errors**</span> at highest volume warehouses?<br><span style = 'font-size:10pt;color:grey60'>The subset of centers shown (19 out of 66) have the highest volume of orders fulfilled</span>"
caption_text <- "Wasn't this fun?"

# Compose plot
p + text_plot +
  # Make text plot narrower
  plot_layout(widths = c(0.6, 0.4)) +
  # Add main message via title and subtitle
  plot_annotation(title = title_text,
                  subtitle = subtitle_text,
                  caption = caption_text,
                  theme = theme(plot.title = element_markdown(
                    margin = margin(b = 0.4, unit = 'cm'),
                    # 0.4cm margin at bottom of title
                    size = 16),
                    plot.subtitle = element_markdown(
                      margin = margin(b = 0.4, 
                                      unit = 'cm'),
                      # 0.4cm margin at bottom of title
                      size = 11.5),
                    plot.caption.position = 'plot',
                    plot.caption = element_markdown(
                      hjust = 0, 
                      size = 7, 
                      #ColoUr = unhighlighed_col_darker, 
                      lineheight = 1.25),
                    plot.background = element_rect(fill = 'white', colour = NA)))

Code

# This is only a trick to make sure that background really is white
# Otherwise, some browsers or photo apps will apply a dark mode

Get the sizes right

In the last plot, I cheated. I gave you the correct code I used to generate the picture. But I did not execute it. Instead, I only displayed the code and then showed you the (imported) picture from the start of this blog post. Why did I do this? Because getting the sizes right sucks!

If you have dealt with ggplot enough, then you will know that text sizes are often set in absolute rather than in relative terms. Therefore, if you make the bar plot smaller in width (like we did), then the bars may be appropriately scaled to the new width but, more often than not, the texts are not. In this case, this led to way too large fonts as beautifully demonstrated in Christophe Nicault’s helpful blog post.

So, how do you avoid this? First off, choose size and fonts last (choose the font first, though). This will save you a lot of repetitive work when you change the alignment in your plot. But this tip will only get you so far, because you have to fix some sizes in between to get a feeling for the visualization you are trying to create.

Therefore, try to get you canvas into an appropriate size first. I try to do this by using the camcorder package at the start of my visualization process. This will ensure that my plots are saved as a png-file with predetermined dimensions and the resulting file is displayed in the Viewer pane in RStudio (as opposed to the Plots pane).

For example, at the start of working on this visualization I have called

Code

camcorder::gg_record(
  dir = 'img', dpi = 300, width = 16, height = 9, units = 'cm')

This made getting the sizes right for my final output somewhat easier because the canvas size remains the same throughout the process. Though be sure to call gg_record() after library(ggtext) or make sure that you call gg_record() again if you add ggtext only later. Otherwise, your plots will revert back to being displayed in the Plots pane (with relative sizing). Finally, if you want to use camcorder in conjunction with showtext, then be sure that showtext will know what dpi value you chose when calling gg_record().

Another Dataviz

This plot comes to you via another excellent entry of the storytelling with data (SWD) blog. To draw rectangles with rounded rectangles we can leverage the ggchicklet package. Though, for some mysterious reason, the geom_* that we need is hidden in that package. Therefore, we will have to dig it out. That’s the easy way to do it. And honestly, this is probably also the practical way to do it.

However, every now and then I want to do things the hard way. So, my dear reader, this is why I will also show you how to go from rectangles to rounded rectangles the hard way. But only after showing you the easy way first, of course. Only then, in the second part of this blog post, will I take the sadistically-inclined among you on a tour to the world of grobs.

Grobs, you say? Is that an instrument? No, Patrick, it is an graphical object. Under the hood, we can transform a ggplot to a list of graphical objects. And with a few hacks, we can adjust that list. This way, the list will contain not rectGrobs but roundrectGrobs. Then, we can put everything back together, close the hood and enjoy our round rectangles. Now, enough intro, let’s go.

Basic plot

First, let us recreate the “bad” plot that the above SWD blog post remodels. In the end, we will work on the remodeled data viz too.

Code

# out.width = "300px", echo = FALSE}
knitr::include_graphics("story3.png")

Read Data

The underlying data was not available to I had to guess the values from the plot.

Code

# Use read_csv2 because it's an European file
dat <- read.csv('story_ratios.csv', sep=";")
head(dat)

                 location inventory_turnover store_upper store_lower
1              Basin City               12.7         8.0        4.50
2 King's Landing Location                4.4         7.6        4.30
3                 Hilwood               10.0        15.0        6.00
4           Shermer Store                6.8         4.4        2.50
5             The Citadel                7.4         5.5        3.00
6               Sunnydale                8.0         5.1        1.25

Compute Averages

Let me point out that taking the average of the ratios may not necessarily give an appropriate result (in a statistical kind of sense). But, once again, this should not bother us as we only want to learn how to plot.

Code

avgs <- dat %>% pivot_longer(cols = -1, names_to = 'type', values_to = 'ratio') 
head(avgs)

# A tibble: 6 × 3
  location                type               ratio
  <chr>                   <chr>              <dbl>
1 Basin City              inventory_turnover  12.7
2 Basin City              store_upper          8  
3 Basin City              store_lower          4.5
4 King's Landing Location inventory_turnover   4.4
5 King's Landing Location store_upper          7.6
6 King's Landing Location store_lower          4.3

Code

avgs <- avgs %>% group_by(type) %>% summarise(ratio = mean(ratio)) %>% 
  mutate(location = 'Region Average')

avgs

# A tibble: 3 × 3
  type               ratio location      
  <chr>              <dbl> <chr>         
1 inventory_turnover  9.78 Region Average
2 store_lower         7.11 Region Average
3 store_upper        12.1  Region Average

Code

### Combine with data 
dat_longer <- dat %>% pivot_longer(cols = -1, names_to = 'type', values_to = 'ratio') 
dat_longer_with_avgs <- dat_longer %>% bind_rows(avgs)

head(dat_longer_with_avgs)

# A tibble: 6 × 3
  location                type               ratio
  <chr>                   <chr>              <dbl>
1 Basin City              inventory_turnover  12.7
2 Basin City              store_upper          8  
3 Basin City              store_lower          4.5
4 King's Landing Location inventory_turnover   4.4
5 King's Landing Location store_upper          7.6
6 King's Landing Location store_lower          4.3

Code

## Colors we will use throughout this blog post
color_palette <- thematic::okabe_ito(8)

# Make sure that bars are in the same order as in the data set
dat_factored <- dat_longer %>% 
  mutate(location = factor(location, levels = dat$location)) 

p <- dat_factored %>% ggplot(aes(location, ratio)) +
  geom_col(data = filter(dat_factored, type == 'inventory_turnover'),
           fill = color_palette[2]) +
  theme_minimal()
p

Turn labels and get rid of axis text

Code

p <- p +
  labs(x = element_blank(), y = element_blank()) +
  theme(axis.text.x = element_text(angle = 50, hjust = 1)  )
p

Remove expansion to get x-labels closer to the bars

Code

p <- p + coord_cartesian(ylim = c(0, 30), expand = F)
p

Remove other grid lines

Code

p <- p + theme(panel.grid.minor = element_blank(),
               panel.grid.major.x = element_blank(),
               panel.grid.major.y = element_line(color = 'black', size = 0.75))

Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.

Code

Format y-axis

Code

p <- p + scale_y_continuous(breaks = seq(0, 30, 5),
                            labels = scales::label_comma(accuracy = 0.1))
p

Add points

Code

p <- p + geom_point(data = filter(dat_factored, type == 'store_lower'),
                    col = color_palette[1],
                    size = 3) +
  geom_point(data = filter(dat_factored, type == 'store_upper'),
             col = color_palette[3],
             size = 3) 
p

Add average lines

Code

p <- p +  geom_hline(yintercept = avgs[[3, 'ratio']], 
                     size = 2.5, 
                     col = color_palette[3]) +
  geom_hline(yintercept = avgs[[2, 'ratio']], 
             size = 2.5, 
             col = color_palette[1])

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

Code

Add text labels

Code

p + geom_text(data = filter(dat_factored, type == 'inventory_turnover'),
              aes(label = scales::comma(ratio, accuarcy = 0.1)),
              nudge_y = 0.8, size = 2.5)

Improved plot

Now, let us begin building the improved plot. First, let us get the long labels onto the y-axis and use regular rectangles before we worry about the rounded rectangles.

Flip axes and use rectangles to show upper and lower bounds.

Unfortunately, geom_rect() does not work as intended.

Code

dat_with_avgs <- dat_longer_with_avgs %>% 
  pivot_wider(names_from = 'type', values_from = 'ratio')
head(dat_with_avgs)

# A tibble: 6 × 4
  location                inventory_turnover store_upper store_lower
  <chr>                                <dbl>       <dbl>       <dbl>
1 Basin City                            12.7         8          4.5 
2 King's Landing Location                4.4         7.6        4.3 
3 Hilwood                               10          15          6   
4 Shermer Store                          6.8         4.4        2.5 
5 The Citadel                            7.4         5.5        3   
6 Sunnydale                              8           5.1        1.25

Code

dat_with_avgs %>% ggplot() + geom_rect(aes(xmin = store_lower, 
                                           xmax = store_upper, 
                                           ymin = location, 
                                           ymax = location))

Instead, let us create a new numeric column containing a location’s rank based on its inventory_turnover. This is done with row_number(). While we’re at it, let us create a new tibble that also contains information on the colors each geom will use. Then, we can map to these new columns in ggplot and make sure that the values are used as is by setting scale_*_identity(). This is one convenient way to control the aesthetics of each geom without functional programming. With the image from above in mind, we know that our final plot will need

different col, fill and size values in geom_point()
different fill and alpha values in geom_rect()

Here’s what this tibble looks like.

Code

bar_height <- 0.4 
no_highlight_col <- 'grey70'
average_highlight_col <- 'grey40'
below_highlight <- color_palette[2]

sorted_dat <- dat_with_avgs %>% 
  mutate(num = row_number(inventory_turnover)) %>% 
  # Sort so that everything is in order of rank
  # Important for text labels later on
  arrange(desc(num)) %>% 
  mutate(rect_color = case_when(
    inventory_turnover < store_lower ~ below_highlight,
    location == 'REGION AVERAGE' ~ average_highlight_col,
    T ~ no_highlight_col),
    
    rect_alpha = if_else(inventory_turnover < store_lower,
                         0.5,
                         1),
    point_color = if_else(inventory_turnover < store_lower,
                          below_highlight,
                          'black'),
    point_fill = if_else(inventory_turnover < store_lower,
                         below_highlight,
                         'white'),
    point_size = if_else(inventory_turnover < store_lower,
                         3,
                         2))
sorted_dat

# A tibble: 24 × 10
   locat…¹ inven…² store…³ store…⁴   num rect_…⁵ rect_…⁶ point…⁷ point…⁸ point…⁹
   <chr>     <dbl>   <dbl>   <dbl> <int> <chr>     <dbl> <chr>   <chr>     <dbl>
 1 Castle…    14.7    24.3    20      24 #009E73     0.5 #009E73 #009E73       3
 2 Wellsv…    13.6     7.6     2.5    23 grey70      1   black   white         2
 3 Basin …    12.7     8       4.5    22 grey70      1   black   white         2
 4 Atlant…    12.3    12.8     7.8    21 grey70      1   black   white         2
 5 Neverl…    12.1    18      13.4    20 #009E73     0.5 #009E73 #009E73       3
 6 Bluffi…    11.7     4.2     3      19 grey70      1   black   white         2
 7 Bikini…    11.6    12.5     7.8    18 grey70      1   black   white         2
 8 Metrop…    11.3    24      11.3    17 grey70      1   black   white         2
 9 Hill V…    11      22       7.5    16 grey70      1   black   white         2
10 Venusv…    10.4    15.5    12.3    15 #009E73     0.5 #009E73 #009E73       3
# … with 14 more rows, and abbreviated variable names ¹location,
#   ²inventory_turnover, ³store_upper, ⁴store_lower, ⁵rect_color, ⁶rect_alpha,
#   ⁷point_color, ⁸point_fill, ⁹point_size

Now, we can create our plot. Notice that I set shape = 21 in geom_point() to use both the fill and col aesthetic.

Code

sorted_dat %>% 
  ggplot() + geom_rect(aes(xmin = store_lower, 
                           xmax = store_upper, 
                           ymin = num - bar_height, 
                           ymax = num + bar_height, 
                           fill = rect_color,
                           alpha = rect_alpha)) +
  geom_point(aes(x = inventory_turnover,
                 y = num,
                 fill = point_fill,
                 col = point_color,
                 size = point_size),
             shape = 21,
             stroke = 1) +
  scale_fill_identity() +
  scale_color_identity() +
  scale_size_identity() +
  scale_alpha_identity() +
  theme_minimal()

Use `ggchicklet` for rounded rectangles

The whole point of this blog post is to use rounded rectangles. So let’s do that. The ggchicklet package has a geom called geom_rrect(). It works just like geom_rect() but accepts another value r which is used to determine the radius of the rounded rectangles. Unfortunately, this geom is not an exported function of this package. This means that if you write ggchicklet:: (e.g. in RStudio) and press TAB you won’t see geom_rrect(). Thus, you have to access the internal function via ::: (three colons).

Code

p <- sorted_dat %>% ggplot() + 
  ggchicklet:::geom_rrect(aes(xmin = store_lower, 
                              xmax = store_upper, 
                              ymin = num - bar_height, 
                              ymax = num + bar_height, 
                              fill = rect_color,
                              alpha = rect_alpha),
                          r = unit(0.5, 'npc')) +
                    # Use relative npc unit (values between 0 and 1)
                    # This ensures that radius is not too large for your canvas

  geom_point(aes(x = inventory_turnover,
                 y = num,
                 fill = point_fill,
                 col = point_color,
                 size = point_size),
             shape = 21,
             stroke = 1) +
  scale_fill_identity() +
  scale_color_identity() +
  scale_size_identity() +
  scale_alpha_identity() +
  theme_minimal()
p

Remove grid lines, move axis and add some text elements

We will set the y-axis labels manually later on. Otherwise, we cannot change its colors one-by-one. For now, let’s get rid of superfluous grid lines, move the x-axis and add a title.

Notice that I draw the axis line manually with a segment annotation. This seems weird, I know. Unfortunately, it cannot be helped because I still need room for the y-axis labels. And if I do not plot the axis line manually, then the axis line will start all the way to the left. Make sure that you set clip = 'off' in coord_cartesian() for the annotation to be displayed.

Code

title_lab <- 'Review stores with turnover ratios that are below their\nforecasted range'
title_size <- 14
axis_label_size <- 8
text_size <- 18

p <- p + scale_x_continuous(breaks = seq(0, 25, 5),
                            position = 'top') +
  coord_cartesian(xlim = c(-5, 25), 
                  ylim = c(0.75, 24.75),  
                  expand = F,
                  clip = 'off') +
  annotate('segment',
           x = 0,
           xend = 25,
           y = 24.75,
           yend = 24.75,
           col = no_highlight_col,
           size = 0.25) +
  labs(x = 'Inventory Turnover Ratio',
       y = element_blank(),
       title = title_lab) +
  theme(text = element_text(size = text_size,
                            color = average_highlight_col),
        plot.title.position = 'plot',
        panel.grid = element_blank(),
        axis.title.x = element_text(size = axis_label_size,
                                    hjust = 0.21,
                                    color = no_highlight_col),
        axis.text.x = element_text(size = axis_label_size,
                                   color = no_highlight_col),
        axis.ticks.x = element_line(color = no_highlight_col, size = 0.25),
        axis.text.y = element_blank(),
        axis.line.x = element_blank())
p

Add y-axis labels

Code

y_axis_text_size <- 3

p + geom_text(aes(x = 0,
                  y = num,
                  label = location,
                  col = no_highlight_col,
                  hjust = 1,
                  size = y_axis_text_size))

Highlight words

Let us turn to text highlights. For that we will need ggtext. This will let us use geom_richtext() instead of geom_text(). Notice that I have note saved the last geom_text() modification in p. Otherwise, we would get two overlapping layers of text.

Code

sorted_dat_with_new_labels <- sorted_dat %>% 
  mutate(location_label = case_when(
    inventory_turnover < store_lower ~ glue::glue(
      '<span style = "color:{below_highlight}">**{location}**</span>'),
    location == 'REGION AVERAGE' ~ glue::glue(
      '<span style = "color:{average_highlight_col}">**{location}**</span>'),
    T ~ location))

p <- p + geom_richtext(data = sorted_dat_with_new_labels,
                       aes(x = 0,
                           y = num,
                           label = location_label,
                           col = no_highlight_col,
                           hjust = 1,
                           size = y_axis_text_size),
                       label.color = NA,
                       fill = NA)
p

Fantastic! Next, we only have to highlight words in our call to action. Make sure that plot.title in theme() is an element_markdown().

Code

title_lab_adjusted <- glue::glue(
  "Review stores with **turnover ratios** that are <span style = 'color:{below_highlight}'>below their</span><br><span style = 'color:#7fceb9;'>**forecasted range**</span>")

p +
  labs(title = title_lab_adjusted) + theme(plot.title = element_markdown(),
    panel.background = element_rect(color = NA, fill = 'white'))

Recreating the Storytelling with Data look with ggplot

Flip the axes for long names

Add reference points

Order your data

Highlight your story points

Remove axes expansion and allow drawing outside of grid

Move and format axes

Align labels

Add text labels

Add story text

Another Dataviz

Basic plot

Read Data

Compute Averages

Improved plot

Flip axes and use rectangles to show upper and lower bounds.

Use ggchicklet for rounded rectangles

Remove grid lines, move axis and add some text elements

Highlight words

Use `ggchicklet` for rounded rectangles