--- title: "catmaply" author: "Yves Mauron" date: "`r Sys.Date()`" output: html_document: fig_width: 10 fig_height: 9 self_contained: yes toc: yes pandoc_args: - +RTS - "-K128m" - "-RTS" pdf_document: toc: yes --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Introduction A heatmap is a graphical representation of data that uses a system of color-coding to represent different values. Heatmaps are used in various forms of analytics, however, this R package specifically focuses on providing an efficient way for creating interactive heatmaps for categorical data or continuous data that can be grouped into categories. This package is originally being developed for Verkehrsbetriebe Zürich (VBZ), the public transport operator in the Swiss city of Zurich, to illustrate the utilization of different routes and vehicles during different times of the day. Therefore, it groups utilization data (e.g. persons per m^2) into different categories (e.g. low, medium, high utilization) and illustrates it for certain stops over time in a heatmap. This package can easily be integrated into a shiny dashboard which supports additional interactions with other plots (e.g. boxplot, histogram, forecast) by using plotly events. A mini-demo app is provided in a separate github repository named [catmaply_shiny](https://github.com/yvesmauron/catmaply_shiny/). This work is based on the plotly.js engine. ### Please submit feature requests This package is still under active development. If you have features you would like to have added, please submit your suggestions (and bug-reports) at: ### News You can see the most recent changes of the package in [NEWS.md](https://github.com/VerkehrsbetriebeZuerich/catmaply/blob/master/NEWS.md). ## Installation To install the latest ("cutting-edge") GitHub version run: ```{r} # make sure that you have the corrent RTools installed. # as you might need to build some packages from source # if you don't have RTools installed, you can install it with: # install.packages('installr'); install.Rtools() # not tested on windows # or download it from here: # https://cran.r-project.org/bin/windows/Rtools/ # in any case, make sure that you select the correct version, # otherwise the installation will fail. # then you'll need devtools # if (!require('devtools')) # install.packages('devtools') # finally install the package # devtools::install_github('VerkehrsbetriebeZuerich/catmaply') ``` To get the latest version on CRAN, perform: ```{r} #install.packages("catmaply") ``` Thereafter, you can start using the package as usual: ```{r} library(catmaply) library(dplyr) ``` ## Usage Catmaply provides data of VBZ to easily start experimenting with the package. For demonstration purposes and simplicity, we will use it in this notebook. As usual, you can access the data as follows. ```{r out.width='100%'} data("vbz") df <- na.omit(vbz[[1]]) %>% filter(.data$vehicle == "PO") str(df) ``` The main columns of the `vbz` data.frame can be described as follows: - `trip_seq` shows the order of the trips - `stop_name` shows the names of the stops, that need to be ordered by `stop_seq` - `occ_category` shows the category of the data point (e.g. 1 - very few people in the bus, 5 - bus is full) - `occupancy` is e.g. the number of people per m^2. ### Default behaviour Catmaply expects at least arguments for both axis (`x`, `y`) and the fields (`z`). To visualize the occupancy for all stops and trips, we can put the `stop_seq` on `y`, `trip_seq` on `x` and `occupancy` on `z` as follows: ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x = trip_seq, y = stop_seq, z = occ_category ) ``` By default, catmaply produces an interactive heatmap with a rangeslider, legend items that show and/or hide data for a specific occupancy category by clicking on it and a hover label, that shows the values for `x`, `y` and `z` on hover. Also, please note that you can use both, column names with and without quotes as column references for e.g. x, y, z. E.g. if we want to put the stop_names on the `y` axis, we can simply put the `stop_name` on `y` and order it using `stop_seq` (the x axis has of course the same functionality); as shown in the following: ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x = trip_seq, y = stop_name, y_order = stop_seq, z = occ_category ) ``` What about more expressive labels for the legend? You can use any column for the legend entries, as long as it matches the category in the fields. So, if you would like to show `occ_cat_name` ("low", "medium", "high") in the legend instead of `occ_category` (1,2,3,..), you can set the parameter `legend_col` and overwrite the legend labels as follows: ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x = trip_seq, y = stop_name, y_order = stop_seq, z = occ_category, legend_col = occ_cat_name ) ``` Not happy with the color palette? Then let's change it :-). To change the color palette you can either submit a color palette vector or a function that is able to return one. > Note: that the color palette function needs to take n as first argument, whereas n defines the number of colors to be produced. ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x = trip_seq, y = stop_name, y_order = stop_seq, z = occ_category, color_palette = viridis::magma, legend_col = occ_cat_name ) ``` You don't like all this interactivity of the legend or you need more performance for large plots? Let's turn the interactive legend off by setting `legend_interactive = FALSE`. ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x = trip_seq, y = stop_name, y_order = stop_seq, z = occ_category, color_palette = viridis::magma, legend_interactive = FALSE, legend_col = occ_cat_name ) ``` ### Color ranges per category How about illustrating differences in one category? You can illustrate values that are particularly low (high) within one category by specifying one color range per category. To show one color range per category, we have to put a continuous number in the fields and categorize it with a categorical column, so in our example: - `occupancy` in the fields - `occ_category` is the categorization over these fields. Also, lets add a more expressive x_label to the plot, essentially concatenating columns `trip`, `vehicle` and `circulation_name`. ```{r fig.height=8, out.width='100%', warning = FALSE} df <- df%>% mutate(x_label = paste( formatC(trip_seq, width=3, flag="0"), vehicle, formatC(circulation_name, width=2, flag = "0"), sep="-") ) catmaply( df, x = x_label, x_order = trip_seq, y = stop_name, y_order = stop_seq, z = occupancy, categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::magma, legend_col = occ_cat_name ) ``` This works also for non-interactive legends. ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x = x_label, x_order = trip_seq, y = stop_name, y_order = stop_seq, z = occupancy, categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::magma, legend_interactive = FALSE, legend_col = occ_cat_name ) ``` ### Axis formatting Now, lets mess around with axis formatting; let's change - `font_color` to the "purplish" color used in the logo (#6D65AB) - `font_size` to 10 pt. - `font_family` to "verdana" - `x_tickangle` to 80 and, just for fun, - `y_tickangle` to -10 ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x='x_label', x_order = 'trip_seq', x_tickangle = 80, y = "stop_name", y_order = "stop_seq", y_tickangle = -10, z = "occupancy", categorical_color_range = TRUE, categorical_col = 'occ_category', color_palette = viridis::magma, font_size = 10, font_color = '#6D65AB', font_family = "verdana", legend_col = occ_cat_name ) ``` ### Hover What about a custom hover label? Catmaply allows to define custom hover templates with the parameter `hover_template`; which can take `html` tags to make it more appealing. Here is an example that creates bold column names of the respective values; also it rounds the occupancy to 2 decimal points. ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x=x_label, x_order = trip_seq, x_tickangle = 80, y = stop_name, y_order = stop_seq, z = occupancy, categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::inferno, hover_template = paste( 'Trip:', trip_seq, '
Stop Name:', stop_name, '
Occupancy category:', occ_cat_name, '
Occupancy:', round(occupancy, 2), '' ), legend_col = occ_cat_name ) ``` > Note: usually it is a good idea to add the `` tag at the end to hide trace information. Legend and fancy hover template is too much info; you want simplicity and, thus, hide the legend altogether? Go ahead.. :-) ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x=x_label, x_order = trip_seq, x_tickangle = 80, y = stop_name, y_order = stop_seq, z = occupancy, categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::inferno, hover_template = paste( '
Trip:', trip_seq, '
Stop Name:', stop_name, '
Occupancy category:', occ_cat_name, '
Occupancy:', round(occupancy, 2), '' ), legend_col = occ_cat_name, legend = FALSE ) ``` Minimalist or visual person that does not like hover nor legend...? ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x=x_label, x_order = trip_seq, x_tickangle = 80, y = stop_name, y_order = stop_seq, z = occupancy, categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::inferno, hover_hide = TRUE, legend_col = occ_cat_name, legend = FALSE ) ``` Ok, no hover but legend it is... (for completeness). ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x=x_label, x_order = trip_seq, x_tickangle = 80, y = stop_name, y_order = stop_seq, z = occupancy, categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::inferno, hover_hide = TRUE, legend_col = occ_cat_name, legend = TRUE ) ``` ### Time Axis Hmm, didn't we say that we want to show the development over time? Wouldn't it make sense then, if we could illustrate time dynamically on the x axis? Lets check out how a dynamic x axis can be created if you put a column of type `PSIXct` or `POSIXt` on the x axis. Lets check it out by calculating the departure datetime of each drive. ```{r fig.height=8, out.width='100%', warning = FALSE} df <- df %>% na.omit() %>% dplyr::group_by( trip_seq ) %>% dplyr::mutate( departure_date_time = min(na.omit(lubridate::ymd_hms(paste("2020-08-01", departure_time)))) ) %>% dplyr::ungroup() catmaply( df, x=departure_date_time, y = stop_name, y_order = stop_seq, z = occupancy, categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::inferno, hover_template = paste( '
Trip:', trip_seq, '
Stop Name:', stop_name, '
Occupancy category:', occ_cat_name, '
Occupancy:', round(occupancy, 2), '' ) ) ``` Currently, formatting of the time axis is optimised to analyse daily data; e.g. if you sample the utilization throughout the year and then summarise it to get the utilization of a typical day. Thus, the formatting of the max zoom level is still hours and not years. However, you can change the individual formatting of the respective zoom level by setting the `tickformatstops` parameter. So, if you want to e.g. remove the h, m, s and ms that indicate the unit of time above, you could achieve this as follows (more infos can be found in the tick formatting example of [plotly](https://plotly.com/r/tick-formatting/): ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x=departure_date_time, y = stop_name, y_order = stop_seq, z = occupancy, categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::inferno, hover_template = paste( '
Trip:', trip_seq, '
Stop Name:', stop_name, '
Occupancy category:', occ_cat_name, '
Occupancy:', round(occupancy, 2), '' ), tickformatstops=list( list(dtickrange = list(NULL, 1000), value = "%H:%M:%S.%L"), list(dtickrange = list(1000, 60000), value = "%H:%M:%S"), list(dtickrange = list(60000, 3600000), value = "%H:%M"), list(dtickrange = list(3600000, 86400000), value = "%H:%M"), list(dtickrange = list(86400000, 604800000), value = "%H:%M"), list(dtickrange = list(604800000, "M1"), value = "%H:%M"), list(dtickrange = list("M1", "M12"), value = "%H:%M"), list(dtickrange = list("M12", NULL), value = "%H:%M") ) ) ``` ### Slider Besides the rangeslider, it is also possible to use a simple slider/scrollbar. This might be especially favourable for annotated heatmaps shown in the next section You can switch to a slider as easy as follows: ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x=x_label, x_order = trip_seq, y = stop_name, y_order = stop_seq, z = occupancy, categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::inferno, hover_template = paste( '
Trip:', trip_seq, '
Stop Name:', stop_name, '
Occupancy category:', occ_cat_name, '
Occupancy:', round(occupancy, 2), '' ), rangeslider = FALSE, # to prevent warning legend_interactive = FALSE, # to prevent warning slider = TRUE # activate slider ) ``` To add a prefix for the current value (the one above the slider); set the `slider_currentvalue_prefix` as follows: ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x=x_label, x_order = trip_seq, y = stop_name, y_order = stop_seq, z = occupancy, categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::inferno, hover_template = paste( '
Trip:', trip_seq, '
Stop Name:', stop_name, '
Occupancy category:', occ_cat_name, '
Occupancy:', round(occupancy, 2), '' ), rangeslider = FALSE, # to prevent warning legend_interactive = FALSE, # to prevent warning slider = TRUE, # activate slider, slider_currentvalue_prefix = "Trip: " ) ``` Also, you can show/hide various elements of the slider, more specifically the steps, ticks and current value. This gets you almost a scrollbar feeling: ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x=x_label, x_order = trip_seq, y = stop_name, y_order = stop_seq, z = occupancy, categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::inferno, hover_template = paste( '
Trip:', trip_seq, '
Stop Name:', stop_name, '
Occupancy category:', occ_cat_name, '
Occupancy:', round(occupancy, 2), '' ), rangeslider = FALSE, # to prevent warning legend_interactive = FALSE, # to prevent warning slider = TRUE, # activate slider, slider_currentvalue_visible = FALSE, slider_step_visible = FALSE, slider_tick_visible = FALSE ) ``` The slider steps are created automatically for you, however, you can also define them yourself. Catmaply provides two modes to alter the way the slider steps are created: auto, and custom list. Mode auto alters the way the steps are automatically created for you. You need to provide parameter `slider_steps` a list with the following parameters: - `slider_start` (numeric): the starting-point of the slider on the `x` axis. - `slider_range` (numeric): the size of the window of the slider. - `slider_shift` (numeric): how much units the window should be moved to the right for each step. - `slider_step_name` (column): the name of the step; which must match the occurrence of `x`. For example, if want to create a slider that starts at trip 5 with a window of size 15 trips and a shift of size 10 trips; you can do this as follows: ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x=x_label, x_order = trip_seq, y = stop_name, y_order = stop_seq, z = occupancy, categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::inferno, hover_template = paste( '
Trip:', trip_seq, '
Stop Name:', stop_name, '
Occupancy category:', occ_cat_name, '
Occupancy:', round(occupancy, 2), '' ), rangeslider = FALSE, # to prevent warning legend_interactive = FALSE, # to prevent warning slider = TRUE, # activate slider, slider_currentvalue_visible = FALSE, slider_steps=list( slider_start=1, slider_range=15, slider_shift=10, slider_step_name="x" # same name as x axis (must be character) ) ) ``` Besides mode auto, you can also get full control of the slider steps by providing a list with the following elements: - `name` name of the step - `range` range to be covered by the step A simple example is shown below: ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x=x_label, x_order = trip_seq, y = stop_name, y_order = stop_seq, z = occupancy, categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::inferno, hover_template = paste( '
Trip:', trip_seq, '
Stop Name:', stop_name, '
Occupancy category:', occ_cat_name, '
Occupancy:', round(occupancy, 2), '' ), rangeslider = FALSE, # to prevent warning legend_interactive = FALSE, # to prevent warning slider = TRUE, # activate slider, slider_currentvalue_visible = FALSE, slider_steps = list( list(name="Very important step one", range=c(12, 37)), list(name="Very important step two", range=c(87, 111)) ) ) ``` ### Annotations Sometimes it makes sense to add annotations to a heatmap. With catmaply, you can add and format annotations relatively easily with the following parameters: - `text` column name holding the values of the text to be displayed. - `text_color` the color of the text (similar to font_color). - `text_size` the size to be used for the text (similar to font_size). - `text_font_family` the font family to be used for the text (similar to font_family). Adding annotations to the previous slider example can be achieved as follows: ```{r fig.height=8, out.width='100%', warning = FALSE} catmaply( df, x=x_label, x_order = trip_seq, y = stop_name, y_order = stop_seq, z = occupancy, text = occ_category, text_color="#000", text_size=12, text_font_family="Open Sans", categorical_color_range = TRUE, categorical_col = occ_category, color_palette = viridis::plasma, hover_template = paste( '
Trip:', trip_seq, '
Stop Name:', stop_name, '
Occupancy category:', occ_cat_name, '
Occupancy:', round(occupancy, 2), '' ), rangeslider = FALSE, # to prevent warning legend_interactive = FALSE, # to prevent warning slider = TRUE, # activate slider, slider_currentvalue_visible = FALSE, slider_steps=list( slider_start=1, slider_range=15, slider_shift=10, slider_step_name="x" # same name as x axis (must be character) ) ) ``` > Note: annotations can also be used with rangeslider or no slider at all. However, lots of annotations might have a negative influence on the performance. Only text values that are not NA are used for annotations. ## Credits This package only exists thanks to the amazing work done by many people in the open source community. Beyond the many people working on the pipeline of R, thanks should go to the plotly team, and especially to Carson Sievert and others working on the R package of plotly. Also, a special thanks to VBZ for providing advice on the functionality of the package as well as providing the vbz dataset; also, I would like to thank VBZ for investing time to test the package and for using it in your awesome shiny VBZ dashboard. ## Session info ```{r} sessionInfo() ```