Here are some notes on how to use R (specifically the ggtree package) to draw phylogenetic trees. In this first section, I will show how to draw a basic annotated tree, how to deal with tip labels, and how to choose a layout.
In other sections, I would like to cover how to make circular figures with heatmaps.
Firstly, what is ggtree? The package description says:
‘ggtree’ extends the ‘ggplot2’ plotting system which implemented the grammar of graphics. ‘ggtree’ is designed for visualization and annotation of phylogenetic trees and other tree-like structures with their annotation data. https://github.com/YuLab-SMU/ggtree
I prefer to make worked examples from real data. Many common problems I encounter do not appear in simulated/toy datasets. To that end I have chosen some genomes from Salmonella enterica serovar Minnesota. If you would like to know more, we discussed these in a recent publication: Alikhan et al. (2022) PLoS Genet 18(6): e1010174. https://doi.org/10.1371/journal.pgen.1010174
The raw data is here if you want to follow along:
This does not directly correspond to the Minnesota tree in the paper, so do not expect it to match.
The most basic annotated tree: tips coloured by country, with a key/legend and a scale included.
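The following is only a minimal sketch of the idea, not the exact commands used for the Minnesota tree. It assumes a random stand-in tree from ape::rtree and a made-up metadata table (the column names label and country, and the country values, are hypothetical). With real data you would read the tree with ape::read.tree and load the metadata from a file instead.

```r
library(ape)      # rtree() for a stand-in tree
library(ggplot2)  # aes(), theme()
library(ggtree)

set.seed(42)
tree <- rtree(20)  # random 20-tip tree standing in for the real data

# Hypothetical metadata: the first column must match the tip labels
metadata <- data.frame(
  label   = tree$tip.label,
  country = sample(c("UK", "USA", "Brazil"), 20, replace = TRUE)
)

p <- ggtree(tree) %<+% metadata +            # attach metadata to the tree
  geom_tippoint(aes(colour = country),       # colour tips by country
                size = 2) +
  geom_treescale() +                         # add a scale bar
  theme(legend.position = "right")           # show the key/legend

# print(p) draws the figure
```

The `%<+%` operator joins the metadata onto the tree data, so any metadata column can then be used inside aes().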
In terms of configuring the tree scale on
Tip labels can be tricky. Some trees, like this example one, can look very cluttered when tip labels are shown. I do not believe there is an easy fix for this. If you do encounter this problem you can try:
ggtree(tree, layout="circular"), see section below on "Choosing a layout"
For rectangular and dendrogram layouts, you can use geom_tiplab(as_ylab = TRUE) to display the tip labels as y-axis labels, which aligns them all at the edge.
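The two suggestions above could look something like this. It is a sketch under assumptions: the tree is a random 50-tip stand-in from ape::rtree, and the label size is an arbitrary choice.

```r
library(ape)
library(ggtree)

set.seed(1)
tree <- rtree(50)  # stand-in tree with enough tips to look cluttered

# Option 1: a circular layout gives the labels more room
p_circ <- ggtree(tree, layout = "circular") +
  geom_tiplab(size = 2)

# Option 2: rectangular layout, labels drawn as y-axis labels
# so they all line up at the edge of the plot
p_ylab <- ggtree(tree) +
  geom_tiplab(as_ylab = TRUE)
```

As far as I know, as_ylab only works with the rectangular and dendrogram layouts, so it cannot be combined with option 1.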
Different layouts have different benefits and drawbacks, and each can fit a different number of tips on the figure. In general, a rectangular layout displays the data most clearly, but a circular layout can fit more tips (and labels) before it becomes cluttered. In practice I would start with a rectangular layout (like the basic sample above) and, if it is too cluttered, then try a circular layout.
There are other layouts, but I avoid these for different reasons. Of these, daylight and equal angle can look very pretty but cannot show more than tens of tips. They also cannot indicate the root clearly, which can be a problem for people who insist that all phylogenetic trees must have a root. I do not strictly agree with this. Phylogenies can be used just to illustrate which taxa cluster with which, and in that case an unrooted tree is fine. The author, however, should clearly state that they are not trying to determine which clade came first (evolutionarily speaking) but are just illustrating that the clades are there.
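To compare layouts side by side, one approach is to draw the same tree once per layout. This sketch assumes a small random tree from ape::rtree. The four layout names are ones ggtree accepts, but the selection is mine.

```r
library(ape)
library(ggplot2)  # ggtitle()
library(ggtree)

set.seed(3)
tree <- rtree(12)  # small stand-in tree; unrooted layouts suit few tips

layouts <- c("rectangular", "circular", "equal_angle", "daylight")

# Build one plot per layout, titled with the layout name
plots <- lapply(layouts, function(l) {
  ggtree(tree, layout = l) + ggtitle(l)
})
names(plots) <- layouts

# e.g. print(plots[["daylight"]]) to view a single layout
```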
Here are some limits to help you pick the best layout given the number of tips in the tree:
| Layout | Max number of tips | Max number of tips (with labels) |
You can also draw the tree ignoring branch lengths, which might make it easier to show the topology. e.g.
ggtree(tree, layout="daylight", branch.length = 'none').
In that case, be sure to state clearly that the branch lengths are not to scale.
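Here is a short sketch of the difference, again assuming a random stand-in tree from ape::rtree. It uses the default rectangular layout to keep things quick; the same branch.length argument is what the daylight example above passes.

```r
library(ape)
library(ggtree)

set.seed(7)
tree <- rtree(15)  # stand-in tree with random branch lengths

# Branch lengths drawn to scale (the default)
p_scaled <- ggtree(tree)

# Branch lengths ignored: only the topology is shown
p_topology <- ggtree(tree, branch.length = "none")
```

Both plots contain the same nodes; only the x positions change.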
See https://xiayh17.gitee.io/treedata-book/chapter4.html section 4.2.2 for different layouts you can choose.
The banner image is an AI-generated picture (Midjourney) made with the prompt: 'phylogenetic tree :: schematic drawing :: steampunk style'. You can share and adapt this image under a CC BY-SA 4.0 licence.