Plotting species accumulation curves using ggplot

tidyverse vegan R

A slow and steady introduction

Guy Sutton https://twitter.com/Guy_F_Sutton
10-29-2020

Today, we are going to look at how to compute and plot species accumulation curves (hereafter ‘SAC’) using the amazing vegan package. SAC’s are extremely popular in ecological analyses as they allow us to evaluate whether performing additional sampling is required to record all of the insect species associated with a particular plant species, for example.

In this session, we are going to cover the most basic SAC - observed richness. Observed species richness (hereafter ‘S’) simply tells us:
* how many species we have recorded, to date, and
* whether more surveys could yield new additional species.
* But, it DOES NOT tell us how many species could be in the community (i.e. we cannot extrapolate) - we will cover how to extrapolate species richness in a later post.

Let’s load in a typical community richness dataset (species abundance matrix). This dataset represents the insect community associated with the shurb, Lycium ferocissimum, in South Africa.

# Load required packages 
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, 
               tidyr, 
               janitor,
               vegan)

# Read in data 
sp_comm <- readr::read_csv("https://raw.githubusercontent.com/guysutton/CBC_coding_club/master/data_raw/species_abundance_matrix_ex.csv") %>%
  # Clean column names 
  janitor::clean_names()

# Check data entry 
dplyr::glimpse(sp_comm)
Rows: 56
Columns: 56
$ provinces                                   <chr> "Eastern Cape...
$ climatic_zones                              <chr> "Cfb", "Cfb",...
$ site                                        <chr> "EC1", "EC1",...
$ season                                      <dbl> 1, 2, 1, 2, 1...
$ haplotype                                   <dbl> 5, 5, 5, 5, 5...
$ cleta_eckloni                               <dbl> 23, 8, 11, 0,...
$ pseudambonea_capeni_schuhistes_lekkersingia <dbl> 4, 28, 0, 0, ...
$ acanthocoris_spinosus                       <dbl> 1, 0, 0, 0, 0...
$ antestiopsis_thunbergii                     <dbl> 1, 0, 0, 0, 0...
$ cassida_distinguenda                        <dbl> 0, 3, 0, 0, 0...
$ epilachna_sp_1                              <dbl> 0, 4, 0, 0, 0...
$ cleta_sp_1                                  <dbl> 0, 0, 0, 0, 0...
$ cleta_sp_2                                  <dbl> 0, 0, 0, 0, 0...
$ exochomus_flavipes                          <dbl> 0, 0, 0, 0, 0...
$ scymnus_sp                                  <dbl> 0, 0, 0, 0, 0...
$ cheilomenes_lunata                          <dbl> 0, 0, 0, 0, 0...
$ cheilomenes_sulphurea                       <dbl> 0, 0, 0, 0, 0...
$ cf_nephus_sp                                <dbl> 0, 0, 0, 0, 0...
$ chnootriba_sp                               <dbl> 0, 0, 0, 0, 0...
$ oenopia_cinctella                           <dbl> 0, 0, 0, 0, 0...
$ hippodamia_variegate                        <dbl> 0, 0, 0, 0, 0...
$ cassida_melanophthalma                      <dbl> 0, 0, 0, 0, 0...
$ cassida_reticulipennis                      <dbl> 0, 0, 0, 0, 0...
$ macetes_sp                                  <dbl> 0, 0, 0, 0, 0...
$ cryptocephalus_nr_liturellus                <dbl> 0, 0, 0, 0, 0...
$ epitrix_sp                                  <dbl> 0, 0, 0, 0, 0...
$ chrysomelidae_sp                            <dbl> 0, 0, 0, 0, 0...
$ sulcobruchus_longipennis                    <dbl> 0, 0, 0, 0, 0...
$ monolepta_bioculata                         <dbl> 0, 0, 0, 0, 0...
$ eurytomidae_sp                              <dbl> 0, 0, 0, 0, 0...
$ pachycnema_sp                               <dbl> 0, 0, 0, 0, 0...
$ scarabaeidae_sp                             <dbl> 0, 0, 0, 0, 0...
$ neoplatygaster_serieturberculata            <dbl> 0, 8, 0, 0, 0...
$ sciobius_sp                                 <dbl> 0, 0, 0, 0, 0...
$ lixini_sp                                   <dbl> 0, 0, 0, 0, 0...
$ beaufortiana_cornuta                        <dbl> 0, 0, 0, 0, 1...
$ pentatomidae_sp                             <dbl> 0, 0, 0, 0, 0...
$ apalochrus_sp                               <dbl> 0, 0, 0, 0, 0...
$ hylomela_sexpunctata                        <dbl> 0, 0, 0, 0, 1...
$ anthripidae_sp                              <dbl> 0, 0, 0, 0, 1...
$ cenaeus_carnifex                            <dbl> 0, 0, 0, 0, 0...
$ thrips_simplex                              <dbl> 0, 0, 0, 0, 0...
$ ceratitis_sp                                <dbl> 0, 0, 0, 0, 0...
$ brachymeria_sp                              <dbl> 0, 0, 0, 0, 0...
$ syrphidae_sp                                <dbl> 0, 0, 0, 0, 0...
$ cicadidae_sp                                <dbl> 0, 0, 0, 0, 0...
$ apis_mellifera                              <dbl> 0, 0, 0, 0, 0...
$ pteromalidae_sp                             <dbl> 0, 0, 0, 0, 0...
$ chrysopidae_sp                              <dbl> 0, 0, 0, 0, 0...
$ pamphagidae_sp                              <dbl> 0, 0, 0, 0, 0...
$ amphipsocidae_sp                            <dbl> 0, 0, 0, 0, 0...
$ diptera_sp                                  <dbl> 0, 0, 0, 0, 0...
$ hymenoptera_sp                              <dbl> 0, 0, 0, 0, 0...
$ decapotoma_lunata                           <dbl> 0, 0, 0, 0, 0...
$ pyrrhocordae_sp_1                           <dbl> 0, 0, 0, 0, 0...
$ mylabris_oculata                            <dbl> 0, 0, 0, 0, 0...

Now it is time to compute our species accumulation curve. To do this, we will use the poolaccum function from the vegan R package. Notice the first few columns are site description variables (i.e. not species abundances). We need to remove these columns, and only input the columns containing species abundances.

sac_raw <- sp_comm %>%
  # Remove site decsription variables 
  dplyr::select(-c(provinces, climatic_zones, site, season, haplotype)) %>%
  # Compute SAC
  vegan::poolaccum(.)

We now need to extract our observed species richness (S) estimates.

# Extract observed richness (S) estimate 
obs <- data.frame(summary(sac_raw)$S, check.names = FALSE)
colnames(obs) <- c("N", "S", "lower2.5", "higher97.5", "std")
head(obs)
  N     S lower2.5 higher97.5      std
1 3  7.17    0.000     15.000 3.787419
2 4  9.13    2.000     18.000 4.222726
3 5 10.87    2.475     18.525 4.464574
4 6 12.49    4.475     21.525 4.641697
5 7 14.38    6.000     22.525 4.442176
6 8 15.81    7.000     23.575 4.622508

Finally, we can plot the desired species accumulation curve. Ultimately, we would like to see our S estimate and the 95% confidence intervals reach an asymptote (flat line) on the y-axis. This would indicate that performing additional surveys is highly unlikely to yield additional insects on the plant we have surveyed.

obs %>%
  ggplot(data = ., aes(x = N,
                       y = S)) +
  # Add confidence intervals
  geom_ribbon(aes(ymin = lower2.5,
                  ymax = higher97.5),
              alpha = 0.5,
              colour = "gray70") +
  # Add observed richness line 
  geom_line() +
  labs(x = "No. of surveys",
       y = "Observed richness",
       subtitle = "More surveys are required to find all the insects on this plant")

Citation

For attribution, please cite this work as

Sutton (2020, Oct. 29). Guy Sutton: Plotting species accumulation curves using ggplot. Retrieved from https://guysutton.netlify.app/posts/2020-10-29-automated-plotting-of-species-accumulation-curves-by-group/

BibTeX citation

@misc{sutton2020plotting,
  author = {Sutton, Guy},
  title = {Guy Sutton: Plotting species accumulation curves using ggplot},
  url = {https://guysutton.netlify.app/posts/2020-10-29-automated-plotting-of-species-accumulation-curves-by-group/},
  year = {2020}
}