Species richness estimation
Source:vignettes/speciesRichness_example.Rmd
speciesRichness_example.Rmd
This is a basic workflow for feeding species occurrence data, a
group’s taxonomy, and a group’s country checklist into BeeBDC in order
to produce species richness estimates. Some of these functions are also
generic wrappers around SpadeR
and iNEXT
functions that can be used with any abundance data. These functions have
grown from an original publication.
Script preparation
Working directory
Choose the path to the root folder in which all other folders can be found.
RootPath <- paste0("/your/path/here")
# Create the working directory in the RootPath if it doesn't exist already
if (!dir.exists(paste0(RootPath, "/Data_acquisition_workflow"))) {
dir.create(paste0(RootPath, "/Data_acquisition_workflow"), recursive = TRUE)
}
# Set the working directory
setwd(paste0(RootPath, "/Data_acquisition_workflow"))
Install packages (if needed)
You may need to install some packages for using this workflow. In
particular, SpadeR
and iNEXT
are required.
They should prompt you to download them the first time that you run the
functions, however, let’s install them here and now.
install.packages("SpadeR")
install.packages("iNEXT")
Parallel estimations
We can start by looking at the relatively simple iNEXT
and SpadeR
wrapper functions that can take the usual inputs
for those functions in their host packages. (See
iNEXT::iNEXT()
and SpadeR::ChaoSpecies()
for
more info.) These functions can take your data and run multiple
sites/countries/whatever level you want to throw at them and run them in
parallel. This greatly simplifies the code needed to run them and also
makes its implementation MUCH faster!
iNEXTwrapper
BeeBDC::iNEXTwrapper()
parallelizes
iNEXT::iNEXT()
, which estimates species richness patterns
by extrapolating and interpolating Hill numbers. By and large this kinda
equates to estimating richness by rarefaction. We can start by reading
in an example dataset with 1,488
scientificName-country_suggested combinations across four
countries; Fiji, Uganda, Vietnam, and Zambia.
beesCountrySubset <- BeeBDC::beesCountrySubset
Next, let’s look at how we would usually use iNEXT… By first transforming our (very simple example) occurrence dataset into the right format and then running the function.
# Transform the data
transformedAbundance <- beesCountrySubset %>%
dplyr::group_by(scientificName, country_suggested) %>%
dplyr::count() %>%
dplyr::select(scientificName, country_suggested, n) %>%
tidyr::pivot_wider(names_from = country_suggested, values_from = n, values_fill = 0) %>%
# Create the rownames
tibble::column_to_rownames("scientificName") %>%
dplyr::tibble() %>%
as.data.frame()
# Run iNEXT
output_iNEXT <- iNEXT::iNEXT(transformedAbundance, datatype = "abundance")
We can also view the output of this function by running
output_iNEXT$AsyEst
Assemblage | Diversity | Observed | Estimator | s.e. | LCL | UCL |
---|---|---|---|---|---|---|
Fiji | Species richness | 17.000000 | 17.499483 | 1.4850098 | 17.000000 | 20.410049 |
Fiji | Shannon diversity | 4.712056 | 4.754101 | 0.2164397 | 4.329887 | 5.178315 |
Fiji | Simpson diversity | 2.653013 | 2.657561 | 0.1237356 | 2.415044 | 2.900078 |
Uganda | Species richness | 44.000000 | 70.243359 | 14.1752423 | 44.000000 | 98.026324 |
Uganda | Shannon diversity | 19.651483 | 26.633143 | 3.4844763 | 19.803695 | 33.462591 |
Uganda | Simpson diversity | 7.699248 | 8.128000 | 1.5689856 | 5.052845 | 11.203155 |
Vietnam | Species richness | 6.000000 | 13.794872 | 5.8924098 | 6.000000 | 25.343783 |
Vietnam | Shannon diversity | 1.953128 | 2.307552 | 0.5923478 | 1.146572 | 3.468532 |
Vietnam | Simpson diversity | 1.386509 | 1.400756 | 0.1986146 | 1.011479 | 1.790034 |
Zambia | Species richness | 129.000000 | 186.853965 | 17.7824798 | 152.000945 | 221.706985 |
Zambia | Shannon diversity | 77.832700 | 104.647375 | 7.6001195 | 89.751415 | 119.543336 |
Zambia | Simpson diversity | 32.753790 | 35.991359 | 6.3494595 | 23.546647 | 48.436071 |
The implementation of BeeBDC::iNEXTwrapper()
is also
relatively simple; including the implementation of running in parallel
(Note: Windows machines can’t use R’s parallel
functions). We can again modify the input data to work with the
function as below and we could leave most of the inputs to iNEXTwrapper
to the defaults or change them as we saw fit. The key variable is to
change mc.cores
to however many threads you’d liek to use
on your computer!
# Transform data
data_nextWrapper <- beesCountrySubset %>%
dplyr::group_by(scientificName, country_suggested) %>%
dplyr::count()
# Calculate iNEXT with the wrapper function
output_iNEXTwrapper <- BeeBDC::iNEXTwrapper(data = data_nextWrapper, variableColumn = "country_suggested",
valueColumn = "n", q = 0, datatype = "abundance", conf = 0.95, se = TRUE, nboot = 50,
size = NULL, endpoint = NULL, knots = 40, mc.cores = 1)
## - Outputs can be found in a list with two tibbles called 'DataInfo' and 'AsyEst' and a list of iNext outputs per groupVariable in iNextEst'.
country_suggested | statistic | Observed | Estimator | Est_s.e. | 95% Lower | 95% Upper |
---|---|---|---|---|---|---|
Fiji | Species Richness | 17.000000 | 17.499483 | 1.6650163 | 14.236111 | 20.762855 |
Fiji | Shannon diversity | 4.712056 | 4.754101 | 0.2271829 | 4.308831 | 5.199372 |
Fiji | Simpson diversity | 2.653013 | 2.657561 | 0.1264695 | 2.409685 | 2.905436 |
Uganda | Species Richness | 44.000000 | 70.243359 | 25.7541584 | 19.766137 | 120.720582 |
Uganda | Shannon diversity | 19.651483 | 26.633143 | 4.2573468 | 18.288896 | 34.977389 |
Uganda | Simpson diversity | 7.699248 | 8.128000 | 1.8631187 | 4.476354 | 11.779646 |
Vietnam | Species Richness | 6.000000 | 13.794872 | 6.1393736 | 1.761921 | 25.827823 |
Vietnam | Shannon diversity | 1.953128 | 2.307552 | 0.5633747 | 1.203358 | 3.411746 |
Vietnam | Simpson diversity | 1.386509 | 1.400756 | 0.1830418 | 1.042001 | 1.759511 |
Zambia | Species Richness | 129.000000 | 186.853965 | 18.1990609 | 151.184461 | 222.523469 |
Zambia | Shannon diversity | 77.832700 | 104.647375 | 6.8285323 | 91.263698 | 118.031053 |
Zambia | Simpson diversity | 32.753790 | 35.991359 | 5.5923079 | 25.030637 | 46.952081 |
ChaoWrapper
BeeBDC::ChaoWrapper()
parallelizes
SpadeR::ChaoSpecies()
, which estimates species richness
using various non-parametric estimators. The primary one that I tend to
use is iChao. Let’s use the same example dataset to first run the
SpadeR function.
# Transform data
data_iChao <- beesCountrySubset %>%
dplyr::group_by(scientificName, country_suggested) %>%
dplyr::count() %>%
dplyr::select(scientificName, country_suggested, n) %>%
tidyr::pivot_wider(names_from = country_suggested, values_from = n, values_fill = 0) %>%
# Create the rownames
tibble::column_to_rownames("scientificName") %>%
dplyr::tibble()
# Run ChaoSpecies for the country Fiji
output_iChao <- SpadeR::ChaoSpecies(data_iChao$Fiji, datatype = "abundance")
Let’s view the output of this function by running
output_iChao$Species_table
. Once again, you’ll be able to
see that there are actually quite a few species richness estimators
available here.
Estimate | s.e. | 95%Lower | 95%Upper | |
---|---|---|---|---|
Homogeneous Model | 17.182 | 0.474 | 17.011 | 20.015 |
Homogeneous (MLE) | 17.000 | 0.633 | 17.000 | 18.858 |
Chao1 (Chao, 1984) | 17.499 | 1.322 | 17.030 | 25.434 |
Chao1-bc | 17.000 | 0.633 | 17.000 | 18.858 |
iChao1 (Chiu et al. 2014) | 17.624 | 1.322 | 17.048 | 25.048 |
ACE (Chao & Lee, 1992) | 17.342 | 0.775 | 17.024 | 21.788 |
ACE-1 (Chao & Lee, 1992) | 17.366 | 0.835 | 17.026 | 22.168 |
1st order jackknife | 17.999 | 1.413 | 17.128 | 24.796 |
2nd order jackknife | 18.000 | 2.446 | 17.065 | 32.367 |
The implementation of the parallel wrapper is also quite simple for
BeeBDC::ChaoWrapper()
. And the number of cores can also be
changed using the mc.cores
argument. Note, that in this
case, we can run all sites (or countries) at once!
# Run the wrapper function
output_iChaowrapper <- BeeBDC::ChaoWrapper(data = data_iChao, datatype = "abundance",
k = 10, conf = 0.95, mc.cores = 1)
## - Outputs can be found in a list with two tibbles called 'basicTable' and 'diversityTable'.
View the output with
output_iChaowrapper$diversityTable
.
variable | Name | Estimate | s.e. | 95%Lower | 95%Upper |
---|---|---|---|---|---|
Zambia | Homogeneous Model | 159.936 | 8.471 | 147.263 | 181.402 |
Zambia | Homogeneous (MLE) | 140.234 | 3.957 | 134.748 | 150.959 |
Zambia | Chao1 (Chao, 1984) | 186.854 | 20.305 | 158.664 | 241.834 |
Zambia | Chao1-bc | 183.879 | 19.243 | 157.155 | 235.970 |
Zambia | iChao1 (Chiu et al. 2014) | 199.091 | 12.645 | 178.353 | 228.541 |
Zambia | ACE (Chao & Lee, 1992) | 183.240 | 15.921 | 159.876 | 224.286 |
Zambia | ACE-1 (Chao & Lee, 1992) | 197.138 | 22.684 | 165.093 | 257.634 |
Zambia | 1st order jackknife | 185.839 | 10.654 | 168.488 | 210.814 |
Zambia | 2nd order jackknife | 214.754 | 18.422 | 185.553 | 259.032 |
Uganda | Homogeneous Model | 59.952 | 7.062 | 50.961 | 80.554 |
Uganda | Homogeneous (MLE) | 47.113 | 2.032 | 44.969 | 54.004 |
Uganda | Chao1 (Chao, 1984) | 70.243 | 14.659 | 53.450 | 116.878 |
Uganda | Chao1-bc | 66.820 | 12.672 | 52.261 | 107.041 |
Uganda | iChao1 (Chiu et al. 2014) | 76.243 | 8.394 | 63.519 | 97.262 |
Uganda | ACE (Chao & Lee, 1992) | 78.927 | 16.764 | 58.308 | 129.256 |
Uganda | ACE-1 (Chao & Lee, 1992) | 95.184 | 30.929 | 61.149 | 196.767 |
Uganda | 1st order jackknife | 66.820 | 6.743 | 56.944 | 84.233 |
Uganda | 2nd order jackknife | 79.695 | 11.623 | 63.160 | 110.499 |
Fiji | Homogeneous Model | 17.182 | 0.474 | 17.011 | 20.015 |
Fiji | Homogeneous (MLE) | 17.000 | 0.633 | 17.000 | 18.858 |
Fiji | Chao1 (Chao, 1984) | 17.499 | 1.322 | 17.030 | 25.434 |
Fiji | Chao1-bc | 17.000 | 0.633 | 17.000 | 18.858 |
Fiji | iChao1 (Chiu et al. 2014) | 17.624 | 1.322 | 17.048 | 25.048 |
Fiji | ACE (Chao & Lee, 1992) | 17.342 | 0.775 | 17.024 | 21.788 |
Fiji | ACE-1 (Chao & Lee, 1992) | 17.366 | 0.835 | 17.026 | 22.168 |
Fiji | 1st order jackknife | 17.999 | 1.413 | 17.128 | 24.796 |
Fiji | 2nd order jackknife | 18.000 | 2.446 | 17.065 | 32.367 |
Vietnam | Homogeneous Model | 16.000 | 12.450 | 7.501 | 72.612 |
Vietnam | Homogeneous (MLE) | 6.009 | 0.096 | 6.000 | 6.644 |
Vietnam | Chao1 (Chao, 1984) | 13.795 | 11.372 | 6.961 | 69.218 |
Vietnam | Chao1-bc | 8.923 | 4.085 | 6.380 | 28.471 |
Vietnam | iChao1 (Chiu et al. 2014) | 13.795 | 11.372 | 6.961 | 69.218 |
Vietnam | ACE (Chao & Lee, 1992) | 16.000 | 12.450 | 7.501 | 72.612 |
Vietnam | ACE-1 (Chao & Lee, 1992) | 16.000 | 12.450 | 7.501 | 72.612 |
Vietnam | 1st order jackknife | 9.897 | 2.774 | 7.111 | 19.668 |
Vietnam | 2nd order jackknife | 12.769 | 4.734 | 7.965 | 29.318 |
Visualising
These are both easy and simple to use wrapper functions. However, the
complexity of the output data types can actually be a little bit
confusing to work with. Especially if you want to visualise those data.
So, we also provide the function,
BeeBDC::ggRichnessWrapper()
to display the results of both
wrapper functions by site. Not only does this functino save a plot that
visualises the data (below), it also outputs a summary of the
estimates.
plot_summary <- BeeBDC::ggRichnessWrapper(iNEXT_in = output_iNEXTwrapper, iChao_in = output_iChaowrapper,
nrow = 2, ncol = 2, labels = NULL, fileName = "speciesRichnessPlots", outPath = tempdir(),
base_width = 8.3, base_height = 11.7, dpi = 300)
## Loading required namespace: cowplot
Plots showing the rarefied iNext and non-parametric iChao estimates for four countries with 95% confidence intervals.
Note that Fiji’s diversity seems to be reaching asymptote, but the remaining country’s species richnesses are still climbing. Vietnam in particular has a huge amount of uncertainty with the iChao estimates (green) reaching over 60 species! We knew that this was a really small dataset (in fact we chose it as a test dataset because it is small), but it goes to show that if your data are inadequate you may not get a great answer. On the plus side, the 95% confidence intervals all overlap.
level | n | observedRichness | iNEXT_est | iNEXT_lower | iNEXT_upper | iNEXT_increasePercent | iChao_est | iChao_lower | iChao_upper | iChao_increasePercent |
---|---|---|---|---|---|---|---|---|---|---|
Fiji | 967 | 17 | 17 | 14 | 21 | 2.85 | 18 | 17 | 25 | 3.54 |
Uganda | 128 | 44 | 70 | 20 | 121 | 37.36 | 76 | 64 | 97 | 42.29 |
Vietnam | 39 | 6 | 14 | 2 | 26 | 56.51 | 14 | 7 | 69 | 56.51 |
Zambia | 354 | 129 | 187 | 151 | 223 | 30.96 | 199 | 178 | 229 | 35.21 |
You can see, that these functions are moving towards simplifying quick and mass-estimation of species richness across sites or countries! The is the idea behind these improvements. But, we take this further below by also attempting to sample known taxa that are otherwise missing from our datasets!
Estimating richness from the known unknowns
Seems like a bit of an odd title, no? Well, many taxa have country-level checklists. Even insect taxa can be well represented in global country-level checklists; for example, ants, butterflies, dragonflies, and bees. While for vertebrates the situation is even better. Country-level global checklists often incorporate knowledge that isn’t contianed in species occurrence datasets but that, perhaps, we can take advantage of.
Preparing the inputs
In our bee species richness estimates paper, we had 4,819 species (from a total of ~21,000 species) that had no occurrence records globally. We could have run the above species richness estimates using just the occurrences that we had, but we wanted to incorporate that knowledge in our estimates. To do so, we found the samples sizes from most-recent literature for 497 species and generated a curve (we also generated this curve using the global- and country-level occurrence datasets and found the former to match the literature curve well).

We can then extract the formula from this curve and use it to
randomly generate data for those missing species. These curves can be
built using your own data; see section 1.6 of the R code for the bee species richness estimates paper, which uses
ggplot2 and mosaic’s
fitModel
function to generate the curve function that goes
into BeeBDC::diversityPrepR()
below.
Let’s dig in… To start with, we will make an R object using
our occurrence data (beesCountrySubset), taxonomy (from
BeeBDC::beesTaxonomy()
or
BeeBDC::taxadbToBeeBDC()
), and country
checklist(BeeBDC::beesChecklist()
or a manually-made
checklist with the countries in the rNaturalEarth_name column
and species in the validName column; the latter must match the
remaining data — see BeeBDC::HarmoniseR()
). Feel free to
explore the input files or the output list of files for modifying your
own and pay attention to match the column names.
Note as well that the curveFunction
is the one generated
from the Literature curve above.
# Generate the R file
estimateData <- BeeBDC::diversityPrepR(
data = beesCountrySubset,
# Download the bee taxonomy. Download other taxonomies using BeeBDC::taxadbToBeeBDC()
taxonomyFile = BeeBDC::beesTaxonomy(),
# Download the bee country checklist. See notes above about making a checklist for other taxa
checklistFile = BeeBDC::beesChecklist(),
curveFunction = function(x) (228.7531 * x * x^-log(12.1593)),
sampleSize = 10000,
countryColumn = "country_suggested"
)
Making estimates
The output file from the above code is then used to feed into the
BeeBDC::richnessEstimator()
function which will repeatedly
sample the provided curve and estimate species richness (a user-defined
number of times) at the country (or site), continental, or
global level. This function can also be run in parallel and run at
whatever scale the user asks for. If globalSamples
,
continentSamples
, or countrySamples
are set to
zero, they will not be run. If they are set to >0, they will be
estimated that many times. Let’s run our estimates at the country
(or site) level ten times below, drawing sample sizes from the
non-occurrence species each time (capped at the maximum sample size
for each country).
estimates <- BeeBDC::richnessEstimateR(
data = estimateData,
sampleSize = 10000,
globalSamples = 0,
continentSamples = 0,
countrySamples = 10,
# Increase the number of cores to use R's parallel package and speed estimates up.
mc.cores = 1,
# Directory where to save files
outPath = tempdir(),
fileName = "Sampled.pdf"
)
We can look at the median outputs of our analysis with
estimates$Summary
.
name | scale | nChao | iChao_est | iChao_lower | iChao_upper | observedRichness | level | niNEXT | iNEXT_est | iNEXT_lower | iNEXT_upper | iNEXT_increasePercent | iNEXT_increase | iChao_increasePercent | iChao_increase |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Fiji | Country | 1003.5 | 27.1220 | 23.5965 | 51.9615 | 23 | Country | 1003.5 | 26.37171 | 16.48158 | 36.19951 | 14.65960 | 3.371707 | 17.92174 | 4.1220 |
Vietnam | Country | 482.5 | 197.7020 | 161.0185 | 258.1285 | 104 | Country | 482.5 | 181.94422 | 131.48660 | 232.74189 | 74.94637 | 77.944224 | 90.09808 | 93.7020 |
Zambia | Country | 917.5 | 348.1945 | 317.1325 | 393.8205 | 226 | Country | 917.5 | 329.56588 | 282.64476 | 376.48700 | 45.82561 | 103.565880 | 54.06836 | 122.1945 |
Uganda | Country | 1098.5 | 411.0900 | 366.1295 | 471.9980 | 235 | Country | 1098.5 | 382.87058 | 318.61433 | 452.87441 | 62.92365 | 147.870577 | 74.93191 | 176.0900 |
Or, we can look at the outputs from each iteration together with
estimates$SiteOutput
.
variable | Name | Observed | Estimate | s.e. | 95%Lower | 95%Upper | n | groupNo |
---|---|---|---|---|---|---|---|---|
Fiji | iChao | NA | 23.25000 | 0.728000 | 23.01300 | 27.72900 | 1017 | 1 |
Fiji | iNEXT | 23 | 23.24975 | 2.549451 | 18.25292 | 28.24659 | 1017 | 1 |
Fiji | iNEXT | 23 | 23.49950 | 2.435377 | 18.72625 | 28.27276 | 1009 | 1 |
Fiji | iNEXT | 23 | 24.49848 | 3.452599 | 17.73151 | 31.26545 | 988 | 1 |
Fiji | iChao | NA | 23.50000 | 1.322000 | 23.03000 | 31.43500 | 1009 | 1 |
Fiji | iNEXT | 23 | 25.24786 | 5.060689 | 15.32909 | 35.16663 | 1050 | 1 |
Fiji | iChao | NA | 24.49800 | 2.289000 | 23.17400 | 35.87400 | 988 | 1 |
Fiji | iNEXT | 23 | 25.24780 | 5.565085 | 14.34043 | 36.15516 | 1022 | 1 |
Fiji | iNEXT | 23 | 27.49556 | 4.463501 | 18.74726 | 36.24386 | 1013 | 1 |
Fiji | iChao | NA | 25.74800 | 3.393000 | 23.41700 | 41.11900 | 1050 | 1 |
Fiji | iChao | NA | 25.24800 | 3.393000 | 23.26600 | 42.02200 | 1022 | 1 |
Fiji | iNEXT | 23 | 30.99194 | 7.889233 | 15.52933 | 46.45456 | 993 | 1 |
Fiji | iNEXT | 23 | 30.99198 | 7.991654 | 15.32863 | 46.65534 | 998 | 1 |
Fiji | iNEXT | 23 | 35.48728 | 9.211120 | 17.43382 | 53.54075 | 983 | 1 |
Fiji | iNEXT | 23 | 35.48721 | 10.732818 | 14.45127 | 56.52314 | 977 | 1 |
Fiji | iChao | NA | 28.49600 | 7.187000 | 23.77600 | 61.90100 | 1013 | 1 |
Fiji | iChao | NA | 32.49200 | 11.651000 | 24.45000 | 85.13100 | 993 | 1 |
Fiji | iChao | NA | 31.86700 | 11.651000 | 24.24500 | 86.13100 | 998 | 1 |
Fiji | iChao | NA | 38.11200 | 17.122000 | 25.54600 | 112.70900 | 977 | 1 |
Fiji | iChao | NA | 36.61200 | 17.122000 | 25.01800 | 114.82900 | 983 | 1 |
Vietnam | iNEXT | 104 | 145.91232 | 15.311836 | 115.90167 | 175.92297 | 479 | 1 |
Vietnam | iChao | NA | 154.58900 | 11.363000 | 136.75100 | 182.14400 | 479 | 1 |
Vietnam | iChao | NA | 156.79300 | 11.733000 | 138.33000 | 185.18600 | 501 | 1 |
Vietnam | iNEXT | 104 | 147.93594 | 19.559979 | 109.59908 | 186.27279 | 501 | 1 |
Vietnam | iNEXT | 104 | 169.98894 | 21.510476 | 127.82918 | 212.14870 | 486 | 1 |
Vietnam | iNEXT | 104 | 175.87346 | 24.813033 | 127.24081 | 224.50611 | 569 | 1 |
Vietnam | iChao | NA | 180.29800 | 19.720000 | 150.35000 | 229.59500 | 486 | 1 |
Vietnam | iNEXT | 104 | 179.37514 | 26.646434 | 127.14909 | 231.60119 | 385 | 1 |
Vietnam | iNEXT | 104 | 184.51331 | 25.188874 | 135.14402 | 233.88259 | 526 | 1 |
Vietnam | iNEXT | 104 | 186.12437 | 25.859664 | 135.44036 | 236.80838 | 510 | 1 |
Vietnam | iChao | NA | 185.87300 | 21.771000 | 153.05200 | 240.65500 | 569 | 1 |
Vietnam | iChao | NA | 195.91000 | 22.187000 | 161.64900 | 250.53400 | 385 | 1 |
Vietnam | iNEXT | 104 | 196.66063 | 27.767477 | 142.23737 | 251.08388 | 400 | 1 |
Vietnam | iChao | NA | 199.49400 | 26.139000 | 160.38800 | 265.72300 | 510 | 1 |
Vietnam | iChao | NA | 201.67000 | 26.579000 | 161.84200 | 268.92000 | 526 | 1 |
Vietnam | iNEXT | 104 | 215.71356 | 32.516129 | 151.98311 | 279.44400 | 391 | 1 |
Vietnam | iChao | NA | 208.94600 | 29.292000 | 165.35100 | 283.51900 | 400 | 1 |
Vietnam | iNEXT | 104 | 215.88745 | 37.507542 | 142.37402 | 289.40088 | 421 | 1 |
Vietnam | iChao | NA | 230.82100 | 34.828000 | 178.75900 | 319.13800 | 391 | 1 |
Vietnam | iChao | NA | 242.83200 | 33.249000 | 191.39200 | 324.55000 | 421 | 1 |
Zambia | iNEXT | 226 | 312.16971 | 23.589693 | 265.93476 | 358.40466 | 951 | 1 |
Zambia | iNEXT | 226 | 319.93159 | 22.469790 | 275.89161 | 363.97157 | 861 | 1 |
Zambia | iChao | NA | 330.91700 | 14.819000 | 305.65400 | 364.19200 | 951 | 1 |
Zambia | iNEXT | 226 | 321.92649 | 22.090906 | 278.62911 | 365.22388 | 821 | 1 |
Zambia | iNEXT | 226 | 319.93371 | 24.380049 | 272.14969 | 367.71773 | 878 | 1 |
Zambia | iNEXT | 226 | 326.65971 | 22.303031 | 282.94657 | 370.37285 | 986 | 1 |
Zambia | iChao | NA | 338.92000 | 16.091000 | 311.52300 | 375.09400 | 861 | 1 |
Zambia | iChao | NA | 341.19300 | 15.621000 | 314.41400 | 376.08300 | 878 | 1 |
Zambia | iChao | NA | 345.34400 | 15.223000 | 319.03800 | 379.08700 | 821 | 1 |
Zambia | iNEXT | 226 | 332.47205 | 25.576539 | 282.34295 | 382.60114 | 997 | 1 |
Zambia | iNEXT | 226 | 335.00782 | 25.681406 | 284.67319 | 385.34245 | 849 | 1 |
Uganda | iNEXT | 235 | 342.06599 | 22.368632 | 298.22427 | 385.90770 | 1107 | 1 |
Zambia | iChao | NA | 346.16700 | 18.690000 | 314.75300 | 388.69800 | 986 | 1 |
Zambia | iNEXT | 226 | 340.61584 | 25.229873 | 291.16620 | 390.06548 | 896 | 1 |
Zambia | iNEXT | 226 | 342.14367 | 26.143935 | 290.90250 | 393.38484 | 973 | 1 |
Zambia | iChao | NA | 350.22200 | 21.122000 | 315.22700 | 398.94300 | 997 | 1 |
Zambia | iChao | NA | 357.41600 | 19.169000 | 324.88700 | 400.64500 | 849 | 1 |
Uganda | iNEXT | 235 | 356.67916 | 26.712735 | 304.32316 | 409.03516 | 1026 | 1 |
Zambia | iNEXT | 226 | 350.26996 | 30.139969 | 291.19670 | 409.34321 | 939 | 1 |
Uganda | iChao | NA | 366.80900 | 19.038000 | 334.45700 | 409.68600 | 1107 | 1 |
Zambia | iChao | NA | 365.99400 | 20.552000 | 331.15200 | 412.38200 | 896 | 1 |
Zambia | iChao | NA | 363.38800 | 22.491000 | 325.89000 | 414.96200 | 973 | 1 |
Uganda | iNEXT | 235 | 364.91956 | 26.161785 | 313.64340 | 416.19572 | 997 | 1 |
Uganda | iNEXT | 235 | 362.40276 | 29.372156 | 304.83440 | 419.97113 | 1162 | 1 |
Zambia | iChao | NA | 376.90000 | 22.267000 | 339.17800 | 427.19400 | 939 | 1 |
Uganda | iChao | NA | 383.92800 | 20.747000 | 348.49200 | 430.42700 | 1026 | 1 |
Uganda | iNEXT | 235 | 371.87315 | 35.774710 | 301.75601 | 441.99029 | 906 | 1 |
Uganda | iChao | NA | 387.66800 | 23.857000 | 347.59900 | 441.99600 | 1162 | 1 |
Uganda | iChao | NA | 395.50100 | 22.771000 | 356.70600 | 446.66200 | 997 | 1 |
Uganda | iChao | NA | 406.06200 | 21.653000 | 368.60800 | 454.01400 | 906 | 1 |
Uganda | iNEXT | 235 | 400.07425 | 32.492573 | 336.38997 | 463.75852 | 1026 | 1 |
Uganda | iNEXT | 235 | 393.86800 | 35.859201 | 323.58526 | 464.15075 | 1090 | 1 |
Uganda | iChao | NA | 416.11800 | 31.850000 | 363.65100 | 489.98200 | 1090 | 1 |
Uganda | iNEXT | 235 | 419.30806 | 41.499383 | 337.97076 | 500.64535 | 1135 | 1 |
Uganda | iChao | NA | 426.76700 | 33.473000 | 371.55500 | 504.30200 | 1026 | 1 |
Uganda | iNEXT | 235 | 434.33604 | 35.958948 | 363.85780 | 504.81429 | 1111 | 1 |
Uganda | iNEXT | 235 | 432.85124 | 41.005866 | 352.48122 | 513.22126 | 1197 | 1 |
Uganda | iChao | NA | 461.50300 | 35.548000 | 401.83800 | 542.50400 | 1135 | 1 |
Uganda | iChao | NA | 482.28900 | 36.184000 | 420.91400 | 563.92400 | 1197 | 1 |
Uganda | iChao | NA | 478.76200 | 39.108000 | 413.34800 | 568.16700 | 1111 | 1 |
Visualising estimates
Of course, these values are super helpful for papers on these topics. But similarly, we can easily make some further useful visualisations!
# To build the manual legend, make a small fake dataset
legendData <- dplyr::tibble(name = c("yes","yes"),
statistic = c("iChao", "iNEXT") %>%
factor(levels = c("iChao", "iNEXT")),
est = c(1,1))
# Build a legend manually
violinLegend <- ggplot2::ggplot(legendData, ggplot2::aes(x = name, y = est)) +
ggplot2::geom_point(data = legendData, ggplot2::aes(y=est, x = name, colour = "red")) +
scale_color_manual(labels = c('Observed'), values = c('grey30')) +
ggplot2::geom_bar(aes(fill = statistic, y = est), position = position_dodge(0.90),
stat = "identity") +
ggplot2::scale_fill_manual(name = "Statistic",
labels = c("iChao", "iNEXT"),
values = c("iChao" = "#55AD9B", "iNEXT" = "#FD9B63")) +
ggplot2::theme_classic() +
theme(legend.title = element_blank(), legend.position = 'right',
legend.margin = margin(0, 0, 0, 0), legend.spacing.y = unit(0, "pt")) +
ggplot2::guides(fill = ggplot2::guide_legend(ncol = 1, byrow = TRUE, reverse = TRUE, order = 1))
# plot the countries
violinPlot <- ggplot2::ggplot(estimates$SiteOutput, ggplot2::aes(x = name, y = est)) +
ggplot2::geom_violin(position="dodge", alpha=0.5, ggplot2::aes(fill=Name,
y=`95%Lower`, x=variable),
colour = NA) +
ggplot2::geom_violin(position="dodge", alpha=0.5, ggplot2::aes(fill=Name, y=`95%Upper`,
x=variable), colour = NA) +
ggplot2::geom_violin(position="dodge", alpha=1, ggplot2::aes(fill=Name, y=Estimate,
x=variable), colour = "black") +
ggplot2::scale_fill_manual(values=c("#55AD9B", "#FD9B63")) +
ggplot2::geom_point(data = estimates$SiteOutput %>%
dplyr::distinct(variable, Observed) %>%
tidyr::drop_na(),
ggplot2::aes(x = variable, y = Observed), col = "grey40") +
ggplot2::theme_classic() +
ggplot2::xlab("Continent") +
ggplot2::ylab("Species estimate") +
guides(fill=guide_legend(title="Statistic")) +
ggplot2::theme_classic() +
ggplot2::theme(legend.position = "none",
axis.text.x = ggplot2::element_text(angle = 60, vjust = 1, hjust=1)) +
ggplot2::xlab(c( "")) + ggplot2::ylab(c("Species")) +
ggplot2::annotation_custom(cowplot::ggdraw(cowplot::get_legend(violinLegend)) %>%
ggplot2::ggplotGrob(), xmin = 1, xmax = 1,
ymin = 350, ymax = 500)
## Tip: You can wrap the entire ggplot2 chunk of code in brackets () to print at the same time
# Save the plot
cowplot::save_plot(filename = paste0(tempdir(), "/violinPlot.pdf"),
plot = violinPlot,
base_width = 10,
base_height = 7)
Plots showing the rarefied iNext and non-parametric iChao estimates for four countries with 95% confidence intervals.
There are many ways to plot or analyse these kinds of data. The rest I will leave up to you! But, for more ideas, do feel free to check out the original publication for this workflow as well as the original R workflow itself.