Skip to contents

BeeBDC logo of a cuckoo bee sweeping up occurrence records in South America

This is a basic workflow for feeding species occurrence data, a group’s taxonomy, and a group’s country checklist into BeeBDC in order to produce species richness estimates. Some of these functions are also generic wrappers around SpadeR and iNEXT functions that can be used with any abundance data. These functions have grown from an original publication.

Script preparation

Working directory

Choose the path to the root folder in which all other folders can be found.

RootPath <- paste0("/your/path/here")
# Create the working directory in the RootPath if it doesn't exist already
if (!dir.exists(paste0(RootPath, "/Data_acquisition_workflow"))) {
    dir.create(paste0(RootPath, "/Data_acquisition_workflow"), recursive = TRUE)
}
# Set the working directory
setwd(paste0(RootPath, "/Data_acquisition_workflow"))

Install packages (if needed)

You may need to install some packages for using this workflow. In particular, SpadeR and iNEXT are required. They should prompt you to download them the first time that you run the functions, however, let’s install them here and now.

Parallel estimations

We can start by looking at the relatively simple iNEXT and SpadeR wrapper functions that can take the usual inputs for those functions in their host packages. (See iNEXT::iNEXT() and SpadeR::ChaoSpecies() for more info.) These functions can take your data and run multiple sites/countries/whatever level you want to throw at them and run them in parallel. This greatly simplifies the code needed to run them and also makes its implementation MUCH faster!

iNEXTwrapper

BeeBDC::iNEXTwrapper() parallelizes iNEXT::iNEXT(), which estimates species richness patterns by extrapolating and interpolating Hill numbers. By and large this kinda equates to estimating richness by rarefaction. We can start by reading in an example dataset with 1,488 scientificName-country_suggested combinations across four countries; Fiji, Uganda, Vietnam, and Zambia.

beesCountrySubset <- BeeBDC::beesCountrySubset

Next, let’s look at how we would usually use iNEXT… By first transforming our (very simple example) occurrence dataset into the right format and then running the function.

# Transform the data
transformedAbundance <- beesCountrySubset %>%
    dplyr::group_by(scientificName, country_suggested) %>%
    dplyr::count() %>%
    dplyr::select(scientificName, country_suggested, n) %>%
    tidyr::pivot_wider(names_from = country_suggested, values_from = n, values_fill = 0) %>%
    # Create the rownames
tibble::column_to_rownames("scientificName") %>%
    dplyr::tibble() %>%
    as.data.frame()
# Run iNEXT
output_iNEXT <- iNEXT::iNEXT(transformedAbundance, datatype = "abundance")

We can also view the output of this function by running output_iNEXT$AsyEst

Assemblage Diversity Observed Estimator s.e. LCL UCL
Fiji Species richness 17.000000 17.499483 1.4850098 17.000000 20.410049
Fiji Shannon diversity 4.712056 4.754101 0.2164397 4.329887 5.178315
Fiji Simpson diversity 2.653013 2.657561 0.1237356 2.415044 2.900078
Uganda Species richness 44.000000 70.243359 14.1752423 44.000000 98.026324
Uganda Shannon diversity 19.651483 26.633143 3.4844763 19.803695 33.462591
Uganda Simpson diversity 7.699248 8.128000 1.5689856 5.052845 11.203155
Vietnam Species richness 6.000000 13.794872 5.8924098 6.000000 25.343783
Vietnam Shannon diversity 1.953128 2.307552 0.5923478 1.146572 3.468532
Vietnam Simpson diversity 1.386509 1.400756 0.1986146 1.011479 1.790034
Zambia Species richness 129.000000 186.853965 17.7824798 152.000945 221.706985
Zambia Shannon diversity 77.832700 104.647375 7.6001195 89.751415 119.543336
Zambia Simpson diversity 32.753790 35.991359 6.3494595 23.546647 48.436071

The implementation of BeeBDC::iNEXTwrapper() is also relatively simple; including the implementation of running in parallel (Note: Windows machines can’t use R’s parallel functions). We can again modify the input data to work with the function as below and we could leave most of the inputs to iNEXTwrapper to the defaults or change them as we saw fit. The key variable is to change mc.cores to however many threads you’d liek to use on your computer!

# Transform data
data_nextWrapper <- beesCountrySubset %>%
    dplyr::group_by(scientificName, country_suggested) %>%
    dplyr::count()

# Calculate iNEXT with the wrapper function
output_iNEXTwrapper <- BeeBDC::iNEXTwrapper(data = data_nextWrapper, variableColumn = "country_suggested",
    valueColumn = "n", q = 0, datatype = "abundance", conf = 0.95, se = TRUE, nboot = 50,
    size = NULL, endpoint = NULL, knots = 40, mc.cores = 1)
##  - Outputs can be found in a list with two tibbles called 'DataInfo' and 'AsyEst' and a list of iNext outputs per groupVariable in iNextEst'.
country_suggested statistic Observed Estimator Est_s.e. 95% Lower 95% Upper
Fiji Species Richness 17.000000 17.499483 1.6650163 14.236111 20.762855
Fiji Shannon diversity 4.712056 4.754101 0.2271829 4.308831 5.199372
Fiji Simpson diversity 2.653013 2.657561 0.1264695 2.409685 2.905436
Uganda Species Richness 44.000000 70.243359 25.7541584 19.766137 120.720582
Uganda Shannon diversity 19.651483 26.633143 4.2573468 18.288896 34.977389
Uganda Simpson diversity 7.699248 8.128000 1.8631187 4.476354 11.779646
Vietnam Species Richness 6.000000 13.794872 6.1393736 1.761921 25.827823
Vietnam Shannon diversity 1.953128 2.307552 0.5633747 1.203358 3.411746
Vietnam Simpson diversity 1.386509 1.400756 0.1830418 1.042001 1.759511
Zambia Species Richness 129.000000 186.853965 18.1990609 151.184461 222.523469
Zambia Shannon diversity 77.832700 104.647375 6.8285323 91.263698 118.031053
Zambia Simpson diversity 32.753790 35.991359 5.5923079 25.030637 46.952081

ChaoWrapper

BeeBDC::ChaoWrapper() parallelizes SpadeR::ChaoSpecies(), which estimates species richness using various non-parametric estimators. The primary one that I tend to use is iChao. Let’s use the same example dataset to first run the SpadeR function.

# Transform data
data_iChao <- beesCountrySubset %>%
    dplyr::group_by(scientificName, country_suggested) %>%
    dplyr::count() %>%
    dplyr::select(scientificName, country_suggested, n) %>%
    tidyr::pivot_wider(names_from = country_suggested, values_from = n, values_fill = 0) %>%
    # Create the rownames
tibble::column_to_rownames("scientificName") %>%
    dplyr::tibble()

# Run ChaoSpecies for the country Fiji
output_iChao <- SpadeR::ChaoSpecies(data_iChao$Fiji, datatype = "abundance")

Let’s view the output of this function by running output_iChao$Species_table. Once again, you’ll be able to see that there are actually quite a few species richness estimators available here.

Estimate s.e. 95%Lower 95%Upper
Homogeneous Model 17.182 0.474 17.011 20.015
Homogeneous (MLE) 17.000 0.633 17.000 18.858
Chao1 (Chao, 1984) 17.499 1.322 17.030 25.434
Chao1-bc 17.000 0.633 17.000 18.858
iChao1 (Chiu et al. 2014) 17.624 1.322 17.048 25.048
ACE (Chao & Lee, 1992) 17.342 0.775 17.024 21.788
ACE-1 (Chao & Lee, 1992) 17.366 0.835 17.026 22.168
1st order jackknife 17.999 1.413 17.128 24.796
2nd order jackknife 18.000 2.446 17.065 32.367

The implementation of the parallel wrapper is also quite simple for BeeBDC::ChaoWrapper(). And the number of cores can also be changed using the mc.cores argument. Note, that in this case, we can run all sites (or countries) at once!

# Run the wrapper function
output_iChaowrapper <- BeeBDC::ChaoWrapper(data = data_iChao, datatype = "abundance",
    k = 10, conf = 0.95, mc.cores = 1)
##  - Outputs can be found in a list with two tibbles called 'basicTable' and 'diversityTable'.

View the output with output_iChaowrapper$diversityTable.

variable Name Estimate s.e. 95%Lower 95%Upper
Zambia Homogeneous Model 159.936 8.471 147.263 181.402
Zambia Homogeneous (MLE) 140.234 3.957 134.748 150.959
Zambia Chao1 (Chao, 1984) 186.854 20.305 158.664 241.834
Zambia Chao1-bc 183.879 19.243 157.155 235.970
Zambia iChao1 (Chiu et al. 2014) 199.091 12.645 178.353 228.541
Zambia ACE (Chao & Lee, 1992) 183.240 15.921 159.876 224.286
Zambia ACE-1 (Chao & Lee, 1992) 197.138 22.684 165.093 257.634
Zambia 1st order jackknife 185.839 10.654 168.488 210.814
Zambia 2nd order jackknife 214.754 18.422 185.553 259.032
Uganda Homogeneous Model 59.952 7.062 50.961 80.554
Uganda Homogeneous (MLE) 47.113 2.032 44.969 54.004
Uganda Chao1 (Chao, 1984) 70.243 14.659 53.450 116.878
Uganda Chao1-bc 66.820 12.672 52.261 107.041
Uganda iChao1 (Chiu et al. 2014) 76.243 8.394 63.519 97.262
Uganda ACE (Chao & Lee, 1992) 78.927 16.764 58.308 129.256
Uganda ACE-1 (Chao & Lee, 1992) 95.184 30.929 61.149 196.767
Uganda 1st order jackknife 66.820 6.743 56.944 84.233
Uganda 2nd order jackknife 79.695 11.623 63.160 110.499
Fiji Homogeneous Model 17.182 0.474 17.011 20.015
Fiji Homogeneous (MLE) 17.000 0.633 17.000 18.858
Fiji Chao1 (Chao, 1984) 17.499 1.322 17.030 25.434
Fiji Chao1-bc 17.000 0.633 17.000 18.858
Fiji iChao1 (Chiu et al. 2014) 17.624 1.322 17.048 25.048
Fiji ACE (Chao & Lee, 1992) 17.342 0.775 17.024 21.788
Fiji ACE-1 (Chao & Lee, 1992) 17.366 0.835 17.026 22.168
Fiji 1st order jackknife 17.999 1.413 17.128 24.796
Fiji 2nd order jackknife 18.000 2.446 17.065 32.367
Vietnam Homogeneous Model 16.000 12.450 7.501 72.612
Vietnam Homogeneous (MLE) 6.009 0.096 6.000 6.644
Vietnam Chao1 (Chao, 1984) 13.795 11.372 6.961 69.218
Vietnam Chao1-bc 8.923 4.085 6.380 28.471
Vietnam iChao1 (Chiu et al. 2014) 13.795 11.372 6.961 69.218
Vietnam ACE (Chao & Lee, 1992) 16.000 12.450 7.501 72.612
Vietnam ACE-1 (Chao & Lee, 1992) 16.000 12.450 7.501 72.612
Vietnam 1st order jackknife 9.897 2.774 7.111 19.668
Vietnam 2nd order jackknife 12.769 4.734 7.965 29.318

Visualising

These are both easy and simple to use wrapper functions. However, the complexity of the output data types can actually be a little bit confusing to work with. Especially if you want to visualise those data. So, we also provide the function, BeeBDC::ggRichnessWrapper() to display the results of both wrapper functions by site. Not only does this functino save a plot that visualises the data (below), it also outputs a summary of the estimates.

plot_summary <- BeeBDC::ggRichnessWrapper(iNEXT_in = output_iNEXTwrapper, iChao_in = output_iChaowrapper,
    nrow = 2, ncol = 2, labels = NULL, fileName = "speciesRichnessPlots", outPath = tempdir(),
    base_width = 8.3, base_height = 11.7, dpi = 300)
## Loading required namespace: cowplot

Plots showing the rarefied iNext and non-parametric iChao estimates for four countries with 95% confidence intervals.

Note that Fiji’s diversity seems to be reaching asymptote, but the remaining country’s species richnesses are still climbing. Vietnam in particular has a huge amount of uncertainty with the iChao estimates (green) reaching over 60 species! We knew that this was a really small dataset (in fact we chose it as a test dataset because it is small), but it goes to show that if your data are inadequate you may not get a great answer. On the plus side, the 95% confidence intervals all overlap.

level n observedRichness iNEXT_est iNEXT_lower iNEXT_upper iNEXT_increasePercent iChao_est iChao_lower iChao_upper iChao_increasePercent
Fiji 967 17 17 14 21 2.85 18 17 25 3.54
Uganda 128 44 70 20 121 37.36 76 64 97 42.29
Vietnam 39 6 14 2 26 56.51 14 7 69 56.51
Zambia 354 129 187 151 223 30.96 199 178 229 35.21

You can see, that these functions are moving towards simplifying quick and mass-estimation of species richness across sites or countries! The is the idea behind these improvements. But, we take this further below by also attempting to sample known taxa that are otherwise missing from our datasets!

Estimating richness from the known unknowns

Seems like a bit of an odd title, no? Well, many taxa have country-level checklists. Even insect taxa can be well represented in global country-level checklists; for example, ants, butterflies, dragonflies, and bees. While for vertebrates the situation is even better. Country-level global checklists often incorporate knowledge that isn’t contianed in species occurrence datasets but that, perhaps, we can take advantage of.

Preparing the inputs

In our bee species richness estimates paper, we had 4,819 species (from a total of ~21,000 species) that had no occurrence records globally. We could have run the above species richness estimates using just the occurrences that we had, but we wanted to incorporate that knowledge in our estimates. To do so, we found the samples sizes from most-recent literature for 497 species and generated a curve (we also generated this curve using the global- and country-level occurrence datasets and found the former to match the literature curve well).

Curves generated using 497 literature species (yellow), the global occurrence records (light blue), and occurrences at the country-level (dark blue).
Curves generated using 497 literature species (yellow), the global occurrence records (light blue), and occurrences at the country-level (dark blue).

We can then extract the formula from this curve and use it to randomly generate data for those missing species. These curves can be built using your own data; see section 1.6 of the R code for the bee species richness estimates paper, which uses ggplot2 and mosaic’s fitModel function to generate the curve function that goes into BeeBDC::diversityPrepR() below.

Let’s dig in… To start with, we will make an R object using our occurrence data (beesCountrySubset), taxonomy (from BeeBDC::beesTaxonomy() or BeeBDC::taxadbToBeeBDC()), and country checklist(BeeBDC::beesChecklist() or a manually-made checklist with the countries in the rNaturalEarth_name column and species in the validName column; the latter must match the remaining data — see BeeBDC::HarmoniseR()). Feel free to explore the input files or the output list of files for modifying your own and pay attention to match the column names.

Note as well that the curveFunction is the one generated from the Literature curve above.

  # Generate the R file
estimateData <- BeeBDC::diversityPrepR(
  data = beesCountrySubset,
    # Download the bee taxonomy. Download other taxonomies using BeeBDC::taxadbToBeeBDC()
  taxonomyFile = BeeBDC::beesTaxonomy(),
    # Download the bee country checklist. See notes above about making a checklist for other taxa
  checklistFile = BeeBDC::beesChecklist(),
  curveFunction = function(x) (228.7531 * x * x^-log(12.1593)),
  sampleSize = 10000,
  countryColumn = "country_suggested"
)

Making estimates

The output file from the above code is then used to feed into the BeeBDC::richnessEstimator() function which will repeatedly sample the provided curve and estimate species richness (a user-defined number of times) at the country (or site), continental, or global level. This function can also be run in parallel and run at whatever scale the user asks for. If globalSamples, continentSamples, or countrySamples are set to zero, they will not be run. If they are set to >0, they will be estimated that many times. Let’s run our estimates at the country (or site) level ten times below, drawing sample sizes from the non-occurrence species each time (capped at the maximum sample size for each country).

estimates <- BeeBDC::richnessEstimateR(
  data = estimateData,
  sampleSize = 10000,
  globalSamples = 0,
  continentSamples = 0,
  countrySamples = 10,
    # Increase the number of cores to use R's parallel package and speed estimates up.
  mc.cores = 1,
    # Directory where to save files
  outPath = tempdir(),
  fileName = "Sampled.pdf"
)

We can look at the median outputs of our analysis with estimates$Summary.

name scale nChao iChao_est iChao_lower iChao_upper observedRichness level niNEXT iNEXT_est iNEXT_lower iNEXT_upper iNEXT_increasePercent iNEXT_increase iChao_increasePercent iChao_increase
Fiji Country 1003.5 27.1220 23.5965 51.9615 23 Country 1003.5 26.37171 16.48158 36.19951 14.65960 3.371707 17.92174 4.1220
Vietnam Country 482.5 197.7020 161.0185 258.1285 104 Country 482.5 181.94422 131.48660 232.74189 74.94637 77.944224 90.09808 93.7020
Zambia Country 917.5 348.1945 317.1325 393.8205 226 Country 917.5 329.56588 282.64476 376.48700 45.82561 103.565880 54.06836 122.1945
Uganda Country 1098.5 411.0900 366.1295 471.9980 235 Country 1098.5 382.87058 318.61433 452.87441 62.92365 147.870577 74.93191 176.0900

Or, we can look at the outputs from each iteration together with estimates$SiteOutput.

variable Name Observed Estimate s.e. 95%Lower 95%Upper n groupNo
Fiji iChao NA 23.25000 0.728000 23.01300 27.72900 1017 1
Fiji iNEXT 23 23.24975 2.549451 18.25292 28.24659 1017 1
Fiji iNEXT 23 23.49950 2.435377 18.72625 28.27276 1009 1
Fiji iNEXT 23 24.49848 3.452599 17.73151 31.26545 988 1
Fiji iChao NA 23.50000 1.322000 23.03000 31.43500 1009 1
Fiji iNEXT 23 25.24786 5.060689 15.32909 35.16663 1050 1
Fiji iChao NA 24.49800 2.289000 23.17400 35.87400 988 1
Fiji iNEXT 23 25.24780 5.565085 14.34043 36.15516 1022 1
Fiji iNEXT 23 27.49556 4.463501 18.74726 36.24386 1013 1
Fiji iChao NA 25.74800 3.393000 23.41700 41.11900 1050 1
Fiji iChao NA 25.24800 3.393000 23.26600 42.02200 1022 1
Fiji iNEXT 23 30.99194 7.889233 15.52933 46.45456 993 1
Fiji iNEXT 23 30.99198 7.991654 15.32863 46.65534 998 1
Fiji iNEXT 23 35.48728 9.211120 17.43382 53.54075 983 1
Fiji iNEXT 23 35.48721 10.732818 14.45127 56.52314 977 1
Fiji iChao NA 28.49600 7.187000 23.77600 61.90100 1013 1
Fiji iChao NA 32.49200 11.651000 24.45000 85.13100 993 1
Fiji iChao NA 31.86700 11.651000 24.24500 86.13100 998 1
Fiji iChao NA 38.11200 17.122000 25.54600 112.70900 977 1
Fiji iChao NA 36.61200 17.122000 25.01800 114.82900 983 1
Vietnam iNEXT 104 145.91232 15.311836 115.90167 175.92297 479 1
Vietnam iChao NA 154.58900 11.363000 136.75100 182.14400 479 1
Vietnam iChao NA 156.79300 11.733000 138.33000 185.18600 501 1
Vietnam iNEXT 104 147.93594 19.559979 109.59908 186.27279 501 1
Vietnam iNEXT 104 169.98894 21.510476 127.82918 212.14870 486 1
Vietnam iNEXT 104 175.87346 24.813033 127.24081 224.50611 569 1
Vietnam iChao NA 180.29800 19.720000 150.35000 229.59500 486 1
Vietnam iNEXT 104 179.37514 26.646434 127.14909 231.60119 385 1
Vietnam iNEXT 104 184.51331 25.188874 135.14402 233.88259 526 1
Vietnam iNEXT 104 186.12437 25.859664 135.44036 236.80838 510 1
Vietnam iChao NA 185.87300 21.771000 153.05200 240.65500 569 1
Vietnam iChao NA 195.91000 22.187000 161.64900 250.53400 385 1
Vietnam iNEXT 104 196.66063 27.767477 142.23737 251.08388 400 1
Vietnam iChao NA 199.49400 26.139000 160.38800 265.72300 510 1
Vietnam iChao NA 201.67000 26.579000 161.84200 268.92000 526 1
Vietnam iNEXT 104 215.71356 32.516129 151.98311 279.44400 391 1
Vietnam iChao NA 208.94600 29.292000 165.35100 283.51900 400 1
Vietnam iNEXT 104 215.88745 37.507542 142.37402 289.40088 421 1
Vietnam iChao NA 230.82100 34.828000 178.75900 319.13800 391 1
Vietnam iChao NA 242.83200 33.249000 191.39200 324.55000 421 1
Zambia iNEXT 226 312.16971 23.589693 265.93476 358.40466 951 1
Zambia iNEXT 226 319.93159 22.469790 275.89161 363.97157 861 1
Zambia iChao NA 330.91700 14.819000 305.65400 364.19200 951 1
Zambia iNEXT 226 321.92649 22.090906 278.62911 365.22388 821 1
Zambia iNEXT 226 319.93371 24.380049 272.14969 367.71773 878 1
Zambia iNEXT 226 326.65971 22.303031 282.94657 370.37285 986 1
Zambia iChao NA 338.92000 16.091000 311.52300 375.09400 861 1
Zambia iChao NA 341.19300 15.621000 314.41400 376.08300 878 1
Zambia iChao NA 345.34400 15.223000 319.03800 379.08700 821 1
Zambia iNEXT 226 332.47205 25.576539 282.34295 382.60114 997 1
Zambia iNEXT 226 335.00782 25.681406 284.67319 385.34245 849 1
Uganda iNEXT 235 342.06599 22.368632 298.22427 385.90770 1107 1
Zambia iChao NA 346.16700 18.690000 314.75300 388.69800 986 1
Zambia iNEXT 226 340.61584 25.229873 291.16620 390.06548 896 1
Zambia iNEXT 226 342.14367 26.143935 290.90250 393.38484 973 1
Zambia iChao NA 350.22200 21.122000 315.22700 398.94300 997 1
Zambia iChao NA 357.41600 19.169000 324.88700 400.64500 849 1
Uganda iNEXT 235 356.67916 26.712735 304.32316 409.03516 1026 1
Zambia iNEXT 226 350.26996 30.139969 291.19670 409.34321 939 1
Uganda iChao NA 366.80900 19.038000 334.45700 409.68600 1107 1
Zambia iChao NA 365.99400 20.552000 331.15200 412.38200 896 1
Zambia iChao NA 363.38800 22.491000 325.89000 414.96200 973 1
Uganda iNEXT 235 364.91956 26.161785 313.64340 416.19572 997 1
Uganda iNEXT 235 362.40276 29.372156 304.83440 419.97113 1162 1
Zambia iChao NA 376.90000 22.267000 339.17800 427.19400 939 1
Uganda iChao NA 383.92800 20.747000 348.49200 430.42700 1026 1
Uganda iNEXT 235 371.87315 35.774710 301.75601 441.99029 906 1
Uganda iChao NA 387.66800 23.857000 347.59900 441.99600 1162 1
Uganda iChao NA 395.50100 22.771000 356.70600 446.66200 997 1
Uganda iChao NA 406.06200 21.653000 368.60800 454.01400 906 1
Uganda iNEXT 235 400.07425 32.492573 336.38997 463.75852 1026 1
Uganda iNEXT 235 393.86800 35.859201 323.58526 464.15075 1090 1
Uganda iChao NA 416.11800 31.850000 363.65100 489.98200 1090 1
Uganda iNEXT 235 419.30806 41.499383 337.97076 500.64535 1135 1
Uganda iChao NA 426.76700 33.473000 371.55500 504.30200 1026 1
Uganda iNEXT 235 434.33604 35.958948 363.85780 504.81429 1111 1
Uganda iNEXT 235 432.85124 41.005866 352.48122 513.22126 1197 1
Uganda iChao NA 461.50300 35.548000 401.83800 542.50400 1135 1
Uganda iChao NA 482.28900 36.184000 420.91400 563.92400 1197 1
Uganda iChao NA 478.76200 39.108000 413.34800 568.16700 1111 1

Visualising estimates

Of course, these values are super helpful for papers on these topics. But similarly, we can easily make some further useful visualisations!

# To build the manual legend, make a small fake dataset
legendData <- dplyr::tibble(name = c("yes","yes"),
                                              statistic = c("iChao", "iNEXT") %>%
                                                factor(levels = c("iChao", "iNEXT")),
                                              est = c(1,1))
# Build a legend manually
violinLegend <- ggplot2::ggplot(legendData, ggplot2::aes(x = name, y = est)) + 
  ggplot2::geom_point(data = legendData, ggplot2::aes(y=est, x = name, colour = "red")) + 
   scale_color_manual(labels = c('Observed'), values = c('grey30'))  +
   ggplot2::geom_bar(aes(fill = statistic, y = est), position = position_dodge(0.90),  
                     stat = "identity") +
   ggplot2::scale_fill_manual(name = "Statistic",
                              labels = c("iChao", "iNEXT"),
                              values = c("iChao" = "#55AD9B", "iNEXT" = "#FD9B63")) +
   ggplot2::theme_classic() +
   theme(legend.title = element_blank(), legend.position = 'right', 
         legend.margin = margin(0, 0, 0, 0), legend.spacing.y = unit(0, "pt")) +
   ggplot2::guides(fill = ggplot2::guide_legend(ncol = 1, byrow = TRUE, reverse = TRUE, order = 1))

# plot the countries 
violinPlot <- ggplot2::ggplot(estimates$SiteOutput, ggplot2::aes(x = name, y = est)) + 
   ggplot2::geom_violin(position="dodge", alpha=0.5, ggplot2::aes(fill=Name, 
                                                                  y=`95%Lower`, x=variable), 
                        colour =  NA) +
   ggplot2::geom_violin(position="dodge", alpha=0.5, ggplot2::aes(fill=Name, y=`95%Upper`,
                                                                  x=variable), colour =  NA) +
   ggplot2::geom_violin(position="dodge", alpha=1, ggplot2::aes(fill=Name, y=Estimate, 
                                                                x=variable), colour =  "black") +
   ggplot2::scale_fill_manual(values=c("#55AD9B", "#FD9B63")) +
    ggplot2::geom_point(data = estimates$SiteOutput %>%
                          dplyr::distinct(variable, Observed) %>% 
                          tidyr::drop_na(), 
                        ggplot2::aes(x = variable, y = Observed), col = "grey40") +
  ggplot2::theme_classic() + 
  ggplot2::xlab("Continent") + 
  ggplot2::ylab("Species estimate") +
  guides(fill=guide_legend(title="Statistic")) +
  ggplot2::theme_classic() +
  ggplot2::theme(legend.position = "none", 
                 axis.text.x = ggplot2::element_text(angle = 60, vjust = 1, hjust=1)) +
  ggplot2::xlab(c( "")) + ggplot2::ylab(c("Species"))  +
  ggplot2::annotation_custom(cowplot::ggdraw(cowplot::get_legend(violinLegend)) %>%
                               ggplot2::ggplotGrob(), xmin = 1, xmax = 1, 
                               ymin = 350, ymax = 500)

  ## Tip: You can wrap the entire ggplot2 chunk of code in brackets () to print at the same time

# Save the plot
cowplot::save_plot(filename = paste0(tempdir(), "/violinPlot.pdf"),
                   plot = violinPlot,
                   base_width = 10,
                   base_height = 7)

Plots showing the rarefied iNext and non-parametric iChao estimates for four countries with 95% confidence intervals.

There are many ways to plot or analyse these kinds of data. The rest I will leave up to you! But, for more ideas, do feel free to check out the original publication for this workflow as well as the original R workflow itself.