A small bee occurrence dataset with flags generated by BeeBDC, used to run the example script and to test functions. For data types, see ColTypeR().

data("beesFlagged", package = "BeeBDC")

Format

An object of class "tibble"

database_id

Occurrence code generated in bdc or BeeBDC

scientificName

Full scientificName as shown on DiscoverLife

family

Family name

subfamily

Subfamily name

genus

Genus name

subgenus

Subgenus name

subspecies

Full name with subspecies name - ALA column

specificEpithet

The species name only

infraspecificEpithet

The subspecies name only

acceptedNameUsage

The full name, with authorship and date information if known, of the currently valid (zoological) or accepted (botanical) taxon.

taxonRank

The taxonomic rank of the most specific name in the scientificName.

scientificNameAuthorship

The authorship information for the scientificName formatted according to the conventions of the applicable nomenclaturalCode.

identificationQualifier

A brief phrase or a standard term ("cf.", "aff.") to express the determiner's doubts about the Identification.

higherClassification

A list (concatenated and separated) of taxa names terminating at the rank immediately superior to the taxon referenced in the taxon record.

identificationReferences

A list (concatenated and separated) of references (publication, global unique identifier, URI) used in the Identification.

typeStatus

A list (concatenated and separated) of nomenclatural types (type status, typified scientific name, publication) applied to the subject.

previousIdentifications

A list (concatenated and separated) of previous assignments of names to the Organism.

verbatimIdentification

This term is meant to allow the capture of an unaltered original identification/determination, including identification qualifiers, hybrid formulas, uncertainties, etc. This term is meant to be used in addition to scientificName (and identificationQualifier etc.), not instead of it.

identifiedBy

A list (concatenated and separated) of names of people, groups, or organizations who assigned the Taxon to the subject.

dateIdentified

The date on which the subject was determined as representing the Taxon.

decimalLatitude

The geographic latitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic center of a Location. Positive values are north of the Equator, negative values are south of it. Legal values lie between -90 and 90, inclusive.

decimalLongitude

The geographic longitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic center of a Location. Positive values are east of the Greenwich Meridian, negative values are west of it. Legal values lie between -180 and 180, inclusive.

stateProvince

The name of the next smaller administrative region than country (state, province, canton, department, region, etc.) in which the Location occurs.

continent

The name of the continent in which the Location occurs.

locality

The specific description of the place.

island

The name of the island on or near which the Location occurs.

county

The full, unabbreviated name of the next smaller administrative region than stateProvince (county, shire, department, etc.) in which the Location occurs.

municipality

The full, unabbreviated name of the next smaller administrative region than county (city, municipality, etc.) in which the Location occurs. Do not use this term for a nearby named place that does not contain the actual location.

license

A legal document giving official permission to do something with the resource.

issue

A GBIF-defined issue.

eventDate

The date-time or interval during which an Event occurred. For occurrences, this is the date-time when the event was recorded. Not suitable for a time in a geological context.

eventTime

The time or interval during which an Event occurred.

day

The integer day of the month on which the Event occurred.

month

The integer month in which the Event occurred.

year

The four-digit year in which the Event occurred, according to the Common Era Calendar.

basisOfRecord

The specific nature of the data record. Recommended best practice is to use the standard label of one of the Darwin Core classes. Examples: PreservedSpecimen, FossilSpecimen, LivingSpecimen, MaterialSample, Event, HumanObservation, MachineObservation, Taxon, Occurrence, MaterialCitation.

country

The name of the country or major administrative unit in which the Location occurs.

type

The nature or genre of the resource. Examples: StillImage, MovingImage, Sound, PhysicalObject, Event, Text.

occurrenceStatus

A statement about the presence or absence of a Taxon at a Location. Examples: present, absent.

recordNumber

An identifier given to the Occurrence at the time it was recorded. Often serves as a link between field notes and an Occurrence record, such as a specimen collector's number.

recordedBy

A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first.

eventID

An identifier for the set of information associated with an Event (something that occurs at a place and time). May be a global unique identifier or an identifier specific to the data set.

Location

A spatial region or named place.

samplingProtocol

The names of, references to, or descriptions of the methods or protocols used during an Event. Examples: UV light trap; mist net; bottom trawl; ad hoc observation | point count; Penguins from space: faecal stains reveal the location of emperor penguin colonies; https://doi.org/10.1111/j.1466-8238.2009.00467.x; Takats et al. 2001.

samplingEffort

The amount of effort expended during an Event. Examples: 40 trap-nights, 10 observer-hours, 10 km by foot, 30 km by car.

individualCount

The number of individuals present at the time of the Occurrence. Integer.

organismQuantity

A number or enumeration value for the quantity of organisms. Examples: 27 (organismQuantity) with individuals (organismQuantityType); 12.5 (organismQuantity) with percentage biomass (organismQuantityType); r (organismQuantity) with Braun Blanquet Scale (organismQuantityType); many (organismQuantity) with individuals (organismQuantityType).

coordinatePrecision

A decimal representation of the precision of the coordinates given in the decimalLatitude and decimalLongitude.

coordinateUncertaintyInMeters

The horizontal distance (in meters) from the given decimalLatitude and decimalLongitude describing the smallest circle containing the whole of the Location. Leave the value empty if the uncertainty is unknown, cannot be estimated, or is not applicable (because there are no coordinates). Zero is not a valid value for this term.

spatiallyValid

Occurrence records in the ALA can be filtered using the spatially valid flag. This flag combines a set of tests applied to the record to assess how reliable its spatial data components are.

catalogNumber

An identifier (preferably unique) for the record within the data set or collection.

gbifID

The identifier assigned by GBIF for each record.

datasetID

An identifier for the set of data. May be a global unique identifier or an identifier specific to a collection or institution.

institutionCode

The name (or acronym) in use by the institution having custody of the object(s) or information referred to in the record. Examples: MVZ, FMNH, CLO, UCMP.

datasetName

The name identifying the data set from which the record was derived.

otherCatalogNumbers

A list (concatenated and separated) of previous or alternate fully qualified catalog numbers or other human-used identifiers for the same Occurrence, whether in the current or any other data set or collection.

occurrenceID

An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique.

taxonKey

The GBIF-assigned taxon identifier number.

collectionID

An identifier for the collection or dataset from which the record was derived.

verbatim_scientificName

The verbatim (originally-provided) scientific name

verbatimEventDate

The verbatim original representation of the date and time information for an Event.

associatedTaxa

A list (concatenated and separated) of identifiers or names of taxa and the associations of this Occurrence to each of them.

associatedOrganisms

A list (concatenated and separated) of identifiers of other Organisms and the associations of this Organism to each of them.

fieldNotes

One of a) an indicator of the existence of, b) a reference to (publication, URI), or c) the text of notes taken in the field about the Event.

sex

The sex of the biological individual(s) represented in the Occurrence.

rights

A description of the usage rights applicable to the record.

rightsHolder

A person or organization owning or managing rights over the resource.

accessRights

Information about who can access the resource or an indication of its security status.

associatedReferences

A list (concatenated and separated) of identifiers (publication, bibliographic reference, global unique identifier, URI) of literature associated with the Occurrence.

bibliographicCitation

A bibliographic reference for the resource as a statement indicating how this record should be cited (attributed) when used.

references

A related resource that is referenced, cited, or otherwise pointed to by the described resource.

informationWithheld

Additional information that exists, but that has not been shared in the given record.

isDuplicateOf

An identifier for the record that this record is a duplicate of.

hasCoordinate

Variable indicating presence/absence of location coordinates.

hasGeospatialIssues

Variable indicating validity of geospatial data associated with record.

occurrenceYear

Year associated with Occurrence.

id

Variable with an identifying value for the Occurrence.

duplicateStatus

Variable indicating whether the Occurrence is a duplicate or not.

associatedOccurrences

A list (concatenated and separated) of identifiers of other Occurrence records and their associations to this Occurrence.

locationRemarks

Comments or notes about the Location.

dataSource

BeeBDC assigned source of the data. Often written when the data is formatted by a BeeBDC::xxx_readr function or similar.

.scientificName_empty

Flag produced by bdc::bdc_scientificName_empty() where FALSE == no scientific name provided and TRUE means that there is text in that column.

.coordinates_empty

Flag produced by bdc::bdc_coordinates_empty() where FALSE == no coordinates provided.

.coordinates_outOfRange

Flag produced by bdc::bdc_coordinates_outOfRange() where FALSE == a point off the Earth. This function identifies records with out-of-range coordinates (latitude not between -90 and 90, or longitude not between -180 and 180).

.basisOfRecords_notStandard

Flag produced by bdc::bdc_basisOfRecords_notStandard() where FALSE == an occurrence with a basisOfRecord not defined as acceptable by the user.

country_suggested

A country name suggested by the bdc::bdc_country_standardized() function.

countryCode

A country code suggested by the bdc::bdc_country_standardized() function.

coordinates_transposed

A column indicating whether coordinates were transposed by jbd_Ctrans_chunker() where FALSE == transposed.

.coordinates_country_inconsistent

A flag generated by jbd_coordCountryInconsistent() where FALSE == an occurrence where the country name and coordinates did not match.

.occurrenceAbsent

A flag generated by flagAbsent() where FALSE == occurrences marked as "ABSENT" in the "occurrenceStatus" column

.unLicensed

A flag generated by flagLicense() where FALSE == those occurrences protected by a restrictive license.

.GBIFflags

A flag generated by GBIFissues() where FALSE == an occurrence with user-specified GBIF issues to flag.

.uncer_terms

A flag generated by bdc::bdc_clean_names() where FALSE == the presence of taxonomic uncertainty terms.

names_clean

A column made by bdc::bdc_clean_names() indicating the cleaned scientificName

.invalidName

A flag generated by harmoniseR() where FALSE == occurrences whose scientificName did not match the Discover Life taxonomy.

.rou

A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == rounded (probably imprecise) coordinates.

.val

A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == invalid coordinates.

.equ

A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == equal coordinates (e.g., 0.1, 0.1).

.zer

A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == zeros as coordinates

.cap

A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == records around country capital centroid.

.cen

A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == records around country or province centroids.

.gbf

A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == records around the GBIF headquarters.

.inst

A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == records around biodiversity institutions.

.sequential

A flag generated by diagonAlley() where FALSE == records that are possibly the result of fill-down errors in sequence.

.lonFlag

A flag generated by CoordinateCleaner::cd_round() where FALSE == potential gridding in the longitude column within dataset.

.latFlag

A flag generated by CoordinateCleaner::cd_round() where FALSE == potential gridding in the latitude column within dataset.

.gridSummary

A flag generated by CoordinateCleaner::cd_round() where FALSE == potential gridding in either the longitude or latitude columns within dataset.

.uncertaintyThreshold

A flag generated by coordUncerFlagR() where FALSE == occurrences that did not pass a user-specified threshold in the "coordinateUncertaintyInMeters" column.

countryMatch

A column made by countryOutlieRs(). Summarises the occurrence-level result: the species is not known to occur in that country (noMatch), is known from a bordering country (neighbour), or is known to occur in that country (exact).

.countryOutlier

A flag generated by countryOutlieRs() where FALSE == occurrences that do not occur in a country that concurs with the Discover Life country checklist OR an adjacent country.

.sea

A flag generated by countryOutlieRs() where FALSE == occurrences that are in the ocean.

.summary

A flag generated by summaryFun() where FALSE == occurrences flagged as FALSE in any of the .flag columns. In this example it excludes flags in the ".gridSummary", ".lonFlag", ".latFlag", and ".uncer_terms" columns.

.eventDate_empty

A flag generated by bdc::bdc_eventDate_empty() where FALSE == occurrences with no eventDate provided.

.year_outOfRange

A flag generated by bdc::bdc_year_outOfRange() where FALSE == occurrences older than a threshold year, in this case 1950.

.duplicates

A flag generated by dupeSummary() where FALSE == occurrences identified as duplicates. There will be an associated kept duplicate (.duplicates == TRUE) for all duplicate clusters.
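
The flag columns above share one convention: TRUE means the record passed a test and FALSE means it was flagged. The following is a minimal base-R sketch of that convention on an invented toy data frame. It is not the bdc or BeeBDC implementation; the toy values, the treatment of the 1950 threshold as inclusive, and the choice to combine every dot-prefixed column into .summary are assumptions made for illustration only.

```r
# Hypothetical toy records; column names follow the list above, values are invented.
toy <- data.frame(
  decimalLatitude  = c(-35.3, 95.0, 12.1, 40.7),
  decimalLongitude = c(149.1, 20.0, -190.0, -74.0),
  year             = c(2001L, 1987L, 1932L, 2015L)
)

# FALSE == latitude outside [-90, 90] or longitude outside [-180, 180].
toy$.coordinates_outOfRange <- with(
  toy,
  abs(decimalLatitude) <= 90 & abs(decimalLongitude) <= 180
)

# FALSE == record older than the threshold year (1950 here; treating the
# threshold year itself as in range is an assumption of this sketch).
toy$.year_outOfRange <- toy$year >= 1950L

# A record passes the summary only if every flag column is TRUE.
flagCols <- grep("^\\.", names(toy), value = TRUE)
toy$.summary <- Reduce("&", toy[flagCols])

print(toy[c(flagCols, ".summary")])
```

Missing coordinates or years are not handled here; the real functions may treat NA values differently.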

References

This data set was created by generating a random subset of 100 rows from the full BeeBDC dataset from the publication: Dorey, J.B., Fischer, E.E., Chesshire, P.R., Nava-Bolaños, A., O’Reilly, R.L., Bossert, S., Collins, S.M., Lichtenberg, E.M., Tucker, E., Smith-Pardo, A., Falcon-Brindis, A., Guevara, D.A., Ribeiro, B.R., de Pedro, D., Hung, J.K.-L., Parys, K.A., McCabe, L.M., Rogan, M.S., Minckley, R.L., Velazco, S.J.E., Griswold, T., Zarrillo, T.A., Jetz, W., Sica, Y.V., Orr, M.C., Guzman, L.M., Ascher, J., Hughes, A.C. & Cobb, N.S. (2023) A globally synthesised and flagged bee occurrence dataset and cleaning workflow. Scientific Data, 10, 1–17. https://www.doi.org/10.1038/S41597-023-02626-W

Examples


beesFlagged <- BeeBDC::beesFlagged
head(beesFlagged)
#> # A tibble: 6 × 124
#>   database_id  scientificName family subfamily genus subgenus subspecies species
#>   <chr>        <chr>          <chr>  <chr>     <chr> <chr>    <lgl>      <chr>  
#> 1 Dorey_data_… Pseudoanthidi… Megac… Megachil… Pseu… NA       NA         Pseudo…
#> 2 Dorey_data_… Macrotera arc… Andre… Panurgin… Macr… NA       NA         Macrot…
#> 3 Dorey_data_… Xanthesma fur… Colle… Euryglos… Xant… NA       NA         Xanthe…
#> 4 Dorey_data_… Exomalopsis s… Apidae Apinae    Exom… NA       NA         Exomal…
#> 5 Dorey_data_… Osmia bicolor… Megac… Megachil… Osmia NA       NA         Osmia …
#> 6 Paige_data_… Augochlorella… Halic… Halictin… Augo… NA       NA         Augoch…
#> # ℹ 116 more variables: specificEpithet <chr>, infraspecificEpithet <chr>,
#> #   acceptedNameUsage <lgl>, taxonRank <chr>, scientificNameAuthorship <chr>,
#> #   identificationQualifier <lgl>, higherClassification <chr>,
#> #   identificationReferences <lgl>, typeStatus <chr>,
#> #   previousIdentifications <chr>, verbatimIdentification <chr>,
#> #   identifiedBy <chr>, dateIdentified <chr>, decimalLatitude <dbl>,
#> #   decimalLongitude <dbl>, stateProvince <chr>, continent <chr>, …
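
Because flagged records are marked FALSE, a common next step is to count how many records each flag column caught and then keep only the records that pass .summary. The sketch below uses a hypothetical helper, countFlagged(), which is not part of BeeBDC, and runs it on an invented toy data frame so that it works without the package installed.

```r
# Hypothetical helper (not a BeeBDC function): number of flagged (FALSE)
# records in each logical column whose name starts with ".".
countFlagged <- function(data) {
  isFlag <- startsWith(names(data), ".") & vapply(data, is.logical, logical(1))
  vapply(data[isFlag], function(x) sum(!x, na.rm = TRUE), integer(1))
}

# Invented toy data that mimics the layout of the flag columns.
toy <- data.frame(
  database_id = c("a", "b", "c"),
  .val        = c(TRUE, FALSE, TRUE),
  .summary    = c(TRUE, FALSE, FALSE)
)

flaggedCounts <- countFlagged(toy)
print(flaggedCounts)

# Keep only records that passed the summary flag; which() drops any NA values.
clean <- toy[which(toy$.summary), ]
print(nrow(clean))

# With the package installed, the same helper could be applied to the real data:
# countFlagged(BeeBDC::beesFlagged)
```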