R/toyData_beesFlagged.R
beesFlagged.Rd
A small bee occurrence dataset with flags generated by BeeBDC, used to run the example script and test functions. For data types, see ColTypeR().
data("beesFlagged", package = "BeeBDC")
An object of class "tibble"
Occurrence code generated in bdc or BeeBDC
Full scientificName as shown on DiscoverLife
Family name
Subfamily name
Genus name
Subgenus name
Full name with subspecies name - ALA column
The species name only
The subspecies name only
The full name, with authorship and date information if known, of the currently valid (zoological) or accepted (botanical) taxon.
The taxonomic rank of the most specific name in the scientificName.
The authorship information for the scientificName formatted according to the conventions of the applicable nomenclaturalCode.
A brief phrase or a standard term ("cf.", "aff.") to express the determiner's doubts about the Identification.
A list (concatenated and separated) of taxa names terminating at the rank immediately superior to the taxon referenced in the taxon record.
A list (concatenated and separated) of references (publication, global unique identifier, URI) used in the Identification.
A list (concatenated and separated) of nomenclatural types (type status, typified scientific name, publication) applied to the subject.
A list (concatenated and separated) of previous assignments of names to the Organism.
This term is meant to allow the capture of an unaltered original identification/determination, including identification qualifiers, hybrid formulas, uncertainties, etc. This term is meant to be used in addition to scientificName (and identificationQualifier etc.), not instead of it.
A list (concatenated and separated) of names of people, groups, or organizations who assigned the Taxon to the subject.
The date on which the subject was determined as representing the Taxon.
The geographic latitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic center of a Location. Positive values are north of the Equator, negative values are south of it. Legal values lie between -90 and 90, inclusive.
The geographic longitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic center of a Location. Positive values are east of the Greenwich Meridian, negative values are west of it. Legal values lie between -180 and 180, inclusive.
The name of the next smaller administrative region than country (state, province, canton, department, region, etc.) in which the Location occurs.
The name of the continent in which the Location occurs.
The specific description of the place.
The name of the island on or near which the Location occurs.
The full, unabbreviated name of the next smaller administrative region than stateProvince (county, shire, department, etc.) in which the Location occurs.
The full, unabbreviated name of the next smaller administrative region than county (city, municipality, etc.) in which the Location occurs. Do not use this term for a nearby named place that does not contain the actual location.
A legal document giving official permission to do something with the resource.
A GBIF-defined issue.
The date-time or interval during which an Event occurred. For occurrences, this is the date-time when the event was recorded. Not suitable for a time in a geological context.
The time or interval during which an Event occurred.
The integer day of the month on which the Event occurred.
The integer month in which the Event occurred.
The four-digit year in which the Event occurred, according to the Common Era Calendar.
The specific nature of the data record. Recommended best practice is to use the standard label of one of the Darwin Core classes. Examples: PreservedSpecimen, FossilSpecimen, LivingSpecimen, MaterialSample, Event, HumanObservation, MachineObservation, Taxon, Occurrence, MaterialCitation.
The name of the country or major administrative unit in which the Location occurs.
The nature or genre of the resource. StillImage, MovingImage, Sound, PhysicalObject, Event, Text.
A statement about the presence or absence of a Taxon at a Location. present, absent.
An identifier given to the Occurrence at the time it was recorded. Often serves as a link between field notes and an Occurrence record, such as a specimen collector's number.
A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first.
An identifier for the set of information associated with an Event (something that occurs at a place and time). May be a global unique identifier or an identifier specific to the data set.
A spatial region or named place.
The names of, references to, or descriptions of the methods or protocols used during an Event. Examples UV light trap, mist net, bottom trawl, ad hoc observation | point count, Penguins from space: faecal stains reveal the location of emperor penguin colonies, https://doi.org/10.1111/j.1466-8238.2009.00467.x, Takats et al. 2001.
The amount of effort expended during an Event. Examples 40 trap-nights, 10 observer-hours, 10 km by foot, 30 km by car.
The number of individuals present at the time of the Occurrence. Integer.
A number or enumeration value for the quantity of organisms. Examples 27 (organismQuantity) with individuals (organismQuantityType). 12.5 (organismQuantity) with percentage biomass (organismQuantityType). r (organismQuantity) with Braun Blanquet Scale (organismQuantityType). many (organismQuantity) with individuals (organismQuantityType).
A decimal representation of the precision of the coordinates given in the decimalLatitude and decimalLongitude.
The horizontal distance (in meters) from the given decimalLatitude and decimalLongitude describing the smallest circle containing the whole of the Location. Leave the value empty if the uncertainty is unknown, cannot be estimated, or is not applicable (because there are no coordinates). Zero is not a valid value for this term.
Occurrence records in the ALA can be filtered by using the spatially valid flag. This flag combines a set of tests applied to the record to assess how reliable its spatial data components are.
An identifier (preferably unique) for the record within the data set or collection.
The identifier assigned by GBIF for each record.
An identifier for the set of data. May be a global unique identifier or an identifier specific to a collection or institution.
The name (or acronym) in use by the institution having custody of the object(s) or information referred to in the record. Examples MVZ, FMNH, CLO, UCMP.
The name identifying the data set from which the record was derived.
A list (concatenated and separated) of previous or alternate fully qualified catalog numbers or other human-used identifiers for the same Occurrence, whether in the current or any other data set or collection.
An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique.
The GBIF-assigned taxon identifier number.
An identifier for the collection or dataset from which the record was derived.
The verbatim (originally-provided) scientific name
The verbatim original representation of the date and time information for an Event.
A list (concatenated and separated) of identifiers or names of taxa and the associations of this Occurrence to each of them.
A list (concatenated and separated) of identifiers of other Organisms and the associations of this Organism to each of them.
One of a) an indicator of the existence of, b) a reference to (publication, URI), or c) the text of notes taken in the field about the Event.
The sex of the biological individual(s) represented in the Occurrence.
A description of the usage rights applicable to the record.
A person or organization owning or managing rights over the resource.
Information about who can access the resource or an indication of its security status.
A list (concatenated and separated) of identifiers (publication, bibliographic reference, global unique identifier, URI) of literature associated with the Occurrence.
A bibliographic reference for the resource as a statement indicating how this record should be cited (attributed) when used.
A related resource that is referenced, cited, or otherwise pointed to by the described resource.
Additional information that exists, but that has not been shared in the given record.
Additional information that exists, but that has not been shared in the given record.
Variable indicating presence/absence of location coordinates.
Variable indicating validity of geospatial data associated with record.
Year associated with Occurrence.
Variable with identifying value for the Occurrence.
Variable indicating whether the Occurrence is a duplicate or not.
A list (concatenated and separated) of identifiers of other Occurrence records and their associations to this Occurrence.
Comments or notes about the Location.
BeeBDC assigned source of the data. Often written when the data is formatted by a BeeBDC::xxx_readr function or similar.
The verbatim (originally-provided) scientific name
Flag produced by bdc::bdc_scientificName_empty() where FALSE == no scientific name provided and TRUE means that there is text in that column.
Flag produced by bdc::bdc_coordinates_empty() where FALSE == no coordinates provided.
Flag produced by bdc::bdc_coordinates_outOfRange() where FALSE == point off the earth. This function identifies records with out-of-range coordinates (latitude not between -90 and 90; longitude not between -180 and 180).
Flag produced by bdc::bdc_basisOfRecords_notStandard() where FALSE == an occurrence with a basisOfRecord not defined as acceptable by the user.
A country name suggested by the bdc::bdc_country_standardized() function.
A country code suggested by the bdc::bdc_country_standardized() function.
A column indicating if coordinates were transposed by jbd_Ctrans_chunker() where FALSE == transposed.
A flag generated by jbd_coordCountryInconsistent() where FALSE == an occurrence where the country name and coordinates did not match.
A flag generated by flagAbsent() where FALSE == occurrences marked as "ABSENT" in the "occurrenceStatus" column.
A flag generated by flagLicense() where FALSE == those occurrences protected by a restrictive license.
A flag generated by GBIFissues() where FALSE == an occurrence with user-specified GBIF issues to flag.
A flag generated by bdc::bdc_clean_names() where FALSE == the presence of taxonomic uncertainty terms.
A column made by bdc::bdc_clean_names() indicating the cleaned scientificName.
A flag generated by harmoniseR() where FALSE == occurrences whose scientificName did not match the Discover Life taxonomy.
A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == rounded (probably imprecise) coordinates.
A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == invalid coordinates.
A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == equal coordinates (e.g., 0.1, 0.1).
A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == zeros as coordinates.
A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == records around country capital centroid.
A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == records around country or province centroids.
A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == records around the GBIF headquarters.
A flag generated by CoordinateCleaner::clean_coordinates() where FALSE == records around biodiversity institutions.
A flag generated by diagonAlley() where FALSE == records that are possibly the result of fill-down errors in sequence.
A flag generated by CoordinateCleaner::cd_round() where FALSE == potential gridding in the longitude column within dataset.
A flag generated by CoordinateCleaner::cd_round() where FALSE == potential gridding in the latitude column within dataset.
A flag generated by CoordinateCleaner::cd_round() where FALSE == potential gridding in either the longitude or latitude columns within dataset.
A flag generated by coordUncerFlagR() where FALSE == occurrences that did not pass a user-specified threshold in the "coordinateUncertaintyInMeters" column.
A column made by countryOutlieRs(). Summarises the occurrence-level result: the species is not known to occur in that country (noMatch), is known from a bordering country (neighbour), or is known to occur in that country (exact).
A flag generated by countryOutlieRs() where FALSE == occurrences that do not occur in a country that concurs with the Discover Life country checklist OR an adjacent country.
A flag generated by countryOutlieRs() where FALSE == occurrences that are in the ocean.
A flag generated by summaryFun() where FALSE == occurrences flagged as FALSE in any of the .flag columns. In this example it excludes flags in the ".gridSummary", ".lonFlag", ".latFlag", and ".uncer_terms" columns.
A flag generated by bdc::bdc_eventDate_empty() where FALSE == occurrences with no eventDate provided.
A flag generated by bdc::bdc_year_outOfRange() where FALSE == occurrences older than a threshold year, in this case 1950.
A flag generated by dupeSummary() where FALSE == occurrences identified as duplicates. There will be an associated kept duplicate (.duplicates == TRUE) for all duplicate clusters.
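The summary flag described above combines the individual flag columns into one pass/fail value per record. The following is a minimal base R sketch of that idea, not the summaryFun() implementation. The toy data frame, its values, the flag column names chosen for it, and the `excluded` vector are illustrative assumptions; only the four excluded column names come from the description above.

```r
# Toy stand-in for a few flag columns (made-up values, not taken from beesFlagged).
flags <- data.frame(
  .coordinates_empty = c(TRUE, TRUE, FALSE, TRUE),
  .sea               = c(TRUE, FALSE, TRUE, TRUE),
  .lonFlag           = c(TRUE, TRUE, TRUE, FALSE),
  check.names = FALSE
)

# Hypothetical exclusion list, mirroring the columns left out of the summary above.
excluded <- c(".gridSummary", ".lonFlag", ".latFlag", ".uncer_terms")

# Keep only flag columns (names starting with ".") that are not excluded.
flagCols <- setdiff(grep("^\\.", names(flags), value = TRUE), excluded)

# A record passes only if none of the retained flags is FALSE.
failedPerRow <- rowSums(!as.matrix(flags[, flagCols, drop = FALSE]))
flags$.summary <- unname(failedPerRow == 0)

print(flags$.summary)
```

In this toy case the fourth record still passes even though its `.lonFlag` is FALSE, because that column is on the exclusion list.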
This data set was created by generating a random subset of 100 rows from the full BeeBDC dataset from the publication: Dorey, J.B., Fischer, E.E., Chesshire, P.R., Nava-Bolaños, A., O’Reilly, R.L., Bossert, S., Collins, S.M., Lichtenberg, E.M., Tucker, E., Smith-Pardo, A., Falcon-Brindis, A., Guevara, D.A., Ribeiro, B.R., de Pedro, D., Hung, J.K.-L., Parys, K.A., McCabe, L.M., Rogan, M.S., Minckley, R.L., Velazco, S.J.E., Griswold, T., Zarrillo, T.A., Jetz, W., Sica, Y.V., Orr, M.C., Guzman, L.M., Ascher, J., Hughes, A.C. & Cobb, N.S. (2023) A globally synthesised and flagged bee occurrence dataset and cleaning workflow. Scientific Data, 10, 1–17. https://www.doi.org/10.1038/S41597-023-02626-W
beesFlagged <- BeeBDC::beesFlagged
head(beesFlagged)
#> # A tibble: 6 × 124
#> database_id scientificName family subfamily genus subgenus subspecies species
#> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <chr>
#> 1 Dorey_data_… Pseudoanthidi… Megac… Megachil… Pseu… NA NA Pseudo…
#> 2 Dorey_data_… Macrotera arc… Andre… Panurgin… Macr… NA NA Macrot…
#> 3 Dorey_data_… Xanthesma fur… Colle… Euryglos… Xant… NA NA Xanthe…
#> 4 Dorey_data_… Exomalopsis s… Apidae Apinae Exom… NA NA Exomal…
#> 5 Dorey_data_… Osmia bicolor… Megac… Megachil… Osmia NA NA Osmia …
#> 6 Paige_data_… Augochlorella… Halic… Halictin… Augo… NA NA Augoch…
#> # ℹ 116 more variables: specificEpithet <chr>, infraspecificEpithet <chr>,
#> # acceptedNameUsage <lgl>, taxonRank <chr>, scientificNameAuthorship <chr>,
#> # identificationQualifier <lgl>, higherClassification <chr>,
#> # identificationReferences <lgl>, typeStatus <chr>,
#> # previousIdentifications <chr>, verbatimIdentification <chr>,
#> # identifiedBy <chr>, dateIdentified <chr>, decimalLatitude <dbl>,
#> # decimalLongitude <dbl>, stateProvince <chr>, continent <chr>, …
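Because every flag uses FALSE to mark a problem record, a quick tally of FALSE values per flag column shows which checks remove the most data. The sketch below assumes that flag columns can be recognised as logical columns whose names start with "."; that naming rule is an assumption for illustration, so check the result against the column descriptions above. The block is skipped when BeeBDC is not installed.

```r
# Runs only if BeeBDC is installed; otherwise nothing happens.
if (requireNamespace("BeeBDC", quietly = TRUE)) {
  beesFlagged <- BeeBDC::beesFlagged

  # Assumption: flag columns are logical and have names starting with ".".
  isFlag <- startsWith(names(beesFlagged), ".") &
    vapply(beesFlagged, is.logical, logical(1))
  flagCols <- names(beesFlagged)[isFlag]

  # Number of records failing (FALSE) each flag, ignoring missing values.
  nFailed <- vapply(
    beesFlagged[flagCols],
    function(x) sum(!x, na.rm = TRUE),
    integer(1)
  )

  print(sort(nFailed, decreasing = TRUE))
}
```

Character columns such as the countryOutlieRs() match summary (noMatch, neighbour, exact) are skipped by the `is.logical()` check, so only true/false flags are counted.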