Because jbd_coordinates_transposed() is very RAM-intensive, this wrapper lets a user specify a chunk size and analyse only a small portion of the occurrence data at a time. The prefix jbd_ highlights the difference between this function and the original bdc::bdc_coordinates_transposed(). The function preferentially uses the countryCode column generated by bdc::bdc_country_standardized().

jbd_Ctrans_chunker(
  data = NULL,
  lat = "decimalLatitude",
  lon = "decimalLongitude",
  idcol = "database_id",
  country = "country_suggested",
  countryCode = "countryCode",
  sci_names = "scientificName",
  border_buffer = 0.2,
  save_outputs = TRUE,
  stepSize = 1e+06,
  chunkStart = 1,
  progressiveSave = TRUE,
  path = tempdir(),
  append = TRUE,
  scale = "large",
  mc.cores = 1
)

Arguments

data

A data frame or tibble. Occurrence records as input.

lat

Character. The column with latitude in decimal degrees. Default = "decimalLatitude".

lon

Character. The column with longitude in decimal degrees. Default = "decimalLongitude".

idcol

Character. The column name with a unique record identifier. Default = "database_id".

country

Character. The name of the column containing country names. Default = "country_suggested".

countryCode

Character. The column containing ISO-2 country codes. Default = "countryCode".

sci_names

Character. The column containing scientific names. Default = "scientificName".

border_buffer

Numeric. The buffer, in decimal degrees, around points to help match them to countries. Default = 0.2 (~22 km at equator).

save_outputs

Logical. If TRUE, transposed occurrences will be saved to their own file.

stepSize

Numeric. The number of occurrences to process in each chunk. Default = 1000000.

chunkStart

Numeric. The chunk number to start from. This can be > 1 when you need to restart the function from a certain chunk; for example if R failed unexpectedly.

progressiveSave

Logical. If TRUE then the country output list will be saved between each iteration so that append can be used if the function is stopped part way through.

path

Character. The path to a directory in which to save the 01_coordinates_transposed_ output. Default = tempdir().

append

Logical. If TRUE, the function will look for an existing file and append to it.

scale

Passed to rnaturalearth's ne_countries(). Scale of map to return, one of 110, 50, 10 or 'small', 'medium', 'large'. Default = "large".

mc.cores

Numeric. If > 1, the jbd_correct_coordinates function will run in parallel via mclapply with the specified number of cores. If = 1, it will run in a serial loop. NOTE: Windows machines must use a value of 1 (see ?parallel::mclapply). Additionally, be aware that each thread can use a large amount of memory. Default = 1.
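
As a rough illustration of how stepSize and chunkStart relate, the sketch below splits row indices into chunks using base R. It is not the package's internal code: chunk_bounds() is a hypothetical helper, and it assumes chunkStart is a chunk number, as described above. The last few lines show one cautious way to pick a value for mc.cores, falling back to 1 on Windows or when the core count cannot be detected.

```r
# Hypothetical helper (not part of BeeBDC): compute the first and last row
# of each chunk when n_rows records are processed stepSize rows at a time.
chunk_bounds <- function(n_rows, stepSize = 1e6, chunkStart = 1) {
  starts <- seq(from = 1, to = n_rows, by = stepSize)
  ends <- pmin(starts + stepSize - 1, n_rows)
  bounds <- data.frame(chunk = seq_along(starts), from = starts, to = ends)
  # Keep only the chunks at or after chunkStart (e.g. when resuming a run)
  bounds[bounds$chunk >= chunkStart, , drop = FALSE]
}

# 2.5 million records in chunks of 1 million, resuming from chunk 2
bounds <- chunk_bounds(n_rows = 2500000, stepSize = 1e6, chunkStart = 2)
print(bounds)

# Choose a core count: mclapply() cannot fork on Windows, so use 1 there.
# detectCores() may return NA, in which case we also fall back to 1.
n_detected <- parallel::detectCores()
cores <- if (.Platform$OS.type == "windows" || is.na(n_detected)) {
  1L
} else {
  max(1L, n_detected - 1L)
}
```

A smaller stepSize lowers peak memory per chunk at the cost of more iterations, and each additional core may hold its own working copy of the data.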

Value

Returns the input data frame with a new column, coordinates_transposed, where FALSE = records that had coordinates transposed.
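
A minimal sketch of how the returned column might be used, assuming (as stated above) that FALSE marks the occurrences whose coordinates were transposed. The data frame below is invented purely for illustration and does not come from the package.

```r
# Toy stand-in for the function's output; values are invented for illustration
out <- data.frame(
  database_id = c("id_1", "id_2", "id_3"),
  coordinates_transposed = c(TRUE, FALSE, TRUE)
)

# Count flagged (FALSE) and unflagged (TRUE) records, including any NA
table(out$coordinates_transposed, useNA = "always")

# Pull out the records that were flagged as transposed for inspection
flagged <- out[!is.na(out$coordinates_transposed) &
                 !out$coordinates_transposed, , drop = FALSE]
```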

Examples

if(requireNamespace("rnaturalearthdata")){
  library(dplyr)
  # Import and prepare the data
  data(beesFlagged)
  beesFlagged <- beesFlagged %>%
    dplyr::select(!c(.val, .sea)) %>%
    # Cut down the dataset to run the example quicker
    dplyr::filter(dplyr::row_number() %in% 1:20)
  # Run the function
  beesFlagged_out <- jbd_Ctrans_chunker(
    # bdc_coordinates_transposed inputs
    data = beesFlagged,
    idcol = "database_id",
    lat = "decimalLatitude",
    lon = "decimalLongitude",
    country = "country_suggested",
    countryCode = "countryCode",
    # in decimal degrees (1 degree is ~111 km at the equator)
    border_buffer = 1,
    save_outputs = FALSE,
    sci_names = "scientificName",
    # chunker inputs
    # How many rows to process at a time
    stepSize = 1000000,
    # Start row
    chunkStart = 1,
    # Progressively save the output between each iteration?
    progressiveSave = FALSE,
    path = tempdir(),
    # If FALSE it may overwrite existing dataset
    append = FALSE,
    # Users should select scale = "large" as it is more thoroughly tested
    scale = "medium",
    mc.cores = 1
  )
  table(beesFlagged_out$coordinates_transposed, useNA = "always")
} # END if require
#>  - Running chunker with:
#> stepSize = 1,000,000
#> chunkStart = 1
#> chunkEnd = 1,000,000
#> append = FALSE
#>  - Starting chunk 1...
#> From 1 to 1,000,000
#> Loading required package: readr
#> Spherical geometry (s2) switched on
#> Correcting latitude and longitude transposed
#> 0 occurrences will be tested
#> No latitude and longitude were transposed
#>  - Finished chunk 1 of 1. Total records examined: 20
#>  - Completed in 3.97 secs
#> 
#> TRUE <NA> 
#>   20    0
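
Because border_buffer is given in decimal degrees, it can help to convert it to an approximate distance. The sketch below assumes roughly 111.32 km per degree at the equator (an equatorial circumference of about 40,075 km divided by 360). buffer_km() is a hypothetical helper, not a BeeBDC function, and its latitude argument applies a simple cosine scaling that only approximates east-west distances.

```r
# Approximate km per decimal degree at the equator (assumed constant)
km_per_degree <- 40075.017 / 360

# Hypothetical helper: approximate length in km of a buffer given in degrees.
# East-west extent shrinks with cos(latitude); latitude = 0 gives the
# equatorial value.
buffer_km <- function(buffer_deg, latitude = 0) {
  buffer_deg * km_per_degree * cos(latitude * pi / 180)
}

round(buffer_km(0.2), 1)  # the default border_buffer
round(buffer_km(1), 1)    # the value used in the example above
```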