R/jbd_Ctrans_chunker.R
jbd_Ctrans_chunker.Rd
Because the jbd_coordinates_transposed() function is very RAM-intensive, this wrapper allows a user to specify chunk sizes and analyse only a small portion of the occurrence data at a time. The prefix jbd_ is used to highlight the difference between this function and the original bdc::bdc_coordinates_transposed().
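The chunking idea can be sketched in base R. The helpers below, chunk_bounds() and process_in_chunks(), are hypothetical and not part of BeeBDC, and this is not necessarily how jbd_Ctrans_chunker() works internally. They only assume that chunks are consecutive row ranges of length stepSize and that chunkStart counts chunks from 1, as the arguments below describe.

```r
# Hypothetical helper (not part of BeeBDC): first and last row of each chunk,
# given the number of rows, the chunk size and the chunk number to start from.
chunk_bounds <- function(n_rows, stepSize = 1e6, chunkStart = 1) {
  n_chunks <- ceiling(n_rows / stepSize)
  if (chunkStart > n_chunks) {
    return(data.frame(chunk = integer(0), from = integer(0), to = integer(0)))
  }
  chunks <- seq.int(chunkStart, n_chunks)
  from <- (chunks - 1) * stepSize + 1
  to <- pmin(chunks * stepSize, n_rows)   # the last chunk may be shorter
  data.frame(chunk = chunks, from = from, to = to)
}

# Hypothetical helper: apply `f` to one chunk of rows at a time, then bind
# the per-chunk results back into a single data frame.
process_in_chunks <- function(data, f, stepSize = 1e6, chunkStart = 1) {
  bounds <- chunk_bounds(nrow(data), stepSize, chunkStart)
  pieces <- lapply(seq_len(nrow(bounds)), function(i) {
    f(data[bounds$from[i]:bounds$to[i], , drop = FALSE])
  })
  do.call(rbind, pieces)
}

# 25 rows in chunks of 10 -> rows 1-10, 11-20 and 21-25
chunk_bounds(25, stepSize = 10)
# Restarting from the second chunk skips rows 1-10
chunk_bounds(25, stepSize = 10, chunkStart = 2)
```

Only one chunk is held in the per-chunk step at a time, which is the point of the wrapper. On Unix-alikes, lapply() could be swapped for parallel::mclapply(..., mc.cores = n) to process chunks in parallel, at the cost of more memory per core.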
This function will preferably use the countryCode column generated by bdc::bdc_country_standardized().
jbd_Ctrans_chunker(
data = NULL,
lat = "decimalLatitude",
lon = "decimalLongitude",
idcol = "databse_id",
country = "country_suggested",
countryCode = "countryCode",
sci_names = "scientificName",
border_buffer = 0.2,
save_outputs = TRUE,
stepSize = 1e+06,
chunkStart = 1,
progressiveSave = TRUE,
path = tempdir(),
append = TRUE,
scale = "large",
mc.cores = 1
)
data: A data frame or tibble. Occurrence records as input.
lat: Character. The column with latitude in decimal degrees. Default = "decimalLatitude".
lon: Character. The column with longitude in decimal degrees. Default = "decimalLongitude".
idcol: Character. The column name with a unique record identifier. Note that the default in the function signature is spelled "databse_id", so pass idcol = "database_id" explicitly if that is your identifier column, as the example does.
country: Character. The name of the column containing country names. Default = "country_suggested".
countryCode: Character. Identifies the column containing ISO-2 country codes. Default = "countryCode".
sci_names: Character. The column containing scientific names. Default = "scientificName".
border_buffer: Numeric. The buffer, in decimal degrees, around points to help match them to countries. Default = 0.2 (~22 km at the equator).
save_outputs: Logical. If TRUE, transposed occurrences will be saved to their own file. Default = TRUE.
stepSize: Numeric. The number of occurrences to process in each chunk. Default = 1000000.
chunkStart: Numeric. The chunk number to start from. This can be > 1 when you need to restart the function from a certain chunk; for example, if R failed unexpectedly. Default = 1.
progressiveSave: Logical. If TRUE, the country output list will be saved after each iteration so that append can be used if the function is stopped partway through. Default = TRUE.
path: Character. The path to a directory in which to save the 01_coordinates_transposed_ output. Default = tempdir().
append: Logical. If TRUE, the function will look for an existing file to append to. Default = TRUE.
scale: Passed to rnaturalearth's ne_countries(). Scale of map to return, one of 110, 50, 10 or "small", "medium", "large". Default = "large".
mc.cores: Numeric. If > 1, the jbd_correct_coordinates function will run in parallel via parallel::mclapply() using the number of cores specified. If = 1, it will run as a serial loop. NOTE: Windows machines must use a value of 1 (see ?parallel::mclapply). Additionally, be aware that each thread can use a large amount of memory. Default = 1.
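As a rough check when choosing border_buffer, one degree of longitude spans about 111.32 km at the equator and shrinks with the cosine of latitude. The helper below, buffer_deg_to_km(), is hypothetical (not part of BeeBDC) and uses a simple spherical approximation of the east-west extent; it says nothing about how the buffer is applied internally. It shows that the default of 0.2 is roughly 22 km at the equator, whereas a buffer of 1 is roughly 111 km.

```r
# Hypothetical helper (not part of BeeBDC): approximate east-west distance,
# in km, spanned by a buffer given in decimal degrees at a given latitude.
# Spherical approximation: ~111.32 km per degree of longitude at the equator.
buffer_deg_to_km <- function(buffer_deg, latitude = 0) {
  km_per_degree_equator <- 111.32
  buffer_deg * km_per_degree_equator * cos(latitude * pi / 180)
}

buffer_deg_to_km(0.2)                  # default buffer at the equator, ~22.3 km
buffer_deg_to_km(1)                    # ~111.3 km
buffer_deg_to_km(0.2, latitude = 60)   # ~11.1 km, because cos(60 degrees) = 0.5
```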
Returns the input data frame with a new column, coordinates_transposed, where FALSE = records that had their coordinates transposed.
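Because FALSE marks the records whose coordinates were transposed, selecting those records means negating the flag. A minimal sketch, using a small made-up data frame as a stand-in for the function's output (only the coordinates_transposed column name comes from this page):

```r
# Made-up stand-in for the output of jbd_Ctrans_chunker(); only the
# coordinates_transposed column name is taken from the documentation.
out <- data.frame(
  database_id = c("rec1", "rec2", "rec3"),
  coordinates_transposed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# FALSE = coordinates were transposed, so negate the flag to select them.
# subset() treats NA as FALSE, so rows with a missing flag are dropped.
transposed <- subset(out, !coordinates_transposed)
transposed$database_id
```

With dplyr, the equivalent is dplyr::filter(out, !coordinates_transposed), which also drops rows where the flag is NA.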
if (requireNamespace("rnaturalearthdata")) {
  library(BeeBDC)  # provides jbd_Ctrans_chunker() and the beesFlagged dataset
  library(dplyr)
  # Import and prepare the data
  data(beesFlagged)
  beesFlagged <- beesFlagged %>%
    dplyr::select(!c(.val, .sea)) %>%
    # Cut down the dataset to run the example more quickly
    dplyr::filter(dplyr::row_number() %in% 1:20)
  # Run the function
  beesFlagged_out <- jbd_Ctrans_chunker(
    # bdc_coordinates_transposed inputs
    data = beesFlagged,
    idcol = "database_id",
    lat = "decimalLatitude",
    lon = "decimalLongitude",
    country = "country_suggested",
    countryCode = "countryCode",
    # in decimal degrees (~111 km at the equator)
    border_buffer = 1,
    save_outputs = FALSE,
    sci_names = "scientificName",
    # chunker inputs
    # How many rows to process at a time
    stepSize = 1000000,
    # Chunk to start from
    chunkStart = 1,
    # Progressively save the output between each iteration?
    progressiveSave = FALSE,
    path = tempdir(),
    # If FALSE, it may overwrite an existing dataset
    append = FALSE,
    # Users should select scale = "large" as it is more thoroughly tested
    scale = "medium",
    mc.cores = 1
  )
  table(beesFlagged_out$coordinates_transposed, useNA = "always")
} # END if require
#> - Running chunker with:
#> stepSize = 1,000,000
#> chunkStart = 1
#> chunkEnd = 1,000,000
#> append = FALSE
#> - Starting chunk 1...
#> From 1 to 1,000,000
#> Loading required package: readr
#> Spherical geometry (s2) switched on
#> Correcting latitude and longitude transposed
#> 0 occurrences will be tested
#> No latitude and longitude were transposed
#> - Finished chunk 1 of 1. Total records examined: 20
#> - Completed in 3.97 secs
#>
#> TRUE <NA>
#> 20 0