R/manualOutlierFindeR.R
manualOutlierFindeR.Rd
Flags expert-identified outliers listed in source spreadsheets, which users may edit. The function also uses the duplicates file produced by dupeSummary() to identify duplicates of the expert-identified outliers and flags those as well.
The function adds a flagging column called .expertOutlier, in which records marked FALSE are the expert outliers.
manualOutlierFindeR(
data = NULL,
DataPath = NULL,
PaigeOutliersName = "removedBecauseDeterminedOutlier.csv",
newOutliersName = "All_outliers_ANB.xlsx",
ColombiaOutliers_all = "All_Colombian_OutlierIDs.csv",
duplicates = NULL,
NearTRUE = NULL,
NearTRUE_threshold = 5
)
data: A data frame or tibble. Occurrence records as input.
DataPath: A character path to the directory that contains the outlier spreadsheets.
PaigeOutliersName: A character path. Should lead to the outlier spreadsheet from Paige Chesshire (csv file).
newOutliersName: A character path. Should lead to the appropriate outlier spreadsheet (xlsx file).
ColombiaOutliers_all: A character path. Should lead to the spreadsheet of bee outliers from Colombia (csv file).
duplicates: A data frame or tibble. The duplicates file produced by dupeSummary().
NearTRUE: Optional. A character file name of the csv file. If you want to remove expert outliers that are too close to TRUE points, use the name of the NearTRUE.csv. Note: This implementation is only basic for now unless there is a greater need in the future.
NearTRUE_threshold: Numeric. The threshold (in km) for the distance to TRUE points to keep expert outliers.
Returns the data with a new column, .expertOutlier, in which records marked FALSE are the expert outliers.
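The flagging convention described above can be illustrated with a minimal base R sketch. This is not the function's actual implementation; the toy records, the `expertIds` vector, and the `database_id_match` column in the toy duplicates table are all hypothetical and used for illustration only. It shows how FALSE marks an expert outlier, and how a duplicate of an outlier might be flagged too.

```r
# Toy occurrence records; database_id is assumed to be the record identifier.
occ <- data.frame(
  database_id = c("rec_1", "rec_2", "rec_3", "rec_4"),
  scientificName = c("Apis mellifera", "Apis mellifera",
                     "Bombus terrestris", "Bombus terrestris"),
  stringsAsFactors = FALSE
)

# Hypothetical expert-identified outlier IDs, as might be read from a spreadsheet.
expertIds <- c("rec_2")

# Hypothetical duplicate pairs; the column names are illustrative only.
dupes <- data.frame(
  database_id = c("rec_2"),
  database_id_match = c("rec_4"),
  stringsAsFactors = FALSE
)

# Extend the outlier set with any record paired with an expert outlier,
# checking both sides of each duplicate pair.
dupeIds <- c(
  dupes$database_id_match[dupes$database_id %in% expertIds],
  dupes$database_id[dupes$database_id_match %in% expertIds]
)
allOutlierIds <- unique(c(expertIds, dupeIds))

# FALSE marks an outlier; TRUE marks a record that passes.
occ$.expertOutlier <- !(occ$database_id %in% allOutlierIds)

# Keep only the records that were not flagged.
occ_clean <- occ[occ$.expertOutlier, ]
```

Because the flag is a logical column, flagged records can be kept in the data for later review and dropped only when a cleaned subset is needed.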
if (FALSE) {
  # The pipe operator used below comes from magrittr
  library(magrittr)
  # DataPath is assumed to be defined already: the directory that
  # contains the outlier spreadsheets and the duplicates file
  # Read example data
  data(beesFlagged)
  # Read in the most-recent duplicates file as well
  if (!exists("duplicates")) {
    duplicates <- fileFinder(path = DataPath,
                             fileName = "duplicateRun_") %>%
      readr::read_csv()
  }
  # Flag the expert-identified outliers (and their duplicates)
  # in a new .expertOutlier column
  beesFlagged_out <- manualOutlierFindeR(
    data = beesFlagged,
    DataPath = DataPath,
    PaigeOutliersName = "removedBecauseDeterminedOutlier.csv",
    newOutliersName = "^All_outliers_ANB_14March.xlsx",
    ColombiaOutliers_all = "All_Colombian_OutlierIDs.csv",
    duplicates = duplicates)
}