Tune ecological niche model (ENM) settings and calculate evaluation statistics
ENMevaluate() is the primary function for the ENMeval package. This function builds ecological niche models iteratively across a range of user-specified tuning settings. Users can choose to evaluate models with cross validation or a fully withheld testing dataset. ENMevaluate() returns an ENMevaluation object with slots containing evaluation statistics for each combination of settings and for each cross validation fold therein, as well as raster predictions for each model when raster data is input. The evaluation statistics in the results table should aid users in identifying model settings that balance fit and predictive ability. See the extensive vignette for fully worked examples: <https://jamiemkass.github.io/ENMeval/articles/ENMeval-2.0-vignette.html>.
Usage
ENMevaluate(
occs,
envs = NULL,
bg = NULL,
tune.args = NULL,
partitions = NULL,
algorithm = NULL,
partition.settings = NULL,
other.settings = NULL,
categoricals = NULL,
doClamp = TRUE,
clamp.directions = NULL,
user.enm = NULL,
user.grp = NULL,
occs.testing = NULL,
taxon.name = NULL,
n.bg = 10000,
overlap = FALSE,
overlapStat = c("D", "I"),
user.val.grps = NULL,
user.eval = NULL,
rmm = NULL,
parallel = FALSE,
numCores = NULL,
parallelType = "doSNOW",
updateProgress = FALSE,
quiet = FALSE,
occ = NULL,
env = NULL,
bg.coords = NULL,
RMvalues = NULL,
fc = NULL,
occ.grp = NULL,
bg.grp = NULL,
method = NULL,
bin.output = NULL,
rasterPreds = NULL,
clamp = NULL,
progbar = NULL
)
Arguments
- occs
matrix / data frame: occurrence records with two columns for longitude and latitude of occurrence localities, in that order. If specifying predictor variable values assigned to presence/background localities (without inputting raster data), this table should also have one column for each predictor variable. See Details for important distinctions between running the function with and without rasters.
- envs
RasterStack: environmental predictor variables. These should be in the same geographic projection as the occurrence data.
- bg
matrix / data frame: background records with two columns for longitude and latitude of background (or pseudo-absence) localities, in that order. If NULL, points will be randomly sampled across envs, with the number specified by the argument n.bg. If specifying predictor variable values assigned to presence/background localities (without inputting raster data), this table should also have one column for each predictor variable. See Details for important distinctions between running the function with and without rasters.
- tune.args
named list: model settings to be tuned (e.g., for Maxent models: list(fc = c("L","Q"), rm = 1:3)).
- partitions
character: name of partitioning technique. Currently available options are the nonspatial partitions "randomkfold" and "jackknife"; the spatial partitions "block", "checkerboard1", and "checkerboard2"; "testing" for partitioning with fully withheld data (see argument occs.testing); "user" for user-defined partitions (see argument user.grp); and "none" for no partitioning (see ?partitions for details).
- algorithm
character: name of the algorithm used to build models. Currently one of "maxnet", "maxent.jar", or "bioclim", else the name from a custom ENMdetails implementation.
- partition.settings
named list: used to specify certain settings for partitioning schema. See Details and ?partitions for descriptions of these settings.
- other.settings
named list: used to specify extra settings for the analysis. All of these settings have internal defaults, so if they are not specified the analysis will be run with default settings. See Details for descriptions of these settings, including how to specify arguments for maxent.jar.
- categoricals
character vector: name or names of categorical environmental variables. If not specified, all predictor variables will be treated as continuous unless they are factors. If categorical variables are already factors, specifying names of such variables in this argument is not needed.
- doClamp
boolean: if TRUE (default), model prediction extrapolations will be restricted to the upper and lower bounds of the predictor variables. Clamping avoids extreme predictions for environment values outside the range of the training data. If free extrapolation is a study aim, this should be set to FALSE, but for most applications leaving this at the default of TRUE is advisable to avoid unrealistic predictions. When predictor variables are input, they are clamped internally before making model predictions when clamping is on. When no predictor variables are input and data frames of variable values are used instead (SWD format), validation data is clamped before making model predictions when clamping is on.
- clamp.directions
named list: specifies the direction ("left" for minimum, "right" for maximum) of clamping for predictor variables (e.g., list(left = c("bio1","bio5"), right = c("bio10","bio15"))).
- user.enm
ENMdetails object: a custom ENMdetails object used to build models. This is an alternative to specifying algorithm with a character string.
- user.grp
named list: specifies user-defined partition groups, where occs.grp = vector of partition group (fold) for each occurrence locality, intended for user-defined partitions, and bg.grp = same vector for background (or pseudo-absence) localities.
- occs.testing
matrix / data frame: a fully withheld testing dataset with two columns for longitude and latitude of occurrence localities, in that order, used when partitions = "testing". These occurrences will be used only for evaluation and not for model training, and thus no cross validation will be performed.
- taxon.name
character: name of the focal species or taxon. This is used primarily for annotating the ENMevaluation object and output metadata (rmm), but is not necessary for analysis.
- n.bg
numeric: the number of background (or pseudo-absence) points to randomly sample over the environmental raster data (default: 10000) if background records were not already provided.
- overlap
boolean: if TRUE, calculate niche overlap statistics (Warren et al. 2008).
- overlapStat
character: niche overlap statistics to be calculated -- "D" (Schoener's D) and/or "I" (Hellinger's I) -- see ?calc.niche.overlap for more details.
- user.val.grps
matrix / data frame: user-defined validation record coordinates and predictor variable values. This is used internally by ENMnulls() to force each null model to evaluate with empirical validation data, and does not have any current use when running ENMevaluate() independently.
- user.eval
function: custom function for specifying performance metrics not included in ENMeval. The function must first be defined and then input as the argument user.eval. This function should have a single argument called vars, which is a list that includes different data that can be used to calculate the metric. See Details below and the vignette for a worked example.
- rmm
rangeModelMetadata object: if specified, ENMevaluate() will write metadata details for the analysis into this object, but if not, a new rangeModelMetadata object will be generated and included in the output ENMevaluation object.
- parallel
boolean: if TRUE, run with parallel processing.
- numCores
numeric: number of cores to use for parallel processing. If NULL, all available cores will be used.
- parallelType
character: either "doParallel" or "doSNOW" (default: "doSNOW").
- updateProgress
boolean: if TRUE, use shiny progress bar. This is only for use in shiny apps.
- quiet
boolean: if TRUE, silence all function messages (but not errors).
- occ, env, bg.coords, RMvalues, fc, occ.grp, bg.grp, method, bin.output, rasterPreds, clamp, progbar
These arguments from previous versions are backward-compatible to avoid unnecessary errors for older scripts, but in a later version these arguments will be permanently deprecated.
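The list-valued arguments described above can be assembled in plain R before calling ENMevaluate(). The sketch below is illustrative only: all coordinates and fold labels are made-up values, and expand.grid() is used just to show how many setting combinations a tune.args list implies, not to describe how the package builds its tuning table internally.

```r
# Tuning settings for a Maxent-type model: feature classes (fc) crossed
# with regularization multipliers (rm), as in the tune.args description.
tune.args <- list(fc = c("L", "Q", "LQ"), rm = 1:3)

# One row per combination of settings (illustration only; ENMevaluate()
# builds its own tuning table internally).
tune.grid <- expand.grid(tune.args, stringsAsFactors = FALSE)
n.models <- nrow(tune.grid)

# Made-up occurrence and background coordinates (longitude, latitude).
occs <- data.frame(longitude = c(-70.1, -69.8, -69.5, -69.2),
                   latitude  = c(-10.3, -10.6, -10.9, -11.2))
bg <- data.frame(longitude = c(-70.4, -70.0, -69.6, -69.3, -69.0, -68.7),
                 latitude  = c(-10.1, -10.4, -10.7, -11.0, -11.3, -11.6))

# User-defined partition groups: one fold label per row of occs and bg.
user.grp <- list(occs.grp = c(1, 1, 2, 2),
                 bg.grp   = c(1, 1, 1, 2, 2, 2))
stopifnot(length(user.grp$occs.grp) == nrow(occs),
          length(user.grp$bg.grp) == nrow(bg))

# Clamp bio1 and bio5 at their minimum, bio10 and bio15 at their maximum
# (variable names taken from the clamp.directions example above).
clamp.directions <- list(left = c("bio1", "bio5"),
                         right = c("bio10", "bio15"))
```

These objects would then be passed by name, for example tune.args = tune.args, partitions = "user", user.grp = user.grp, and clamp.directions = clamp.directions, assuming envs contains layers with the variable names used in clamp.directions.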
Value
An ENMevaluation object. See ?ENMevaluation for details and description of the columns in the results table.
Details
There are a few methodological details in the implementation of ENMeval >=2.0.0 that are important to mention. There is also a brief discussion of some points relevant to null models in ?ENMnulls.
1. By default, validation AUC is calculated with respect to the full background (training + validation). This approach follows Radosavljevic & Anderson (2014). This setting can be changed by assigning other.settings$validation.bg to "partition", which will calculate AUC with respect to the validation background only. The default value for other.settings$validation.bg is "full".
2. The continuous Boyce index (always) and AICc (when no raster is provided) are not calculated using the predicted values of the RasterStack delineating the full study extent, but instead using the predicted values for the background records. This decision to use the background only for calculating the continuous Boyce index was made to simplify the code and improve running time. The decision for AICc was made in order to allow AICc calculations for datasets that do not include raster data. See ?calc.aicc for more details, and for caveats when calculating AICc without raster data (mainly, that if the background does not adequately represent the occurrence records, users should use the raster approach, for reasons explained in the calc.aicc documentation). For both metrics, if the background records are a good representation of the study extent, there should not be much difference between this approach using the background data and the approach that uses rasters.
3. When running ENMevaluate() without raster data, and instead adding the environmental predictor values to the occurrence and background data tables, users may notice some differences in the results. Occurrence records that share a raster grid cell are automatically removed when raster data is provided, but without raster data this functionality cannot operate, and thus any such duplicate occurrence records can remain in the training data. The Java implementation of Maxent (maxent.jar) should automatically remove these records, but the R implementation maxnet does not, and neither does the bioclim() function from the R package dismo. Therefore, it is up to the user to remove such records before running ENMevaluate() when raster data are not included.
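For point 3, one way to drop occurrence records that fall in the same grid cell when no raster is at hand is sketched below in base R. The cell size of 0.5 degrees and the grid origin at (0, 0) are assumptions made for illustration; in practice they should match the grid from which the predictor values were extracted. This is not the routine ENMevaluate() uses when rasters are supplied.

```r
# Made-up occurrence table in SWD style: coordinates plus one predictor.
occs <- data.frame(
  longitude = c(-70.12, -70.13, -69.40, -70.12),
  latitude  = c(-10.31, -10.33, -10.90, -10.31),
  bio1      = c(25.1, 25.1, 23.8, 25.1)
)

# Hypothetical grid: square cells of 0.5 degrees with origin at (0, 0).
cell.size <- 0.5

# Label each record with the column/row index of the cell it falls in.
cell.id <- paste(floor(occs$longitude / cell.size),
                 floor(occs$latitude / cell.size), sep = "_")

# Keep the first record in each cell and drop the rest.
occs.thinned <- occs[!duplicated(cell.id), ]
```

Here records 1, 2, and 4 share a cell, so only records 1 and 3 remain in occs.thinned.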
Below are descriptions of the parameters used in the other.settings, partition.settings, and user.eval arguments.
For other.settings, the options are:
* abs.auc.diff - boolean: if TRUE, take absolute value of AUCdiff (default: TRUE)
* pred.type - character: specifies which prediction type should be used to generate maxnet or maxent.jar prediction rasters (default: "cloglog").
* validation.bg - character: either "full" to calculate training and validation AUC and CBI for cross-validation with respect to the full background (default), or "partition" (meant for spatial partitions only) to calculate each with respect to the partitioned background only (i.e., training occurrences are compared to training background, and validation occurrences compared to validation background).
* other.args - named list: any additional model arguments not specified for tuning; this can include arguments for maxent.jar, which are described in the software's Help file.
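The effect of validation.bg can be illustrated with a small base R sketch. The auc() helper below is a hypothetical rank-based AUC written for this example, not a function from ENMeval, and the prediction values are made up.

```r
# Rank-based AUC: the probability that a randomly chosen occurrence
# prediction exceeds a randomly chosen background prediction
# (ties count as one half).
auc <- function(occs.pred, bg.pred) {
  n.occs <- length(occs.pred)
  n.bg <- length(bg.pred)
  ranks <- rank(c(occs.pred, bg.pred))
  (sum(ranks[seq_len(n.occs)]) - n.occs * (n.occs + 1) / 2) / (n.occs * n.bg)
}

# Made-up predictions for one cross validation fold.
occs.val.pred <- c(0.8, 0.6)
bg.train.pred <- c(0.1, 0.2, 0.7)
bg.val.pred <- c(0.3, 0.9)

# validation.bg = "full": compare against training + validation background.
auc.val.full <- auc(occs.val.pred, c(bg.train.pred, bg.val.pred))

# validation.bg = "partition": compare against validation background only.
auc.val.partition <- auc(occs.val.pred, bg.val.pred)

# The corresponding settings list, using option names documented above.
other.settings <- list(validation.bg = "partition", pred.type = "cloglog")
```

With these values the full-background validation AUC is 0.7 and the partitioned-background validation AUC is 0.5, which shows how the choice of background can change the statistic for the same model and fold.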
For partition.settings, the current options are:
* orientation - character: one of "lat_lon" (default), "lon_lat", "lat_lat", or "lon_lon" (required for block partition).
* aggregation.factor - numeric vector: one or two numbers specifying the factor with which to aggregate the envs raster to assign partitions (default: 2; required for the checkerboard partitions).
* kfolds - numeric: the number of folds (i.e., partitions) for random partitions (default: 5).
For the block partition, the orientation specifications are abbreviations for "latitude" and "longitude", and they determine the order and orientations with which the block partitioning function creates the partition groups. For example, "lat_lon" will split the occurrence localities first by latitude, then by longitude. For the checkerboard partitions, the aggregation factor specifies how much to aggregate the existing cells in the envs raster to make new spatial partitions. For example, checkerboard1 with an aggregation factor value of 2 will make the grid cells 4 times larger in area (each new cell covers 2 x 2 original cells) and then assign occurrence and background records to partition groups based on which cell they are in. The checkerboard2 partition is hierarchical, so cells are first aggregated to define groups like checkerboard1, but a second aggregation is then made to separate the resulting 2 bins into 4 bins. For checkerboard2, two different numbers can be used to specify the two levels of the hierarchy, or if a single number is given, that value will be used for both levels.
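The checkerboard idea can be sketched in base R as follows. This is a simplified illustration under stated assumptions (a hypothetical 1-degree raster with origin at (0, 0) and made-up coordinates), not the package's partitioning code, which works on the envs raster itself.

```r
# Hypothetical raster resolution (degrees) and aggregation factor.
cell.size <- 1
aggregation.factor <- 2

# Aggregating by 2 doubles each cell side, so each new cell covers
# 2 x 2 = 4 original cells.
agg.size <- cell.size * aggregation.factor
cells.per.agg.cell <- aggregation.factor^2

# Made-up record coordinates.
pts <- data.frame(longitude = c(0.5, 2.5, 2.5, -0.5),
                  latitude  = c(0.5, 0.5, 2.5, 0.5))

# Column and row index of the aggregated cell containing each record.
col.id <- floor(pts$longitude / agg.size)
row.id <- floor(pts$latitude / agg.size)

# Alternate groups 1 and 2 like the squares of a checkerboard.
grp <- (col.id + row.id) %% 2 + 1
```

Records in neighboring aggregated cells receive different group labels, so the four points above fall in groups 1, 2, 1, and 2.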
For user.eval, the variables available to your custom function are listed below.
See the vignette for a worked example.
* enm - ENMdetails object
* occs.train.z - data frame: predictor variable values for training occurrences
* occs.val.z - data frame: predictor variable values for validation occurrences
* bg.train.z - data frame: predictor variable values for training background
* bg.val.z - data frame: predictor variable values for validation background
* mod.k - Model object for current partition (k)
* nk - numeric: number of folds (i.e., partitions)
* other.settings - named list: other settings specified in ENMevaluate()
* partitions - character: name of the partition method (e.g., "block")
* occs.train.pred - numeric: predictions made by mod.k for training occurrences
* occs.val.pred - numeric: predictions made by mod.k for validation occurrences
* bg.train.pred - numeric: predictions made by mod.k for training background
* bg.val.pred - numeric: predictions made by mod.k for validation background
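A minimal sketch of a user.eval function that uses two of the variables listed above is shown here. The metric itself (val.contrast) is a hypothetical example, the mock vars list contains made-up values, and returning a one-row data frame is an assumption of this sketch; consult the vignette for the return format ENMevaluate() expects.

```r
# Custom metric: difference between the mean prediction for validation
# occurrences and the mean prediction for validation background.
# Returning a one-row data frame is an assumption of this sketch.
val.contrast <- function(vars) {
  data.frame(val.contrast = mean(vars$occs.val.pred) - mean(vars$bg.val.pred))
}

# Mock vars list with made-up values, only to exercise the function here;
# ENMevaluate() supplies the real list for each model and partition.
vars.mock <- list(
  occs.val.pred = c(0.8, 0.6, 0.7),
  bg.val.pred   = c(0.2, 0.4, 0.3),
  nk = 4,
  partitions = "block"
)
res <- val.contrast(vars.mock)
```

The function would then be supplied through the user.eval argument, i.e. user.eval = val.contrast.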
References
Muscarella, R., Galante, P. J., Soley-Guardia, M., Boria, R. A., Kass, J. M., Uriarte, M., & Anderson, R. P. (2014). ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for Maxent ecological niche models. Methods in Ecology and Evolution, 5: 1198-1205. https://doi.org/10.1111/2041-210X.12261
Warren, D. L., Glor, R. E., Turelli, M. & Funk, D. (2008) Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution, 62: 2868-2883. https://doi.org/10.1111/j.1558-5646.2008.00482.x