Lookup NU author(s): Professor Steve Juggins
Three large training sets were investigated to determine optimal sample sizes for diatom-based inference models. The sample sets represented (1) assemblages from Great Lakes coastlines, (2) phytoplankton from the pelagic Great Lakes and (3) surface sediment assemblages from Minnesota lakes. Diatom-based weighted average models to infer nutrient concentrations were developed for each training set. Training subsets ranging in size from 10 samples to the maximum number of samples were created through random sample selection, and the performance of each model was evaluated. For each model iteration, diatom-inferred (DI) nutrient data were related to stressor data (e.g., adjacent agricultural or urban development) to characterize the ability of each model to track human activities. The relationships between model performance parameters (DI-stressor correlations and model r², error and bias) and sample size were used to determine the minimum sample size needed to optimize models for each region. Depending on the training set, at least 40-70 samples were needed to capture the variation in diatom assemblages and environmental conditions well enough that non-analog situations should be rare and the model should provide an unambiguous result when applied to any sample assemblage from the region. Caution is recommended when dealing with smaller training sets unless there is certainty that the selected samples reflect the regional variability in diatom assemblages and environmental conditions.
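The resampling procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the deshrinking method, the number of random draws per sample size, or how performance was evaluated, so the sketch assumes inverse deshrinking, 50 draws per size, and evaluation against all samples in the full set. The function names (wa_fit, wa_predict, performance_by_size, synthetic_training_set) and the synthetic data are hypothetical, and the DI-stressor correlation step is omitted because it would need stressor data.

```python
import numpy as np


def _initial_estimate(Y, optima):
    # Abundance-weighted mean of taxon optima, using taxa with a known optimum.
    known = np.isfinite(optima)
    weights = Y[:, known]
    return (weights @ optima[known]) / weights.sum(axis=1)


def wa_fit(Y, x):
    """Fit a weighted averaging model (assumption: inverse deshrinking).

    Y: samples-by-taxa abundance matrix; x: observed environmental values.
    """
    totals = Y.sum(axis=0)
    present = totals > 0
    optima = np.full(Y.shape[1], np.nan)
    # Taxon optimum = abundance-weighted mean of the environmental variable.
    optima[present] = (Y[:, present].T @ x) / totals[present]
    x0 = _initial_estimate(Y, optima)
    # Inverse deshrinking: regress observed values on the initial estimates.
    slope, intercept = np.polyfit(x0, x, 1)
    return optima, slope, intercept


def wa_predict(Y, model):
    optima, slope, intercept = model
    return intercept + slope * _initial_estimate(Y, optima)


def performance_by_size(Y, x, sizes, n_iter=50, seed=0):
    """Mean r2, RMSE and bias for models built on random subsets of each size."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        r2, rmse, bias = [], [], []
        for _ in range(n_iter):
            idx = rng.choice(len(x), size=n, replace=False)
            # Assumption: each subset model is evaluated on the full sample set.
            pred = wa_predict(Y, wa_fit(Y[idx], x[idx]))
            resid = pred - x
            r2.append(np.corrcoef(pred, x)[0, 1] ** 2)
            rmse.append(np.sqrt(np.mean(resid ** 2)))
            bias.append(np.mean(resid))
        results[n] = {
            "r2": float(np.mean(r2)),
            "rmse": float(np.mean(rmse)),
            "bias": float(np.mean(bias)),
        }
    return results


def synthetic_training_set(n_samples=120, n_taxa=30, seed=1):
    # Hypothetical data: Gaussian taxon responses along a 0-10 gradient with noise.
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, n_samples)
    true_optima = rng.uniform(0.0, 10.0, n_taxa)
    Y = np.exp(-0.5 * ((x[:, None] - true_optima[None, :]) / 1.5) ** 2)
    Y *= rng.uniform(0.5, 1.5, Y.shape)
    return Y, x


if __name__ == "__main__":
    Y, x = synthetic_training_set()
    for n, stats in performance_by_size(Y, x, sizes=[10, 20, 40, 70, 120]).items():
        print(n, {k: round(v, 3) for k, v in stats.items()})
```

Plotting the returned statistics against sample size would show where performance levels off, which is the kind of relationship the study used to identify a minimum training set size.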
Author(s): Reavie ED, Juggins S
Publication type: Article
Publication status: Published
Journal: Aquatic Ecology
Print publication date: 23/09/2011
ISSN (print): 1386-2588
ISSN (electronic): 1573-5125