Some years back I was challenged with the question of how to track fish species richness through time in a selection of freshwater lakes in Florida. I first experimented with some of the simpler nonparametric species richness estimators, since I had seen them commonly used in the ecological literature to answer questions about patterns in species richness.
However, my preliminary analyses produced quite unrealistic results. My annual estimates of fish species richness in a given lake could vary considerably from year to year. This made it look as if fish species were being extirpated from lakes and recolonizing them at high rates. That cannot be right, because extirpation and recolonization must happen far more slowly for fish in isolated lakes across Florida. So, why do these richness estimators perform so poorly at describing annual trends? All of them, in one way or another, rely on the relative numbers or detections of species in your samples to infer how many unobserved species are present. Those relative numbers depend on both the true species abundance distribution and our imperfect ability to observe it. The logic is that the more rare species you observe, the more likely it is that there are rare species out there you have not observed. Thus, richness estimators tend to predict many unobserved species for communities dominated by rare species, and fewer unobserved species for communities with few observed rare species.

This behavior is a serious problem, because the true species abundance distribution of a community can vary greatly through time and across space while the community's membership stays constant. A simple truth illustrates this: species generally become rare before they become extirpated. A richness estimator will therefore likely show a change in a community's richness before the number of species actually declines. Furthermore, the efficiency of our field sampling methods can vary greatly across space and time, making the observed species abundance distribution appear more variable than it truly is.
When data like these are fed to many of the available species richness estimators, the estimators will, unfortunately, show spurious changes in species richness.
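The rare-species logic can be made concrete with a minimal Python sketch. It uses one common bias-corrected form of the Chao1 estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of species seen exactly once and exactly twice. Chao1 is my choice for illustration; the argument above applies to nonparametric estimators generally. The two hypothetical "years" of counts are made-up numbers, not lake data. Both years contain the same ten species, and only the shape of the observed abundance distribution differs.

```python
from collections import Counter


def chao1_bias_corrected(counts):
    """Bias-corrected Chao1: S_obs + F1*(F1 - 1) / (2*(F2 + 1)).

    `counts` holds the number of individuals caught for each species.
    """
    observed = [c for c in counts if c > 0]  # species actually detected
    freq = Counter(observed)
    f1 = freq[1]  # singletons: species seen exactly once
    f2 = freq[2]  # doubletons: species seen exactly twice
    return len(observed) + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical catches of the SAME ten species in two years.
even_year = [20, 18, 15, 12, 10, 8, 6, 5, 4, 3]    # no rare species
skewed_year = [40, 25, 15, 10, 5, 2, 1, 1, 1, 1]   # four species now rare

print(chao1_bias_corrected(even_year))    # 10.0
print(chao1_bias_corrected(skewed_year))  # 13.0
```

Community membership is identical in the two hypothetical years. Even so, the estimate rises from 10 to 13 simply because four species dropped to single individuals in the sample. This is the kind of spurious change in richness described above.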
There is some awareness of these issues in the ecological literature: numerous papers have evaluated the bias and precision of suites of richness estimators under different conditions. Most of these studies reveal bias and inconsistent behavior in species richness estimators. However, they never (to my knowledge) evaluate what that behavior means for inference about patterns in species richness through space and/or time. This gap in the literature is unfortunate, as the absolute number of species is rarely as interesting as patterns in the number of species. To fill this gap and raise awareness of these problems with many species richness estimators, some colleagues and I investigated how well various estimators detect trends in species richness. We measured estimator performance in terms of type-I and type-II error rates when aiming to detect linear trends across a gradient or differences between two blocks. The results are contained in the paper below and indicate that inflated type-I error rates should be expected under certain circumstances.
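This kind of evaluation can be sketched as a small simulation. The setup is hypothetical and not the design used in the paper. It assumes:
- 20 sites along a gradient;
- lognormal species abundances whose spread (sigma) can drift along the gradient;
- a fixed number of individuals sampled per site;
- the bias-corrected Chao1 as an example estimator;
- an ordinary least-squares slope t-test at alpha = 0.05.

When true richness is held constant, every rejection is a type-I error. When true richness declines along the gradient, the rejection rate is the power, and one minus it is the type-II error rate. All function names here are made up for illustration.

```python
import math
import random
from collections import Counter

N_SITES = 20
T_CRIT = 2.101  # two-sided t critical value, alpha = 0.05, N_SITES - 2 = 18 df


def chao1_bias_corrected(counts):
    """Bias-corrected Chao1: S_obs + F1*(F1 - 1) / (2*(F2 + 1))."""
    observed = [c for c in counts if c > 0]
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]
    return len(observed) + f1 * (f1 - 1) / (2 * (f2 + 1))


def sample_site(rng, n_species, sigma, n_individuals):
    """Sample n_individuals from n_species with lognormal relative abundances."""
    weights = [rng.lognormvariate(0.0, sigma) for _ in range(n_species)]
    draws = rng.choices(range(n_species), weights=weights, k=n_individuals)
    tally = Counter(draws)
    return [tally[i] for i in range(n_species)]


def slope_t_stat(x, y):
    """t statistic for the OLS slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    if sse == 0:  # perfect fit: avoid dividing by a zero standard error
        return 0.0 if slope == 0 else math.copysign(math.inf, slope)
    return slope / math.sqrt(sse / (n - 2) / sxx)


def rejection_rate(sigma_low, sigma_high, species_low, species_high,
                   n_reps=200, n_individuals=150, seed=1):
    """Share of simulated data sets in which the slope test rejects 'no trend'."""
    rng = random.Random(seed)
    gradient = [i / (N_SITES - 1) for i in range(N_SITES)]
    rejections = 0
    for _ in range(n_reps):
        estimates = []
        for g in gradient:
            # Unevenness and true richness each change linearly along the gradient.
            sigma = sigma_low + g * (sigma_high - sigma_low)
            n_species = round(species_low + g * (species_high - species_low))
            counts = sample_site(rng, n_species, sigma, n_individuals)
            estimates.append(chao1_bias_corrected(counts))
        if abs(slope_t_stat(gradient, estimates)) > T_CRIT:
            rejections += 1
    return rejections / n_reps


# True richness constant at 30 species: any rejection is a type-I error.
print("type-I (nothing varies):", rejection_rate(1.0, 1.0, 30, 30))
print("type-I (evenness drifts):", rejection_rate(0.5, 2.0, 30, 30))
# True richness declines from 30 to 20 species: rejection rate is power.
power = rejection_rate(1.0, 1.0, 30, 20)
print("type-II (richness declines):", 1 - power)
```

In the first scenario nothing changes along the gradient, so its rate should sit roughly near the nominal 0.05, give or take Monte Carlo noise. Comparing it with the second scenario shows whether, and by how much, a drift in evenness alone inflates the type-I error rate for this estimator and setup. To test other estimators or sampling designs, replace `chao1_bias_corrected` or `sample_site`.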
Our results suggest that caution is needed when using nonparametric estimators of species richness to detect patterns in the number of species. Furthermore, estimator evaluations should always include measures of type-I and type-II error rates. These quantities can reveal the consequences for inference when estimator bias and precision depend on community and sampling characteristics (paper link). I think it would be ideal if the developers of the various richness estimators put their methods to more rigorous tests and explored how they perform for hypothesis testing. Then perhaps they could give guidance on how to avoid spurious inference.