Recently I have been involved in evaluating two monitoring program designs whose objective is to detect when a metric has crossed a specific threshold used to trigger management action. For both programs, the question arose of how 80% statistical power could be achieved. Determining statistical power in the context of monitoring thresholds has not always been immediately obvious to me, so I built a simple simulation and wrote this summary to demonstrate how I formulated the problem.

**Summary:**

In the context of monitoring whether the abundance of a population has crossed a threshold, statistical power is the probability that the abundance estimate in a given year reaches or crosses the monitoring threshold when the true population has in fact done so. In that case, we will correctly conclude that the management trigger has been met. A key difference between this way of thinking about statistical power and more common approaches is that we aim to detect when a threshold has been crossed rather than to detect a difference between two samples. This subtle difference prevents the use of classical power analysis for determining sample size requirements, as can be done for trend analyses, block designs, etc. For a statistical test of whether a threshold has been crossed, we can instead modify the point on the probability distribution of the metric (abundance in our case) that serves as the reference in order to approximate different values of statistical power. For example, we could choose the point estimate itself as the reference, such that when the point estimate crosses the threshold we conclude that our trigger has been met (i.e., the black square points in Figure 1).

*Figure 1. Multiple hypothetical realizations of a monitoring metric (i.e., the transparent points and lines), each estimated by drawing samples from the true population (mean equal to two) and calculating the sample mean and standard deviation. The left panel shows probability distributions for metrics estimated by a monitoring program with low precision (as would be expected when few samples are collected), and the right panel shows metrics estimated by a program with high precision (as would be expected when many samples are collected). Management is triggered when the metric (black or red points) is at or below the trigger (the dotted line at 2).*

Choosing the point estimate as the reference leads to a 50% probability of concluding that the population has crossed the threshold when the true population is exactly at the threshold. This is because estimating abundance involves random sampling error, so the estimate is equally likely to fall above or below the true value (assuming the probability distribution of the abundance estimate is approximately symmetrical). This is analogous to a statistical power of 50%. Alternatively, we could choose the lower limit of the 60% confidence interval as the reference and frame the monitoring question in terms of when this point crosses the monitoring threshold (i.e., the red points in Figure 1). Choosing the lower 60% confidence limit as the reference leads to an 80% probability of concluding that the population has crossed the threshold when the true population is exactly at the threshold. This is analogous to a statistical power of 80%, which is the goal of many monitoring programs. Figure 1 shows how the point estimate and the lower 60% confidence limit pass the trigger at different rates when the true population mean sits at the trigger: approximately 50% of the black squares fall above and below the trigger, while about 80% of the red dots fall below it. This method of linking population monitoring to management triggers has been employed for decades in commercial fisheries, where it is common to manage based on population thresholds that trigger various levels of management action.
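The 50% and 80% probabilities above can be checked with a short Monte Carlo simulation. The sketch below is not the simulation the original post used; the sample size, standard deviation, and number of simulated monitoring bouts are arbitrary assumptions, and the lower 60% confidence limit is computed with a normal z-score.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)

trigger = 2.0      # management trigger
true_mean = 2.0    # true population mean, set exactly at the trigger
n = 20             # samples per monitoring bout (assumed value)
sd = 1.0           # sampling standard deviation (assumed value)
n_sims = 100_000   # number of simulated monitoring bouts

# z-score of the lower limit of a two-sided 60% CI (the 20th percentile)
z = NormalDist().inv_cdf(0.80)  # ~0.8416

# Each simulated bout yields a sample mean and its standard error
samples = rng.normal(true_mean, sd, size=(n_sims, n))
est = samples.mean(axis=1)
se = samples.std(axis=1, ddof=1) / np.sqrt(n)

# Reference 1: the point estimate itself crosses the trigger
p_point = float(np.mean(est <= trigger))

# Reference 2: the lower 60% confidence limit crosses the trigger
p_lcl = float(np.mean(est - z * se <= trigger))

print(f"point-estimate reference: {p_point:.3f}")  # ~0.50
print(f"lower 60% CL reference:   {p_lcl:.3f}")    # ~0.80
```

With a modest sample size per bout, the second rate lands slightly below 0.80 because the standard error is itself estimated (a t- rather than z-distribution effect), but the approximation is close.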

As implied in the previous paragraph, one of the benefits of formulating the monitoring problem in this way is that the statistical power is independent of the variance of the metric. In other words, using the lower 60% confidence limit as the reference approximates 80% statistical power regardless of sampling error. However, the sensitivity of the trigger does depend on the variance of the metric. This relationship is depicted in Figure 2.
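This invariance can be demonstrated by rerunning the same kind of simulation under widely different sampling effort and variability; the sample sizes and standard deviations below are arbitrary assumptions chosen only to span low to high precision.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
trigger = 2.0
z = NormalDist().inv_cdf(0.80)  # lower limit of a two-sided 60% CI

def trigger_rate(n, sd, n_sims=50_000):
    """Fraction of simulated bouts whose lower 60% confidence limit
    falls at or below the trigger, with the true mean held at the trigger."""
    samples = rng.normal(trigger, sd, size=(n_sims, n))
    est = samples.mean(axis=1)
    se = samples.std(axis=1, ddof=1) / np.sqrt(n)
    return float(np.mean(est - z * se <= trigger))

# Very different precision, nearly identical power (~0.80 in each case)
rates = {(n, sd): trigger_rate(n, sd)
         for n, sd in [(10, 0.5), (25, 1.0), (100, 3.0)]}
for (n, sd), r in rates.items():
    print(f"n={n:3d}, sd={sd}: {r:.3f}")
```

All three configurations land near 0.80 even though their standard errors differ by more than an order of magnitude, which is the invariance claimed above.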

*Figure 2. The probability of concluding that the monitoring trigger has been met for three hypothetical monitoring programs when using the lower 60% confidence limit as the reference. The programs differ in the precision of the population estimate: the black line represents an estimate with low precision, the red line an estimate with medium precision, and the blue line an estimate with high precision. The x-axis represents the true population mean.*

Figure 2 shows the probability of concluding that the monitoring trigger has been met across a range of true population means for three hypothetical monitoring programs. The program that produces a population estimate with low precision (black line) yields high probabilities of concluding that the management trigger (trigger = 2) has been met well before it has truly been met (true population mean > 2). This is a Type-I error, i.e., concluding that the trigger has been met when in fact it has not. In contrast, the programs that produce estimates with medium or high precision (red and blue lines) have much lower probabilities of incorrectly concluding that the trigger has been met. Thus, programs that produce more precise population estimates have a lower probability of generating false triggers, but all of the programs meet the 80% statistical power goal when the trigger has truly been met. This is often referred to as reversing the burden of proof, because the variance of the population estimate now directly affects the Type-I error rate (α) while statistical power is fixed, whereas typical hypothesis tests fix the Type-I error rate (e.g., α = 0.05) and manipulate the variance of the estimate to attain a given statistical power. It is also consistent with the precautionary principle, because the management trigger is more sensitive (and thus we are more likely to invoke management action) when we are more uncertain about the system state than when we are more certain.
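The shape of the curves in Figure 2 can also be computed analytically if we assume a normally distributed estimate with a known standard error; the three SE values below are arbitrary stand-ins for the low-, medium-, and high-precision programs.

```python
from statistics import NormalDist

nd = NormalDist()
trigger = 2.0
z = nd.inv_cdf(0.80)  # lower limit of a two-sided 60% CI

def p_trigger(true_mean, se):
    """P(lower 60% confidence limit <= trigger) for a normal estimate
    with known standard error:
    P(xbar - z*se <= trigger) = Phi((trigger - true_mean)/se + z)."""
    return nd.cdf((trigger - true_mean) / se + z)

# Hypothetical programs with low, medium, and high precision (assumed SEs)
for label, se in [("low", 0.8), ("medium", 0.3), ("high", 0.1)]:
    power = p_trigger(2.0, se)  # probability at the trigger: 0.80 for all
    false = p_trigger(2.5, se)  # false-trigger rate when truly above it
    print(f"{label:6s} precision: power={power:.2f}, "
          f"false trigger at mean 2.5: {false:.2f}")
```

At the trigger itself the probability is exactly 0.80 for every SE (power is fixed), while at a true mean of 2.5 the false-trigger probability shrinks rapidly as precision improves, which is the burden-of-proof reversal described above.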

This is the way I have been thinking about this problem, and I invite any comments or alternative perspectives.