Estimating wolf abundance with unverified methods

Wildlife abundance can be very difficult to estimate, especially for rare and elusive species, such as wolves. Over nearly a century, wolf scientists have developed methods for estimating abundance across large areas, which involve marked animals being detected again after capture, sometimes supplemented by observations of the associates of those marked animals. Recently, several US jurisdictions have departed from those proven methods to explore alternatives that are believed to be less expensive for wolf populations estimated >1000 individuals. The new methods sacrifice precision but are believed to retain adequate accuracy and sensitivity to changing conditions for reliable decision-making. We review evidence for the accuracy, precision, sensitivity, and reproducibility of the new “scaled occupancy model” (SOM) applied in Wisconsin. We conclude that the Wisconsin method would systematically overestimate wolf abundance by large (but currently incalculable) margins. Because Wisconsin, similar to other states, not only changed to unverified methods but also implemented widespread wolf-killing, shortcomings in their estimates of wolf abundance may have far-reaching consequences for population viability and confidence in state wildlife policy. We discuss findings from Wisconsin alongside similar findings for other states’ occupancy models being insensitive to human causes of mortality that have recently increased. Overall, Wisconsin’s method for estimating wolf abundance shows significant departures from best practices in scientific measurement. Verification will require independent replication and unbiased tests at multiple scales in multiple habitats under different human-induced mortality rates and rigorous independent review before the new methods are considered reliable.


Introduction
The abundance of wild animals is often central to policy and of interest to many publics. A century has been spent in devising and evaluating methods to estimate wildlife abundance [1]. Mammalian carnivore abundances can be particularly difficult to estimate accurately when they are naturally low-density, when their habits or habitats are inconvenient for humans, or when individuals have overlapping ranges but cannot be distinguished individually. The latter issue of counting known individuals or reliable use of mark-recapture methods has animated the field for some time [2][3][4][5][6]. Scientists in many regions have raised concerns over dubious estimates of abundance influencing policy, especially when those estimates are not verified independently or the methods diverge from those validated [7,8]. Such concerns have arisen lately in three jurisdictions of the USA where estimates of gray wolf (Canis lupus) abundance relied on new methods.
Wolf populations in many regions are recolonizing their former range in increasing numbers [European Union: [9]; USA: [10]]. Wolf recolonization has triggered sociopolitical clashes over how to respond [11][12][13][14][15]. Different worldviews of wolves surfaced as interest groups vied for control of policy-making processes. Persistent points of friction in public policy debates over wolves center on the abundance and geographic distribution of individuals and packs and the role such variables should play in policy.
The number and geographic extent of wolves are thought to serve as a correlate of the benefits and costs of wolves to humans and to ecosystems and as a measure of the success or failure of policies. For example, decisions over the (de)listing of species under the USA Endangered Species Act largely rely on data describing number of breeding pairs and their rate of change over time. Wolves in the USA are still listed as threatened or endangered in most regions. Therefore, recent actions by some jurisdictions to change methods for estimating wolf abundance and geographic distribution have raised concerns over accuracy.
In particular, three states in the USA have recently changed methods for estimating wolves across large areas, as both the cost of the task has grown and the sociopolitical scrutiny of policy has intensified. In 2021, the Fish, Wildlife & Parks agency of Montana, USA (MT) changed from intensive monitoring efforts and a patch occupancy model (POM) [16,17] to predicting population size by combining predictions on area occupied with predictive models of home range size and pack size (called an "integrated population occupancy model", iPOM) [18]. In 2021, the Fish & Game agency of Idaho (ID) changed to space-to-event models with stationary cameras [19,20]. In 2021, the Department of Natural Resources of Wisconsin (WI) transformed its efforts to count every wolf into an effort to estimate state-wide population size with an "SOM" that combines estimates of pack size, pack ranging area, and occupied range, similar to MT [21,22].
Policies that liberalized killing of wolves (expanded methods, timing, and participation in hunting, trapping, and hounding) in those same states raised concerns about the sensitivity of the new methods to changes in widespread human-caused mortality [7]. The same three jurisdictions substantially expanded both the methods and opportunities for private individuals to pursue or kill wolves in ID [23], MT [24], and WI [25,26]. As a result, concerns arose that the new methods for estimating wolf abundance are inadequate to detect population changes [7]. If the concerns are well-founded, then states would not be safeguarding populations from excessive exploitation and risk of extinction. Therefore, reliable methods to estimate wolf populations are swiftly needed not only to continue to inform policy but especially to address the consequences of regulatory changes that likely raised wolf mortality and can hinder or reverse wolf population growth [27][28][29][30][31]. The reliability of new methods is particularly important to federal and state policy in those three jurisdictions and to their claims that wolf policy is science-based. Independent verification of state methods may raise public confidence in the actions and statements of public trustees whose primary legal duty is to preserve wildlife for future generations, e.g., Hughes v Oklahoma 1979 [32].

Attributes of valid methods for estimating abundance of wild animals
Reliability of measurement methods and estimation techniques in quantitative sciences can be conceptualized using the criteria of precision, accuracy, sensitivity to changing conditions, and reproducibility. Precision increases as the margin of error ("confidence interval" or "credible interval") around the estimate decreases. Therefore, precise estimates of wildlife abundance are more certain or credible, with narrower clustering of repeated measures, and thereby produce more confidence in policy.
Accuracy refers to measurements that are neither systematically lower nor higher than the actual value, and errors are random with regard to the actual value, i.e., accurate measurements are unbiased. Measurements that are highly precise but inaccurate provide false confidence about a poor approximation of reality. Likewise, measurements that are accurate but imprecise give low confidence about a good approximation to reality and can obscure trends toward increase or decrease.
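The distinction between precision and accuracy can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: the true population size, the bias, and the error spreads are hypothetical values chosen for illustration, and the function names are ours rather than part of any agency method.

```python
import random
import statistics


def simulate_estimates(true_n, bias, sd, reps=2000, seed=1):
    """Draw repeated abundance estimates with a fixed bias plus random error."""
    rng = random.Random(seed)
    return [true_n + bias + rng.gauss(0, sd) for _ in range(reps)]


def summarize(estimates, true_n):
    """Return (mean error, spread) for a set of repeated estimates."""
    mean_error = statistics.fmean(estimates) - true_n
    spread = statistics.stdev(estimates)
    return mean_error, spread


TRUE_N = 1000  # hypothetical true population size

# Precise but inaccurate: tight clustering around the wrong value.
precise_biased = simulate_estimates(TRUE_N, bias=300, sd=20)
# Accurate but imprecise: centered on the truth but widely scattered.
accurate_noisy = simulate_estimates(TRUE_N, bias=0, sd=200)

for label, est in [("precise but biased", precise_biased),
                   ("accurate but imprecise", accurate_noisy)]:
    mean_error, spread = summarize(est, TRUE_N)
    print(f"{label}: mean error {mean_error:+.0f}, spread {spread:.0f}")
```

In this sketch, the first estimator would look trustworthy if judged by its narrow spread alone, which is the false confidence described above.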
An accurate and precise estimate in one period may be reliable under stable conditions but generate biased (inaccurate) estimates or lose precision (credibility) when conditions change the range of input variables. This attribute of measurement methods we call sensitivity.
Sensitivity is achieved when methods of measurement adjust to changes across periods (i.e., are generalizable). For example, a method for estimating occupancy and abundance that was designed outside of hunting seasons may not generalize to a hunted population. Insensitive measurements are not generalizable or cannot be confidently extrapolated for policy-making that reliably anticipates future conditions.
Finally, reproducibility refers to replication via independent attempts. No matter how accurate, precise, or sensitive a measurement is claimed, if independent efforts to replicate those attributes fail despite good faith efforts, the method is fundamentally unscientific. Therefore, policy-making on that basis is likely to fail in unpredictable ways.

Background to wolf abundance in USA and Wisconsin policy
As USA states try to save time and effort in estimating their wolf abundances while imposing more killing on those populations, proponents of the new methods should invite independent review and demonstrate accuracy, precision, sensitivity, and reproducibility of the methods through the usual processes of scientific debate, peer review, transparency, replication, correction, retraction, revision, and persuasion until the wolf science community comes to consensus. That was the process for the older methods. To estimate wolf abundance by meeting the above criteria, the scientific community has come to consensus over a century or so that one needs a large sample of marked animals detected again over a year or more with methods that account for migrants, variability in pack structure and territory sizes, and ideally some understanding of birth and death rates [33][34][35][36][37].
We offer the following discussion of concerns about the new Wisconsin method in hopes of informing public policy debate, whether in the USA or abroad, whether federal, state, tribal, or private. We have a particular audience in mind in the USA federal government because it collects information on these and other jurisdictions' wolf monitoring programs; wolves have been periodically protected, and still are in some areas, such as Wisconsin, under the USA Endangered Species Act (ESA). The ESA requires the Secretaries of Interior and Commerce to base determinations "solely on the basis of the best scientific and commercial data available" (16 USC §§ 1531 Sec 4(b)(1)(A) and Sec. 7 "Interagency cooperation"), including those to relist species. Moreover, monitoring and population estimation methods are an essential part of, or pre-condition to, each of the five factors the USFWS should analyze when making those determinations. For examples of two factors, population estimates are essential for assessing "overutilization" or for assessing the (in)adequacy of regulatory mechanisms in states hosting wolves (16 USC §§ 1531 Sec 4(a)(1) "Determination of Endangered Species and Threatened Species"). We also note the many invested interest groups including state and tribal governments, litigants, lobbyists, and the public who continue to pay careful attention to rhetoric and to science in wolf policy. Setting aside the important (but rare) debate over whether the number of wolves (census population size) is a more meaningful or useful measure for policy decisions than the number of packs, geographic distribution, net benefits-costs, or estimates of viability (e.g., effective population size), we focus here on the reliability of the science brought to recent changes to methods for estimating abundances.
Before proceeding to the case of Wisconsin's new method for estimating wolf abundance and geographic distribution, we have a disclaimer and two acknowledgments. First, we acknowledge that counting wolves and many other rare and elusive wildlife is very difficult, time-consuming, and expensive, and the eventual estimates are often disputed. Indeed, mitigating costs, difficulty, and controversy in part motivated the recent methodological shifts for estimating wolves. As publicly funded scientists, we see it as our duty to evaluate the methods, and their assumptions and application, for the broadest public to judge policy [38]. Second, we recognize that government scientists in all jurisdictions are increasingly pressured by political appointees to deliver answers that the politicians like. None of our writing should be construed as a personal criticism of state agency scientists. Our disclaimer is that we will not compare the current methods to the previous method. However, in every case when we identify a current problem, we point to a feasible solution. We hope our comments are viewed in the spirit intended, as constructive criticism.
Wisconsin's SOM applied in 2022: The new method adopted by WI's Department of Natural Resources (WDNR) in 2022 [21] takes the raw data from the traditional "territory mapping method" and replaces the last step of the population estimation process with new steps. The traditional method involved primarily snow-track surveys, supplemented by aerial telemetry, and observation of a small number of collared wolves supplemented by summer howling surveys to detect litters of pups [39][40][41]. The final step of the traditional method was a public process (from 1996 approximately to 2012) of counting the number of wolf packs; physically placing a marker on a large map of the state in public; and writing the estimated number of wolves on each pack's location for summing the total. From 2012 to 2017, it was somewhat less public [38]. Later, the traditional method was supplemented by early versions of the SOM and then supplanted in 2022.
The WDNR SOM [21] took the data from the traditional counting method (snow-track surveys and telemetry) and replaced the population estimation process described above with a model-assisted estimate. In essence, the SOM is a sampling method while the traditional method attempted a complete enumeration analogous to the USA census. The new steps in the SOM treat evidence of wolves as the basis for probable pack occupancy across a large number of relatively small survey blocks and then apply an estimate of home range size and pack size to extrapolate the state-wide population across estimated occupied range. These new steps rely on the predictive model called the "scaled occupancy model" first recommended for WI wolves by Stauffer et al. [22].
The WDNR deserves praise for two aspects of their transition to the use of the SOM. First, they have continued to use actual field snow tracking observations collected by many community volunteers to supplement state, tribal, and federal staff efforts. Second, the WDNR used both methods side by side for several years, so that scientists and the public could see how the SOM outputs related to the traditional census. Although the raw data have not been provided despite our repeated requests, the final outputs and their uncertainty (precision) can be examined for several years before the implementation of the SOM alone in 2022 [21]. During that period, the traditional method and the SOM were therefore applied under similar management and ecological conditions. However, the decision to adapt the model of Stauffer et al. [22] to the situation in 2022 [21] carries with it both the typical scientific assumption that a model from the past can predict the present and future and also a secondary, more tenuous assumption. The tenuous assumption is that the model of Stauffer et al. [22] can generate reliable estimates when input conditions are dramatically different (sensitivity). For example, Stauffer et al.'s [22] model was developed in the absence of legal wolf-killing and compared to the traditional method entirely under conditions without wolf-hunting or state-sanctioned killing of suspected predators of domestic animals [22]. The policy change of the 2021 wolf-hunt [25], the first since 2014, and the novelty of such a hunt during breeding season raise questions about the relevance of Stauffer et al.'s model [22] to any period with legal wolf-killing. We examine that model's relevance after describing the WDNR SOM of 2022 [21].
The SOM has three components, each estimated separately: an estimate of the area occupied by wolves, an estimate of territory size, and an estimate of pack size. To understand the SOM, one must scrutinize the methods for estimating each of the three components. Below, we focus on precision and sensitivity of the measurement methods and spend less time discussing accuracy because we still await WDNR data >9 months after our first of two requests.
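How errors in the three components propagate can be sketched with simple arithmetic. The sketch assumes the components combine as a product (occupied area divided by territory size, multiplied by pack size), which is a simplification of the approach described in [21,22]; the function name and all input values are hypothetical and are not WDNR figures.

```python
def scaled_abundance(occupied_area_km2, territory_km2, pack_size):
    """Return (packs, wolves) under a simple product form.

    packs  = occupied area / territory size
    wolves = packs * pack size
    This product form is an illustrative assumption, not WDNR code.
    """
    if territory_km2 <= 0:
        raise ValueError("territory size must be positive")
    packs = occupied_area_km2 / territory_km2
    return packs, packs * pack_size


# Hypothetical inputs chosen only to show how errors propagate.
packs, wolves = scaled_abundance(50_000, 200, 4.0)
print(packs, wolves)

# A 10% overestimate of occupied area combined with a 10% underestimate
# of territory size inflates the total by 1.1 / 0.9, about 22%.
packs_b, wolves_b = scaled_abundance(50_000 * 1.10, 200 * 0.90, 4.0)
print(round(wolves_b / wolves, 3))
```

Because the components multiply, modest biases in the same direction compound rather than average out, which is why each component is scrutinized separately.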

Input data
Input data from winter track surveys: In 2022, the WDNR reported core range of wolves and evidence of wolves based on winter track surveys, telemetry, and other observations less systematically described by WDNR [21]. For winter track surveys, community volunteers and government agents traveled survey blocks of irregular size and shape. They traveled these blocks 0-13 times during the two winters of 2021 and 2022, with one modification: because "…of the Feb 2021 wolf-hunt, only 14,000 km of pre-hunt survey effort were considered in the model" (emphasis added) [21]. Importantly, the emphasized phrase shows how, for the first time ever, two winters' worth of field data were combined for the September 2022 estimate of the state-wide wolf population. Also, estimates of wolf presence were used over four years to designate core wolf range in 2022, as explained next.

Input data for area occupied by wolves
Winter tracking surveys provide inputs for the area putatively occupied by wolves. The description of this method differs between two documents, WDNR [21] and its Addendum, as follows: The Addendum states, "winter tracking blocks with 'confirmed pack activity' during at least one of the previous four years" (emphasis added, p. 3), but later defined it "as winter tracking blocks with confirmed pack activity during the previous four years" [emphasis added, p. 26 of the Addendum to WDNR [21]]. The two sources create confusion that cannot be resolved simply by a query. Rather, the WDNR should resolve the confusion by publishing a correction. Until then, the method cannot be reproduced (a hallmark of science). Subsequent statements do not clarify: "Four years was identified as the number of years which allowed the core range to respond to possible expansions and contractions of wolf range, while minimizing inclusion or exclusion based on transient wolf movements or imperfect detection of wolves in pack-occupied areas" (p. 26) [21]. We fail to see any scientific evidence for those statements.
In sum, the data input to the SOM comes from previous years (two winters of snow-track surveys and four years of pack-occupied area estimates). This may help explain unusual claims by WDNR. First, WDNR [21] claimed that the occupied area increased to 28,824 mi² when compared to the pre-hunt occupied area reported in the 2020-2021 population report (28,493 mi²). That is an unusual claim after the high wolf death toll in 2021, estimated at more than 27%-33% of the population 12 months earlier [25]. Second, in 2020, when the census method was compared to the SOM method using the same data, the SOM estimate produced a much higher estimate of the state-wide number of wolf packs whose lower bounds did not include the traditional method with its narrow bounds (high precision). Therefore, prior years in which both traditional and SOM method were used would lead us to expect the SOM to overestimate the number of packs in the state. There is no apparent effort in WDNR's [21] report to address or remedy the issue.
Several unjustified steps in handling the input data may explain this apparent range expansion and the surprising stability in number of packs after the February 2021 wolf-hunt (described below).

Data handling and analysis
An area was included in 'core wolf range' if either of the following criteria was met: "Tracks from at least two wolves were observed within a block during a single tracking event" or "Single wolf tracks were observed in a grid during separate surveys within a tracking season" (pp. 26 and 28, respectively). Because single tracks in the snow do not confirm the presence of a pack, this step of the analysis can inflate the estimated 'core wolf range' that will be later multiplied by pack size. Also, 25 survey blocks (of 156; 16%) were surveyed 0-1 times in winter 2021-2022. The tracking data used from 2020 to 2021 are equally or more concerning, as there were no surveys done on 21 blocks (of 154 blocks, 16%) and only one survey on 50 blocks (32%). Surveying a block once or twice makes it impossible to verify or replicate the first count and impossible to rule out double-counting the same wolf laying down two sets of tracks at different times. It also departs from decades of precedent by the WDNR [41][42][43][44]. A departure from past methods is seen in the use of snow-track survey data collected with fewer than three surveys per census block. Indeed, by estimating area occupied for survey blocks surveyed <3 times [62 of 154 survey blocks, 40%; Fig. 5, p. 18 in WDNR [21]], the WDNR violates its own quality control rules established in the early 2000s [44][45][46]. We had previously warned that changing methods inflated the variability (imprecision) of population estimates after methods of counting wolves changed three to four times from 1979 to 2012 [47].
Moreover, tracks in the snow can be of non-wolves, transient lone wolves, or packs that have disappeared after the first year they were detected. Wolves that dispersed since being counted or spend most of their time in tribal reservations, Michigan, or Minnesota, might also have left a track that resulted in a survey block being included in "core wolf range," which was not clearly defined in WDNR [21]. For example, numerous wolves made long-lasting, long-distance extraterritorial movements [48], which might have produced tracks in areas unoccupied by wolf packs. This concern is only heightened by the potential proliferation of false positives (identifying a pack where none exists) due to the social disruption and pack disbandment that may be caused by wolf-killing [49][50][51][52][53]. Many packs could have disappeared given increases in human-caused mortality throughout wolf range during and after the February 2021 wolf-hunt that killed 218 wolves legally and >100 illegally [25]. Four to five dozen surviving wolves also faced killing by USDA-WS in summer 2021 [54].
To illustrate our concern with inflating occupied range with spurious packs, we call attention to survey block 167 in the winter 2020-2021 population report. Survey block 167 appeared in the report for 2020-2021 [55] but not in the report for 2021-2022 [21], with no explanation for its omission. No surveys were conducted there in winter 2021-2022 [Figure 5, p. 18 in WDNR [21]], yet it is included in pack-occupied wolf range and core range [Figures 6, 7, and 13 in WDNR [21]]. The latter figures show occupancy probabilities, wolf density estimates, and included and excluded survey blocks. Indeed, there is a probability of occupancy of survey block 167 in Figure 6 despite no snow-track survey there, no anecdotal observation of wolves there [Figure 1, p. 14 in WDNR [21]], and no verified complaints of domesticated animal loss there in 2021-2022 [Figure 2, p. 15 in WDNR [21]]. This is direct evidence that WDNR [21] used data from prior years to estimate wolf numbers. Survey block 167 is only unique in its obvious position outside the main population range, making it obvious on visual inspection. Currently, we have no way of knowing how many other survey blocks were populated with wolf tracks from years before the February 2021 wolf-hunt.
Finally, for occupied range, we have concerns about neighboring and included jurisdictions whose wolves should not legally be included in the state-wide count [56]. A basic principle of spatial models such as the SOM is to accommodate the boundaries with unoccupied or unstudied areas, such as Michigan, Minnesota, large bodies of water such as Lake Superior, and tribal reservations such as Menominee County. Exclusion of these regions and care in handling adjacent areas would avoid simulating wolves that do not exist or double-counting wolves counted by neighboring jurisdictions. However, WDNR's [21] report and Stauffer et al.'s [22] model are not transparent on these issues. If a wolf pack territory were simulated to overlap any of these neighboring areas, the state-wide count would overestimate the wolf population size by counting an area as occupied by a pack full-time when it was in fact unoccupied water or claimed by neighboring jurisdictions. The SOM grid cells neighboring those problematic areas should be deleted to avoid the state claiming wolves that spend most of their time in non-Wisconsin areas, especially tribal reservations. In sum, we have numerous concerns about overestimation bias and arbitrary and capricious inclusion and exclusion criteria.
The extent of the overestimate is impossible for us to disentangle without greater transparency from the WDNR. One dubious result is that the number of packs remained virtually unchanged between years: 292 packs in 2020-2021 and 288 packs in 2021-2022. We surmise the introduction of data from winter 2020 to 2021 (and earlier years) raises the likelihood that WDNR's [21] report populates the state with simulated wolves. Due to lack of transparency about the number of data points collected prior to December 2021, inclusion or exclusion of other jurisdictions' wolves, and how the WDNR handled surveys that counted only one wolf (in the majority of efforts in that block), we cannot determine how much their overestimation bias inflates the state wolf population.

Occupancy model
The inputs from above are survey blocks ostensibly occupied by wolf packs. To assign probability of occupancy given the input data, a land cover model was built, based on forest cover, proportion of agricultural and developed land (2016 National Land Cover Data, NLCD), and road density [22]. A major concern with the exclusive use of such variables is that none of them change annually or even considerably over five years or longer. Hence, any changes in conditions beyond those captured by the model but that affect occupancy, such as increases in human-caused mortality, would not affect the occupancy estimates (i.e., the model is insensitive). Without ground-truthing, the model used to estimate occupancy is also speculative. Any event that changed the land cover or the presence of wolves since the land cover model was produced would alter the probability that there are packs there now. In particular, the February 2021 wolf-hunt would have changed occupancy probabilities, as would any future increased use of lethal methods. That means the assumed probability of wolves occupying a given grid cell is of potential (or simulated) packs, not counted packs.
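The insensitivity argument can be made concrete with a minimal sketch of the covariate part of an occupancy model. It assumes a logistic link on the three land-cover covariates named above; the coefficients and block values are hypothetical, not fitted values from [22], and a fitted occupancy model would also draw on detection histories that this sketch deliberately leaves out.

```python
import math


def occupancy_probability(forest, ag_dev, road_density, coef):
    """Logistic occupancy probability from land-cover covariates only."""
    z = (coef["intercept"]
         + coef["forest"] * forest
         + coef["ag_dev"] * ag_dev
         + coef["roads"] * road_density)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients and block covariates, for illustration only.
COEF = {"intercept": -1.0, "forest": 3.0, "ag_dev": -2.0, "roads": -0.5}
BLOCK = {"forest": 0.8, "ag_dev": 0.1, "road_density": 0.6}


def predict(block, year_mortality_rate):
    # The mortality rate is accepted but never used: the covariate set
    # has no term for human-caused mortality, so it cannot respond to it.
    return occupancy_probability(block["forest"], block["ag_dev"],
                                 block["road_density"], COEF)


before_hunt = predict(BLOCK, year_mortality_rate=0.05)
after_hunt = predict(BLOCK, year_mortality_rate=0.30)
print(before_hunt == after_hunt)
```

With static covariates, the predicted probability for a block is identical in both years, whatever happened to the wolves in between.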
The above oversight is troubling given the many studies showing that human-induced killing affects wolf population, pack, and individual responses, sometimes beyond their numerical effects [28,29,33,34,36,37][49][50][51][57][58][59][60][61], and increases the risk of illegal killings [30,31,62,63]. Liberalizing lethal methods is a pervasive condition that structures wolf population dynamics, whereas Stauffer et al.'s [22] predictive model had underlying assumptions of full protections and saturation of range with packs, and therefore different population dynamics that are unaccounted for by the 2022 data. Although such disruption of wolf dynamics and their effect on occupancy estimates are not addressed in their article, Stauffer et al. [22] do note that "a situation with highly dispersed packs and extensive interstitial space is more subject to positive bias than a situation where packs are clustered" (p. 1418). Given the above, assigning occupancy probabilities based on forest, agriculture, and roads without considering public hunts as a critical model variable can only produce overestimates of occupancy probability, never underestimates. Moreover, Stauffer et al.'s [22] model was not validated with data following wolf-hunts in WI. Rather, the model was validated on data from winters 2016-2017 to 2018-2019, which came several years after the wolf-hunts of 2012-2014, despite the WDNR authors having had access to all the requisite data [22]. We are unclear why they chose to validate their model with the small subset of all the data at their disposal. This is an unjustified assumption and step in that article. Therefore, it is unknown whether the model of Stauffer et al. [22] is sensitive to changes in wolf populations after a wolf-hunt, especially an unprecedented hunt in timing, methods, and proportion of the wolf population estimated to have been killed legally and illegally [25]. Indeed, the model of Stauffer et al. [22] was exclusively validated with data of periods during which wolves enjoyed full ESA protections (classified as "endangered"), and hence, we suggest this is an underlying and substantial model assumption that has remained unidentified but that challenges the model's application to liberalized killing anywhere at any time but for the two years they hand-picked for validation. The assumptions of Stauffer et al. [22] are numerous: habitat saturation, static land cover variables, little overlap in wolf pack territories with neighbors and with independent jurisdictions, and no legal wolf-killing (except for human health and safety). They also assume the average territory size and average pack sizes represent accurately the entire state's wolf packs, and normal distributions around those averages. Some of those assumptions were not made clear in Stauffer et al.'s study [22], despite peer review, but others were transparent. For example, Stauffer et al. [22] stipulated that "If the number of occupied sites adequately describes the distribution of home ranges (i.e., ψ is appropriately defined and interpreted, and is unbiased), then population abundance can be derived…" (p. 411). The WDNR has not addressed such assumptions in [21] and we show the numerous sources of overestimation bias that make us skeptical. Therefore, WDNR [21] has misapplied a model to a new set of conditions, without validation. The claim that they use a peer-reviewed method is not accurate with regard to its application. This is analogous to treating an illness with a therapy validated for a different health condition dissimilar from the former.

Wolf pack home range size
Next, the pack-occupied area with some unspecified probability of occupancy was divided into estimated wolf pack territories state-wide. These wolf pack areas are variously referred to as home ranges or territories because we lack systematic information on whether the ranges were defended as territories. The WDNR estimated pack home ranges from a small number of GPS-collared wolves monitored by telemetry [21]. WDNR [21] relied on fewer than two dozen GPS-collared wolves (<3% of the population by all estimates) from December 2021 to April 2022 to estimate home range size and did not reveal which zones those wolves ranged within. This is a small sample compared to past years, e.g., Wydeven et al. [41] reported 13% of the population on average. Moreover, the agency did not report the number of GPS data points (e.g., number of locations per year) used in WDNR's [21] report. This small sample raises concerns over the estimates themselves because such estimates are dependent on the number of locations, with small sample sizes leading to underestimation of territory sizes, and thus overestimation of the number of territories [64].
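The dependence of home range estimates on the number of locations can be illustrated with a minimal simulation. The sketch assumes a minimum convex polygon (MCP) estimator and locations drawn uniformly from a circular range; neither assumption is taken from WDNR [21], whose estimator is not specified here, and the 171 km² figure is used only as a convenient scale.

```python
import math
import random


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    if n < 3:
        return 0.0
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def random_location(rng, radius):
    """Uniform random point inside a disk of the given radius."""
    r = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return (r * math.cos(theta), r * math.sin(theta))


def mean_mcp_area(n_locations, radius, reps=200, seed=7):
    """Average MCP area over repeated samples of n_locations points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        pts = [random_location(rng, radius) for _ in range(n_locations)]
        total += polygon_area(convex_hull(pts))
    return total / reps


TRUE_AREA = 171.0  # km^2, hypothetical true range used only as a scale
RADIUS = math.sqrt(TRUE_AREA / math.pi)

for n in (10, 50, 500):
    fraction = mean_mcp_area(n, RADIUS) / TRUE_AREA
    print(f"{n} locations: MCP covers {fraction:.2f} of the true range")
```

Under these assumptions the polygon can never exceed the true range, so fewer locations always mean a smaller estimated territory, and a smaller territory size in the denominator means more estimated territories state-wide.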
Regarding how the GPS locations were transformed into territory sizes, the WDNR Addendum describes a highly subjective process in six steps [pp. 28-29, [21]]. Because an individual analyst's subjective judgments were used for an unstated proportion of the ranges, we doubt the process could be replicated by others (irreproducible). Moreover, the model provides no measures of sensitivity to changes in such an estimate, overall or by zone. Namely, WDNR [21] employs a single value for state-wide average home range size (171 km²), despite their own data revealing that home ranges differ markedly by zone and have changed over time [41]. One step to avoid overestimation bias if one does not have unbiased, reliable zone-specific home range estimates is to use the median from all wolves that were observed in the same manner (e.g., via GPS collars). We recommend more effort at collaring if the state retains the SOM approach.

Wolf pack size
Next, the estimated state-wide number of territories was multiplied by estimated pack size for a total state-wide count (perhaps omitting tribal reservations). Home range shape, as well as grid placement relative to where the actual territory boundaries lie (and their spatial relationship), could materially affect the number of estimated territories within state lines: Stauffer et al.'s [22] best simulation exercise(s) with saturated territories (the assumption made for Wisconsin core wolf range) still resulted in an overestimation in number of territories of 1.38-1.9 times the true value [see Figure 7, [22]]. That assumption of range saturation was made for two or more winters after the last fall wolf-hunt, during periods of full protections, and still resulted in considerable overestimation. Yet, WDNR [21] applied that assumption to a period of one year after an unprecedented February wolf-hunt, and we cannot find any evidence of attempts to acknowledge or correct for such overestimation.
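The arithmetic of this step, and the scale of the simulated overestimation, can be sketched as follows. The occupied area and pack size below are hypothetical placeholders, not WDNR data; only the 171 km² average home range and the 1.38-1.9 overestimation factors come from the sources cited above. Dividing by those factors is our illustrative assumption about what a correction might look like, not a procedure proposed by Stauffer et al. [22] or the WDNR [21].

```python
# HYPOTHETICAL inputs for illustration only.
OCCUPIED_AREA_KM2 = 40_000.0   # hypothetical pack-occupied area
PACK_SIZE = 3.5                # hypothetical average pack size

# Values taken from the text.
MEAN_HOME_RANGE_KM2 = 171.0    # state-wide average home range [21]
OVERESTIMATE_LOW = 1.38        # simulated territory overestimation [22]
OVERESTIMATE_HIGH = 1.9

def som_point_estimate(area_km2, home_range_km2, pack_size):
    """Territories = area / home range; wolves = territories * pack size."""
    n_territories = area_km2 / home_range_km2
    return n_territories, n_territories * pack_size

territories, wolves = som_point_estimate(
    OCCUPIED_AREA_KM2, MEAN_HOME_RANGE_KM2, PACK_SIZE
)

# ASSUMPTION: the simulated factors transfer directly to the field estimate.
wolves_if_high_bias = (territories / OVERESTIMATE_HIGH) * PACK_SIZE
wolves_if_low_bias = (territories / OVERESTIMATE_LOW) * PACK_SIZE

print(f"Uncorrected: {territories:.0f} territories, {wolves:.0f} wolves")
print(f"Deflated by 1.9x:  {wolves_if_high_bias:.0f} wolves")
print(f"Deflated by 1.38x: {wolves_if_low_bias:.0f} wolves")
```

Whatever inputs are used, a territory count inflated by 1.38-1.9 times carries through proportionally to the wolf count, because the final step is a simple multiplication.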
Moreover, WDNR [21] uses a zone-specific average pack size ranging from 2.7 to 4.13 adult wolves. It is a well-known statistical principle that an average is pulled high or low by extreme values, and in this case, the range of values for pack sizes is bounded by two on the left and 12-13 on the right historically [40,41]. Therefore, extreme pack sizes and inflated counts in a particular survey block can pull the zone-specific average up artificially. Indeed, the median is a safer measure because it is defined as the point estimate at which half of the wolf packs are smaller and half are larger. We recommend the WDNR present all pack size counts and how each pack was counted in each survey block. This would enable independent review and replication of their estimates. Previous work recommended methods and communications that would enhance transparency [26,47].
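A small worked example shows how a single large pack shifts the mean but not the median. The pack counts and the number of territories are hypothetical, chosen only to respect the historical bounds of two and 12-13 wolves mentioned above; they are not WDNR data.

```python
from statistics import mean, median

# HYPOTHETICAL winter pack counts for one management zone.
pack_counts = [2, 2, 2, 3, 3, 3, 4, 4, 12]
N_TERRITORIES = 100  # hypothetical number of territories in the zone

zone_mean = mean(pack_counts)      # pulled upward by the pack of 12
zone_median = median(pack_counts)  # half of packs smaller, half larger

wolves_from_mean = N_TERRITORIES * zone_mean
wolves_from_median = N_TERRITORIES * zone_median

print(f"Mean pack size:   {zone_mean:.2f} -> {wolves_from_mean:.0f} wolves")
print(f"Median pack size: {zone_median:.2f} -> {wolves_from_median:.0f} wolves")
```

Because pack size is bounded below at two but has a long right tail, the mean will tend to sit above the median in data of this shape, and the difference is multiplied by the number of territories in the zone.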

Bias and uncertainty
Our conclusion of overestimation bias should be considered in light of the very low probability of an underestimation bias. We can imagine an unlikely set of conditions under which the WDNR [21] SOM approach would underestimate the state wolf population. First, the 2022 territory size would have to be a significant overestimate. If pack ranges were significantly smaller than 171.5 km², then many more (smaller) packs could fit into the state. Although there is a low probability that the few GPS-collared wolves had extremely large home ranges, resulting in fewer packs being estimated state-wide by the WDNR SOM method, the effects would be unclear because larger packs use larger ranges. Hence, the number of territories and the sizes of the packs that use those territories tend to be inversely related [34], so tweaks to pack size and range size tend to cancel each other out when it comes to estimating the state-wide number of wolves. By contrast, the risk of double-counting a pack would create a more consistent overestimation bias. Every time a pack is mistakenly double-counted, that inflates the state-wide count, so imagining an underestimate of the state-wide total requires zero double-counting. Avoiding double-counting would depend on survey blocks having been chosen to encompass tracks of only one pack, zero tracks of other packs, and no loners or transients. Double-counting is practically guaranteed by the WDNR [21] method of incorporating data from prior years to define pack-occupied range. Furthermore, the occupancy model designates its area of analysis after data are collected on any verified wolf sightings across the state, so it seems implausible that the actual occupied wolf range exceeds the bounds of analysis (indeed, Figure 6, p. 9 suggests the converse: margins have 0%-20% probability of occupancy). Moreover, the very small packs that historically occur on the periphery of the wolf population would not contribute meaningfully to the entire state estimate. Therefore, an underestimate of the state wolf population seems to depend on an extremely rare set of conditions and errors in data collection for which we have no evidence. Given how difficult it is to argue for an underestimate, the reasonable conclusion is that our concerns about overestimation deserve the focus of independent researchers and reconsideration by the WDNR.
The three prior estimates (area occupied, territory size, and pack size) were not fixed values themselves but rather varied across space and over time in response to ecological conditions, including human influences that permeate wolf range, such as hunting. Because each component of the SOM is estimated separately, each carries its own uncertainty (i.e., limited precision), and these uncertainties compound in the final population estimate. The result is a broad range of uncertainty around single values and a broad range of likely values. The meaning of a broad range of likely values has been lost in WDNR's report [21] and its associated greensheet because both documents identify a single point estimate for the state population size. This so-called "mode" does not seem to us to deserve the attention it garnered from the Natural Resources Board (accessed 9 April 2023, see time stamp two hours) [65]. In statistical terms, the mode is the most frequently measured value. Yet, points around the mode may be nearly or equally credible in a model such as the Stauffer et al. model [22]. We doubt this mode can be defended on statistical grounds, so we call for the WDNR to publish the probability distributions and revise its confident claim about the state-wide wolf population estimate to a range of values. Also, WDNR [21] presents little or no data on breeding success, so the estimate of abundance in 2022 is based on presumed breeding in summer 2021 [26] without systematic evidence for pup survival to winter 2021-2022. We demonstrated how WDNR might communicate uncertainty about abundance in that earlier article [26]; therefore, we do not repeat the many recommendations on transparency and even-handed explanations of scientific uncertainty.
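How separately estimated components compound can be sketched with a simple Monte Carlo draw. Everything numeric here is an assumption for illustration: the central values for occupied area and pack size, all three coefficients of variation, the lognormal shape, and the independence of the components (which the inverse relationship between pack size and range size noted above would violate). It is not the Stauffer et al. [22] model.

```python
import math
import random
from statistics import mean, quantiles, stdev

def lognormal_draw(center, cv, rng):
    """Positive draw with median `center` and coefficient of variation `cv`."""
    sigma = math.sqrt(math.log(1.0 + cv ** 2))
    return rng.lognormvariate(math.log(center), sigma)

# HYPOTHETICAL central values and uncertainties; only 171 km^2 is from the text.
AREA_KM2, AREA_CV = 40_000.0, 0.10
HOME_RANGE_KM2, HOME_RANGE_CV = 171.0, 0.20
PACK_SIZE, PACK_SIZE_CV = 3.5, 0.15

rng = random.Random(42)  # fixed seed so the sketch is repeatable
draws = []
for _ in range(20_000):
    area = lognormal_draw(AREA_KM2, AREA_CV, rng)
    home_range = lognormal_draw(HOME_RANGE_KM2, HOME_RANGE_CV, rng)
    pack_size = lognormal_draw(PACK_SIZE, PACK_SIZE_CV, rng)
    # ASSUMPTION: components are independent of one another.
    draws.append(area / home_range * pack_size)

cv_total = stdev(draws) / mean(draws)
cuts = quantiles(draws, n=40)            # cut points every 2.5%
lower_95, upper_95 = cuts[0], cuts[-1]   # 2.5th and 97.5th percentiles

print(f"Relative uncertainty of the total: {cv_total:.0%}")
print(f"Central 95% of draws: {lower_95:.0f} to {upper_95:.0f} wolves")
```

In this sketch the relative uncertainty of the total exceeds that of any single component, which is why reporting an interval or the full distribution conveys more than a single point such as the mode.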

Conclusion
We recommend future efforts start with repeated, independent counts of two or more wolves as the only evidence for pack-occupied areas, while also noting the possible overestimation bias inherent to that assumption (not all pairs of wolves are a pack, some pairs will cross multiple survey blocks, etc.). Then, ground-truthing should estimate how often that assumption is violated in each of the six management zones and deduct a proportion of packs accordingly.
We also recommend the use of the number of packs as more informative than counts of individual wolves for projecting population dynamics. We also consider packs more informative for evaluating the benefits and costs of wolf coexistence with other organisms such as deer, pathogens, domesticated animals, and humans. The WDNR's [21] approach is not conservative, given it is estimating abundance of individual wolves with a coarse and outdated occupancy model that does not consider increases in mortality. Moreover, WDNR [21] repeatedly failed to assess assumptions of the published model. A conservative approach would reduce the risk of falling below legal and social thresholds [26] and would err toward underestimation, not the current overestimation bias. Furthermore, we recommend the WDNR continue its practice of enumerating wolf packs on tribal reservations separately, rather than treating them as part of the state-wide total as was done by Stauffer et al. [22] and at the Natural Resources Board meeting of September 2022 [65] (see time stamp two hours, accessed 9 April 2023). We cannot estimate the number of wolves that are not under state authority yet were counted in the state wolf population estimate [21] because the data on locations have not been presented transparently.
whom, collar, range of counts per survey attempt). Hence, there should be no concerns over concealing pack locations to avoid illegal killing.
We recognize that agency staff often face intense scrutiny and probably undue and uncomfortable political pressure to work quickly toward political goals. We call for the insulation of agency scientists from political appointees when (and this is the critical step) the agency scientists are asked a scientific question such as "how many wolves live in Wisconsin?" That question is purely scientific and has no relationship to "how many wolves should live in Wisconsin?" or "ought we to hunt wolves?", which are value-based questions they are not trained to address [66,67]. The public and politicians should transparently debate the value-based questions [66,67]. Similarly, the politicians should steer clear of the scientific question because they are not trained to set aside their personal preferences when evaluating fact claims [68]. Likewise, scientists should not treat value-based debates as if science resolved them. Scientists can respond as do other members of the public when addressing values. They should not treat value-based assumptions and policy decisions as if their scientific expertise gave their opinions more weight, nor treat such issues as if science resolved them [68]. Nor should scientists who participate in hunting mislead the public or themselves that they are more objective, impartial, or unbiased than scientists who do not participate in hunting [47,67,69,70]. Science respects no authority or expertise.
Finally, results of predictive models developed and validated under particular conditions are limited to those conditions [7]. Scientific confidence in the models declines as the extrapolation to novel conditions expands and as the assumptions of the original model are violated [7]. For models like Stauffer et al.'s [22] model that multiply three different estimates or models together, the uncertainties multiply [7]. We have shown irreproducible input values and several large biases, insensitivities, and imprecisions in how the WDNR went about relating observations of tracks in the snow to the pack-occupied range and pack size in WDNR [21]. Therefore, we conclude that WDNR [21] is unreliable because it systematically overestimates the state-wide wolf abundance, particularly when hunting seasons and killing are in effect or recently passed. Moreover, given the WDNR is required by law to hold a wolf-hunt if the wolf is not federally listed, future agency policy would clearly violate the model assumption of full protections underlying the modeling methods [21,22].