
An approximate point-based alternative for the estimation of variance under big BAF sampling

Abstract

Background

A new variance estimator is derived and tested for big BAF (Basal Area Factor) sampling, a forest inventory system that uses Bitterlich sampling (point sampling) with two BAF sizes: a small BAF for tree counts and a larger BAF for selecting the trees on which measurements are made, usually including the DBHs and heights needed for volume estimation.

Methods

The new estimator is derived using the Delta method from an existing formulation of the big BAF estimator as consisting of three sample means. The new formula is compared to existing big BAF estimators including a popular estimator based on Bruce’s formula.

Results

Several computer simulation studies were conducted comparing the new variance estimator to all known variance estimators for big BAF currently in the forest inventory literature. In simulations the new estimator performed well and comparably to existing variance formulas.

Conclusions

A possible advantage of the new estimator is that it does not require the assumption of negligible correlation between basal area counts on the small BAF factor and volume-basal area ratios based on the large BAF factor selection trees, an assumption required by all previous big BAF variance estimation formulas. Although this correlation was negligible on the simulation stands used in this study, it is conceivable that the correlation could be significant in some forest types, such as those in which the DBH-height relationship can be affected substantially by density perhaps through competition. We derived a formula that can be used to estimate the covariance between estimates of mean basal area and the ratio of estimates of mean volume and mean basal area. We also mathematically derived expressions for bias in the big BAF estimator that can be used to show the bias approaches zero in large samples on the order of \(\frac {1}{n}\) where n is the number of sample points.

Background

The big BAF (Basal Area Factor) estimator for forest inventory is based on horizontal point sampling (HPS) (also called Bitterlich sampling) using angle gauges having two different BAFs at each sample point in the field: the smaller BAF angle gauge \(\text{BAF}_{\mathrm{c}}\) is used to obtain a count of qualifying trees at each sample point, and the larger BAF angle gauge \(\text{BAF}_{\mathrm{v}}\) is used to select sample trees on which careful measurements are made. Usually these careful measurements include DBH and height, from which tree volume, weight or biomass can be estimated (Bell et al. 1983; Bruce 1961; Oderwald and Jones 1992). Although big BAF sampling is a form of double sampling, as Marshall et al. (2004) indicated, it differs from some common forms of double sampling in forest inventory, such as double sampling with regression estimators, because both samples are taken at each point, so that the small sample is not simply a subset of the large sample point locations (discussions of double sampling in forest inventory include Gregoire and Valentine (2008, p. 262) and de Vries (1986, p. 164), among others).

As indicated by Iles (2012), the history of big BAF sampling may go back to Grosenbaugh (1952, p. 53). An early proposal for using two prism factors in big BAF sampling was given by Bell et al. (1983, p. 702), and a detailed description of big BAF sampling was later given by Marshall et al. (2004). A detailed review of the history of big BAF sampling was recently given by Gove et al. (2020), who compared the variance estimation methods that have been proposed for it. Recent texts on forest sampling and mensuration that describe big BAF sampling include Gregoire and Valentine (2008, p. 268) and Kershaw et al. (2016, p. 377).

Big BAF methods have been used operationally to inventory forests in the USA and Canada in both western and eastern forest types (Corrin 1998; Desmarais 2002). Brooks (2006) compared combinations of 13 “big” BAFs and 6 “small” BAFs to an inventory using fixed-radius plots. Rice et al. (2014) compared a number of forest sampling methods including big BAF, HPS with various BAFs, horizontal line sampling and fixed-radius plot sampling. These studies were conducted in partial harvests in mixed species Acadian forests of northern Maine. A comparison of the results of these forest inventories showed that only the smallest BAF for HPS had a standard error smaller than the big BAF inventory. Methods for determination of optimal sampling plans for big BAF were described by Yang et al. (2017). These methods allow for choice of optimal combinations of BAFs and sample sizes for big BAF according to economic criteria. Chen et al. (2019) used these results to devise practical cost-efficient plans for estimation of forest carbon content using big BAF sampling for forest populations in the northeastern USA. Yang and Burkhart (2019) compared big BAF sampling to two other methods of subsampling count trees on point samples using simulated loblolly pine (Pinus taeda L.) plantations and found all three methods were satisfactory for estimating stand volume.

Despite the successes and evident utility of big BAF sampling, variance estimation remains challenging. Moreover, the basic estimator associated with big BAF sampling is not, itself, design-unbiased. Gove et al. (2020) showed that one of the traditional methods for estimation of the variance in big BAF sampling could be derived using the Delta method if the covariance terms are assumed to be negligible. As described by Gove et al. (2020), the history of the Delta method, which is based on using a Taylor series approximation for nonlinear functions of random variables, has been traced by Ver Hoef (2012). Wolter (2007, p. 231) states that utilization of the method with a first-order approximation, as done in this article, has often provided satisfactory variance estimates for large complex surveys. The primary objective of this study is to use the Delta method to derive a new variance estimation formula for big BAF sampling that takes correlations among important sampling variables into account. As indicated below, the traditional approaches to variance estimation for big BAF sampling ignore possible covariances between the count basal area obtained by using \(\text{BAF}_{\mathrm{c}}\) and the volume per square unit of basal area on trees sampled with \(\text{BAF}_{\mathrm{v}}\). An additional objective of this study is to test this newly derived estimator using Monte Carlo simulations to compare it with the variance estimators that have been previously proposed for big BAF sampling. We also derive expressions for the bias associated with the big BAF estimator.

Basic big BAF estimation formulas

Two BAF factors are needed for big BAF, a small BAF factor \(\mathcal {F}_{{\mathrm {c}}}\) used to select trees which are counted but not measured and a larger BAF factor \(\mathcal {F}_{\mathrm {v}}\) used to select trees on which measurements are made usually including dbh and height for volume, weight or biomass estimation.

To obtain the big BAF estimator we first express the volume to basal area ratio (VBAR) for each tree i selected by the larger BAF, \(\mathcal {F}_{\mathrm {v}}\):

$$ \mathbb{V}_{i} = \frac{v_{i}}{b_{i}} $$
(1)

where \(v_{i}\) is the volume of tree i and \(b_{i}\) is the basal area of tree i, and there are \(i = 1, 2, \ldots, m_{\mathrm{v}_{s}}\) trees on each of a sample of \(s = 1, 2, \ldots, n\) points (Kershaw et al. 2016, p. 377). The average VBAR is then (Gregoire and Valentine 2008, p. 258):

$$ \bar{\mathbb{V}}= \frac{1}{m_{\mathrm{v}}}\sum_{s=1}^{n} \sum_{i=1}^{{m_{\mathrm{v}_{s}}}} \mathbb{V}_{i} $$
(2)

where \(m_{\mathrm{v}} = \sum_{s=1}^{n} m_{\mathrm{v}_{s}}\) is the total number of volume trees sampled on all points. Note that it is theoretically possible that the same tree may be sampled from more than one point and thus could be counted multiple times. The average basal area per hectare for the entire sample is:

$$ \bar{B}_{{\mathrm{c}}} = \frac{\mathcal{F}_{{\mathrm{c}}}}{n} m_{{\mathrm{c}}} = \bar{m}_{{\mathrm{c}}} \mathcal{F}_{{\mathrm{c}}} $$
(3)

where \(m_{\mathrm{c}} = \sum_{s=1}^{n} m_{\mathrm{c}_{s}}\) and \(m_{\mathrm{c}_{s}}\) is the number of sample trees counted at point s using the smaller angle gauge \(\text{BAF}_{\mathrm{c}}\). Total basal area on a tract of size A is then:

$$ \hat{B}_{{\mathrm{c}}} = A \times \bar{B}_{{\mathrm{c}}} $$
(4)

The big BAF volume estimator can then be obtained by multiplying the sample mean VBAR by the count-based basal area as:

$$ {\hat{V}_{\mathcal{B}}} = \bar{\mathbb{V}}\times \hat{B}_{{\mathrm{c}}} $$
(5)
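As a minimal sketch of Eqs. 2–5, the following Python function computes the big BAF volume estimate from per-point data. The function name, argument names and the four-point data set are hypothetical and chosen only for illustration; VBARs are taken as already computed from Eq. 1.

```python
def big_baf_volume(counts, vbars_by_point, baf_c, area=1.0):
    """Big BAF volume estimate following Eqs. 2-5 (illustrative sketch)."""
    n = len(counts)
    all_vbars = [v for point in vbars_by_point for v in point]
    mean_vbar = sum(all_vbars) / len(all_vbars)  # Eq. 2: mean VBAR over all volume trees
    mean_ba = baf_c * sum(counts) / n            # Eq. 3: mean basal area per hectare
    total_ba = area * mean_ba                    # Eq. 4: basal area on a tract of size A
    return mean_vbar * total_ba                  # Eq. 5: big BAF volume estimate


# Hypothetical data: small-BAF tree counts and big-BAF tree VBARs at four points.
counts = [6, 5, 7, 6]
vbars_by_point = [[9.0, 12.0], [10.0], [8.0, 11.0], []]
v_hat = big_baf_volume(counts, vbars_by_point, baf_c=4.0)
print(v_hat)  # 240.0
```

Note that a point with no big BAF trees, such as the fourth point here, still contributes its count to Eq. 3.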

Gove et al. (2020) and Iles (2012) have indicated that the variance for the big BAF volume estimator cannot be obtained by using the traditional formula for double sampling from survey sampling theory and practice. This is because the smaller sample in big BAF sampling is not a smaller selection of the total number of sample points but instead the smaller sample actually occurs at each sample point, thus the point-wise sample sizes utilized by the sample survey double sampling formula are equal, which is contrary to the requirements of that formula. However, because the estimator above is a product of two random variables one may employ standard methods for expressing the variance of a product. By applying the formula of Goodman (1960) for the variance of a product as noted by Gove et al. (2020) one obtains:

$$ \widehat{\text{var}}_{\mathrm{G}}\!\left({\bar{\mathbb{V}}\hat{B}_{{\mathrm{c}}}}\right) = \bar{\mathbb{V}}^{2} \widehat{\text{var}}\!\left({\hat{B}_{{\mathrm{c}}}}\right) + \hat{B}_{{\mathrm{c}}}^{2} \widehat{\text{var}}\!\left({\bar{\mathbb{V}}}\right) - \widehat{\text{var}}\!\left({\bar{\mathbb{V}}} \right) \widehat{\text{var}}\!\left({\hat{B}_{{\mathrm{c}}}}\right) $$
(6)

This formula assumes that the covariance between \(\bar {\mathbb {V}}\) and \(\hat {B}_{{\mathrm {c}}}\) is zero or negligible for practical purposes. This may be expected to be true if VBARs are not greatly affected by density in the forest population being sampled. The variance estimators of \(\bar {\mathbb {V}}\) and \(\hat {B}_{{\mathrm {c}}}\) are given by Gregoire and Valentine (2008, p. 256–257) as:

$$\begin{array}{*{20}l} \widehat{\text{var}}\!\left({\bar{\mathbb{V}}}\right) &= \frac{1}{m_{\mathrm{v}}(m_{\mathrm{v}}-1)} \sum_{s=1}^{n} \sum_{i=1}^{{m_{\mathrm{v}_{s}}}} \left(\mathbb{V}_{i} - \bar{\mathbb{V}} \right)^{2} \end{array} $$
(7)

and

$$\begin{array}{*{20}l} \widehat{\text{var}}\!\left({\hat{B}_{{\mathrm{c}}}}\right) &= \frac{1}{n(n-1)} \sum_{s=1}^{n}\left(\hat{B}_{{\mathrm{c}}_{s}} - \hat{B}_{{\mathrm{c}}}\right)^{2} \notag \\ &= \frac{\widehat{\text{var}}\!\left({\hat{B}_{\mathrm{c}_{s}}}\right)}{n} \end{array} $$
(8)

where \(\hat {B}_{{\mathrm {c}}_{s}} = m_{\mathrm{c}_{s}}\mathcal {F}_{{\mathrm {c}}}\) is the basal area per hectare at point s and \(m_{\mathrm{c}_{s}}\) is the number of count trees at point s with the small basal area factor \(\text{BAF}_{\mathrm{c}}\).
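The variance components in Eqs. 7 and 8 and their combination in Goodman's formula (Eq. 6) can be sketched as follows. This is an illustration under stated assumptions: the function names and the small data set are hypothetical, and A = 1 so that the mean and total basal area coincide.

```python
def var_mean_vbar(vbars_by_point):
    """Eq. 7: variance of the mean VBAR, pooling trees over all points."""
    vbars = [v for point in vbars_by_point for v in point]
    m_v = len(vbars)
    mean = sum(vbars) / m_v
    return sum((v - mean) ** 2 for v in vbars) / (m_v * (m_v - 1))


def var_mean_ba(counts, baf_c):
    """Eq. 8: variance of the mean count basal area across sample points."""
    n = len(counts)
    ba = [baf_c * m for m in counts]  # per-point basal area estimates
    mean = sum(ba) / n
    return sum((b - mean) ** 2 for b in ba) / (n * (n - 1))


def goodman_variance(mean_vbar, mean_ba, var_vbar, var_ba):
    """Eq. 6: Goodman's product variance, covariance assumed negligible."""
    return mean_vbar**2 * var_ba + mean_ba**2 * var_vbar - var_vbar * var_ba


# Hypothetical data: four points, small BAF of 4.
counts = [6, 5, 7, 6]
vbars_by_point = [[9.0, 12.0], [10.0], [8.0, 11.0], []]
var_vbar = var_mean_vbar(vbars_by_point)   # 10 / (5 * 4) = 0.5
var_ba = var_mean_ba(counts, baf_c=4.0)    # 32 / (4 * 3)
var_g = goodman_variance(10.0, 24.0, var_vbar, var_ba)
```

With these inputs the third (negative) term is about 1.33 against a total of roughly 553, which illustrates why it is usually dominated by the first two terms.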

Equivalent variance expressions are found in Kershaw et al. (2016, p. 380). Standard error estimates are:

$$\begin{array}{*{20}l} \widehat{\text{se}}\!\left(\bar{\mathbb{V}}\right) &= \sqrt{\widehat{\text{var}}\!\left({\bar{\mathbb{V}}} \right) } \end{array} $$
(9)

and

$$\begin{array}{*{20}l} \widehat{\text{se}}\!\left(\hat{B}_{{\mathrm{c}}}\right) &= \sqrt{\widehat{\text{var}}\!\left({\hat{B}_{{\mathrm{c}}}} \right)} \end{array} $$
(10)

A simplified estimator of \(\text {var}\!\left ({{\hat {V}_{\mathcal {B}}}}\right)\) can be obtained by using the formula of Bruce (1961). Written in percent standard error form, the equation of Bruce (1961) gives the following estimator:

$$ \widehat{\mathrm{se\%}}\!\left({{\hat{V}_{\mathcal{B}}}}\right) = \sqrt{\widehat{\mathrm{se\%}}\!\left({\bar{\mathbb{V}}}\right)^{2} + \widehat{\mathrm{se\%}}\!\left({\hat{B}_{{\mathrm{c}}}}\right)^{2}} $$
(11)

As indicated by Gove et al. (2020), Marshall et al. (2004) and Gregoire and Valentine (2008, p. 259), Bell and Alexander (1957) were the first to present the version of the product variance for standard error computation presented above. Gove et al. (2020) discussed the historical background of this formula and showed how it can be derived using the Delta method (Ver Hoef 2012), which is based on a Taylor series approximation and has often been used to approximate the variance of a function of one or more random variables (Kendall and Stuart 1977, p. 247). It has been noted by Gove et al. (2020), Gregoire and Valentine (2008, p. 259), Marshall et al. (2004) and Iles (2012) that there is close agreement between the variance derived from Eq. 11 and the variance derived from Goodman’s formula in Eq. 6, because the third term in the latter equation is typically small and dominated by the other two terms.
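The close agreement between Eq. 11 and Eq. 6 can be checked numerically. The sketch below uses hypothetical means and variances (a mean VBAR of 10 with variance 0.5, and a mean basal area of 24 with variance 32/12); the function names are illustrative only.

```python
import math


def bruce_se_percent(mean_vbar, var_vbar, mean_ba, var_ba):
    """Eq. 11: percent standard errors combined in quadrature."""
    se_pct_vbar = 100.0 * math.sqrt(var_vbar) / mean_vbar
    se_pct_ba = 100.0 * math.sqrt(var_ba) / mean_ba
    return math.sqrt(se_pct_vbar**2 + se_pct_ba**2)


def goodman_se_percent(mean_vbar, var_vbar, mean_ba, var_ba):
    """Percent standard error implied by Goodman's formula (Eq. 6)."""
    var_g = mean_vbar**2 * var_ba + mean_ba**2 * var_vbar - var_vbar * var_ba
    return 100.0 * math.sqrt(var_g) / (mean_vbar * mean_ba)


# Hypothetical inputs, not taken from the simulations in this article.
mean_vbar, var_vbar = 10.0, 0.5
mean_ba, var_ba = 24.0, 32.0 / 12.0
bruce = bruce_se_percent(mean_vbar, var_vbar, mean_ba, var_ba)
goodman = goodman_se_percent(mean_vbar, var_vbar, mean_ba, var_ba)
print(round(bruce, 4), round(goodman, 4))
```

The two values differ only through the product of the two variances, which Goodman's formula subtracts and Bruce's formula omits, so Bruce's value is always slightly larger.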

Gregoire and Valentine (2008) (equation 8.33) have noted that the big BAF volume per hectare estimator can be formulated as follows:

$$ {\hat{V}_{\mathcal{B}}}=\hat{B}_{{\mathrm{c}}}\left(\frac{\hat{V}_{\mathrm{v}}}{\hat{B}_{\mathrm{v}}}\right) $$
(12)

This formulation of the Big BAF estimator is based on three sample means. Gregoire and Valentine (2008) discuss alternative estimators for big BAF sampling including estimators based on Bruce’s traditional formula (Bell and Alexander 1957) and the Goodman (1962) formula for the variance of products of random variables.

Eq. 7 (the same as equation 6 of Gove et al. (2020)) computes the variance of the mean volume to basal area ratio as a mean of ratios. However, the number of individual tree ratios \(m_{\mathrm{v}}\) is a random variable. In the classic formulations of the mean of ratios estimator in the context of design-based survey sampling (Schreuder et al. 1993, p. 89), the number of sample ratios is fixed rather than random. As recognized by Gregoire and Valentine (2008, p. 258–259), Eq. 12 provides the opportunity to formulate the average volume to basal area ratio as a ratio of means, because \(\hat {V}_{\mathrm {v}}\) is the mean volume per sample point and \(\hat {B}_{\mathrm {v}}\) is the mean basal area per sample point when the large basal area factor \(\mathcal {F}_{\mathrm {v}}\) is used for tree selection:

$$ \bar{\mathbb{V}}=\hat{R}=\frac{\hat{V}_{\mathrm{v}}}{\hat{B}_{\mathrm{v}}} $$
(13)

A classical estimate for the variance of this estimated ratio according to equation 6.13 in Cochran (1977, p. 155) is:

$$\begin{array}{*{20}l} \widehat{\text{var}}_{\mathrm{R}}\!\left({\bar{\mathbb{V}}}\right) &= \frac{1}{\hat{B}^{2}_{\mathrm{v}}} \biggl(\widehat{\text{var}}\!\left({\hat{V}_{\mathrm{v}}}\right) + \bar{\mathbb{V}}^{2} \widehat{\text{var}}\!\left({\hat{B}_{\mathrm{v}}}\right) \biggr. \notag\\ & \quad{}- \biggl. 2\bar{\mathbb{V}}\widehat{\text{cov}}\!\left({\hat{B}_{\mathrm{v}}}, {\hat{V}_{\mathrm{v}}}\right) \biggr) \end{array} $$
(14)

As indicated by Sukhatme et al. (1984, p. 99), this variance estimator is algebraically equivalent to the estimator suggested by Gregoire and Valentine (2008, p. 259). The equation above could then be used as an alternative estimate of the variance of the average VBAR in Goodman’s variance formula (6), in place of the more traditional formula (7). This formula is thus equivalent to equation (12) in Gove et al. (2020) and was included in the simulations presented there. Those simulations are replicated here in order to compare them to the results from a new equation described below and termed the point-wise Delta method.
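A minimal sketch of the ratio-of-means variance in Eq. 14, computed from per-point big BAF estimates, is given below. The helper names and the four-point data set are hypothetical; per-point volumes and basal areas are built from the large BAF as \(\mathcal{F}_{\mathrm{v}}\) times the sum of VBARs and \(\mathcal{F}_{\mathrm{v}}\) times the tree count, respectively.

```python
def mean_cov(x, y):
    """Estimated covariance between two sample means (sample covariance / n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * (n - 1))


def ratio_variance(v_pts, b_pts):
    """Eq. 14: variance of the ratio of mean volume to mean basal area."""
    n = len(v_pts)
    v_bar, b_bar = sum(v_pts) / n, sum(b_pts) / n
    r = v_bar / b_bar  # Eq. 13: mean VBAR expressed as a ratio of means
    return (
        mean_cov(v_pts, v_pts)
        + r**2 * mean_cov(b_pts, b_pts)
        - 2.0 * r * mean_cov(b_pts, v_pts)
    ) / b_bar**2


# Hypothetical data: VBARs of the big BAF trees at four points, large BAF of 20.
baf_v = 20.0
vbars_by_point = [[9.0, 12.0], [10.0], [8.0, 11.0], []]
v_pts = [baf_v * sum(p) for p in vbars_by_point]  # per-point volume estimates
b_pts = [baf_v * len(p) for p in vbars_by_point]  # per-point basal area estimates
var_r = ratio_variance(v_pts, b_pts)
```

Unlike Eq. 7, this version works only with point-level quantities, so the random number of trees per point enters through the per-point totals rather than through the divisor.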

Methods

Bias in the big BAF estimator

The big BAF estimator has been described above and in Gregoire and Valentine (2008) as based on three sample means. Two of the sample means, \(\hat {B}_{{\mathrm {c}}}\) and \(\hat {B}_{\mathrm {v}}\), provide design-based estimates of basal area per hectare, while the third, \(\hat {V}_{\mathrm {v}}\), provides a design-based estimate of volume per hectare, albeit typically with a high variance. Their combination forms the big BAF estimator, which has a much lower variance but is not design-unbiased. Here we provide equations giving a simple approximation for the bias as well as an exact upper bound to the bias associated with big BAF sampling.

Approximate bias

In Appendix Eq. A.11 we use a bias approximation formula from Seber (1982, p. 7) to derive the following approximation for the bias of the estimator \({\hat {V}_{\mathcal {B}}}\) in big BAF sampling:

$$\begin{array}{*{20}l} Bias &= \frac{V}{n} \left(\frac{\text{var}\!\left({\hat{B}_{\mathrm{v}_{s}}}\right)}{B^{2}} - \frac{\text{cov}\!\left({\hat{B}_{{\mathrm{c}}_{s}}}, {\hat{B}_{\mathrm{v}_{s}}}\right)}{B^{2}} \right. \notag\\ &\mspace{16.0mu}{}-\left.\frac{\text{cov}\!\left({\hat{V}_{\mathrm{v}_{s}}}, {\hat{B}_{\mathrm{v}_{s}}}\right)}{VB} + \frac{\text{cov}\!\left({\hat{V}_{\mathrm{v}_{s}}}, {\hat{B}_{{\mathrm{c}}_{s}}}\right)}{VB} \right) \end{array} $$
(15)

where \(\hat {B}_{\mathrm {v}_{s}} = m_{\mathrm{v}_{s}}\mathcal {F}_{\mathrm {v}}\) and \(\hat {V}_{\mathrm {v}_{s}} = \mathcal {F}_{\mathrm {v}} \sum _{i=1}^{m_{\mathrm {v}_{s}}}\mathbb {V}_{i}\). Note that all the quantities in the bias expression above, except the sample size n, are population constants that do not change with sample size. Thus as n goes to infinity the above expression for bias goes to zero. Bias approaching zero with increasing sample size on the order of \(\frac {1}{n}\) is similar to the behavior of the standard ratio estimator according to Cochran (1977, p. 160). Note that if the difference between the large basal area factor \(\text{BAF}_{\mathrm{v}}\) and the small basal area factor \(\text{BAF}_{\mathrm{c}}\) is small, the covariance between \(\hat {B}_{{\mathrm {c}}_{s}}\) and \(\hat {B}_{\mathrm {v}_{s}}\) approaches the variance of \(\hat {B}_{\mathrm {v}_{s}}\), so that the first two terms approach cancellation, and similarly for the last two terms. The bias will therefore also be lessened as the difference between the basal area factors \(\text{BAF}_{\mathrm{c}}\) and \(\text{BAF}_{\mathrm{v}}\) becomes smaller. For a given sample size the bias will also be smaller for forests with high levels of basal area B than for forests with low levels of basal area. This bias expression is very similar to the bias that would be obtained from equation 11 of Palley and Horwitz (1961) for the Bell and Alexander (1957) estimator, which can also be expressed as the ratio of two HPS sample means divided by a third sample mean. An important difference is that there are two point-wise sample sizes in the Bell and Alexander (1957) estimator, one being a point-wise subsample. Therefore some of the variances and covariances for the Palley and Horwitz (1961) bias formula and variance estimator of the Bell and Alexander (1957) volume estimator are based on the smaller subsample size while others are based on the total sample size.
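The behavior described for Eq. 15 can be illustrated with a short sketch. The population values below are hypothetical placeholders, not values from the simulated stands, and the function and argument names are assumptions made for the example.

```python
def approx_bias(n, V, B, var_bv, cov_bc_bv, cov_vv_bv, cov_vv_bc):
    """Eq. 15: approximate bias of the big BAF estimator for n sample points."""
    return (V / n) * (
        var_bv / B**2
        - cov_bc_bv / B**2
        - cov_vv_bv / (V * B)
        + cov_vv_bc / (V * B)
    )


# Hypothetical per-point population variances and covariances.
pop = dict(V=250.0, B=25.0, var_bv=360.0, cov_bc_bv=30.0,
           cov_vv_bv=3600.0, cov_vv_bc=250.0)

bias_10 = approx_bias(10, **pop)
bias_100 = approx_bias(100, **pop)   # ten times the points, one tenth the bias

# If both gauges used the same BAF, the paired terms would cancel exactly.
bias_equal_baf = approx_bias(10, V=250.0, B=25.0, var_bv=360.0,
                             cov_bc_bv=360.0, cov_vv_bv=3600.0, cov_vv_bc=3600.0)
```

The first comparison shows the \(\frac{1}{n}\) rate, and the last call shows the limiting case in which the two basal area factors coincide.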

Exact bias

In the Appendix an expression for the exact bias in the big BAF sampling estimator \({\hat {V}_{\mathcal {B}}}\), Eq. (A.15), is derived based on methods used by Hartley and Ross (1954) to find the exact bias of the standard ratio estimator (also see Cochran (1977, p. 162)):

$$ Bias = \left(\mathrm{E}\!\left[ {{\hat{V}_{\mathcal{B}}}}\right] - V \right) = \frac{\text{cov}\!\left({\hat{B}_{{\mathrm{c}}}}, {\hat{V}_{\mathrm{v}}}\right) - \text{cov}\!\left({{\hat{V}_{\mathcal{B}}}}, {\hat{B}_{\mathrm{v}}}\right)}{B} $$
(16)

This formula also seems to indicate that the bias will tend to be smaller in stands having higher basal area. Again Eq. A.21 was derived in the Appendix following the methods of Hartley and Ross (1954) resulting in the following upper bound on the absolute relative bias in the big BAF estimator (also see Cochran (1977, p. 162)):

$$ \frac{\left|Bias\right|}{\sqrt{\text{var}\!\left({{\hat{V}_{\mathcal{B}}}}\right)}} \le \frac{1}{\sqrt{n}} \frac{\sqrt{\hat{B}_{\mathrm{v}_{s}}}}{B} $$
(17)

This formula indicates that the bias relative to the standard error of the big BAF estimator approaches zero as sample size n becomes large, on the order of \(\frac {1}{\sqrt {n}}\). This is also the case for the standard ratio estimator according to Cochran (1977, p. 160).

The Delta method for big BAF variance based on three sample means

Previous approaches to variance estimation for big BAF sampling view the estimator as the product of two random variables, the count basal area per hectare and the mean volume to basal area ratio. As indicated above, these approaches have assumed that the covariance between count basal area per hectare and the mean VBAR is negligible. If we do not wish to make that assumption, an alternative is to use the Delta method (Kendall and Stuart 1977, p. 247) to approximate the variance of the big BAF estimator (12) in the form presented by Gregoire and Valentine (2008, equation 8.33), indicated above as a function of three sample means. On the basis of a Taylor series, the Delta method approximates the variance of a function of estimators of parameters \(g({\hat {\boldsymbol {\theta }}})\) which estimates \(g(\boldsymbol{\theta})\), where \(\boldsymbol{\theta} = (\theta_{1}, \theta_{2}, \ldots, \theta_{n})\). Since the population parameters are generally unknown, the unbiased estimators \(\hat {\boldsymbol {\theta }}\), where \(\mathrm {E}\!\left [ {\hat {\theta _{i}}} \right ] = \theta _{i} \), are substituted here in the formula for the Delta method presented by Kendall and Stuart (1977, p. 247), viz.,

$$\begin{array}{*{20}l} \text{var}\!\left({g(\hat{\boldsymbol{\theta}})}\right) & \approx \sum_{i=1}^{n} \text{var}\!\left({\hat{\theta}_{i}}\right) g_{i}^{\prime}\!\left({\hat{\boldsymbol{\theta}}} \right)^{2} \notag \\ &\mspace{-2mu}{}+ 2\mathop{\sum\sum}_{i< j} \text{cov}\!\left({\hat{\theta}_{i}}, {\hat{\theta}_{j}} \right) g_{i}^{\prime}\!\left({\hat{\boldsymbol{\theta}}}\right) g_{j}^{\prime}\!\left({\hat{\boldsymbol{\theta}}} \right) \end{array} $$
(18)

or, assuming independence…

$$\begin{array}{*{20}l} &\approx \sum_{i=1}^{n} \text{var}\!\left({\hat{\theta}_{i}} \right) g_{i}^{\prime}\!\left({\hat{\boldsymbol{\theta}}}\right)^{2} \end{array} $$
(19)

In addition in typical applications it is necessary to estimate the variance and covariance terms. In this section we will assume without loss of generality that A=1. Let us define the function g in the formula for the Delta method with \(\hat {\theta }_{1} = \hat {B}_{{\mathrm {c}}}, \hat {\theta }_{2} = \hat {V}_{\mathrm {v}}\), and \(\hat {\theta }_{3} = \hat {B}_{\mathrm {v}}\) as follows:

$$ g\left(\hat{B}_{{\mathrm{c}}},\hat{V}_{\mathrm{v}},\hat{B}_{\mathrm{v}}\right)=\hat{B}_{{\mathrm{c}}}\left(\frac{\hat{V}_{\mathrm{v}}}{\hat{B}_{\mathrm{v}}}\right) $$
(20)

The Delta method requires the following three partial derivatives:

$$ \frac{\partial g}{\partial\hat{B}_{{\mathrm{c}}}}=\left(\frac{\hat{V}_{\mathrm{v}}}{\hat{B}_{\mathrm{v}}}\right) $$
(21)
$$ \frac{\partial g}{\partial\hat{V}_{\mathrm{v}}} = \left(\frac{\hat{B}_{{\mathrm{c}}}}{\hat{B}_{\mathrm{v}}}\right) $$
(22)
$$ \frac{\partial g}{\partial\hat{B}_{\mathrm{v}}} = -\left(\frac{\hat{V}_{\mathrm{v}} \hat{B}_{{\mathrm{c}}}}{\hat{B}_{\mathrm{v}}^{2}}\right) $$
(23)
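The three partial derivatives in Eqs. 21–23 can be checked against central finite differences. The sketch below is only a numerical sanity check; the evaluation point and the helper names are hypothetical.

```python
def g(b_c, v_v, b_v):
    """Eq. 20: big BAF estimator as a function of three sample means."""
    return b_c * v_v / b_v


def analytic_gradient(b_c, v_v, b_v):
    """Eqs. 21-23: partial derivatives with respect to B_c, V_v and B_v."""
    return (v_v / b_v, b_c / b_v, -v_v * b_c / b_v**2)


def numeric_gradient(f, point, h=1e-6):
    """Central finite-difference approximation of the gradient of f at point."""
    grads = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        grads.append((f(*up) - f(*down)) / (2.0 * h))
    return tuple(grads)


point = (24.0, 250.0, 25.0)           # hypothetical (B_c, V_v, B_v)
analytic = analytic_gradient(*point)  # (10.0, 0.96, -9.6)
numeric = numeric_gradient(g, point)
```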

Applying the Delta method and substituting estimates for variances, covariances and means we then have:

$$\begin{array}{*{20}l} \widehat{\text{var}}_{\delta_{1}}\!\left({{\hat{V}_{\mathcal{B}}}}\right) &= \left(\frac{\hat{V}_{\mathrm{v}}}{\hat{B}_{\mathrm{v}}}\right)^{2} \widehat{\text{var}}\!\left({\hat{B}_{{\mathrm{c}}}} \right)\notag\\ &\quad{}+\left(\frac{\hat{B}_{{\mathrm{c}}}}{\hat{B}_{\mathrm{v}}}\right)^{2} \widehat{\text{var}}\!\left({\hat{V}_{\mathrm{v}}} \right)\notag\\ &\quad{}+\left(\frac{\hat{V}_{\mathrm{v}}\hat{B}_{{\mathrm{c}}}}{\hat{B}_{\mathrm{v}}^{2}}\right)^{2} \widehat{\text{var}}\!\left({\hat{B}_{\mathrm{v}}}\right) \notag\\ &\quad{}+ 2\left(\frac{\hat{V}_{\mathrm{v}}}{\hat{B}_{\mathrm{v}}}\right)\left(\frac{\hat{B}_{{\mathrm{c}}}}{\hat{B}_{\mathrm{v}}}\right) \widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}}}, {\hat{V}_{\mathrm{v}}} \right) \notag\\ &\quad{}- 2\left(\frac{\hat{V}_{\mathrm{v}} \hat{B}_{{\mathrm{c}}}}{\hat{B}_{\mathrm{v}}^{2}}\right) \left(\frac{\hat{V}_{\mathrm{v}}}{\hat{B}_{\mathrm{v}}}\right) \widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}}}, {\hat{B}_{\mathrm{v}}} \right) \notag \\ &\quad{}- 2\left(\frac{\hat{V}_{\mathrm{v}} \hat{B}_{{\mathrm{c}}}}{\hat{B}_{\mathrm{v}}^{2}}\right) \left(\frac{\hat{B}_{{\mathrm{c}}}}{\hat{B}_{\mathrm{v}}}\right) \widehat{\text{cov}}\!\left({\hat{V}_{\mathrm{v}}}, {\hat{B}_{\mathrm{v}}} \right) \end{array} $$
(24)

The estimated variance of the volume per unit area based on the large BAF angle gauge alone is

$$ \widehat{\text{var}}\!\left({\hat{V}_{\mathrm{v}}}\right) = \frac{\sum_{s=1}^{n}(\hat{V}_{\mathrm{v}_{s}}-\hat{V}_{\mathrm{v}})^{2}}{n(n-1)} = \frac{\widehat{\text{var}}\!\left({\hat{V}_{\mathrm{v}_{s}}}\right)}{n} $$
(25)

where

$$ \hat{V}_{\mathrm{v}_{s}} = \mathcal{F}_{\mathrm{v}} \sum_{i=1}^{m_{\mathrm{v}_{s}}}\mathbb{V}_{i} $$
(26)

which is the total volume at sample point s and

$$ \hat{V}_{\mathrm{v}}=\frac{\sum_{s=1}^{n} \hat{V}_{\mathrm{v}_{s}}}{n} $$
(27)

The estimated variance for the basal area per hectare based on the large angle gauge \(\mathcal {F}_{\mathrm {v}}\) is

$$ \widehat{\text{var}}\!\left({\hat{B}_{\mathrm{v}}}\right) =\frac{\sum^{n}_{s=1}(\hat{B}_{\mathrm{v}_{s}} - \hat{B}_{\mathrm{v}})^{2}}{n(n-1)} = \frac{\widehat{\text{var}}\!\left({{\hat{B}_{\mathrm{v}_{s}}}}\right)}{n} $$
(28)

where \(\hat {B}_{\mathrm {v}_{s}} = m_{\mathrm{v}_{s}}\mathcal {F}_{\mathrm {v}}\) is the basal area per hectare at point s with the large basal area factor \(\text{BAF}_{\mathrm{v}}\). The estimated variance \(\widehat {\text {var}}\!\left ({\hat {B}_{{\mathrm {c}}}}\right)\) for the basal area per hectare based on the small angle gauge \(\mathcal {F}_{{\mathrm {c}}}\) is given by Eq. 8.

Now since Eq. 24 utilizes covariance terms, we present the computational formulas for these. Recall the relationship between the sample covariance and the estimated covariance between sample means based on n independent samples is:

$$ \text{cov}\!\left({\bar{X}}, {\bar{Y}}\right) =\frac{\text{cov}\!\left(X, Y\right) }{n} $$
(29)

Using this relationship the estimated covariance between \(\hat {B}_{{\mathrm {c}}}\) and \(\hat {V}_{\mathrm {v}}\) is:

$$\begin{array}{*{20}l} \widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}}}, {\hat{V}_{\mathrm{v}}}\right) &= \frac{\sum^{n}_{s=1}(\hat{B}_{{\mathrm{c}}_{s}} - \hat{B}_{{\mathrm{c}}}) (\hat{V}_{\mathrm{v}_{s}}- \hat{V}_{\mathrm{v}})}{n(n-1)} \notag \\ &= \frac{\widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}_{s}}}, {\hat{V}_{\mathrm{v}_{s}}}\right)}{n} \end{array} $$
(30)

the estimated covariance between \(\hat {B}_{\mathrm {v}}\) and \(\hat {V}_{\mathrm {v}}\) is:

$$\begin{array}{*{20}l} \widehat{\text{cov}}\!\left({\hat{B}_{\mathrm{v}}}, {\hat{V}_{\mathrm{v}}}\right) &= \frac{\sum^{n}_{s=1}(\hat{B}_{\mathrm{v}_{s}}-\hat{B}_{\mathrm{v}})(\hat{V}_{\mathrm{v}_{s}} - \hat{V}_{\mathrm{v}})}{n(n-1)} \notag \\ &= \frac{\widehat{\text{cov}}\!\left({\hat{B}_{\mathrm{v}_{s}}}, {\hat{V}_{\mathrm{v}_{s}}}\right) }{n} \end{array} $$
(31)

and the estimated covariance between \(\hat {B}_{\mathrm {v}}\) and \(\hat {B}_{{\mathrm {c}}}\) is:

$$\begin{array}{*{20}l} \widehat{\text{cov}}\!\left({\hat{B}_{\mathrm{v}}}, {\hat{B}_{{\mathrm{c}}}}\right) &= \frac{{\sum^{n}_{s=1}(\hat{B}_{\mathrm{v}_{s}}}-\hat{B}_{\mathrm{v}})(\hat{B}_{{\mathrm{c}}_{s}}-\hat{B}_{{\mathrm{c}}})}{n(n-1)} \notag \\ &= \frac{\widehat{\text{cov}}\!\left({\hat{B}_{\mathrm{v}_{s}}}, {\hat{B}_{{\mathrm{c}}_{s}}}\right)}{n} \end{array} $$
(32)
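Putting Eq. 24 together with the variance and covariance estimators of Eqs. 8, 25, 28 and 30–32 gives a point-wise computation that can be sketched as below. The data are hypothetical, A = 1 is assumed, and the double loop over the three means is simply a compact way of writing the six terms of Eq. 24.

```python
def mean(x):
    return sum(x) / len(x)


def mean_cov(x, y):
    """Eqs. 30-32 (and Eqs. 8, 25, 28 when x is y): covariance of sample means."""
    n = len(x)
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * (n - 1))


def delta_variance(bc_pts, vv_pts, bv_pts):
    """Eq. 24: Delta-method variance of the big BAF estimator (A = 1)."""
    b_c, v_v, b_v = mean(bc_pts), mean(vv_pts), mean(bv_pts)
    grads = (v_v / b_v, b_c / b_v, -v_v * b_c / b_v**2)  # Eqs. 21-23
    series = (bc_pts, vv_pts, bv_pts)
    # Summing over all (i, j) pairs yields the variance terms once and each
    # covariance term twice, as in Eq. 18.
    return sum(
        grads[i] * grads[j] * mean_cov(series[i], series[j])
        for i in range(3)
        for j in range(3)
    )


# Hypothetical data: four points, small BAF 4 and large BAF 20.
counts = [6, 5, 7, 6]
vbars_by_point = [[9.0, 12.0], [10.0], [8.0, 11.0], []]
bc_pts = [4.0 * m for m in counts]
vv_pts = [20.0 * sum(p) for p in vbars_by_point]
bv_pts = [20.0 * len(p) for p in vbars_by_point]
var_d1 = delta_variance(bc_pts, vv_pts, bv_pts)
```

All six variance and covariance estimates use the same n sample points, so no subsample bookkeeping is needed.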

As is shown in Supplementary Materials equations S.3–S.7, variance estimator (24) can also be derived as a special case of an estimator presented by Hansen et al. (1953, p. 512–514) for the variance of a ratio between the product of k random variables and the product of p − k random variables (Wolter 2007, p. 233–234).

The variance equation can be simplified by noting that the two basal area estimators \(\hat {B}_{\mathrm {v}}\) and \(\hat {B}_{{\mathrm {c}}}\) have the same expected value B, and that the variance of \(\hat {B}_{{\mathrm {c}}}\) is likely to be smaller because it is based on the smaller \(\text{BAF}_{\mathrm{c}}\), which selects more trees per point. In the original true variance approximation the coefficients multiplied by variances and covariances are functions of parameters which we must estimate when obtaining the approximate variance estimator. This justifies substitution of \(\hat {B}_{{\mathrm {c}}}\) for \(\hat {B}_{\mathrm {v}}\) in the variance formula above because they have the same expectation. Making this substitution and factoring out sample size n, the variance formula can be simplified to:

$$\begin{array}{*{20}l} \widehat{\text{var}}_{\delta_{2}}\!\left({{\hat{V}_{\mathcal{B}}}}\right) &= \frac{\hat{V}_{\mathrm{v}}^{2}}{n} \left(\frac{\widehat{\text{var}}\!\left({\hat{B}_{{\mathrm{c}}_{s}}}\right)}{{\hat{B}_{{\mathrm{c}}}^{2}}} + \frac{\widehat{\text{var}}\!\left({\hat{V}_{\mathrm{v}_{s}}}\right)}{\hat{V}_{\mathrm{v}}^{2}} \right. \notag\\ &\mspace{-4mu}{}+\frac{\widehat{\text{var}}\!\left({\hat{B}_{\mathrm{v}_{s}}}\right)}{\hat{B}_{{\mathrm{c}}}^{2}} + 2\frac{\widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}_{s}}}, {\hat{V}_{\mathrm{v}_{s}}}\right)} {\hat{V}_{\mathrm{v}} \hat{B}_{{\mathrm{c}}}} \notag\\ &\mspace{-4mu}{}-\left. 2\frac{\widehat{\text{cov}}\!\left({\hat{B}_{\mathrm{v}_{s}}}, {\hat{B}_{{\mathrm{c}}_{s}}} \right) }{\hat{B}_{{\mathrm{c}}}^{2}} - 2\frac{\widehat{\text{cov}}\!\left({\hat{V}_{\mathrm{v}_{s}}}, {\hat{B}_{\mathrm{v}_{s}}}\right)}{\hat{V}_{\mathrm{v}} \hat{B}_{{\mathrm{c}}}} \right) \end{array} $$
(33)

We have derived this variance estimation formula under the assumption that \(A = 1\), so the variance estimate for an entire tract of area A can be obtained by multiplying by \(A^{2}\) or, alternatively, by expressing \(\hat {V}_{\mathrm {v}}\) in total tract units rather than per hectare. Note that because we have factored out a quantity of \(\frac {1}{n}\), the estimator above is a function of the sample variances and covariances among sample point HPS estimates.
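A sketch of the simplified estimator follows, obtained by substituting \(\hat{B}_{\mathrm{c}}\) for \(\hat{B}_{\mathrm{v}}\) in the coefficients of Eq. 24, so that the final covariance term is divided by \(\hat{V}_{\mathrm{v}}\hat{B}_{\mathrm{c}}\). The sample variances and covariances here use the per-point divisor n − 1; the data and names are hypothetical.

```python
def sample_cov(x, y):
    """Sample covariance among per-point estimates (divisor n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def delta_variance_simplified(bc_pts, vv_pts, bv_pts):
    """Simplified Delta-method variance with B_c substituted for B_v (A = 1)."""
    n = len(bc_pts)
    b_c = sum(bc_pts) / n
    v_v = sum(vv_pts) / n
    bracket = (
        sample_cov(bc_pts, bc_pts) / b_c**2
        + sample_cov(vv_pts, vv_pts) / v_v**2
        + sample_cov(bv_pts, bv_pts) / b_c**2
        + 2.0 * sample_cov(bc_pts, vv_pts) / (v_v * b_c)
        - 2.0 * sample_cov(bv_pts, bc_pts) / b_c**2
        - 2.0 * sample_cov(vv_pts, bv_pts) / (v_v * b_c)
    )
    return v_v**2 / n * bracket


# Hypothetical per-point estimates for four points (small BAF 4, large BAF 20).
bc_pts = [24.0, 20.0, 28.0, 24.0]
vv_pts = [420.0, 200.0, 380.0, 0.0]
bv_pts = [40.0, 20.0, 40.0, 0.0]
var_d2 = delta_variance_simplified(bc_pts, vv_pts, bv_pts)
tract_variance = 3.17**2 * var_d2  # scaling to a tract of area A = 3.17 by A^2
```

On such a small sample the simplified value differs noticeably from the unsimplified Delta-method value, because the sample means of the two basal area estimates (24 and 25 here) are not yet close.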

The variance estimator above is very similar to equation (12) of Palley and Horwitz (1961) which they obtained for the Bell and Alexander (1957) estimator which was essentially double sampling with a ratio estimator. However an important difference is that the Bell and Alexander (1957) estimator consists of a large sample of points on which basal area counts are made and a subsample of points on which tree volumes are also determined. By contrast for big BAF sampling the volume subsample is made on every point so there is no smaller point-wise sample. Therefore some of the variances and covariances for the Palley and Horwitz (1961) variance estimator of the Bell and Alexander (1957) volume estimator must be determined on the subsample which is smaller than n, but for big BAF sampling all the variances and covariances have the same point-wise sample size of n. A consequence is that for the big BAF variance estimator we cannot further simplify the variance estimator above by utilizing the ratio of large-to-small point-wise sample size as was done by Palley and Horwitz (1961).

Simulation trials

We used two simulated forest populations that were previously employed by Gove et al. (2020) to compare traditional and previously proposed big BAF sampling variance estimators. The sampling simulation program sampSurf (Gove 2012), which was written in R (R Core Team 2021), was used to conduct the simulations. The concept of a “sampling surface” (Williams 2001a, b) was used to construct the sampSurf simulator, in which a raster tract of area A is tessellated into square grid cells. Trees are located on the tract and inclusion zones are established for each tree based on the sampling procedure (horizontal point sampling for these simulations). A sample point is considered to be located in the center of each grid cell. Totals for each grid cell are based on the attributes of trees whose inclusion zones contain the sample point at the grid cell center. The sampling surface is developed based on the total attribute values over all the grid cells. For our simulations square tracts were used with grid cells 1 m² in size.

Nine sets of simulations were conducted using every combination of BAF pairs (\(\mathcal {F}_{\mathrm {v}}\) and \(\mathcal {F}_{{\mathrm {c}}}\)) where \(\mathcal {F}_{{\mathrm {c}}}\ \in \{3, 4, 5\}\) and \(\mathcal {F}_{\mathrm {v}}\ \in \{10, 20, 30\}\) for both forest populations. For each sampling simulation sampling surfaces were developed for total basal area and total volume using every combination of \(\mathcal {F}_{{\mathrm {c}}}\) and \(\mathcal {F}_{\mathrm {v}}\) resulting in 36 simulation surfaces. A Monte Carlo experiment was conducted for each of the 9 BAF factor combinations in which random samples of n=10,25,50 and 100 were drawn with 1,000 replications. Summary statistics were computed for HPS and big BAF sampling for each sample on each sampling surface. The statistics available for each BAF combination made it possible to compare the big BAF results to an HPS sample using \(\mathcal {F}_{{\mathrm {c}}}\) in which every sample tree was measured for volume (e.g., DBH and height measured).
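The Monte Carlo step can be sketched as follows. This is a hedged Python illustration rather than the code used for the study: the surfaces are synthetic and hypothetical, the helper names are invented, and sample points are assumed to be drawn with replacement from the grid cells. The big BAF estimate is taken in the ratio-of-means form, the mean count basal area multiplied by the ratio of mean volume to mean basal area on the large BAF.

```python
import numpy as np

def big_baf_estimate(bc, bv, vv):
    """Ratio-of-means estimate: mean(Bc) * mean(Vv) / mean(Bv)."""
    return bc.mean() * vv.mean() / bv.mean()

def monte_carlo(bc_surf, bv_surf, vv_surf, n, reps, rng):
    """Return `reps` big BAF estimates, each from n randomly chosen grid cells."""
    bc, bv, vv = bc_surf.ravel(), bv_surf.ravel(), vv_surf.ravel()
    out = np.empty(reps)
    for r in range(reps):
        idx = rng.integers(0, bc.size, size=n)  # assumption: with replacement
        out[r] = big_baf_estimate(bc[idx], bv[idx], vv[idx])
    return out

# Hypothetical surfaces (not the paper's populations), one value per grid cell.
rng = np.random.default_rng(7)
shape = (150, 150)
bc_surf = 4.0 * rng.poisson(6.0, size=shape)           # count BAF of 4
bv_surf = 20.0 * rng.poisson(1.2, size=shape)          # volume BAF of 20
vv_surf = bv_surf * rng.normal(8.0, 0.5, size=shape)   # VBAR near 8 m^3 per m^2

population_value = bc_surf.mean() * vv_surf.mean() / bv_surf.mean()
estimates = monte_carlo(bc_surf, bv_surf, vv_surf, n=25, reps=1000, rng=rng)
print(population_value, estimates.mean(), estimates.std(ddof=1))
```

The standard deviation of the replicated estimates plays the role of the Monte Carlo standard error against which variance estimators can be compared.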

Mixed northern hardwood population

The mixed northern hardwood population is the same one used by Gove et al. (2020). The population is artificially constructed but resembles what could typically be found in a mixed northern hardwoods forest. It is established on a tract having an area of A=3.17 ha and containing 31,684 grid cells. The tract is bounded by an external buffer 18 m wide so that the portion of the tract containing the tree population internal to the buffer has an area of 2 ha. A population of m=667 trees with a total basal area of 48.4 m² was established within the tract boundaries. This is approximately equivalent to 333 trees·ha⁻¹ and a basal area of 24.2 m²·ha⁻¹ with a stand quadratic mean diameter of \(\bar {D}_{q} = 30.3\) cm. According to northern hardwoods stocking guides by Leak et al. (2014) the stand would be in fully stocked condition. A three-parameter Weibull distribution (Bailey and Dell 1973) was used to assign tree DBHs, with location, shape and scale parameters respectively being α=10 cm, γ = 2 and ζ = 30 cm. Total heights for each tree in the simulated northern hardwoods stand were assigned using the all-species DBH-height equation by Fast and Ducey (2011) for northern hardwoods in New Hampshire converted to metric units. A normal random error term with mean zero and standard deviation 2.5 m was added to each height prediction. A spatial inhibition process (Venables and Ripley 2002, p. 434) with an inhibition distance of 3 m was used to assign trees to spatial locations within the simulated northern hardwoods forest tract. The method of Masuyama (1953) for boundary overlap correction was used in which tree inclusion zones were allowed to overlap into the buffer region (Gregoire and Valentine 2008, p. 224). Because random sample points can fall anywhere in the tract, which includes the buffer region, each tree has a complete inclusion zone.
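A spatial inhibition process of the kind mentioned above can be illustrated with simple sequential inhibition, where candidate locations are drawn uniformly and rejected if they fall within the inhibition distance of an already accepted tree. The Python sketch below is an assumption about the general technique, not a reproduction of the routine used in the study; the function name and the tract dimensions are hypothetical.

```python
import numpy as np

def sequential_inhibition(n_trees, side_m, min_dist_m, rng, max_tries=100_000):
    """Place points uniformly, rejecting any closer than min_dist_m to an accepted one."""
    pts = []
    tries = 0
    while len(pts) < n_trees:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all trees; density too high")
        cand = rng.uniform(0.0, side_m, size=2)
        # Accept only if the candidate respects the inhibition distance.
        if all(np.hypot(*(cand - p)) >= min_dist_m for p in pts):
            pts.append(cand)
    return np.array(pts)

# Hypothetical example: 200 trees on a 100 m square with a 3 m inhibition distance.
rng = np.random.default_rng(42)
xy = sequential_inhibition(n_trees=200, side_m=100.0, min_dist_m=3.0, rng=rng)
print(xy.shape)
```

The resulting pattern is more regular than complete spatial randomness because no two stems can be closer than the inhibition distance.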

The following taper function is used within the sampSurf simulation (Van Deusen 1990):

$$ d(h) = D_{u} + (D_{b} - D_{u})\left(\frac{H-h}{H}\right)^{\frac{2}{r}} $$
(34)

where d(h) is the stem diameter at height h, Du is the top diameter, Db is the tree stem butt diameter, H is total tree height and 0≤h≤H. The value of the taper parameter r was randomly selected for each tree from the range r∈[1.5,3]. With the taper function above a neiloidal form results if 0<r<2, a cone if r=2 and a paraboloidal form if r>2. The taper function for each tree was used to compute individual tree volume according to the procedures of Gove (2011a, p. 8). There was a correlation coefficient \(\rho (\mathbb {V}, b) = 0.62\) between individual tree VBAR and basal area in the simulated northern hardwoods population. Figure S.2 in the Supplementary Material for Gove et al. (2020) displays histograms of the DBH and height distributions for the simulated northern hardwoods forests.
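As a rough check on how the taper parameter shapes the stem, the taper function in Eq. 34 can be integrated numerically. The Python sketch below assumes that stem volume is the integral of the cross-sectional area π d(h)²/4 from the butt to the top, with diameters in metres; the function names are hypothetical and the exact procedure of Gove (2011a) is not reproduced here.

```python
import numpy as np

def taper_diameter(h, H, Db, Du, r):
    """Stem diameter at height h from the taper function (Eq. 34)."""
    return Du + (Db - Du) * ((H - h) / H) ** (2.0 / r)

def stem_volume(H, Db_m, Du_m, r, steps=10_000):
    """Volume (m^3) by trapezoidal integration of cross-sectional area."""
    h = np.linspace(0.0, H, steps + 1)
    area = np.pi / 4.0 * taper_diameter(h, H, Db_m, Du_m, r) ** 2
    return float(np.sum((area[:-1] + area[1:]) / 2.0) * (H / steps))

# Hypothetical tree: 20 m tall, 40 cm butt diameter, tapering to a point.
for r in (1.5, 2.0, 3.0):
    print(r, stem_volume(H=20.0, Db_m=0.40, Du_m=0.0, r=r))
```

With a top diameter of zero, r=2 gives the volume of a cone, smaller r gives a more concave (neiloid-like) stem with less volume, and larger r gives a fuller (paraboloid-like) stem with more volume.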

Eastern white pine population

The eastern white pine (Pinus strobus L.) population used by Gove et al. (2020) was also used in this study. Gove et al. (2000) describe data collection for the eastern white pine based on Barr & Stroud FP-12 dendrometry over a 20-year period. These data were obtained from pure even-aged white pine forest stands in southern New Hampshire. Data processing utilized the RDendrometry package (Gove 2011a). The white pine population used for simulations consists of m=316 white pine trees with multiple measurements on some during the period. Trees were located within a 1 ha tract having an 18 m wide buffer and a total area of A=1.85 ha with 18,496 grid cells. The population has a basal area of 47.2 m² and a quadratic mean DBH of \(\bar {D}_{q} = 43.6\) cm. According to the Leak and Lamson (1999) white pine stocking guide, the tract is solidly in the full stocking range. The trees were originally measured in several different stands without location information. To assign trees spatial locations for the simulation stand, a spatial inhibition process having an inhibition distance of 3 m was employed similarly to the northern hardwoods stand discussed above. As with the northern hardwoods stand, Masuyama’s method was used to correct for boundary overlap in point sampling, so that randomly located sample points were permitted to fall into the buffer strip surrounding the 1 ha white pine tract. No taper function was required for the white pine stand because dendrometry measurements were available for upper-stem taper on each tree. As described by Gove (2011b), a cubic spline was fitted to tree dendrometry measurements. Smalian’s formula (Kershaw et al. 2016, p. 241) was used to calculate individual tree volumes. Figure S.5 in the Supplementary Material of Gove et al. (2020) displays histograms of DBH and total height distributions for the white pine population.
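Smalian’s formula takes the volume of a stem section as its length multiplied by the average of the cross-sectional areas at its two ends. A minimal Python sketch follows; the function name and the measurement heights and diameters are hypothetical, and how sections were spaced along the fitted spline in the study is not stated here.

```python
import math

def smalian_volume(heights_m, diameters_cm):
    """Total stem volume (m^3) summed over sections using Smalian's formula."""
    total = 0.0
    for i in range(len(heights_m) - 1):
        length = heights_m[i + 1] - heights_m[i]
        # Cross-sectional areas (m^2) at the lower and upper ends of the section.
        a_low = math.pi * (diameters_cm[i] / 200.0) ** 2
        a_up = math.pi * (diameters_cm[i + 1] / 200.0) ** 2
        total += length * (a_low + a_up) / 2.0
    return total

# Hypothetical upper-stem measurements for one tree.
heights = [0.3, 1.37, 5.0, 10.0, 15.0, 20.0, 24.0]
diameters = [48.0, 43.6, 39.0, 33.5, 26.0, 15.0, 4.0]
print(smalian_volume(heights, diameters))
```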

Results

Big BAF estimator bias

We derived two expressions for the bias in big BAF sampling: Eq. 15, which is an approximation to the bias, and Eq. 16, which is an exact expression of the bias. An indication of the bias in big BAF sampling is shown in Fig. 1, which was prepared using Eq. 15 with variance and covariance values computed from all the lattice points on the sampling surfaces for the northern hardwoods and white pine populations. Use of all the lattice points in these computations should provide a very close approximation to true population values. Only sampling plans with \(\mathcal {F}_{{\mathrm {c}}} = 3\) are presented because results from the other values of BAFc used in this study are very similar and the bias with \(\mathcal {F}_{{\mathrm {c}}} = 3\) is larger than that for the other values of BAFc by a very small amount. Figure 1 shows that the bias is quite small even for n=10, where it is 0.17% for the white pine population. Bias percentages decline steeply with increasing sample sizes and are essentially negligible for all sample sizes equal to or greater than 10 for both the white pine and northern hardwoods populations. As expected bias also declines as BAFv approaches the value of \(\mathcal {F}_{{\mathrm {c}}} = 3\).

Fig. 1

Approximate bias in the big BAF estimator for the northern hardwoods and white pine populations with BAFv = 30 (dotted, ∙), BAFv = 20 (dash, ) and BAFv = 10 (solid, +)

Population sampling surfaces

The results concerning the sampling surfaces for the northern hardwoods and the white pine populations were given by Gove et al. (2020) in Table 1 of that paper. As expected the results for basal area and volume from the sampling surfaces were quite close to the actual population values. The white pine population had higher stocking and volume per hectare than the northern hardwoods population, as would be typical for fully stocked stands in the New England region of the USA. Gove et al. (2020) noted that the northern hardwoods DBH distribution was more positively skewed than the white pine DBH distribution. Trees in the white pine population were generally taller than those in the northern hardwoods population, which was likely the primary reason that the volume per tree in the white pine was considerably greater than that for the northern hardwoods population. Figure S.1 in Supplementary Material shows sampling surfaces for the northern hardwoods population for (a) total basal area with \(\mathcal {F}_{{\mathrm {c}}} = 3\) and (b) total volume with \(\mathcal {F}_{\mathrm {v}} = 30\). A realization of a Monte Carlo sample consisting of n=100 points is also indicated with each point denoted by a red “×”.

Figure S.2 in Supplementary Materials indicates the population correlations ρ for the northern hardwoods population between important variables such as the basal area and volume on the count and volume points and the volume for all 6 combinations of point sampling BAFs used in the simulations. As might be expected the correlation between basal area and volume when using the same BAF factor is close to 1 and fairly constant over variation in the range of BAFv for big BAF sampling. As also might be expected the correlation between two variables when one is sampled on BAFc and the other is sampled on BAFv declines with increasing BAFv and ranges from 0.83 to 0.52. Covariances between these variables are displayed in Figure S.3 in Supplementary Materials.

Figure S.4 in Supplementary Materials illustrates the sampling surfaces for basal area and volume for \(\mathcal {F}_{{\mathrm {c}}} = 3\) and \(\mathcal {F}_{\mathrm {v}} = 30\) for the eastern white pine population. A red “ ×” is indicated on the volume surface illustration for each point in a realization of one Monte Carlo sample of n=100.

Population correlations between important variables for the eastern white pine population are given in Figure S.5. The patterns are similar to those for the northern hardwoods population. However the correlations between pairs of variables, in which one is from the large BAFv and the other is from the small BAFc, are higher. Correlations range from 0.91 to 0.67, declining with increasing values of BAFv. Population covariances for this population are given in Figure S.6.

Monte Carlo simulations

Standard error comparisons

Figure 2 displays the standard error results from Monte Carlo simulations comparing Goodman’s method (Eq. 6), the “traditional” Delta method (Eq. 11), the new point-based Delta method (Eq. 24) and the simplified point-based Delta method (Eq. 33) for the northern hardwoods population. For a total sample size of n=100, the standard errors of the four methods are virtually indistinguishable for all three count BAFs, \(\mathcal {F}_{{\mathrm {c}}}=3, \mathcal {F}_{{\mathrm {c}}}=4\) and \(\mathcal {F}_{{\mathrm {c}}}=5\). As expected standard errors decline for all four variance approximation methods as BAFv declines and more closely approaches the value of BAFc. For reference purposes the standard errors for HPS using all trees selected on the count BAFc are displayed. Figure S.7 in Supplementary Material indicates that this HPS standard error is extremely close to the standard error among big BAF simulation estimates for all values of BAFc and BAFv utilized in this study. Because these were so close, we present only the HPS standard errors for comparison to the results for big BAF sampling in Figs. 2 and 3. It is a remarkable fact that HPS with measurement of all trees for volume determination provides no appreciable reduction in variance in volume estimation compared with big BAF, in which only a much reduced subsample of trees is measured at each point.

Fig. 2

The northern hardwood Monte Carlo standard error simulation results as the average over 1,000 replications for each BAF pair and sample size with the Delta Method (dashed, ), Goodman’s (dot-dashed, +), the point-based Delta method (dotted, \(\triangledown \)) and the simplified point-based Delta method (long-dash, ×). The reference line (solid, ∙) is the average Monte Carlo standard error for the BAFc HPS results

Fig. 3

The white pine Monte Carlo standard error simulation results as the average over 1,000 replications for each BAF pair and sample size with the Delta Method (dashed, ), Goodman’s (dot-dashed, +), the point-based Delta method (dotted, \(\triangledown \)) and the simplified point-based Delta method (long-dash, ×). The reference line (solid, ∙) is the average Monte Carlo standard error for the BAFc HPS results

As the total point-wise sample size n decreases from n=100 to n=10, separations between the standard errors given by the four approximations become more evident, with Goodman’s method being slightly below the Delta method and the point-based Delta method standard error being lower than either of those two approximations, particularly for n=10 and the largest value of BAFv. The simplified point-based Delta method is consistently lower than the other three methods. This makes it closer to the reference line indicated for HPS except for n=10, where the simplified point-based Delta method underestimates compared to the reference line for the two smaller values of BAFv. However it should be noted that even in the latter case the difference between the point-based Delta method and simplified point-based Delta method compared to the traditional Delta method is only in the range of 5% even for n=10 and the largest value of BAFv. In this latter case both the point-based Delta method and the simplified point-based Delta method are within 2% of the reference line for HPS, with the point-based Delta method overestimating and the simplified point-based Delta method underestimating for the two largest values of BAFv. For sample sizes equal to or larger than n=25, which are more likely to be representative of typical big BAF sampling plans, all four variance estimators are consistently within 2% of each other, becoming closer as sample size n becomes larger.

Figure 3 contains the white pine population Monte Carlo simulation standard error results. As was the case for the northern hardwoods population, the traditional Delta method (Eq. 11), Goodman’s method (Eq. 6), the point-based Delta method (Eq. 24) and the simplified point-based Delta method (Eq. 33) are displayed in the figure. However, the maximum difference between the standard errors of the four methods was about 8% for n=10 with differences at n=100 being only about 0.5%. In the case of n=10, both the point-based Delta method and the simplified point-based Delta method were within approximately 2% of the HPS reference line, but the point-based Delta method was an overestimate while the simplified point-based Delta method was an underestimate. Consistently the lowest estimate of standard error was provided by the simplified point-based Delta method, which made it closer to the HPS reference line than the other methods except in the case of n=10, where it was somewhat lower than the HPS reference line. As expected, standard errors for each given level of n and BAFc mostly decline with decreasing levels of BAFv, though there are some slight exceptions in the case of the point-based Delta method for n=10. These trends are similar to those for the northern hardwoods population. However, the standard errors from the northern hardwoods population ranged from about 32% for n=100 to 100% for n=10 (Fig. 2) while the standard errors associated with the white pine population were substantially greater, ranging from approximately 50% for n=100 to 160% for n=10 (Fig. 3).

Confidence interval captures

Figure 4 depicts the confidence interval capture rates on the northern hardwoods population for the 1,000 replications of Monte Carlo simulation for big BAF standard error estimates obtained using the traditional Delta method (11), Goodman’s method (6), the point-based Delta method (24) and the simplified point-based Delta method (33). The results in this figure are based on the percentage of simulation trials in which a 95% confidence interval for total volume from the big BAF trial contains the true mean total volume of the simulated population. Thus a capture rate of 95% would be ideal. For n=100 and n=50 the capture rates for the traditional Delta method, Goodman’s method, the point-based Delta method and the simplified point-based Delta method are very close for all values of BAFc and BAFv, all ranging between 94.1% and 95.4%. In some cases such as n=100 with \(\mathcal {F}_{{\mathrm {c}}} = 3\) and \(\mathcal {F}_{{\mathrm {c}}} = 4\), the big BAF capture rates are even closer to 95% than the capture rate for HPS with the given value of BAFc. For \(n = 25, \mathcal {F}_{{\mathrm {c}}} = 3\) and \(\mathcal {F}_{\mathrm {v}} = 10\) the point-based Delta method and the simplified point-based Delta method have a capture rate slightly lower than the other two big BAF standard error estimators but all capture rates are between 93.5% and 95%. All capture rates were between 93.5% and 95.3% for n=10, with the point-based Delta method and the simplified point-based Delta method being lower than the other two big BAF methods for \(\mathcal {F}_{\mathrm {v}} = 30\) and the simplified point-based Delta method being somewhat lower for all values of BAFv, otherwise the three methods were extremely close. In summary the four big BAF standard error estimates produced confidence interval capture rates that were very similar as might be expected from the fact that they produced very similar variance estimates as indicated by Fig. 2.
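The capture-rate calculation can be sketched as follows. This Python example is a hedged illustration only: it assumes a symmetric normal-theory interval (estimate ± z × standard error), which may differ from the interval used in the study, and it uses a synthetic normal population with the simple standard error of a mean rather than any of the big BAF variance estimators. The function name and all numeric values are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def capture_rate(estimates, std_errors, truth, level=0.95):
    """Share of replications whose confidence interval contains the true value."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    return float(np.mean((lower <= truth) & (truth <= upper)))

# Hypothetical Monte Carlo experiment: 1,000 replications of n = 25 points.
rng = np.random.default_rng(1)
truth, n, reps = 300.0, 25, 1000
samples = rng.normal(truth, 90.0, size=(reps, n))
estimates = samples.mean(axis=1)
std_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)
rate = capture_rate(estimates, std_errors, truth)
print(rate)
```

A standard error estimator that is too small on average narrows the intervals and pushes the capture rate below the nominal level, which is the connection between the standard error and capture-rate comparisons.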

Fig. 4

The northern hardwoods Monte Carlo simulation results for confidence interval capture rates as the average over 1,000 replications for each BAF pair and sample size with the Delta Method (dashed, ), Goodman’s (dot-dashed, +), the point-based Delta method (dotted, \(\triangledown \)) and the simplified point-based Delta method (long-dash, ×). The reference line (solid, ∙) is the Monte Carlo capture rate for the BAFc HPS results

Figure 5 displays the confidence interval capture rates for the white pine population using confidence intervals constructed with standard errors based on the traditional Delta method (Eq. 11), Goodman’s method (Eq. 6), the point-based Delta method (Eq. 24) and the simplified point-based Delta method (Eq. 33). Similarly to the northern hardwoods population, simulations with the white pine population produced confidence interval capture rates between 91.6% and 96% for all combinations of \(\mathcal {F}_{{\mathrm {c}}} = 3, 4\) and 5, \(\mathcal {F}_{\mathrm {v}}\ = 10, 20\) and 30, and sample sizes n=10, 25, 50 and 100. As well, the capture rates for a conventional HPS with BAFc are very similar to the capture rates with the big BAF approaches. For n=100 the four big BAF capture rates were all very close to each other and between 93.7% and 94.8%. It is perhaps surprising that for n=100 the largest deviation from the ideal capture rate of 95% was found in the results for the conventional HPS with BAFc, which was lowest among all 5 methods, ranging from 93.2% to 94.5% for the case of BAFc = 5. For n=100, the capture rates for the point-based Delta method and the simplified point-based Delta method were very slightly lower than those for the other two big BAF methods. For the n=50 sample size, the capture rates for the big BAF methods as well as the conventional HPS with BAFc were all between 94.7% and 96%. Again the capture rates for the point-based Delta method and the simplified point-based Delta method were slightly lower than for the other two big BAF methods, which in this case made them slightly closer to 95%. Capture rates for the conventional HPS with BAFc were extremely close to 95% and slightly lower than the big BAF methods for \(\mathcal {F}_{{\mathrm {c}}} = 3\) and \(\mathcal {F}_{{\mathrm {c}}} = 4\), but for \(\mathcal {F}_{{\mathrm {c}}} = 5\) the HPS capture rates were slightly higher than those for the point-based Delta method and the simplified point-based Delta method and slightly lower than those associated with the other two big BAF methods. Similar results were obtained for n=25, with the point-based Delta method and the simplified point-based Delta method having nearly equal or slightly lower capture rates than the other two big BAF methods, and all capture rates being between 94.7% and 96%. Capture rates for conventional HPS with BAFc were slightly lower than or nearly equal to those of the four big BAF methods for n=25. In the case of n=10, all capture rates ranged between 91.6% and 94.6%, with the point-based Delta method and the simplified point-based Delta method once again trending somewhat lower than the other two big BAF methods.

Fig. 5

The white pine Monte Carlo simulation results for confidence interval capture rates as the average over 1,000 replications for each BAF pair and sample size with the Delta Method (dashed, ), Goodman’s (dot-dashed, +), the point-based Delta method (dotted, \(\triangledown \)) and the simplified point-based Delta method (long-dash, ×). The reference line (solid, ∙) is the Monte Carlo capture rate for the BAFc HPS results

Correlations

Point-wise estimated correlations between estimates of basal area and volume for BAFv, \(\hat {\rho }\!\left ({V_{\mathrm {v}_{s}}}, {B_{\mathrm {v}_{s}}}\right)\), between basal area from BAFc and volume from BAFv, \(\hat {\rho }\!\left ({B_{{\mathrm {c}}_{s}}}, {V_{\mathrm {v}_{s}}}\right)\), and between basal area from BAFc and BAFv, \(\hat {\rho }\!\left ({B_{{\mathrm {c}}_{s}}}, {B_{\mathrm {v}_{s}}}\right)\), are given for the northern hardwoods Monte Carlo simulations in Figure S.8 and the white pine Monte Carlo simulations in Figure S.10 for each combination of n, BAFc and BAFv. Estimated correlations between volume and basal area estimates \(\hat {\rho }\!\left ({V_{\mathrm {v}_{s}}},{B_{\mathrm {v}_{s}}}\right)\) for both the northern hardwoods in Figure S.8 and white pine in Figure S.10 were extremely close to one for all sample sizes and combinations of BAFc and BAFv.

Estimated correlations between basal area obtained from BAFc and BAFv \(\hat {\rho }\!\left ({B_{{\mathrm {c}}_{s}}}, {B_{\mathrm {v}_{s}}}\right)\) and correlations between BAFc basal area and BAFv volume estimates \(\hat {\rho }\!\left ({B_{{\mathrm {c}}_{s}}},{V_{\mathrm {v}_{s}}}\right)\) were very close and, within each sample size, declined with increasing values of BAFv. These estimated correlations are generally somewhat higher for the white pine population for a given value of n, BAFc and BAFv. For the northern hardwoods population, estimated correlations range between 0.83 and 0.52 and decline with increasing BAFv. As indicated, estimated correlations in the white pine simulations tended to be higher than for the northern hardwoods population and ranged between 0.91 and 0.67.

Discussion

Results from inspection of Fig. 1 indicated that the bias in the big BAF estimator is quite low for both the northern hardwoods and the white pine populations used in this study. For larger values of n the bias approaches zero and is especially low for the northern hardwoods population. The great majority of big BAF forest sampling plans in practice would be expected to have 10 or more sample points. Perhaps an exception might be a small stratum in a stratified sample design. While it would be possible to estimate bias from sample statistics using Eq. 15, it should not be necessary given the very low values of bias obtained in this example, especially for the larger values of n that are commonly used in practical applications of big BAF sampling. Furthermore, we are not aware of any instances reported in the literature of big BAF sampling in which bias has presented problems for practical applications.

It should be noted that compared to the traditional Delta method (11), Goodman’s method (6) contains a negative term which causes it to be smaller than the traditional Delta method, although the difference is quite modest according to Figs. 2 and 3. Both the traditional Delta method and Goodman’s method omit covariance terms between variables used in the estimation process while the point-based Delta method (24) and the simplified point-based Delta method (33) do account for covariances. This may be the reason that the point-based Delta method and the simplified point-based Delta method standard error estimates are somewhat smaller for the smaller sample sizes (especially n=10) in Figs. 2 and 3.

Gove et al. (2020) have previously compared the traditional Delta method (Eq. 11) to Goodman’s method (Eq. 6), with the same results as presented here for these two variance estimation methods. However, they did not include the point-based Delta method (Eq. 24) or the simplified point-based Delta method (Eq. 33) in their simulations. Simulations of the traditional Delta method and Goodman’s equation were included in the present study so that the performance of the point-based Delta method and the simplified point-based Delta method could be compared to those previously developed variance estimators.

The impact of the correlations discussed above, which are included in the point-based Delta method and the simplified point-based Delta method but neglected in the two traditional variance estimation methods, was apparently small for the simulated populations tested here. However, it is conceivable that these correlations could have a larger effect in some natural populations. Because of the way the artificial tree populations were constructed for this article, the possible effects of local variations in stand density on tree dimensions and the tree DBH-height relationship were minimized. For some species, density variations may affect the DBH-height relationship, thus inducing some degree of correlation between volume per tree and basal area per hectare. For example, suppose an even-aged natural pine stand has extreme density variations so that DBHs tend to be smaller in relation to height in locally dense areas but larger in relation to height in areas where density is substantially less. This could induce a negative correlation between basal area and volume per tree because higher basal area regions would have less volume per tree than lower basal area regions. From this point of view the point-based Delta method and the simplified point-based Delta method may be more conservative approaches because they do account for covariances of the kind just discussed. As well, the correlation terms present in the point-based Delta method (24) and the simplified point-based Delta method (33) may provide additional opportunities to investigate the effects of forest stand structure on big BAF variance.

By looking at the estimation problem as point-wise selection of random sample points an estimator of the covariance between the basal area estimate and ratio of mean volume to mean basal area estimates was derived in the Appendix (Eq. A.34):

$$\begin{array}{*{20}l} \widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}}}, {\frac{\hat{V}_{\mathrm{v}}} {\hat{B}_{\mathrm{v}}}}\right) &= \frac{1}{n\hat{B}_{\mathrm{v}}}\left(\widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}_{s}}}, {\hat{V}_{\mathrm{v}_{s}}}\right) \right.\notag\\ &\mspace{-4mu}{}-\left.\frac{\hat{V}_{\mathrm{v}}}{\hat{B}_{\mathrm{v}}} \widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}_{s}}}, {\hat{B}_{\mathrm{v}_{s}}} \right) \right) \end{array} $$
(35)

The derivation uses an equation derived by Taylor series methods, as was the case for the Delta method (Kendall and Stuart 1977, p. 247). This equation could be used to estimate a correlation coefficient between the basal area estimate and the ratio of sample mean volume to sample mean basal area when expressed as a ratio of means after the manner of the form of the big BAF estimator (12) proposed by Gregoire and Valentine (2008, equation 8.33). Equation 14 would be used to estimate the standard error of the volume basal area ratio in the denominator of the correlation formula. In addition the formula above might be used to incorporate covariance information into past approaches to big BAF variance estimation that were based on the variance of a product, when Eq. 14 is also used to estimate the variance of the ratio of mean volume to mean basal area.
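Equation 35 can be computed directly from the point-wise values recorded at the n sample points. The Python sketch below is a minimal illustration under stated assumptions: the function name and the input arrays are hypothetical, and the point-wise covariances are taken to be ordinary sample covariances with an n−1 divisor.

```python
import numpy as np

def cov_bc_ratio(bc, bv, vv):
    """Estimated covariance between mean count basal area and the ratio of
    mean volume to mean basal area on the large BAF, following Eq. 35."""
    bc, bv, vv = (np.asarray(a, dtype=float) for a in (bc, bv, vv))
    n = bc.size
    b_v = bv.mean()
    ratio = vv.mean() / b_v
    cov_bc_vv = np.cov(bc, vv, ddof=1)[0, 1]   # point-wise cov(Bc, Vv)
    cov_bc_bv = np.cov(bc, bv, ddof=1)[0, 1]   # point-wise cov(Bc, Bv)
    return (cov_bc_vv - ratio * cov_bc_bv) / (n * b_v)

# Hypothetical point-wise values at five sample points.
bc = [12.0, 16.0, 20.0, 24.0, 8.0]         # basal area on the count BAF
bv = [20.0, 0.0, 20.0, 40.0, 20.0]         # basal area on the volume BAF
vv = [150.0, 0.0, 170.0, 330.0, 140.0]     # volume on the volume BAF
print(cov_bc_ratio(bc, bv, vv))
```

If every tree had the same volume to basal area ratio, the point-wise volume would be an exact multiple of the point-wise basal area on the large BAF and the two terms in Eq. 35 would cancel, giving a covariance of zero.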

The point-based Delta method and the simplified point-based Delta method result in computational formulas, Eqs. 24 and 33, which are longer and more complex than Goodman’s equation (6) or the traditional Delta method (11). However, the formulas for the point-based Delta method or the simplified point-based Delta method can easily be coded in a programming language such as R (R Core Team 2021), as was done for the simulations reported here, or in a spreadsheet. It is perhaps becoming rare to rely on “hand” calculations with data entered in calculators to perform the computations required for a forest inventory. Once the point-based Delta method or the simplified point-based Delta method has been coded in a programming language or a spreadsheet template, computations required for the method should not be a barrier to its use.

Instances in which the variances of the point-based Delta method and the simplified point-based Delta method were associated with lower confidence interval capture rates than Goodman’s or the traditional Delta method are associated with simulation parameters for which the estimated standard errors for the point-based Delta methods were lower than standard errors from the other two methods. As indicated above, this may possibly be due to negative terms associated with covariances in the point-based Delta method (Eq. 24) and the simplified point-based Delta method (Eq. 33) which are not present in Goodman’s formula (Eq. 6) or the traditional Delta method (Eq. 11) as they have traditionally been applied to the big BAF variance estimation problem. Inspection of Eq. 24 for the point-based Delta method and Eq. 33 for the simplified point-based Delta method indicates that the last two terms in the equations will likely be negative because they contain covariances expected to be positive but which are multiplied by negative coefficients. These negative coefficients result from taking the partial derivative of a ratio with respect to the denominator in the ratio (Eq. 23) as required by the Delta method. Intuitively, when the denominator in a ratio is positively correlated to a term in the numerator of the ratio, this tends to stabilize the variability in the ratio because large values in the numerator then tend to be matched by large values in the denominator. This intuition accords with negative terms in the variance formula associated with covariances between terms in the numerator and the denominator of the ratio for the big BAF estimator.
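The stabilizing effect of a positive numerator-denominator correlation can be illustrated with the generic first-order (Delta method) variance of a simple ratio X/Y. The Python sketch below is not Eq. 24 or Eq. 33; it is a textbook two-variable case with hypothetical means, standard deviations and function names, compared against a simulation from a bivariate normal distribution.

```python
import numpy as np

def delta_ratio_variance(mu_x, mu_y, sd_x, sd_y, rho):
    """First-order (Delta method) variance of X / Y."""
    r = mu_x / mu_y
    return r ** 2 * (sd_x ** 2 / mu_x ** 2 + sd_y ** 2 / mu_y ** 2
                     - 2.0 * rho * sd_x * sd_y / (mu_x * mu_y))

def simulated_ratio_variance(mu_x, mu_y, sd_x, sd_y, rho, size, rng):
    """Empirical variance of X / Y for correlated normal X and Y."""
    cov = [[sd_x ** 2, rho * sd_x * sd_y], [rho * sd_x * sd_y, sd_y ** 2]]
    xy = rng.multivariate_normal([mu_x, mu_y], cov, size=size)
    return float(np.var(xy[:, 0] / xy[:, 1], ddof=1))

# Hypothetical moments: both variables have a 10% coefficient of variation.
rng = np.random.default_rng(3)
args = dict(mu_x=100.0, mu_y=10.0, sd_x=10.0, sd_y=1.0)
for rho in (0.0, 0.8):
    approx = delta_ratio_variance(rho=rho, **args)
    sim = simulated_ratio_variance(rho=rho, size=200_000, rng=rng, **args)
    print(f"rho={rho}: delta={approx:.3f}, simulated={sim:.3f}")
```

The covariance enters with a negative sign because the derivative of the ratio with respect to its denominator is negative, so a positive correlation lowers the approximate variance.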

Acceptable results from confidence interval captures tend to confirm that the Delta method based on a first-order Taylor series approximation can provide good variance estimates for big BAF sampling. As indicated previously, Wolter (2007, p. 231) states that the first-order approximation has frequently been found to be acceptable in practice. The point-based Delta method, the simplified point-based Delta method and the traditional Delta method tested here are based on first-order Taylor series expansions, but the traditional Delta method assumes that covariance terms are negligible.

Another technical advantage of the point-based Delta method and the simplified point-based Delta method is that these equations do not depend on the variation among trees within points. Thus there is no dependence on variance terms that implicitly assume that trees sampled within the same point are independent, as Eq. 7 does. Actually the only independent random samples in big BAF sampling are the sample points themselves. Similarly Palley and Horwitz (1961, p. 60) noted, when considering a traditional variance estimator based on Bell and Alexander (1957) for two-stage sampling in which a subset of the count basal area points are selected for volume measurements, “There is some confusion here, since in point sampling the measurement we are concerned with attaches to points...rather than to trees.” Technically, trees sampled at the same point are correlated, although apparently this fact did not prevent accurate evaluations of variance with Eq. 7 in simulations. However, use of the traditional big BAF variance estimation methods with the ratio variance estimate Eq. 14 instead of Eq. 7 also avoids the problem of estimating variance using possibly non-independent sample trees within the same points.

In comparing the point-based Delta method to the simplified point-based Delta method, we recommend the simplified point-based Delta method Eq. (33) for applications. Inspection of Figs. 2 and 3 shows that the simplified point-based Delta method was slightly closer to the HPS reference line than the other three variance estimation methods for sample sizes of n=25 or greater. In the case of n=10, the simplified point-based Delta method was generally about as far from the HPS reference line as the point-based Delta method but tended to be a slight underestimate instead of an overestimate. In practical applications the majority of big BAF sampling plans will likely have sample sizes of n>10.

From a theoretical point of view the point-based Delta method should be preferred to Bruce’s method because Bruce’s method does not take into account possible correlations between the basal area estimate obtained from BAFc and the estimate of mean volume to basal area ratio. The point-based Delta method implicitly does this by accounting for the correlations between all basic basal area and volume estimates. In a very similar way Palley and Horwitz (1961, p. 60) used the Delta method to derive a “conceptually sounder” variance estimator that they recommend as an alternative to the Bell and Alexander (1957, p. 17) variance estimator for double sampling with a ratio estimate in the context of point sampling. The simplified point-based Delta method should be preferred to the point-based Delta method because the estimate of total basal area using BAFc is more precise than the estimate of basal area using BAFv and therefore a better estimate of the total basal area B for use in the variance approximation formula for the simplified point-based Delta method.

A final thought concerning the efficacy of the point-based Delta method and the simplified point-based Delta method relates to the current practice of estimating covariances and correlations to assess the independence assumption for big BAF sampling. As noted in Gove et al. (2020), past attempts at calculating these quantities have all been ad hoc because of the differing ‘sample support’ between the VBAR estimates, which are tree-based, and the basal area estimates, which are point-based. The point-based Delta method and the simplified point-based Delta method solve this dilemma by determining covariances and correlations completely on a point-wise basis, yielding true estimates in each case rather than aggregating tree-wise attributes for comparison in a point-wise manner as in the traditional Delta method application.

Conclusions

New variance formulas for big BAF sampling have been derived and tested. They have been termed the point-based Delta method and the simplified point-based Delta method because they have been derived using the Delta method based on sources of variation among sample points. This approach takes the covariances among the variables in the big BAF sampling estimator into account. More traditional methods of estimating the big BAF sampling variance are based on the variances of variables comprising the big BAF estimator but do not take the covariances between these variables into account. Monte Carlo simulation experiments conducted on a northern hardwoods forest population and a white pine population indicated that the point-based Delta method and the simplified point-based Delta method performed comparably to two existing big BAF estimators. Estimates from the point-based Delta method and the simplified point-based Delta method were sometimes slightly lower than estimates from the other two methods for smaller sample sizes (numbers of sample points). This might be partially due to negative terms of modest magnitude associated with covariances among variables, which are not considered by the more traditional estimators. We have also shown mathematically that the bias in big BAF sampling approaches zero as sample size becomes large on the order of \(\frac {1}{n}\), behavior that is similar to the standard ratio of means estimator commonly used in survey sampling.

Appendix

On the bias in the big BAF estimator and the covariance between the basal area and volume basal area ratio estimates

Approximate bias

According to Seber (1982, p. 7) a second-order approximation to the bias in a function g of the means of random variables \(x_{i}\) based on a Taylor series is:

$$ Bias = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\text{cov}\!\left({x_{i}}, {x_{j}}\right) \frac{\partial^{2} g}{\partial x_{i} \partial x_{j}} $$
(A.1)

Beginning with the equations for the first partial derivatives (21), (22), and (23) we obtain the following required second partial and cross-partial derivatives. Note that the second partials of g with respect to \(\hat {B}_{{\mathrm {c}}}\) and \(\hat {V}_{{\mathrm {v}}}\) are zero so they are not given below. We assume without loss of generality in this section that tract area A=1 so the results are on a per-unit area basis.

$$\begin{array}{*{20}l} \frac{\partial^{2}g}{\partial\hat{B}^{2}_{\mathrm{v}}} &= \left(\frac{2\hat{V}_{\mathrm{v}} \hat{B}_{{\mathrm{c}}}}{\hat{B}_{\mathrm{v}}^{3}}\right) \end{array} $$
(A.2)
$$\begin{array}{*{20}l} \frac{\partial^{2} g}{\partial\hat{B}_{{\mathrm{c}}} \partial{\hat{B}_{\mathrm{v}}}} &=-\left(\frac{\hat{V}_{\mathrm{v}}}{\hat{B}_{\mathrm{v}}^{2}}\right) \end{array} $$
(A.3)
$$\begin{array}{*{20}l} \frac{\partial^{2}g}{\partial{\hat{B}_{{\mathrm{c}}}} \partial\hat{V}_{\mathrm{v}}} &= \left(\frac{1}{\hat{B}_{\mathrm{v}}}\right) \end{array} $$
(A.4)
$$\begin{array}{*{20}l} \frac{\partial^{2}g}{\partial{\hat{B}_{\mathrm{v}}} \partial\hat{V}_{\mathrm{v}}} &=-\left(\frac{\hat{B}_{{\mathrm{c}}}}{\hat{B}_{\mathrm{v}}^{2}}\right) \end{array} $$
(A.5)

The true population bias is a function of expected values and population variances. Noting the expected values for the HPS sample means \(\mathrm {E}\!\left [ {\hat {B}_{{\mathrm {c}}}}\right ] = \mathrm {E}\!\left [{\hat {B}_{\mathrm {v}}}\right ] = B\) and \(\mathrm {E}\!\left [ {\hat {V}_{\mathrm {v}}}\right ] = V\) the relevant second partial and cross partial derivatives evaluated at the mean vector θ=(B,B,V) are:

$$\begin{array}{*{20}l} \frac{\partial^{2}g(\theta)}{\partial\hat{B}^{2}_{\mathrm{v}}} &= \left(\frac{2V}{B^{2}}\right) \end{array} $$
(A.6)
$$\begin{array}{*{20}l} \frac{\partial^{2} g(\theta)}{\partial\hat{B}_{{\mathrm{c}}} \partial{\hat{B}_{\mathrm{v}}}} &= -\left(\frac{V}{B^{2}}\right) \end{array} $$
(A.7)
$$\begin{array}{*{20}l} \frac{\partial^{2}g(\theta)}{\partial{\hat{B}_{{\mathrm{c}}}} \partial\hat{V}_{\mathrm{v}}} &= \left(\frac{1}{B}\right) \end{array} $$
(A.8)
$$\begin{array}{*{20}l} \frac{\partial^{2}g(\theta)}{\partial{\hat{B}_{\mathrm{v}}} \partial\hat{V}_{\mathrm{v}}} &= -\left(\frac{1}{B}\right) \end{array} $$
(A.9)
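As an informal check, the second and cross-partial derivatives of the per-unit-area estimator \(g = \hat{V}_{\mathrm{v}}\hat{B}_{\mathrm{c}}/\hat{B}_{\mathrm{v}}\) evaluated at θ=(B,B,V) can be compared with central finite differences. The sketch below is only a numerical sanity check, not part of the derivation; the helper names and the values of B and V are hypothetical.

```python
def g(bc, bv, vv):
    """Big BAF estimator on a per-unit-area basis: g = V_v * B_c / B_v."""
    return vv * bc / bv


def second_partial(f, point, i, j, h=1e-2):
    """Central finite-difference estimate of d^2 f / (dx_i dx_j) at `point`."""
    def shifted(di, dj):
        x = list(point)
        x[i] += di
        x[j] += dj
        return f(*x)
    return (shifted(h, h) - shifted(h, -h)
            - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)


# Hypothetical population values, used only for this check.
B, V = 25.0, 200.0
theta = (B, B, V)       # argument order: (B_c, B_v, V_v)
BC, BV, VV = 0, 1, 2    # positions of the arguments in theta

checks = {
    "d2g/dBv2":    (second_partial(g, theta, BV, BV), 2.0 * V / B**2),
    "d2g/dBc dBv": (second_partial(g, theta, BC, BV), -V / B**2),
    "d2g/dBc dVv": (second_partial(g, theta, BC, VV), 1.0 / B),
    "d2g/dBv dVv": (second_partial(g, theta, BV, VV), -1.0 / B),
}
for name, (numeric, analytic) in checks.items():
    print(f"{name}: numeric={numeric:.6f}, analytic={analytic:.6f}")
```

The numerical and analytic values should agree to within the finite-difference error, and the pure second partials with respect to \(\hat{B}_{\mathrm{c}}\) and \(\hat{V}_{\mathrm{v}}\) are numerically zero because g is linear in each of them.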

Substituting into (A.1) with algebraic rearrangement we obtain:

$$\begin{array}{*{20}l} Bias &= V\left(\frac{\text{var}\!\left({\hat{B}_{\mathrm{v}}}\right)}{B^{2}} - \frac{\text{cov}\!\left({\hat{B}_{{\mathrm{c}}}}, {\hat{B}_{\mathrm{v}}}\right) }{B^{2}} \right.\notag\\ &\quad{}-\left.\frac{\text{cov}\!\left({\hat{V}_{\mathrm{v}}}, {\hat{B}_{\mathrm{v}}}\right)}{VB} + \frac{\text{cov}\!\left({\hat{V}_{\mathrm{v}}}, {\hat{B}_{{\mathrm{c}}}}\right)}{VB} \right) \end{array} $$
(A.10)

The expression above is essentially the same as the bias derived from Equation 11 in Palley and Horwitz (1961) for the Bell and Alexander (1957) estimate, which is essentially the same as double sampling with a ratio estimator. However, in their case two different point-wise sample sizes were involved, the total sample size and a subsample size, which is not the case for the big BAF estimator. That means some of the formulas for the variances and covariances within Equation 11 of Palley and Horwitz (1961) would not be the same as for big BAF sampling because they would depend on the sub-sample size rather than the total sample size. Note that \(\hat{B}_{\mathrm{c}}\), \(\hat{B}_{\mathrm{v}}\) and \(\hat{V}_{\mathrm{v}}\) are means of independent, identically distributed HPS point samples, so that \(\text{var}\!\left(\hat{B}_{\mathrm{v}}\right) = \frac{\text{var}\!\left(\hat{B}_{\mathrm{v}_{s}}\right)}{n}\), \(\text{cov}\!\left(\hat{B}_{\mathrm{c}}, \hat{B}_{\mathrm{v}}\right) = \frac{\text{cov}\!\left(\hat{B}_{\mathrm{c}_{s}}, \hat{B}_{\mathrm{v}_{s}}\right)}{n}\), \(\text{cov}\!\left(\hat{B}_{\mathrm{c}}, \hat{V}_{\mathrm{v}}\right) = \frac{\text{cov}\!\left(\hat{B}_{\mathrm{c}_{s}}, \hat{V}_{\mathrm{v}_{s}}\right)}{n}\) and \(\text{cov}\!\left(\hat{B}_{\mathrm{v}}, \hat{V}_{\mathrm{v}}\right) = \frac{\text{cov}\!\left(\hat{B}_{\mathrm{v}_{s}}, \hat{V}_{\mathrm{v}_{s}}\right)}{n}\), leading to:

$$\begin{array}{*{20}l} Bias &= \frac{V}{n} \left(\frac{\text{var}\!\left({\hat{B}_{\mathrm{v}_{s}}}\right)}{B^{2}} - \frac{\text{cov}\!\left({\hat{B}_{{\mathrm{c}}_{s}}}, {\hat{B}_{\mathrm{v}_{s}}}\right) }{B^{2}} \right.\notag\\ &\quad{}-\left.\frac{\text{cov}\!\left({\hat{V}_{\mathrm{v}_{s}}}, {\hat{B}_{\mathrm{v}_{s}}}\right) }{VB} + \frac{\text{cov}\!\left({\hat{V}_{\mathrm{v}_{s}}}, {\hat{B}_{{\mathrm{c}}_{s}}}\right) }{VB} \right) \end{array} $$
(A.11)

We note that this expression approaches zero as n becomes large on the order of \(\frac {1}{n}\) which is similar to the behavior of the standard ratio estimator according to Cochran (1977, p. 160).
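A plug-in version of the approximate bias (A.11) can be sketched from point-wise data. This is an illustration under stated assumptions rather than an estimator proposed in the text: the function names and the data are hypothetical, sample moments replace the population variances and covariances, the sample means replace B and V, and the small-BAF mean is used for B.

```python
def mean(xs):
    return sum(xs) / len(xs)


def sample_cov(xs, ys):
    """Sample covariance among point-wise values (n - 1 divisor)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def approx_bias(bc, bv, vv):
    """Plug-in evaluation of (A.11) from point-wise values.

    bc, bv: point-wise basal area estimates from the small and large BAF
    vv:     point-wise volume estimates from the large BAF
    Assumption: sample means stand in for the population values B and V.
    """
    n = len(bc)
    B = mean(bc)
    V = mean(vv)
    return (V / n) * (
        sample_cov(bv, bv) / B**2
        - sample_cov(bc, bv) / B**2
        - sample_cov(vv, bv) / (V * B)
        + sample_cov(vv, bc) / (V * B)
    )


# Hypothetical point-wise values for n = 5 sample points.
bc = [24.0, 30.0, 18.0, 27.0, 21.0]
bv = [20.0, 40.0, 20.0, 20.0, 20.0]
vv = [160.0, 330.0, 150.0, 170.0, 165.0]
print(approx_bias(bc, bv, vv))
```

One sanity check on the formula: if both gauges returned identical basal areas at every point, the four terms cancel in pairs and the approximate bias is zero, consistent with the big BAF estimator then reducing to the unbiased volume estimator \(\hat{V}_{\mathrm{v}}\).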

Exact bias

We can mathematically investigate the bias in the big BAF estimator by a method similar to that given by Cochran (1977, p. 162) and originally developed by Hartley and Ross (1954). To simplify notation in the derivations, we assume without loss of generality that area A=1 because the final results are ratios that do not involve area. Consider the covariance between the big BAF estimator \({\hat {V}_{\mathcal {B}}}\) and the basal area estimate from the large BAF factor angle gauge \(\hat {B}_{\mathrm {v}}\)

$$ \text{cov}\!\left({{\hat{V}_{\mathcal{B}}}},{\hat{B}_{\mathrm{v}}} \right) = \mathrm{E}\!\left[ {{\hat{V}_{\mathcal{B}}} \hat{B}_{\mathrm{v}}}\right] - \mathrm{E}\!\left[ {{\hat{V}_{\mathcal{B}}}}\right] \mathrm{E}\!\left[ {\hat{B}_{\mathrm{v}}}\right] $$
(A.12)

Now because \(\hat {B}_{\mathrm {v}}\) is known to be a design-unbiased HPS estimator of the true basal area B and by (12) we have \({\hat {V}_{\mathcal {B}}} \hat {B}_{\mathrm {v}} = \hat {V}_{\mathrm {v}} \hat {B}_{{\mathrm {c}}}\) the following results:

$$ \text{cov}\!\left({{\hat{V}_{\mathcal{B}}}}, {\hat{B}_{\mathrm{v}}}\right) = \mathrm{E}\!\left[ {\hat{V}_{\mathrm{v}} \hat{B}_{{\mathrm{c}}}}\right] - B \mathrm{E}\!\left[ {{\hat{V}_{\mathcal{B}}}}\right] $$
(A.13)

Then by the definition of the covariance and the fact that \(\hat {V}_{\mathrm {v}}\) and \(\hat {B}_{{\mathrm {c}}}\) are design-unbiased HPS estimators (Palley and Horwitz 1961) of total volume V and basal area B respectively we have \(\mathrm {E}\!\left [ {\hat {V}_{\mathrm {v}} \hat {B}_{{\mathrm {c}}}}\right ] = \text {cov}\!\left ({\hat {V}_{\mathrm {v}}}, {\hat {B}_{{\mathrm {c}}}}\right) + BV \) resulting in:

$$ \text{cov}\!\left({{\hat{V}_{\mathcal{B}}}}, {\hat{B}_{\mathrm{v}}}\right) = \text{cov}\!\left({\hat{V}_{\mathrm{v}}}, {\hat{B}_{{\mathrm{c}}}}\right) + BV - B \mathrm{E}\!\left[ {{\hat{V}_{\mathcal{B}}}}\right] $$
(A.14)

Now we may quantify the exact bias as:

$$ Bias = \left(\mathrm{E}\!\left[ {{\hat{V}_{\mathcal{B}}}}\right] - V \right) = \frac{\text{cov}\!\left({\hat{B}_{{\mathrm{c}}}}, {\hat{V}_{\mathrm{v}}}\right) - \text{cov}\!\left({\hat{V}_{\mathcal{B}}}, {\hat{B}_{\mathrm{v}}}\right)}{B} $$
(A.15)
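The identity (A.15) can be checked exactly on a toy joint distribution. The sketch below assumes a hypothetical set of four equally likely outcomes for \((\hat{B}_{\mathrm{c}}, \hat{B}_{\mathrm{v}}, \hat{V}_{\mathrm{v}})\), constructed so that both basal area estimators have the same expectation B; the numbers are illustrative and do not come from the simulations reported here.

```python
def expectation(xs):
    """Expected value over equally likely outcomes."""
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Population covariance over equally likely outcomes: E[XY] - E[X]E[Y]."""
    products = [x * y for x, y in zip(xs, ys)]
    return expectation(products) - expectation(xs) * expectation(ys)


# Hypothetical equally likely outcomes (E[B_c] = E[B_v] by construction).
b_c = [20.0, 24.0, 28.0, 32.0]
b_v = [16.0, 24.0, 24.0, 40.0]
v_v = [130.0, 190.0, 210.0, 330.0]

B = expectation(b_v)
V = expectation(v_v)

# Big BAF estimator for each outcome: V_B = V_v * B_c / B_v.
v_big = [v * c / b for v, c, b in zip(v_v, b_c, b_v)]

bias_direct = expectation(v_big) - V
bias_identity = (covariance(b_c, v_v) - covariance(v_big, b_v)) / B
print(bias_direct, bias_identity)
```

Both expressions give the same value for this toy distribution, as (A.15) requires whenever \(\hat{B}_{\mathrm{c}}\) and \(\hat{B}_{\mathrm{v}}\) are unbiased for B and \(\hat{V}_{\mathrm{v}}\) is unbiased for V.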

Using the definition of the correlation coefficient ρ the absolute value of the bias is then

$$\begin{array}{*{20}l} \left| Bias \right| &= \frac{1}{B} \left| \left(\rho_{\hat{B}_{{\mathrm{c}}},\hat{V}_{\mathrm{v}}}\right) \sqrt{\text{var}\!\left({\hat{B}_{{\mathrm{c}}}} \right) \text{var}\!\left({\hat{V}_{\mathrm{v}}} \right)}\right. \notag\\ &\quad{}- \left. \left(\rho_{\hat{B}_{\mathrm{v}},{\hat{V}_{\mathcal{B}}}}\right) \sqrt{\text{var}\!\left({\hat{B}_{\mathrm{v}}}\right) \text{var}\!\left({{\hat{V}_{\mathcal{B}}}}\right)} \right| \end{array} $$
(A.16)

For forest populations we generally expect that estimates of basal area and volume from common samples on the same populations would be positively correlated, so that \(0\le \rho_{\hat{B}_{\mathrm{c}},\hat{V}_{\mathrm{v}}} \le 1\) and \(0\le \rho_{\hat{B}_{\mathrm{v}},\hat{V}_{\mathcal{B}}} \le 1\), with a maximum possible correlation of one. If both correlations are non-negative, the absolute value on the right-hand side of the equation above is largest when one of the two terms is zero and the correlation in the other term equals one, so that we have:

$$\begin{array}{*{20}l} \left| Bias \right| &\le \frac{1}{B} \max\left(\sqrt{\text{var}\!\left({\hat{B}_{{\mathrm{c}}}}\right) \text{var}\!\left({\hat{V}_{\mathrm{v}}}\right)}, \right.\notag\\ &\quad\left. \sqrt{\text{var}\!\left({\hat{B}_{\mathrm{v}}} \right) \text{var}\!\left({{\hat{V}_{\mathcal{B}}}}\right)} \right) \end{array} $$
(A.17)

In the case where \(\left (\text {var}\!\left ({\hat {B}_{{\mathrm {c}}}} \right) \text {var}\!\left ({\hat {V}_{\mathrm {v}}} \right)\right) \ge \left (\text {var}\!\left ({\hat {B}_{\mathrm {v}}}\right) \text {var}\!\left ({{\hat {V}_{\mathcal {B}}}} \right) \right)\) we have

$$ \frac{\left| Bias \right|}{\sqrt{\text{var}\!\left({{\hat{V}_{\mathcal{B}}}}\right)}} \le \frac{\sqrt{\text{var}\!\left({\hat{B}_{{\mathrm{c}}}} \right) \text{var}\!\left({\hat{V}_{\mathrm{v}}}\right)}} {B\sqrt{\text{var}\!\left({{\hat{V}_{\mathcal{B}}}}\right)}} $$
(A.18)

Because \(\text {var}\!\left ({{\hat {V}_{\mathcal {B}}}}\right) \) approaches \(\text {var}\!\left ({\hat {V}_{\mathrm {v}}}\right) \) as a maximum as the small BAF factor approaches the large BAF factor we have \(\text {var}\!\left ({{\hat {V}_{\mathcal {B}}}}\right) \ge \text {var}\!\left ({\hat {V}_{\mathrm {v}}}\right) \) leading to

$$ \frac{\left|Bias\right|}{\sigma_{{\hat{V}_{\mathcal{B}}}}} = \frac{\left| Bias \right|}{\sqrt{\text{var}\!\left({{\hat{V}_{\mathcal{B}}}}\right) }}\le \frac{\sqrt{\text{var}\!\left({\hat{B}_{{\mathrm{c}}}}\right) }}{B} = C_{\hat{B}_{{\mathrm{c}}}} $$
(A.19)

where \(\sigma _{{\hat {V}_{\mathcal {B}}}} = \sqrt {\text {var}\!\left ({{\hat {V}_{\mathcal {B}}}}\right)} \) is the standard error of the big BAF estimator and \(C_{\hat {B}_{{\mathrm {c}}}}\) is the coefficient of variation for the basal area estimate with the smaller BAF factor BAFc. Because \(\hat {B}_{{\mathrm {c}}}\) is the mean of independent identically distributed HPS point samples we have \(\text {var}\!\left ({\hat {B}_{{\mathrm {c}}}}\right) = \frac {\text {var}\!\left ({B_{{\mathrm {c}}}}\right) }{n}\) where \(\text {var}\!\left ({B_{{\mathrm {c}}}}\right)\) is the variance among point-wise basal area estimates. This leads to:

$$ \frac{\left|Bias\right|}{\sigma_{{\hat{V}_{\mathcal{B}}}}} \le \frac{1}{\sqrt{n}} \frac{\sqrt{\text{var}\!\left({B_{{\mathrm{c}}}}\right)}}{B} $$
(A.20)

Because B and \(\text {var}\!\left ({B_{{\mathrm {c}}}}\right)\) are constant population values, as n approaches infinity the right-hand side of the inequality above goes to zero, with the result that the relative bias for the big BAF estimator approaches zero on the order of \(\frac {1}{\sqrt {n}}\) (using “big O” notation, \(O\!\left(\frac {1}{\sqrt {n}}\right)\)).

If \(\left (\text {var}\!\left ({\hat {B}_{{\mathrm {c}}}}\right) \text {var}\!\left ({\hat {V}_{\mathrm {v}}}\right) \right) \le \left (\text {var}\!\left ({\hat {B}_{\mathrm {v}}}\right) \text {var}\!\left ({{\hat {V}_{\mathcal {B}}}}\right)\right)\) in a similar way we have

$$ \frac{\left|Bias\right|}{\sigma_{{\hat{V}_{\mathcal{B}}}}} \le \frac{1}{\sqrt{n}} \frac{\sqrt{\text{var}\!\left({B_{\mathrm{v}}}\right) }}{B} $$
(A.21)

Once again, because B and \(\text {var}\!\left ({B_{\mathrm {v}}}\right)\) are population constants, the relative bias in the big BAF estimator approaches zero as n approaches infinity.

In the unusual case where one of the correlations between basal area and volume estimates may be negative, we can posit

$$ \frac{\left|Bias\right|}{\sigma_{{\hat{V}_{\mathcal{B}}}}} \le \frac{1}{\sqrt{n}} \left(\frac{\sqrt{\text{var}\!\left({B_{\mathrm{v}}}\right) }}{B} + \frac{\sqrt{\text{var}\!\left({B_{{\mathrm{c}}}}\right)}}{B} \right) $$
(A.22)

In this case as well the relative bias in the big BAF estimator approaches zero as n approaches infinity. Thus for all three possible cases the relative bias in the big BAF estimator approaches zero as the sample size approaches infinity on the order of \(\frac {1}{\sqrt {n}}\). According to Cochran (1977, p. 160) this is similar to the behavior of the standard ratio estimator often used in sample surveys.
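The bounds (A.20) to (A.22) are simple to evaluate once the point-wise basal area variances are known. The sketch below is a hypothetical helper, not part of the original analysis: it takes the larger of the two coefficients of variation so that a single conservative value covers both (A.20) and (A.21), and it sums them for the case (A.22). All input values are assumed for illustration.

```python
import math


def relative_bias_bound(var_point_bc, var_point_bv, B, n,
                        correlations_nonnegative=True):
    """Upper bound on |Bias| / SE of the big BAF estimator.

    var_point_bc, var_point_bv: variances among point-wise basal area
        estimates from the small and large BAF, respectively
    B: mean basal area; n: number of sample points
    """
    cv_c = math.sqrt(var_point_bc) / B
    cv_v = math.sqrt(var_point_bv) / B
    if correlations_nonnegative:
        # Conservative: the larger bound covers both (A.20) and (A.21).
        return max(cv_c, cv_v) / math.sqrt(n)
    # One correlation may be negative: bound (A.22).
    return (cv_c + cv_v) / math.sqrt(n)


# Hypothetical population values, for illustration only.
for n in (25, 100, 400):
    print(n, relative_bias_bound(36.0, 100.0, 25.0, n))
```

Quadrupling the number of sample points halves the bound, which is the \(\frac {1}{\sqrt {n}}\) behavior described above.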

Covariance between basal area and volume basal area ratios

The covariance between two possibly nonlinear functions \(f(\hat {\boldsymbol {\theta }})\) and \(h(\hat {\boldsymbol {\theta }})\) can be approximated using Taylor’s series methods in a way similar to the derivation of the Delta method. The following approximation formula is based on Kendall and Stuart (1977, p. 247)

$$\begin{array}{*{20}l} \text{cov}\!\left({f(\hat{\boldsymbol{\theta}})}, {h(\hat{\boldsymbol{\theta}})} \right) &\approx \sum_{i=1}^{n} \text{var}\!\left({\hat{\theta}_{i}}\right) \frac{\partial f(\hat{\boldsymbol{\theta}})}{\partial \hat{\theta}_{i}} \frac{\partial h(\hat{\boldsymbol{\theta}})}{\partial\hat{\theta}_{i}} \notag \\ &\mspace{-6mu}{}+ \mathop{\sum\sum}_{i\neq j} \text{cov}\!\left({\hat{\theta}_{i}}, {\hat{\theta}_{j}} \right) \frac{\partial f(\hat{\boldsymbol{\theta}})}{\partial \hat{\theta}_{i}} \frac{\partial h(\hat{\boldsymbol{\theta}})}{\partial\hat{\theta}_{j}} \end{array} $$
(A.23)

To find the approximate covariance between estimated basal area and the ratio of the estimates of mean volume to mean basal area we define:

$$ f(\hat{\boldsymbol{\theta}}) = \hat{B}_{{\mathrm{c}}} $$
(A.24)

and the ratio of the estimates of mean volume and mean basal area:

$$ h(\hat{\boldsymbol{\theta}}) = \frac{\hat{V}_{\mathrm{v}}} {\hat{B}_{\mathrm{v}}} $$
(A.25)

We then obtain the following partial derivatives needed for the approximation formula:

$$\begin{array}{*{20}l} \frac{\partial f}{\partial\hat{B}_{{\mathrm{c}}}} &= 1 \end{array} $$
(A.26)
$$\begin{array}{*{20}l} \frac{\partial f}{\partial\hat{V}_{\mathrm{v}}} &= 0 \end{array} $$
(A.27)
$$\begin{array}{*{20}l} \frac{\partial f}{\partial\hat{B}_{\mathrm{v}}} &= 0 \end{array} $$
(A.28)
$$\begin{array}{*{20}l} \frac{\partial h}{\partial\hat{B}_{{\mathrm{c}}}} &= 0 \end{array} $$
(A.29)
$$\begin{array}{*{20}l} \frac{\partial h}{\partial\hat{V}_{\mathrm{v}}} &= \frac{1}{\hat{B}_{\mathrm{v}}} \end{array} $$
(A.30)
$$\begin{array}{*{20}l} \frac{\partial h}{\partial\hat{B}_{\mathrm{v}}} &= \frac{-\hat{V}_{\mathrm{v}}}{\hat{B}^{2}_{\mathrm{v}}} \end{array} $$
(A.31)

Finally, inserting the function definitions and partial derivatives into the approximation formula, we find the following approximate estimator for the covariance between the basal area estimate and the volume to basal area ratio estimate:

$$\begin{array}{*{20}l} \widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}}}, {\frac{\hat{V}_{\mathrm{v}}} {\hat{B}_{\mathrm{v}}}}\right) &= \frac{1}{\hat{B}_{\mathrm{v}}}\left(\widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}}}, {\hat{V}_{\mathrm{v}}} \right) \right. \notag\\ &\left.-\frac{\hat{V}_{\mathrm{v}}}{\hat{B}_{\mathrm{v}}} \widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}}}, {\hat{B}_{\mathrm{v}}} \right) \right) \end{array} $$
(A.32)

We may factor a quantity of \(\frac {1}{n}\) from the parentheses:

$$\begin{array}{*{20}l} \widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}}}, {\frac{\hat{V}_{\mathrm{v}}} {\hat{B}_{\mathrm{v}}}} \right) &= \frac{1}{n\hat{B}_{\mathrm{v}}}\left(\widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}_{s}}}, {\hat{V}_{\mathrm{v}_{s}}} \right) \right. \notag\\ &\left.- \frac{\hat{V}_{\mathrm{v}}}{\hat{B}_{\mathrm{v}}} \widehat{\text{cov}}\!\left({\hat{B}_{{\mathrm{c}}_{s}}}, {\hat{B}_{\mathrm{v}_{s}}} \right) \right) \end{array} $$
(A.33)
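A minimal sketch of how (A.33) might be computed from point-wise data follows. The function names and the data are hypothetical, tract area A=1 is assumed as in the derivations above, and the (n - 1) divisor for the point-wise sample covariances is an assumption of this sketch.

```python
def mean(xs):
    return sum(xs) / len(xs)


def sample_cov(xs, ys):
    """Sample covariance among point-wise values (n - 1 divisor)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def cov_ba_vbar(bc, bv, vv):
    """Approximate covariance (A.33) between the small-BAF basal area
    estimate and the ratio of mean volume to mean basal area.

    bc: point-wise basal area estimates from the small BAF
    bv: point-wise basal area estimates from the large BAF
    vv: point-wise volume estimates from the large BAF
    """
    n = len(bc)
    b_v_hat = mean(bv)
    v_v_hat = mean(vv)
    inner = sample_cov(bc, vv) - (v_v_hat / b_v_hat) * sample_cov(bc, bv)
    return inner / (n * b_v_hat)


# Hypothetical point-wise values for n = 5 sample points.
bc = [24.0, 30.0, 18.0, 27.0, 21.0]
bv = [20.0, 40.0, 20.0, 20.0, 20.0]
vv = [160.0, 330.0, 150.0, 170.0, 165.0]
print(cov_ba_vbar(bc, bv, vv))
```

Only point-level quantities enter the calculation, so no within-point tree variation is needed. As a sanity check, if volume were exactly proportional to large-BAF basal area at every point, the two terms inside the parentheses would cancel and the estimated covariance would be zero.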

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Declarations

Abbreviations

BAF: Basal area factor

BAFc: Count sample basal area factor

BAFv: Volume sample basal area factor

HPS: Horizontal point sampling

VBAR: Volume to basal area ratio

References

  1. Bailey, RL, Dell TR (1973) Quantifying diameter distributions with the Weibull function. For Sci 19:97–104.

  2. Bell, JF, Alexander LB (1957) Application of the variable plot method of sampling forest stands. Research Note 30, Oregon State Board of Forestry.

  3. Bell, JF, Iles K, Marshall DD (1983) Balancing the ratio of tree count-only sample points and VBAR measurements in variable plot sampling. In: Bell JF, Atterbury T (eds) Renewable Resource Inventories for Monitoring Changes and Trends, 699–702. College of Forestry, OSU, Corvallis, Oregon.

  4. Brooks, JR (2006) An evaluation of big basal area factor sampling in Appalachian hardwoods. North J Appl For 23(1):52–65.

  5. Bruce, D (1961) Prism Cruising in the western United States with volume tables for use therewith. Tech. rep. Mason, Bruce & Girard Consulting Foresters, Portland, Oregon.

  6. Chen, Y, Yang TR, Hsu YH, Kershaw JA, Prest D (2019) Application of big BAF sampling for estimating carbon on small woodlots. Forest Ecosyst 6(13):1–11.

  7. Cochran, W (1977) Sampling techniques. John Wiley, New York.

  8. Corrin, D (1998) A very Efficient sampling method for cruising timber. Tech. rep. John Bell Associates. http://www.john-bell-associates.com/guest/guest43a.htm. Accessed 17 Oct 2020.

  9. de Vries, PG (1986) Sampling Theory for Forest Inventory. A Teach Yourself Course. Springer-Verlag.

  10. Desmarais, KM (2002) Using BigBAF Sampling in a New England Mixedwood Forest. Tech. rep. John Bell Associates. http://www.john-bell-associates.com/guest/guest58b.htm. Accessed 17 Oct 2020.

  11. Fast, AJ, Ducey MJ (2011) Height-diameter equations for select New Hampshire tree species. North J Appl For 28(3):157–160.

  12. Goodman, LA (1960) On the exact variance of products. J Am Stat Assoc 55(292):708–713.

  13. Goodman, LA (1962) The Variance of the Product of K Random Variables. J Am Stat Assoc 57(297):54–60.

  14. Gove, JH (2011a) The Dendrometry Package. https://r-forge.r-project.org/projects/dendrometry/. Accessed 17 Oct 2020.

  15. Gove, JH (2011b) The “Stem” Class. sampSurf package vignette. http://CRAN.R-project.org/package=sampSurf. Accessed 17 Oct 2020.

  16. Gove, JH (2012) sampSurf: Sampling surface simulation. https://r-forge.r-project.org/projects/sampsurf/. Accessed 17 Oct 2020.

  17. Gove, JH, Gregoire TG, Ducey MJ, Lynch TB (2020) A note on the estimation of variance for big BAF sampling. Forest Ecosyst 7(62):1–14.

  18. Gove, JH, Valentine HT, Holmes MJ (2000) A field test of cut-off importance sampling for bole volume. In: Hansen M, Burk T (eds) Integrated tools for natural resources inventories in the 21st century, 372–376. U.S. Dept. of Agriculture, Forest Service, North Central Forest Experiment Station, St. Paul, MN, General Technical Report NC-212.

  19. Gregoire, TG, Valentine HT (2008) Sampling strategies for natural resources and the environment. Applied environmental statistics. Chapman & Hall/CRC, N.Y.

  20. Grosenbaugh, LR (1952) Plotless timber estimates, new, fast, easy. J For 50:32–37.

  21. Hansen, M, Hurwitz W, Madow W (1953) Sample survey methods and theory vol 1. John Wiley.

  22. Hartley, HO, Ross A (1954) Unbiased ratio estimates. Nature 174:220–271.

  23. Iles, K (2012) Some current subsampling techniques in forestry. Math Comput For Nat-Resour Sci 4(2):77–80.

  24. Kendall, M, Stuart A (1977) The advanced theory of statistics. 4th edn, Vol. 1. Macmillan.

  25. Kershaw, JA, Ducey MJ, Beers T, Husch B (2016) Forest Mensuration. 5th edn. Wiley-Blackwell.

  26. Leak, WB, Lamson NI (1999) Revised white pine stocking guide for managed stands. Tech. Rep. NA-TP-01-99, USDA Forest Service, Northeastern Area State and Private Forestry.

  27. Leak, WB, Yamasaki M, Holleran R (2014) Silvicultural Guide for Northern Hardwoods in the Northeast. General Technical Report NRS-132, USDA Forest Service, Northern Research Station.

  28. Marshall, DD, Iles K, Bell JF (2004) Using a large-angle gauge to select trees for measurement in variable plot sampling. Can J Forest Res 34:840–845.

  29. Masuyama, M (1953) A rapid method for estimating basal area in a timber survey—an application of integral geometry to areal sampling problems. Sankhyā 12(3):291–302.

  30. Oderwald, RG, Jones E (1992) Sample sizes for point, double sampling. Can J Forest Res 22:980–983.

  31. Palley, MN, Horwitz LG (1961) Properties of some random and systematic point sampling estimators. Forest Sci 7:52–65.

  32. R Core Team (2021) R: A Language and Environment for Statistical Computing, Vienna, Austria. https://www.R-project.org. Accessed 17 Oct 2020.

  33. Rice, B, Weiskittel AR, Wagner RG (2014) Efficiency of alternative forest inventory methods in partially harvested stands. Eur J Forest Res 133:261–272.

  34. Schreuder, H, Gregoire T, Wood G (1993) Sampling methods for multiresource forest inventory. John Wiley, New York.

  35. Seber, GAF (1982) The estimation of animal abundance and related parameters. 2nd edn. Charles Griffin & Company LTD.

  36. Sukhatme, P, Sukhatme B, Sukhatme S, Asok C (1984) Sampling theory of surveys with applications. 3rd edn. Iowa State University Press.

  37. Van Deusen, P (1990) Critical height versus importance sampling for log volume: does critical height prevail? Forest Sci 36(4):930–938.

  38. Venables, WN, Ripley BD (2002) Modern Applied Statistics with S. 4th edn. Springer, New York. http://www.stats.ox.ac.uk/pub/MASS4. Accessed 17 Oct 2020. ISBN 0-387-95457-0.

  39. Ver Hoef, JM (2012) Who Invented the Delta Method? Am Stat 66(2):124–127.

  40. Williams, MS (2001a) New approach to areal sampling in ecological surveys. Forest Ecol Manag 154:11–22.

  41. Williams, MS (2001b) Nonuniform random sampling: an alternative method of variance reduction for forest surveys. Can J Forest Res 31:2080–2088.

  42. Wolter, KM (2007) Introduction to variance estimation. Springer-Verlag.

  43. Yang, S, Burkhart H (2019) Comparison of volume and stand table estimates with alternate methods for selecting measurement trees in point samples. Forestry 92:42–51.

  44. Yang, TR, Hsu YH, Kershaw ME, Kilham D (2017) Big BAF sampling in mixed species forest structures of northeastern North America: influence of count and measure BAF under cost constraints. Forestry 90:649–660.

Acknowledgements

This is the second in a series of two papers dedicated to a pioneer of research and education in point sampling. In recognition of his many contributions and insights in areas relating to forest inventory, and his influential teaching and mentoring of many students, professionals and inventory scientists, we dedicate this paper in memory of our esteemed colleague Dr. John F. Bell.

Funding

MJD: Support was provided by Research Joint Venture Agreement 17-JV-11242306045, “Old Growth Forest Dynamics and Structure,” between the USDA Forest Service and the University of New Hampshire. Additional support to MJD was provided by the USDA National Institute of Food and Agriculture McIntire-Stennis Project Accession Number 1020142, “Forest Structure, Volume, and Biomass in the Northeastern United States.” TBL: This work was supported by the USDA National Institute of Food and Agriculture, McIntire-Stennis project OKL0 2834 and the Division of Agricultural Sciences and Natural Resources at Oklahoma State University.

Author information

Contributions

TBL initial concept, writing main manuscript, deriving bias and variance equations; JHG conducting computer simulations, deriving equations, assisted in writing manuscript, writing Supplemental Materials; TGG and MJD assisted in study design and contributed ideas, text and comments to the manuscript. All authors read and approved the final manuscript. The findings and conclusions in this publication are those of the authors and should not be construed to represent any official USDA or U.S. Government determination or policy.

Corresponding author

Correspondence to Thomas B. Lynch.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

Supplementary Material: An Approximate Point-Based Alternative for the Estimation of Variance under Big BAF Sampling

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Lynch, T.B., Gove, J.H., Gregoire, T.G. et al. An approximate point-based alternative for the estimation of variance under big BAF sampling. For. Ecosyst. 8, 33 (2021). https://doi.org/10.1186/s40663-021-00304-0

Keywords

  • Bitterlich sampling
  • Delta method
  • Double sampling
  • Estimator bias
  • Forest inventory
  • Horizontal point sampling
  • Variance of a product
  • Volume basal area ratio
  • Covariance estimation