Use of models in large-area forest surveys: comparing model-assisted, model-based and hybrid estimation
 Göran Ståhl^{1},
 Svetlana Saarela^{1},
 Sebastian Schnell^{1},
 Sören Holm^{1},
 Johannes Breidenbach^{2},
 Sean P. Healey^{3},
 Paul L. Patterson^{3},
 Steen Magnussen^{4},
 Erik Næsset^{5},
 Ronald E. McRoberts^{3} and
 Timothy G. Gregoire^{6}
Received: 12 November 2015
Accepted: 17 February 2016
Published: 18 February 2016
Abstract
This paper focuses on the use of models for increasing the precision of estimators in large-area forest surveys. It is motivated by the increasing availability of remotely sensed data, which facilitates the development of models predicting the variables of interest in forest surveys. We present, review and compare three different estimation frameworks where models play a core role: model-assisted, model-based, and hybrid estimation. The first two are well known, whereas the third has only recently been introduced in forest surveys. Hybrid inference mixes design-based and model-based inference, since it relies on a probability sample of auxiliary data and a model predicting the target variable from the auxiliary data. We review studies on large-area forest surveys based on model-assisted, model-based, and hybrid estimation, and discuss advantages and disadvantages of the approaches. We conclude that no general recommendations can be made about whether model-assisted, model-based, or hybrid estimation should be preferred. The choice depends on the objective of the survey and the possibilities to acquire appropriate field and remotely sensed data. We also conclude that modelling approaches can only be successfully applied for estimating target variables such as growing stock volume or biomass, which are adequately related to commonly available remotely sensed data, and thus purely field-based surveys remain important for several forest parameters.
Keywords
Design-based inference; Model-assisted estimation; Model-based inference; Hybrid inference; National forest inventory; Remote sensing; Sampling
Introduction
Use of models in large-area surveys of forests is attracting increased interest. The reason is the improved availability of auxiliary data from various remote sensing platforms. Aerial photographs (e.g., Næsset 2002a, Bohlin et al. 2012) and optical satellite data (e.g., Reese et al. 2002) have been available and used operationally for many decades, while data from profiling (e.g., Nelson et al. 1984, Nelson et al. 1988) and scanning lasers (e.g., Næsset 1997) and radars (Solberg et al. 2010) have become available for practical applications more recently. Some of the new types of remotely sensed data, such as data from laser scanners, have already become widely applied in forest inventories (e.g., Næsset 2002b). A common application involves the development of models that are applied wall-to-wall over an area of interest (e.g., Næsset 2004), often for providing data for forest management. However, this type of data is increasingly applied also in connection with large-area forest surveys, such as national-level forest inventories (Tomppo et al. 2010, Asner et al. 2012).
Applications of models in large-area forest surveys often use the model-assisted estimation framework (Särndal et al. 1992), where a model is used to support the estimation following probability sampling within the context of design-based inference (Gregoire 1998). Importantly, an inadequately specified model will not make the estimators biased in this case, but will only affect the variance of the estimators. Examples of large-area forest inventory applications include Andersen et al. (2011), who applied the technique in Alaska, Gregoire et al. (2011) and Gobakken et al. (2012), who applied it in Hedmark County, Norway, and Saarela et al. (2015a), who used it in Kuortane, Finland.
Some applications of models in large-area forest surveys involve model-based inference (Gregoire 1998), which to a larger extent than model-assisted estimation relies on model assumptions. In this case an inadequately specified model might make the estimators both biased and imprecise. On the other hand, with accurate models this mode of inference can be very efficient (e.g., Magnussen 2015). Examples of applications in forest inventory include McRoberts (2006, 2010), who used model-based inference for estimating forest area based on Landsat data in northern Minnesota, U.S.A., Ståhl et al. (2011), who used it for estimating biomass in Hedmark, Norway, using laser data, and Healey et al. (2012), who applied the technique in California, U.S.A., using data from the spaceborne Geoscience Laser Altimeter System (GLAS).
Non-parametric modelling, applying methods such as the k-Nearest Neighbours (kNN) technique (Tomppo and Katila 1991, Tomppo et al. 2008), has a long tradition in forest inventories. These techniques typically have been applied for providing small-area estimates through combining field sample plots and various sources of remotely sensed data. However, the kNN technique has also been used in connection with model-assisted estimation (e.g., Baffetta et al. 2009, 2011, Magnussen and Tomppo 2015) and model-based inference (e.g., McRoberts et al. 2007).
The objective of this paper was to present, review and discuss how models are applied in the case of model-assisted and model-based estimation in large-area forest surveys, and to discuss advantages and disadvantages of the two estimation frameworks in this context. We also present, review and discuss a newly introduced estimation framework where probability sampling is applied for the selection of auxiliary data, upon which model-based inference is applied in a second phase. This framework is denoted hybrid inference, after Corona et al. (2014).
We restrict the study to large-area estimation. This is the case of national forest inventories and greenhouse gas inventories under the United Nations Framework Convention on Climate Change (e.g., Tomppo et al. 2010). Importantly, in this case there is no need to make assumptions about residual error terms linked to individual population elements, which is a core issue in model-based small-area estimation (e.g., Breidenbach and Astrup 2012, Breidenbach et al. 2015). The reason is that the residual error terms will have almost no influence on the results, as will be demonstrated below. However, we do not specify how large a “large area” must be, but use the term as a general concept.
Below, we present the basics of model-assisted, model-based, and hybrid inference (chapter 2). Subsequently we present a brief review of the application of these methods in forest surveys (chapter 3), and, finally, we discuss advantages and disadvantages of the different approaches and draw conclusions (chapters 4 and 5).
Basics of model-assisted, model-based and hybrid estimation
In this chapter we summarize some basic concepts related to the use of models in large-area forest surveys. We restrict the scope to cases where models are applied for improving estimators (or predictors) once sample or wall-to-wall data have been collected. Models may also be used in the design phase for improving the sample selection (e.g., Fattorini et al. 2009, Grafström et al. 2014), but such cases are not covered in this article.
Design-based inference
This paper requires a basic understanding of the concepts of design-based and model-based inference (e.g., Cassel et al. 1977, Särndal 1978, Gregoire 1998, McRoberts 2010).
Design-based inference typically assumes a finite population of elements to which one or more fixed target quantities are linked. The objective normally is to estimate some fixed population parameter, such as the total or the mean of these quantities (e.g., Gregoire and Valentine 2008). In order to estimate the fixed but unknown parameters, a probability sample is selected from the population according to some appropriate sampling design, which assigns a positive inclusion probability to each element. Mathematical formulas (estimators) are used for estimating the parameters based on the sample data. The estimates are random variables due to the random selection of samples, i.e., the estimators produce different values depending on which population elements are included in the sample.
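A standard estimator of this kind, and presumably the one referred to as Eq. (1) and later as the ordinary Horvitz-Thompson estimator, is the Horvitz-Thompson estimator of the population total. The symbol \( \widehat{\tau} \) for the estimated total is an assumption of this reconstruction:

```latex
\[
\widehat{\tau} = \sum_{i \in s} \frac{y_i}{\pi_i} \tag{1}
\]
```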
Here, y_{i} is the variable of interest for the i-th sampled element, π_{i} is its inclusion probability, and s is the sample.
The precision of an estimator is usually expressed through its variance, which is a fixed quantity given the population, the design, and the estimator. The variance usually can be estimated through a variance estimator, and confidence intervals can be computed as a means to provide decision makers with the range of values wherein the true population parameter is located with a defined probability.
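For the Horvitz-Thompson estimator of a total, the variance has a well-known general form, which we assume is what Eq. (2) denotes. The symbol U for the set of all population elements is introduced here for illustration, and π_{ii} is taken to equal π_{i}:

```latex
\[
\operatorname{var}\left(\widehat{\tau}\right)
  = \sum_{i \in U} \sum_{j \in U}
    \left(\pi_{ij} - \pi_i \pi_j\right)
    \frac{y_i}{\pi_i}\,\frac{y_j}{\pi_j} \tag{2}
\]
```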
In addition to the previously introduced notation, π_{ij} is the joint probability of inclusion for units i and j. The step from the variance to a variance estimator and a confidence interval normally is straightforward (e.g., Gregoire and Valentine 2008).
To summarize, the core features of design-based inference are:
The values that are linked to the population elements are fixed.
The population parameters about which we wish to infer information are also fixed.
Our estimators of the parameters are random because a probability sample is selected according to some sampling design, such as simple random sampling.
The probability of obtaining different samples can be deduced from the design and used for inference.
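As a minimal numerical sketch of design-based estimation, the following Python code computes a Horvitz-Thompson estimate of a total, a variance estimate, and an approximate confidence interval. It assumes simple random sampling without replacement, so that every inclusion probability equals n/N. The plot values, the population size, and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of a population total: sum of y_i / pi_i."""
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    return float(np.sum(y / pi))

def srs_variance_estimate(y, N):
    """Variance estimate of the estimated total under simple random
    sampling without replacement: N^2 * (1 - n/N) * s^2 / n."""
    y = np.asarray(y, dtype=float)
    n = y.size
    return float(N**2 * (1.0 - n / N) * np.var(y, ddof=1) / n)

# Hypothetical example: 5 field plots sampled from N = 100 population units.
y_sample = np.array([120.0, 95.0, 143.0, 80.0, 110.0])
N = 100
pi = np.full(y_sample.size, y_sample.size / N)  # equal inclusion probabilities

total_hat = horvitz_thompson_total(y_sample, pi)
var_hat = srs_variance_estimate(y_sample, N)
# Normal approximation; with so few plots a t-quantile would be more appropriate.
half_width = 1.96 * np.sqrt(var_hat)
print(total_hat, var_hat, (total_hat - half_width, total_hat + half_width))
```

For designs with unequal inclusion probabilities only the `pi` vector changes in the point estimator, whereas the variance estimator would need the joint inclusion probabilities.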
The foundations of design-based inference were laid out by Neyman (1934) and it is the standard mode of inference in most statistical surveys, including sample-based national forest inventories (Tomppo et al. 2010) that are carried out in a large number of countries.
Design-based inference through model-assisted estimation
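In model-assisted estimation a model provides predictions ŷ_{i} of the target variable from auxiliary data, and the design-based estimator is applied to the deviations between observed and predicted values. A generic form following Särndal et al. (1992), which we assume corresponds to Eqs. (3) and (4), is given below. The subscript "ma" and the symbol U for the set of all population elements are notational assumptions, and the variance expression is approximate when the model is fitted from the sample:

```latex
\[
\widehat{\tau}_{ma} = \sum_{i \in U} \widehat{y}_i
  + \sum_{i \in s} \frac{y_i - \widehat{y}_i}{\pi_i} \tag{3}
\]
\[
\operatorname{var}\left(\widehat{\tau}_{ma}\right)
  \approx \sum_{i \in U} \sum_{j \in U}
    \left(\pi_{ij} - \pi_i \pi_j\right)
    \frac{e_i}{\pi_i}\,\frac{e_j}{\pi_j},
  \qquad e_i = y_i - \widehat{y}_i \tag{4}
\]
```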
This is almost the same expression as the variance in Eq. (2), but the y_{i} terms have been replaced by e_{i} = y_{i} − ŷ_{i}. If an accurate model is used the latter terms should be much smaller than the former, and thus the variance of the model-assisted estimator should be much smaller than the variance of the ordinary Horvitz-Thompson estimator, although this is not immediately clear when comparing Eq. (2) and Eq. (4).
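The variance reduction can be illustrated with a small simulation. The sketch below assumes a linear working model fitted by ordinary least squares, wall-to-wall auxiliary data, and simple random sampling without replacement; the population, its parameters, and the function name `model_assisted_total` are hypothetical.

```python
import numpy as np

def model_assisted_total(y_s, X_s, pi_s, X_pop):
    """Sum of model predictions over the population plus the
    Horvitz-Thompson-weighted sum of the sample residuals."""
    beta_hat, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)  # OLS fit on the sample
    residuals = y_s - X_s @ beta_hat                       # e_i = y_i - y_hat_i
    return float((X_pop @ beta_hat).sum() + np.sum(residuals / pi_s))

# Hypothetical population: one auxiliary variable strongly related to y.
rng = np.random.default_rng(1)
N, n = 2000, 50
x = rng.uniform(5.0, 30.0, N)
X_pop = np.column_stack([np.ones(N), x])
y = 10.0 + 8.0 * x + rng.normal(0.0, 15.0, N)
pi_s = np.full(n, n / N)  # simple random sampling without replacement

# Repeat the sampling to compare the spread of the two estimators.
ht_estimates, ma_estimates = [], []
for _ in range(200):
    idx = rng.choice(N, size=n, replace=False)
    ht_estimates.append(np.sum(y[idx] / pi_s))
    ma_estimates.append(model_assisted_total(y[idx], X_pop[idx], pi_s, X_pop))

ht_sd = float(np.std(ht_estimates, ddof=1))
ma_sd = float(np.std(ma_estimates, ddof=1))
print(ht_sd, ma_sd)  # the model-assisted estimator varies much less here
```

The gain depends entirely on how closely the auxiliary variable tracks the target variable; with a weak relationship the two standard deviations would be similar.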
Model-based inference
In contrast to design-based inference (including model-assisted estimators), a basic assumption underlying model-based inference is that the values that are linked to the elements in the population are realizations of random variables. As a consequence, target survey quantities such as population totals and means are also random variables. Thus, due to the different points of view underlying design-based and model-based inference, some caution must be exercised when comparing results from the two inferential frameworks. For example, with model-based inference the random population total (or mean) may be predicted or (as in this study) the expected value of the population total may be estimated. For large populations the difference between these two quantities, in relative terms, typically is minor, although for small populations the relative difference may be substantial. However, just like design-based inference, model-based inference in many cases is a useful and straightforward approach for quantifying target features of a population (e.g., Chambers and Clark 2012). In forest inventories, examples of such cases are surveys of remote areas with poor road infrastructure and small-area estimation for forest management. In both cases the field sample sizes typically are small or acquired through non-probability sampling, whereas remotely sensed data are available wall-to-wall.
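The model assumed for the population values is taken here to be linear in the parameters, which is the form implied by the definitions that follow and by the expression 1′Xβ + 1′ε used later in this section. We assume this is what Eq. (5) denotes:

```latex
\[
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \tag{5}
\]
```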
where Y is an N × 1 matrix of the target variable, X an N × p matrix of auxiliary data, β is a p × 1 matrix of model parameters, and ϵ an N × 1 matrix of random variables that follow some joint probability distribution; N is the population size; in a forest survey it might be the number of grid cells which tessellate the study area.
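With the model parameters estimated from a sample, the random population total \( \tau^{*} = \mathbf{1}'\mathbf{Y} \) can be predicted by summing the model predictions over all population elements. This form is consistent with the expression \( \mathbf{1}'\widehat{\mathbf{Y}} \) given later in this section, and we assume it is what Eq. (6) denotes:

```latex
\[
\widehat{\tau^{*}} = \mathbf{1}'\widehat{\mathbf{Y}}
  = \mathbf{1}'\mathbf{X}\widehat{\boldsymbol{\beta}} \tag{6}
\]
```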
Note the distinction in nomenclature between estimating a fixed but unknown value (a population parameter) and predicting a random variable (e.g., Särndal 1978, Gregoire 1998). Note also that some authors (Chambers and Clark 2012) present the model-based predictor as a sum of two terms: the sum of the values of the sampled elements and the sum of the predictions for the non-sampled elements. The difference between such a predictor and Eq. (6) would, however, be very small in case a small sample is selected from a large population.
Turning to the mean square error of the predictor in Eq. (6), we need to acknowledge that uncertainty is introduced both by the estimation of the model parameters and by the random residual terms linked to each population element. Since the residuals may often be spatially autocorrelated, estimating the mean square error of the Eq. (6) predictor may be very complicated.
However, an important feature of large-area surveys is that the relative difference between τ^{*} and E(τ^{*}) typically is very small (e.g., Chambers and Clark 2012, p. 16). The relative difference is 1′ε/(1′Xβ + 1′ε), which intuitively can be seen to tend to zero as N tends to infinity, since in the cases we focus on the X_{i}β terms are almost always positive and typically much larger (in absolute value) than the residual terms, which may be either negative or positive. Thus, instead of predicting τ^{*}, in large-area estimation we can estimate E(τ^{*}), which simplifies the model-based inference. The estimator will be identical to Eq. (6), i.e., \( \widehat{E(\tau^{*})} = \mathbf{1}'\widehat{\mathbf{Y}} \), but it is now an estimator rather than a predictor. The variance (due to the model) of this estimator is simpler to derive, since it does not involve any residual terms; thus uncertainty in this case is introduced only through the model parameter estimation.
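The shrinking relative difference can be checked numerically. The sketch below assumes independent, zero-mean residuals and hypothetical parameter values; with spatially autocorrelated residuals the ratio would be expected to shrink more slowly, which this simple simulation does not cover.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([20.0, 5.0])  # hypothetical model parameters

def relative_difference(N):
    """Relative difference 1'eps / (1'X beta + 1'eps) for one simulated population."""
    X = np.column_stack([np.ones(N), rng.uniform(5.0, 30.0, N)])
    eps = rng.normal(0.0, 25.0, N)  # independent residuals (an assumption)
    return float(eps.sum() / ((X @ beta).sum() + eps.sum()))

# Absolute relative difference for increasingly large populations.
rel = {N: abs(relative_difference(N)) for N in (100, 10_000, 1_000_000)}
for N, value in rel.items():
    print(N, value)
```

Because the residual sum grows roughly with the square root of N while the systematic part grows with N, the ratio is typically a few percent for N = 100 and a small fraction of a percent for N = 1,000,000 in this setting.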
The matrix \( \mathrm{cov}(\widehat{\boldsymbol{\beta}}) \) is the variance-covariance matrix of the model parameter estimates. A variance estimator is obtained by inserting the estimated covariance matrix in Eq. (7).
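A minimal sketch of these computations is given below. It assumes that Eq. (7) has the standard form \( \mathrm{var}(\widehat{E(\tau^{*})}) = \mathbf{1}'\mathbf{X}\,\mathrm{cov}(\widehat{\boldsymbol{\beta}})\,\mathbf{X}'\mathbf{1} \), that the model is fitted by ordinary least squares with homoskedastic, uncorrelated residuals, and that the auxiliary data are available wall-to-wall. All data and the function name `model_based_total` are hypothetical.

```python
import numpy as np

def model_based_total(y_s, X_s, X_pop):
    """Estimate E(tau*) = 1'X beta and its model-based variance
    1'X cov(beta_hat) X'1 under OLS assumptions."""
    n, p = X_s.shape
    xtx_inv = np.linalg.inv(X_s.T @ X_s)
    beta_hat = xtx_inv @ X_s.T @ y_s
    residuals = y_s - X_s @ beta_hat
    sigma2_hat = float(residuals @ residuals) / (n - p)  # residual variance
    cov_beta_hat = sigma2_hat * xtx_inv                  # sigma^2 (X'X)^-1
    col_totals = X_pop.sum(axis=0)                       # the row vector 1'X
    estimate = float(col_totals @ beta_hat)
    variance = float(col_totals @ cov_beta_hat @ col_totals)
    return estimate, variance

# Hypothetical wall-to-wall auxiliary data for N grid cells.
rng = np.random.default_rng(3)
N, n = 5000, 40
X_pop = np.column_stack([np.ones(N), rng.uniform(5.0, 30.0, N)])
sample = np.arange(n) * 3  # a purposive (non-random) selection of cells
y_s = X_pop[sample] @ np.array([15.0, 6.0]) + rng.normal(0.0, 12.0, n)

estimate, variance = model_based_total(y_s, X_pop[sample], X_pop)
print(estimate, np.sqrt(variance))
```

The sample here is deliberately non-random to reflect that model-based inference does not require a probability sample; its validity rests on the model instead.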

To summarize, the core features of model-based inference are:
The values linked to population elements are random variables.
Since the individual values are random variables, so is the population total or mean that we wish to predict.
A model for the relationship between the target variable and one or more auxiliary variable(s) can adequately conform to the trend in Y.
Auxiliary data are commonly available for all population elements.
After having selected a sample (which need not be random) for estimating the model parameters, we apply the fitted model for predicting the target population quantity or estimating the expected value of this quantity.
Hybrid inference: a special case of model-based inference
Auxiliary data may not be available prior to a forest survey and they may be very expensive to collect for all units in a population, as required for standard application of model-based inference. In such cases a probability sample of auxiliary data can be acquired, based on which the population total or mean of the auxiliary variable is estimated following design-based inference. A model can still be specified and applied regarding the relationship between the study variable and the auxiliary variables, and thus model-based inference can be applied once the auxiliary variable totals (or means) have been estimated through design-based inference.
Thus, design-based principles are applied in a first phase and model-based principles in a second phase. This approach was termed hybrid inference by Corona et al. (2014) and in the present paper we follow that terminology. In a previous study by Mandallaz (2013) it was called pseudo-synthetic estimation. In a study by Ståhl et al. (2011) it was simply called model-based inference, although later denoted model-dependent estimation by Gobakken et al. (2012). However, the term model-dependent estimation appears to have been first proposed by Hansen et al. (1978, 1983) to include all sampling strategies that depend on the correctness of a model; according to Hansen et al. (1978) “a model-dependent design consists of a sampling plan and estimators for which either the plan or the estimators, or both, are chosen because they have desirable properties under an assumed model, and for which the validity of inferences about the population depends on the degree to which the population conforms to the assumed model.” Thus, standard model-based inference as well as hybrid inference, and other approaches, belong to Hansen’s model-dependent category.
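Under a linear model, the hybrid estimator of the expected population total can be written as the Horvitz-Thompson-weighted sum of the model predictions over the sample of auxiliary data. The form below is consistent with the definitions that follow and is assumed to be what Eq. (8) denotes; the symbol \( \widehat{\boldsymbol{\tau}}_{x} \) for the vector of estimated auxiliary variable totals is introduced for illustration:

```latex
\[
\widehat{E\left(\tau^{*}\right)}
  = \sum_{i \in s} \frac{\mathbf{X}_i \widehat{\boldsymbol{\beta}}}{\pi_i}
  = \boldsymbol{\pi}'\mathbf{X}\widehat{\boldsymbol{\beta}}
  = \widehat{\boldsymbol{\tau}}_{x}'\widehat{\boldsymbol{\beta}} \tag{8}
\]
```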
where s is the sample of auxiliary data, π_{i} is the probability of including population element i into the auxiliary data sample, π is an n-length column vector of (1/π_{i}) values, and X is an n × p matrix of sampled auxiliary data. The model parameters are estimated from a sample that is assumed to be independent from the sample of auxiliary data.
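Because the estimated auxiliary totals and the estimated model parameters are assumed independent, the variance of their product has three components. The expression below follows from the general variance of an inner product of two independent random vectors and is assumed to be what Eq. (9) denotes; \( \boldsymbol{\tau}_{x} \) denotes the vector of true auxiliary variable totals, a symbol introduced here for illustration:

```latex
\[
\operatorname{var}\left(\widehat{E\left(\tau^{*}\right)}\right)
  = \boldsymbol{\beta}'\,\mathrm{cov}\left(\widehat{\boldsymbol{\tau}}_{x}\right)\boldsymbol{\beta}
  + \boldsymbol{\tau}_{x}'\,\mathrm{cov}\left(\widehat{\boldsymbol{\beta}}\right)\boldsymbol{\tau}_{x}
  + \mathrm{Tr}\left[\mathrm{cov}\left(\widehat{\boldsymbol{\tau}}_{x}\right)
      \mathrm{cov}\left(\widehat{\boldsymbol{\beta}}\right)\right] \tag{9}
\]
```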
where \( \mathrm{cov}(\widehat{\boldsymbol{\tau}}_{x}) \) is the covariance matrix of the estimators of the auxiliary variable totals and \( \mathrm{cov}(\widehat{\boldsymbol{\beta}}) \) is the covariance matrix of the model parameter estimators. The Tr-operator is the trace, i.e., the sum of the diagonal entries in the matrix. The diagonal entries in \( \mathrm{cov}(\widehat{\boldsymbol{\tau}}_{x}) \) are of the kind presented in Eq. (2). The off-diagonal entries are computed in a similar fashion (Särndal et al. 1992). The covariance matrix of the model parameter estimators normally, under ordinary least squares regression assumptions, is derived as σ^{2}(X′X)^{−1}, where σ^{2} is the residual variance, given the regression model. In case of heteroskedastic residual variance, alternative estimators can be applied (e.g., Saarela et al. 2015b). We do not offer a proof of Eq. (9), but readers familiar with the variance of a product of two independent random variables (i.e., var(WZ) = E(W)^{2} var(Z) + E(Z)^{2} var(W) + var(W)var(Z)) can identify the similarity with Eq. (9).
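The three variance components can be sketched in code as follows. The example assumes that the auxiliary data are sampled by simple random sampling without replacement and that the variance has the product form described above, with estimates plugged in for the unknown quantities. This plug-in choice is an assumption of the sketch and is not necessarily the variance estimator used in the cited studies. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def srs_cov_of_totals(X_aux, N):
    """Estimated covariance matrix of the estimated auxiliary totals under
    simple random sampling without replacement of n units out of N."""
    n = X_aux.shape[0]
    return N**2 * (1.0 - n / N) * np.cov(X_aux, rowvar=False) / n

def hybrid_total(X_aux, pi_aux, cov_tau_x, beta_hat, cov_beta):
    """Hybrid estimate pi'X beta_hat and a plug-in variance with three terms:
    beta' cov(tau_x) beta + tau_x' cov(beta) tau_x + Tr[cov(tau_x) cov(beta)]."""
    tau_x_hat = X_aux.T @ (1.0 / pi_aux)  # design-based estimates of auxiliary totals
    estimate = float(tau_x_hat @ beta_hat)
    variance = float(
        beta_hat @ cov_tau_x @ beta_hat       # sampling of auxiliary data
        + tau_x_hat @ cov_beta @ tau_x_hat    # model parameter uncertainty
        + np.trace(cov_tau_x @ cov_beta)      # interaction term
    )
    return estimate, variance

# Toy example: two sampled units out of N = 4, intercept plus one auxiliary variable.
X_aux = np.array([[1.0, 2.0], [1.0, 4.0]])
pi_aux = np.array([0.5, 0.5])
cov_tau_x = srs_cov_of_totals(X_aux, N=4)

# Model parameters and their covariance are assumed to come from an independent sample.
beta_hat = np.array([1.0, 2.0])
cov_beta = np.diag([0.1, 0.01])

estimate, variance = hybrid_total(X_aux, pi_aux, cov_tau_x, beta_hat, cov_beta)
print(estimate, variance)
```

Keeping the three terms separate makes it easy to see whether the uncertainty is dominated by the sampling of auxiliary data or by the model parameter estimation, which indicates where additional data would help most.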
Although it seems likely that hybrid-type estimators have been applied outside forest inventories, we have not yet found any description of them in non-forest publications.
A brief review of the use of models in large-area forest surveys

The review covers three uses of models:
Use of models in the context of design-based inference through model-assisted estimation.
Use of models in the context of model-based inference through model-based estimation.
Use of models in the context of hybrid inference.
Model-assisted estimation in large-area forest surveys
Formal model-assisted estimators appear to be fairly recently introduced to large-area forest surveys, although standard regression estimators (i.e., a simple kind of model-assisted estimators) have been applied in forest surveys for a long time. Important examples of the latter kind are the Swiss national forest inventory (Köhl and Brassel 2001), where air photo interpretation has long been combined with field surveys, and the Italian national forest inventory, where a three-phase sampling approach is applied (Fattorini et al. 2006).
An early model-assisted study was conducted by Breidt et al. (2005), who used spline models in estimating population totals in a simulation study linked to surveys of forest health. Model-assisted estimation was found to perform well in the context of a two-phase survey with multiple auxiliary variables.
Opsomer et al. (2007) used model-assisted estimation in a two-phase systematic sampling design, applying generalized additive models linking ground measurements with auxiliary information from remote sensing. The study was an extension of the study by Breidt and Opsomer (2000), where univariate models and a single-phase sampling strategy were applied.
In Boudreau et al. (2008), model-assisted estimation was used for estimating biomass in Quebec, Canada, based on data from a laser profiler, GLAS satellite data, and land cover maps based on data from Landsat-7 ETM+. The study demonstrated that GLAS data could improve monitoring of aboveground biomass at large spatial scales; however, the presented estimators were not denoted “model-assisted”. Nelson et al. (2009) built upon the study by Boudreau et al. (2008) and introduced some new, partly model-based, estimation techniques. Andersen et al. (2009) presented a study based on model-assisted estimation where the biomass of western Kenai, Alaska, was estimated based on samples of field and laser scanner data.
In Gregoire et al. (2011) model-assisted estimation was used for estimating aboveground biomass in Hedmark County, Norway, using sample data from laser profilers and scanners. The study triggered the start of a series of studies where the model-assisted theory, developed by Särndal et al. (1992), was applied for large-scale forest surveys based on samples of laser scanner data. Næsset et al. (2011) applied and compared two sources of auxiliary information, laser scanner data and interferometric synthetic aperture radar data, for model-assisted estimation of biomass over a large boreal forest area in the Aurskog-Høland municipality in Norway and quantified to what extent the two types of auxiliary data improved the estimated precision. Gobakken et al. (2012) compared the performance of model-assisted estimation with model-based prediction of aboveground biomass in Hedmark County, Norway, using data from airborne laser scanning as auxiliary data. The two approaches were found to yield similar results. Nelson et al. (2012) conducted a similar study over the same area using data from a profiling rather than scanning airborne laser, while Næsset et al. (2013b) evaluated the precision of the two-stage model-assisted estimation conducted by Gobakken et al. (2012). The authors noted the sensitivity of variance estimators to unequal sample strip length and systematically selected strips. The latter issue was further pursued by Ene et al. (2012), who showed that the variance was often severely overestimated when estimators assuming simple random sampling were applied in this context. Similar results were reported by Magnussen et al. (2014).
Strunk et al. (2012a, 2012b) investigated different aspects of model-assisted estimation. For example, the authors found that the laser pulse density had almost no effect on the precision of model-assisted estimators of core parameters, such as basal area, volume, and biomass.
Saarela et al. (2015a) proposed to use probability-proportional-to-size sampling of laser scanning strips in a two-phase model-assisted sampling study where the total growing stock volume was estimated in a boreal forest area in Kuortane, Finland. It was also found that full cover of Landsat auxiliary information improved the precision of estimators compared to using only sampled LiDAR strip data.
Massey et al. (2014) evaluated the performance of the model-assisted estimation technique in connection with the Swiss national forest inventory. The authors also addressed several methodological issues and, among other things, evaluated the performance of non-parametric methods in connection with model-assisted estimation and the close connection between difference estimators and regression estimators.
As some of the first laser scanning campaigns carried out for inventory purposes at the turn of the millennium have been repeated in recent years, change estimation assisted by laser data has become an important research area. Bollandsås et al. (2013), Næsset et al. (2013a, 2015), Skowronski et al. (2014), McRoberts et al. (2015), and Magnussen et al. (2015) analysed different approaches to modelling of change in biomass, such as separate modelling of biomass at each point in time followed by estimation of the difference, direct modelling of change with different predictor variables (such as the variables at each time point or their differences), and longitudinal models. These modelling techniques have been combined with different design-based and model-based estimators to produce change estimates and confidence intervals. Sannier et al. (2014) investigated change estimation based on a series of maps, which provided the auxiliary data for model-assisted difference estimation. A comprehensive review and discussion of change estimation can be found in McRoberts et al. (2014, 2015). Melville et al. (2015) evaluated three model-based and three design-based methods for assessing the number of stems using airborne laser scanning data. The authors reported that among the design-based estimators, the most precise estimates were achieved through stratification.
Stephens et al. (2012) applied double sampling regression estimators in the design-based framework for estimating carbon stocks in New Zealand forests using laser data as auxiliary information.
Chirici et al. (2016) compared the performance of two types of airborne LiDAR-based metrics in estimating total aboveground biomass through model-assisted estimators. The study area was located in Molise Region in central Italy. Corona et al. (2015) dealt with the use of map data as auxiliary information in a similar context.
Model-based and hybrid inference in large-area forest surveys
McRoberts (2006, 2010) applied model-based inference for estimating forest area using Landsat data as auxiliary information together with field plot data. The studies were performed in northern Minnesota, U.S.A. In the studies the expected value of the total forest area was estimated, as a means to reduce the complexity of the variance estimators.
A large number of studies have applied model-based prediction for mapping forest attributes across large areas using remotely sensed auxiliary information. Baccini et al. (2008) used moderate resolution imaging spectroradiometer (MODIS) and GLAS data for mapping aboveground biomass across tropical Africa. Armston et al. (2009) used Landsat-5 TM and Landsat-7 ETM+ sensors for predicting foliage projective cover across a large area in Queensland, Australia. Asner et al. (2010) applied model-based prediction for mapping the aboveground carbon stocks using satellite imaging, airborne LiDAR and field plots over 4.3 million ha of Peruvian Amazon. Helmer et al. (2010) used time series from 24 Landsat TM/ETM+ and Advanced Land Imager (ALI) scenes for mapping forest attributes on the island of Eleuthera. These are only examples of a very large number of studies where wall-to-wall remotely sensed data have been applied for mapping and monitoring forest resources. However, a majority of these studies do not apply a formal model-based inferential framework. For example, in case the uncertainty of estimators is addressed, usually the strict model-based inference approach [Eq. (7)] is not applied but instead some other, often ad hoc, method that does not correctly reflect the uncertainty of the estimator or predictor involved.
Saarela et al. (2015b) evaluated the effects of model form and sample size on the precision of model-based estimators in the study area Kuortane, Finland, and identified minor to moderate differences in results when different model forms were applied. In a simulation study, Magnussen (2015) demonstrated the usefulness of model-based inference for forest surveys and argued that this approach has several advantages over traditional design-based sampling. McRoberts et al. (2014a, b) assessed the effects of uncertainty in individual tree volume model predictions on large-area volume estimates in the survey framework of hybrid inference.
As previously mentioned, Corona et al. (2014) proposed to use the term hybrid inference for the case where a probability sample of auxiliary data is selected, on which model-based inference is applied; the study by Corona et al. mainly dealt with small-area estimation issues. Ståhl et al. (2011), Gobakken et al. (2012), Nelson et al. (2012) and Magnussen et al. (2014) used hybrid inference for estimating the forest resources in Hedmark County, Norway, based on combinations of laser scanner data, laser profiler data, and field data. In the study by Magnussen et al. two populations were simulated using the data. Healey et al. (2012) applied the technique in California, using GLAS data. In a study of boreal forests in Canada, Margolis et al. (2015) likewise used GLAS data, in combination with airborne laser data, to estimate aboveground biomass.
Geographical mismatches between remotely sensed data and field measurements may considerably affect the precision of estimators in large-area surveys. The effects of such errors in model-based and model-assisted estimation were evaluated by Saarela et al. (2016).
Discussion
The review revealed that use of models in large-scale forest inventories is widespread, although statistically strict applications of model-assisted estimators, model-based inference, or hybrid inference are rather limited. While the model-assisted estimation framework is attracting large interest, model-based inference and hybrid inference are not applied as much. A large number of studies apply approaches that could be classified as model-based inference, although they do not pursue any strict uncertainty analyses. In this context there is room for substantial improvement regarding how mean square errors or variances are estimated.
An advantage of model-assisted estimation, as compared to model-based and hybrid inference, is that the unbiasedness of estimators of totals and means does not rely on the correctness of the model; the model is only applied for enhancing a design-based estimator (Särndal et al. 1992). Whereas there is a theoretical chance that a model-assisted estimator is worse (in terms of variance) than a strictly design-based estimator if the model is extremely poor, a well specified model might substantially increase the precision of the model-assisted estimator compared to the strictly design-based estimator. This was shown by, e.g., Ene et al. (2012) and Saarela et al. (2015a).
If well specified models are available, model-based inference is definitely a competitive alternative to design-based inference through model-assisted estimation (McRoberts et al. 2014a, b, Magnussen 2015). It has advantages since it does not rely on a probability sample from the target area. Such samples may sometimes not be feasible due to poor infrastructure conditions, restricted access to private land, or the presence of areas that are for some reason dangerous to visit in the field. Further, in case a probability sample has been selected, based upon which models are developed and applied, model-based inference and model-assisted estimation usually lead to similar total estimates. In case the condition \( \sum_{i\in s}\frac{y_i-\widehat{y}_i}{\pi_i}=0 \) holds, the estimated values will be identical. However, Saarela et al. (2016) showed that the model-based variance estimators are less prone to problems with geolocation mismatches between field plots and remotely sensed auxiliary data.
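One situation in which the weighted residual sum vanishes is an ordinary least squares model with an intercept, fitted on a sample with equal inclusion probabilities, because OLS residuals then sum to zero. The sketch below illustrates this special case with a hypothetical simulated population; it is meant as an example of the condition and not as a general result for other designs or model types.

```python
import numpy as np

# Hypothetical population with one auxiliary variable.
rng = np.random.default_rng(7)
N, n = 1000, 40
x = rng.uniform(5.0, 30.0, N)
X_pop = np.column_stack([np.ones(N), x])  # intercept plus auxiliary variable
y = 12.0 + 6.0 * x + rng.normal(0.0, 10.0, N)

idx = rng.choice(N, size=n, replace=False)
pi_s = np.full(n, n / N)  # equal inclusion probabilities

# OLS fit with intercept on the probability sample.
beta_hat, *_ = np.linalg.lstsq(X_pop[idx], y[idx], rcond=None)
residuals = y[idx] - X_pop[idx] @ beta_hat

correction = float(np.sum(residuals / pi_s))      # weighted residual sum
model_based = float((X_pop @ beta_hat).sum())     # sum of predictions
model_assisted = model_based + correction         # predictions plus correction
print(correction, model_based, model_assisted)
```

With unequal inclusion probabilities or a model without an intercept, the correction term is generally non-zero and the two estimates differ.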
Hybrid inference is a straightforward approach in cases where auxiliary data are not available wall-to-wall and such data are expensive to acquire. In such cases a sample of auxiliary data can be selected, upon which the auxiliary variable totals and means can be estimated and used together with model predictions that link the auxiliary variables with the target variable. The approach so far appears to have been applied only in a limited number of forest inventories, although implicitly it has been used for a long time in forest inventories where models (such as volume, biomass and growth models) have been applied based on data from forest plots (Ståhl et al. 2014).
Overall, the use of models relies on auxiliary data that are correlated with or otherwise related to the target variable. Considering the variables normally included in national forest inventories (Tomppo et al. 2010), it is likely that a large number of variables would be very difficult to model in terms of remotely sensed data. This might be the case for forest floor vegetation, soil properties, and several types of forest damage. Modelling approaches linked to such variables would probably not improve the precision of estimators. Thus, a large number of variables, such as site index, forest floor vegetation, soil type, etc., are likely to require probability field samples.
Conclusions
We conclude by noting that all three approaches studied (model-assisted estimation, model-based inference, and hybrid inference) have advantages and disadvantages when applied in large-area forest surveys. A main advantage of model-assisted estimation is that the unbiasedness of estimators does not rely on the suitability of the model; the model only helps to improve the precision of an estimator known to be (approximately) unbiased. Model-based and hybrid inference rely on the suitability of the model, but may have several advantages where access to field plots is difficult or expensive. All three approaches rely on the possibility of developing accurate models, which exists for several important forest variables (such as biomass), but not for all variables that are included in a normal national forest inventory.
Declarations
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 Andersen HE, Barrett T, Winterberger K, Strunk J, Temesgen H (2009) Estimating forest biomass on the western lowlands of the Kenai Peninsula of Alaska using airborne lidar and field plot data in a model-assisted sampling design. In: Proceedings of the IUFRO Division 4 Conference: “Extending Forest Inventory and Monitoring over Space and Time”, pp 19–22
 Andersen HE, Strunk J, Temesgen H (2011) Using airborne light detection and ranging as a sampling tool for estimating forest biomass resources in the Upper Tanana Valley of Interior Alaska. West J Appl Forestry 26:157–164
 Armston JD, Denham RJ, Danaher TJ, Scarth PF, Moffiet TN (2009) Prediction and validation of foliage projective cover from Landsat-5 TM and Landsat-7 ETM+ imagery. J Appl Remote Sensing 3:33540–33540, http://dx.doi.org/10.1117/1.3216031
 Asner GP, Powell GV, Mascaro J, Knapp DE, Clark JK, Jacobson J, Hughes RF (2010) High-resolution forest carbon stocks and emissions in the Amazon. Proc Natl Acad Sci 107:16738–16742, http://dx.doi.org/10.1073/pnas.1004875107
 Asner GP, Mascaro J, Muller-Landau HC, Vieilledent G, Vaudry R, Rasamoelina M, Hall S, van Breugel M (2012) A universal airborne LiDAR approach for tropical forest carbon mapping. Oecologia 168:1147–1160, http://dx.doi.org/10.1007/s004420112165z
 Baccini A, Laporte N, Goetz SJ, Sun M, Dong H (2008) A first map of tropical Africa’s aboveground biomass derived from satellite imagery. Environ Res Lett 3:9
 Baffetta F, Fattorini L, Franceschi S, Corona P (2009) Design-based approach to k-nearest neighbours technique for coupling field and remotely sensed data in forest surveys. Remote Sensing Environ 113(3):463–475, http://dx.doi.org/10.1016/j.rse.2008.06.014
 Baffetta F, Corona P, Fattorini L (2011) Design-based diagnostics for kNN estimators of forest resources. Can J Forest Res 41:59–72
 Bohlin J, Wallerman J, Fransson JE (2012) Forest variable estimation using photogrammetric matching of digital aerial images in combination with a high-resolution DEM. Scand J Forest Res 27:692–699, http://dx.doi.org/10.1080/02827581.2012.686625
 Bollandsås OM, Gregoire TG, Næsset E, Øyen BH (2013) Detection of biomass change in a Norwegian mountain forest area using small footprint airborne laser scanner data. Stat Methods Appl 22:113–129, http://dx.doi.org/10.1007/s1026001202205
 Boudreau J, Nelson RF, Margolis HA, Beaudoin A, Guindon L, Kimes DS (2008) Regional aboveground forest biomass using airborne and spaceborne LiDAR in Québec. Remote Sensing Environ 112:3876–3890, http://dx.doi.org/10.1016/j.rse.2008.06.003
 Breidenbach J, Astrup R (2012) Small area estimation of forest attributes in the Norwegian National Forest Inventory. Eur J Forest Res 131:1255–1267, http://dx.doi.org/10.1007/s1034201205967
 Breidenbach J, McRoberts RE, Astrup R (2015) Empirical coverage of model-based variance estimators for remote sensing assisted estimation of stand-level timber volume. Remote Sensing Environ (in press). http://dx.doi.org/10.1016/j.rse.2015.07.026
 Breidt FJ, Opsomer JD (2000) Local polynomial regression estimators in survey sampling. Ann Stat 2000:1026–1053
 Breidt FJ, Claeskens G, Opsomer JD (2005) Model-assisted estimation for complex surveys using penalised splines. Biometrika 92:831–846, http://dx.doi.org/10.1093/biomet/92.4.831
 Cassel CM, Särndal CE, Wretman JH (1977) Foundations of inference in survey sampling. Wiley, New York
 Chambers R, Clark R (2012) An introduction to model-based survey sampling with applications. Oxford University Press. http://dx.doi.org/10.1093/acprof:oso/9780198566625.001.0001
 Chirici G, McRoberts RE, Fattorini L, Mura M, Marchetti M (2016) Comparing echo-based and canopy height model-based metrics for enhancing estimation of forest aboveground biomass in a model-assisted framework. Remote Sensing Environ 174:1–9, http://dx.doi.org/10.1016/j.rse.2015.11.010
 Corona P, Fattorini L, Franceschi S, Scrinzi G, Torresan C (2014) Estimation of standing wood volume in forest compartments by exploiting airborne laser scanning information: model-based, design-based, and hybrid perspectives. Can J Forest Res 44:1303–1311, http://dx.doi.org/10.1139/cjfr20140203
 Corona P, Fattorini L, Pagliarella MC (2015) Sampling strategies for estimating forest cover from remote sensing-based two-stage inventories. Forest Ecosystems 2(1):1–12, http://dx.doi.org/10.1186/s4066301500427
 Ene LT, Næsset E, Gobakken T, Gregoire TG, Ståhl G, Nelson R (2012) Assessing the accuracy of regional LiDAR-based biomass estimation using a simulation approach. Remote Sensing Environ 123:579–592, http://dx.doi.org/10.1016/j.rse.2012.04.017
 Fattorini L, Marcheselli M, Pisani C (2006) A three-phase sampling strategy for large-scale multiresource forest inventories. J Agric Biol Environ Stat 11(3):296–316, http://dx.doi.org/10.1198/108571106X130548
 Fattorini L, Franceschi S, Pisani C (2009) A two-phase sampling strategy for large-scale forest carbon budgets. J Stat Plann Inference 139(3):1045–1055, http://dx.doi.org/10.1016/j.jspi.2008.06.014
 Gobakken T, Næsset E, Nelson R, Bollandsås OM, Gregoire TG, Ståhl G, Holm S, Ørka HO, Astrup R (2012) Estimating biomass in Hedmark County, Norway using national forest inventory field plots and airborne laser scanning. Remote Sensing Environ 123:443–456, http://dx.doi.org/10.1016/j.rse.2012.01.025
 Grafström A, Saarela S, Ene LT (2014) Efficient sampling strategies for forest inventories by spreading the sample in auxiliary space. Can J Forest Res 44:1156–1164, http://dx.doi.org/10.1139/cjfr20140202
 Gregoire TG (1998) Design-based and model-based inference in survey sampling: appreciating the difference. Can J Forest Res 28:1429–1447, http://dx.doi.org/10.1139/x98166
 Gregoire TG, Valentine HT (2008) Sampling strategies for natural resources and the environment. CRC Press, Taylor & Francis Group, Boca Raton
 Gregoire TG, Ståhl G, Næsset E, Gobakken T, Nelson R, Holm S (2011) Model-assisted estimation of biomass in a LiDAR sample survey in Hedmark County, Norway. Can J Forest Res 41:83–95, http://dx.doi.org/10.1139/X10195
 Hansen MH, Madow WG, Tepping BJ (1978) On inference and estimation from sample surveys. In: Proceedings of the Survey Research Methods Section, pp 82–107
 Hansen MH, Madow WG, Tepping BJ (1983) An evaluation of model-dependent and probability-sampling inferences in sample surveys. J Am Stat Assoc 78:776–793, http://dx.doi.org/10.1080/01621459.1983.10477018
 Healey SP, Patterson PL, Saatchi S, Lefsky MA, Lister AJ, Freeman EA (2012) A sample design for globally consistent biomass estimation using lidar data from the Geoscience Laser Altimeter System (GLAS). Carbon Balance Manage 7:1–9, http://dx.doi.org/10.1186/17500680710
 Helmer EH, Ruzycki TS, Wunderle JM, Vogesser S, Ruefenacht B, Kwit C, Ewert DN (2010) Mapping tropical dry forest height, foliage height profiles and disturbance type and age with a time series of cloud-cleared Landsat and ALI image mosaics to characterize avian habitat. Remote Sensing Environ 114:2457–2473, http://dx.doi.org/10.1016/j.rse.2010.05.021
 Köhl M, Brassel P (2001) Zur Auswirkung der Hangneigungskorrektur auf Schätzwerte im Schweizerischen Landesforstinventar (LFI) [Investigation of the effect of the slope correction method as applied in the Swiss National Forest Inventory of estimates.]. Schweizerische Zeitschrift fur Forstwesen 152(6):215–225, http://dx.doi.org/10.3188/szf.2001.0215
 Magnussen S (2015) Arguments for a model-dependent inference? Forestry 88(3):317–325, http://dx.doi.org/10.1093/forestry/cpv002
 Magnussen S, Tomppo E (2015) Model-calibrated k-nearest neighbor estimators. Scandinavian J Forest Res 1–11. http://dx.doi.org/10.1080/02827581.2015.1073348
 Magnussen S, Næsset E, Gobakken T (2014) An estimator of variance for two-stage ratio regression estimators. Forest Sci 60(4):663–676, http://dx.doi.org/10.5849/forsci.12163
 Magnussen S, Næsset E, Gobakken T (2015) LiDAR-supported estimation of change in forest biomass with time-invariant regression models. Can J Forest Res 45(999):1514–1523, http://dx.doi.org/10.1139/cjfr20150084
 Mandallaz D (2013) Design-based properties of some small-area estimators in forest inventory with two-phase sampling. Can J Forest Res 43:441–449, http://dx.doi.org/10.1139/cjfr20120381
 Margolis HA, Nelson RF, Montesano PM, Beaudoin A, Sun G, Andersen HE, Wulder M (2015) Combining satellite lidar, airborne lidar and ground plots to estimate the amount and distribution of aboveground biomass in the Boreal forest of North America. Can J Forest Res 45(7):838–855, http://dx.doi.org/10.1139/cjfr20150006
 Massey A, Mandallaz D, Lanz A (2014) Integrating remote sensing and past inventory data under the new annual design of the Swiss National Forest Inventory using three-phase design-based regression estimation. Can J Forest Res 44:1177–1186, http://dx.doi.org/10.1139/cjfr20140152
 McRoberts RE (2006) A model-based approach to estimating forest area. Remote Sensing Environ 103:56–66, http://dx.doi.org/10.1016/j.rse.2006.03.005
 McRoberts RE (2010) Probability- and model-based approaches to inference for proportion forest using satellite imagery as ancillary data. Remote Sensing Environ 114:1017–1025, http://dx.doi.org/10.1016/j.rse.2009.12.013
 McRoberts RE, Tomppo EO, Finley AO, Heikkinen J (2007) Estimating areal means and variances of forest attributes using the k-Nearest Neighbors technique and satellite imagery. Remote Sensing Environ 111:466–480
 McRoberts RE, Bollandsås OM, Næsset E (2014) Modeling and estimating change. In: Maltamo M, Næsset E, Vauhkonen J. (eds) Forestry Applications of Airborne Laser Scanning. Concepts and Case Studies. Springer, pp. 293–314. http://dx.doi.org/10.1007/9789401786638_15
 McRoberts RE, Næsset E, Gobakken T, Bollandsås OM (2015) Indirect and direct estimation of forest biomass change using forest inventory and airborne laser scanning data. Remote Sensing Environ 164:36–42, http://dx.doi.org/10.1016/j.rse.2015.02.018
 Melville GJ, Welsh AH, Stone C (2015) Improving the efficiency and precision of tree counts in pine plantations using airborne LiDAR data and flexible-radius plots: model-based and design-based approaches. J Agric Biol Environ Stat 20(2):229–257, http://dx.doi.org/10.1007/s1325301502056
 Næsset E (1997) Estimating timber volume of forest stands using airborne laser scanner data. Remote Sensing Environ 61:246–253, http://dx.doi.org/10.1016/S00344257(97)000412
 Næsset E (2002a) Determination of mean tree height of forest stands by means of digital photogrammetry. Scand J Forest Res 17: 446–459. http://dx.doi.org/10.1080/028275802320435469
 Næsset E (2002b) Predicting forest stand characteristics with airborne scanning laser using a practical two-stage procedure and field data. Remote Sensing Environ 80: 88–99. http://dx.doi.org/10.1016/S00344257(01)002905
 Næsset E (2004) Accuracy of forest inventory using airborne laser scanning: evaluating the first Nordic full-scale operational project. Scand J Forest Res 19:554–557, http://dx.doi.org/10.1080/02827580410019544
 Næsset E, Gobakken T, Solberg S, Gregoire TG, Nelson R, Ståhl G, Weydahl D (2011) Model-assisted regional forest biomass estimation using LiDAR and InSAR as auxiliary data: A case study from a boreal forest area. Remote Sensing Environ 115:3599–3614, http://dx.doi.org/10.1016/j.rse.2011.08.021
 Næsset E, Bollandsås OM, Gobakken T, Gregoire TG, Ståhl G (2013a) Model-assisted estimation of change in forest biomass over an 11-year period in a sample survey supported by airborne LiDAR: A case study with post-stratification to provide “activity data”. Remote Sensing Environ 128: 299–314. http://dx.doi.org/10.1016/j.rse.2012.10.008
 Næsset E, Gobakken T, Bollandsås OM, Gregoire TG, Nelson R, Ståhl G (2013b) Comparison of precision of biomass estimates in regional field sample surveys and airborne LiDAR-assisted surveys in Hedmark County, Norway. Remote Sensing Environ 130: 108–120. http://dx.doi.org/10.1016/j.rse.2012.11.010
 Næsset E, Bollandsås OM, Gobakken T, Solberg S, McRoberts RE (2015) The effects of field plot size on model-assisted estimation of aboveground biomass change using multitemporal interferometric SAR and airborne laser scanning data. Remote Sensing Environ 168:252–264, http://dx.doi.org/10.1016/j.rse.2015.07.002
 Nelson R, Krabill W, Maclean G (1984) Determining forest canopy characteristics using airborne laser data. Remote Sensing Environ 15:201–212, http://dx.doi.org/10.1016/00344257(84)900312
 Nelson R, Krabill W, Tonelli J (1988) Estimating forest biomass and volume using airborne laser data. Remote Sensing Environ 24:247–267, http://dx.doi.org/10.1016/00344257(88)900284
 Nelson R, Boudreau J, Gregoire TG, Margolis H, Næsset E, Gobakken T, Ståhl G (2009) Estimating Quebec provincial forest resources using ICESat/GLAS. Can J Forest Res 39:862–881, http://dx.doi.org/10.1139/X09002
 Nelson R, Gobakken T, Næsset E, Gregoire TG, Ståhl G, Holm S, Flewelling J (2012) Lidar sampling - using an airborne profiler to estimate forest biomass in Hedmark County, Norway. Remote Sensing Environ 123:563–578, http://dx.doi.org/10.1016/j.rse.2011.10.036
 Neyman J (1934) On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J R Stat Soc 97:558–606, http://dx.doi.org/10.2307/2342192
 Opsomer JD, Breidt FJ, Moisen GG, Kauermann G (2007) Model-assisted estimation of forest resources with generalized additive models. J Am Stat Assoc 102:400–409, http://dx.doi.org/10.1198/016214506000001491
 Reese H, Nilsson M, Sandström P, Olsson H (2002) Applications using estimates of forest parameters derived from satellite and forest inventory data. Comput Electron Agric 37:37–55, http://dx.doi.org/10.1016/S01681699(02)001187
 Saarela S, Grafström A, Ståhl G, Kangas A, Holopainen M, Tuominen S, Nordkvist K, Hyyppä J (2015a) Model-assisted estimation of growing stock volume using different combinations of LiDAR and Landsat data as auxiliary information. Remote Sensing Environ 158: 431–440. http://dx.doi.org/10.1016/j.rse.2014.11.020
 Saarela S, Schnell S, Grafström A, Tuominen S, Nordkvist K, Hyyppä J, Kangas A, Ståhl G (2015b) Effects of sample size and model form on the accuracy of model-based estimators of growing stock volume in Kuortane, Finland. Can J Forest Res 45:1524–1534. http://dx.doi.org/10.1139/cjfr20150077
 Saarela S, Schnell S, Tuominen S, Balazs A, Hyyppä J, Grafström A, Ståhl G (2016) Effects of positional errors in model-assisted and model-based estimation of growing stock volume. Remote Sensing Environ 172:101–108, http://dx.doi.org/10.1016/j.rse.2015.11.002
 Sannier C, McRoberts RE, Fichet LV, Makaga EMK (2014) Using the regression estimator with Landsat data to estimate proportion forest cover and net proportion deforestation in Gabon. Remote Sensing Environ 151:138–148, http://dx.doi.org/10.1016/j.rse.2013.09.015
 Särndal CE (1978) Design-based and model-based inference in survey sampling [with discussion and reply]. Scand J Stat 5(1):27–52
 Särndal CE, Swensson B, Wretman J (1992) Model Assisted Survey Sampling. Springer. http://dx.doi.org/10.1007/9781461243786
 Skowronski NS, Clark KL, Gallagher M, Birdsey RA, Hom JL (2014) Airborne laser scanner-assisted estimation of aboveground biomass change in a temperate oak-pine forest. Remote Sensing Environ 151:166–174, http://dx.doi.org/10.1016/j.rse.2013.12.015
 Solberg S, Astrup R, Bollandsås OM, Næsset E, Weydahl DJ (2010) Deriving forest monitoring variables from X-band InSAR SRTM height. Can J Remote Sensing 36:68–79, http://dx.doi.org/10.5589/m10025
 Ståhl G, Holm S, Gregoire TG, Gobakken T, Næsset E, Nelson R (2011) Model-based inference for biomass estimation in a LiDAR sample survey in Hedmark County, Norway. Can J Forest Res 41:96–107, http://dx.doi.org/10.1139/X10161
 Ståhl G, Heikkinen J, Petersson H, Repola J, Holm S (2014) Sample-based estimation of greenhouse gas emissions from forests – A new approach to account for both sampling and model errors. Forest Sci 60:3–13, http://dx.doi.org/10.5849/forsci.13005
 Stephens PR, Kimberley MO, Beets PN, Paul TS, Searles N, Bell A, Brack C, Broadley J (2012) Airborne scanning LiDAR in a double sampling forest carbon inventory. Remote Sensing Environ 117:348–357, http://dx.doi.org/10.1016/j.rse.2011.10.009
 Strunk JL, Reutebuch SE, Andersen HE, Gould PJ, McGaughey RJ (2012a) Model-assisted forest yield estimation with light detection and ranging. West J Appl Forestry 27: 53–59. http://dx.doi.org/10.5849/wjaf.10043
 Strunk J, Temesgen H, Andersen HE, Flewelling JP, Madsen L (2012b) Effects of lidar pulse density and sample size on a model-assisted approach to estimate forest inventory variables. Can J Remote Sensing 38: 644–654. http://dx.doi.org/10.5589/m12052
 Tomppo E, Katila M (1991) Satellite image-based national forest inventory of Finland for publication in the IGARSS’91 digest. In: Geoscience and Remote Sensing Symposium, 1991. IGARSS’91. Remote Sensing: Global Monitoring for Earth Management., International (Vol. 3, pp. 1141–1144). http://dx.doi.org/10.1109/igarss.1991.579272
 Tomppo E, Olsson H, Ståhl G, Nilsson M, Hagner O, Katila M (2008) Combining national forest inventory field plots and remote sensing data for forest databases. Remote Sensing Environ 112(5):1982–1999
 Tomppo E, Gschwantner T, Lawrence M, McRoberts RE, Gabler K, Schadauer K, Vidal C, Lanz A, Ståhl G, Cienciala E (2010) National forest inventories. Pathways for Common Reporting. Springer, 541–553. http://dx.doi.org/10.1007/9789048132331