Use of models in large-area forest surveys: comparing model-assisted, model-based and hybrid estimation
Forest Ecosystems volume 3, Article number: 5 (2016). Article type: Discussion. Open Access.
Abstract
This paper focuses on the use of models for increasing the precision of estimators in large-area forest surveys. It is motivated by the increasing availability of remotely sensed data, which facilitates the development of models predicting the variables of interest in forest surveys. We present, review and compare three different estimation frameworks where models play a core role: model-assisted, model-based, and hybrid estimation. The first two are well known, whereas the third has only recently been introduced in forest surveys. Hybrid inference mixes design-based and model-based inference, since it relies on a probability sample of auxiliary data and a model predicting the target variable from the auxiliary data. We review studies on large-area forest surveys based on model-assisted, model-based, and hybrid estimation, and discuss advantages and disadvantages of the approaches. We conclude that no general recommendations can be made about whether model-assisted, model-based, or hybrid estimation should be preferred. The choice depends on the objective of the survey and the possibilities to acquire appropriate field and remotely sensed data. We also conclude that modelling approaches can only be successfully applied for estimating target variables, such as growing stock volume or biomass, that are adequately related to commonly available remotely sensed data; purely field-based surveys therefore remain important for several important forest parameters.
Introduction
Use of models in large-area surveys of forests is attracting increased interest, owing to the improved availability of auxiliary data from various remote sensing platforms. Aerial photographs (e.g., Næsset 2002a, Bohlin et al. 2012) and optical satellite data (e.g., Reese et al. 2002) have been available and used operationally for many decades, while data from profiling (e.g., Nelson et al. 1984, Nelson et al. 1988) and scanning lasers (e.g., Næsset 1997) and radars (Solberg et al. 2010) have become available for practical applications more recently. Some of the new types of remotely sensed data, such as data from laser scanners, have already become widely applied in forest inventories (e.g., Næsset 2002b). A common application involves the development of models that are applied wall-to-wall over an area of interest (e.g., Næsset 2004), often for providing data for forest management. However, this type of data is increasingly also applied in connection with large-area forest surveys, such as national-level forest inventories (Tomppo et al. 2010, Asner et al. 2012).
Applications of models in large-area forest surveys often use the model-assisted estimation framework (Särndal et al. 1992), where a model is used to support the estimation following probability sampling within the context of design-based inference (Gregoire 1998). Importantly, in this case an inadequately specified model does not make the estimators design-biased (they remain at least approximately design-unbiased) but only affects their variance. Examples of large-area forest inventory applications include Andersen et al. (2011), who applied the technique in Alaska, Gregoire et al. (2011) and Gobakken et al. (2012), who applied it in Hedmark County, Norway, and Saarela et al. (2015a), who used it in Kuortane, Finland.
Some applications of models in large-area forest surveys involve model-based inference (Gregoire 1998), which relies on model assumptions to a larger extent than model-assisted estimation does. In this case an inadequately specified model might make the estimators both biased and imprecise. On the other hand, with accurate models this mode of inference can be very efficient (e.g., Magnussen 2015). Examples of applications in forest inventory include McRoberts (2006, 2010), who used model-based inference for estimating forest area based on Landsat data in northern Minnesota, U.S.A., Ståhl et al. (2011), who used it for estimating biomass in Hedmark, Norway, using laser data, and Healey et al. (2012), who applied the technique in California, U.S.A., using data from the spaceborne Geoscience Laser Altimeter System (GLAS).
Nonparametric modelling, applying methods such as the k-Nearest Neighbours (kNN) technique (Tomppo and Katila 1991, Tomppo et al. 2008), has a long tradition in forest inventories. These techniques typically have been applied for providing small-area estimates by combining field sample plots and various sources of remotely sensed data. However, the kNN technique has also been used in connection with model-assisted estimation (e.g., Baffetta et al. 2009, 2011, Magnussen and Tomppo 2015) and model-based inference (e.g., McRoberts et al. 2007).
The objective of this paper is to present, review and discuss how models are applied in model-assisted and model-based estimation in large-area forest surveys, and to discuss advantages and disadvantages of the two estimation frameworks in this context. We also present, review and discuss a newly introduced estimation framework where probability sampling is applied for the selection of auxiliary data, upon which model-based inference is applied in a second phase. Following Corona et al. (2014), this framework is denoted hybrid inference.
We restrict the study to large-area estimation, as in national forest inventories and greenhouse gas inventories under the United Nations Framework Convention on Climate Change (e.g., Tomppo et al. 2010). Importantly, in this case there is no need to make assumptions about residual error terms linked to individual population elements, which is a core issue in model-based small-area estimation (e.g., Breidenbach and Astrup 2012, Breidenbach et al. 2015). The reason is that the residual error terms have almost no influence on the results, as will be demonstrated below. However, we do not specify how large a “large area” must be, but use the term as a general concept.
Below, we present the basics of model-assisted, model-based, and hybrid inference (chapter 2). Subsequently, we present a brief review of the application of these methods in forest surveys (chapter 3) and, finally, we discuss advantages and disadvantages of the different approaches and draw conclusions (chapters 4 and 5).
Basics of model-assisted, model-based and hybrid estimation
In this chapter we summarize some basic concepts related to the use of models in large-area forest surveys. We restrict the scope to cases where models are applied for improving estimators (or predictors) once sample or wall-to-wall data have been collected. Models may also be used in the design phase for improving the sample selection (e.g., Fattorini et al. 2009, Grafström et al. 2014), but such cases are not covered in this article.
Design-based inference
This paper requires a basic understanding of the concepts of design-based and model-based inference (e.g., Cassel et al. 1977, Särndal 1978, Gregoire 1998, McRoberts 2010).
Design-based inference typically assumes a finite population of elements to which one or more fixed target quantities are linked. The objective normally is to estimate some fixed population parameter, such as the total or the mean of these quantities (e.g., Gregoire and Valentine 2008). In order to estimate the fixed but unknown parameters, a probability sample is selected from the population according to some appropriate sampling design, which assigns a positive inclusion probability to each element. Mathematical formulas (estimators) are used for estimating the parameters based on the sample data. The estimates are random variables due to the random selection of samples, i.e., the estimators produce different values depending on which population elements are included in the sample.
The Horvitz-Thompson estimator can be applied to any probability sampling design with inclusion probabilities known at least for the sampled units (e.g., Särndal et al. 1992). Using this estimator, a population total, τ, is estimated as
\[ \hat{\tau}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i} \qquad (1) \]
Here, \( y_i \) is the variable of interest for the \( i \)th sampled element, \( \pi_i \) is its inclusion probability, and s is the sample.
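A minimal sketch of Eq. (1), assuming simple random sampling without replacement (SRSWOR), under which every element has inclusion probability n/N. The five-element population and the function name `horvitz_thompson_total` are hypothetical illustrations. Enumerating every possible sample shows that the estimates vary from sample to sample while their average equals the true total, i.e., the estimator is design-unbiased.

```python
from fractions import Fraction
from itertools import combinations


def horvitz_thompson_total(y_sample, pi_sample):
    """Eq. (1): sum of y_i / pi_i over the sampled elements."""
    return sum(y / pi for y, pi in zip(y_sample, pi_sample))


# Hypothetical toy population of N = 5 elements (e.g., plot-level volumes).
population = [Fraction(v) for v in (12, 7, 30, 18, 3)]
N, n = len(population), 2
pi = Fraction(n, N)  # SRSWOR inclusion probability, identical for all elements

# Every size-n subset is an equally likely sample under SRSWOR.
estimates = [
    horvitz_thompson_total([population[i] for i in s], [pi] * n)
    for s in combinations(range(N), n)
]
design_expectation = sum(estimates) / len(estimates)
print(min(estimates), max(estimates))         # estimates differ between samples
print(design_expectation == sum(population))  # True: design-unbiased
```

Exact rational arithmetic (`Fraction`) is used only so that the unbiasedness check is an exact equality rather than a floating-point comparison.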
The precision of an estimator is usually expressed through its variance, which is a fixed quantity given the population, the design, and the estimator. The variance usually can be estimated through a variance estimator, and confidence intervals can be computed to provide decision makers with a range of values that, under repeated sampling, covers the true population parameter with a specified probability.
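As a sketch of the step from an estimate to a variance estimate and an approximate confidence interval, the snippet below assumes SRSWOR. Under that design the Horvitz-Thompson estimator of the total reduces to N times the sample mean, and the textbook unbiased variance estimator is \( N^2 (1 - n/N) s^2 / n \). The 95 % interval relies on a normal approximation (z = 1.96). This is an assumption that is usually reasonable for moderately large samples. The function name and data are hypothetical.

```python
import math
import statistics


def srswor_total_with_ci(y_sample, N, z=1.96):
    """Return (estimated total, estimated variance, approximate CI) under SRSWOR."""
    n = len(y_sample)
    total_hat = N * statistics.fmean(y_sample)  # HT estimator with pi_i = n / N
    # Unbiased variance estimator; (1 - n/N) is the finite population correction.
    var_hat = N**2 * (1 - n / N) * statistics.variance(y_sample) / n
    half_width = z * math.sqrt(var_hat)
    return total_hat, var_hat, (total_hat - half_width, total_hat + half_width)


# Hypothetical plot values from a sample of n = 5 out of N = 1000 grid cells.
total_hat, var_hat, ci = srswor_total_with_ci([10, 14, 9, 17, 12], N=1000)
print(round(total_hat, 1), round(var_hat, 1), [round(b, 1) for b in ci])
```

For other designs, the variance estimator would be derived from the design's own first- and second-order inclusion probabilities rather than from this SRSWOR shortcut.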
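In the case of the Horvitz-Thompson estimator, a general formula for the variance is
\[ V\left(\hat{\tau}_{HT}\right) = \sum_{i=1}^{N} \sum_{j=1}^{N} \left(\pi_{ij} - \pi_i \pi_j\right) \frac{y_i}{\pi_i} \frac{y_j}{\pi_j} \qquad (2) \]
In addition to the previously introduced notation, N is the population size and \( \pi_{ij} \) is the joint probability of inclusion for units i and j, with \( \pi_{ii} = \pi_i \). The step from the variance to a variance estimator and a confidence interval normally is straightforward (e.g., Gregoire and Valentine 2008).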
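To make Eq. (2) concrete, the sketch below evaluates it for SRSWOR, where \( \pi_i = n/N \) and \( \pi_{ij} = n(n-1)/(N(N-1)) \) for \( i \neq j \). It then compares the result with the exact design variance obtained by enumerating all possible samples. The toy population and function names are hypothetical. Exact rational arithmetic is used so that the two results can be compared for equality.

```python
from fractions import Fraction
from itertools import combinations


def ht_variance(y, pi, pi_joint):
    """Eq. (2): sum over all pairs (i, j) of (pi_ij - pi_i pi_j)(y_i / pi_i)(y_j / pi_j)."""
    N = len(y)
    return sum(
        (pi_joint(i, j) - pi[i] * pi[j]) * (y[i] / pi[i]) * (y[j] / pi[j])
        for i in range(N)
        for j in range(N)
    )


y = [Fraction(v) for v in (12, 7, 30, 18, 3)]  # hypothetical toy population
N, n = len(y), 2
pi = [Fraction(n, N)] * N


def pi_joint_srswor(i, j):
    # SRSWOR: pi_ii = n / N, and pi_ij = n(n - 1) / (N(N - 1)) for i != j.
    return Fraction(n, N) if i == j else Fraction(n * (n - 1), N * (N - 1))


formula = ht_variance(y, pi, pi_joint_srswor)

# Brute force: variance of the HT estimate over all equally likely samples.
estimates = [sum(y[i] / pi[i] for i in s) for s in combinations(range(N), n)]
mean = sum(estimates) / len(estimates)
brute_force = sum((e - mean) ** 2 for e in estimates) / len(estimates)
print(formula == brute_force)  # True
```

Note that the off-diagonal weights \( \pi_{ij} - \pi_i\pi_j \) are negative under SRSWOR. Eq. (2) is therefore a quadratic form rather than a sum of positive contributions.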
Some key features of design-based inference are:

The values that are linked to the population elements are fixed

The population parameters about which we wish to infer information are also fixed

Our estimators of the parameters are random because a probability sample is selected according to some sampling design, such as simple random sampling

The probability of obtaining different samples can be deduced from the design and used for inference
The foundations of design-based inference were laid out by Neyman (1934), and it is the standard mode of inference in most statistical surveys, including sample-based national forest inventories (Tomppo et al. 2010), which are carried out in a large number of countries.
Design-based inference through model-assisted estimation
Models can be used to improve estimators under the design-based framework. An important category of such estimators is known as model-assisted estimators (Särndal et al. 1992). The general form of such estimators, for estimating a population total, is
\[ \hat{\tau}_{ma} = \sum_{i=1}^{N} \hat{y}_i + \sum_{i \in s} \frac{y_i - \hat{y}_i}{\pi_i} \qquad (3) \]
where the first term of the estimator is the sum of the model predictions for all elements in the population; the second term is a Horvitz-Thompson estimator of the total of the deviations between observed values and values predicted by the model; and the subscript ‘ma’ indicates that the estimator is model-assisted. Thus, the model-assisted estimator can be seen as a crude first estimator that is refined through a correction term. The correction makes it design-unbiased when the model is external, i.e., not estimated from the sample data (in which case Eq. 3 is often referred to as a difference estimator), and approximately design-unbiased when the model is internal, i.e., fitted to the sample data (in which case Eq. 3 is often referred to as a generalised regression estimator). In case the model is external, the variance is
\[ V\left(\hat{\tau}_{ma}\right) = \sum_{i=1}^{N} \sum_{j=1}^{N} \left(\pi_{ij} - \pi_i \pi_j\right) \frac{e_i}{\pi_i} \frac{e_j}{\pi_j} \qquad (4) \]
This is almost the same expression as the variance in Eq. (2), but the \( y_i \) terms have been replaced by \( e_i = y_i - \hat{y}_i \). If an accurate model is used, the latter terms should be much smaller than the former, and thus the variance of the model-assisted estimator should be much smaller than the variance of the ordinary Horvitz-Thompson estimator. This does not, however, follow directly from a term-by-term comparison of Eq. (2) and Eq. (4), because the weights \( \pi_{ij} - \pi_i\pi_j \) may be negative.
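The sketch below illustrates Eqs. (3) and (4) for an external model under SRSWOR. The predictions `y_hat` are fixed in advance (hypothetical values), so the difference estimator is exactly design-unbiased. Its design variance, computed by enumerating all samples, is far smaller than that of the Horvitz-Thompson estimator because here the residuals vary much less than the y-values. The population, predictions and function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy population: target y and predictions y_hat from an external
# model (fixed in advance, not fitted to the sample).
y = [Fraction(v) for v in (12, 7, 30, 18, 3)]
y_hat = [Fraction(v) for v in (11, 8, 27, 19, 4)]
N, n = len(y), 2
pi = Fraction(n, N)  # SRSWOR inclusion probability


def difference_estimator(sample):
    """Eq. (3): population sum of predictions plus HT-weighted sum of sample residuals."""
    return sum(y_hat) + sum((y[i] - y_hat[i]) / pi for i in sample)


def ht_estimator(sample):
    """Eq. (1), for comparison."""
    return sum(y[i] / pi for i in sample)


def design_mean_and_variance(estimator):
    # Exact design expectation and variance by enumerating all equally likely samples.
    values = [estimator(s) for s in combinations(range(N), n)]
    mean = sum(values) / len(values)
    return mean, sum((v - mean) ** 2 for v in values) / len(values)


ma_mean, ma_var = design_mean_and_variance(difference_estimator)
ht_mean, ht_var = design_mean_and_variance(ht_estimator)
print(ma_mean == ht_mean == sum(y))  # True: both are design-unbiased
print(ma_var < ht_var)               # True for this accurate model
```

If `y_hat` were instead fitted to each sample (an internal model), unbiasedness would hold only approximately and the enumeration would have to refit the model for every sample.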
Model-based inference
In contrast to design-based inference (including model-assisted estimators), a basic assumption underlying model-based inference is that the values linked to the elements in the population are realizations of random variables. As a consequence, target survey quantities such as population totals and means are also random variables. Thus, due to the different points of view underlying design-based and model-based inference, some caution must be exercised when comparing results from the two inferential frameworks. For example, with model-based inference the random population total (or mean) may be predicted or (as in this study) the expected value of the population total may be estimated. For large populations the relative difference between these two quantities typically is minor, although for small populations it may be substantial. However, just like design-based inference, model-based inference in many cases is a useful and straightforward approach for quantifying target features of a population (e.g., Chambers and Clark 2012). In forest inventories, examples of such cases are surveys of remote areas with poor road infrastructure and small-area estimation for forest management. In both cases the field sample sizes typically are small or the field data are acquired through non-probability sampling, whereas remotely sensed data are available wall-to-wall.
A basic assumption of model-based inference is that the random values of the population elements follow some specific model, e.g., a model based on auxiliary data derived from remote sensing. Thus, in the standard case, auxiliary data are available for all population elements. A simple and fairly general example is the linear model, i.e., in matrix form,
\[ \boldsymbol{Y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \qquad (5) \]
where Y is an N × 1 matrix of the target variable, X an N × p matrix of auxiliary data, β a p × 1 matrix of model parameters, and ε an N × 1 matrix of random variables that follow some joint probability distribution; N is the population size, which in a forest survey might be the number of grid cells that tessellate the study area.
Our objective typically is to predict a random population quantity, e.g., the mean or the total, following the selection of a sample for estimating the model parameters. Regardless of how the sample is selected, the observations are realizations of random variables due to the model assumptions. Once the model parameters are estimated, we can use the estimated model, \( \widehat{\boldsymbol{Y}} = \boldsymbol{X}\widehat{\boldsymbol{\beta}} \), for predicting the population quantities of interest based on the auxiliary data, which in standard cases are assumed available for all population elements. Introducing \( \mathbf{1} \) as an N × 1 vector of ones, the random population total \( \tau^* = \mathbf{1}'\boldsymbol{Y} = \mathbf{1}'\boldsymbol{X}\boldsymbol{\beta} + \mathbf{1}'\boldsymbol{\varepsilon} \) may be predicted as
\[ \hat{\tau}^* = \mathbf{1}'\widehat{\boldsymbol{Y}} = \mathbf{1}'\boldsymbol{X}\widehat{\boldsymbol{\beta}} \qquad (6) \]
Note the distinction in nomenclature between estimating a fixed but unknown value (a population parameter) and predicting a random variable (e.g., Särndal 1978, Gregoire 1998). Note also that some authors (Chambers and Clark 2012) present the model-based predictor as a sum of two terms: the sum of the values of the sampled elements and the sum of the predictions for the non-sampled elements. The difference between such a predictor and Eq. (6) would, however, be very small in case a small sample is selected from a large population.
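A minimal sketch of Eq. (6) for the simplest case p = 2 (an intercept and one auxiliary variable, e.g., a hypothetical laser height metric), fitted by ordinary least squares. The helper names and data are hypothetical. The point is that, once \( \widehat{\boldsymbol{\beta}} \) is available, predicting the total only requires summing the model predictions over all N elements.

```python
def fit_simple_ols(x, y):
    """OLS estimates (b0, b1) for y = b0 + b1 * x + error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def model_based_total(x_population, beta_hat):
    """Eq. (6): 1'X beta_hat, i.e., the sum of model predictions over all N elements."""
    b0, b1 = beta_hat
    return sum(b0 + b1 * xi for xi in x_population)


# Hypothetical data: a field sample (need not be random) and wall-to-wall auxiliary data.
x_field = [1.0, 2.0, 3.0, 4.0]
y_field = [5.2, 7.9, 11.1, 13.8]
x_population = [float(v) for v in range(1, 11)]  # N = 10 grid cells

beta_hat = fit_simple_ols(x_field, y_field)
print(beta_hat, model_based_total(x_population, beta_hat))
```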
Turning to the mean square error of the predictor in Eq. (6), we need to acknowledge that uncertainty is introduced both by the estimation of the model parameters and by the random residual terms linked to each population element. Since the residuals may often be spatially autocorrelated, estimating the mean square error of the Eq. (6) predictor may be very complicated.
However, an important feature of large-area surveys is that the relative difference between \( \tau^* \) and \( E(\tau^*) \) typically is very small (e.g., Chambers and Clark 2012, p. 16). The relative difference is \( \mathbf{1}'\boldsymbol{\varepsilon} / (\mathbf{1}'\boldsymbol{X}\boldsymbol{\beta} + \mathbf{1}'\boldsymbol{\varepsilon}) \), which intuitively can be seen to tend to zero as N tends to infinity. In the cases we focus on, the \( \boldsymbol{x}_i\boldsymbol{\beta} \) terms (\( \boldsymbol{x}_i \) being the \( i \)th row of X) are almost always positive and typically much larger in absolute value than the residual terms, which may be either negative or positive and thus largely cancel when summed. Thus, instead of predicting \( \tau^* \), in large-area estimation we can estimate \( E(\tau^*) \), which simplifies the model-based inference. The estimator is identical to Eq. (6), i.e., \( \widehat{E\left(\tau^*\right)} = \mathbf{1}'\widehat{\boldsymbol{Y}} \), but it is now an estimator rather than a predictor. The variance (due to the model) of this estimator is simpler to derive, since it does not involve any residual terms; uncertainty in this case is introduced only through the model parameter estimation.
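The intuition that the relative difference between \( \tau^* \) and \( E(\tau^*) \) shrinks as N grows can be checked with a small simulation. This is a sketch under simplifying assumptions not taken from the text: independent, normally distributed residuals (no spatial autocorrelation) and hypothetical, always-positive values for the mean part of each element. Under these settings the residual sum grows roughly like \( \sqrt{N} \), while the mean part \( \mathbf{1}'\boldsymbol{X}\boldsymbol{\beta} \) grows like N.

```python
import random


def relative_difference(N, seed=1):
    """Simulate 1'eps / (1'X beta + 1'eps) for a population of N elements.

    Hypothetical setup: each element's mean x_i beta is drawn uniformly in [50, 150]
    and residuals are independent normal with standard deviation 20.
    """
    rng = random.Random(seed)
    sum_mean = 0.0   # accumulates 1'X beta
    sum_resid = 0.0  # accumulates 1'eps
    for _ in range(N):
        sum_mean += rng.uniform(50, 150)
        sum_resid += rng.gauss(0, 20)
    return sum_resid / (sum_mean + sum_resid)


for N in (100, 10_000, 100_000):
    print(N, relative_difference(N))
```

With spatially autocorrelated residuals, the cancellation is slower, so a simulation of this kind would need a correlated residual process to be realistic.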
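The variance of the estimator of \( E(\tau^*) \) is
\[ V\left(\widehat{E\left(\tau^*\right)}\right) = \mathbf{1}'\boldsymbol{X}\, \boldsymbol{cov}\left(\widehat{\boldsymbol{\beta}}\right) \boldsymbol{X}'\mathbf{1} \qquad (7) \]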
The matrix \( \boldsymbol{cov}\left(\widehat{\boldsymbol{\beta}}\right) \) is the variance-covariance matrix of the model parameter estimates. A variance estimator is obtained by inserting the estimated covariance matrix in Eq. (7).
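For the simplest case of an intercept and one auxiliary variable, Eq. (7) can be sketched as below. The sketch assumes ordinary least squares with a known residual variance \( \sigma^2 \), so that \( \boldsymbol{cov}(\widehat{\boldsymbol{\beta}}) = \sigma^2(\boldsymbol{X}_s'\boldsymbol{X}_s)^{-1} \), where \( \boldsymbol{X}_s \) (our notation) holds the auxiliary data of the modelling sample. In practice \( \sigma^2 \) is replaced by its estimate, which turns the result into a variance estimator. The data and function names are hypothetical. The final step checks the matrix form against the familiar scalar expression \( N^2\sigma^2\left(1/n + (\bar{x}_U - \bar{x}_s)^2 / \sum_{i\in s}(x_i - \bar{x}_s)^2\right) \).

```python
def cov_beta_ols(x_sample, sigma2):
    """sigma^2 (X_s' X_s)^{-1} for the design matrix X_s = [1, x] (2 x 2 case)."""
    n = len(x_sample)
    sx = sum(x_sample)
    sxx = sum(v * v for v in x_sample)
    det = n * sxx - sx * sx
    # Closed-form inverse of [[n, sx], [sx, sxx]], scaled by sigma^2.
    return [[sigma2 * sxx / det, -sigma2 * sx / det],
            [-sigma2 * sx / det, sigma2 * n / det]]


def var_expected_total(x_population, cov_beta):
    """Eq. (7): 1'X cov(beta_hat) X'1, where 1'X = [N, sum of x over the population]."""
    a = [len(x_population), sum(x_population)]
    return sum(a[j] * cov_beta[j][k] * a[k] for j in range(2) for k in range(2))


# Hypothetical inputs: modelling sample, residual variance, and wall-to-wall auxiliary data.
x_sample = [1.0, 2.0, 3.0, 4.0]
sigma2 = 0.04
x_population = [float(v) for v in range(1, 11)]

v_matrix = var_expected_total(x_population, cov_beta_ols(x_sample, sigma2))

# Same quantity via the scalar formula for simple linear regression.
N, n = len(x_population), len(x_sample)
xbar_U, xbar_s = sum(x_population) / N, sum(x_sample) / n
s_xx = sum((v - xbar_s) ** 2 for v in x_sample)
v_scalar = N**2 * sigma2 * (1 / n + (xbar_U - xbar_s) ** 2 / s_xx)
print(v_matrix, v_scalar)  # both approximately 8.2
```

The scalar form makes the design choice visible: the variance grows when the population mean of x lies far from the sample mean of x, i.e., when the model is extrapolated.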
Thus, some key features of model-based inference are:

The values linked to population elements are random variables

Since the individual values are random variables so is the population total or mean that we wish to predict

A model of the relationship between the target variable and one or more auxiliary variables is assumed to adequately describe the trend in Y

Auxiliary data are commonly available for all population elements

After having selected a sample – that need not be random – for estimating the model parameters, we apply the fitted model for predicting the target population quantity or estimating the expected value of this quantity.
Hybrid inference: a special case of model-based inference
Auxiliary data may not be available prior to a forest survey, and they may be very expensive to collect for all units in a population, as required for standard application of model-based inference. In such cases a probability sample of auxiliary data can be acquired, based on which the population totals or means of the auxiliary variables are estimated following design-based inference. A model can still be specified and applied for the relationship between the study variable and the auxiliary variables, and thus model-based inference can be applied once the auxiliary variable totals (or means) have been estimated through design-based inference.
Thus, design-based principles are applied in a first phase and model-based principles in a second phase. This approach was termed hybrid inference by Corona et al. (2014), and in the present paper we follow that terminology. In a previous study by Mandallaz (2013) it was called pseudo-synthetic estimation. In a study by Ståhl et al. (2011) it was simply called model-based inference, although it was later denoted model-dependent estimation by Gobakken et al. (2012). However, the term model-dependent estimation appears to have been first proposed by Hansen et al. (1978, 1983) to include all sampling strategies that depend on the correctness of a model; according to Hansen et al. (1978) “a model-dependent design consists of a sampling plan and estimators for which either the plan or the estimators, or both, are chosen because they have desirable properties under an assumed model, and for which the validity of inferences about the population depends on the degree to which the population conforms to the assumed model.” Thus, standard model-based inference, hybrid inference, and other approaches belong to Hansen’s model-dependent category.
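In the case of hybrid inference, expected values and variances are derived by considering both the design through which auxiliary data were collected and the model used for predicting values of population elements based on the auxiliary data. Thus, assuming we use a linear model, a general estimator of \( E(\tau^*) \) is given as
\[ \widehat{E\left(\tau^*\right)} = \sum_{i \in s} \frac{\boldsymbol{x}_i \widehat{\boldsymbol{\beta}}}{\pi_i} = \boldsymbol{\pi}'\boldsymbol{X}\widehat{\boldsymbol{\beta}} \qquad (8) \]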
where s is the sample of auxiliary data, \( \pi_i \) is the probability of including population element i in the auxiliary data sample, \( \boldsymbol{\pi} \) is an n-length column vector of \( 1/\pi_i \) values, and X is an n × p matrix of sampled auxiliary data, with \( \boldsymbol{x}_i \) denoting its row for element i. The model parameters are estimated from a sample that is assumed to be independent of the sample of auxiliary data.
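A sketch of Eq. (8) follows. It assumes a hypothetical auxiliary sample of three population elements (e.g., grid cells covered by sampled laser strips), each included with probability 0.01. The parameter vector \( \widehat{\boldsymbol{\beta}} \) is assumed to come from a separate, independent field sample. The code also forms the Horvitz-Thompson estimates of the auxiliary totals, \( \boldsymbol{\pi}'\boldsymbol{X} \), showing that the hybrid estimate equals these estimated totals multiplied by the parameter estimates. All names and numbers are illustrative.

```python
def hybrid_estimate(x_aux_sample, pi_aux_sample, beta_hat):
    """Eq. (8): sum over the auxiliary sample of x_i beta_hat / pi_i."""
    return sum(
        sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta_hat)) / pi_i
        for x_i, pi_i in zip(x_aux_sample, pi_aux_sample)
    )


def ht_auxiliary_totals(x_aux_sample, pi_aux_sample):
    """pi'X: Horvitz-Thompson estimates of the totals of each auxiliary variable."""
    p = len(x_aux_sample[0])
    return [
        sum(x_i[j] / pi_i for x_i, pi_i in zip(x_aux_sample, pi_aux_sample))
        for j in range(p)
    ]


# Hypothetical auxiliary sample: rows x_i = [1, laser metric], leading 1 for the intercept.
x_aux = [[1.0, 12.0], [1.0, 8.5], [1.0, 15.0]]
pi_aux = [0.01, 0.01, 0.01]
beta_hat = [2.25, 2.9]  # assumed fitted on an independent field sample

estimate = hybrid_estimate(x_aux, pi_aux, beta_hat)
tau_x_hat = ht_auxiliary_totals(x_aux, pi_aux)
print(estimate, sum(t * b for t, b in zip(tau_x_hat, beta_hat)))  # equal up to rounding
```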
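In deriving the variance of the estimator in Eq. (8), note that the part \( \boldsymbol{\pi}'\boldsymbol{X} \) of the estimator is a 1 × p matrix of design-unbiased estimators of population totals of auxiliary data, which we denote \( \widehat{\boldsymbol{\tau}}_{\boldsymbol{x}} \). This matrix is multiplied by the matrix of estimated model parameters, i.e., the result is a sum of estimated population totals of auxiliary variables times the corresponding model parameter estimates, such as \( \widehat{\tau}_{Xj} \cdot \widehat{\beta}_j \). In each term the two components are independent, but the estimators of the different auxiliary variable totals are typically correlated with each other, as are the estimators of the different model parameters. Thus, the variance (due to the sample and the model) is
\[ V\left(\widehat{E\left(\tau^*\right)}\right) = \boldsymbol{\beta}'\,\boldsymbol{cov}\left(\widehat{\boldsymbol{\tau}}_{\boldsymbol{x}}\right)\boldsymbol{\beta} + \boldsymbol{\tau}_{\boldsymbol{x}}\,\boldsymbol{cov}\left(\widehat{\boldsymbol{\beta}}\right)\boldsymbol{\tau}_{\boldsymbol{x}}' + \mathrm{Tr}\left(\boldsymbol{cov}\left(\widehat{\boldsymbol{\beta}}\right)\boldsymbol{cov}\left(\widehat{\boldsymbol{\tau}}_{\boldsymbol{x}}\right)\right) \qquad (9) \]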
where \( \boldsymbol{cov}\left(\widehat{\boldsymbol{\tau}}_{\boldsymbol{x}}\right) \) is the covariance matrix of the estimators of the auxiliary variable totals, \( \boldsymbol{cov}\left(\widehat{\boldsymbol{\beta}}\right) \) is the covariance matrix of the model parameter estimators, and \( \boldsymbol{\tau}_{\boldsymbol{x}} \) and \( \boldsymbol{\beta} \) are the corresponding true values. The Tr operator is the trace, i.e., the sum of the diagonal entries of the matrix. The diagonal entries of \( \boldsymbol{cov}\left(\widehat{\boldsymbol{\tau}}_{\boldsymbol{x}}\right) \) are of the kind presented in Eq. (2). The off-diagonal entries are computed in a similar fashion (Särndal et al. 1992). Under ordinary least squares regression assumptions, the covariance matrix of the model parameter estimators is \( \sigma^2(\boldsymbol{X}'\boldsymbol{X})^{-1} \), where \( \sigma^2 \) is the residual variance of the regression model and X here denotes the auxiliary data of the sample used for fitting the model. In case of heteroskedastic residual variance, alternative estimators can be applied (e.g., Saarela et al. 2015b). We do not offer a proof of Eq. (9). However, readers familiar with the variance of a product of two independent random variables, i.e., \( var(WZ) = E(W)^2 var(Z) + E(Z)^2 var(W) + var(W)var(Z) \), can identify the similarity with Eq. (9).
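Although no proof of Eq. (9) is given, it can be checked numerically. The sketch below implements the formula and compares it with the exact variance of \( \widehat{\boldsymbol{\tau}}_{\boldsymbol{x}}\widehat{\boldsymbol{\beta}} \). For the check, both estimators are replaced by hypothetical, independent, discrete random vectors with equally likely outcomes. These are a toy stand-in for their sampling distributions, not an actual survey. Exact rational arithmetic turns the comparison into an equality test, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def hybrid_variance(beta, cov_beta, tau_x, cov_tau_x):
    """Eq. (9): beta' cov(tau_x_hat) beta + tau_x cov(beta_hat) tau_x'
    + Tr(cov(beta_hat) cov(tau_x_hat)), with beta and tau_x the expected values."""
    r = range(len(beta))
    term1 = sum(beta[j] * cov_tau_x[j][k] * beta[k] for j in r for k in r)
    term2 = sum(tau_x[j] * cov_beta[j][k] * tau_x[k] for j in r for k in r)
    trace = sum(cov_beta[j][k] * cov_tau_x[k][j] for j in r for k in r)
    return term1 + term2 + trace


def moments(outcomes):
    """Mean vector and covariance matrix of equally likely outcome vectors."""
    m, p = len(outcomes), len(outcomes[0])
    mean = [sum(o[j] for o in outcomes) / m for j in range(p)]
    cov = [[sum((o[j] - mean[j]) * (o[k] - mean[k]) for o in outcomes) / m
            for k in range(p)] for j in range(p)]
    return mean, cov


# Hypothetical outcomes for tau_x_hat (intercept total, auxiliary total) and beta_hat.
F = Fraction
tau_outcomes = [(F(300), F(3400)), (F(290), F(3700)), (F(310), F(3550))]
beta_outcomes = [(F(2), F(3)), (F(5, 2), F(14, 5))]

tau_mean, tau_cov = moments(tau_outcomes)
beta_mean, beta_cov = moments(beta_outcomes)

# Independence: every (tau, beta) combination is equally likely.
values = [sum(t * b for t, b in zip(tv, bv)) for tv, bv in product(tau_outcomes, beta_outcomes)]
mean_value = sum(values) / len(values)
exact_variance = sum((v - mean_value) ** 2 for v in values) / len(values)

print(hybrid_variance(beta_mean, beta_cov, tau_mean, tau_cov) == exact_variance)  # True
```

With p = 1, the same function reduces to the scalar product-variance formula quoted in the text.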
Although it seems likely that hybrid-type estimators have been applied outside forest inventories, we have not yet found any description of them in non-forest publications.
Fig. 1 gives an overview of the “positions” of standard design-based estimation (without using models), model-assisted estimation, hybrid estimation, and model-based estimation. The techniques are placed according to how much they rely on (i) the correctness of the model and (ii) the use of probability sampling.
A brief review of the use of models in large-area forest surveys
From the methods section it is clear that models can be used in several ways for improving the estimation of target quantities in large-area forest surveys. Our review is divided into the following cases:

Use of models in the context of design-based inference through model-assisted estimation

Use of models in the context of model-based inference through model-based estimation

Use of models in the context of hybrid inference
Model-assisted estimation in large-area forest surveys
Formal model-assisted estimators appear to have been introduced to large-area forest surveys fairly recently, although standard regression estimators (i.e., a simple kind of model-assisted estimator) have been applied in forest surveys for a long time. Important examples of the latter kind are the Swiss national forest inventory (Köhl and Brassel 2001), where air photo interpretation has long been combined with field surveys, and the Italian national forest inventory, where a three-phase sampling approach is applied (Fattorini et al. 2006).
An early model-assisted study was conducted by Breidt et al. (2005), who used spline models in estimating population totals in a simulation study linked to surveys of forest health. Model-assisted estimation was found to perform well in the context of a two-phase survey with multiple auxiliary variables.
Opsomer et al. (2007) used model-assisted estimation in a two-phase systematic sampling design, applying generalized additive models linking ground measurements with auxiliary information from remote sensing. The study was an extension of the study by Breidt and Opsomer (2000), where univariate models and a single-phase sampling strategy were applied.
In Boudreau et al. (2008), model-assisted estimation was used for estimating biomass in Quebec, Canada, based on data from a laser profiler, GLAS satellite data, and land cover maps based on data from Landsat-7 ETM+. The study demonstrated that GLAS data could improve monitoring of aboveground biomass at large spatial scales; however, the presented estimators were not denoted “model-assisted”. Nelson et al. (2009) built upon the study by Boudreau et al. (2008) and introduced some new, partly model-based, estimation techniques. Andersen et al. (2009) presented a study based on model-assisted estimation where the biomass of western Kenai, Alaska, was estimated based on samples of field and laser scanner data.
In Gregoire et al. (2011), model-assisted estimation was used for estimating aboveground biomass in Hedmark County, Norway, using sample data from laser profilers and scanners. The study triggered a series of studies where the model-assisted theory developed by Särndal et al. (1992) was applied to large-scale forest surveys based on samples of laser scanner data. Næsset et al. (2011) applied and compared two sources of auxiliary information, laser scanner data and interferometric synthetic aperture radar data, for model-assisted estimation of biomass over a large boreal forest area in the Aurskog-Høland municipality in Norway. They also quantified to what extent the two types of auxiliary data improved the precision of the estimates. Gobakken et al. (2012) compared the performance of model-assisted estimation with model-based prediction of aboveground biomass in Hedmark County, Norway, using data from airborne laser scanning as auxiliary data. The two approaches were found to yield similar results. Nelson et al. (2012) conducted a similar study over the same area using data from a profiling rather than a scanning airborne laser, while Næsset et al. (2013b) evaluated the precision of the two-stage model-assisted estimation conducted by Gobakken et al. (2012). The authors noted the sensitivity of variance estimators to unequal sample strip lengths and to systematically selected strips. The latter issue was further pursued by Ene et al. (2012), who showed that the variance was often severely overestimated when estimators assuming simple random sampling were applied in this context. Similar results were reported by Magnussen et al. (2014).
Strunk et al. (2012a, 2012b) investigated different aspects of model-assisted estimation. For example, the authors found that the laser pulse density had almost no effect on the precision of model-assisted estimators of core parameters, such as basal area, volume, and biomass.
Saarela et al. (2015a) proposed using probability-proportional-to-size sampling of laser scanning strips in a two-phase model-assisted sampling study where the total growing stock volume was estimated in a boreal forest area in Kuortane, Finland. They also found that full-cover Landsat auxiliary information improved the precision of the estimators compared to using only sampled LiDAR strip data.
Massey et al. (2014) evaluated the performance of the model-assisted estimation technique in connection with the Swiss national forest inventory. The authors also addressed several methodological issues; among other things, they evaluated the performance of nonparametric methods in connection with model-assisted estimation and discussed the close connection between difference estimators and regression estimators.
As some of the first laser scanning campaigns carried out for inventory purposes at the turn of the millennium have been repeated in recent years, change estimation assisted by laser data has become an important research area. Bollandsås et al. (2013), Næsset et al. (2013a, 2015), Skowronski et al. (2014), McRoberts et al. (2015), and Magnussen et al. (2015) analysed different approaches to modelling change in biomass. These include separate modelling of biomass at each point in time followed by estimation of the difference, direct modelling of change with different predictor variables (such as the variables at each time point or their differences), and longitudinal models. These modelling techniques have been combined with different design-based and model-based estimators to produce change estimates and confidence intervals. Sannier et al. (2014) investigated change estimation based on a series of maps, which provided the auxiliary data for model-assisted difference estimation. A comprehensive review and discussion of change estimation can be found in McRoberts et al. (2014, 2015). Melville et al. (2015) evaluated three model-based and three design-based methods for estimating the number of stems using airborne laser scanning data. The authors reported that among the design-based estimators, the most precise estimates were achieved through stratification.
Stephens et al. (2012) applied double sampling regression estimators in the design-based framework for estimating carbon stocks in New Zealand forests using laser data as auxiliary information.
Chirici et al. (2016) compared the performance of two types of airborne LiDAR-based metrics in estimating total aboveground biomass through model-assisted estimators. The study area was located in the Molise Region in central Italy. Corona et al. (2015) dealt with the use of map data as auxiliary information in a similar context.
Model-based and hybrid inference in large-area forest surveys
McRoberts (2006, 2010) applied model-based inference for estimating forest area using Landsat data as auxiliary information together with field plot data. The studies were performed in northern Minnesota, U.S.A. In these studies the expected value of the total forest area was estimated, as a means to reduce the complexity of the variance estimators.
A large number of studies have applied model-based prediction for mapping forest attributes across large areas using remotely sensed auxiliary information. Baccini et al. (2008) used Moderate Resolution Imaging Spectroradiometer (MODIS) and GLAS data for mapping aboveground biomass across tropical Africa. Armston et al. (2009) used Landsat-5 TM and Landsat-7 ETM+ data for predicting foliage projective cover across a large area in Queensland, Australia. Asner et al. (2010) applied model-based prediction for mapping aboveground carbon stocks using satellite imaging, airborne LiDAR and field plots over 4.3 million ha of the Peruvian Amazon. Helmer et al. (2010) used time series from 24 Landsat TM/ETM+ and Advanced Land Imager (ALI) scenes for mapping forest attributes on the island of Eleuthera. These are only examples of a very large number of studies where wall-to-wall remotely sensed data have been applied for mapping and monitoring forest resources. However, a majority of these studies do not apply a formal model-based inferential framework. For example, when the uncertainty of estimators is addressed, the strict model-based inference approach [Eq. (7)] is usually not applied. Instead, some other, often ad hoc, method is used that does not correctly reflect the uncertainty of the estimator or predictor involved.
Saarela et al. (2015b) evaluated the effects of model form and sample size on the precision of model-based estimators in the study area Kuortane, Finland, and identified minor to moderate differences in results when different model forms were applied. In a simulation study, Magnussen (2015) demonstrated the usefulness of model-based inference for forest surveys and argued that this approach has several advantages over traditional design-based sampling. McRoberts et al. (2014a, b) assessed the effects of uncertainty in individual tree volume model predictions on large-area volume estimates in the survey framework of hybrid inference.
As previously mentioned, Corona et al. (2014) proposed the term hybrid inference for the case where a probability sample of auxiliary data is selected, on which model-based inference is applied; the study by Corona et al. mainly dealt with small-area estimation issues. Ståhl et al. (2011), Gobakken et al. (2012), Nelson et al. (2012) and Magnussen et al. (2014) used hybrid inference for estimating the forest resources in Hedmark County, Norway, based on combinations of laser scanner data, laser profiler data, and field data. In the study by Magnussen et al., two populations were simulated using the data. Healey et al. (2012) applied the technique in California, using GLAS data. In a study of boreal forests in Canada, Margolis et al. (2015) likewise used GLAS data, in combination with airborne laser data, to estimate aboveground biomass.
Geographical (positional) mismatches between remotely sensed data and field measurements may considerably affect the precision of estimators in large-area surveys. The effects of such mismatches on model-based and model-assisted estimation were evaluated by Saarela et al. (2016).
The findings from the brief literature review are summarized in Fig. 2.
Discussion
The review revealed that the use of models in large-scale forest inventories is widespread, although statistically rigorous applications of model-assisted estimation, model-based inference, or hybrid inference are rather limited. While the model-assisted estimation framework is attracting considerable interest, model-based inference and hybrid inference are applied less often. A large number of studies apply approaches that could be classified as model-based inference, although they do not pursue any strict uncertainty analysis. In this respect, there is room for substantial improvement in how mean square errors or variances are estimated.
An advantage of model-assisted estimation, as compared to model-based and hybrid inference, is that the (approximate) unbiasedness of estimators of totals and means does not rely on the correctness of the model; the model is only applied for enhancing a design-based estimator (Särndal et al. 1992). Although a model-assisted estimator can, in theory, have a larger variance than a strictly design-based estimator if the model is extremely poor, a well-specified model can substantially increase the precision of the model-assisted estimator compared with the strictly design-based estimator. This was shown by, e.g., Ene et al. (2012) and Saarela et al. (2015a).
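The contrast between design-unbiasedness and model-dependent precision can be illustrated with a small Monte Carlo sketch. For simplicity it uses the difference estimator, the simplest model-assisted estimator, with unit-level predictions fixed in advance (for example from an external model). With fixed predictions this estimator is exactly design-unbiased under simple random sampling without replacement; with models fitted to the sample, as in the generalized regression estimator, unbiasedness holds only approximately. The simulated population and the "good" and "poor" prediction sets are hypothetical.

```python
import random
import statistics


def expansion_total(y_s, N):
    """Horvitz-Thompson total under simple random sampling without replacement (pi_i = n/N)."""
    return N * statistics.fmean(y_s)


def difference_total(y_s, yhat_s, yhat_total, N):
    """Difference (model-assisted) estimator: total of predictions plus expanded sample residuals."""
    n = len(y_s)
    return yhat_total + (N / n) * sum(yi - yh for yi, yh in zip(y_s, yhat_s))


def simulate(reps=2000, N=5000, n=100, seed=1):
    """Repeated sampling from one fixed population; returns the true total and (mean, variance) per estimator."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 30.0) for _ in range(N)]
    y = [10.0 + 8.0 * xi + rng.gauss(0.0, 20.0) for xi in x]
    predictions = {
        "good model": [10.0 + 8.0 * xi for xi in x],                # close to the true relationship
        "poor model": [rng.uniform(0.0, 400.0) for _ in range(N)],  # unrelated to y
    }
    pred_totals = {name: sum(p) for name, p in predictions.items()}
    estimates = {"expansion": [], "good model": [], "poor model": []}
    for _ in range(reps):
        s = rng.sample(range(N), n)
        y_s = [y[i] for i in s]
        estimates["expansion"].append(expansion_total(y_s, N))
        for name, p in predictions.items():
            estimates[name].append(difference_total(y_s, [p[i] for i in s], pred_totals[name], N))
    summary = {k: (statistics.fmean(v), statistics.pvariance(v)) for k, v in estimates.items()}
    return sum(y), summary


if __name__ == "__main__":
    true_total, summary = simulate()
    for name, (mean, var) in summary.items():
        print(f"{name:>10}: relative bias {(mean - true_total) / true_total:+.4f}, variance {var:.3e}")
```

All three estimators should be close to unbiased over repeated samples, but only the good predictions reduce the variance; predictions unrelated to the target variable inflate it above that of the expansion estimator.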
If well-specified models are available, model-based inference is a competitive alternative to design-based inference through model-assisted estimation (McRoberts et al. 2014a, b, Magnussen 2015). A key advantage is that it does not rely on a probability sample from the target area. Such samples may sometimes not be feasible due to poor infrastructure, restricted access to private land, or the presence of areas that are, for some reason, dangerous to visit in the field. Further, when a probability sample has been selected and models are developed from it and applied, model-based inference and model-assisted estimation usually lead to similar estimates of totals. If the condition \( \sum_{i \in s} \frac{y_i - \hat{y}_i}{\pi_i} = 0 \) holds, the two estimates are identical, since the model-assisted estimator equals the sum of the model predictions plus this weighted residual term. However, Saarela et al. (2016) showed that model-based variance estimators are less prone to problems caused by geolocation mismatches between field plots and remotely sensed auxiliary data.
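The identity between the two estimates can be checked numerically. In this sketch, assumed for illustration, a simple random sample without replacement gives equal inclusion probabilities \( \pi_i = n/N \). An ordinary least-squares model with an intercept then has residuals that sum to zero over the sample, so the weighted residual term \( \sum_{i \in s} (y_i - \hat{y}_i)/\pi_i \) vanishes and the model-assisted total equals the model-based total (the sum of predictions over the population). A model fitted through the origin generally violates the condition. The population and function names are hypothetical.

```python
import random


def fit_with_intercept(x, y):
    """OLS fit of y = b0 + b1*x; returns a prediction function."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    b0 = ybar - b1 * xbar
    return lambda v: b0 + b1 * v


def fit_through_origin(x, y):
    """OLS fit of y = b*x (no intercept); residuals need not sum to zero."""
    b = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
    return lambda v: b * v


def compare_totals(predict, x_pop, sample_idx, y_s):
    """Return (model-based total, model-assisted total, weighted residual term)."""
    pi = len(sample_idx) / len(x_pop)  # inclusion probability under SRS without replacement
    model_based = sum(predict(v) for v in x_pop)
    correction = sum((yi - predict(x_pop[i])) / pi for i, yi in zip(sample_idx, y_s))
    return model_based, model_based + correction, correction


if __name__ == "__main__":
    rng = random.Random(7)
    N, n = 2000, 80
    x_pop = [rng.uniform(1.0, 30.0) for _ in range(N)]
    y_pop = [10.0 + 8.0 * v + rng.gauss(0.0, 20.0) for v in x_pop]
    idx = rng.sample(range(N), n)
    x_s, y_s = [x_pop[i] for i in idx], [y_pop[i] for i in idx]
    for name, fit in (("with intercept", fit_with_intercept), ("through origin", fit_through_origin)):
        mb, ma, corr = compare_totals(fit(x_s, y_s), x_pop, idx, y_s)
        print(f"{name}: model-based {mb:.1f}, model-assisted {ma:.1f}, residual term {corr:.3g}")
```

With unequal inclusion probabilities, the same identity requires a model fitted with weights \( 1/\pi_i \) (and an intercept), which is why the two approaches often, but not always, give the same point estimate.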
Hybrid inference is a straightforward approach when auxiliary data are not available wall-to-wall and are expensive to acquire. In such cases a sample of auxiliary data can be selected, from which totals or means of the auxiliary variables are estimated and then combined with a model that links the auxiliary variables to the target variable. So far, the approach appears to have been applied explicitly in only a limited number of forest inventories, although it has implicitly been used for a long time in forest inventories where models (such as volume, biomass and growth models) are applied to data from field plots (Ståhl et al. 2014).
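A minimal sketch of hybrid estimation for a single auxiliary variable follows. It assumes (i) a simple linear model \( y = \beta_0 + \beta_1 x + \varepsilon \) fitted by ordinary least squares to field plots, (ii) an independent simple random sample of auxiliary observations (for example, laser measurements) used to estimate the mean \( \hat{\bar{x}} \), and (iii) a large population, so finite-population corrections are ignored. The estimator is \( \hat{\mu} = \hat{\beta}_0 + \hat{\beta}_1 \hat{\bar{x}} \). Because the model parameters and the auxiliary sample mean are independent, its variance splits into a model-related term \( \hat{V}(\hat{\beta}_0) + \hat{\bar{x}}^2 \hat{V}(\hat{\beta}_1) + 2\hat{\bar{x}}\,\widehat{\mathrm{Cov}}(\hat{\beta}_0, \hat{\beta}_1) \), a sampling-related term \( \hat{\beta}_1^2 \hat{V}(\hat{\bar{x}}) \), and a usually small product term \( \hat{V}(\hat{\beta}_1) \hat{V}(\hat{\bar{x}}) \). The function names, data, and this particular plug-in form are illustrative assumptions, not the variance estimator used in the studies cited.

```python
import random
import statistics


def fit_ols(x, y):
    """OLS fit of y = b0 + b1*x; returns b0, b1, residual variance, mean of x, Sxx."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sigma2 = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    return b0, b1, sigma2, xbar, sxx


def hybrid_mean(x_field, y_field, x_aux):
    """Hybrid estimate of the population mean with its variance split into components."""
    b0, b1, sigma2, xbar_f, sxx = fit_ols(x_field, y_field)
    xbar_a = statistics.fmean(x_aux)                    # design-based estimate of the mean of x
    var_xbar = statistics.variance(x_aux) / len(x_aux)  # SRS, finite-population correction ignored
    var_b1 = sigma2 / sxx
    # V(b0) + xbar^2 V(b1) + 2 xbar Cov(b0, b1), written in closed form for simple regression
    model_var = sigma2 * (1.0 / len(x_field) + (xbar_a - xbar_f) ** 2 / sxx)
    sampling_var = b1 ** 2 * var_xbar
    product_var = var_b1 * var_xbar
    return b0 + b1 * xbar_a, {"model": model_var, "sampling": sampling_var, "product": product_var}


if __name__ == "__main__":
    rng = random.Random(11)
    # Hypothetical field plots used only to fit the model.
    x_field = [rng.uniform(0.0, 30.0) for _ in range(150)]
    y_field = [10.0 + 8.0 * a + rng.gauss(0.0, 20.0) for a in x_field]
    # Hypothetical probability sample of auxiliary (e.g. laser) observations.
    x_aux = [rng.uniform(0.0, 30.0) for _ in range(1000)]
    mu_hat, parts = hybrid_mean(x_field, y_field, x_aux)
    print(f"estimate {mu_hat:.1f}, variance components {parts}, total {sum(parts.values()):.3f}")
```

Reporting the components separately shows whether more field plots (reducing the model term) or more auxiliary observations (reducing the sampling term) would improve precision most.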
Overall, the use of models relies on auxiliary data that are correlated with, or otherwise related to, the target variable. Considering the variables normally included in national forest inventories (Tomppo et al. 2010), it is likely that many of them would be very difficult to model from remotely sensed data. This might be the case for forest floor vegetation, soil properties, and several types of forest damage. Modelling approaches for such variables would probably not improve the precision of estimators. Thus, a large number of variables, such as site index, forest floor vegetation, and soil type, are likely to require probability field samples.
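Why weakly related variables gain little from modelling can be seen from a standard large-sample approximation for the regression estimator of a mean under simple random sampling: its variance is approximately \( (1 - \rho^2) \) times that of the plain sample mean, where \( \rho \) is the correlation between the target and auxiliary variables. The sketch below is illustrative only; the function names and the simulated "volume" and "soil" variables are hypothetical, and the approximation ignores terms of order \( 1/n^2 \).

```python
import math
import random


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_variance(rho):
    """Approximate V(regression estimator) / V(sample mean) under SRS for large n: 1 - rho**2."""
    return 1.0 - rho ** 2


if __name__ == "__main__":
    rng = random.Random(3)
    x = [rng.uniform(0.0, 30.0) for _ in range(5000)]            # remotely sensed metric
    volume = [10.0 + 8.0 * a + rng.gauss(0.0, 20.0) for a in x]  # strongly related target
    soil = [0.05 * a + rng.gauss(0.0, 5.0) for a in x]           # weakly related target
    for name, y in (("volume", volume), ("soil", soil)):
        rho = correlation(x, y)
        r = relative_variance(rho)
        # For equal precision, the required number of field plots scales by the same factor.
        print(f"{name}: rho={rho:.2f}, variance ratio {r:.2f}, share of plots saved {1.0 - r:.0%}")
```

With \( \rho = 0.3 \), for example, the ratio is 0.91, so only about 9% of the field plots could be saved for the same precision; for the weakly related variable simulated here the ratio is close to one.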
Conclusions
We conclude that all three approaches studied (model-assisted estimation, model-based inference, and hybrid inference) have advantages and disadvantages when applied in large-area forest surveys. A main advantage of model-assisted estimation is that the unbiasedness of estimators does not rely on the suitability of the model; the model only helps to improve the precision of an estimator known to be (approximately) unbiased. Model-based and hybrid inference rely on the suitability of the model, but may have several advantages under conditions where access to field plots is difficult or expensive. All three approaches rely on the possibility to develop accurate models, which is possible for several important forest variables (such as biomass), but not for all variables that are included in a typical national forest inventory.
References
Andersen H-E, Barrett T, Winterberger K, Strunk J, Temesgen H (2009) Estimating forest biomass on the western lowlands of the Kenai Peninsula of Alaska using airborne lidar and field plot data in a model-assisted sampling design. In: Proceedings of the IUFRO Division 4 Conference: “Extending Forest Inventory and Monitoring over Space and Time”, pp 19–22
Andersen H-E, Strunk J, Temesgen H (2011) Using airborne light detection and ranging as a sampling tool for estimating forest biomass resources in the Upper Tanana Valley of Interior Alaska. West J Appl Forestry 26:157–164
Armston JD, Denham RJ, Danaher TJ, Scarth PF, Moffiet TN (2009) Prediction and validation of foliage projective cover from Landsat-5 TM and Landsat-7 ETM+ imagery. J Appl Remote Sensing 3:033540, http://dx.doi.org/10.1117/1.3216031
Asner GP, Powell GV, Mascaro J, Knapp DE, Clark JK, Jacobson J, Hughes RF (2010) High-resolution forest carbon stocks and emissions in the Amazon. Proc Natl Acad Sci 107:16738–16742, http://dx.doi.org/10.1073/pnas.1004875107
Asner GP, Mascaro J, Muller-Landau HC, Vieilledent G, Vaudry R, Rasamoelina M, Hall S, van Breugel M (2012) A universal airborne LiDAR approach for tropical forest carbon mapping. Oecologia 168:1147–1160, http://dx.doi.org/10.1007/s00442-011-2165-z
Baccini A, Laporte N, Goetz SJ, Sun M, Dong H (2008) A first map of tropical Africa’s aboveground biomass derived from satellite imagery. Environ Res Lett 3:9
Baffetta F, Fattorini L, Franceschi S, Corona P (2009) Design-based approach to k-nearest neighbours technique for coupling field and remotely sensed data in forest surveys. Remote Sensing Environ 113(3):463–475, http://dx.doi.org/10.1016/j.rse.2008.06.014
Baffetta F, Corona P, Fattorini L (2011) Design-based diagnostics for k-NN estimators of forest resources. Can J Forest Res 41:59–72
Bohlin J, Wallerman J, Fransson JE (2012) Forest variable estimation using photogrammetric matching of digital aerial images in combination with a high-resolution DEM. Scand J Forest Res 27:692–699, http://dx.doi.org/10.1080/02827581.2012.686625
Bollandsås OM, Gregoire TG, Næsset E, Øyen B-H (2013) Detection of biomass change in a Norwegian mountain forest area using small footprint airborne laser scanner data. Stat Methods Appl 22:113–129, http://dx.doi.org/10.1007/s10260-012-0220-5
Boudreau J, Nelson RF, Margolis HA, Beaudoin A, Guindon L, Kimes DS (2008) Regional aboveground forest biomass using airborne and spaceborne LiDAR in Québec. Remote Sensing Environ 112:3876–3890, http://dx.doi.org/10.1016/j.rse.2008.06.003
Breidenbach J, Astrup R (2012) Small area estimation of forest attributes in the Norwegian National Forest Inventory. Eur J Forest Res 131:1255–1267, http://dx.doi.org/10.1007/s10342-012-0596-7
Breidenbach J, McRoberts RE, Astrup R (2015) Empirical coverage of model-based variance estimators for remote sensing assisted estimation of stand-level timber volume. Remote Sensing Environ (in press), http://dx.doi.org/10.1016/j.rse.2015.07.026
Breidt FJ, Opsomer JD (2000) Local polynomial regression estimators in survey sampling. Ann Stat 28:1026–1053
Breidt FJ, Claeskens G, Opsomer JD (2005) Model-assisted estimation for complex surveys using penalised splines. Biometrika 92:831–846, http://dx.doi.org/10.1093/biomet/92.4.831
Cassel CM, Särndal C-E, Wretman JH (1977) Foundations of inference in survey sampling. Wiley, New York
Chambers R, Clark R (2012) An introduction to model-based survey sampling with applications. Oxford University Press, http://dx.doi.org/10.1093/acprof:oso/9780198566625.001.0001
Chirici G, McRoberts RE, Fattorini L, Mura M, Marchetti M (2016) Comparing echo-based and canopy height model-based metrics for enhancing estimation of forest aboveground biomass in a model-assisted framework. Remote Sensing Environ 174:1–9, http://dx.doi.org/10.1016/j.rse.2015.11.010
Corona P, Fattorini L, Franceschi S, Scrinzi G, Torresan C (2014) Estimation of standing wood volume in forest compartments by exploiting airborne laser scanning information: model-based, design-based, and hybrid perspectives. Can J Forest Res 44:1303–1311, http://dx.doi.org/10.1139/cjfr-2014-0203
Corona P, Fattorini L, Pagliarella MC (2015) Sampling strategies for estimating forest cover from remote sensing-based two-stage inventories. Forest Ecosystems 2(1):1–12, http://dx.doi.org/10.1186/s40663-015-0042-7
Ene LT, Næsset E, Gobakken T, Gregoire TG, Ståhl G, Nelson R (2012) Assessing the accuracy of regional LiDAR-based biomass estimation using a simulation approach. Remote Sensing Environ 123:579–592, http://dx.doi.org/10.1016/j.rse.2012.04.017
Fattorini L, Marcheselli M, Pisani C (2006) A three-phase sampling strategy for large-scale multiresource forest inventories. J Agric Biol Environ Stat 11(3):296–316, http://dx.doi.org/10.1198/108571106X130548
Fattorini L, Franceschi S, Pisani C (2009) A two-phase sampling strategy for large-scale forest carbon budgets. J Stat Plann Inference 139(3):1045–1055, http://dx.doi.org/10.1016/j.jspi.2008.06.014
Gobakken T, Næsset E, Nelson R, Bollandsås OM, Gregoire TG, Ståhl G, Holm S, Ørka HO, Astrup R (2012) Estimating biomass in Hedmark County, Norway using national forest inventory field plots and airborne laser scanning. Remote Sensing Environ 123:443–456, http://dx.doi.org/10.1016/j.rse.2012.01.025
Grafström A, Saarela S, Ene LT (2014) Efficient sampling strategies for forest inventories by spreading the sample in auxiliary space. Can J Forest Res 44:1156–1164, http://dx.doi.org/10.1139/cjfr-2014-0202
Gregoire TG (1998) Design-based and model-based inference in survey sampling: appreciating the difference. Can J Forest Res 28:1429–1447, http://dx.doi.org/10.1139/x98-166
Gregoire TG, Valentine HT (2008) Sampling strategies for natural resources and the environment. CRC Press, Taylor & Francis Group, Boca Raton
Gregoire TG, Ståhl G, Næsset E, Gobakken T, Nelson R, Holm S (2011) Model-assisted estimation of biomass in a LiDAR sample survey in Hedmark County, Norway. Can J Forest Res 41:83–95, http://dx.doi.org/10.1139/X10-195
Hansen MH, Madow WG, Tepping BJ (1978) On inference and estimation from sample surveys. In: Proceedings of the Survey Research Methods Section, pp 82–107
Hansen MH, Madow WG, Tepping BJ (1983) An evaluation of model-dependent and probability-sampling inferences in sample surveys. J Am Stat Assoc 78:776–793, http://dx.doi.org/10.1080/01621459.1983.10477018
Healey SP, Patterson PL, Saatchi S, Lefsky MA, Lister AJ, Freeman EA (2012) A sample design for globally consistent biomass estimation using lidar data from the Geoscience Laser Altimeter System (GLAS). Carbon Balance Manage 7:1–9, http://dx.doi.org/10.1186/1750-0680-7-10
Helmer EH, Ruzycki TS, Wunderle JM, Vogesser S, Ruefenacht B, Kwit C, Ewert DN (2010) Mapping tropical dry forest height, foliage height profiles and disturbance type and age with a time series of cloud-cleared Landsat and ALI image mosaics to characterize avian habitat. Remote Sensing Environ 114:2457–2473, http://dx.doi.org/10.1016/j.rse.2010.05.021
Köhl M, Brassel P (2001) Zur Auswirkung der Hangneigungskorrektur auf Schätzwerte im Schweizerischen Landesforstinventar (LFI) [On the effect of the slope correction on estimates in the Swiss National Forest Inventory (LFI)]. Schweizerische Zeitschrift für Forstwesen 152(6):215–225, http://dx.doi.org/10.3188/szf.2001.0215
Magnussen S (2015) Arguments for a model-dependent inference? Forestry 88(3):317–325, http://dx.doi.org/10.1093/forestry/cpv002
Magnussen S, Tomppo E (2015) Model-calibrated k-nearest neighbor estimators. Scand J Forest Res 1–11, http://dx.doi.org/10.1080/02827581.2015.1073348
Magnussen S, Næsset E, Gobakken T (2014) An estimator of variance for two-stage ratio regression estimators. Forest Sci 60(4):663–676, http://dx.doi.org/10.5849/forsci.12-163
Magnussen S, Næsset E, Gobakken T (2015) LiDAR-supported estimation of change in forest biomass with time-invariant regression models. Can J Forest Res 45:1514–1523, http://dx.doi.org/10.1139/cjfr-2015-0084
Mandallaz D (2013) Design-based properties of some small-area estimators in forest inventory with two-phase sampling. Can J Forest Res 43:441–449, http://dx.doi.org/10.1139/cjfr-2012-0381
Margolis HA, Nelson RF, Montesano PM, Beaudoin A, Sun G, Andersen H-E, Wulder M (2015) Combining satellite lidar, airborne lidar and ground plots to estimate the amount and distribution of aboveground biomass in the Boreal forest of North America. Can J Forest Res 45(7):838–855, http://dx.doi.org/10.1139/cjfr-2015-0006
Massey A, Mandallaz D, Lanz A (2014) Integrating remote sensing and past inventory data under the new annual design of the Swiss National Forest Inventory using three-phase design-based regression estimation. Can J Forest Res 44:1177–1186, http://dx.doi.org/10.1139/cjfr-2014-0152
McRoberts RE (2006) A model-based approach to estimating forest area. Remote Sensing Environ 103:56–66, http://dx.doi.org/10.1016/j.rse.2006.03.005
McRoberts RE (2010) Probability- and model-based approaches to inference for proportion forest using satellite imagery as ancillary data. Remote Sensing Environ 114:1017–1025, http://dx.doi.org/10.1016/j.rse.2009.12.013
McRoberts RE, Tomppo EO, Finley AO, Heikkinen J (2007) Estimating areal means and variances of forest attributes using the k-Nearest Neighbors technique and satellite imagery. Remote Sensing Environ 111:466–480
McRoberts RE, Bollandsås OM, Næsset E (2014) Modeling and estimating change. In: Maltamo M, Næsset E, Vauhkonen J (eds) Forestry Applications of Airborne Laser Scanning: Concepts and Case Studies. Springer, pp 293–314, http://dx.doi.org/10.1007/978-94-017-8663-8_15
McRoberts RE, Næsset E, Gobakken T, Bollandsås OM (2015) Indirect and direct estimation of forest biomass change using forest inventory and airborne laser scanning data. Remote Sensing Environ 164:36–42, http://dx.doi.org/10.1016/j.rse.2015.02.018
Melville GJ, Welsh AH, Stone C (2015) Improving the efficiency and precision of tree counts in pine plantations using airborne LiDAR data and flexible-radius plots: model-based and design-based approaches. J Agric Biol Environ Stat 20(2):229–257, http://dx.doi.org/10.1007/s13253-015-0205-6
Næsset E (1997) Estimating timber volume of forest stands using airborne laser scanner data. Remote Sensing Environ 61:246–253, http://dx.doi.org/10.1016/S0034-4257(97)00041-2
Næsset E (2002a) Determination of mean tree height of forest stands by means of digital photogrammetry. Scand J Forest Res 17:446–459, http://dx.doi.org/10.1080/028275802320435469
Næsset E (2002b) Predicting forest stand characteristics with airborne scanning laser using a practical two-stage procedure and field data. Remote Sensing Environ 80:88–99, http://dx.doi.org/10.1016/S0034-4257(01)00290-5
Næsset E (2004) Accuracy of forest inventory using airborne laser scanning: evaluating the first Nordic full-scale operational project. Scand J Forest Res 19:554–557, http://dx.doi.org/10.1080/02827580410019544
Næsset E, Gobakken T, Solberg S, Gregoire TG, Nelson R, Ståhl G, Weydahl D (2011) Model-assisted regional forest biomass estimation using LiDAR and InSAR as auxiliary data: A case study from a boreal forest area. Remote Sensing Environ 115:3599–3614, http://dx.doi.org/10.1016/j.rse.2011.08.021
Næsset E, Bollandsås OM, Gobakken T, Gregoire TG, Ståhl G (2013a) Model-assisted estimation of change in forest biomass over an 11-year period in a sample survey supported by airborne LiDAR: A case study with post-stratification to provide “activity data”. Remote Sensing Environ 128:299–314, http://dx.doi.org/10.1016/j.rse.2012.10.008
Næsset E, Gobakken T, Bollandsås OM, Gregoire TG, Nelson R, Ståhl G (2013b) Comparison of precision of biomass estimates in regional field sample surveys and airborne LiDAR-assisted surveys in Hedmark County, Norway. Remote Sensing Environ 130:108–120, http://dx.doi.org/10.1016/j.rse.2012.11.010
Næsset E, Bollandsås OM, Gobakken T, Solberg S, McRoberts RE (2015) The effects of field plot size on model-assisted estimation of aboveground biomass change using multitemporal interferometric SAR and airborne laser scanning data. Remote Sensing Environ 168:252–264, http://dx.doi.org/10.1016/j.rse.2015.07.002
Nelson R, Krabill W, Maclean G (1984) Determining forest canopy characteristics using airborne laser data. Remote Sensing Environ 15:201–212, http://dx.doi.org/10.1016/0034-4257(84)90031-2
Nelson R, Krabill W, Tonelli J (1988) Estimating forest biomass and volume using airborne laser data. Remote Sensing Environ 24:247–267, http://dx.doi.org/10.1016/0034-4257(88)90028-4
Nelson R, Boudreau J, Gregoire TG, Margolis H, Næsset E, Gobakken T, Ståhl G (2009) Estimating Quebec provincial forest resources using ICESat/GLAS. Can J Forest Res 39:862–881, http://dx.doi.org/10.1139/X09-002
Nelson R, Gobakken T, Næsset E, Gregoire TG, Ståhl G, Holm S, Flewelling J (2012) Lidar sampling – using an airborne profiler to estimate forest biomass in Hedmark County, Norway. Remote Sensing Environ 123:563–578, http://dx.doi.org/10.1016/j.rse.2011.10.036
Neyman J (1934) On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J R Stat Soc 97:558–606, http://dx.doi.org/10.2307/2342192
Opsomer JD, Breidt FJ, Moisen GG, Kauermann G (2007) Model-assisted estimation of forest resources with generalized additive models. J Am Stat Assoc 102:400–409, http://dx.doi.org/10.1198/016214506000001491
Reese H, Nilsson M, Sandström P, Olsson H (2002) Applications using estimates of forest parameters derived from satellite and forest inventory data. Comput Electron Agric 37:37–55, http://dx.doi.org/10.1016/S0168-1699(02)00118-7
Saarela S, Grafström A, Ståhl G, Kangas A, Holopainen M, Tuominen S, Nordkvist K, Hyyppä J (2015a) Model-assisted estimation of growing stock volume using different combinations of LiDAR and Landsat data as auxiliary information. Remote Sensing Environ 158:431–440, http://dx.doi.org/10.1016/j.rse.2014.11.020
Saarela S, Schnell S, Grafström A, Tuominen S, Nordkvist K, Hyyppä J, Kangas A, Ståhl G (2015b) Effects of sample size and model form on the accuracy of model-based estimators of growing stock volume in Kuortane, Finland. Can J Forest Res 45:1524–1534, http://dx.doi.org/10.1139/cjfr-2015-0077
Saarela S, Schnell S, Tuominen S, Balazs A, Hyyppä J, Grafström A, Ståhl G (2016) Effects of positional errors in model-assisted and model-based estimation of growing stock volume. Remote Sensing Environ 172:101–108, http://dx.doi.org/10.1016/j.rse.2015.11.002
Sannier C, McRoberts RE, Fichet L-V, Makaga EMK (2014) Using the regression estimator with Landsat data to estimate proportion forest cover and net proportion deforestation in Gabon. Remote Sensing Environ 151:138–148, http://dx.doi.org/10.1016/j.rse.2013.09.015
Särndal C-E (1978) Design-based and model-based inference in survey sampling [with discussion and reply]. Scand J Stat 5(1):27–52
Särndal C-E, Swensson B, Wretman J (1992) Model Assisted Survey Sampling. Springer, http://dx.doi.org/10.1007/978-1-4612-4378-6
Skowronski NS, Clark KL, Gallagher M, Birdsey RA, Hom JL (2014) Airborne laser scanner-assisted estimation of aboveground biomass change in a temperate oak-pine forest. Remote Sensing Environ 151:166–174, http://dx.doi.org/10.1016/j.rse.2013.12.015
Solberg S, Astrup R, Bollandsås OM, Næsset E, Weydahl DJ (2010) Deriving forest monitoring variables from X-band InSAR SRTM height. Can J Remote Sensing 36:68–79, http://dx.doi.org/10.5589/m10-025
Ståhl G, Holm S, Gregoire TG, Gobakken T, Næsset E, Nelson R (2011) Model-based inference for biomass estimation in a LiDAR sample survey in Hedmark County, Norway. Can J Forest Res 41:96–107, http://dx.doi.org/10.1139/X10-161
Ståhl G, Heikkinen J, Petersson H, Repola J, Holm S (2014) Sample-based estimation of greenhouse gas emissions from forests – A new approach to account for both sampling and model errors. Forest Sci 60:3–13, http://dx.doi.org/10.5849/forsci.13-005
Stephens PR, Kimberley MO, Beets PN, Paul TS, Searles N, Bell A, Brack C, Broadley J (2012) Airborne scanning LiDAR in a double sampling forest carbon inventory. Remote Sensing Environ 117:348–357, http://dx.doi.org/10.1016/j.rse.2011.10.009
Strunk JL, Reutebuch SE, Andersen H-E, Gould PJ, McGaughey RJ (2012a) Model-assisted forest yield estimation with light detection and ranging. West J Appl Forestry 27:53–59, http://dx.doi.org/10.5849/wjaf.10-043
Strunk J, Temesgen H, Andersen H-E, Flewelling JP, Madsen L (2012b) Effects of lidar pulse density and sample size on a model-assisted approach to estimate forest inventory variables. Can J Remote Sensing 38:644–654, http://dx.doi.org/10.5589/m12-052
Tomppo E, Katila M (1991) Satellite image-based national forest inventory of Finland for publication in the IGARSS’91 digest. In: Geoscience and Remote Sensing Symposium, 1991. IGARSS’91. Remote Sensing: Global Monitoring for Earth Management, International, vol 3, pp 1141–1144, http://dx.doi.org/10.1109/igarss.1991.579272
Tomppo E, Olsson H, Ståhl G, Nilsson M, Hagner O, Katila M (2008) Combining national forest inventory field plots and remote sensing data for forest databases. Remote Sensing Environ 112(5):1982–1999
Tomppo E, Gschwantner T, Lawrence M, McRoberts RE, Gabler K, Schadauer K, Vidal C, Lanz A, Ståhl G, Cienciala E (2010) National forest inventories. Pathways for Common Reporting. Springer, pp 541–553, http://dx.doi.org/10.1007/978-90-481-3233-1
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
GS: Initiative and major contribution to writing and review. SvS: Major contribution to writing and review. SeS, SH, JB, SPH, PLP, SM, EN, REM, TGG: Contribution to review and suggestions for improvement to preliminary versions of the manuscript. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Ståhl, G., Saarela, S., Schnell, S. et al. Use of models in large-area forest surveys: comparing model-assisted, model-based and hybrid estimation. For. Ecosyst. 3, 5 (2016). https://doi.org/10.1186/s40663-016-0064-9
DOI: https://doi.org/10.1186/s40663-016-0064-9
Keywords
 Design-based inference
 Model-assisted estimation
 Model-based inference
 Hybrid inference
 National forest inventory
 Remote sensing
 Sampling