Skip to main content

Mapping aboveground biomass and its prediction uncertainty using LiDAR and field data, accounting for tree-level allometric and LiDAR model errors

Abstract

Background

The increasing availability of remotely sensed data has recently challenged the traditional way of performing forest inventories, and induced an interest in model-based inference. Like traditional design-based inference, model-based inference allows for regional estimates of totals and means, but in addition for wall-to-wall mapping of forest characteristics. Recently Light Detection and Ranging (LiDAR)-based maps of forest attributes have been developed in many countries and been well received by users due to their accurate spatial representation of forest resources. However, the correspondence between such mapping and model-based inference is seldom appreciated. In this study we applied hierarchical model-based inference to produce aboveground biomass maps as well as maps of the corresponding prediction uncertainties with the same spatial resolution. Further, an estimator of mean biomass at regional level, and its uncertainty, was developed to demonstrate how mapping and regional level assessment can be combined within the framework of model-based inference.

Results

Through a new version of hierarchical model-based estimation, allowing models to be nonlinear, we accounted for uncertainties in both the individual tree-level biomass models and the models linking plot level biomass predictions with LiDAR metrics. In a 5005 km2 large study area in south-central Sweden the predicted aboveground biomass at the level of 18 m ×18 m map units was found to range between 9 and 447 Mg ·ha−1. The corresponding root mean square errors ranged between 10 and 162 Mg ·ha−1. For the entire study region, the mean aboveground biomass was 55 Mg ·ha−1 and the corresponding relative root mean square error 8%. At this level 75% of the mean square error was due to the uncertainty associated with tree-level models.

Conclusions

Through the proposed method it is possible to link mapping and estimation within the framework of model-based inference. Uncertainties in both tree-level biomass models and models linking plot level biomass with LiDAR data are accounted for, both for the uncertainty maps and the overall estimates. The development of hierarchical model-based inference to handle nonlinear models was an important prerequisite for the study.

Introduction

The interest in forest inventories is increasing due to the key role of forests in global carbon cycles and, thus, in the climate change discourse (e.g., Bellassen and Luyssaert 2014). For a long time regional and national forest inventories have been based on field surveys of sparse networks of sample plots (e.g., Tomppo et al. 2010; McRoberts et al. 2009). Such surveys rely on sampling theory and design-based inference (e.g., Gregoire and Valentine 2008). An advantage is that estimates of population parameters such as mean and total biomass, and the corresponding uncertainties, can be obtained without making any assumptions about the population studied. Examples of national forest inventories of this kind are given in Tomppo et al. (2008a, 2010) and Fridman et al. (2014). Currently, they are the backbones of national forest statistics and international reporting for compiling regional and global forest resources assessments (e.g., Forest Europe 2015).

Some disadvantages of traditional forest inventories are that they are expensive to carry out in areas with poor road infrastructure and they provide only limited spatial information about the forest resources. Use of remotely sensed (RS) data is a means to overcome these disadvantages and an increasing number of studies suggests that RS data can be used in several ways to improve the efficiency of forest inventories and add new dimensions to the results. For example, RS data can be used for stratification in the design phase of forest inventories (e.g., Katila and Tomppo 2002; McRoberts et al. 2002; Haakana et al. 2019), unequal probability sampling (e.g., Saarela et al. 2015a), or balanced sampling (e.g., Grafstrom et al. 2017a). The latter design was implemented in the Swedish NFI in 2018 for the temporary plots (Grafstrom et al. 2017b). Gregoire et al. (2011) and Gobakken et al. (2012) have shown how RS data can be applied in the estimation phase to improve the precision of estimators through model-assisted estimation. Franco-Lopez et al. (2001), Tomppo et al. (2008b) and, recently, Esteban et al. (2019) have shown how RS data can be combined with field sample data to provide maps of forest resources as a complement to estimates of means and totals of the variables of interest.

When these types of maps are developed, RS and field data are used in regression analysis or other types of machine learning algorithms to predict the variable of interest for each map element based on the RS data, using an explicit or implicit model relationship between field and RS data (e.g., Andersen et al. 2005; Hudak et al. 2008). Other examples are Wulder et al. (2008), who used regression analysis to map predicted forest biomass over a large spatial domain in central Saskatchewan, Canada, using Landsat Thematic Mapper (TM) multispectral data at a spatial resolution of 30 m ×30 m, and Zald et al. (2016) who applied the random forest algorithm to map forest attributes using a combination of airborne Light Detection and Ranging (LiDAR) sample data and wall-to-wall multispectral data derived from Landsat TM and the Enhanced Thematic Mapper Plus (ETM+) imagery over a large spatial domain in Saskatchewan, Canada. An increasing number of studies of this kind address forest mapping not only at local and national scales, but also at regional and global scales. For example, Saatchi et al. (2011) used a fusion of wall-to-wall spatial data from multiple sensors and sampled forest height data collected by the Geoscience Laser Altimeter System (GLAS) onboard the Ice, Cloud and land Elevation Satellite-1 (ICESat-1) to construct a global map of forest carbon stocks at a 1-km spatial resolution. Hansen et al. (2003) compiled a global map of percent tree cover through a supervised regression tree algorithm using MODIS data at a 500-m spatial resolution; and in the GEDI project (Dubayah et al. 2014; Qi and Dubayah 2016; Qi et al. 2019; Patterson et al. 2019) a space-borne laser is applied for mapping biomass with almost global coverage at a 1-km spatial resolution.

Following the introduction of LiDAR-assisted forest inventories (e.g., Nelson et al. 1988; Næsset 2002) forest resource maps of this kind have become accurate enough to support management decisions at local level, ranging from the scale of individual map units, to stands and aggregates of forest stands. Examples are harvesting decisions at the level of stands and valuation at the level of forest properties (e.g., Nilsson et al. 2003). Thus, sparse samples of field data can now be used for several purposes, e.g., they can be used to provide traditional forest statistics as well as maps of forest resources and environmental conditions, when combined with RS data (e.g., Nilsson et al. 2017).

However, it is far from trivial how maps of forest resources should be used for producing reliable forest statistics. For example, McRoberts et al. (2018) point out that land cover statistics will be subject to systematic errors if classifications for map units are merely summed across a study area. Further, Gregoire et al. (2016) identify several problems related to how uncertainties are typically computed and reported in LiDAR-supported studies. For example, in many cases uncertainties are merely assessed intuitively based on subjectively selected sets of map units where predicted values are compared with the corresponding field measurements. It is not straightforward to use data from such comparisons for estimating variances or mean square errors of estimated means and totals for an entire study area.

One way of using RS auxiliary data (e.g. in the form of forest maps) in a statistically rigorous way for producing forest statistics is to apply model-assisted estimation (e.g., Gregoire et al. 2011; Saarela et al. 2015a). With this design-based inference technique model predictions are subtracted from field reference values for a probability sample of map units. The mean of the observed deviations is added to the mean of the model predictions across all map units to obtain an estimate of the mean of the variable of interest. While this approach has a potential to make cost-efficient use of RS data that correlate strongly with study variables of interest, a drawback of model-assisted estimation is that it requires a fairly large probability sample of field data (Ståhl et al. 2016).

An alternative approach to using RS auxiliary data in a statistically rigorous manner for producing forest statistics is to apply model-based inference (e.g., Magnussen 2015). Model-based inference relies on an assumption of a model that is assumed to generate random values for each map unit based on the auxiliary data available for the map units. This model is sometimes referred to as a superpopulation model, from which the actual population is realized (Cassel et al. 1977; Särndal et al. 1978; Gregoire 1998). Since the individual values of the population elements are random variables, so are the population means and totals. While many concepts and estimation procedures related to model-based inference tend to be more complicated than design-based inference (e.g., Gregoire 1998), model-based inference also has advantages (e.g., Magnussen 2015) e.g., with regard to combining forest mapping and compiling forest statistics, since the same basic procedures can be applied for both purposes. Moreover, model-based inference can be used for producing maps of uncertainties for the variables of interest, although this opportunity seems to be seldom exploited. An example is Esteban et al. (2019), who presented a map of uncertainties for forest attribute predictions using the random forest algorithm. The uncertainties were estimated through bootstrapping. Such uncertainty maps are important for assessing to what extent the map information is accurate enough for making different kinds of decisions.

Interestingly, uncertainty estimation under model-based inference is fairly straightforward both at the level of individual map units and at the level of totals and means across large study areas (e.g., Ståhl et al. 2016). For small aggregates of map units it is more complicated, due to the need for information about how strongly model errors are correlated across space (McRoberts et al. 2018). Still, many different techniques have been presented for small area estimation (e.g., Breidenbach and Astrup 2012; Magnussen et al. 2014).

Conventional model-based inference utilizes auxiliary information from all map units and a single model for the relationship between auxiliary data and the variable of interest. However, in many important applications several models need to be combined in a hierarchical manner, such as when biomass models are first developed for single trees and then applied to predict aggregate plot level biomass, which is used as training data for developing a model linking RS data with plot level biomass. In such cases it is not trivial how uncertainties should be estimated. Saarela et al. (2016) developed a method called hierarchical model-based (HMB) estimation and showed that neglecting the uncertainty associated with one of the two models involved might lead to severe underestimation of the uncertainty. The method was further developed in Saarela et al. (2018), although both studies were restricted to use of linear models.

Objectives

The objective of this study was to clarify the link between producing forest statistics for large areas through model-based inference and mapping of forest resources, providing high spatial resolution maps of aboveground biomass (AGB) and the corresponding uncertainties in terms of root mean square errors (RMSE). This link may be obvious, since model-based inference requires predictions for each map unit, but it is seldom acknowledged in research studies. Data were available from single trees, harvested and measured for developing allometric tree biomass models, field plots from the Swedish National Forest Inventory (NFI), and wall-to-wall airborne LiDAR data acquired in 2009 – 2011. The map units were 18 ×18 meters large. HMB inference was applied, through an individual tree biomass model and a biomass model linking LiDAR data with aggregate plot-level biomass. An important part of the study was to further develop the HMB inference theory to incorporate nonlinear models.

Material and Method

Overview

Forest attribute maps are usually based on wall-to-wall RS data. In our example, we used airborne wall-to-wall LiDAR data, collected on a national scale in Sweden. Regression analysis was employed to regress the forest attribute plot-level AGB on LiDAR metrics. The AGB-LiDAR regression model was trained on field plot data from the Swedish NFI. The plot-level AGB value is a sum of tree-level AGB predictions using field measurements of height and diameter at breast height (DBH), i.e. tree-level AGB was regressed on tree height and DBH. To train the tree-level regression models, hereafter referred to as AGB allometric models, datasets collected by Marklund (1987, 1988) were used. Therefore, in our estimation procedure there are two modelling steps involved: the AGB allometric models and the (plot-level) AGB-LiDAR model.

HMB inference was employed to estimate the RMSE and the relative RMSE (RelRMSE) accounting for uncertainties due to both modelling steps. New theory was developed for incorporating nonlinear models in the HMB framework. Through the new theory, the previous derivations of HMB estimators Saarela et al. (2016, 2018) could be simplified. A graphical overview of the study is shown in Fig. 1.

Fig. 1
figure1

Study overview

Study area

Our study area was a rectangular 5005 km2 large block located in south-central Sweden. The forest area was approximately 3909 km2, i.e. 78% of the total area. Agricultural lands and urban areas were masked out using land-use maps available through the Swedish National Mapping Agency (Lantmäteriet 2019). The study area was tessellated into 18 m ×18 m grid-cells (map units). The map unit size (324 m2) is approximately equal to the area size of the NFI circular field plots described in “Swedish NFI data and AGB-LiDAR regression model” section. The locations of the study area and the clusters of NFI plots are shown in Fig. 2.

Fig. 2
figure2

The study area and the clusters of NFI plots applied in the study

Swedish national airborne LiDAR survey data

In 2009 the Swedish National Mapping Agency initiated a national airborne LiDAR survey to obtain a new digital elevation model (DEM) with 2 m ×2 m spatial resolution. By 2015 almost all forest lands in Sweden were scanned (Nilsson et al. 2017). The scanning altitude was between 1700 m and 2300 m, and the pulse density about 0.5-1 pulses ·m −2.

For each map unit and NFI field plot, LiDAR metrics were derived using the Fusion software (McGaughey 2012) following the area-based approach proposed by Næsset (2002) from the height distribution of laser returns between 1.5 m and 50 m height above ground. The upper and lower height thresholds were set to exclude non-vegetation returns (e.g., Lindberg et al. 2012). As regressors in our AGB-LiDAR model, we used two LiDAR metrics: the laser height 80% percentile (hp80) and the vegetation ratio (vr), which is a ratio between the number of first returns above 1.5 m and the total number of first returns. The choice of these variables follows findings in Nilsson et al. (2017).

Swedish NFI data and AGB-LiDAR regression model

Plot-level field data were obtained from the Swedish NFI, which applies a systematic design of clusters with 4–12 plots. We utilized data collected during 2009–2011 from 504 permanent plots located in the study area and in the neighbourhood of the study area (Fig. 2). The plot radius was 10 m. A criterion for the plot selection was that the same LiDAR instrument as the one used to collect LiDAR over the study area had been applied. Information on individual tree locations, species, and DBH was available for each tree in all plots. However, tree height was available only for a subsample of the trees (Fridman et al. 2014). Table 1 provides descriptive statistics of the NFI data, separated for each species on trees with and without tree height measurements.

Table 1 NFI tree-level data descriptive statistics separated for each tree species on trees with and without height measurements

Figure 3 shows a histogram of plot-level AGB, aggregated from the tree-level AGB predictions, and converted to tonnes per hectare (Mg ·ha−1). The tree-level AGB values were predicted using the AGB allometric models described in “Tree-level data and AGB allometric models” section.

Fig. 3
figure3

A histogram of plot-level AGB (aggregated tree-level AGB predictions converted to tonnes per hectare), (Mg ·ha−1)

The plot-level AGB-LiDAR model, denoted as the G model in our study, has a model form similar to Nilsson et al. (2017). However, whereas Nilsson et al. (2017) used a standard square root transformation model, we applied a similar nonlinear model with additive errors including the same LiDAR metrics (see Table 2). We employed the restricted maximum likelihood estimators available in the R package nlme (Pinheiro et al. 2016) through the gnls() R function. To overcome problems due to heteroskedasticity, we fitted a random error variance model to estimate the individual error variance for every predicted plot-level AGB value. Table 2 presents the estimated model parameters. Figure 4 shows the standardized residuals for plot-level AGB predictions from LiDAR metrics, displayed versus predicted plot-level AGBs using LiDAR metrics, and (lower) LiDAR-based predictions of plot-level AGB displayed versus AGBs obtained from field measurements, i.e. the AGBs obtained from aggregating tree-level AGB predictions to plot level. Although the Swedish NFI applies clusters of plots, the long distance between plots in a cluster implied that we treated each plot as an independent observation in the regression analysis (cf. Nilsson et al. 2017).

Fig. 4
figure4

AGB-LiDAR model scatterplots at the NFI plot level

Table 2 Estimated AGB-LiDAR model parameters

In Table 2, \(\widehat {y}_{G_{i}}\) denotes predicted plot-level AGB (Mg ·ha−1) using LiDAR metrics from the ith NFI plot, α0,α1 and α2 are AGB-LiDAR model parameters to be estimated, V(υi) is individual random error variance, which was modelled through a nonlinear power model with three parameters denoted σ2,δ0 and δ1.

Tree-level data and AGB allometric models

The tree-level data used for developing AGB allometric models were collected across Sweden by Marklund (1987,1988). We used data collected only from the central and southern parts of Sweden (Table 3).

Table 3 Tree-level data; descriptive statistics (Marklund 1987, 1988)

Since heights were not available for all trees on the NFI plots (Fridman et al. 2014), we developed two AGB allometric model types for each tree species: one with DBH (cm) and height (m) as regressors (type I), and the other with DBH (cm) only (type II). In our study we used AGB allometric model forms by similar to Repola (2008) for birch and Repola (2009) for pine and spruce. Table 4 presents the models forms by species.

Table 4 AGB allometric model forms by species

In Table 4, yk is a tree-level AGB (kg) for the kth tree based on data from Marklund (1987, 1988); \(d_{tr_{k}}=2+1.5 \times DBH_{k}\) is a transformation of DBH (cm) to approximate the stump diameter (Laasasenaho 1982), hk is the tree height (m), and β0,β1 and β2 are AGB allometric model parameters to be estimated.

The R function gnls() (Pinheiro et al. 2016) was used to fit model parameters accounting for heteroskedasticity. Table 5 presents the estimated model parameters.

Table 5 Estimated parameters in the AGB allometric models

In Table 5, \(\widehat {y}_{k}\) is a tree-level predicted AGB (kg) using the corresponding allometric model for the kth tree, V(εk) is individual random error variance modelled through a nonlinear power model with two parameters ω2 and γ.

Figure 5 shows residual scatterplots, and Fig. 6 presents scatterplots for AGB versus predicted AGB at tree level.

Fig. 5
figure5

Standardized residuals versus predicted tree-level AGB

Fig. 6
figure6

Predicted tree-level AGB versus measured tree-level AGB

Generalized hierarchical model-based estimation with nonlinear models

HMB inference is based on the model-based inference philosophy (Saarela et al. 2018). The target population (of map units with AGB as the population attribute in our example) is seen as a random realization shaped by a superpopulation model involving some auxiliary information. If there is more than one superpopulation model involved and if the estimation procedure employs the models in a hierarchical order, we can speak of HMB inference (Saarela et al. 2018). We denote the first superpopulation model, linked to the AGB allometric models (as will be shown in more detail below), as F

$$ \text{Model F:} \ \mathbf{y} = \mathbf{f}(\mathbf{x};\boldsymbol{\beta}) + \boldsymbol{\epsilon}, \boldsymbol{\epsilon}\sim N(\mathbf{0}, \boldsymbol{\Omega}). $$
(1)

The model describes the relationship between AGB (y) and p field-measured variables at plot level, i.e. x=(1,x1,x2,...,xp). The dataset, used to estimate the model parameters, is denoted S.

AGB predictions for the NFI plots are denoted \(\widehat {\mathbf {y}}_{F_{Sa}}\), i.e. a vector of predicted AGB using the F model. Here, Sa denotes the 504 NFI field plots, for which LiDAR metrics also available. The dataset Sa was used to train the AGB-LiDAR model described in “Swedish NFI data and AGB-LiDAR regression model” section. The model is the second model in our modelling chain, and its general form is:

$$ \text{Model G:} \ \mathbf{y} = \mathbf{g}(\mathbf{z};\boldsymbol{\alpha}) + \boldsymbol{\upsilon}, \boldsymbol{\upsilon}\sim N(\mathbf{0}, \boldsymbol{\Sigma}). $$
(2)

Here, AGB (y) is regressed on q LiDAR metrics, z=(1,z1,z2,...,zq). However, instead of the unknown actual AGB (y) the predicted AGB (\(\widehat {\mathbf {y}}_{F_{Sa}}\)) is used in the analysis to estimate the model parameters in the G model (Eq. (2)).

The estimated parameters \(\widehat {\boldsymbol {\alpha }}_{Sa}\) are then used to predict AGB for each map unit using wall-to-wall LiDAR data for the target population U, i.e.

$$ \widehat{y}_{G_{i}} =g(\mathbf{z}_{i};\widehat{\boldsymbol{\alpha}}_{Sa}), $$
(3)

where \(\widehat {y}_{G_{i}}\) is the predicted AGB using the G model for the map unit i, and zi is a (q+1)-length vector of LiDAR metrics for the map unit.

Under the assumption that \(\widehat {y}_{G_{i}}\) is model-unbiased, the RMSE of the predicted AGB is (Cassel et al. 1977, Eq. 3.4, p. 94.)

$$ \text{RMSE}(\widehat{y}_{G_{i}}) = \sqrt{\widetilde{\mathbf{z}}_{i}\text{Cov}({\widehat{\boldsymbol{\alpha}}_{Sa}})\widetilde{\mathbf{z}}_{i}^{\intercal} + \mathrm{V}({\upsilon_{i}})}, $$
(4)

where \(\widetilde {\mathbf {z}}_{i}\) is a (q+1)-length vector of partial derivatives of \(g(\mathbf {z}_{i};\widehat {\boldsymbol {\alpha }}_{Sa})\) with respect to \(\widehat {\boldsymbol {\alpha }}_{Sa}\), and V(υi) is the variance of the random error υi. By replacing \(\text {Cov}({\widehat {\boldsymbol {\alpha }}_{Sa}})\) and V(υi) with the corresponding estimators, we obtain the RMSE estimator

$$ \widehat{\text{RMSE}}(\widehat{y}_{G_{i}}) = \sqrt{\widetilde{\mathbf{z}}_{i}\widehat{\text{Cov}}({\widehat{\boldsymbol{\alpha}}_{Sa}})\widetilde{\mathbf{z}}_{i}^{\intercal} + \widehat{\mathrm{V}}({\upsilon_{i}})}. $$
(5)

From Eq. (5) it can be seen that the RMSE estimator consists of two components, the estimated uncertainty due to the estimated model parameters \(\widetilde {\mathbf {z}}_{i}\widehat {\text {Cov}}({\widehat {\boldsymbol {\alpha }}_{Sa}})\widetilde {\mathbf {z}}_{i}^{\intercal }\) and the estimated uncertainty due to the individual random errors \(\widehat {\mathrm {V}}({\upsilon _{i}})\). We now focus further on the \(\widehat {\text {Cov}}({\widehat {\boldsymbol {\alpha }}_{Sa}})\) estimator.

Covariance matrix of estimated model parameters when applying two models in hierarchical order

The covariance matrix of estimated model parameters \(\widehat {\boldsymbol {\alpha }}_{Sa}\) is a core component model-based inference (e.g., McRoberts 2006; Ståhl et al. 2011, 2016, Saarela et al. 2015b). Since the \(\widehat {\boldsymbol {\alpha }}_{Sa}\) estimator is dependent on the AGB predictions \(\widehat {\mathbf {y}}_{F_{Sa}}, \text {Cov}({\widehat {\boldsymbol {\alpha }}_{Sa}})\) can be expressed conditionally on \(\widehat {\mathbf {y}}_{F_{Sa}}\) using the law of total covariance (e.g., Rudary 2009)

$$ \text{Cov}({\widehat{\boldsymbol{\alpha}}_{Sa}}) = \mathrm{E}[\text{Cov}(\widehat{\boldsymbol{\alpha}}_{Sa}|\widehat{\mathbf{y}}_{F_{Sa}})] + \text{Cov}(\mathrm{E}[\widehat{\boldsymbol{\alpha}}_{Sa}|\widehat{\mathbf{y}}_{F_{Sa}}]). $$
(6)

This is a new way of deriving HMB estimators, compared to Saarela et al. (2016) and Saarela et al. (2018), in which cases only linear models could be applied. The new approach, through conditional covariance, simplifies the theoretical procedure and allows use of nonlinear models within the HMB estimation framework. Details are given in Additional file 2.

It can be observed that the first term on the right-side part of (6) can be expressed as the model-based, generalized nonlinear least squares (GNLS) covariance of estimated model parameters (e.g. Davidson and MacKinnon 1993) conditionally on \(\widehat {\mathbf {y}}_{F_{Sa}}\), i.e.

$$ \mathrm{E}[\text{Cov}(\widehat{\boldsymbol{\alpha}}_{Sa}|{\widehat{\mathbf{y}}_{F_{Sa}}})] = (\widetilde{\mathbf{Z}}^{\intercal}_{Sa}\boldsymbol{\Sigma}^{-1}_{Sa}\widetilde{\mathbf{Z}}_{Sa})^{-1}. $$
(7)

The \(\widetilde {\mathbf {Z}}_{Sa}\) is a matrix of partial derivatives of \(\mathbf {g}_{Sa}(\mathbf {z};\widehat {\boldsymbol {\alpha }}_{Sa})\) with respect to \(\widehat {\boldsymbol {\alpha }}_{Sa}\) based on dataset Sa; and ΣSa is the covariance matrix of individual errors (υSa) for the dataset Sa. In our study, the form of the matrix is diagonal; each element is estimated as shown in Table 2 for V(υi). This follows since NFI plots are located far from each other and thus the spatial autocorrelation is negligible (hence the off-diagonal elements of ΣSa are zero). As in the previous HMB study (Saarela et al. 2018), the uncertainty due to the estimated ΣSa was not accounted for, following common practice in this type of studies (e.g., Davidson and MacKinnon 1993; Melville et al. 2015).

The second term on the right-side of expression (6) is the core expression in HMB inference and shows how uncertainty due to the predicted AGB using the F model, \(\widehat {\mathbf {y}}_{F_{Sa}}\), is propagated through the estimation

$$ {\begin{aligned} \text{Cov}(\mathrm{E}[\widehat{\boldsymbol{\alpha}}_{Sa}|{\widehat{\mathbf{y}}_{F_{Sa}}}]) =& (\widetilde{\mathbf{Z}}^{\intercal}_{Sa}\boldsymbol{\Sigma}^{-1}_{Sa}\widetilde{\mathbf{Z}}_{Sa})^{-1}\widetilde{\mathbf{Z}}_{Sa}^{\intercal}\boldsymbol{\Sigma}_{Sa}^{-1}\text{Cov}({\widehat{\mathbf{y}}_{F_{Sa}}}) \boldsymbol{\Sigma}_{Sa}^{-1}\widetilde{\mathbf{Z}}_{Sa}(\widetilde{\mathbf{Z}}^{\intercal}_{Sa}\boldsymbol{\Sigma}^{-1}_{Sa}\widetilde{\mathbf{Z}}_{Sa})^{-1}. \end{aligned}} $$
(8)

Thus, the covariance of the estimated model parameters \(\widehat {\boldsymbol {\alpha }}_{Sa}\) can be expressed as

$$\begin{array}{*{20}l} {\begin{aligned} \text{Cov}({\widehat{\boldsymbol{\alpha}}_{Sa}}) & = (\widetilde{\mathbf{Z}}^{\intercal}_{Sa}\boldsymbol{\Sigma}^{-1}_{Sa}\widetilde{\mathbf{Z}}_{Sa})^{-1}\\ &+ (\widetilde{\mathbf{Z}}^{\intercal}_{Sa}\boldsymbol{\Sigma}^{-1}_{Sa}\widetilde{\mathbf{Z}}_{Sa})^{-1}\widetilde{\mathbf{Z}}_{Sa}^{\intercal}\boldsymbol{\Sigma}_{Sa}^{-1}\text{Cov}({\widehat{\mathbf{y}}_{F_{Sa}}}) \boldsymbol{\Sigma}_{Sa}^{-1}\widetilde{\mathbf{Z}}_{Sa}(\widetilde{\mathbf{Z}}^{\intercal}_{Sa}\boldsymbol{\Sigma}^{-1}_{Sa}\widetilde{\mathbf{Z}}_{Sa})^{-1}. \end{aligned}} \end{array} $$
(9)

This is a general form of the covariance; if the model G is linear, then \(\widetilde {\mathbf {Z}}_{Sa}\) will become ZSa, i.e. a matrix of predictor variables with a column of units (e.g., Davidson and MacKinnon 1993).

By replacing \(\text {Cov}({\widehat {\mathbf {y}}_{F_{Sa}}})\) with its estimator, we obtain an approximately unbiased estimator of \(\text {Cov}({\widehat {\boldsymbol {\alpha }}_{Sa}})\), i.e.

$$ {\begin{aligned} \widehat{\text{Cov}}({\widehat{\boldsymbol{\alpha}}_{Sa}}) & = (\widetilde{\mathbf{Z}}^{\intercal}_{Sa}\boldsymbol{\Sigma}^{-1}_{Sa}\widetilde{\mathbf{Z}}_{Sa})^{-1} \\&+ (\widetilde{\mathbf{Z}}^{\intercal}_{Sa}\boldsymbol{\Sigma}^{-1}_{Sa}\widetilde{\mathbf{Z}}_{Sa})^{-1}\widetilde{\mathbf{Z}}_{Sa}^{\intercal}\boldsymbol{\Sigma}_{Sa}^{-1}\widehat{\text{Cov}}({\widehat{\mathbf{y}}_{F_{Sa}}}) \boldsymbol{\Sigma}_{Sa}^{-1}\widetilde{\mathbf{Z}}_{Sa}(\widetilde{\mathbf{Z}}^{\intercal}_{Sa}\boldsymbol{\Sigma}^{-1}_{Sa}\widetilde{\mathbf{Z}}_{Sa})^{-1}, \end{aligned}} $$
(10)

where

$$ \widehat{\text{Cov}}({\widehat{\mathbf{y}}_{F_{Sa}}}) = \widetilde{\mathbf{X}}_{Sa}\widehat{\text{Cov}}({\widehat{\boldsymbol{\beta}}_{S}})\widetilde{\mathbf{X}}_{Sa}^{\intercal}, $$
(11)

with \(\widetilde {\mathbf {X}}_{Sa}\) being a matrix of partial derivatives of \(\mathbf {f}_{Sa}(\mathbf {x};\widehat {\boldsymbol {\beta }}_{S})\) with respect to \(\widehat {\boldsymbol {\beta }}_{S}\) based on the dataset Sa; evidently, if the model F is linear, then \(\widetilde {\mathbf {X}}_{Sa}\) will be the XSa matrix of predictor variables. The covariance matrix of estimated β was estimated using the GNLS estimator (e.g., Davidson and MacKinnon 1993)

$$ \widehat{\text{Cov}}({\widehat{\boldsymbol{\beta}}_{S}}) = (\widetilde{\mathbf{X}}_{S}^{\intercal}\boldsymbol{\Omega}_{S}^{-1}\widetilde{\mathbf{X}}_{S})^{-1}, $$
(12)

where ΩS is a covariance matrix of individual errors εS over the dataset S. The ΩS matrix is a diagonal matrix; each element is estimated using the model form presented in Table 5 for V(εk). This follows since Marklund (1987, 1988) collected tree-level data to avoid spatial autocorrelation. For similar reasons as in Eq. (10), we do not account for uncertainty due to estimating ΩS.

Detailed derivations of (6), (9) and (10) are presented in Additional file 2.

Aggregation of tree-level AGB predictions to plot level

So far, the plot level model F has been presented in a generic way. We now show how it was based on an aggregation of tree-level AGB predictions, and how this aggregation affects the uncertainty.

Individual tree-level predictions of AGB were summed to plot-level values [kg] and converted to tonnes per hectare values (Mg ·ha−1), i.e.

$$ \widehat{y}_{F_{i}} = 10 \times \sum_{k=1}^{t_{i}}\frac{1}{\text{RefArea}_{k_{i}}}\widehat{y}_{k_{i}}^{*}, $$
(13)

where \(\widehat {y}_{k_{i}}^{*}\) is the predicted tree-level AGB value, using one of the six allometric models (predictions based on a tree-level model are denoted with ) for the kth tree from the ith NFI plot; \(\frac {10}{\text {RefArea}_{k_{i}}}\) is a factor converting AGB expressed as kg per plot to Mg per hectare; “RefArea" is the NFI plot reference area for different values of DBH (Fridman et al. 2014); ti is the number of trees on the ith NFI plot. Through Eq. (13) the plot-level AGB predictions \(\widehat {\mathbf {y}}_{F_{Sa}} = \{\widehat {y}_{F_{1}}, \widehat {y}_{F_{2}},..., \widehat {y}_{F_{M}}\}\), over the Sa dataset were generated (504 NFI plots).

Since several models for individual tree AGB were developed and applied in the study, there was a need to distinguish between different models in the formulas. For this purpose we introduced a star notation with one (*) or two (**) stars attached to different variables, parameters and datasets to generically distinguish between them. The covariance matrix estimator of estimated β is then

$$ \begin{aligned} \widehat{\text{Cov}}({\widehat{\boldsymbol{\beta}}_{S^{*}}; \widehat{\boldsymbol{\beta}}_{S^{**}}}\!) &\,=\, (\widetilde{\mathbf{X}}_{S^{*}}^{\intercal}\boldsymbol{\Omega}_{S^{*}}^{-1}\widetilde{\mathbf{X}}_{S^{*}}\!)^{-1}\widetilde{\mathbf{X}}_{S^{*}}^{\intercal}\boldsymbol{\Omega}_{S^{*}}^{-1}\widehat{\text{Cov}}({\boldsymbol{\epsilon}_{S^{*}};\boldsymbol{\epsilon}_{S^{**}}}\!) \boldsymbol{\Omega}_{S^{**}}^{-1}\widetilde{\mathbf{X}}_{S^{**}}(\widetilde{\mathbf{X}}_{S^{**}}^{\intercal}\boldsymbol{\Omega}_{S^{**}}^{-1}\widetilde{\mathbf{X}}_{S^{**}}\!)^{-1}. \end{aligned} $$
(14)

If model () is the same as (), the right-side part in the expression Eq. (14) is reduced to \((\widetilde {\mathbf {X}}_{S^{*}}^{\intercal }\boldsymbol {\Omega }_{S^{*}}^{-1}\widetilde {\mathbf {X}}_{S^{*}})^{-1}\), which corresponds to (12). Since the model fitting was performed independently for each model, the covariance between model errors (\(\phantom {\dot {i}\!}\text {Cov}({\boldsymbol {\epsilon }_{S^{*}};\boldsymbol {\epsilon }_{S^{**}}})\)) from different models is zero, which simplifies the estimation procedure.

The estimated covariance matrix \(\phantom {\dot {i}\!}\widehat {\text {Cov}}({\widehat {\mathbf {y}}_{F_{Sa}}})\) of AGB predictions using the F model over the Sa dataset of NFI plots is a quadratic matrix. Its dimension is given by the number of NFI field plots (i.e., 504 ×504); each entity of the matrix was estimated as

$$ 100 \times \sum_{k=1}^{t_{i}}\sum_{l=1}^{t_{j}}\frac{1}{\text{RefArea}_{k_{i}}}\widetilde{\mathbf{x}}^{*}_{k_{i}}\widehat{\text{Cov}}({\widehat{\boldsymbol{\beta}}_{S^{*}}; \widehat{\boldsymbol{\beta}}_{S^{**}}})\widetilde{\mathbf{x}}_{l_{j}}^{{**}^{\intercal}}\frac{1}{\text{RefArea}_{l_{j}}}, $$
(15)

where, ti and tj is the number of trees in the ith and the jth NFI plots; \(\widetilde {\mathbf {x}}_{k_{i}}\) is a (p+1)-length vector of partial derivatives of the \(f(\mathbf {x}^{*}_{k_{i}}, \widehat {\boldsymbol {\beta }}_{S^{*}})\) model with respect to \(\widehat {\boldsymbol {\beta }}_{S^{*}}\) for the kth tree on the ith NFI plot; \(\widetilde {\mathbf {x}}_{l_{j}}^{**}\) is a (p+1)-length vector of partial derivatives of the \(f(\mathbf {x}^{**}_{l_{j}}, \widehat {\boldsymbol {\beta }}_{S^{**}})\) function with respect to \(\widehat {\boldsymbol {\beta }}_{S^{**}}\) for the lth tree on the jth NFI plot.

Summary of applied estimators

In summary, the target population mean was estimated as

$$ \widehat{\mu} =\frac{1}{N}\sum_{i=1}^{N}\widehat{y}_{G_{i}}, $$
(16)

where N is the number of map units.

The corresponding RMSE estimator was

$$ \widehat{\text{RMSE}}(\widehat{\mu}) \approx \frac{1}{N}\sqrt{\sum_{i=1}^{N}\sum_{j=1}^{N}\widetilde{\mathbf{z}}_{i}\widehat{\text{Cov}}({\widehat{\boldsymbol{\alpha}}_{Sa}})\widetilde{\mathbf{z}}_{j}^{\intercal}}. $$
(17)

In our study, the target population U was large, i.e. the target population mean \(\bar {y} = \frac {1}{N}\sum _{i=1}^{N}y_{i}\) is approximately equal to the superpopulation mean \(\bar {y} \approx \mu \) and, thus, instead of predicting the random population mean we estimated the fixed superpopulation mean (e.g., Ståhl et al. 2016). This implies that \(\text {MSE}(\widehat {\mu }) \approx \mathrm {V}({\widehat {\mu }})\).

The relative RMSE (RelRMSE) was estimated as

$$ \widehat{\text{RelRMSE}}(\widehat{\mu}) \approx 100\times \frac{\widehat{\text{RMSE}}(\widehat{\mu})}{\widehat{\mu}}. $$
(18)

And the relative contribution of allometric model ucertanity was estimated as

$$ {\begin{aligned} &\text{RelAllometryUncert}(\widehat{\mu})\,=\, 100\!\times\!\! \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} \widetilde{\mathbf{z}}_{i}\left(\! (\widetilde{\mathbf{Z}}^{\intercal}_{Sa}\boldsymbol{\Sigma}^{-1}_{Sa}\widetilde{\mathbf{Z}}_{Sa})^{-1}\widetilde{\mathbf{Z}}_{Sa}^{\intercal}\boldsymbol{\Sigma}_{Sa}^{-1}\widehat{\text{Cov}}({\widehat{\mathbf{y}}_{F_{Sa}}})\boldsymbol{\Sigma}_{Sa}^{-1}\widetilde{\mathbf{Z}}_{Sa}(\widetilde{\mathbf{Z}}^{\intercal}_{Sa}\boldsymbol{\Sigma}^{-1}_{Sa}\widetilde{\mathbf{Z}}_{Sa})^{-1} \right)\widetilde{\mathbf{z}}_{j}^{\intercal}}{ \sum_{i=1}^{N}\sum_{j=1}^{N} \widetilde{\mathbf{z}}_{i}\widehat{\text{Cov}}({\widehat{\boldsymbol{\alpha}}_{Sa}})\widetilde{\mathbf{z}}_{j}^{\intercal} }. \end{aligned}} $$
(19)

LiDAR data, available for the entire study area, were used to predict AGB for each map unit. Equation (5) makes it possible to estimate RMSE for every AGB prediction, and, thus, to create a map of estimated uncertainty. Additionally to these two maps, we produced a map of relative RMSE (RelRMSE)

$$ \widehat{\text{RelRMSE}}(\widehat{y}_{G_{i}}) = 100\times \frac{\widehat{RMSE}(\widehat{y}_{G_{i}})}{\widehat{y}_{G_{i}}}. $$
(20)

And a map showing the relative contribution of allometric model uncertainty to the total MSE (RelAllometryUncert), i.e.

$$ {\begin{aligned} &\text{RelAllometryUncert}(\widehat{y}_{G_{i}}) \,=\, 100 \!\times\! \frac{\widetilde{\mathbf{z}}_{i}\Big(\!(\widetilde{\mathbf{Z}}^{\intercal}_{Sa}\boldsymbol{\Sigma}^{-1}_{Sa}\widetilde{\mathbf{Z}}_{Sa})^{-1}\widetilde{\mathbf{Z}}_{Sa}^{\intercal}\boldsymbol{\Sigma}_{Sa}^{-1}\widehat{\text{Cov}}({\widehat{\mathbf{y}}_{F_{Sa}}})\boldsymbol{\Sigma}_{Sa}^{-1}\widetilde{\mathbf{Z}}_{Sa}(\widetilde{\mathbf{Z}}^{\intercal}_{Sa}\boldsymbol{\Sigma}^{-1}_{Sa}\widetilde{\mathbf{Z}}_{Sa}\!)^{-1} \!\Big)\widetilde{\mathbf{z}}_{i}^{\intercal}}{\widetilde{\mathbf{z}}_{i}\widehat{\text{Cov}}({\widehat{\boldsymbol{\alpha}}_{Sa}})\widetilde{\mathbf{z}}_{i}^{\intercal} + \widehat{\mathrm{V}}({\upsilon_{i}})}. \end{aligned}} $$
(21)

Results

The mean and total AGB in the study area were estimated through HMB estimation, thus combining tree level allometric modelling, aggregation of tree level predictions to NFI plot level, developing AGB-LiDAR models from the aggregated predictions, and applying the AGB-LiDAR models to all non-masked map units in the study area. The resulting estimates and uncertainty estimates are presented in Table 6. In the table the proportion of the total MSE that is due to the tree-level allometric model uncertainty is also shown. It can be observed that it accounts for a large portion, about 75%, of the total MSE.

Table 6 Estimated population mean and total, and corresponding uncertainties, and the allometry uncertainty contribution to overall MSE

As an add-on to these basic estimates for the study area, HMB also allows for mapping of AGB and the RMSE of the AGB estimates at the level of 18 m ×18 m map units. In Fig. 7, maps of predicted AGB, estimated RMSE, estimated RelRMSE, and the relative contribution of allometric modelling variance to the total MSE are shown for a small portion of the study area. The map products for the entire study area are available via the link: Supplementary Material.Footnote 1

Fig. 7
figure7

Map products

It can be noted that RelRMSE was mostly large at small predicted AGB values, which is normal due to the denominator being a small number. Exploring this relationship further, in Fig. 8 the RMSE is displayed vs predicted AGB (left), the relative RMSE vs predicted AGB (right), and the proportion of the total MSE due to the allometric modelling (below). The proportion of the total MSE due to the allometric modelling reached a minimum when the AGB predictions were about 55 tonnes per hectare, corresponding to the estimated population mean.

Fig. 8
figure8

Predicted AGB versus RMSE, relative RMSE and the allometry uncertainty contribution: the black solid vertical line marks the estimated population mean (55 Mg·ha−1)

Discussion

In this study we have demonstrated the correspondence between mapping and formal model-based estimation of aboveground forest biomass for a large study area in south-central Sweden. While a large number of RS-based studies focus on biomass mapping (e.g., Wallerman et al. 2010; Santoro et al. 2012; Nilsson et al. 2017), fewer studies address statistically sound estimation for large areas (e.g., McRoberts 2010; Ståhl et al. 2011; Saarela et al. 2015a; Gregoire et al. 2016), and even fewer studies the link between mapping and large-area estimation, although, e.g., Esteban et al. (2019) is an exception.

In our study we show how the integration of mapping and estimation can be further elaborated, through mapping not only the biomass as such but also of its uncertainty. Uncertainties from two modelling steps were accounted for, i.e. both the uncertainty due to AGB allometric models at tree level and due to the model linking plot-level biomass with LiDAR metrics, the AGB-LiDAR model. We showed that the relative uncertainty (RelRMSE) was largest at small biomass values, but decreased rather quickly and remained more or less constant at biomass predictions for biomass values of 75 Mg·ha−1. and greater. Compared to Esteban et al. (2019), who studied biomass change in Norway and Spain, our estimated uncertainties appear to be larger, most likely as a result of our inclusion of two modelling steps in the uncertainty estimation.

An interesting pattern was observed when the proportion of the total MSE due to the allometric model uncertainty was displayed versus predicted AGB (Fig. 8). The pattern was U-shaped, with large contributions from the allometric modelling at small and large biomass predictions, but with rather limited contribution around the average predicted value. The likely reason for this pattern is that regression models tend to be precise around mean observed values but less precise towards the extremes, with regard to the observations (Davidson and MacKinnon 1993). Thus, this pattern is likely due to the pattern of tree-level data from the tree biomass dataset in Marklund (1987, 1988) combined with the AGB-LiDAR model.

Another interesting pattern is visible in the scatterplots of Fig. 8. The point clouds appear to be forked with a large number of observations at the lower and upper extremes. A likely reason for this is the combined effect of species-specific models and the two allometric model types developed for each species. Obviously, the AGB allometric models using only DBH as a predictor variablewill result in larger prediction uncertainty than models based on both tree height and DBH.

The development of a theoretical framework for incorporating nonlinear models into HMB estimation was an important part of the study, which required a new approach for developing HMB estimators compared to previous studies Saarela et al. (2016, 2018). With the proposed decomposition of the covariance matrix estimator any nonlinear models could be applied in HMB, in case the objective is to estimate the variance of an estimator of the expected value of the superpopulation mean or total. However, in case the objective is to estimate the MSE of the actual superpopulation mean or total, or the MSE for predictions at the level of map units, only nonlinear models with additive error terms can be applied. Since our objective was to construct uncertainty maps at the level of individual map units a nonlinear model with an additive error term was selected.

The new framework makes it possible to trace uncertainties due to several modelling steps. In our case two modeling steps were included. However, the covariance decomposition approach makes it straightforward to derive variance estimators also for cases with three or more modelling steps. Thus, the new HMB framework would also allow for targeted improvement of AGB maps, since it can be evaluated what modelling step(s) would be most beneficial to improve from the point of view of obtaining small overall uncertainty.

Maps of stand characteristics, such as biomass and volume, constructed using LiDAR data, can be expected to be increasingly demanded for planning forestry operations in the future. Our results indicate that the type of models involved provides accurate results, in terms of RelRMSE, in old and middle-aged stands, i.e. stands with intermediate and large AGB values. However, in young forests (small biomass values) the uncertainties were large, suggesting that other types of data should be applied. In future applications it might be possible to bring additional modelling details into the HMB framework, making it possible to distinguish uncertainties not only between different estimated biomass levels, but also between different species and site conditions. This would require new types of data, such as tree species data derived from multispectral LiDAR (Axelsson et al. 2018). The uncertainty maps would then provide additional information to forest planners.

An interesting next step regarding uncertainty mapping would be to move from map units to stands. This would require aggregation of map units through segmentation (e.g., Olofsson and Holmgren 2014). Stand level uncertainty maps would also require information about the spatial autocorrelation of model errors within stands, which would be a challenge. However, several approaches for this type of small area estimation are available (e.g., Breidenbach and Astrup 2012; Magnussen et al. 2014) and should be evaluated for application in the HMB framework.

A final remark concerns the difference between hierarchical model-based inference and similar techniques such as conventional model-based inference (e.g., McRoberts 2006), hybrid inference (e.g., Ståhl et al. 2011), and design-based inference through model-assisted estimation (e.g., Gregoire et al. 2011). A large number of studies performed during the last decades focus on assessment of biomass and carbon, and related uncertainties, applying techniques that may appear similar to the technique applied in this study. Examples include Petersson et al. (2017); McRoberts and Westfall (2016) and McRoberts et al. (2016). In the latter study, Monte-Carlo simulation is applied to assess the overall uncertainty in growing stock volume estimation schemes involving several modelling steps.

Conclusions

In this study a new approach to developing HMB estimators was derived, making it possible to apply nonlinear models, as well as combinations of linear and nonlinear models, in the HMB estimation framework. The estimators were applied to a study area in south-central Sweden, and it was shown that the relative uncertainties in the biomass predictions were largest when the predicted biomass was small. Further it was demonstrated how biomass mapping, uncertainty mapping, and estimation of the mean biomass across a large area could be combined through HMB inference. Uncertainty maps of the kind presented in this study allow for judgments about whether or not the forest attribute map is accurate enough for the intended decision making.

Availability of data and materials

The data are available upon a reasonable request to the Authors.

Notes

  1. 1.

    It is a ‘tif’ file, which can be opened in any GIS software or in R Statistical Environment.

References

  1. Andersen, H-E, McGaughey RJ, Reutebuch SE (2005) Estimating forest canopy fuel parameters using LIDAR data. Remote Sens Environ 94(4):441–449.

    Google Scholar 

  2. Axelsson, A, Lindberg E, Olsson H (2018) Exploring multispectral ALS data for tree species classification. Remote Sens 10(2):183.

    Google Scholar 

  3. Bellassen, V, Luyssaert S (2014) Carbon sequestration: Managing forests in uncertain times. Nat News 506(7487):153.

    Google Scholar 

  4. Breidenbach, J, Astrup R (2012) Small area estimation of forest attributes in the Norwegian National Forest Inventory. Eur J For Res 131(4):1255–1267.

    Google Scholar 

  5. Cassel, C-M, Särndal CE, Wretman JH (1977) Foundations of inference in survey sampling. Wiley. https://doi.org/10.2307/2287794.

  6. Davidson, R, MacKinnon JG (1993) Estimation and inference in econometrics. Oxford University Press.

  7. Dubayah, R, Goetz S, Blair JB, Fatoyinbo T, Hansen M, Healey SP, Hofton M, Hurtt G, Kellner J, Luthcke S, Swatantran A (2014) The global ecosystem dynamics investigation. AGU Fall Meeting Abstracts.

  8. Esteban, J, McRoberts RE, Fernández-Landa A, Tomé JL, Næsset E (2019) Estimating forest volume and biomass and their changes using random forests and remotely sensed data. Remote Sens 11(16):1944.

    Google Scholar 

  9. Forest Europe (2015) State of Europe’s forests 2015. Status and trends in sustainable forest management in Europe.

  10. Franco-Lopez, H, Ek AR, Bauer ME (2001) Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method. Remote Sens Environ 77(3):251–274.

    Google Scholar 

  11. Fridman, J, Holm S, Nilsson M, Nilsson P, Ringvall A, Ståhl G (2014) Adapting National Forest Inventories to changing requirements — the case of the Swedish National Forest Inventory at the turn of the 20th century. Silv Fenn 48:1–29.

    Google Scholar 

  12. Gobakken, T, Næsset E, Nelson R, Bollandsås OM, Gregoire TG, Ståhl G, Holm S, Ørka HO, Astrup R (2012) Estimating biomass in Hedmark County, Norway using national forest inventory field plots and airborne laser scanning. Remote Sens Environ 123(0):443–456.

    Google Scholar 

  13. Grafström, A, Schnell S, Saarela S, Hubbell S, Condit R (2017a) The continuous population approach to forest inventories and use of information in the design. Environmetrics 28(8). https://doi.org/10.1002/env.2480.

  14. Grafström, A, Zhao X, Nylander M, Petersson H (2017b) A new sampling strategy for forest inventories applied to the temporary clusters of the Swedish national forest inventory. Can J For Res 47(9):1161–1167.

  15. Gregoire, TG (1998) Design-based and model-based inference in survey sampling: appreciating the difference. Can J For Res 28(10):1429–1447.

    Google Scholar 

  16. Gregoire, TG, Næsset E, McRoberts RE, Ståhl G, Andersen H-E, Gobakken T Ene, Nelson R (2016) Statistical rigor in LiDAR-assisted estimation of aboveground forest biomass. Remote Sens Environ 173:98–108.

  17. Gregoire, TG, Ståhl G, Næsset E, Gobakken T, Nelson R, Holm S (2011) Model-assisted estimation of biomass in a LiDAR sample survey in Hedmark County, Norway. Can J For Res 41(1):83–95.

    Google Scholar 

  18. Gregoire, TG, Valentine HT (2008) Sampling strategies for natural resources and the environment, Vol. 1. CRC Press. https://doi.org/10.1201/9780203498880.

  19. Haakana, H, Heikkinen J, Katila M, Kangas A (2019) Efficiency of post-stratification for a large-scale forest inventory—case Finnish NFI. Ann For Sci 76(1):9.

    Google Scholar 

  20. Hansen, MC, DeFries RS, Townshend JR, Carroll M, DiMiceli C, Sohlberg RA (2003) Global percent tree cover at a spatial resolution of 500 meters: First results of the MODIS vegetation continuous fields algorithm. Earth Interac 7(10):1–15.

    Google Scholar 

  21. Hudak, AT, Crookston NL, Evans JS, Hall DE, Falkowski MJ (2008) Nearest neighbor imputation of species-level, plot-scale forest structure attributes from LiDAR data. Remote Sens Environ 112(5):2232–2245.

    Google Scholar 

  22. Katila, M, Tomppo E (2002) Stratification by ancillary data in multisource forest inventories employing k-nearest-neighbour estimation. Can J For Res 32(9):1548–1561.

    Google Scholar 

  23. Ku, HH (1966) Notes on the use of propagation of error formulas. J Res Natl Bur Stand 70(4):263–273.

    Google Scholar 

  24. Laasasenaho, J (1982) Taper curve and volume functions for pine, spruce and birch [Pinus sylvestris, Picea abies, Betula pendula, Betula pubescens]. The Finnish Forest Research Institute.

  25. Lantmäteriet (2019) Lantmaterieẗ. https://www.lantmateriet.se/sv/. Accessed 15 Feb 2019.

  26. Lindberg, E, Olofsson K, Holmgren J, Olsson H (2012) Estimation of 3D vegetation structure from waveform and discrete return airborne laser scanning data. Remote Sens Environ 118:151–161.

    Google Scholar 

  27. Magnussen, S (2015) Arguments for a model-dependent inference?For Int J For Res 88(3):317–325.

    Google Scholar 

  28. Magnussen, S, Mandallaz D, Breidenbach J, Lanz A, Ginzler C (2014) National forest inventories in the service of small area estimation of stem volume. Can J For Res 44(9):1079–1090.

    Google Scholar 

  29. Marklund, LG (1987) Biomass functions for Norway spruce (Picea abies (L.) Karst.) in Sweden, Vol. 43 of Report. Swedish University of Agricultural Sciences, Department of Forest Survey.

  30. Marklund, LG (1988) Biomassafunktioner för tall, gran och björk i Sverige, Vol. Rapport of 45. Sveriges lantbruksuniversitet, Institutionen för skogstaxering.

  31. McGaughey, R (2012) FUSION/LDV: Software for LIDAR Data Analysis and Visualization, Version 3.01. http://forsys.cfr.washington.edu/fusion/fusionlatest.html. Accessed 24 Aug 2017.

  32. McRoberts, RE (2006) A model-based approach to estimating forest area. Remote Sens Environ 103(1):56–66.

    Google Scholar 

  33. McRoberts, RE (2010) Probability- and model-based approaches to inference for proportion forest using satellite imagery as ancillary data. Remote Sens Environ 114(5):1017–1025.

    Google Scholar 

  34. McRoberts, RE, Chen Q, Domke GM, Ståhl G, Saarela S, Westfall JA (2016) Hybrid estimators for mean aboveground carbon per unit area. Forest Ecol Manag 378:44–56.

    Google Scholar 

  35. McRoberts, RE, Næsset E, Gobakken T, Chirici G, Condés S, Hou Z, Saarela S, Chen Q, Ståhl G, Walters BF (2018) Assessing components of the model-based mean square error estimator for remote sensing assisted forest applications. Can J For Res 48(6):642–649.

    Google Scholar 

  36. McRoberts, RE, Tomppo E, Schadauer K, Vidal C, Ståhl G, Chirici G, Lanz A, Cienciala E, Winter S, Smith WB (2009) Harmonizing national forest inventories. J For 107(4):179–187.

    Google Scholar 

  37. McRoberts, RE, Wendt DG, Nelson MD, Hansen MH (2002) Using a land cover classification based on satellite imagery to improve the precision of forest inventory area estimates. Remote Sens Environ 81(1):36–44.

    Google Scholar 

  38. McRoberts, RE, Westfall JA (2016) Propagating uncertainty through individual tree volume model predictions to large-area volume estimates. Ann For Sci 73(3):625–633.

    Google Scholar 

  39. Melville, G, Welsh A, Stone C (2015) Improving the efficiency and precision of tree counts in pine plantations using airborne LiDAR data and flexible-radius plots: model-based and design-based approaches. J Agric Biol Environ Stat 20(2):229–257.

    Google Scholar 

  40. Næsset, E (2002) Predicting forest stand characteristics with airborne scanning laser using a practical two-stage procedure and field data. Remote Sens Environ 80(1):88–99.

    Google Scholar 

  41. Nelson, R, Krabill W, Tonelli J (1988) Estimating forest biomass and volume using airborne laser data. Remote Sens Environ 24(2):247–267.

    Google Scholar 

  42. Nilsson, M, Folving S, Kennedy P, Puumalainen J, Chirici G, Corona P, Marchetti M, Olsson H, Ricotta C, Ringvall A, Ståhl G, Tomppo E (2003) Combining remote sensing and field data for deriving unbiased estimates of forest parameters over large regions. In: Corona P, Kohl M, Marchetti M (eds)Advances in forest inventory for sustainable forest management and biodiversity monitoring, 19–32.. Springer.

  43. Nilsson, M, Nordkvist K, Jonzén J, Lindgren N, Axensten P, Wallerman J, Egberth M, Larsson S, Nilsson L, Eriksson J, Olsson H (2017) A nationwide forest attribute map of Sweden predicted using airborne laser scanning data and field data from the National Forest Inventory. Remote Sens Environ 194:447–454.

    Google Scholar 

  44. Olofsson, K, Holmgren J (2014) Forest stand delineation from lidar point-clouds using local maxima of the crown height model and region merging of the corresponding Voronoi cells. Remote Sens Lett 5(3):268–276.

    Google Scholar 

  45. Patterson, PL, Healey SP, Ståhl G, Saarela S, Holm S, Andersen H-E, Dubayah RO, Duncanson L, Hancock S, Armston J, Kellner JR, Cohen WB, Yang Z (2019) Statistical properties of hybrid estimators proposed for GEDI—NASA’s Global Ecosystem Dynamics Investigation. Environ Res Lett 14(6):065007.

    Google Scholar 

  46. Petersson, H, Breidenbach J, Ellison D, Holm S, Muszta A, Lundblad M, Ståhl GR (2017) Assessing uncertainty: sample size trade-offs in the development and application of carbon stock models. For Sci 63(4):402–412.

    Google Scholar 

  47. Pinheiro, J, Bates D, DebRoy S, Sarkar D (2016) R Core Team nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-127. http://CRAN.R-project.org/package=nlme/.

  48. Qi, W, Dubayah RO (2016) Combining Tandem-X InSAR and simulated GEDI lidar observations for forest structure mapping. Remote Sens Environ 187:253–266.

    Google Scholar 

  49. Qi, W, Saarela S, Armston J, Ståhl G, Dubayah RO (2019) Forest biomass estimation over three distinct forest types using TanDEM-X InSAR data and simulated GEDI lidar data. Remote Sens Environ 232:111,283.

    Google Scholar 

  50. Repola, J (2008) Biomass equations for birch in finland. Silv Fenn 42(4):605–624.

    Google Scholar 

  51. Repola, J (2009) Biomass equations for scots pine and norway spruce in finland. Silv Fenn 43(4):625–647.

    Google Scholar 

  52. Rudary, MR (2009) On Predictive Linear Gaussian Models. PhD thesis. Ann Arbor, MI, USA.

    Google Scholar 

  53. Saarela, S, Grafström A, Ståhl G, Kangas A, Holopainen M, Tuominen S, Nordkvist K, Hyyppä J (2015a) Model-assisted estimation of growing stock volume using different combinations of LiDAR and Landsat data as auxiliary information. Remote Sens Environ 158:431–440.

  54. Saarela, S, Holm S, Grafström A, Schnell S, Næsset E, Gregoire TG, Nelson RF, Ståhl G (2016) Hierarchical model-based inference for forest inventory utilizing three sources of information. Ann For Sci 73(4):895–910.

    Google Scholar 

  55. Saarela, S, Holm S, Healey SP, Andersen H-E, Petersson H, Prentius W, Patterson PL, Næsset E, Gregoire TG, Ståhl G (2018) Generalized Hierarchical model-based estimation for aboveground biomass assessment using GEDI and Landsat data. Remote Sens 10(11):1832.

    Google Scholar 

  56. Saarela, S, Schnell S, Grafström A, Tuominen S, Nordkvist K, Hyyppä J, Kangas A, Ståhl G (2015b) Effects of sample size and model form on the accuracy of model-based estimators of growing stock volume. Can J For Res 45:1524–1534.

  57. Saatchi, SS, Harris NL, Brown S, Lefsky M, Mitchard ET, Salas W, Zutta BR, Buermann W, Lewis SL, Hagen S, Petrova S, White L, Silman M, Morel A (2011) Benchmark map of forest carbon stocks in tropical regions across three continents. PNAS 108(24):9899–9904.

    CAS  PubMed  Google Scholar 

  58. Santoro, M, Pantze A, Fransson JE, Dahlgren J, Persson A (2012) Nation-wide clear-cut mapping in Sweden using ALOS PALSAR strip images. Remote Sens 4(6):1693–1715.

    Google Scholar 

  59. Särndal, CE, Thomsen I, Hoem JM, Lindley DV, Barndorff-Nielsen O, Dalenius T (1978) Design-based and model-based inference in survey sampling [with discussion and reply]. Scand J Stat:27–52.

  60. Ståhl, G, Holm S, Gregoire TG, Gobakken T, Næsset E, Nelson R (2011) Model-based inference for biomass estimation in a LiDAR sample survey in Hedmark County, Norway. Can J For Res 41(1):96–107.

    Google Scholar 

  61. Ståhl, G, Saarela S, Schnell S, Holm S, Breidenbach J, Healey SP, Patterson PL, Magnussen S, Næsset E, McRoberts RE, Gregoire TG (2016) Use of models in large-area forest surveys: comparing model-assisted, model-based and hybrid estimation. Forest Ecosyst 3(5):1–11.

    Google Scholar 

  62. Tomppo, E, Gschwantner T, Lawrence M, McRoberts R, Gabler K, Schadauer K, Vidal C, Lanz A, Ståhl G, Cienciala E, Chirici G, Winter S, Bastrup-Birk A, Tomter S, Kandler G, McCormick M (2010) National Forest Inventories. Pathways for Common Reporting. Springer.

  63. Tomppo, E, Haakana M, Katila M, Peräsaari J (2008a) Multi-source National Forest Inventory: Methods and Applications, Vol. 18. Springer, New York.

  64. Tomppo, E, Olsson H, Ståhl G, Nilsson M, Hagner O, Katila M (2008b) Combining national forest inventory field plots and remote sensing data for forest databases. Remote Sens Environ 112(5):1982–1999.

  65. Wallerman, J, Fransson JE, Bohlin J, Reese H, Olsson H (2010) Forest mapping using 3D data from SPOT-5 HRS and Z/I DMC 2010. IEEE International Geoscience and Remote Sensing Symposium. IEEE.

  66. Wulder, M, White J, Fournier R, Luther J, Magnussen S (2008) Spatially explicit large area biomass estimation: three approaches using forest inventory and remotely sensed imagery in a GIS. Sensors 8(1):529–560.

    PubMed  Google Scholar 

  67. Zald, HS, Wulder MA, White JC, Hilker T, Hermosilla T, Hobart GW, Coops NC (2016) Integrating Landsat pixel composites and change metrics with lidar plots to predictively map forest structure and aboveground biomass in Saskatchewan, Canada. Remote Sens Environ 176:188–201.

    Google Scholar 

Download references

Acknowledgements

The authors are thankful to the two anonymous Reviewers, whose comments helped to improve the clarity of the article.

Funding

Funding was provided by the Swedish NFI Development Foundation and the Swedish Kempe Foundation (SMK-1847). The computations were performed on resources provided by the Swedish National Infrastructure for Computing (SNIC) at the High Performance Computing Center North (HPC2N).

Author information

Affiliations

Authors

Contributions

Conceptualization, Svetlana Saarela;

Data curation, Svetlana Saarela, André Wästlund, Alex Appiah Mensah, Mats Nilsson and Jonas Fridman;

Funding acquisition, Svetlana Saarela;

Investigation, Svetlana Saarela, André Wästlund, Emma Holmström, Alex Appiah Mensah, Sören Holm, Mats Nilsson, Jonas Fridman and Göran Ståhl;

Methodology, Svetlana Saarela, Sören Holm and Göran Ståhl;

Project administration, Svetlana Saarela;

Software, Svetlana Saarela, André Wästlund, Alex Appiah Mensah and Mats Nilsson;

Writing – original draft, Svetlana Saarela;

Writing – review & editing, Svetlana Saarela, André Wästlund, Emma Holmström, Alex Appiah Mensah, Sören Holm, Mats Nilsson, Jonas Fridman and Göran Ståhl. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Svetlana Saarela.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflict of interest.

Supplementary information

Additional file 1

Supplementary material.

Additional file 2

Appendix.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Saarela, S., Wästlund, A., Holmström, E. et al. Mapping aboveground biomass and its prediction uncertainty using LiDAR and field data, accounting for tree-level allometric and LiDAR model errors. For. Ecosyst. 7, 43 (2020). https://doi.org/10.1186/s40663-020-00245-0

Download citation

Keywords

  • Aboveground biomass assessment
  • Forest mapping
  • Gauss-Newton Regression
  • Hierarchical Model-Based inference
  • LiDAR maps
  • National Forest Inventory
  • Uncertainty estimation
  • Uncertainty map