### GOMS model and inversion strategy

The GOMS model builds on the Li-Strahler geometric-optical model (Li and Strahler 1992), which assumes that the reflectance of a pixel can be modeled as the sum of the reflectances of its individual scene components weighted by their respective areas within the pixel (Li and Strahler 1985), and that the bidirectional reflectance distribution function (BRDF) of the vegetation canopy at the pixel scale can be explained by geometric-optical principles. The sensor receives both the ground reflection and the crown reflection within a field of view of area A.

Considering the 3-D forest canopy structural parameters, the influence of sky light, and multiple scattering, the signal received over A can be defined as a combination of four area-weighted components:

$$ S={K}_{\mathrm{g}}G+{K}_{\mathrm{c}}C+{K}_{\mathrm{z}}Z+{K}_{\mathrm{t}}T $$

(1)

where *S* refers to bidirectional reflectance factor (BRF); *K*_{g}, *K*_{c}, *K*_{z}, and *K*_{t} are the proportions of sunlit background, sunlit crown, shaded background, and shaded crown, respectively; and *G*, *C*, *Z* and *T* are the contributions of the sunlit background, sunlit crown, shaded background, and shaded crown, respectively (Li and Strahler 1986).
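As a minimal sketch of Eq. (1), the mixture can be computed directly once the four proportions and component signatures are known; all numbers below are illustrative, not measured values:

```python
# Area-weighted mixture of the four scene components (Eq. 1).
# The proportions and component signatures below are illustrative
# values, not measurements from the study.

def brf(Kg, Kc, Kz, Kt, G, C, Z, T):
    """Bidirectional reflectance factor as the area-weighted sum of
    sunlit background, sunlit crown, shaded background and shaded crown."""
    assert abs(Kg + Kc + Kz + Kt - 1.0) < 1e-9, "proportions must sum to 1"
    return Kg * G + Kc * C + Kz * Z + Kt * T

# Example: a moderately closed canopy in the red band (hypothetical numbers).
S = brf(Kg=0.25, Kc=0.45, Kz=0.20, Kt=0.10, G=0.12, C=0.08, Z=0.02, T=0.03)
```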

Assuming that the tree crown shape is ellipsoidal (Fig. 3a), *K*_{g}, *K*_{c}, *K*_{z} and *K*_{t} can be expressed by a combination of the forest canopy structural parameters such as *R*, *b*, *h* and *n* (the number of crowns per unit area).

In the GOMS model, the ellipsoid is simplified into a sphere (Fig. 3b); *K*_{g}, *K*_{c}, *K*_{z} and *K*_{t} can then be expressed as:

$$ {K}_{\mathrm{g}}=\exp \left(-n\times \left[{\tau}_{\mathrm{i}}+{\tau}_{\mathrm{v}}-O\left({\theta}_{\mathrm{i}},{\theta}_{\mathrm{v}},{\varnothing}_{\mathrm{i}}-{\varnothing}_{\mathrm{v}}\right)\right]\right) $$

(2)

where

$$ {\tau}_{\mathrm{i}}={\uppi R}^2/\cos {\theta}_{\mathrm{i}} $$

(3)

$$ {\tau}_{\mathrm{v}}={\uppi R}^2/\cos {\theta}_{\mathrm{v}} $$

(4)

and *O*(*θ*_{i}, *θ*_{v}, ∅_{i} − ∅_{v}) is the overlap area (the shaded region in Fig. 3b). ∅_{i} and ∅_{v} are the solar azimuth and view azimuth, respectively, and *θ*_{i} and *θ*_{v} are the revised solar zenith angle and view zenith angle, respectively:

$$ {\theta}_{\mathrm{i}}={\tan}^{-1}\left(\left(b/R\right)\tan {\theta_{\mathrm{i}}}^{\prime}\right) $$

(5)

$$ {\theta}_{\mathrm{v}}={\tan}^{-1}\left(\left(b/R\right)\tan {\theta_{\mathrm{v}}}^{\prime}\right) $$

(6)

where *θ*_{i}^{′} and *θ*_{v}^{′} are solar zenith angle and view zenith angle, respectively.
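Eqs. (5)-(6) amount to a single angle transform that maps the ellipsoid geometry onto the equivalent sphere; a minimal sketch, with illustrative crown dimensions:

```python
import math

def revised_zenith(theta_prime, b, R):
    """Revised zenith angle of Eqs. (5)-(6):
    theta = atan((b / R) * tan(theta'))."""
    return math.atan((b / R) * math.tan(theta_prime))

# A prolate crown (b > R) steepens the effective zenith angle;
# b = 3.0 m and R = 1.5 m are illustrative values.
theta_i = revised_zenith(math.radians(30.0), b=3.0, R=1.5)
```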

$$ {K}_{\mathrm{c}}=1-\exp \left(-n\times \left[\frac{1}{2}\left(1+\left\langle \overrightarrow{\mathrm{i}},\overrightarrow{\mathrm{v}}\right\rangle \right){\tau}_{\mathrm{v}}\right]\right) $$

(7)

$$ {K}_{\mathrm{t}}=\exp \left(-n\times \left[\frac{1}{2}\left(1+\left\langle \overrightarrow{\mathrm{i}},\overrightarrow{\mathrm{v}}\right\rangle \right){\tau}_{\mathrm{v}}\right]\right)-\exp \left(-n\times {\tau}_{\mathrm{v}}\right) $$

(8)

$$ {K}_{\mathrm{z}}=1-{K}_{\mathrm{g}}-{K}_{\mathrm{c}}-{K}_{\mathrm{t}} $$

(9)
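The area proportions of Eqs. (2)-(4) and (7)-(9) can be sketched as follows, treating the overlap term *O* and the ⟨i, v⟩ term as precomputed inputs (both values below are hypothetical, as are the structural parameters):

```python
import math

def proportions(n, R, theta_i, theta_v, cos_iv, O):
    """Area proportions Kg, Kc, Kt, Kz (Eqs. 2-4 and 7-9).
    cos_iv stands in for the <i, v> term of Eqs. 7-8, and O is the
    overlap term O(theta_i, theta_v, phi_i - phi_v); both are treated
    here as precomputed inputs (an assumption of this sketch)."""
    tau_i = math.pi * R**2 / math.cos(theta_i)   # Eq. 3
    tau_v = math.pi * R**2 / math.cos(theta_v)   # Eq. 4
    Kg = math.exp(-n * (tau_i + tau_v - O))                        # Eq. 2
    Kc = 1.0 - math.exp(-n * 0.5 * (1.0 + cos_iv) * tau_v)         # Eq. 7
    Kt = (math.exp(-n * 0.5 * (1.0 + cos_iv) * tau_v)
          - math.exp(-n * tau_v))                                  # Eq. 8
    Kz = 1.0 - Kg - Kc - Kt                                        # Eq. 9
    return Kg, Kc, Kt, Kz

# Illustrative stand: n = 0.05 crowns/m^2, R = 1.5 m, revised angles in radians.
Kg, Kc, Kt, Kz = proportions(n=0.05, R=1.5, theta_i=math.radians(49.1),
                             theta_v=math.radians(20.0), cos_iv=0.6, O=0.0)
```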

Then, the GOMS model can be expressed by the function below:

$$ S=\mathrm{f}\left({\theta}_{\mathrm{i}},{\varnothing}_{\mathrm{i}},{\theta}_{\mathrm{v}},{\varnothing}_{\mathrm{v}},{\theta}_{\mathrm{s}},{\varnothing}_{\mathrm{s}},{nR}^2,b/R,h/b,\Delta h/b,G,C,Z,T\right) $$

(10)

where *nR*^{2} represents the crown coverage per unit area in the nadir view; *b*/*R* affects the crown coverage density in off-nadir directions; *h*/*b* affects the width of the hot spot; and ∆*h*/*b* describes the dispersion of the crown height distribution and affects the bowl shape of the BRDF (∆*h* is the variance of the *h* distribution within one pixel) (Li et al. 2015). *θ*_{s} and ∅_{s} are the local slope and aspect, respectively. *θ*_{i}, ∅_{i}, *θ*_{v} and ∅_{v} are the solar zenith angle, solar azimuth, view zenith angle, and view azimuth, respectively (Fu et al. 2011; Ma et al. 2014). In this study, we assume that the reflected intensities of the shadows on the ground and on the canopy are the same (i.e., *Z* equals *T*); the model is thus simplified to three area-weighted components (*G*, *C* and *Z*).

The multi-stage, sample-direction dependent, target-decisions (MSDT) inversion method (Li et al. 1997) was adopted to invert the parameters of the GOMS model from the observation data in stages. In this method, the most sensitive observation data are used to invert the most sensitive parameters first; the results of each stage then serve as prior knowledge for the next inversion stage. The inversion order of the parameters is based on the uncertainty and sensitivity matrix (USM), which quantifies the sensitivity of the parameters to the observational data in different viewing directions. The USM is defined as

$$ \mathrm{USM}\left(\mathrm{p},\mathrm{q}\right)=\frac{\Delta \mathrm{BRF}\left(\mathrm{p},\mathrm{q}\right)}{{\mathrm{BRF}}_{\mathrm{exp}}\left(\mathrm{p}\right)} $$

(11)

where ∆BRF(p, q) is the maximum difference in the BRF calculated by the model when only parameter q varies within its uncertainty while the other parameters remain fixed, and BRF_{exp}(p) is the BRF calculated by the model at the *p*-th illumination and viewing geometry with all parameters at their expected values. Based on our previous study (Fu et al. 2011), the inversion order of the parameters in the GOMS model is RC → RG → RZ and NIRC → (*b*/*R*, NIRZ, ∆*h*/*b*) → NIRG → *nR*^{2}. RC, RG and RZ refer to the BRF information of the sunlit crown, sunlit background, and shaded area in the red band, and NIRC, NIRG and NIRZ refer to the corresponding information in the near-infrared (NIR) band. The parameters retrieved in both the NIR and red bands were then used to calculate *h*/*b*. The inversion order indicates that *R* (*R* = CD/2) is not a very sensitive parameter in the GOMS model; thus, using the CD provided by other data sources as prior knowledge in the GOMS model inversion procedure to calculate tree height would not introduce substantial error.
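A single USM entry of Eq. (11) can be sketched as a parameter sweep; the `toy` forward model below is a hypothetical stand-in for the GOMS model, used only to make the sketch runnable:

```python
import math

def usm_entry(model, geometry, params, q, uncertainty, steps=21):
    """USM(p, q) = max |BRF change when q sweeps its uncertainty| / BRF_exp(p).
    `model` is any forward model mapping (geometry, params) -> BRF."""
    brf_exp = model(geometry, params)        # all parameters at expected values
    lo, hi = params[q] - uncertainty, params[q] + uncertainty
    brfs = []
    for k in range(steps):                   # sweep q through its uncertainty
        p = dict(params)
        p[q] = lo + (hi - lo) * k / (steps - 1)
        brfs.append(model(geometry, p))
    return (max(brfs) - min(brfs)) / brf_exp

# Toy forward model: BRF rises with crown cover nR^2 (a hypothetical form,
# not the GOMS model itself).
toy = lambda geom, p: 0.05 + 0.3 * (1.0 - math.exp(-p["nR2"]))
s = usm_entry(toy, geometry=None, params={"nR2": 0.4}, q="nR2", uncertainty=0.1)
```

A full USM is obtained by repeating this over every viewing geometry p and every parameter q.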

### Semi-variance model

The semi-variance model is a tool for relating the underlying scene to the spatial properties of an image; those spatial properties can be measured by calculating the spatial variation of a spatial random variable. In a remote sensing image, each digital number (DN) is linked to a unique location on the ground and can be considered the realization of a spatial random function: DN_{i} = f(*x*_{i}), where DN_{i} is the digital number of the *i*-th pixel, *x*_{i} is the geographic location vector of the *i*-th pixel, and f is the random spatial function. The DNs of a remotely sensed image can thus be treated as a spatial random variable, and the image spatial properties can be estimated by calculating the spatial variation in DN.

A semivariogram (Fig. 4) is a plot of semi-variance against the lag that separates the points used to estimate the semi-variance and can be used to study the spatial properties of the underlying scene (Song 2007).

A semivariogram is characterized by three parameters: the sill, the range and the nugget effect. The sill is the maximum value of the semi-variance and represents the total variance of the scene; it can be calculated with the semi-variance model. The range is the distance at which the semi-variance reaches the sill, and it reflects the scale characteristics of the scene: points separated by a distance equal to or greater than the range can be considered independent of each other. The nugget effect is the semi-variance at lag zero.
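For illustration, a common variogram shape such as the spherical model (chosen here as an example, not one prescribed by the paper) makes the roles of the nugget, sill and range concrete:

```python
def spherical_variogram(h, nugget, sill, rng):
    """Spherical variogram model, a common textbook shape used here only
    to illustrate the nugget, sill and range parameters."""
    if h == 0:
        return 0.0                # by definition, zero semi-variance at lag 0
    if h >= rng:
        return sill               # semi-variance levels off at the sill
    s = h / rng
    return nugget + (sill - nugget) * (1.5 * s - 0.5 * s**3)

# Within the range the semi-variance is still rising toward the sill.
gamma_half = spherical_variogram(15.0, nugget=0.1, sill=1.0, rng=30.0)
```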

The semi-variance model is defined as follows:

$$ {\gamma}_{\mathrm{f}}(h)=\frac{1}{2}\mathrm{E}\left\{{\left(\mathrm{f}(x)-\mathrm{f}\left(x+h\right)\right)}^2\right\} $$

(12)

where *γ*_{f}(*h*) is the semi-variance for points with lag *h* in space, f(*x*) is the realization of a spatial random function at location *x*, f(*x* + *h*) is the realization of the same function at another point with lag *h* from *x*, and E(.) denotes the mathematical expectation (Song et al. 2010).
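Eq. (12) is usually estimated by the method of moments over all point pairs separated by lag *h*; a minimal 1-D sketch, with an illustrative DN transect:

```python
# Method-of-moments estimate of Eq. (12) for a 1-D transect of DNs;
# the transect values are illustrative.

def semivariance(values, lag):
    """gamma(h) = 0.5 * mean squared difference between points lag apart."""
    pairs = [(values[i] - values[i + lag]) ** 2
             for i in range(len(values) - lag)]
    return 0.5 * sum(pairs) / len(pairs)

dn = [10, 12, 9, 14, 11, 13, 8, 15]
gamma1 = semivariance(dn, lag=1)
```

Plotting `semivariance(dn, lag)` against the lag yields the empirical semivariogram of Fig. 4.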

Based on the semi-variance model and the theory of Jupp et al. (1988, 1989), the disc scene model was developed as a simplified representation of a forest scene. The model assumes a scene composed of discs whose brightness values do not change in overlapping areas, and it relates the scene structure to the spatial characteristics of the image DNs. Based on the disc scene model, Song et al. (Song 2007; Song et al. 2002; Song and Woodcock 2003) developed a model that relates the ratio of the sills at two spatial resolutions to the diameter of the objects as follows:

$$ \frac{C_{\mathrm{z}1}}{C_{\mathrm{z}2}}=\frac{\int_0^1t\,\mathrm{T}(t)\,{\mathrm{e}}^{\lambda {A}_{\mathrm{c}}\left[\mathrm{T}\left(\frac{t{D}_{\mathrm{p}1}}{D_0}\right)-1\right]}\mathrm{d}t}{\int_0^1t\,\mathrm{T}(t)\,{\mathrm{e}}^{\lambda {A}_{\mathrm{c}}\left[\mathrm{T}\left(\frac{t{D}_{\mathrm{p}2}}{D_0}\right)-1\right]}\mathrm{d}t} $$

(13)

where *D*_{p1} and *D*_{p2} are the pixel sizes of the two spatial resolutions; *D*_{0} is the diameter of the object (the forest CD); and *C*_{z1} and *C*_{z2} are the sills of the regularized semivariograms at spatial resolutions *D*_{p1} and *D*_{p2}, respectively. In the remainder of the paper, *γ*_{z1z2} denotes the ratio *C*_{z1}/*C*_{z2} (e.g., *γ*_{12} denotes the ratio of the image semi-variance at a spatial resolution of 1 m to that at 2 m).

*A*_{c} represents the object area:

$$ {A}_{\mathrm{c}}=\frac{\uppi {D_0}^2}{4} $$

(14)

T(*s*) represents the overlap function for the objects in the scene, written in terms of the normalized lag *s*:

$$ \mathrm{T}(s)=\left\{\begin{array}{ll}1 & h=0\\ {}\frac{1}{\uppi}\left(t-\sin (t)\right) & h<{D}_0\\ {}0 & h\ge {D}_0\end{array}\right. $$

(15)

where

$$ s=\frac{h}{D_0} $$

(16)

$$ \cos \left(\frac{t}{2}\right)=s $$

(17)

In Eq. (13), the ratio of the sills of the regularized variograms at two different spatial resolutions is determined solely by the scene structure and is independent of the brightness values of the pixels. Therefore, the ratio of image variances can be used to estimate tree crown size across sensors and sites.
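Eq. (13) can be evaluated numerically. The sketch below uses a midpoint rule, reads the exponent as λ*A*_{c}[T(*tD*_{p}/*D*_{0}) − 1] (an assumption about the intended grouping), and uses illustrative values for *D*_{0} and λ:

```python
import math

def overlap(s):
    """T(s) of Eqs. (15)-(17) in the normalized lag s = h / D0:
    cos(t / 2) = s, T = (t - sin t) / pi for s < 1, else 0."""
    if s >= 1.0:
        return 0.0
    t = 2.0 * math.acos(s)
    return (t - math.sin(t)) / math.pi

def sill_integral(Dp, D0, lam, Ac, steps=2000):
    """One integral of Eq. (13) for pixel size Dp, with the exponent read
    as lam * Ac * (T(t * Dp / D0) - 1) -- an assumed grouping."""
    total = 0.0
    for k in range(steps):                       # midpoint rule on t in (0, 1)
        t = (k + 0.5) / steps
        total += t * overlap(t) * math.exp(lam * Ac * (overlap(t * Dp / D0) - 1.0))
    return total / steps

# Illustrative scene: 5 m crowns, 0.05 crowns per m^2.
D0, lam = 5.0, 0.05
Ac = math.pi * D0**2 / 4.0
ratio_12 = sill_integral(1.0, D0, lam, Ac) / sill_integral(2.0, D0, lam, Ac)
```

Under this reading, the finer resolution retains more of the scene variance, so *γ*_{12} exceeds 1, as expected from Fig. 4.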

### Flowchart of the methods

Figure 5 shows a flowchart of our method, which consists of three parts: the CD calculation process based on the semi-variance model; the tree height estimation process, which uses the CD results from part 1 together with the inversion results of the GOMS model; and the tree height accuracy validation process.

In the CD calculation process, we applied the CD estimation approach of Song et al. (Song 2007; Song et al. 2002; Song and Woodcock 2003) to the Dayekou forest site using the regularized semi-variance model and high spatial resolution CCD imagery. The optimal fitting function between the sill and the field-measured CD was constructed from the 16 super sample plots. First, we cut the 16 sample plots out of the CCD image by binarization, resampled the binary results to different spatial resolutions (1, 2, …, 6 m), and calculated the sill ratio values of the 16 images at the different spatial resolutions. Second, we built functions between the field-measured CD and the sill ratio values and selected the best-fitting relationship as the optimal fitting function. Using the supervised classification results for the SPOT-5 image, the method was then applied first to the experimental small plot and then to the whole image. We also used the CD derived from the CHM data to assess the accuracy of the CD calculated from the CCD image.
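One step of this workflow, aggregating a binary crown map to a coarser pixel size and comparing image variances as the sill-ratio proxy, can be sketched as follows; the 6 × 6 crown map is illustrative:

```python
# Aggregate a binary crown map to a coarser pixel size and compare image
# variances (the sill-ratio proxy of Eq. 13). The 6x6 "crown map" below
# is an illustrative toy scene, not real classification output.

def block_mean(grid, f):
    """Aggregate an n x n grid by averaging f x f blocks (resampling)."""
    n = len(grid) // f
    return [[sum(grid[f * i + a][f * j + b]
                 for a in range(f) for b in range(f)) / f**2
             for j in range(n)] for i in range(n)]

def variance(grid):
    """Population variance of all pixel values in the grid."""
    flat = [v for row in grid for v in row]
    m = sum(flat) / len(flat)
    return sum((v - m) ** 2 for v in flat) / len(flat)

crowns = [[1, 1, 0, 0, 0, 0],
          [1, 1, 0, 0, 1, 1],
          [0, 0, 0, 0, 1, 1],
          [0, 1, 1, 0, 0, 0],
          [0, 1, 1, 0, 0, 0],
          [0, 0, 0, 0, 0, 0]]

# Analogue of gamma_12: variance at fine resolution over variance at 2x coarser.
ratio = variance(crowns) / variance(block_mean(crowns, 2))
```

In the actual workflow this ratio is computed per plot and regressed against the field-measured CD.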

The canopy structural parameters can be inverted with the GOMS model and, combined with the CD results described above, used to estimate tree height. Finally, we used the revised CHM data derived from LiDAR to validate the accuracy of the tree height calculated by the GOMS model.