Influence of voxel size on forest canopy height estimates using full-waveform airborne LiDAR data

Forest canopy height is a key forest structure parameter. Precisely estimating forest canopy height is vital to improve forest management and ecological modelling. Compared with discrete-return LiDAR (Light Detection and Ranging), small-footprint full-waveform airborne LiDAR (FWL) techniques have the capability to acquire precise forest structural information. This research mainly focused on the influence of voxel size on forest canopy height estimates. A range of voxel sizes (from 10.0 m to 40.0 m interval of 2 m) were tested to obtain estimation accuracies of forest canopy height with different voxel sizes. In this study, all the waveforms within a voxel size were aggregated into a voxel-based LiDAR waveform, and a range of waveform metrics were calculated using the voxel-based LiDAR waveforms. Then, we established estimation model of forest canopy height using the voxel-based waveform metrics through Random Forest (RF) regression method. The results showed the voxel-based method could reliably estimate forest canopy height using FWL data. In addition, the voxel sizes had an important influence on the estimation accuracies (R2 ranged from 0.625 to 0.832) of forest canopy height. However, the R2 values did not monotonically increase or decrease with the increase of voxel size in this study. The best estimation accuracy produced when the voxel size was 18 m (R2 = 0.832, RMSE = 2.57 m, RMSE% = 20.6%). Compared with the lowest estimation accuracy, the R2 value had a significant improvement (33.1%) when using the optimal voxel size. Finally, through the optimal voxel size, we produced the forest canopy height distribution map for this study area using RF regression model. Our findings demonstrate that the optimal voxel size need to be determined for improving estimation accuracy of forest parameter using small-footprint FWL data.


Background
Forest ecosystems are key carbon sinks and can affect the climate change and global carbon cycle (Tsui et al. 2013;Zhang et al. 2019). One of the important forest structure parameters is forest canopy height, which is an essential input parameter of ecosystems models and forest management (Popescu and Zhao 2008;Wang et al. 2011). Forest canopy height is highly correlated with forest above-ground biomass and can be used to quantify the terrestrial carbon cycle (Balzter et al. 2007), and therefore, accurate estimates of forest canopy height are critical to improve ecological modelling and estimates of forest biomass, volume, and productivity. Traditional field-measured methods can obtain accurate tree heights; however, field measurements are laborious and time-consuming with low efficiency. Moreover, field-based methods have difficulty in obtaining continuous tree height data over large areas. Remote sensing techniques can periodically acquire spatial information of forest over large areas and are the only feasible approach to estimate vegetation parameters at a series of spatial and temporal scales (García et al. 2010). Remote sensing data are increasingly used to estimate vegetation parameters (e.g., Chopping et al. 2011;Eisfelder et al. 2012; Barrachina et al. 2015;Nie et al. 2017;García et al. 2018). Nevertheless, passive optical remote sensing data, such as multi-and hyperspectral imageries cannot directly obtain vertical structural information of vegetation and are susceptible to weather and complex terrain conditions (Stojanova et al. 2010). In addition, the signal saturation problem of optical remote sensing data easily occurs in dense vegetation areas or high biomass level (Solberg et al. 2017). Although vegetation parameters can be estimated using optical remote sensing data, the estimation precisions usually decline in vegetated areas with high biomass or densely vegetated areas (Duncanson et al. 2010).
LiDAR (Light Detection and Ranging) is an active remote sensing, and LiDAR data have been widely used for predicting vegetation structural parameters, such as canopy height (Maguya et al. 2015;Alexander et al. 2018;Matasci et al. 2018;Mielcarek et al. 2018;Iverson et al. 2019), leaf area index (Alonzo et al. 2015;Zheng et al. 2017), and biomass (Frazer et al. 2011;Montagnoli et al. 2015;Luo et al. 2017;Dalponte et al. 2018;Silva et al. 2018). LiDAR data have a unique ability to predict vegetation canopy height because LiDAR pulses can penetrate the vegetation canopy (Lefsky et al. 2005). LiDAR can be classified as discrete-return LiDAR (DRL) and full-waveform LiDAR (FWL). DRL sensor can record a few (usually less than 5 returns) returns per laser pulse. A series of LiDAR metrics can be calculated using DRL data, which are widely used to predict vegetation structural parameter. However, DRL techniques possess a dead zone about 2.0 or more meters (Rogers et al. 2015). In contrast, FWL systems are able to record all signal returned by illuminated objects (Hermosilla et al. 2014a;Lai and Zheng 2015;Qin et al. 2015), and can overcome the problem of dead zone in DRL systems (Ballhorn et al. 2009). FWL sensors can acquire richer information about ground objects and can precisely characterize the vegetation vertical structure (Sumnall et al. 2016). By processing and analysis of waveform LiDAR data, additional information on forest structure can be extracted, which is not available in discrete-return LiDAR data (Rogers et al. 2015;Pablo et al. 2018).
To take advantage of the information contained in DRL data and improve the applications of LiDAR data, previous researchers have created pseudo-waveform data utilizing DRL data (e.g., Muss et al. 2011;Popescu et al. 2011;Zhou and Qiu 2015;Luo et al. 2019b). The pseudo-waveforms are created by a voxel-based method. In this method, the LiDAR point clouds are partitioned along the vertical and horizontal directions to form the voxels (Pearse et al. 2019). The results indicated that the pseudo-waveforms generated utilizing DRL data are very similar to the authentic LiDAR waveforms. Moreover, the estimation performances of vegetation parameter utilizing pseudo-waveforms are comparable to or better than discrete-return LiDAR data (Muss et al. 2011;Lindberg et al. 2012). With recent development in LiDAR techniques, increasing small-footprint FWL data can be available. There are two approaches to process full-waveform airborne LiDAR data. One approach is extraction of discrete returns from waveform LiDAR data using waveform decomposition method and produces denser point clouds, and then a range of LiDAR statistical metrics are calculated using discrete-return point clouds (Gao et al. 2015;Milenković et al. 2017;Sumnall et al. 2016). Another one is a voxel-based approach. In this approach, all waveforms are converted into a three-dimensional voxel to generate larger scale waveforms (Stelling and Richter 2016). Previous researchers have extracted a series of waveform metrics from voxel-based waveform data and successfully estimated vegetation parameters (Hermosilla et al. 2014b;Li et al. 2016;Nie et al. 2017;Pablo et al. 2018). Hancock et al. (2017) used a voxel-based approach for predicting canopy cover and the results showed that the voxel-based waveforms can obtain more details on understory vegetation and within-canopy structure. Cao et al. (2014) estimated forest biomass utilizing waveform LiDAR metrics and compared predictive power of biomass utilizing DRL metrics and FWL metrics, and they found that fusing DRL metrics and FWL metrics improved prediction accuracy of forest biomass. Pablo et al. (2018) predicted Mediterranean understory vegetation height, cover and volume utilizing FWL and the results indicated that FWL data can accurately estimate understory vegetation parameters.
The scan angle of small-footprint FWL systems is usually off-nadir. Therefore, waveform data for the same position have different paths, which can cause waveforms to fail to precisely depict vertical structural information of vegetation canopy (Hermosilla et al. 2014a). However, the voxel-based approach can decrease the influence of offnadir scan angle. In addition, the voxel-based approach can take full advantage of the information provided by waveforms, which has potential for improving the prediction precision of vegetation parameter. An increasing number of researchers have used a voxel-based method to process FWL data and estimated vegetation parameters. Pang et al. (2011) studied the influence of footprint size of spaceborne waveform LiDAR on the estimation accuracy of canopy height. The results show that the footprint size had a distinct influence on the estimation accuracy of canopy height. Similarly, the voxel sizes of FWL data have an important influence on estimation accuracy of vegetation parameters. As a consequence, it is of great significance to obtain the optimal voxel size for improving prediction precision of vegetation parameter. Nevertheless, little research has investigated the influence of voxel sizes on the prediction precision of canopy height. The main objectives of this study were to: (1) create voxel-based LiDAR waveforms with different voxel sizes; (2) estimate forest canopy height using voxel-based LiDAR metrics; and (3) assess the influence of voxel sizes on prediction precison of canopy height and determine the optimal voxel size.

Study area
This study area is located in Huailai County of Hebei Province, North China, which is adjacent to Beijing, the Capital of China. The study area and locations of field plots are shown in Fig. 1. Huailai County is in the semi-arid region. The average precipitation and temperature are 396 mm·year − 1 and 9.6°C, respectively. In our study area, the average altitude is 540 m above sea level with flat terrain. The trees are planted forest and are dominated by poplar (Populus spp.), and the staple crops are corn and soybeans.

Field measurements
Field-measured data were collected from July 7th to 13th, 2014. Height (h) of each tree in a plot was measured. The center coordinates of all the plots were determined using an RTK-GPS. A total of 34 square plots (20 m × 20 m) were collected. Canopy height (H) with a plot scale was calculated using Eq. 1. H was between 5.01 and 26.17 m with mean of 12.49 m (standard deviation = 5.47 m).
where n is the number of the tree in a plot; h i is the h of the ith tree in a plot.

Full-waveform LiDAR data acquisition
Full-waveform airborne LiDAR data were collected on July 27th, 2014 utilizing the Leica ALS70-HA LiDAR sensor. The absolute flight height was 2800 m with scan angle of ±12°. The flight stripe side lap was 50% with an average pulse density of 4.1 pulses·m − 2 . LiDAR data acquisition parameters are shown in Table 1. In addition, discrete-return LiDAR data were also provided by the LiDAR vendor. Discrete-return point clouds were classified as ground and non-ground laser points through Terrasolid software. And then, ground laser points were LiDAR data processing Pre-processing of full-waveform LiDAR data Waveform data processing includes denoising, attenuation correction, waveform heights normalization, and so on. Figure 2 provides an overview of full-waveform LiDAR data processing and forest canopy height estimation. The main purposes of data pre-processing are to remove the noise in waveform data and correct the attenuation of waveform signal. To enhance the signalto-noise ratio (SNR) of waveform data and get the real waveform signal, we removed the noise using a noise threshold method. The threshold was determined using the mean plus four standard deviations in our study (Lefsky et al. 2007), and all the waveform signals below the threshold were removed as the noise. And then, the waveform data were smoothed using a Gaussian filter.
Moreover, the heights of LiDAR waveforms were normalized using the DTM. To reduce the effects of attenuation produced by the upper canopy, all waveform intensities were rectified utilizing an approach presented by Richter et al. (2014).

Waveform voxelization
The voxel-based waveforms can reduce the spatial displacement caused by the off-nadir scan angles (Pablo et al. 2018) and can characterize the fine-scale vertical structural information of vegetation canopy. In our study, we voxelized the pre-processed waveform LiDAR data using a series of voxel sizes. The horizontal sizes of voxel were from 10.0 to 40.0 m interval of 2 m and the vertical size of voxel was set to a constant of 0.15 m. And then, all voxels were assigned the mean intensity of all waveforms falling within a voxel space.

Voxel-based waveform metrics
To estimate forest canopy height, a series of waveform metrics were extracted utilizing voxel-based waveform LiDAR data. Commonly used waveform metrics include the waveform distance (WD), height of median energy (HOME) (Drake et al. 2002), HTRT (ratio of WD to HOME), vertical distribution ratio (VDR), front slope angle (FS) (Ranson et al. 2004), etc. See reference (Luo et al. 2019a) for more waveform metrics and descriptions. To obtain accurate waveform feature information, voxel-based waveforms were decomposed using a Gaussian decomposition approach. In  (Popescu et al. 2011).

Statistical analyses and modelling
In the research, Random Forest (RF) algorithm was used to estimate forest canopy height. RF does not need any assumption about data distribution and can solve complex relationships between waveform metrics and canopy height (Ahmed et al. 2015). RF algorithm has the ability to overcome overfitting and improve prediction precision of forest canopy height (Pearse et al. 2019). In addition, RF regression produces unbiased estimates and does not need independent validation for estimation model (Breiman 2001;Naidoo et al. 2012), and this is especially beneficial when no additional validation dataset is available in a study. Therefore, RF regression has been broadly applied for prediction of vegetation parameters in previous studies (Gleason and Im 2012; Ramoelo  Zhao et al. 2018). In this study, coefficient of determination (R 2 ), residual (Eq. 2), root mean squared error (RMSE) (Eq. 2), and relative RMSE (RMSE%) (Eq. 4) were calculated to validate the estimation accuracy of canopy height.
where y i andŷ i represent the observed and predicted forest canopy height on the sample plot i, respectively; y is the average value of observations and n is the total number of plots.

Results
LiDAR waveform data with 16 different voxel sizes were produced through aggregating all the waveforms in a voxel. Figure 3 presents the LiDAR waveforms of 16 voxel sizes for the same field plot. The results show that the voxel-based method could successfully produce new LiDAR waveforms utilizing full-waveform LiDAR data. In addition, there were some differences among LiDAR waveforms for the different voxel sizes (Fig. 3), which could cause the difference of prediction precision of vegetation structure parameter. Figure 4 shows the voxel-based waveform, five Gaussian curves derived using the Gaussian decomposition approach and their center positions. The voxel-based waveform was decomposed into five Gaussian components (four canopy components and one ground component) (Fig. 4). And then, the voxel-based waveform metrics were calculated using the parameters of Gaussian curves. Prediction precisions of forest canopy height with different voxel sizes are shown in Fig. 5. Figure 6 shows the influences of voxel sizes (from 10.0 to 40.0 m interval of 2 m) on prediction precisions of canopy height. In this study, the voxel sizes from 16 to 38 m can be used as appropriate voxel size to estimate forest canopy height, and all R 2 values were greater than or equal to 0.768. The highest R 2 was 0.832 with RMSE = 2.57 m (RMSE% = 20.6%), which was produced at the voxel size of 18 m. The scatter plot of field-measured forest canopy height against estimated height using the optimal voxel size is presented in Fig. 7b. Orange area represents 95% confidence interval and red solid line is fit line. Compared with the R 2 values derived from the minimum voxel size (10 m) (R 2 = 0.625, Fig. 7a) and the maximum voxel size (40 m) (R 2 = 0.75, Fig. 7c), the R 2 value derived from the optimal voxel size of 18 m improved by 33.1% and 10.9%, respectively. In short, forest canopy height could be reliably estimated by a voxel-based method utilizing full-waveform data. In addition, voxel sizes had important influences on prediction accuracy of canopy height. Therefore, when predicting canopy height utilizing LiDAR waveform metrics, it is necessary to determine the optimal voxel size to improve estimation accuracy. In this study, we also estimated canopy heights using discrete-return metrics with the optimal voxel size. Figure 8 shows the scatterplot of field-measured canopy heights against estimated values. To highlight the advantage of waveform data for estimating canopy height, we compared the results derived from waveform metrics with the results derived from discrete-return metrics. The results show that waveform metrics can produce better estimation accuracy than discrete-return metrics, although the R 2 only improved by 3%. The map of canopy height in the study area was produced using waveform data based on the optimal voxel size of 18 m (Fig. 9).

Discussion
This research demonstrated how to predict canopy height utilizing a voxel-based approach from waveform data. The voxel-based approach has previously been applied to predict vegetation structure parameter. Our results showed that the voxel-based method is an effective method to estimate vegetation parameters, which could precisely estimate forest canopy height using fullwaveform LiDAR data. Similar findings were observed in previous studies (Popescu and Zhao 2008; Allouis et al. ). However, little research has explored the effect of voxel sizes on the estimation accuracy of vegetation parameter, especially for waveform data. Our research primarily focused on exploring the influence of voxel size on forest canopy height estimates from waveform LiDAR. We used a series of voxel sizes (from 10.0 to 40.0 m interval of 2 m) to produce voxel-based waveform data. The results indicated that voxel-based waveforms with different voxel sizes could be reliably created utilizing mean value of all waveform intensities falling in each voxel. Although 16 waveforms with different voxel sizes were generally similar (Fig. 3), there were small differences among these waveforms. Therefore, waveform metrics extracted from voxel-based waveforms with different voxel sizes also had some differences, which could result in different estimation accuracies of canopy height.
In the research, waveform metrics were extracted from voxel-based waveform to predict canopy height. In our study, all 34 field-measured plots were used for modelling, and there are no additional plots for separate validation. Because RF regression algorithm can avoid the need for independent validation using out-of-bag method, we used RF regression to successfully establish 16 estimation models through voxel-based waveform metrics with different voxel sizes. We found that voxel-based waveform metrics were able to accurately estimate forest canopy height (the best result was R 2 = 0.832 with RMSE = 2.57 m) (Fig. 7b). Figure 10 shows residuals distribution of estimated forest canopy height against field-measured height. In general, residuals for canopy height estimates are reasonable, although the canopy heights were underestimated for the canopy heights greater than 17 m.  The variations of R 2 values with voxel sizes are shown in Fig. 6, and the results showed that the voxel size had an important effect on estimation accuracy. This finding is consistent with that of Zheng et al. (2017), who performed a sensitivity analysis of voxel sizes (0.1-26 m) on directional gap fraction, and the results showed that voxel size is a key factor in estimating the directional gap fraction using LiDAR data. Similar results also were reported by Cifuentes et al. (2014), who found that voxel size has noticeable effect on estimation result of gap fraction. In the research, the prediction accuracy of canopy height did not increase or decrease monotonously with an increase of voxel size. When the voxel size was less than 16 m, the estimation accuracy was relatively low. This is because the voxel size is too small to effectively characterize true forest canopy height. Therefore, voxel size should be appropriately increased to produce reliable estimation results. When the voxel size was greater than or equal to 16 m, the estimation accuracy was not significantly different. Therefore, the larger the voxel size was used, the more stable and precise estimation accuracy was produced. This is due to the fact that larger voxel size contains more waveforms, which can more effectively characterize the vertical structure of vegetation. Similar findings were reported by Crespo-Peremarch and Ruiz (2018), and they found that the voxel size should be appropriately increased to keep reliable biomass estimation accuracy, especially using low pulse density LiDAR data.
This study showed that voxel sizes from 16 to 38 m could be identified as appropriate voxel sizes to estimate forest canopy height, and all R 2 values were greater than or equal to 0.768. Nevertheless, there was an optimal voxel size of 18 m, which produced the best estimation result (Fig. 7b). We found that the optimal voxel size was not consistent with field-measured plot size. This is because the voxel size of 18 m could better describe the characteristics of forest canopy height in the study area. Red solid line is fit line Fig. 8 Scatterplot of field-measured forest canopy heights against estimated values using discrete-return data. Orange area represents 95% confidence interval. Red solid line is fit line Kim et al. (2016) found that prediction accuracy of forest biomass could improve by 27.8% using the voxel size of 30 m × 30 m than using the voxel size of 20 m × 20 m. Almeida et al. (2019) studied the influences of voxel sizes (1, 2, 5, 10, 25, 50 and 100 m) on leaf area profile estimates, and the results showed that appropriate voxel size could provide reliable estimates of leaf area profile. Compared with the lowest R 2 value (0.625, produced at the voxel size of 10 m), the R 2 value derived from the optimal voxel size improved by 33.1% in this study. Therefore, our results further demonstrate that the appropriate voxel size can improve estimates of vegetation parameters. There are two reasons why the optimal voxel size should be considered while predicting vegetation structure parameter from waveform LiDAR. The first reason is that the optimal voxel size is able to improve prediction precision of vegetation structure parameter. The second is that the optimal voxel size can be used as the guidance for the choice of plot size in field measurements, which helps to improve the efficiency of forest survey. As a consequence, the optimal voxel size should be applied for improving prediction precision of vegetation structure parameter. However, the optimal voxel size may be different for the different research objectives and study areas. Therefore, the optimal voxel size should be determined based on specific vegetation type, estimation parameter, waveform density, terrain, etc. To conclude, the method developed in our study has potential to improve prediction precision of vegetation structure parameter from waveform data.

Conclusions
In this study, forest canopy heights were predicted from small-footprint full-waveform airborne LiDAR through the voxel-based approach. The results show that the voxel-based method can reliably estimate canopy height from waveform LiDAR (the highest accuracy was R 2 = 0.832 with RMSE = 2.57 m). The influence of voxel sizes on prediction precision of canopy height was carried out. This research demonstrates that the voxel sizes have an important influence on prediction precision of canopy height, and the highest prediction precision was produced when using the voxel size of 18 m. As a consequence, appropriate voxel size can effectively improve estimates of forest parameters, and the optimal voxel size should be determined to obtain the most accurate estimation accuracy of vegetation parameters. Further studies are needed to conduct the effect of voxel sizes on the estimation accuracy of other vegetation types and structural parameters. In addition, in our study the vertical size of voxel was set to a constant of 0.15 m. However, the vertical size of voxel has an influence on prediction precision of vegetation structure parameters. Therefore, different vertical sizes of voxel should be also considered when using a voxel-based approach for predicting vegetation structure parameters.