Early detection of pine wilt disease in Pinus tabuliformis in North China using a field portable spectrometer and UAV-based hyperspectral imagery

Background: Pine wilt disease (PWD) is a major ecological concern in China that has caused severe damage to millions of Chinese pines (Pinus tabulaeformis). To control the spread of PWD, it is necessary to develop an effective approach to detect its presence in the early stage of infection. One potential solution is the use of Unmanned Airborne Vehicle (UAV) based hyperspectral images (HIs). UAV-based HIs have high spatial and spectral resolution and can gather data rapidly, potentially enabling the effective monitoring of large forests. Despite this, few studies have examined the feasibility of using HI data to assess the stage and severity of PWD infection in Chinese pine.
Method: To fill this gap, we used a Random Forest (RF) algorithm to estimate the PWD infection stage of sampled trees from UAV-based HI data and from ground-based data (data collected directly from trees in the field), and we compared the relative accuracy of the two data collection methods. We built our RF models using vegetation indices (VIs), red edge parameters (REPs), moisture indices (MIs), and their combination.
Results: We report several key results. For ground data, the model that combined all parameters (OA: 80.17%, Kappa: 0.73) performed better than VIs (OA: 75.21%, Kappa: 0.66), REPs (OA: 79.34%, Kappa: 0.67), and MIs (OA: 74.38%, Kappa: 0.65) in predicting the PWD infection stage of individual pine trees. REPs had the highest accuracy (OA: 80.33%, Kappa: 0.58) in distinguishing trees at the early stage of PWD from healthy trees. UAV-based HI data yielded similar results: the model that combined VIs, REPs, and MIs (OA: 74.38%, Kappa: 0.66) exhibited the highest accuracy in estimating the PWD stage of sampled trees, and REPs performed best in distinguishing healthy trees from trees at the early stage of PWD (OA: 71.67%, Kappa: 0.40).
Conclusion: Overall, our results confirm the validity of using HI data to identify pine trees infected with PWD in its early stage, although the accuracy must be improved before widespread use is practical. We also show that PWD classifications from UAV-based data are less accurate than, but comparable to, those from ground-based data. We believe that these results can be used to improve preventative measures in the control of PWD.


Background
The pine wood nematode (PWN; Bursaphelenchus xylophilus) is a hazardous invasive species that infests multiple species of pine (Vicente et al. 2012; Douda et al. 2015). Pine wilt disease (PWD), caused by the PWN, is widespread throughout East Asia (Mamiya 1988; Hyun et al. 2007; Ye 2019). Previously confined to southern China, PWD is now found throughout the country, including Northeast China (Pan et al. 2019; Yu et al. 2019). In 2016, PWD first appeared in Dalian, Liaoning Province, and in May 2017 it appeared in Dandong City, Fushun City, Benxi City and other places (National Forestry Administration 2018). In addition, Monochamus saltuarius was identified as a new vector of PWD in Liaoning Province of China (Yu et al. 2018). In the process of spreading northwards, PWD has infected and caused severe damage to the Chinese pine (Pinus tabulaeformis), Korean pine (P. koraiensis), and larch (Larix spp.) populations. This has resulted in significant economic losses and ecological damage to Chinese pine forests (e.g., Li et al. 2011; Lin 2015; Hui 2018).
To effectively control PWD, it is necessary to identify infected trees in the early stage of infection. This is a difficult task because most trees progress from initial infection to the serious infection stage within 5 weeks (Umebayashi et al. 2017). Consequently, current PWD management strategies emphasize the control of infected trees after the onset of an outbreak by means of fumigation, burning, and tree felling (Shin 2008; Kim et al. 2018). What is lacking is a methodology for monitoring pine populations that can quickly and efficiently detect the early signs of PWD (Ma et al. 2011). Many efforts have been made toward the early detection of PWD (Kim et al. 2018; Syifa et al. 2020; Tao et al. 2020), but not in Chinese pine. In this paper, we present a method aimed at detecting PWD in Chinese pine at an early stage.
A major obstacle in the management of pines infected by PWD is that the forests in which they grow are very large. This can make classical ground identification and sampling methods impractical. To solve this problem, recent studies have used remote sensing (RS) to examine the physiological and biochemical changes that follow PWD infection (e.g., Shen et al. 2001; Li et al. 2004; Wang et al. 2007). Advancements in RS technology increasingly improve prediction efficiency by reducing inherent spatial and temporal constraints (Ahmed et al. 2020). Similarly, hyperspectral remote sensing (HRS) can obtain continuous spectral information of objects; this has been used to detect changes in the spectral characteristics of needles on infected trees during the discoloration caused by PWN infection (Pan 2011; Kuai 2012).
Previous studies show the presence of PWD is significantly linearly correlated with water and chlorophyll content. Therefore, water and chlorophyll content could be used as indicators of PWD (Huang 2020). This is important because RS and HRS methods can be used to estimate water and chlorophyll content. For example, using a field portable spectrometer to measure the spectral characteristics of P. thunbergii and P. massoniana at different stages of PWN infection, Xu et al. (2011) found the reflectance spectrum curve in the mid-infrared band may indicate the early stage of PWD with the analysis of the spectral characteristic parameters and changes in chlorophyll levels. Similarly, Xiang et al. (2018) used a field portable spectrometer, analyzing the relationship between spectral properties and chlorophyll, showing that the chlorophyll content of pine decreases with the stage of PWD (later, more severe stages are associated with lower chlorophyll content). In addition, the position of red edge, the wavelength of red edge, the height of green peak, and the depth of red band absorption all strongly correlate with chlorophyll content (e.g., Xiang et al. 2018). Correspondingly, the area surrounded by the first-order differential spectrum in the 490-530 nm range and that in the 680-760 nm range was found to be a significant hyperspectral feature indicating the occurrence of PWD (e.g., Huang et al. 2012). These studies all used a field portable spectrometer, which cannot be applied in a large-scale area.
Past studies used satellite imagery such as Landsat, IKONOS, QuickBird, and GF-2 images to detect forest pests and diseases (e.g., Franklin et al. 2003; White et al. 2005; Hicke and Logan 2009; Zhan et al. 2020). However, due to limitations in spatial, temporal, and spectral resolution, as well as weather complications, satellite imagery cannot obtain real-time data (Santoso et al. 2016). Because of these limitations, the detection of forest pests and diseases has shifted to Unmanned Airborne Vehicle (UAV) remote sensing, which offers the advantages of low consumable and operating costs, high ground resolution, and higher accuracy (Tang et al. 2015). For example, Huang et al. (2018) used a fixed-wing unmanned aerial vehicle to monitor pine trees killed by PWD, successfully monitoring pine tree mortality with over 80% accuracy. Li et al. (2020) used UAVs to acquire remote sensing images of forest areas to assess the presence of PWD, successfully recognizing infection with 90.4% accuracy. Huang (2020) used UAV multispectral data to conclude that the first derivative of the spectra of healthy and infected P. thunbergii changed markedly at 710 nm. In addition to RGB and multispectral cameras, hyperspectral imagery has also been applied to detect forest pests and diseases. Abdel-Rahman et al. (2014) used airborne hyperspectral data with random forest and support vector machine classifiers to distinguish among healthy, Sirex noctilio grey-attacked, and lightning-damaged pine trees. Zhang et al. (2018) utilized the ISIC-SPA-P-PLSR framework based on UAV-based hyperspectral images to identify the degree of damage to trees caused by Dendrolimus tabulaeformis. Iordache et al. (2020) acquired airborne multispectral and hyperspectral data and used Random Forest algorithms to compare the classification accuracies of the two datasets in detecting PWD, finding that both datasets performed well in identifying infected, suspicious, and healthy trees.
Importantly, however, in detecting the PWD, most studies focus on distinguishing between healthy and infected trees using RGB (red, green and blue bands) camera, multispectral data, and ground hyperspectral data, but UAV-based hyperspectral data were not widely studied. In addition, few studies emphasized the identification of trees in each stage of PWD infection in Chinese pine, which we focus on in this study. In our study, we systematically divided the infection stage into four stages, making the detection more accurate. Because high spatial and spectral resolutions, and feasibility of large-scale area application are needed to distinguish the subtle difference between healthy and the early stage of infected trees, we consider UAV-based hyperspectral imagery.
Spectral indices, such as vegetation indices (VIs), red edge parameters (REPs), and moisture indices (MIs), can reflect the infection condition of PWD (e.g., Kim et al. 2018; Huang 2020). A VI is a combination of different remote sensing spectral bands, which can be regarded as a sign of the relative abundance and activity of green vegetation (Jones and Vaughan 2010). In recent years, VIs have been widely applied to extract sensitive estimates of plant biochemical characteristics (e.g., He et al. 2015; De Klerk and Buchanan 2017). For example, the normalized difference vegetation index (NDVI) decreases with increasing PWD stage severity (Kim et al. 2018), and the presence of PWD can be detected by calculating VIs from ground, aerial, and satellite data (e.g., White et al. 2007; Pan et al. 2014; Jung and Park 2019; Iordache et al. 2020). The REPs are derived from the red edge (680-780 nm), which is the most obvious feature of the plant spectral curve. The red edge is an indicator of plant stress that is often used to study the growth and health of plants (Boochs et al. 1990; Dawson and Curran 1998), and it has also been well studied for detecting PWD (Du et al. 2009; Huang et al. 2012). Additionally, PWD kills pine trees by blocking the transmission of water (Yang 2002), and the water content of pine needles decreases with increasing PWD infection severity (Chen 2005). Thus, changes in MIs (also derived from radiometric data) can be used to detect the presence of trees infected with PWD (Xu et al. 2012; Song et al. 2018).
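As a minimal sketch of how two such indices can be computed from a single reflectance spectrum, the Python example below calculates an NDVI from the bands nearest 810 nm and 680 nm and a red edge position taken as the wavelength of the steepest slope within 680-780 nm. The function names, the synthetic spectrum, and the use of the maximum first derivative as the red edge position are illustrative assumptions; the definitions actually used in this study are those listed in Tables 4 and 5.

```python
import math

def nearest_band(wavelengths, target_nm):
    """Index of the band whose centre wavelength is closest to target_nm."""
    return min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))

def ndvi(wavelengths, reflectance, nir_nm=810.0, red_nm=680.0):
    """Normalized difference of the NIR and red bands (band choice is an assumption)."""
    nir = reflectance[nearest_band(wavelengths, nir_nm)]
    red = reflectance[nearest_band(wavelengths, red_nm)]
    return (nir - red) / (nir + red)

def red_edge_position(wavelengths, reflectance, lo_nm=680.0, hi_nm=780.0):
    """Wavelength (band midpoint) of the largest first difference in the red edge window."""
    best_slope, best_wl = None, None
    for i in range(len(wavelengths) - 1):
        mid = 0.5 * (wavelengths[i] + wavelengths[i + 1])
        if lo_nm <= mid <= hi_nm:
            slope = (reflectance[i + 1] - reflectance[i]) / (wavelengths[i + 1] - wavelengths[i])
            if best_slope is None or slope > best_slope:
                best_slope, best_wl = slope, mid
    return best_wl

# Synthetic vegetation-like spectrum (not measured data): a smooth rise centred on 720 nm.
wavelengths = [400.0 + 10.0 * i for i in range(61)]  # 400-1000 nm in 10 nm steps
reflectance = [0.05 + 0.45 / (1.0 + math.exp(-(w - 720.0) / 15.0)) for w in wavelengths]

print(round(ndvi(wavelengths, reflectance), 3))
print(red_edge_position(wavelengths, reflectance))
```

In this toy spectrum a stressed tree could be imitated by lowering the NIR plateau or shifting the rise toward shorter wavelengths, which lowers the NDVI and moves the red edge position accordingly.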
Although spectral indices were widely used to detect the PWD, there is no study that yet provides good parameters to predict each stage (healthy, early, middle, and serious stages) of PWD infection in Chinese pine trees. Additionally, analyses of PWD simultaneously considering ground and UAV-based hyperspectral data have not been widely conducted.
Therefore, to fill this gap, in this study, our objective is to explore the capacity of ground and airborne hyperspectral data using VIs, REPs, and MIs to classify the stage of PWD infection in Chinese pine at the tree level. Furthermore, we also aim to provide a useful and fairly accurate method of distinguishing between trees in the early stage of PWD infection from healthy trees.

Study area and ground survey
We conducted our study in Cangshi Village, located in Fushun County, Liaoning Province, in northeastern China (124°21′-124°24′ E, 41°53′-41°57′ N; Fig. 1). In the study area, the plantation forests are dominated by Chinese pine (P. tabulaeformis) approximately 40-50 years old. The total area of forest cover in Fushun County is approximately 12.43 × 10⁴ ha, of which P. tabulaeformis makes up > 30%. In addition, the broadleaf tree species and understory vegetation in the study site mainly include Quercus acutissima, Quercus mongolica, and grasses. The area is situated in the Middle Temperate Zone. It has a continental monsoon climate and experiences approximately 804.2 mm of precipitation per year. The mean annual air temperature is approximately 6.6°C.
According to local Forestry Administration records, PWD has resulted in the death of tens of thousands of pine trees since the onset of outbreaks in 2016 in Liaoning Province (National Forestry Administration 2018).
Field measurements were conducted on 12-18 August 2019. We established three 30 m × 30 m plots located northeast of Cangshi Village (Fig. 1). The coordinates of the plot boundary and the location of each tree were recorded using a handheld differential global positioning system (DGPS, Version S760) with sub-meter accuracy. In each plot, we recorded the growth state of each tree, including tree height (H), diameter at breast height (DBH), crown diameter (CD), and PWD infection stage. In addition, we measured two biochemical parameters of each tree: leaf chlorophyll content (Cab) and water content (WC). Cab was derived by averaging the Cab of needles from four different directions, measured using a calibrated CCM-300 Chlorophyll Content Meter. For seriously damaged trees, the Cab measured by the CCM-300 was 0. The WC of each tree was calculated as the fresh weight (FW) minus the dry weight (DW), divided by the FW: WC = (FW − DW)/FW × 100%. In total, 218 pine trees (healthy: 76; early stage: 54; middle stage: 47; serious stage: 41) were measured. Summary statistics of the three plots are given in Table 1.
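The two per-tree calculations described above can be sketched in a few lines of Python. The function names and the example weights and readings are hypothetical and serve only to illustrate the arithmetic.

```python
def water_content_percent(fresh_weight, dry_weight):
    """WC = (FW - DW) / FW x 100%, with both weights in the same unit."""
    if fresh_weight <= 0:
        raise ValueError("fresh weight must be positive")
    if not 0 <= dry_weight <= fresh_weight:
        raise ValueError("dry weight must lie between 0 and the fresh weight")
    return (fresh_weight - dry_weight) / fresh_weight * 100.0

def mean_cab(readings):
    """Average of the chlorophyll readings taken from the different directions."""
    return sum(readings) / len(readings)

# Hypothetical values for one tree: 10.0 g fresh needles, 4.5 g after drying.
print(water_content_percent(10.0, 4.5))
print(mean_cab([300.0, 320.0, 280.0, 300.0]))
```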
Additionally, we randomly selected 20 discolored pine trees from each plot as samples and took them back to the laboratory for testing by the Baermann funnel method. The results showed that all of them carried pine wood nematode.

Infected stage division
On the basis of previous studies (Xu et al. 2011; Santos and de Vasconcelos 2012), we combined needle, ground tree, and UAV images to categorize PWD infection into four stages: (1) Healthy, (2) Early stage, (3) Middle stage, and (4) Serious stage (Fig. 2). Stages were defined by the color of needles, growth vigor, and resin secretion (Table 2). To reduce subjective errors, four people classified each tree, and we took the majority opinion as the final result. We used the following definitions: "Healthy" trees were defined as having dark green needles, normal resin secretion, and vigorous growth. "Early stage" trees were defined by slightly yellowed needles, with decreased resin secretion and growth rates. "Middle stage" trees were defined by yellow-brown needles, wilt, and weak growth. Dry trees with reddish-brown needles were defined as the "Serious stage".
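The majority rule applied to the four independent assessments can be sketched as follows. The text does not state how a tie (for example, two votes against two) was resolved, so this hypothetical helper simply returns None in that case to flag the tree for re-inspection; that choice is an assumption.

```python
from collections import Counter

def majority_stage(votes):
    """Return the stage named by most assessors, or None when the top count is tied."""
    counts = Counter(votes).most_common()
    top_count = counts[0][1]
    leaders = [stage for stage, count in counts if count == top_count]
    return leaders[0] if len(leaders) == 1 else None

print(majority_stage(["early", "early", "healthy", "early"]))    # clear majority
print(majority_stage(["early", "early", "healthy", "healthy"]))  # tie, flagged as None
```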

Remote sensing data acquisition and preprocessing
Ground spectrum acquisition
On the ground (by physically measuring trees in the field), we measured the spectra of sampled trees using an ASD FieldSpec 4 Hi-Res NG (Analytical Spectral Devices, Boulder, CO, USA). The spectral range is 350-2500 nm, and the spectral resolution is 3 nm in the 350-1000 nm wavelength range and 6 nm in the 1001-2500 nm wavelength range. We selected and measured branches roughly representative of the average spectrum of each tree. The selected branches were cut from the east, south, west, and north directions in the upper, middle, and lower layers. We calculated the spectrum of each sampled tree by averaging the spectra of the selected branches. The ground spectra were gathered from 10:00 to 14:00 every day, from 12 to 17 August. We obtained the ground spectra for comparison with the UAV-based data and to assist radiometric correction.

UAV-based hyperspectral imagery
Hyperspectral imagery (HI) data were obtained using a DJI Matrice 600 UAV (DJI, Shenzhen, China) equipped with a Pika L hyperspectral camera (Resonon, USA). The main parameters of the Pika L are listed in Table 3. GNSS (Global Navigation Satellite System) and IMU (Inertial Measurement Unit) modules are integrated into the UAV; its horizontal and vertical position errors are approximately 2.0 and 5.0 m, respectively, with an orientation precision of approximately 1 degree. The overall UAV-based system is shown in Fig. 3. UAV-based hyperspectral data acquisition was carried out in the test areas of Cangshi Village from 12:00 to 12:30 on 18 August 2019. The weather was sunny during the flight. A standard white board and a white tarp were placed on the ground within the flying area. The flying height was set at 120 m, the image forward and side overlaps were set to 50%, and the flight speed was 2 m·s⁻¹. The imagery consisted of 281 spectral channels (spectral resolution of 2.1 nm) from the visible to the near infrared (NIR) region (400-1000 nm). Reflectance correction and radiometric calibration were performed using a 3 m² carpet reference (standard white board) and the Spectronon software. Image geometric corrections were performed using 4 ground control points (GCPs). The positions of the GCPs were recorded by a DGPS device with sub-meter accuracy. The ground resolution of the resulting HI was 0.4 m.

Tree crowns extraction from hyperspectral imagery
We segmented tree crowns from the HI by combining an object-based segmentation method with manually drawn ROIs (regions of interest). First, in ENVI 5.3, we applied the object-based segmentation method to the HIs using combined spectral and texture features to separate tree crowns from the grass background and shadows (e.g., Yuan et al. 2013). The object-based segmentation method successfully separated tree crowns from the grass background and shadow components. However, it was difficult to separate overlapping crowns. Second, based on the result of the object-based segmentation, we drew the ROIs manually. We determined the location of every individual sampled tree using the DGPS information. The ROI of each tree was shaped by manually drawing the crown range on the RGB image. Then, the ROIs were added to the preprocessed HIs, and the spectrum of an individual tree was calculated by averaging the reflectance of the corresponding ROI extracted in ENVI 5.3. The average spectrum of each ROI was used in the subsequent analysis (Fig. 4). Finally, the shadow components and overlapped crowns were discarded. Overall, 121 trees (healthy: 39; early stage: 27; middle stage: 29; serious stage: 26) were segmented from the HI.
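The per-tree spectrum described above is a band-wise mean over the pixels inside one crown ROI. The sketch below illustrates that averaging step on a tiny image cube held as nested Python lists; the data layout, the function name, and the values are assumptions made for illustration and do not reproduce the ENVI 5.3 workflow.

```python
def roi_mean_spectrum(cube, roi_mask):
    """Band-wise mean reflectance of the pixels flagged True in roi_mask.

    cube[row][col] is the list of band reflectances of one pixel;
    roi_mask[row][col] is True for pixels inside the crown ROI.
    """
    n_bands = len(cube[0][0])
    sums = [0.0] * n_bands
    n_pixels = 0
    for r, row in enumerate(cube):
        for c, spectrum in enumerate(row):
            if roi_mask[r][c]:
                for b in range(n_bands):
                    sums[b] += spectrum[b]
                n_pixels += 1
    if n_pixels == 0:
        raise ValueError("the ROI contains no pixels")
    return [s / n_pixels for s in sums]

# Toy 2 x 2 image with 2 bands; the top row is the crown, the bottom row is shadow.
cube = [[[0.1, 0.2], [0.3, 0.4]],
        [[0.5, 0.6], [0.9, 0.9]]]
crown = [[True, True],
         [False, False]]
print(roi_mean_spectrum(cube, crown))
```

Excluding shadow and overlapping-crown pixels then amounts to setting the corresponding mask entries to False before averaging.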

Features extraction
To eliminate instrument errors and noise while maintaining the original spectral characteristics, a Savitzky-Golay filter with 7 points (we tested 3-15 points and finally chose 7 points) was used to smooth the original spectra of the ground and UAV-based hyperspectral data (Mullen 2016). Based on previous research, we calculated 37 spectral variables from the spectral data, including 12 VIs (Table 4), 20 REPs (Table 5; Horler et al. 1980; Curran et al. 1990; Yao et al. 2009; Liu et al. 2010), and 5 MIs (Table 6).
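A 7-point Savitzky-Golay smoother can be sketched without external libraries by using the standard convolution weights for a quadratic or cubic fit, (-2, 3, 6, 7, 6, 3, -2)/21. The polynomial order is not stated in the text, so the order used here is an assumption, as is the choice to leave the first and last three points unsmoothed.

```python
SG7_WEIGHTS = (-2, 3, 6, 7, 6, 3, -2)  # 7-point weights for a quadratic/cubic fit
SG7_NORM = 21                          # the weights sum to 21

def savgol7(values):
    """Smooth a spectrum with a 7-point Savitzky-Golay window.

    Edge handling is a simplification: the three points at each end are copied
    unchanged because a full window is not available there.
    """
    half = 3
    smoothed = list(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        smoothed[i] = sum(w * v for w, v in zip(SG7_WEIGHTS, window)) / SG7_NORM
    return smoothed

# An isolated spike is damped, while a quadratic trend passes through unchanged.
spike = [0.0, 0.0, 0.0, 0.0, 21.0, 0.0, 0.0, 0.0, 0.0]
print(savgol7(spike)[4])
quadratic = [float(x * x) for x in range(10)]
print(savgol7(quadratic) == quadratic)
```

Preserving low-order polynomial trends is the property that lets this filter suppress noise without flattening features such as the green peak or the red edge.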
We estimated Cab and WC using RF regression (Breiman 2001), a bagging method based on the CART regression tree model. In the regression application, each tree was built by choosing a random sample and a random set of variables from the training dataset by a deterministic algorithm (Mutanga et al. 2012). All 121 samples were used for model training, and we then used a 10-fold cross-validation method (Waske et al. 2009) to assess model accuracy. The regression was conducted using the R package "randomForest". The coefficient of determination (R²), root mean square error (RMSE), and relative RMSE (RRMSE) between measured and estimated values were used to compare the accuracy of the different indices in predicting Cab and WC. After selecting the variables which performed best in predicting the Cab and WC, we used the Cab or

Classification based on VIs, REPs, and MIs
We then used the selected VIs, REPs, and MIs correlated with Cab and WC to classify trees by PWD infection stage. We used an RF classification model to assess the infection stage of sampled trees. In an RF algorithm, the variable importance is a metric of how much the "out-of-bag" (OOB) error of estimate increases due to the removal of a single variable from the data (Prasad et al. 2006; Verikas et al. 2011). The mean decrease accuracy (MDA) index of each variable is obtained when calculating the OOB error: the higher the MDA value of a variable, the more important it is (e.g., Liu et al. 2017; Shi et al. 2018). The selected VIs, REPs, and MIs, and the combination of all variables, were input separately into the RF classification model, and the MDA of all selected variables was determined. All 121 samples were used for model training. We then used a 10-fold cross-validation method to estimate model accuracy. The classification was carried out using the R package "randomForest". The overall accuracy (OA), producer's accuracy (PA), user's accuracy (UA), and Kappa coefficient derived from confusion matrices (Congalton 1991) were used to evaluate classification accuracy. The Kappa coefficient is a popular statistic for measuring agreement (Meddens et al. 2011). A Kappa value < 0.4 indicates "poor" agreement, a value of 0.4-0.8 indicates "moderate" agreement, and a value > 0.80 indicates "strong" agreement.
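The accuracy measures named above all follow from the confusion matrix. The sketch below computes OA, PA, UA, and Kappa with the standard formulas and attaches the agreement label from the thresholds just given. The matrix orientation (rows as reference classes, columns as predicted classes), the handling of the exact boundary values 0.4 and 0.8, and the toy two-class matrix are assumptions for illustration and are not results of this study.

```python
def classification_metrics(matrix):
    """OA, PA, UA and Kappa from a square confusion matrix.

    matrix[i][j] counts samples of reference class i that were predicted as class j.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    correct = sum(matrix[i][i] for i in range(k))
    oa = correct / total
    # Producer's accuracy: correct / reference total; user's accuracy: correct / predicted total.
    pa = [matrix[i][i] / row_totals[i] if row_totals[i] else float("nan") for i in range(k)]
    ua = [matrix[i][i] / col_totals[i] if col_totals[i] else float("nan") for i in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (total * total)
    kappa = (oa - expected) / (1.0 - expected)
    return {"oa": oa, "pa": pa, "ua": ua, "kappa": kappa}

def kappa_label(kappa):
    """Agreement label; values of exactly 0.4 and 0.8 are treated as moderate (assumption)."""
    if kappa < 0.4:
        return "poor"
    if kappa <= 0.8:
        return "moderate"
    return "strong"

# Toy two-class matrix with made-up counts.
toy = [[40, 10],
       [5, 45]]
metrics = classification_metrics(toy)
print(round(metrics["oa"], 2), round(metrics["kappa"], 2), kappa_label(metrics["kappa"]))
```

For the pairwise comparisons between stages, the same function can be applied to the 2 × 2 matrix obtained for each pair of stages.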
Using the overall and individual accuracies for all four PWD infection stages, we examined the paired accuracies of Healthy, Early stage, Middle stage, and Serious stage pine trees to examine the feasibility of discriminating between different stages.

Estimation of Cab and WC
Leaf Cab and WC decreased with the severity of PWD infection (Fig. 5). We estimated the Cab and WC of all 121 sampled trees using the RF regression model, with the three groups of input parameters (VIs, REPs, and MIs) input separately. We examined the performance of the input parameters in estimating Cab and WC using both ground spectrum data and UAV-based spectral data (Figs. 6 and 7). Cab estimation accuracy was slightly greater when using REPs than when using VIs for both ground data (REPs: R² = 0.78, RMSE = 82.34 g·m⁻², RRMSE = 27.44%; VIs: R² = 0.74, RMSE = 89.80 g·m⁻², RRMSE = 29.92%) and UAV-based data (REPs: R² = 0.75, RMSE = 87.34 g·m⁻², RRMSE = 29.11%; VIs: R² = 0.72, RMSE = 94.11 g·m⁻², RRMSE = 31.36%). For WC predictions, in which MIs were used as input parameters, the predictions from ground data were considerably more accurate than those from UAV-based data. The results are summarized in Table 7.
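The three regression statistics quoted above can be computed from paired measured and estimated values as in the following sketch. The exact definitions are not spelled out in the text, so two assumptions are made here: R² is taken as 1 minus the ratio of residual to total sum of squares, and RRMSE is taken as RMSE divided by the mean measured value. The reported RMSE and RRMSE pairs would be consistent with the second definition if the mean measured Cab were roughly 300 g·m⁻², although that is an inference rather than a stated value. The sample numbers are made up.

```python
import math

def regression_metrics(measured, estimated):
    """R2, RMSE and RRMSE (%) between measured and estimated values."""
    n = len(measured)
    mean_measured = sum(measured) / n
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,                 # assumed definition of R2
        "rmse": rmse,
        "rrmse_percent": rmse / mean_measured * 100.0,  # assumed definition of RRMSE
    }

# Made-up values in the unit of the measured variable.
measured = [100.0, 200.0, 300.0, 400.0]
estimated = [110.0, 190.0, 310.0, 390.0]
print(regression_metrics(measured, estimated))
```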
The results showed that the model tended to overestimate Cab below 200 g·m⁻² and underestimate Cab above 300 g·m⁻² (Fig. 6a and b; Fig. 7a and b). Thus, the RF regression model provided unsatisfactory predictions for Cab and WC in pine trees when VIs, REPs, and MIs were taken as input parameters.
In addition, the Cab estimated by REPs derived from ground data (the optimum variables) was used to assess the PWD infection stages directly. The results showed that the Cab estimated by RF from the optimum variables did not perform well in classifying the PWD infection stages (OA = 47.11%, Kappa = 0.29; Table 8). This means that estimated Cab cannot be used directly to classify the PWD infection stages accurately.

Feature analysis
The spectral reflectance of trees declined as a function of PWD stage severity (Fig. 8). The difference in spectral reflectance was obvious near the green peak (500-600 nm), red edge (680-760 nm), and NIR (750-950 nm; Fig. 9). VIs, REPs, and MIs exhibited differing responses to the severity of infection. While some variables, such as NDVI (810, 680), Kg, and NDWI, decreased with increasing infection stage, others (e.g., MSI, PRI, and Sg) increased significantly with increasing infection stage (Fig. 10). Therefore, almost all the selected variables exhibited statistically significant responses to PWD severity, indicating their potential for detecting the stage of PWD. Generally, the spectral variables were sensitive to changes in biochemical characteristics.

Comparisons of classifications using different variables from ground and UAV-based data
The MDA index for the ground data and UAV-based data differed strongly among variables. Importance rankings indicated REPs to be more important than most VIs and MIs (Fig. 11). The most important variables were REPs, and VIs were generally more important than MIs. GH was the most important variable for ground data, and SDR was the most important variable for UAV-based data.
OA (overall accuracy) assessment using the 10-fold cross-validation method indicated that REPs performed best among the three individual parameter groups. For ground data, REPs yielded an OA of 79.34%, VIs 75.21%, and MIs 74.38%. Combining all variables yielded an OA of 80.17% (Tables 9 and 10). UAV-based data provided less accurate results for all variables: 72.73% for REPs, 70.25% for VIs, 63.64% for MIs, and 74.38% for all variables combined (Tables 9 and 10). Kappa values yielded similar qualitative results for both ground data and UAV-based data. For ground data, Kappa was 0.67 for REPs, 0.66 for VIs, and 0.65 for MIs. When all variables were combined, Kappa improved to 0.73. For UAV-based data, the values of Kappa for REPs, VIs, and MIs were 0.63, 0.60, and 0.51, respectively. When all variables were combined, Kappa again improved (0.66). Therefore, for each data type (ground data and UAV-based data), REPs yielded the most accurate results among the individual parameter groups, followed by VIs and MIs. Additionally, ground data provided more accurate results than UAV-based data in all cases.
PA (producer's accuracy) values were high for the middle and serious stage of infection regardless of the data source and the parameters used. UA (user's accuracy) was relatively high for middle and serious stage of infection, while healthy and early stages had lower UAs (Table 11).
Pairwise comparisons of the healthy, early, middle, and serious stages indicated that the OAs of most stage pairs were considerably greater than 80% (Figs. 12 and 13). Lower accuracies resulted when healthy pine trees and pine trees in the early stage of infection were compared based on the VIs (75.41%), REPs (80.33%), MIs (70.97%), and all variables combined (79.03%) from ground data, as well as VIs (68.33%), REPs

Discussion
In this paper, we employed VIs, REPs, MIs, and the combination of all variables to examine the capacity of ground and UAV-based hyperspectral data to estimate PWD infection stages at the individual tree level. The results reveal that combining all variables performed best and yielded a reasonably accurate classification, with an OA of 80.17% for ground data and 74.38% for UAV-based data (Tables 9 and 10).
For identifying pine trees in the early stage of PWD infection, the REPs exhibited the best performance, with OAs of 80.33% and 71.67% from ground data and UAV-based data, respectively (Figs. 12 and 13).
Overall, the results suggest that: (1) the REPs are more responsive to changes in PWD infection stage than VIs and MIs, indicating that REPs may be more sensitive to biochemical conditions; and (2) UAV-based data, especially REPs, achieved considerable accuracy in monitoring the PWD stage at the individual tree level. This accuracy was slightly lower than that of ground data, but UAV-based data can be applied over large forest areas.

Error analysis
Previous studies show hyperspectral data to be effective in examining forest health (e.g., Pontius et al. 2008;Näsi et al., 2015). However, we encountered several difficulties, obstacles, and sources of error in precisely estimating leaf Cab, WC, and the stage of PWD in pine trees.
(1) The PWD stage of each sampled pine tree was judged by visual observation. These judgments were fairly subjective and possibly inaccurate.
(2) The acquisition of both ground and UAV-based hyperspectral data is easily affected by the weather, especially light conditions. Because data were collected during daylight hours, this may have biased the results.
(3) The results of individual tree crown segmentation using UAV-based hyperspectral data were somewhat inaccurate. This increased the uncertainty of the extracted tree hyperspectral features: it was difficult to distinguish pine trees from understory trees and to separate overlapping crowns in the HIs using the image classification algorithm. Manual drawing and visual interpretation can reduce the interference of mixed pixels, but they cannot be applied efficiently when the sample size is large. Moreover, in practice it is hardly possible to meet two requirements at the same time: obtaining pure pixels and covering the whole crown.
(4) We collected Cab and WC data on 12-18 August 2019, whereas we acquired the UAV-based data on 18 August 2019. The biochemical conditions may have changed during this interval. The drone needed only 30-60 min to complete data collection, but the manual ground survey took at least 1 week, because we cut branches from each tree and then measured the spectrum, Cab, and WC of each tree. The workload was therefore relatively heavy, the ground survey could not be synchronized with the drone data collection, and we could only keep the two as close in time as possible.
(5) The results of our study may be affected by the small sample size.

Possible application of UAV-based hyperspectral data in detecting PWD
Overall, the PWD infection stage classification of ground data was more accurate than that of UAV-based airborne hyperspectral data (Tables 9 and 10). There are several possible sources of this discrepancy. Firstly, ground data consisted of samples from the entire tree, while the airborne data only measured canopy spectral data. Therefore, ground data samples may more accurately reflect the tree condition. Additionally, airborne data acquisition is easily affected by weather; this may have induced measurement errors. The PA and UA of the four PWD infection classes using RF based on VIs, REPs, MIs, and all variables combined also suggest that ground data performed better than airborne data (Table 11). However, when the REPs and the combination of all variables were used from UAV-based data, predictions were comparably accurate to those of ground data (Tables 9 and 10). Importantly, the acquisition of airborne data is simple, convenient, and much faster than ground data acquisition. Therefore, there is a trade-off between the accuracy and efficiency of data acquisition: ground data are accurate but time consuming to obtain, while UAV-based airborne data are less accurate but much easier to obtain. Because PWD potentially affects trees in many large forest areas, ground data acquisition is not a feasible management strategy. UAV-based data provide only slightly less accurate classifications than ground-based data and are thus a more practical candidate for future large-scale forest management.

The potential of identifying trees in the early stage of PWD
Our results show that it is relatively simple to distinguish healthy trees and trees in the early stage of PWD infection from trees in the middle and serious stages of PWD infection. This is because the biochemical characteristics (e.g., leaf Cab) of healthy trees and trees at the early stage of PWD are very different from those of trees in the middle and serious stages (Fig. 5). In contrast, it is difficult to distinguish healthy trees from trees in the early stage of PWD because the difference in their spectral responses cannot be detected easily. REPs performed relatively well in distinguishing trees in the early stage of PWD infection from healthy trees (ground data OA: 80.33%, Kappa: 0.58; and airborne data OA: 71.67%, Kappa: 0.40); however, overall, UAV-based data yielded moderately low accuracy (Fig. 13). Therefore, in practical application, especially in a large-scale forest area, it is still a challenge to use UAV-based hyperspectral data to precisely identify trees at the early infected stage of PWD. The main focus of our next study is therefore to improve the accuracy through more effective approaches (e.g., using multi-temporal UAV hyperspectral data).
Fig. 11 The mean decrease accuracy (MDA) of each selected variable from ground data (a) and UAV-based data (b) for estimating the disease stage of pine trees

Classification algorithms
Machine learning algorithms such as random forest (RF), support vector machine (SVM), and Classification and Regression Tree (CART) have been widely used in previous studies to classify trees damaged by forest pests (Abdel-Rahman et al. 2014; Iordache et al. 2020; Syifa et al. 2020; Zhan et al. 2020). In our study, the RF algorithm was used.
In the RF algorithm, the mean decrease accuracy (MDA) index of each variable is determined when calculating the out-of-bag (OOB) error. It measures the importance of a variable by how much the OOB error increases when the values of that variable are randomly permuted while the others are kept unchanged (Archer and Kimes 2008; Verikas et al. 2011; Abdel-Rahman et al. 2013). Thus, the higher the MDA value of a variable, the greater its importance (Immitzer et al. 2012; Liu et al. 2017), which allows us to determine the most important variables. Additionally, compared with other algorithms, RF is less sensitive to multicollinearity, its results are relatively robust to missing and unbalanced data, and it can model the effects of thousands of explanatory variables (Breiman 2001). Therefore, RF has been widely used in monitoring forest disturbance, especially for detecting wood borers in pine forests (Abdel-Rahman et al. 2014; Lin et al. 2019; Iordache et al. 2020).
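The permutation idea behind MDA can be illustrated with a short sketch. It is a simplification under stated assumptions: a hypothetical threshold classifier stands in for the random forest, and a single sample set stands in for the per-tree OOB samples, so it shows the principle rather than the procedure used in this study. All names and data below are invented for illustration.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)


def mean_decrease_accuracy(predict, rows, labels, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    mda = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j; all other features stay unchanged.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        mda.append(sum(drops) / n_repeats)
    return mda


# Hypothetical toy data: feature 0 separates the two classes, feature 1 is noise.
data_rng = random.Random(1)
rows = [[data_rng.random() + (i % 2), data_rng.random()] for i in range(200)]
labels = [i % 2 for i in range(200)]


def predict(row):
    """Stand-in classifier (NOT a random forest): thresholds feature 0 only."""
    return 1 if row[0] >= 1.0 else 0


mda = mean_decrease_accuracy(predict, rows, labels)
print(mda)  # large drop for feature 0, no drop for feature 1
```

Shuffling the informative feature breaks its link to the labels and lowers accuracy, whereas shuffling a feature the classifier ignores changes nothing. Ranking variables by this drop is what allows the most important ones to be identified.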
Currently, deep learning algorithms such as convolutional neural networks (CNN) are showing great potential in plant health monitoring (Yuan et al. 2017; Nagasubramanian et al. 2019; Wu et al. 2021). However, they still have some limitations. Firstly, deep learning algorithms do not perform well when the dataset is small. Furthermore, deep learning behaves like a black box: it does not reveal why it produced a given result, so it lacks interpretability (Ling et al. 2018; Silaparasetty 2020). On the other hand, with its rigorous calculations and great flexibility (Schmidhuber 2015; Hao et al. 2016), deep learning could improve our classification accuracy. In our next study, deep learning algorithms will be applied to PWD diagnosis using multi-temporal UAV-based hyperspectral data.

The possible application of Lidar
In this study, the classification model, the predictions for Cab and WC, and the results of individual tree crown segmentation were obtained from hyperspectral data alone. However, the results were not satisfactory, especially for tree crown segmentation (only 121 of 218 were delineated). Another potential method of data collection is Lidar (light detection and ranging). Lidar can directly, quickly, and accurately obtain three-dimensional geographic coordinates of objects (Vierling et al. 2008). Much progress has been made in the application of Lidar technology in the fields of geology, forestry, and ecology, such as the establishment of digital elevation models (DEM), the extraction of forest structure parameters, and the inversion of forest ecosystem function parameters (e.g., Watt et al. 2014; Huang and Lian 2015; Saarela et al. 2020; Xie et al. 2020). This makes Lidar a possible candidate to improve measurement accuracy. Although Lidar data do not accurately reflect the biochemical condition of tree crowns (e.g., Liu et al. 2017; Shi et al. 2018), they can serve as auxiliary data that capture three-dimensional tree canopy structure (e.g., Shendryk et al. 2016). Thus, combining Lidar with hyperspectral data for individual tree segmentation could improve accuracy (e.g., Junttila et al. 2019; Lin et al. 2019). Furthermore, crown structure and other tree structural information are likely to change throughout PWD infection. Therefore, variables based on the return intensity information from Lidar data might be useful in estimating the stage of PWD in pine trees, which we will examine in our next study.

Conclusion
In this paper, we compared the relative accuracies of ground-based data and UAV-based hyperspectral data in predicting the stage of PWD infection in pine trees. To do this, we used VIs, REPs, MIs, and all variables combined as input parameters in an RF classification model. We found that combining all variables generally performed best for estimating the stage of PWD infection of pine trees, and that REPs exhibited the highest accuracy in distinguishing healthy trees from trees in the early stage of PWD infection. REPs based on UAV (airborne) data performed slightly worse in distinguishing trees in the early stage of PWD from healthy trees (OA: 71.67%, Kappa: 0.40), but remain a feasible method. Therefore, UAV-based hyperspectral imaging is a promising candidate for measuring forest health. Relative to methods that use ground data, UAV-based hyperspectral imaging has the potential to substantially reduce labor and time costs. Future studies should aim to improve the accuracy of UAV-based data. One possible direction is the use of supplemental data sources such as UAV-based Lidar to improve classification accuracy.