Influence of sampling intensity on performance of two-phase forest inventory using airborne laser scanning

Background: Forest inventories have always been a primary information source concerning the forest ecosystem state. Various applied survey approaches arise from the numerous important factors during sampling scheme planning. Paramount aspects include the survey goal and scale, target population inherent variation and patterns, and available resources. The last factor commonly inhibits the goal, and compromises have to be made. Airborne laser scanning (ALS) has been intensively tested as a cost-effective option for forest inventories. Despite existing foundations, research has provided disparate results. Environmental conditions are one of the factors greatly influencing inventory performance. Therefore, a need for site-related sampling optimization is well founded. Moreover, as stands are the basic operational unit of managed forest holdings, few related studies have presented stand-level results. As such, herein, we tested the sampling intensity influence on the performance of the ALSenhanced stand-level inventory. Results: Distributions of possible errors were plotted by comparing ALS model estimates, with reference values derived from field surveys of 3300 sample plots and more than 300 control stands located in 5 forest districts. No improvement in results was observed due to the scanning density. The variance in obtained errors stabilized in the interval of 200–300 sample plots, maintaining the bias within +/− 5% and the precision above 80%. The sample plot area affected scores mostly when transitioning from 100 to 200 m. Only a slight gain was observed when bigger plots were used. Conclusions: ALS-enhanced inventories effectively address the demand for comprehensive and detailed information on the structure of single stands over vast areas. Knowledge of the relation between the sampling intensity and accuracy of ALS estimates allows the determination of certain sampling intensity thresholds. This should be useful when matching the required sample size and accuracy with available resources. Site optimization may be necessary, as certain errors may occur due to the sampling scheme, estimator type or forest site, making these factors worth further consideration.


Forest inventory
A comprehensive description of the state and structure of woodlands is crucial information for forest owners and decision-makers in particular. Details on the availability of current resources form the basis for sustainable planning of forest use and its development. Hence, a number of requirements concerning forest management has been imposed on forest administrators. The Forest Act (1991) is an official state document that requires the development of a forest management plan (FMP) every 10 years for each forest holding in Poland. Similar binding documents are in effect in other countries . For instance, the creation of an FMP, or at least local forest inventory data, is mandatory in the following European countries: Bosnia and Herzegovina, Bulgaria, Croatia, the Czech Republic, Estonia, France, Hungary, Latvia, Macedonia, Poland, Portugal, Romania, Slovakia, Slovenia and Switzerland (Nichiforel et al. 2018). These examples show how crucial knowledge of forest resources is.

Growing stock volume
The growing stock volume (GSV) is one of the most important stand characteristics to be determined as a result of forest inventories (Tonolli et al. 2010). The GSV is a traditional indicator of wood resources, carbon stock, management efficiency, and sustainability in the forest sector (Jung and Mui 2010;EEA 2017). GSV-related indices, e.g., biomass or carbon, must also be reported according to international agreements, e.g., in national forest inventories. Many forest inventories use a designbased approach, where survey crews are contracted to collect data by means of field measurements on sample plots, entailing the use of increasingly expensive human resources (Eurostat 2018). This approach is widely applied across many European countries (FAO 2004;McInerney et al. 2011;FMM 2012;Redmond et al. 2016;Vidal et al. 2016). Although the design-based approach may be sufficient to describe attributes of an entire population, e.g., the forest district, it may not be precise enough to sufficiently describe attributes of elements or small areas, for instance, single trees or stands, especially in areas with a low sample coverage (McRoberts et al. 2013). This issue is becoming more relevant as detailed knowledge of stand-level resources becomes increasingly required (Johnson et al. 2004;Mäkelä & Pekkarinen 2004;Kauranne et al. 2017).

Forest stand
A common definition of a forest stand is as follows: a basic operational unit that is considered to be a spatially consistent part of the forest, homogeneous in terms of species composition, age, site type, tree origin, canopy cover, etc. (Helms 1998;Koivuniemi and Korhonen 2006;Pasalodos-Tato 2010;FMM 2012;Bolton et al. 2018). In Poland, single-stand GSV estimation may be performed for two reasons: once every 10 years for longterm forest planning and 1 or 2 years prior to any cutting operations for a given stand, mostly to evaluate the potential harvest. Such estimation can be conducted by means of a total survey, an approximation based on a similar stand, a visual assessment, or the angle-count method (FMM 2012;DGLLP 2015). None of the abovelisted methods appear to provide a balanced trade-off between the accuracy, precision and cost, especially in regard to surveying vast areas. Therefore, there seems to be a well-founded need for more universal solutions, which would provide objective and detailed forest inventory data at large scales (Wulder 1998;Mäkelä & Pekkarinen 2004;Mozgeris 2008;Stereńczak 2010;Tonolli et al. 2010;Holopainen et al. 2014;White et al. 2016;Kankare et al. 2017;Zygmunt et al. 2017).

Remote sensing
The potential of remote sensing (RS) techniques for forest management has been considered since the start of the twentieth century (Hugershoff 1911;Wilson 1920;Gieruszyński 1948). RS techniques are capable of providing continuous information over vast and hardly accessible sites. Nevertheless, the level of technology at that time and awareness of needs versus possibilities in forest communities did not facilitate either research or its practical implementation. The spread of computers and increased computational capacity have led to notable progress in the development of methods related to forest inventories. Aerial images (Bolduc et al. 1999), satellite data (Tomppo 1991) and later laser scanning (Naesset 1997;Naesset et al. 2004) have been intensively tested in Scandinavia and North America, with promising results in terms of their operational implementation, as is the case today in Nordic countries (Naesset et al. 2004;Maltamo and Packalen 2014;Vauhkonen et al. 2014;Kangas et al. 2018), Canada (Woods et al. 2011), the USA (Evans et al. 2006), New Zealand (Pont et al. 2012;Coomes et al. 2018) and Australia (Turner et al. 2011;Pont et al. 2012). In Poland, a joint project is being conducted by the state forests and scientific units to provide RS data as a mandatory part of inventories and forest management planning.
Among RS techniques, airborne laser scanning (ALS) is known to be particularly applicable for the assessment of wood resources (Even et al. 2015). By testing the accuracy of ALS for biomass estimation, Ene et al. (2013) showed that ALS-aided surveys can be an economical alternative to conventional inventories. ALS can provide a 3D representation of an entire forest district in the form of a point cloud (PC). Although such a PC might be a relatively accurate visualization of the forest structure, its quantification (e.g., the GSV) is usually determined through statistical modelling (Wulder et al. 2008;Leeuwen & Nieuwenhuis 2010;Balenović et al. 2013) instead of direct PC measurements. A common design applied to forest attribute extraction from ALS data is two-phase sampling with regression estimators or the area-based approach (Naesset & Bjerknes 2001;Naesset 2002;Köhl et al. 2006;Naesset 2014, Even et al. 2015. In the first phase of this sampling design, a complete wall-towall representation of the surveyed area is provided by ALS metrics (auxiliary variables). In the second phase, a limited set of ground sample plots is employed to establish a statistical relationship between the auxiliary variables and target variable (here, the GSV), such that the latter variable can be estimated over the entire area of interest, e.g., the forest district, or its part, viz. the stand.

Sampling intensity
The performance of ALS-enhanced forest inventories may depend on several factors, such as the sampling design, sample size, estimator, PC parameters, and variation in the target variable within the population (Köhl et al. 2006;Yang et al. 2019). There are many theoretical studies concerning the estimation of the required sample size, which chiefly depends on the goal and means of analysis. Recommendations for predictive multiple linear regression modelling (as used in this research) may be found in Knofczynski (2017), who applied a series of Monte Carlo simulations to artificial data to determine the minimum sample size needed to obtain regression parameters close to population parameters. He linked the required sample size to the number of predictors and correlation strength between the most reliable predictor and criterion. Similar findings were obtained by Bujang et al. (2017), who employed power analysis of real data to determine the minimum sample size for multiple linear regression. There are also certain rules of thumb for rapid sample size evaluation. Most of them are based on the principles of power analysis. With regard to regression, Voorhis and Morgan (2007) examined N > 50 observations. Harris (1985) advocated 50 + m (where m is the number of predictors) as the minimum sample size. In contrast, Green (1991) criticized the use of constant values (e.g., 100) and instead supported N > 50 + 8 m as the minimal sample size required to verify the overall fit of a regression model.
A priori sample size considerations undoubtedly constitute a firm base for planning either a research project or a commercial inventory. However, the complex forest structure may not always match all the theoretical assumptions. Related issues have already been considered (Adams et al. 2011;White et al. 2013White et al. , 2017 based on empirical data published in good-practice guidelines for the generation of inventory attributes from ALS data. Despite the use of similar methods, different researchers have reported fairly different results in this regard. Notable discrepancies in the number of sample plots utilized have occurred in other studies, starting from 15 plots (Tompalski et al. 2015), up to almost 800 plots (White et al. 2014). General trends regarding the influence of the number and area of sample plots have been presented in Gobakken and Naesset (2008) and Stereńczak et al. (2018). Saarela et al. (2015) tested the effects of the estimator and sample size on the precision of GSV predictions when light detection and ranging (LiDAR) data are adopted in a two-phase sampling design, with the simulated study area resembling the structure of Finnish forests. Having tested various models, they found minor to moderate effects of this forest inventory element. Moreover, they reported that the variance in the model error remains the same regardless of the number of first-phase sample plots; however, the error variance is sensitive to the number of second-phase plots. Similar tests were performed by Fassnacht et al. (2018) to evaluate the effect of the sampling intensity on ALS-based forest biomass prediction (closely related to the GSV). Having simulated Scots pine and European beech stands, they reported little influence of the area and number of sample plots on the root mean square error (RMSE), i.e., 4-7 Mg•ha − 1 , but quite a notable effect on the bias. Ruiz et al. (2014) performed a trial to assess the influence of the sample plot area. Having tested plots from 100 to 3600 m 2 , they proposed 500 m 2 as the minimum area for GSV estimation. The opposite conclusion was drawn by Watt et al. (2013), who found LiDAR models for GSV prediction to be insensitive to the plot size and PC density. Regarding ALS PCs, Montealegre et al. (2016) stated that even densities of circa 1 pulse•m − 2 seemed to generate relatively accurate GSV estimates, provided that no systematic error occurred caused by an insufficient number of sample plots (Naesset et al. 2004). On the other hand, Smreček and Danihelová (2013) noted that low-PC density data should be carefully assessed. Although these examples show results obtained with not precisely the same methods and for various study areas, the discrepancies remain noteworthy.

Aim and scope
The earlier discrepancies provided the main foundations of this research, particularly since there are few studies concerning Scots pine-dominated stands in Europe. Moreover, if RS methods of forest inventory are to be widely applied in forestry practices, further studies should be prioritized, given their substantial impact on inventory costs. Furthermore, access to extant large datasets urged us to test the methods described in the literature. At our disposal were field measurements from 3305 sample plots and 305 control stands along with ALS PCs acquired from 5 different forest districts. Our sole target variable in this study was the GSV. Of course, many other variables can be determined as a result of forest inventories, e.g., the tree/stand height, diameter at breast height (DBH), basal area, tree density, and species. Many of these variables can be directly measured using RS data, unlike the GSV, which is usually a product of a statistical model incorporating one or more of the above-listed variables. Due to this complexity, estimation of the GSV requires higher sampling intensity, thus making this variable a robust study case. Moreover, the GSV is a universal index of the state of forest ecosystems (Jung and Mui 2010;EEA 2017). It is also a commonly reported index to both state and international environmental agencies.
Given the above, this study aimed to test the influence of the sampling intensity (i.e., the number and area of sample plots, as well as the ALS PC density) on the performance of the two-phase forest stand inventory using an ALS regression estimator.

Study areas
The following forest districts served as investigation areas: Milicz, Suprasl, Katrynka, Piensk and Herby. Most of the stands were pine-dominated (Pinus sylvestris) stands. Nevertheless, the above-listed districts vary in site quality and environmental conditions, which ensures a more reliable result validation. The Katrynka forest district is situated in a lowland region comprising coniferous sites, where Pinus sylvestris is the dominant tree species. Suprasl is located in the same region, but there are more broadleaved sites. The other districts, such as Herby and Piensk, occur in a strip of the Polish highlands. Pinus sylvestris-dominated stands constitute approximately 70% of the Herby sites. A more diverse share of mixed coniferous and broadleaved sites (approximately 30% − 40%) distinguishes Milicz and Piensk, from the other sites. Table 1 contains a quantitative summary of the listed districts. Their locations are shown in Fig. 1.

Field data
Two types of conventional forest inventories were established for each district: (i) circular sample plots and (ii) a total survey of the control stands. Field measurements were collected in the summer of 2015 for all objects, except Katrynka and Herby, for which surveys were conducted in 2016.
In the first design, a regular network of circular sample plots was deployed across each forest district (Fig. 1). The total number of sample plots varied between the districts (Table 1). In the field, all plots had a fixed area of 500 m 2 . The following properties were determined for those trees with a DBH exceeding 7 cm: species, DBH, height, age, and position. The DBH was measured with callipers, and the tree height was measured with rangefinders. The position of each tree was determined by the azimuth and distance relative to the centre point of a given sample plot. The coordinates of each centre point were recorded for 20 min utilizing global navigation satellite system (GNSS) static measurements. Having captured all of the above properties, the volume of every single tree was estimated according to equations commonly applied by the Polish state forests (Bruchwald et al. 2000). The volume of each sample plot was then determined via summation of tree volumes inside that plot and normalized to per hectare values, according to the level of factor II, i.e., the sample plot area (refer to the section: Factors). Bruchwald (1999) claimed that at the stand level, the expected mean error of those applied equations should not exceed 8% with a 95% confidence interval. Field-based GSV estimates from this type of inventory were utilized to regress the relationship between the ALS metrics and GSV. In addition to the circular sample plots, a portion of the control stands was surveyed in each forest district (Table 1, Fig. 1). As the number of control stands was considerably smaller than that of the sample plots, the former were chosen by stratified sampling, with the use of the stand type as a division criterion (Köhl et al. 2006). The approximate distribution of the various forest types was obtained from previous inventories. The reasoning behind this approach was to capture the most common forest types, given the limited resources in terms of the available number of control stands to be surveyed. The same attributes were determined for the trees within the control stands as for those measured within the sample plots. The only difference was the methodology adopted to determine the tree height. Having measured the DBH of all trees within a given control stand, the height of at least 20 trees per every major species was captured in order to cover the entire DBH range. Eventually, the height of every tree was estimated based on Näslund's DBH-height curve (Siipilehto 2000). As the investigated control stands generally exhibited a one-layer vertical structure, no substantial error was expected from this simplification. Moreover, Bouvier et al. (2019) stressed that DBH/height measurement errors impose a minor to negligible effect upon LiDAR-based biomass estimation for even-aged pine stands. The stand boundaries were first delineated with a GNSS receiver and (if required) manually adjusted based on the canopy height model derived from the ALS data. The mean area of the control stands was approximately 1 ha. The reference variable of interest for the i th stand-GSV REFi (m 3 •ha − 1 )-was calculated as the sum of the single-tree volumes within the stand and normalized to a per hectare value. The data acquired from this type of inventory served as a validation set for the ALS-based GSV estimation.
Airborne laser scanning data ALS PCs were acquired in August 2015 for all study areas except Katrynka and Herby, where flight missions were performed 1 year later. The PCs were acquired with a Riegl LiteMapper LMSQ680i scanner operating at frequencies ranging from 300 to 400 kHz. The average flight altitude above ground level was 550 m, and the PC tile overlap was 30%. These settings provided a PC density of approximately 12-13 pulse•m − 2 . Although the ALS data did not vary much between the study areas, standardization measures were applied to minimize the influence of ALS data variation. For more details concerning the standardization routines, refer to the section below-Factors.

Factors
The first factor analysed was the number of circular sample plots used to calibrate the model for GSV estimation (Eq. 1). The second factor studied was the area of a single sample plot. Except for a fixed radius, the age-dependent radius (ADR) approach was tested. In the ADR, the area of a given sample plot is fixed by the age of the stand in which the plot is located. The following age-area intervals were assumed: 20-40 (100 m 2 ), 40-60 (200 m 2 ), 60-80 (300 m 2 ), 80-100 (400 m 2 ), and > 100 (500 m 2 ). The ADR approach is commonly applied by the Polish state forests (FMM 2012), as the lower variation in the tree dimensions within younger stands questions the need for extensive sample plots, which in turn are more expensive to survey.
The third factor analysed was the PC density (PCd). Regarding this factor, two sets of various ALS metrics were computed for all the plots and control stands. The first set was calculated using a density of 7 pulse•m − 2 , and the second set was computed using a density of 1 pulse•m − 2 . Thinning routines were conducted using the authors' original R script. First, the pulses were ordered by the sending time. The loop was designed in such a manner that in a single iteration, every second pulse (by time) was removed, whereupon the PC density was checked. Subsequent iterations were executed until the desired PC density was reached for a given plot. Afterwards, both sets of PCs were independently classified onto ground and non-ground returns and normalized with a 1-m digital terrain model (DTM) that was interpolated from corresponding ground returns. The inverse distance weighting k-nearest neighbour approach from the lidR package (Roussel et al. 2018) was adopted for DTM interpolation.

General concept
The designed methodology relied on a series of scenarios (simulations), which were unique in terms of the levels of the factors. After the levels had been set for a given scenario, two-phase sampling (Köhl et al. 2006;Miścicki and Stereńczak 2013;White et al. 2013;Naesset 2014) was applied to estimate the GSV of the single stands with the regression estimator (Eq. 1). For example, single-scenario estimates were derived from the model calibrated on 100 sample plots, each 300 m 2 in size, where the ALS metrics were calculated at a PC density of 1 pulse•m − 2 . The estimated stand-level GSV values from a single scenario were then compared with reference values (GSV REF ) derived from ground measurements. As sample plots can be deployed across the investigation area in an infinite number of configurations, 1000 random plot draws were performed under each scenario. Finally, the error distributions were established based on the scores received from every single draw per scenario. Figure 2 shows the above concept. All analyses and data processing steps were performed in the R programming environment (R Core Team 2016).

GSV estimation routine and error computation
The two-phase sampling design with the ALS regression estimator (i.e., the area-based approach or ABA) was applied to estimate the stand GSV. Our implementation of the ABA relied on the determination of the relationship between the dependent variable (here, the GSV) measured on the ground sample plots and LiDAR-derived sample metrics (independent/auxiliary variables), both representing the same spatial locations. Over 200 different ALS metrics calculated for a total of 3305 circular sample plots from all investigation sites were analysed to create a general form of the model (Eq. 1). First, a boosted regression trees (BRT) algorithm (Elith et al. 2008) was applied to reduce the high initial number of predictors by depicting those that were highly correlated and the most significant to the dependent variable. Next, the auto-correlated groups of predictors were eliminated by retaining the most frequently occurring ones in particular BRT iterations. With the reduced set of predictors, using stepwise regression, we developed a relatively simple model that would describe the general relationship and would not favour any of the analysed factors. Equation 1 presents the general form of the proposed estimator. In general, the model was capable of explaining approximately 70%-80% (R 2 ≈ 0.7-0.8) of the GSV variance among the sample plots. where: ŷ -dependent variable ð d GSV Þ (m 3 •ha − 1 ); a 0 , a 1 , a 2 , a 3 regression coefficients; X 1cubic mean of the height values from all returns over a plot; X 2ratio of the number of first returns above the 2nd height stratum * (For each plot/cell, the range from 2 m above ground level up to the 95th height percentile was divided into 10 equal strata) to the number of all first returns; X 3ratio of the number of last returns above the 10th height stratum * to the number of all last returns over plots, X 2 and X 3 akin to Naesset (2002) and Gobakken et al. (2012).
Having linked the relationship between the ALS metrics and the target variable with Eq. 1, the next step was to estimate the d GSV over the control stands under each scenario. A regular wall-to-wall grid was drawn over the boundaries of each control stand, as shown in Fig. 2, step 3. The size of a single grid cell under a particular scenario was identical to the size of the circular sample plots that were used to calibrate the model under this scenario. For instance, if under the s th scenario, the circular sample plots that were used for model calibration had an area of 400 m 2 each, the single grid element was then a 20-m side length square, etc. Next, the spatial extent of the edge cells was truncated to the borders of the stand in which they were located. Each cell's GSV ( d GSV c ) was estimated based on the model predictors (X 1 , X 2 , and X 3 ), which were computed according to the cell area (factor II), the level of the PC density (factor III) set under a given scenario, and the model parameters (a 0 , a 1 , a 2 , and a 3 ) calibrated for the sample plots, which were selected in a given draw (factor I). There were 1000 independent sample draws per scenario to account for the variability of a given forest district structure (simple random sampling without replacement from the grid). No plots from outside the forest district given the specific scenario were considered in model calibration. The final GSV estimate of the i th control stand under the s th scenario and d th draw ( d GSV isd Þ was the mean of all d GSV c values determined for that stand, weighted based on the cell area (Fig. 2, steps 3 and 4). Area weighting was performed to reduce the influence of the edge cells (the silver cells) (Naesset 1997), which often exhibit a tree structure different from that observed in the inner parts of the forest stand. Moreover, edge cells are commonly assigned a larger estimation error, as they usually border roads, meadows, or other different structures. Whenever applied in this study, area weighting decreased the magnitude of the estimation error. The estimated d GSV isd value was juxtaposed with the corresponding reference value-GSV REFi . Based on the obtained residuals, the errors expressed as Eqs. 2 and 3 were calculated for the d th draw and s th scenario. Having computed the errors from all the draws, their distribution under a given scenario could be easily generated, as shown in Fig. 2, step 7.
where: nRMSE dsnormalized root mean square error for the N control stands under the s th scenario and d th draw; nBIAS dsnormalized systematic error for the N control stands under the s th scenario and d th draw; Nnumber of control stands in a forest district; GSV REFireference growing stock volume of the i th control stand (m 3 •ha − 1 ); d GSV isdestimated growing stock volume of the i th control stand under the s th scenario and d th draw (m 3 •ha − 1 ); GSV REFmean reference growing stock volume from all the control stands in a forest district (m 3 •ha − 1 ).

Common trends
Figures 3 (nBIAS) and 4 (nRMSE) reveal the general influences of all 3 factors on the performance of the twophase ABA approach under the applied estimator. The range of the errors obtained is shown according to the levels of particular factors. The gain due to the number of sample plots is shown along the vertical axis of each graph. The gain due to their area is shown from left to right. The whiskers delineate the range where 95% of the best scores were found for a given scenario. The boxplots enclose the interquartile range of the errors, and the centreline indicates the median error. The colours denote the levels of factor III, i.e., the PC density. To ensure figure clarity, we show the results only up to the level of 600 sample plots, as we did not observe any notable improvement in performance when more sample plots were considered.
The most common trend visible in Figs. 3 and 4 is a lack of significant differences among the distributions according to factor III-the PC density. Therefore, this implies that this factor had little influence on the implemented methodology. Next, the number of sample plots (factor I) affected the error distributions more than the sample plot area (factor II). An increase in the number of sample plots improved both the precision, i.e., the range of errors obtained due to repeated application of the sampling procedure, and the accuracy of the estimates, i.e., small absolute error values. This gain was notable only up to a certain level. For instance, we did not observe nBIAS exceeding +/− 5% or a nRMSE value of 20%, above the level of 300 sample plots of 200 m 2 .
However, the results were generally slightly overestimated. The distributions across the sites were similar but not exactly the same. Excluding the most parsimonious scenarios, i.e., < 25 plots of 100 m 2 , the best results were observed for the Katrynka and Herby forest districts, whereas the largest errors occurred for Milicz and Piensk, regardless of the scenario. Moreover, for Milicz, the widest range of errors was observed, which might be due to the slightly more complex stand structure than that at the other sites (≈ 40% of the broadleaved stands).

Accuracy
In general, the nBIAS gradient was consistent across the investigated sites (Table 2, Fig. 3). A small overestimation was observed for all objects except Katrynka, where the transition from overestimation to underestimation occurred between the levels of 100 and 300 m 2 of the sample plot area. We did not observe any substantial change in the nBIAS distribution above the 200-m 2 plot level in the other objects. For the nBIAS ranges, the factor of the number of sample plots suppressed the range of the obtained errors the most. The narrowing effect was the most visible up to a level of approximately 300 sample plots, above which the gain due to this factor diminished. However, regarding this effect, one should bear in mind that according to Newton's binomial theorem, while increasing the number of elements chosen from a finite set of elements, the number of possibilities decreases after a certain subsample level. Therefore, the obtained results should be analysed with a certain degree of caution. However, by increasing the sample size, one should expect more biased results (e.g., > 8%) to be less probable.

Precision
Regarding nRMSE, the distributions (Table 3, Fig. 4) were not as coherent across the districts as was the case for nBIAS. Excluding certain scenarios, e.g., < 100 sample plots of 100 m 2 , the Katrynka forest district attained the lowest nRMSE (≤ 15%) with the narrowest range (≤ 2%), whereas Piensk achieved the highest error rate (≤ 20%), and Milicz was found to have the widest nRMSE range (≤ 5%). Again, the number of sample plots imposed the strongest impact on nRMSE until a level of 300 plots, above which we did not observe nRMSE values higher than 20% in any object, provided that sample plots of at least 200 m 2 were used. No remarkable change in the nRMSE distribution was observed above 300 m 2 . Negligible improvement (up to 1% under most scenarios) was observed with the transition from 1 to 7 pulse•m − 2 , in contrast to nBIAS, where no gain due to the PC density was observed.

Benefits
This study points to the wide benefits of the application of ALS data in conducting forest inventories. The method analysed enables the acquisition of practical information on single-stand wood resources over vast forest areas. The reduction in sampling intensity as a result of using ALS support could likely compensate for the cost of flight missions. Ene et al. (2013) stated that ALSaided inventories can be a cost-efficient alternative to conventional approaches. Lower inventory costs could result in more frequent data updates than in the case of conventional surveys, which are conducted with intervals of a few years. Moreover, traditional design-based methods aim to assess the mean/variance in the entire population (forest district) or specific strata (forest types) (Köhl et al. 2006;Ståhl et al. 2016), providing little or even no knowledge of the attributes of specific population elements. If detailed inventory data are needed, total field surveying might be conducted. Nevertheless, such an assignment must be limited to a particular stand or small areas for economic reasons. Therefore, the main step towards improvement shall be manifested in the possibility of acquiring knowledge of every single stand or even every single tree Packalén et al. 2008;Bergseng et al. 2015) across entire regions, including hardly accessible sites while incurring reasonable costs. Thus, the results of this study provide lookup tables representing exemplary commitment (the sampling intensity) and expected outputs in terms of the error distributions.

Factors
Our results show the broad possibility of maintaining a relatively low sampling intensity. There seems to be no point in increasing the sampling intensity above a certain threshold. First, increasing the PC density from 1 to 7 pulse•m − 2 seems to be pointless if the GSV is the target variable to be assessed at the stand level. Similar conclusions regarding this matter have been drawn by Watt et al. (2013) and Bouvier et al. (2019). Second, practical improvement due to the sample plot area was attained only up to a level of approximately 200/300 m 2 . One has to be careful when using smaller plots on Scots pine-dominated stands as the precision of the GSV estimates may decrease (Fig. 4, Table 3), even up to a nRMSE value of approximately 40% as in Naesset (1997). The ADR also seems to be a promising sampling design, as it can reduce some workload, and the errors obtained did not differ much from fixed-radius scenarios. The number of sample plots was found to be the factor with the strongest influence on the results. By exploiting 300 sample plots, it was possible to maintain nRMSE (Fig. 4) below 20% for Piensk and below 12% for Katrynka, with a minimal gain due to additional sample plots. The nRMSE value varied between the objects. All these findings suggest that the stand structure imposes a significant influence on the estimation precision. Referring to the level of 300 sample plots, the model produced systematic errors within the range of +/− 5%, indicating a slight overestimation tendency (Fig.  3). The obtained bias could have inter alia occurred due to the slightly lower GSV values characterizing the control stands than those characterizing the sample plots (Table 1). Although the bias issue is known and looms large over many model-based estimators (Köhl et al. 2006), we also obtained unbiased scores. As shown in Fig. 3, quite a few draws derived unbiased model parameters, particularly certain sparse scenarios, e.g., 50 plots of 200 m 2 . These results could be due to chance, but as simple random samplings were simulated based on a regular network of circular sample plots, in the second phase of the sampling design, a reduction in random scores is expected if ALS variables are used for stratification and deployment of sample plots, as in the case of Gobakken et al. (2013). Moreover, we do not expect any substantial change in error distribution across the analysed factors even if the validation dataset was a perfect representation of the training sample plots.

Relation to existing studies
How does our research correspond to the results published in other studies? According to Knofczynski (2017), n = 95 would be the minimum recommended sample size if the model used in this research was tested with his method, where sample size estimation involves a number of predictors (3 in this case) and correlation of the strongest predictor (X 1 in Eq. 1) with the criterion (r ≈ 0.8). As a rule of thumb, Bujang et al. (2017) stated that for multiple linear regression, the minimum sample size should be 300 to derive statistics that sufficiently represent population parameters in non-experimental study designs. Following the recommendations by Voorhis and Morgan (2007), Harris (1985) and Green (1991), the minimum recommended sample size in our research would have been 51, 53 and 74, respectively. Regarding empirical studies, Gobakken et al. (2013) reported that 40 sample plots of 250 m 2 allocated with the support of ALS data were enough to provide reliable GSV estimates, i.e., a nBIAS level between 2% and 6% and nRMSE ranging from 15%-18% (average values after 300 iterations). Similar results were found by Bouvier et al. (2019), who also recommended the use of at least 40 plots (but larger than 530 m 2 ) to obtain a robust model for pine plantation biomass estimation; however, as in our study, this led to slightly overestimated results. Junttila et al. (2010) applied non-parametric and Bayesian methods utilizing extra plots from previous missions. In that study, only 30-60 new plots of 250 m 2 were sufficient to provide satisfactory GSV estimates, i.e., a nBIAS SD of 3.5% and a mean nRMSE of 21% out of 50 iterations, at a PC density below 1 pulse•m − 2 . In Stereńczak et al. (2018), 100-200 plots of 500 m 2 produced nBIAS and nRMSE values below +/− 5% and 20%, respectively. Fassnacht et al. (2018) bridged theoretical and empirical approaches for sampling intensity evaluation in a novel manner, by combining real and simulated data. Despite the biomass  15 12 11 14 17 15 12 12 14 17 15 12 13 14 18 14 12 13 14 18 15 12 11 14 18 15 12 12 97.5% 18 21 26 17 16 16 21 21 15 12 16 20 20 13 14 16 21 19 13 14 16 20 19 13 12 16 20 20 14 13 200 Median 16 19 20 13 13 15 19 17 13 12 15 19 17 12 14 15 19 16 12 13 15 19 17 12 12 15 19 17 13 12 2.5% 14 18 16 12 12 14 18 15 12 12 14 18 16 12 13 14 18 15 12 13 15 18 15 12 11 14 18 16 12 12 97.5% 17 20 25 16 15 15 20 20 14 12 15 20 19 13 14 16 20 18 13 14 16 20 18 13 12 16 20 19 13 13   being the target variable, the trends reported were similar to those in our study. As is evident from the above examples, the reality is not too far removed from the theory. Many studies (including ours) have reported nRMSE values oscillating below 20% and nBIAS values up to 5% using a sample size derived from theoretical assumptions. The current research has yet to specify the acceptable intervals for these kinds of errors. The acceptable error magnitude should depend on the data quality, field specification, available resources, and, above all, the inventory goal. In the presented case, the goal was to estimate the standlevel GSV. Ideally, the aptness of this question should be clarified via a comparison to other methods. It is important to recognize the needs and alternative solutions. This study was solely focused on the GSV, as it is closely related to carbon and biomass stocks, which currently seem to be crucial indices for many activities related to climate change mitigation. Nevertheless, ALS-based inventory methods can provide a number of other variables, e.g., the tree height, stem density, basal area, and species identification, many of which may be important to foresters. Therefore, the end-users should also have a say in this regard.

Constraints
However, prior findings cannot be directly compared to ours because we investigated the error rate at the singlestand level, in contrast to many other studies, in which models were validated using only sample plots. Eventually, foresters are interested in stands, not merely sample plots. We recognize this issue as a novelty in the field. Moreover, not exactly the same sampling designs and estimators were applied in the cited examples. Some limitations of our test should also be mentioned. The presented results should be considered with a degree of caution as the following error components have not been excluded by the research: (i) only one instance of the general form of the model (note: Saarela et al. (2015) found the type of regression model to have a moderate effect on the precision) and (ii) only one aggregation function (the area-weighted mean) was used for the transition from the sample plot level to the stand level, (iii) the intrinsic error of dendrometric equations was considered to derive the reference data (Bruchwald 1999), (iv) only object-specific stand structures (mostly Pinus sylvestris-dominated stands) were studied, and (v) one sampling method. Therefore, the obtained distributions can only be considered expected results, although the entire dataset was large. The above-listed issues deserve to be further investigated in greater detail, for example, by testing non-parametric estimators such as random forest or regression trees, especially considering the notable results obtained by Yang et al. (2019), who found random forest imputations to be efficient for small sample sizes (e.g., below 50). The stratification of the area according to the stand structure should also be investigated, as we have observed its potential influence, which would be perhaps more relevant if one had to evaluate more species-diversified forest districts. Lastly, we assumed that applied sampling method would not favour any analysed factor, as the draws were random from a very dense grid of sample plots. It also enabled us getting many possible outcomes. We do not recommend any sampling method in this article, however this certainly should have some influence on the final performance. On the other hand it would dramatically expand the article, thus making it more convoluted. In this study we wanted to present only general trends and possibilities. However, accounting for these factors would probably more reliably validate the results and ensure that they were more generally applicable.

Conclusions
A considerable number of studies dedicated to the utilization of ALS data to aid of contemporary forest inventories have emphasized the relevance of RS techniques for environmental surveys. Knowledge of the relationship between the sampling intensity and possible accuracy may be relevant for decision-makers prior to survey campaigns. Having outlined the above relationship, we can draw the following conclusions. Even a low scanning density such as 1 pulse•m − 2 may be sufficient for the growing stock inventory. The analysed inventory method enables maintaining the sampling intensity at 200 − 300 sample plots ranging from 100 − 200 m 2 . Negligible score improvement is attained above these thresholds. Reduced inventory costs can result in more frequent data updates than in the case of conventional surveys, data from which expire quickly.
In this study, simple random sampling was applied to systematic network of plots. This means no prior knowledge of the surveyed object. However, as in the case of managed forest districts, such information should usually be available. Moreover, certain drawings at sampling levels below the above thresholds generated results comparable to those obtained at higher levels. This could be related to factors such as the sampling scheme and intensity, the estimator type, and their synergistic effect on the overall performance of the ALS-aided stock inventory. The abovementioned aspects support the need for further sampling optimization with respect to other sampling methods and local ecosystem conditions.