Optimal plot design in a multipurpose forest inventory
© Henttonen and Kangas. 2015
Received: 17 June 2015
Accepted: 13 December 2015
Published: 17 December 2015
We explore the factors affecting the optimal plot design (size and type as well as the subsample tree selection strategies within a plot) and their relative importance in defining the optimal plot design in a multipurpose forest inventory. The factors include time used to lay out the plot and to make the tree measurements within the plot, the between-plot variation of each of the variables of interest in the area, and the measurement and model errors for the different variables.
We simulate different plot types and sizes and subsample tree selection strategies on measured test areas from North Lapland. The plot types used are fixed-radius, concentric and relascope plots. We select the optimal type and size first at plot level using a cost-plus-loss approach and then at cluster level by minimizing the weighted standard error with fixed budget.
As relascope plots are very efficient at the plot level for volume and basal area, and fixed-radius plots for stems per ha, the optimal plot type strongly depends on the relative importance of these variables. The concentric plot seems to be a good compromise between these two in many cases. The subsample tree selection strategy was more important in selecting the optimal plot than many other factors. At cluster level, the most important factor is the transfer time between plots.
While the optimal radius of plots and other parameters were sensitive to the measurement times and other cost factors, the concentric plot type was optimal in almost all studied cases. Subsample tree measurement strategies need further studies, as they were an important cost factor. However, their importance to the precision was not as clear.
Optimal inventory sampling design is a very important goal in National Forest Inventories (Mandallaz 2007). The inventory design is optimized in a sense that we wish to have the highest accuracy given a fixed budget or we wish to have the lowest cost for a given accuracy. Optimization is possible, if we make assumptions concerning the population. In an analytical setting, we need to be able to anticipate the population variance (Mandallaz & Ye 1999). It is even possible to optimize the measurements of trees in the plots, for instance to determine how many subsample trees (i.e. second-phase sample trees) to measure out of the total number of tally trees (i.e. first-phase sample trees), if we can anticipate the error in the volume estimates of the tally trees.
Defining optimal sample plot size and type analytically would require that we can anticipate the effects of the plot size and type on the population (or between-plot) variance. If the expected between-plot variation can be expressed as a function of plot size (see Freese 1961, Zeide 1980) the optimal plot size can be calculated analytically. However, such a function can only be an approximation of the between-plot variation as the relationship depends on the characteristics of the population such as spatial pattern of the trees, which cannot fully be described with a model.
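A minimal sketch of how a variance-versus-plot-size function could be used, assuming a power-law form in which the squared coefficient of variation (CV) is proportional to plot area raised to a negative exponent. The exponent 0.5 gives a square-root rule of the kind used as a rough approximation in the literature cited above; the function name, the default exponent and the example figures are illustrative assumptions, not values from this study.

```python
def scaled_cv(cv_ref, area_ref, area_new, exponent=0.5):
    """Between-plot CV at a new plot area.

    Assumes CV^2 is proportional to area ** (-exponent); the default
    exponent of 0.5 is an assumption, not a value fitted in this study.
    """
    if area_ref <= 0 or area_new <= 0:
        raise ValueError("plot areas must be positive")
    return cv_ref * (area_ref / area_new) ** (exponent / 2.0)


# Hypothetical example: a CV of 60 % observed on 100 m2 plots.
for area in (100.0, 200.0, 400.0):
    print(area, round(scaled_cv(60.0, 100.0, area), 1))
```

As the text notes, any such function is only an approximation, because the true relationship depends on the spatial pattern of the trees.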
In addition, the expected costs, measured as time consumption as a function of plot size, are needed for optimization. In fixed-radius plots the number of trees in a plot is proportional to plot area, but the time needed to check the borderline trees is proportional to the perimeter (Zeide 1980). In relascope plots, time consumption is inversely proportional to the fixed angle defined by the relascope factor. Although Kulow (1966) and Grosenbaugh & Stover (1957) compared the coefficient of variation using both fixed-radius and relascope plots, they did not compare the overall efficiencies of these two types of plots related to the time spent.
While many factors affecting the accuracy can be accounted for analytically, some aspects, like the spatial pattern, are more difficult. The analytical calculations usually assume a random pattern (Mandallaz 2007). Likewise, the number of subsample trees and the selection of measurements taken from each tree (e.g. height and/or upper diameter) can be difficult to account for in detail in an analytical setting. Therefore, the optimal plot size and type have most often been defined by simulating sampling in an accurately measured and mapped forest area. In the earliest studies, simulation was carried out by measuring a grid of small cells and building larger sample plots as their combination (Johnson & Hixon 1952, Mesavage & Grosenbaugh 1956). In later studies, computer simulation based on mapped data has been utilized (e.g. Kulow 1966). In a simulation based on real data, the optimal plot size is heavily dependent on the forest conditions on the area, which makes definite conclusions difficult (Mesavage & Grosenbaugh 1956).
Optimal sampling design and optimal plot design (size and type) depend highly on the purpose of an inventory. It is easy in principle to define an optimal inventory for one variable of interest such as biomass or volume with regard to measurement costs and accuracy. When the number of characteristics of interest increases, the task gets more complicated as the optimal plot number, size and type are likely to be different for each characteristic. For instance, class variables such as land use and its changes could be determined from a very small plot or even point, but volume and biomass require a larger plot. Thus, prioritizing the forest characteristics is needed if an optimal plot is to be determined.
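The area-versus-perimeter argument for fixed-radius plots can be sketched as follows. The sketch assumes a random tree pattern with a given density, treats a tree as borderline when it lies within a tolerance band around the plot boundary, and uses made-up time coefficients (minutes); none of these numbers come from the study.

```python
import math


def expected_counts(radius_m, stems_per_ha, tolerance_m=0.5):
    """Expected (tally, borderline) tree counts for a fixed-radius plot.

    Assumes a random pattern; requires radius_m > tolerance_m.
    The borderline band is the ring between radius - tolerance and
    radius + tolerance, whose area is exactly 4 * pi * radius * tolerance.
    """
    density_per_m2 = stems_per_ha / 10_000.0
    plot_area_m2 = math.pi * radius_m ** 2
    ring_area_m2 = 4.0 * math.pi * radius_m * tolerance_m
    return plot_area_m2 * density_per_m2, ring_area_m2 * density_per_m2


def expected_plot_time(radius_m, stems_per_ha,
                       t_layout=5.0, t_tally=0.5, t_border=1.0):
    """Expected time per plot; all time coefficients are hypothetical."""
    n_tally, n_border = expected_counts(radius_m, stems_per_ha)
    return t_layout + t_tally * n_tally + t_border * n_border


for r in (4.0, 8.0):
    print(r, expected_counts(r, 1000.0), expected_plot_time(r, 1000.0))
```

Doubling the radius quadruples the expected number of tally trees but only doubles the expected number of borderline checks, which is the scaling described above.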
The estimation method is also likely to have an effect: if we assume a design that is based purely on field plots, the optimal plot size and type are likely to be very different from a case where auxiliary information such as remote sensing information is used in stratification (e.g. Tomppo et al. 2014), traditional regression estimation, model-assisted estimation or model-based estimation. In these cases, the variation between plots may not be the decisive factor, but rather the correlation between the forest characteristics and the remote sensing data.
The results may also depend on the specific criterion used for defining the optimum. One option is to minimize some criterion like standard error of the estimate for a given budget constraint such as amount of time (Johnson & Hixon 1952, Mesavage & Grosenbaugh 1956). Using this approach, Johnson & Hixon (1952) concluded that while long and narrow rectangular plots tended to have smaller between-plot variation, the time needed to lay out such plots was larger. Thus, the most efficient plots for a given amount of time were compact plots.
Another way to define the optimal plot size is to use a cost-plus-loss (CPL) approach (Hamilton 1978, Ståhl 1994). It means that the losses due to poor estimates (possibly resulting in sub-optimal decisions) are calculated as a function of the uncertainty involved and these losses are added to the measurement costs described as a function of measurement time. This criterion would be ideal, if the losses due to poor estimates could be accurately defined. Often the losses are described as a function of the standard error or some other criterion (Barth & Ståhl 2012), but they could also be calculated for an actual decision problem (Eid et al. 2004). When the inventory is multipurpose, the cost-plus-loss method is more complicated (see Burkhart et al. 1978). It remains possible, however, if we are able to define the losses due to poor estimates for each of the variables of interest (i.e. give a relative weight to the errors of each variable).
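A minimal sketch of a multipurpose cost-plus-loss comparison, assuming that the loss is a weighted sum of the RMSEs of the variables of interest and that it is simply added to the measurement cost. The additive linear form, the candidate designs and all cost and RMSE figures are hypothetical and serve only to show how the weights can change the ranking.

```python
def cost_plus_loss(cost, rmse, weights):
    """Measurement cost plus weighted RMSE losses (assumed linear form)."""
    return cost + sum(weights[name] * rmse[name] for name in weights)


# Hypothetical candidate designs: cost in minutes, relative RMSE in percent.
DESIGNS = {
    "fixed-radius": {"cost": 12.0,
                     "rmse": {"volume": 70.0, "basal_area": 60.0, "stems": 40.0}},
    "relascope": {"cost": 9.0,
                  "rmse": {"volume": 55.0, "basal_area": 50.0, "stems": 90.0}},
    "concentric": {"cost": 11.0,
                   "rmse": {"volume": 58.0, "basal_area": 52.0, "stems": 55.0}},
}


def best_design(designs, weights):
    """Name of the design with the smallest cost-plus-loss value."""
    return min(designs,
               key=lambda k: cost_plus_loss(designs[k]["cost"],
                                            designs[k]["rmse"], weights))


equal = {"volume": 0.1, "basal_area": 0.1, "stems": 0.1}
stems_heavy = {"volume": 0.1, "basal_area": 0.1, "stems": 0.3}
print(best_design(DESIGNS, equal))
print(best_design(DESIGNS, stems_heavy))
```

With these made-up figures the compromise design wins under equal weights, while tripling the weight of stems per ha shifts the optimum to the fixed-radius design.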
Total measurement costs can be calculated as a function of time used for each sample plot. The time depends on: 1) the time required to go to the plot and lay out the plot; 2) the total number of trees to be measured and 3) the measurements carried out for each tree. Laying out the plot means defining the plot center (or center for several sub-plots) and determining which trees belong to the plot(s). For circular or relascope plots that means checking the distance of borderline trees from the plot center with a measuring tape or an (optical) rangefinder (e.g. Loetsch et al. 1973).
The measurements needed for each tree depend on the characteristics of interest (e.g. volume, biomass, stems per ha). Typically not all characteristics needed are measured on all trees within a plot. The diameter at breast height (d1.3) is measured for all tally trees, but height, upper diameters, age, and growth are measured only for subsample trees. Thus, the measurement time also depends on the number of subsample trees within each plot, and the number of measurements carried out on each tree. As biomass and volume require additional subsample tree measurements compared to stems per ha, also the time consumption needs to be defined separately for each of the variables.
The precision of the sample plot measurements in describing the forest stand can be measured using the standard errors of the estimators of given forest characteristics, which depends, in part, on the spatial variation of the characteristics of interest within the forest. In general, the bigger the sample plot area, the larger the proportion of total variation that falls within the plot, and consequently the smaller the standard errors (e.g. Loetsch et al. 1973, Koivuniemi 2003).
Measurement and model errors for the variables used to calculate the characteristics of interest also have an effect on precision (e.g. Päivinen 1987, Ståhl et al. 2014). Their combined effect again depends on the number of subsample tree measurements and the models/methods available to generalize the subsample tree measurements to the tally trees. It may be assumed that the errors in volume / biomass for subsample trees are negligible, but not for the tally trees. It is quite possible that the model which is most efficient when all measurements are assumed error-free is not the most efficient when these errors are included (Eid 2003). Therefore, it would be best to select the models used for generalizing the subsample tree characteristics to tally trees simultaneously with deciding the number of subsample trees and the variables measured for each of them.
The aim of this study is to analyze optimal sample plot type and size with a simulation study and explore the relative effects of different factors on optimal plot measurement strategy in the special conditions of North Lapland. The study region is partially located close to the northern timberline, where clustered spatial patterns of trees challenge the planning of an efficient forest inventory. The studied plot types were fixed-radius plots with varying radii, a combination of two concentric plots with varying radii and varying diameter limits for the larger radius, and relascope plots with varying relascope factor and maximum radii. The forest characteristics concerned were volume, basal area and stems per ha. The class variables such as forest/non-forest were excluded from the study.
Analysis of the point patterns of trees on the 50 m x 50 m areas was carried out using the R package spatstat (Baddeley and Turner 2005). Our main interest was in assessing whether the point patterns could be considered random (Poisson). For this purpose, we carried out a simultaneous (simultaneous for different values of the distance r) Monte Carlo test for Ripley's K-function and L-function, which is a variance stabilizing transformation of K. Inhomogeneity was taken into account by modelling trends as a function of coordinates. In areas divided into two different stands, the stand was used as an indicator variable in modelling inhomogeneity.
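The idea of a simultaneous Monte Carlo test can be sketched with a plain NumPy re-implementation. This is not the spatstat code used in the study: it assumes a homogeneous pattern in a square window, applies no edge correction, and uses the maximum absolute deviation of L(r) from r over all tested distances as the test statistic. All function names are hypothetical.

```python
import numpy as np


def ripley_k(points, radii, area):
    """Naive estimate of Ripley's K (no edge correction)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # exclude self-pairs
    pair_counts = np.array([(dist <= r).sum() for r in radii])
    return area * pair_counts / (n * (n - 1))


def l_function(k_values):
    """Variance-stabilizing transformation L(r) = sqrt(K(r) / pi)."""
    return np.sqrt(np.asarray(k_values, dtype=float) / np.pi)


def simultaneous_csr_test(points, side, radii, n_sim=99, seed=0):
    """Monte Carlo test of complete spatial randomness in a square window.

    The same uncorrected estimator is applied to the observed and to the
    simulated patterns, so the rank-based p-value remains valid even
    though the estimator itself is biased near the window edges.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    area = side ** 2

    def max_deviation(pts):
        return np.max(np.abs(l_function(ripley_k(pts, radii, area)) - radii))

    observed = max_deviation(points)
    simulated = np.array([
        max_deviation(rng.uniform(0.0, side, size=(len(points), 2)))
        for _ in range(n_sim)
    ])
    p_value = (1 + np.sum(simulated >= observed)) / (n_sim + 1)
    return observed, p_value
```

A small p-value indicates that the pattern departs from randomness at some distance in the tested range. Handling inhomogeneity through a fitted trend, as described above, would require simulating from the fitted intensity instead of a uniform one.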
Analysis of the optimal plot design was carried out at two levels: plot level and cluster level. The cluster is interpreted here as a combination of m plots, but any specific spatial arrangement for the cluster is not determined. If the spatial arrangement were specified, the cluster could also be interpreted as a plot with m sub-plots.
where n_2 is the number of tally trees, n_1 is the number of borderline trees and n_3 the number of subsample trees (Päivinen 1987). A tree was defined as a borderline tree if its distance from the plot center differed by less than 0.5 m from the radius for a tree with a given size.
Mean volumes, stems per ha and basal areas of the 50 m x 50 m mapped test areas
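The counts n_1, n_2 and n_3 can be derived from a mapped tree list as sketched below for a fixed-radius plot. The time equation itself is not reproduced in this text, so the linear form and every time coefficient in `plot_time` are assumptions made purely for illustration; only the 0.5 m borderline rule is taken from the description above.

```python
import math


def classify_trees(trees, radius_m, tolerance_m=0.5):
    """Return (n_borderline, n_tally) for a fixed-radius plot.

    trees: iterable of (x, y) coordinates in metres relative to the
    plot center. A tree is tallied if it lies within the radius and is
    counted as borderline if its distance differs from the radius by
    less than the tolerance.
    """
    n_border = 0
    n_tally = 0
    for x, y in trees:
        dist = math.hypot(x, y)
        if dist <= radius_m:
            n_tally += 1
        if abs(dist - radius_m) < tolerance_m:
            n_border += 1
    return n_border, n_tally


def plot_time(n_border, n_tally, n_subsample,
              t_layout=5.0, t_border=1.0, t_tally=0.5, t_subsample=3.0):
    """Assumed linear time model; all coefficients are hypothetical."""
    return (t_layout + t_border * n_border
            + t_tally * n_tally + t_subsample * n_subsample)


trees = [(1.0, 0.0), (6.8, 0.0), (7.3, 0.0), (9.0, 0.0)]
n1, n2 = classify_trees(trees, radius_m=7.0)
print(n1, n2, plot_time(n1, n2, n_subsample=1))
```

For concentric and relascope plots the radius depends on tree size, so the same check would be applied with a tree-specific radius.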
where the within-area variation depends on the plot type and size but the between area variation does not.
The tested combinations of plots (only the column headings and value lists below are recoverable)
Column headings: factor RF m2; Diameter limit DL
3, 4, 5, 6, 7, 8, 9, 10 and 11
5, 7.5, 10, 12.5 and 15 (repeated in five rows)
6, 7, 8, 9, 10 and 11 (repeated in five rows)
We assumed that diameter on tally trees is measured in two directions. Here we tested two subsample tree selection strategies. In the first one (fixed strategy or S1), tally trees with d1.3 > 25 cm were measured as subsample trees, along with all tally trees closer than 1 m to the plot center. The assumption here is that large trees are more important subsample trees than small trees, as they contribute more to the plot volume and the variance of their volume estimates is higher than that of small trees (see discussion below). In the second strategy (relascope strategy or S2) the subsample trees were selected using a relascope factor 5 m2/ha, also assuming that large trees are more important than small trees. We assumed that the volume of the subsample trees could be measured error free (in fact there is error but it is assumed negligible), while for tally trees we assumed an error.
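The two selection rules can be sketched as follows. The sketch assumes that each tally tree is described by its diameter at breast height (cm) and its horizontal distance from the plot center (m), and it uses the standard relascope geometry, in which a tree is counted when its distance does not exceed d / (2 * sqrt(RF)) with d in cm and RF in m2/ha. The data structure and function names are hypothetical.

```python
import math


def limiting_distance_m(dbh_cm, relascope_factor):
    """Largest distance (m) at which a tree is counted with a relascope."""
    return dbh_cm / (2.0 * math.sqrt(relascope_factor))


def select_fixed(trees, dbh_limit_cm=25.0, center_radius_m=1.0):
    """Strategy S1: large trees plus all trees close to the plot center."""
    return [t for t in trees
            if t["dbh_cm"] > dbh_limit_cm or t["dist_m"] < center_radius_m]


def select_relascope(trees, relascope_factor=5.0):
    """Strategy S2: trees counted with the given relascope factor."""
    return [t for t in trees
            if t["dist_m"] <= limiting_distance_m(t["dbh_cm"], relascope_factor)]


tally = [
    {"id": "A", "dbh_cm": 30.0, "dist_m": 5.0},
    {"id": "B", "dbh_cm": 10.0, "dist_m": 0.5},
    {"id": "C", "dbh_cm": 10.0, "dist_m": 3.0},
    {"id": "D", "dbh_cm": 20.0, "dist_m": 4.0},
]
print([t["id"] for t in select_fixed(tally)])
print([t["id"] for t in select_relascope(tally)])
```

Both rules favor large trees, but S2 does so gradually with distance, whereas S1 uses a hard diameter threshold.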
If the weight of RMSE for stems per ha was tripled (ceteris paribus), the optimal radius for fixed-radius plots was 7 m. In this case, the smallest CPL was obtained with fixed-radius plots (27.40). Thus, when stems per ha is important enough, the fixed-radius plot is the most efficient. For the concentric plot the optimal diameter limit changed from 15 to 5 cm. If the weight of the volume RMSE was tripled (ceteris paribus), the maximum radius of relascope plots increased to 8 m. The optimal plot was a concentric sample plot with radii 9 / 6 m and diameter limit of 15 cm (CPL = 28.31).
A more marked change occurred when the relative importance of losses compared to costs was reduced to 0.01 for all variables (Equation 2). In that case, the optimal fixed-radius plot radius was 3 m, the optimal relascope factor 3 m2/ha with a maximum radius 6 m, and for concentric plots the optimal radii were 5 /3 m with the diameter limit of 15 cm. In this case, the concentric plot had the smallest CPL (6.14) but the relascope plot was very close (6.18). That means that for all plot types, the optimal plot size was the smallest considered. When the weight of losses was increased compared to costs (weight 0.2 for all variables), the optimal radius for the fixed-radius plot was 8 m, the optimal relascope factor was 1 m2/ha with a maximum radius of 10 m, and the optimal concentric plot had radii of 11 / 7 m with a diameter limit 15 cm. It also had the smallest CPL (36.69).
In this study, we analyzed the effect of plot type (fixed-radius plots, a combination of two concentric plots with a varying diameter limit, and relascope plots with varying maximum radius), different plot size (varying radii or relascope factor) and two different strategies for measuring subsample trees within plots (either all trees with d1.3 > 25 cm and all trees within 1 m from the plot center, or with relascope factor 5 m2/ha). We examined three different variables, volume, basal area and stems per ha in order to reach a compromise solution that would be suitable for many other variables as well. We did not include class variables such as forest/non-forest classification or forest site or type classification, although these are important variables in forest inventory. In plot-level considerations, a very small plot or even a point would be optimal for many of these variables. Thus, including these variables in the calculations would make more sense if the whole design were optimized rather than just the plot type and size.
Relascope plots were most efficient for volume and basal area, but not as efficient for stems per ha. For stems per ha, fixed-radius plots were optimal. When the weight for stems per ha is increased enough, the fixed-radius plot becomes optimal overall. If we considered an inventory purely for stems per ha or basal area, subsample trees would not be needed at all. In fixed-sized plots, measuring the diameters would not be necessary either, except for borderline trees. Subsample tree selection strategies would thus be irrelevant for such an inventory. However, if the subsample tree measurement costs were removed, and the measuring cost of each tally tree were reduced (some time would be needed to record the species and check the borderline trees), the conclusion would still be the same: the relascope plot type is the best for basal area and the fixed-radius plot is the best for stems per ha. While in principle there are no advantages in using fixed radius over variable radius (Stage & Rennie 1994), in relascope plots the effective plot size for small trees constituting most of the stems per ha is so small that the relascope plot type is very inefficient for stems per ha. Concentric sample plots were a good compromise between efficiency and accuracy. It also turned out that the optimal plot radius in the tested area was somewhat smaller than the one used in current Finnish NFI, 9 m.
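The small effective plot size of relascope plots for small trees can be illustrated with the standard relascope geometry (limiting distance d / (2 * sqrt(RF)), d in cm, RF in m2/ha), optionally capped by a maximum radius. The function names and the example values are assumptions for illustration only.

```python
import math


def relascope_plot_area_m2(dbh_cm, relascope_factor, max_radius_m=None):
    """Effective plot area (m2) for a tree of the given diameter."""
    radius = dbh_cm / (2.0 * math.sqrt(relascope_factor))
    if max_radius_m is not None:
        radius = min(radius, max_radius_m)
    return math.pi * radius ** 2


def stems_per_ha_per_tree(plot_area_m2):
    """Stems per ha represented by one counted tree."""
    return 10_000.0 / plot_area_m2


# Hypothetical comparison: relascope factor 2 m2/ha versus a 7 m fixed radius.
fixed_area = math.pi * 7.0 ** 2
for dbh in (5.0, 15.0, 30.0):
    area = relascope_plot_area_m2(dbh, 2.0, max_radius_m=10.0)
    print(dbh, round(area, 1), round(fixed_area, 1),
          round(stems_per_ha_per_tree(area), 1))
```

A 5 cm tree is sampled from an area of roughly 10 m2 with factor 2, so each counted small tree represents about a thousand stems per ha, which makes the stems-per-ha estimate highly variable. Without a maximum radius, every counted tree represents exactly RF m2/ha of basal area, which is why the same plot type is efficient for basal area.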
We studied two different subsample tree selection strategies. The strategy of measuring all large trees along with trees close to the plot center produced, on average, from 0.5 to 1.5 subsample trees in the different variations of the concentric plots, while the relascope selection strategy produced from 1.4 to 1.6 subsample trees. Although the difference may seem to be small, the strategies differed quite a lot with respect to measurement time. On the other hand, the differences in relative RMSE of volume were not large. Both the strategies acknowledge that the largest trees have largest variation in the volume estimates (Fig. 3), which makes them more attractive as subsample tree candidates. So, the relascope strategy with a larger relascope factor could have been more efficient still. The model was estimated from all the trees measured from the test area, and it therefore produced zero mean error for the whole area, but not necessarily within each diameter class. However, possible bias is implicitly accounted for in the simulations.
The results of the measurement strategy suggest that very few subsample trees would be needed. However, in this study volume was the only characteristic which required subsample tree measurements. Other variables, such as (total) biomass, might require more subsample trees as biomass models using only d1.3 as predictor are generally less precise than similar volume models. We also considered only temporary plots here. If we had analysed growth of the trees using permanent plots, more subsample tree measurements might prove to be needed, as the estimated growth per tree would be more reliable. These issues remain to be studied in the future.
The optimal plot size and number are quite sensitive to the assumed times needed to move from plot to plot or to measure the trees. The concentric plot type was the best plot type for both plot-level and cluster-level calculations, practically irrespective of the changes in the parameters in the cost function or the weights of different variables. On the other hand, this result can depend on the conditions in Lapland, and in other conditions such as southern Finland or tropical areas some other plot type might be optimal.
We did not consider the effect of diameter distribution in this study, but it may also have an effect on the optimal plot type and size. In our northern data, 45 cm was the largest diameter at breast height (the maximum diameter within one test area varied from 19 to 45 cm). If the variation had been greater, the variation among the smaller sized plots would most likely have been higher. This also remains to be studied in the future.
We did a preliminary analysis about the effect of point pattern on accuracy. The areas with clustered patterns seemed to have higher between-stands variation in RMSE, although the size of the tree clusters was quite small compared with the tested plot radii. The RMSE on one study area with regular point pattern seemed to be less sensitive to the plot radii than the Poisson and clustered patterns. This might be of importance in the planning of inventories in the future, since the area and volumes of regular planted forests are rapidly increasing in Finland. The effect of point patterns on optimal sampling needs further study and modelling efforts.
If remote sensing material were used as auxiliary data and a model-assisted or model-based framework were employed instead of e.g. simple random sampling, a larger plot size might be optimal (see e.g. Hofstadt et al. 2015). This is because we assume the correlation between the remote sensing data and plot data to be higher with larger plots, as e.g. co-registration errors matter less. Moreover, remote sensing registers crowns rather than stems, and crowns of trees included in the plot will often be partly outside the plot boundaries, and respectively the crowns of trees not included in the plot will be partly inside the plot boundaries. Within larger plots, the effect of crown overlapping should be smaller. This also remains to be studied in the future.
In this study, we searched for an optimal plot type and size at plot level, i.e. for the case when the number of plots is fixed, and for a case where the optimal number of plots in a cluster was defined simultaneously with the plot type and size, i.e. for the case where the number of clusters is fixed. The analysis is valid for a wide range of sampling designs used. However, the resulting optimal plot design could be sub-optimal if also the sampling design and total plot number were simultaneously optimized. For instance, it might be better to measure a large number of small clusters (like half day clusters) rather than a small number of large clusters. Or it might be better to measure fewer and larger plots if remote sensing material were used as auxiliary data. Unfortunately the data we had available is not large enough for such analysis.
The relative importance of the optimal plot type and size in defining the optimal sampling design has not been defined. Based on our results, we would recommend that the whole chain of decisions from measurements, plot type, plot size, number of plots (total and/or within a cluster), number of clusters, cluster design (spatial arrangement of the plots within a cluster), sampling design and the estimation method should be simultaneously defined. Such analysis would, however, require a very large area that is measured in detail, with very large costs. Nowadays, a simulated forest might be a better option (e.g. Päivinen 1987). The design has often been optimized using a forest map based on a satellite image (e.g. Tomppo et al. 2010, 2011), but while that approach allows for selecting the optimal cluster design and number of plots within a cluster, it does not include enough information for selecting the optimal plot type. A mapped forest area based on individual tree detection from lidar data (Holopainen et al. 2013) might provide a good starting point for a data set where total optimization is possible.
While the optimal radius of a plot and other design parameters were quite sensitive to the measurement time and other cost factors, the concentric plot type was optimal in almost all studied cases. It is important to select a plot size that would be near optimal in many different conditions. Here, for instance, a 6–7 m radius and 10 cm diameter limit was an optimal or near-optimal option in most calculations. Yet, it needs to be noted that the results were calculated for Northern Finland, and elsewhere a separate optimality analysis would be needed.
The more weight is given to the costs compared to the RMSEs of the variables of interest, the smaller the optimal plots with a fixed plot number. With fixed budget, having more, smaller plots is optimal, if the transfer time between the plots is short. However, the distance between the plots within a cluster and therefore also the transfer time needs to be selected long enough to avoid high autocorrelation between the plots.
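The trade-off between plot size, number of plots and transfer time under a fixed budget can be sketched as below. The sketch assumes that plots within a cluster are independent, so that the standard error of a cluster mean is the between-plot standard deviation divided by the square root of the number of plots; this ignores the autocorrelation mentioned above. The two designs, the weights and all time and variation figures are hypothetical.

```python
import math


def weighted_se(design, weights, budget_min, transfer_min):
    """Weighted standard error and number of plots affordable in a cluster.

    Assumes independent plots: SE = sd / sqrt(m) for each variable.
    """
    m = int(budget_min // (design["plot_min"] + transfer_min))
    if m < 1:
        return math.inf, 0
    se = sum(weights[v] * design["sd"][v] / math.sqrt(m) for v in weights)
    return se, m


# Hypothetical designs: measurement time per plot (min) and between-plot sd (%).
SMALL = {"plot_min": 10.0, "sd": {"volume": 60.0, "stems": 50.0}}
LARGE = {"plot_min": 30.0, "sd": {"volume": 40.0, "stems": 35.0}}
WEIGHTS = {"volume": 1.0, "stems": 1.0}

for transfer in (5.0, 40.0):
    small = weighted_se(SMALL, WEIGHTS, budget_min=240.0, transfer_min=transfer)
    large = weighted_se(LARGE, WEIGHTS, budget_min=240.0, transfer_min=transfer)
    print(transfer, small, large)
```

With a short transfer time the many small plots give the smaller weighted standard error, while with a long transfer time the few large plots do, in line with the statement above.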
Subsample tree selection and measurement strategies need further studies, as subsample trees are a quite important cost factor but their importance to the accuracy of the final results was not as clear. The errors for tally trees had little impact on the accuracy of volume, but when other variables such as volume growth are analyzed, the subsample tree measurements may be of greater importance.
This study was funded by Natural Resources Institute Luke.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Baddeley A, Turner R (2005) Spatstat: an R package for analyzing spatial point patterns. J Stat Software 12(6):1–42. ISSN: 1548–7660. URL: http://www.jstatsoft.org/article/view/v012i06
- Barth A, Ståhl G (2012) Determining sample size in national forest inventories by cost-plus-loss analysis: an exploratory case study. Eur J For Res 131:339–346
- Burkhart HE, Stuck RD, Leuschner WA, Reynolds MA (1978) Allocating inventory resources for multiple-use planning. Can J Forest Res 8:100–110
- Eid T (2003) Model validation by means of cost-plus-loss analyses. In: Amaro A, Reed D, Soares P (eds) Modelling forest systems. Cambridge USA: CABI Publishing. pp 295–305
- Eid T, Gobakken T, Næsset E (2004) Comparing stand inventories for large areas based on photo-interpretation and laser scanning by means of cost-plus-loss analyses. Scand J For Res 19:512–523
- Freese F (1961) Relation of plot size to variability: an approximation. J For 59:679
- Grosenbaugh LR, Stover WS (1957) Point-sampling compared with plot-sampling in Southeast Texas. For Sci 3:2–14
- Hamilton DA (1978) Specifying precision in natural resource inventories. In: Integrated inventories of renewable resources: proceedings of the workshop. Tucson, Arizona, USA: USDA Forest Service, General technical report RM-55:276–281
- Hofstadt EH, Gobakken T, Solberg S, Kangas A, Ene L, Mauya E, Næsset E (2015) Relative efficiency of ALS and InSAR for biomass estimation in Tanzanian rainforest. Remote Sens 7:9865–9885
- Holopainen M, Kankare V, Vastaranta M, Liang X, Lin Y, Vaaja M, Yu X, Hyyppä J, Hyyppä H, Kukko A, Tanhuanpää T, Alho P (2013) Tree mapping using airborne, terrestrial and mobile laser scanning – A case study in a heterogeneous urban forest. Urban Forestry & Urban Greening 12:546–553. doi:10.1016/j.ufug.2013.06.002
- Johnson FA, Hixon HJ (1952) The most efficient size and shape of plot to use for cruising in old-growth Douglas-fir timber. J For 1:17–20
- Koivuniemi J (2003) Metsiköihin ja paikannettuihin koealoihin perustuvan kuvioittaisen arvioinnin tarkkuus. Summary: The accuracy of the compartmentwise forest inventory based on stands and located sample plots. Doctoral thesis. Publications of the Department of Forest Resource Management 36. University of Helsinki, Helsinki, p 160
- Kulow DL (1966) Comparison of forest sampling designs. Journal of Forestry July 469–474
- Laasasenaho J (1982) Taper curve and volume functions for pine, spruce and birch. Commun. Inst. For. Fenn. 108. p 72
- Loetsch F, Zöhrer F, Haller KE (1973) Forest Inventory Volume 2. BLV Verlagsgesellschaft, München, p 469
- Mandallaz D, Ye T (1999) Forest inventory with optimal two-phase, two-stage sampling schemes based on the anticipated variance. Can J Forest Res 29:1691–1708
- Mandallaz D (2007) Sampling techniques for forest inventories. Chapman & Hall. p 256
- Mesavage C, Grosenbaugh LR (1956) Efficiency of several cruising designs on small tracts in North Arkansas. Journal of Forestry September 569–576
- Päivinen R (1987) Metsän inventoinnin suunnittelumalli. [A planning model for forest inventory, In Finnish]. University of Joensuu publications in Sciences, Joensuu, N:o 11, p 179
- Stage AR, Rennie JC (1994) Fixed-radius plots vs. variable-radius plots. J For 92:20–24
- Ståhl G (1994) Optimizing the utility of forest inventory activities. Ph.D. thesis, Swedish University of Agricultural Sciences, Department of Biometry and Forest Management, Umeå
- Ståhl G, Heikkinen J, Petersson H, Repola J, Holm S (2014) Sample-based estimation of greenhouse gas emissions from forests - a new approach to account for both sampling and model errors. For Sci 60(1):3–13
- Tomppo E, Gschwanter T, McRoberts RE, Lawrence M, Editors (2010) National forest inventories – pathways for common reporting. Springer. ISBN 978-90-481-3232-4
- Tomppo E, Heikkinen J, Henttonen HM, Ihalainen A, Katila M, Mäkelä H, Tuomainen T, Vainikainen N (2011) Designing and conducting a forest inventory - case: 9th National Forest Inventory of Finland. Springer, Managing Forest Ecosystems 21, p 270. ISBN 978-94-007-1651-3
- Tomppo E, Malimbwi R, Katila M, Mäkisara K, Henttonen HM, Chamuya N, Zahabu E, Otieno J (2014) A sampling design for a large scale forest inventory: case Tanzania. Can J Forest Res 44:931–948
- Zeide B (1980) Plot Size Optimization. For Sci 26:251–257