- Research article
- Open Access
A spatially-explicit count data regression for modeling the density of forest cockchafer (Melolontha hippocastani) larvae in the Hessian Ried (Germany)
- Matthias Schmidt^{1} and
- Rainer Hurling^{1}
https://doi.org/10.1186/s40663-014-0019-y
© Schmidt and Hurling; licensee Springer. 2014
- Received: 10 March 2014
- Accepted: 15 September 2014
- Published: 8 October 2014
Abstract
Background
In this paper, a regression model for predicting the spatial distribution of forest cockchafer larvae in the Hessian Ried region (Germany) is presented. The forest cockchafer, a native biotic pest, is a major cause of damage in forests in this region particularly during the regeneration phase. The model developed in this study is based on a systematic sample inventory of forest cockchafer larvae by excavation across the Hessian Ried. These forest cockchafer larvae data were characterized by excess zeros and overdispersion.
Methods
Using specific generalized additive regression models, different discrete distributions, including the Poisson, negative binomial and zero-inflated Poisson distributions, were compared. The methodology employed allowed the simultaneous estimation of non-linear model effects of causal covariates and, to account for spatial autocorrelation, of a 2-dimensional spatial trend function. In the validation of the models, both the Akaike information criterion (AIC) and more detailed graphical procedures based on randomized quantile residuals were used.
Results
The negative binomial distribution was superior to the Poisson and the zero-inflated Poisson distributions, providing a near perfect fit to the data, which was proven in an extensive validation process. The causal predictors found to affect the density of larvae significantly were distance to water table and percentage of pure clay layer in the soil to a depth of 1 m. Model predictions showed that larva density increased with an increase in distance to the water table up to almost 4 m, after which it remained constant, and with a reduction in the percentage of pure clay layer. However this latter correlation was weak and requires further investigation. The 2-dimensional trend function indicated a strong spatial effect, and thus explained by far the highest proportion of variation in larva density.
Conclusions
As such the model can be used to support forest practitioners in their decision making for regeneration and forest protection planning in the Hessian Ried. However, the application of the model for predicting future spatial patterns of the larva density is still somewhat limited because the causal effects are comparatively weak.
Keywords
- Forest cockchafer
- Larvae
- Negative binomial distribution
- Poisson distribution
- Zero-inflated poisson distribution
- Systematic sample inventory
- Generalized additive model
- Spatial autocorrelation
- Randomized quantile residuals
1 Background
The forests in southern Hessen in the vicinity of the Rhine-Main urban agglomeration are among the most problematic areas for forest management in Central Europe. High population density, highly concentrated industrialization and dense road and traffic infrastructure place extraordinary demands on forests and forest enterprises here. Urban expansion has consumed considerable land area for development, which, in turn, has led to an unusually high fragmentation of the forest area. Furthermore, environmental impacts are evident, including increased emissions of air pollutants and a serious lowering of the groundwater table arising from high water usage. Forest management options are severely restricted in this region due to the exceptional importance of forests, particularly old oak forests, for recreation and nature conservation (NW-FVA [2013], p. 32 ff.). The abiotic pressures, and especially the widespread drop in groundwater levels, have already degraded many forests. Additionally, massive outbreaks of biotic pests such as the forest cockchafer (Melolontha hippocastani), the European oak leafroller (Tortrix viridana) and the gypsy moth (Lymantria dispar) have, in places, led to the total destruction of forests. Site conditions are expected to degrade further if the groundwater depletion intensifies and if climate change leads to higher temperatures and lower precipitation as projected (NW-FVA [2013], p. 40 ff.). The Rhine-Main region is already one of the driest and warmest regions in Hessen.
One of the most serious regional biotic risks, especially in the forest regeneration phase, is the forest cockchafer (Melolontha hippocastani). The northern Upper Rhine region is populated by different tribes of the forest cockchafer each of which has a specific quadrennial life cycle and swarming year. The forest cockchafer population in the Hessian Ried belongs to the South Hessian tribe. Since the early 1980s a massive outbreak of this tribe has been observed in some parts of the Hessian Ried. The area of forest populated by the forest cockchafer has increased constantly and this trend is expected to continue. An area covering 13 000 hectares of the total forest area of 30 000 hectares is assumed to be populated by this species. Of this area, 4 000 hectares are assumed to be infested by extremely high larva densities.
The objectives of the modeling were to:
- 1) predict larva density by area across the Hessian Ried for decision support;
- 2) identify significant causal variables and quantify their effects to enhance the generality of the model and to gain greater insight into the suitability of a site to serve as a habitat for larvae.
Moreover, the development of a generalized model approach that could be extended for the investigation of spatial population densities of the forest cockchafer larvae in the future as time series data become available was envisaged. The approach would ensure the future relevance of the model to forest managers in the planning and implementation of optimal spatial forest regeneration and pest control measures and would allow for the investigation of the change in the spatial pattern of larvae density over time.
2 Methods
2.1 Data
In the summer of 2009 counts of larvae and additional potential covariate data were sampled over a regular quadratic grid with the sample plots arranged 500 m apart. In a small area of 994 hectares, a denser sample grid of 250 m × 250 m was employed to obtain additional data needed for further investigations in forest protection measures (Figure 1B). Each sample plot on the larger and smaller scale grid system consisted of 4 subplots each 50 cm × 50 cm (0.25 m^{2}) in size, which were excavated to count the number of larvae present. These subplots were located 10 m away from the plot center in the 4 cardinal directions.
A subplot was only excavated if it was located in a forested area. In a few cases, one or more of the subplots in a sample plot were ignored because they were located in a bog, in settlement areas etc. In total, data from 1 276 sample plots could be used for constructing the model. The excavated soil was carefully searched for forest cockchafer individuals at all relevant life stages that were expected to swarm in 2010. Hence, not only the number of 3-year-old larvae, but also the number of pupae and imagines were recorded.
For modeling, the sum of larvae in the 4 × 0.25 m^{2} subplots at each sample point, that is, the larva density per square meter, was used. The subplots were treated as one sample plot for larva excavation 1 m^{2} in area, since the distances between the 4 subplots were small compared to the distances between the sample points on the sample grid. The maximum depth of excavations was 1 m. However, at several subplots, the mechanical resistance of the soil prevented excavations from reaching this depth. In these cases, it was assumed that soil too resistant to be excavated was also too resistant to serve as a habitat for larvae.
Table 1 Distribution of larva density at 1 276 sample plots recorded in the larva density inventory and assigned covariates distance to water table (DWT) and clay thickness (CTH)
Larvae per m^{2} | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | >10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Number of sample points (n = 1276) | 664 | 100 | 82 | 62 | 56 | 53 | 33 | 28 | 23 | 27 | 13 | 148 |
Percentage (%) | 52.0 | 7.8 | 6.4 | 4.9 | 4.4 | 4.2 | 2.6 | 2.2 | 1.8 | 2.1 | 1.0 | 11.0 |

Covariate | Min. | 1^{st} quartile | Median | Mean | 3^{rd} quartile | Max. |
---|---|---|---|---|---|---|
Distance to water table, October 2007 (m) | 0.29 | 3.24 | 5.20 | 7.56 | 11.45 | 35.16 |
Clay thickness (%) | 0.0 | 0.0 | 0.0 | 2.9 | 0.0 | 100.0 |
2.2 Methodology
2.2.1 Regression models for count data
In the Poisson distribution, the conditional mean equals the conditional variance. With the logarithmic link function, the mean is the product of the exponentiated terms of the linear predictor, so applying the inverse link function always yields positive predictions. However, in many applications the empirical variance is higher than the Poisson distribution assumes. In this case, an overdispersion parameter is introduced, and the conditional variance is defined as Var(y _{ i } | x _{ i }) = ϕλ _{ i }, with the unknown constant overdispersion parameter ϕ.
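For the negative binomial model (Eqs. 2a/2b), both the link function g _{1}(.) for the mean and the link function g _{2}(.) for the dispersion parameter are the natural logarithm. For the zero-inflated Poisson model (Eqs. 3a/3b), g _{1}(.) is the natural logarithm and g _{2}(.), the link function for the mixture parameter, is the logistic link function.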
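The three count distributions compared in this study can be illustrated with a minimal Python sketch. It is not the authors' code. The negative binomial is written with mean μ and variance μ + μ²/ϕ, which is an assumption consistent with the g _{2}(1/ϕ) notation used in the model tables. The zero-inflated Poisson adds a point mass ω at zero. The function names and the mean of 3 larvae per m² used in the example are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson distribution with mean (and variance) lam."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def negbin_pmf(y, mu, phi):
    """P(Y = y) for a negative binomial with mean mu and
    variance mu + mu**2 / phi (assumed parameterization)."""
    log_p = (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
             + phi * math.log(phi / (phi + mu))
             + y * math.log(mu / (phi + mu)))
    return math.exp(log_p)


def zip_pmf(y, lam, omega):
    """P(Y = y) for a zero-inflated Poisson: with probability omega the
    count is a structural zero, otherwise it is Poisson(lam)."""
    base = (1.0 - omega) * poisson_pmf(y, lam)
    return omega + base if y == 0 else base


def mean_and_variance(pmf, upper=2000):
    """Numerical mean and variance of a pmf, summed over 0..upper-1."""
    probs = [pmf(y) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


# Hypothetical mean; phi = 0.255 and omega = 0.52 are the values reported
# for the intercept-only negative binomial and zero-inflated Poisson models.
mu = 3.0
nb_mean, nb_var = mean_and_variance(lambda y: negbin_pmf(y, mu, 0.255))
zip_mean, zip_var = mean_and_variance(lambda y: zip_pmf(y, mu, 0.52))
po_mean, po_var = mean_and_variance(lambda y: poisson_pmf(y, mu))
```

With the same location parameter, the Poisson variance equals the mean, whereas the other two distributions have a variance above their mean, which is why they are candidates for overdispersed counts with many zeros.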
2.2.2 Generalized additive models
f _{1}, f _{2}, f _{3},…, f _{ k }: 1-dimensional smooth functions
f _{ n }: 2-dimensional smooth function for modeling a spatial trend.
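The equation to which these definitions belong is not reproduced in this text. A generic additive predictor consistent with them, and with the model formulas listed in the model selection tables, might be written as follows. The covariate notation $x_{ij}$ is an assumption made here for illustration.

$$g(\mu_i) = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \dots + f_k(x_{ik}) + f_n(\mathrm{east}_i, \mathrm{north}_i)$$

Here $g(.)$ is the link function, $\beta_0$ the intercept and $\mu_i$ the conditional mean at plot $i$.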
Assuming negative binomial and zero-inflated Poisson distributions, both linear predictors (Eqs. 2a/2b and 3a/3b) were checked for non-linear model effects. Subsequently, it was tested whether significant non-linear effects could be adequately approximated by segmented linear effects. All additive regression models were fitted with a 2-dimensional smoothing function of the geographic location to test the data for spatial autocorrelation that could not be described by the observed causal predictors. The models were parameterized in the statistical language and environment R (R Development Core Team [2010]) using the two libraries gamlss (Rigby and Stasinopoulos [2005]) and mgcv (Wood [2006]). Where possible, the models were fitted using the library mgcv only. If the distribution assumptions or specific linear predictors could not be specified with the functions of the library mgcv, functions of both libraries were combined: functions of gamlss were used to specify the distribution functions for the response variable, and functions of mgcv were used to apply specific smoothing techniques, including the 2-dimensional smooth function. The library gamlss was adopted because it enables a variety of continuous and some discrete distributions to be modeled and extends the classical generalized linear (McCullagh and Nelder [1989]) and additive (Hastie and Tibshirani [1990]) models, which are otherwise limited to the distributions of the exponential family (Fahrmeir et al. [2007], p. 218).
2.2.3 Validation
In the validation procedure randomized quantile residuals were calculated (Dunn and Smyth [1996]) to determine which distribution assumption and model specification described the pattern of larval counts best. Randomized quantile residuals have been found to be superior in the validation of regression models for continuous and discrete response variables to Pearson and deviance residuals (Dunn and Smyth [1996]).
For a continuous response with cumulative distribution function F, the quantile residual is defined as $r_i = \Phi^{-1}\left(F\left(y_i, \widehat{\mu}_i, \widehat{\phi}\right)\right)$, where Φ is the standard normal distribution function. Given the hypothesis that f(y _{ i }, μ _{ i }, ϕ) is the correct model for the observations y _{1},…,y _{ n }, all r _{ i } follow approximately a standard normal distribution. Deviations only result from sampling variability in $\widehat{\mu}_i$ and $\widehat{\phi}$. Hence standard methods such as the normal quantile-quantile plot (qqplot) or the Kolmogorov-Smirnov test can be used to validate the regression models. An application of this approach in forestry is given by Zucchini et al. ([2001]).
For a discrete response such as count data, F is a step function, and the randomized quantile residual is defined as $r_i = \Phi^{-1}(u_i)$, where u _{ i } is a uniform random variable in the interval (a _{ i }, b _{ i }] with $a_i = \lim_{y \uparrow y_i} F\left(y, \widehat{\mu}_i, \widehat{\phi}\right)$ and $b_i = F\left(y_i, \widehat{\mu}_i, \widehat{\phi}\right)$.
Randomized quantile residuals are also approximately standard normally distributed if the assumed model is correct. Further insight into the adequacy of the models can be gained from wormplots, which are detrended normal quantile-quantile plots (qqplots) (Buuren and Fredriks [2001]). Here ‘detrended’ means that the corresponding standard normal quantiles are subtracted from the empirical quantiles. In a wormplot, these detrended quantiles are plotted against their corresponding (theoretical) standard normal quantiles. Wormplots highlight deviations from an assumed theoretical distribution more clearly than qqplots. Additionally, conditional wormplots were calculated for different intervals of DWT. Conditional wormplots are used to detect intervals of predictor variables in which the models do not fit adequately.
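The residual and wormplot calculations described above can be sketched in Python for the simplest case, a Poisson model with known mean. This is a minimal illustration under stated assumptions, not the procedure used in the study, which relied on R. All function names, the seed and the mean of 2.5 are hypothetical, and the detrending follows the common convention of empirical minus theoretical quantiles.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def poisson_cdf(y, lam):
    """F(y) = P(Y <= y) for a Poisson distribution; 0 for y < 0."""
    if y < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for k in range(1, y + 1):
        term *= lam / k
        total += term
    return min(total, 1.0)


def randomized_quantile_residual(y, lam, rng):
    """Draw u uniformly between F(y - 1) and F(y), then map it to a
    standard normal quantile."""
    a = poisson_cdf(y - 1, lam)  # left limit of F at y
    b = poisson_cdf(y, lam)
    u = a + (b - a) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u inside (0, 1) for inv_cdf
    return STD_NORMAL.inv_cdf(u)


def worm_plot_points(residuals):
    """(theoretical quantile, empirical minus theoretical) pairs."""
    n = len(residuals)
    ordered = sorted(residuals)
    theoretical = [STD_NORMAL.inv_cdf((i + 0.5) / n) for i in range(n)]
    return [(t, e - t) for t, e in zip(theoretical, ordered)]


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


# Simulated counts from the same Poisson model that is used for the
# residuals, so the residuals should look standard normal.
rng = random.Random(2009)
lam = 2.5
counts = [sample_poisson(lam, rng) for _ in range(2000)]
residuals = [randomized_quantile_residual(y, lam, rng) for y in counts]
worm = worm_plot_points(residuals)
```

If the counts were overdispersed relative to the assumed Poisson model, the residuals would spread more widely than a standard normal sample and the wormplot points would drift away from the horizontal zero line.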
Furthermore, the assumption of spatially independent model residuals and the methodology of 2-dimensional surface fitting for modeling spatially correlated data were validated. To this end, simple empirical semi-variograms of the randomized quantile residuals were calculated for various models with spatial trend functions of different complexity. To construct 95% confidence intervals under the null hypothesis of spatial independency, 1 000 random permutations of the coordinates were conducted. For each permutation, the semi-variogram of the randomized quantile residuals was estimated, and the pointwise 2.5% and 97.5% quantiles across all permutations were calculated.
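The permutation procedure can be sketched as follows. This is a minimal Python illustration with simulated, spatially independent residuals on a hypothetical 8 × 8 grid with 500 m spacing. The classical semi-variogram estimator, the distance bins, the seed and the reduced number of 200 permutations are assumptions chosen to keep the example short. They are not settings taken from the study.

```python
import math
import random


def empirical_semivariogram(coords, values, bin_edges):
    """Classical estimator: half the mean squared difference of all
    pairs whose distance falls into each bin (left-open, right-closed)."""
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    pairs = [0] * n_bins
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            d = math.dist(coords[i], coords[j])
            for b in range(n_bins):
                if bin_edges[b] < d <= bin_edges[b + 1]:
                    sums[b] += 0.5 * (values[i] - values[j]) ** 2
                    pairs[b] += 1
                    break
    return [s / c if c else float("nan") for s, c in zip(sums, pairs)]


def permutation_envelope(coords, values, bin_edges, n_perm, rng):
    """Pointwise 2.5% and 97.5% quantiles of the semi-variogram after
    randomly reassigning the coordinates to the residuals."""
    sims = []
    for _ in range(n_perm):
        shuffled = coords[:]
        rng.shuffle(shuffled)
        sims.append(empirical_semivariogram(shuffled, values, bin_edges))
    lower, upper = [], []
    for b in range(len(bin_edges) - 1):
        column = sorted(sim[b] for sim in sims)
        lower.append(column[int(0.025 * (n_perm - 1))])
        upper.append(column[math.ceil(0.975 * (n_perm - 1))])
    return lower, upper


rng = random.Random(42)
coords = [(500.0 * ix, 500.0 * iy) for ix in range(8) for iy in range(8)]
values = [rng.gauss(0.0, 1.0) for _ in coords]  # stand-in for residuals
edges = [0.0, 750.0, 1500.0, 2250.0, 3000.0]

observed = empirical_semivariogram(coords, values, edges)
lower, upper = permutation_envelope(coords, values, edges, 200, rng)
```

Bins in which the observed semi-variogram falls below the lower envelope at short distances would point to remaining positive spatial autocorrelation in the residuals.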
Finally model computations were conducted for specific settings of predictor variables to illustrate the sensitivity of model predictions to varying conditions.
3 Results
3.1 Model selection
with g(.) : link-function: natural logarithm, and GD _{ i } ~ Poisson (λ _{ i }) with E(GD _{ i }) = λ _{ i } and Var (GD _{ i }) = λ _{ i }; GD _{ i } = 0, 1, 2,….
GD _{ i } : larva density at plot i (n · m^{−2})
DWT _{ i } : simulated distance to water table in October 2007 at plot i (m)
CTH _{ i } : modeled clay thickness at plot i (%)
east _{ i }, north _{ i } : Gauß-Krüger east and north coordinates of plot i defined in relation to the 3^{rd} meridian
f _{1}, f _{2} : 1-dimensional smooth functions (penalized thin plate regression splines)
f _{3} : 2-dimensional smooth function (penalized thin plate regression spline)
Table 2 Various Poisson regression models validated during model selection
Model | Eq. | AIC | Dispersion parameter |
---|---|---|---|
g(λ _{ i }) = β _{0} | 7.1 | 12263.0 | 11.71^{***} |
g(λ _{ i }) = β _{0} + f _{1}(DWT _{ i }) | 7.2 | 10881.7 | 9.64^{***} |
g(λ _{ i }) = β _{0} + f _{1}(DWT _{ i }) + f _{2}(CTH _{ i }) | 7.3 | 10692.3 | 9.31^{***} |
g(λ _{ i }) = β _{0} + f _{1}(DWT _{ i }) + f _{2}(CTH _{ i }) + f _{3}(east _{ i }, north _{ i }), edf for f _{3}(east _{ i }, north _{ i }) = 28.753 | 7.4 | 6193.4 | 4.34^{***} |
g(λ _{ i }) = β _{0} + f _{1}(DWT _{ i }) + f _{2}(CTH _{ i }) + f _{3}(east _{ i }, north _{ i }), edf for f _{3}(east _{ i }, north _{ i }) = 121.619 | 7.5 | 5394.4 | 2.75^{***} |
g(λ _{ i }) = β _{0} + f _{1}(DWT _{ i }) + f _{2}(CTH _{ i }) + f _{3}(east _{ i }, north _{ i }), edf for f _{3}(east _{ i }, north _{ i }) = 198.121 | 7.6 | 5128.1 | 2.31^{***} |
g(λ _{ i }) = β _{0} + f _{1}(DWT _{ i }) + f _{2}(CTH _{ i }) + f _{3}(east _{ i }, north _{ i }), edf for f _{3}(east _{ i }, north _{ i }) = 321.446 | 7.7 | 4745.3 | 1.90^{***} |
Table 3 Various negative binomial regression models validated during model selection
Model | Eq. | AIC | Explained deviance (%) | Dispersion parameter |
---|---|---|---|---|
g _{1}(μ _{ i }) = β _{01} | 8.1 | 5335.4 | 0 | g _{2}(1/ϕ) = β _{02}; ϕ = 0.255 |
g _{1}(μ _{ i }) = β _{01} + f _{11}(DWT _{ i }) | 8.2 | 5207.1 | 12 | g _{2}(1/ϕ) = β _{02}; ϕ = 0.308 |
g _{1}(μ _{ i }) = β _{01} + f _{11}(DWT _{ i }) + f _{21}(CTH _{ i }) | 8.3 | 5171.9 | 15.7 | g _{2}(1/ϕ) = β _{02}; ϕ = 0.323 |
g _{1}(μ _{ i }) = β _{01} + f _{11}(DWT _{ i }) + f _{21}(CTH _{ i }) + f _{31}(east _{ i }, north _{ i }), edf for f _{31}(east _{ i }, north _{ i }) = 25.96 | 8.4 | 4343.8 | 63.4 | g _{2}(1/ϕ) = β _{02}; ϕ = 1.051 |
g _{1}(μ _{ i }) = β _{01} + f _{11}(DWT _{ i }) + f _{21}(CTH _{ i }) + f _{31}(east _{ i }, north _{ i }), edf for f _{31}(east _{ i }, north _{ i }) = 117.35 | 8.5 | 4161.3 | 75.1 | g _{2}(1/ϕ) = β _{02}; ϕ = 1.664 |
g _{1}(μ _{ i }) = β _{01} + f _{11}(DWT _{ i }) + f _{21}(CTH _{ i }) + f _{31}(east _{ i }, north _{ i }), edf for f _{31}(east _{ i }, north _{ i }) = 117.35 | 8.6 | 4154.5 | 75.4 | g _{2}(1/ϕ _{ i }) = β _{02} + f _{12}(DWT _{ i }) |
Table 4 Various zero-inflated Poisson regression models validated during model selection
Model | Eq. | AIC | Mixture parameter |
---|---|---|---|
g _{1}(λ _{ i }) = β _{01} | 9.1 | 7584.1 | g _{2}(ω _{ i }) = β _{02}; ω _{ i } = 0.52 |
g _{1}(λ _{ i }) = β _{01} + f _{11}(DWT _{ i }) | 9.2 | 7262.3 | g _{2}(ω _{ i }) = β _{02}; ω _{ i } = 0.51 |
g _{1}(λ _{ i }) = β _{01} + f _{11}(DWT _{ i }) + f _{21}(CTH _{ i }) | 9.3 | 7217.5 | g _{2}(ω _{ i }) = β _{02}; ω _{ i } = 0.50 |
g _{1}(λ _{ i }) = β _{01} + f _{11}(DWT _{ i }) + f _{21}(CTH _{ i }) + f _{31}(east _{ i }, north _{ i }), edf for f _{31}(east _{ i }, north _{ i }) = 28.81 | 9.4 | 5708.3 | g _{2}(ω _{ i }) = β _{02}; ω _{ i } = 0.20 |
g _{1}(λ _{ i }) = β _{01} + f _{11}(DWT _{ i }) + f _{21}(CTH _{ i }) + f _{31}(east _{ i }, north _{ i }), edf for f _{31}(east _{ i }, north _{ i }) = 117.35 | 9.5 | 5092.2 | g _{2}(ω _{ i }) = β _{02}; ω _{ i } = 0.12 |
g _{1}(λ _{ i }) = β _{01} + f _{11}(DWT _{ i }) + f _{21}(CTH _{ i }) + f _{31}(east _{ i }, north _{ i }), edf for f _{31}(east _{ i }, north _{ i }) = 117.35 | 9.6 | 5051.3 | g _{2}(ω _{ i }) = β _{02} + f _{12}(DWT _{ i }) |
(For a description of abbreviations see legend for model 7)
The AIC values in the zero-inflated Poisson models were also considerably lower than in the Poisson regression models, but higher than in the associated negative binomial models (Table 4).
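The size of these AIC differences can be made explicit with a small calculation. The sketch below takes the AIC values reported in the model selection tables for the three models with a spatial trend of similar dimension (Eqs. 7.5, 8.5 and 9.5) and computes each model's difference from the best one. Choosing these three rows as the comparable set is an assumption made here for illustration.

```python
# AIC values as reported for Eqs. 7.5, 8.5 and 9.5
aic = {
    "Poisson (Eq. 7.5)": 5394.4,
    "negative binomial (Eq. 8.5)": 4161.3,
    "zero-inflated Poisson (Eq. 9.5)": 5092.2,
}

# The model with the smallest AIC serves as the reference
best = min(aic, key=aic.get)

# Difference of every model from the reference, rounded to one decimal
delta_aic = {name: round(value - aic[best], 1) for name, value in aic.items()}
```

Both alternatives are more than 900 AIC units above the negative binomial model, a gap far larger than the differences between the individual negative binomial specifications.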
Table 5 Statistical characteristics of the generalized linear model (Eq. 8.51) for estimating larva density per m^{2}
Coefficient | Parameter estimate | Standard error | z-value | Pr(>\|z\|) |
---|---|---|---|---|
β _{01} | −3.135839 | 1.414588 | −2.217 | 0.02664 |
β _{11} | −0.610538 | 0.088263 | −6.917 | 4.6 × 10^{−12} |
β _{21} | −0.003227 | 0.001210 | −2.667 | 0.00765 |
β _{02} | −0.5006 | 0.07631 | −6.56 | 7.795 × 10^{−11} |
ϕ = 1.649 | | | | |
f _{31}(east _{ i }, north _{ i }), edf = 117.5 | p-value < 2 × 10^{−16} | | | |
R-sq.(adj) = 0.411, deviance explained = 74.9%, AIC = 4150.15 | | | | |
(For a description of abbreviations see legend for model 7)
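The link between the coefficient β _{02} and the dispersion parameter ϕ, and between the estimates and their z-values, can be checked with a few lines of arithmetic. The sketch assumes that g _{2}(1/ϕ) = β _{02} uses the natural logarithm as link function, as stated in the model description. The variable names are hypothetical.

```python
import math

# Estimate and standard error of each coefficient as reported for Eq. 8.51
coefficients = {
    "beta_01": (-3.135839, 1.414588),
    "beta_11": (-0.610538, 0.088263),
    "beta_21": (-0.003227, 0.001210),
    "beta_02": (-0.5006, 0.07631),
}

# Wald z-value: estimate divided by its standard error
z_values = {name: est / se for name, (est, se) in coefficients.items()}

# log(1 / phi) = beta_02, hence phi = exp(-beta_02)
beta_02 = coefficients["beta_02"][0]
phi = math.exp(-beta_02)
```

The resulting ϕ is about 1.65, in line with the tabulated value of 1.649; the small difference comes from the rounding of β _{02}.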
3.2 Validation of the distribution assumption
The qqplots indicated that the negative binomial regression models (Eqs. 8.5/8.51) were best for quantifying larva density, and they confirmed the ranking resulting from the AIC values (Tables 2, 3, 4 and 5). For both negative binomial regression models, the randomized quantile residuals lay on the bisecting line, indicating that they approximately follow a standard normal distribution and hence that the models fit the data almost optimally. The distribution of the quantile residuals of model 8.5 was changed only marginally by the linear approximations adopted (Eq. 8.51). The qqplots for the Poisson and zero-inflated Poisson regression models displayed major deviations from the bisecting line. Hence, neither the assumption that larva density follows a Poisson distribution nor the assumption that it follows a zero-inflated Poisson distribution could be validated in this case.
3.3 Validation of spatial independency
The randomized quantile residuals of the finally selected model (Eq. 8.51 Duchon), which uses 1^{st} order derivatives in the penalty, and of three other negative binomial regression models were compared and validated for spatial independency. The three other models were the model without a spatial trend function (Eq. 8.3), the model with a spatial trend function of low dimension (Eq. 8.4) and the model with a trend function of optimal dimension with respect to AIC (Eq. 8.51). Hence, the finally selected model was compared to two models with a less complex spatial trend function (Eqs. 8.3, 8.4) and one model with a more complex spatial trend function (Eq. 8.51). For comparison, models 8.3 and 8.4 were also refitted using the approximated effects for the causal predictors DWT and CTH (see Eq. 8.51).
3.4 Model computations
4 Discussion
It is known that sites with a high groundwater table or a high percentage of bedrock prevent forest cockchafer larvae from hibernating at greater soil depths in cold winter climates (Schwerdtfeger [1981]). This is in part reflected in the model effects. A pure clay layer can be assumed to have similar negative effects on the suitability of a site as a habitat for larvae, and an increasing proportion of clay layer leads to reduced numbers of predicted larvae per m^{2} (Figure 10). So far, however, CTH has proved to be only a weak indicator, since the confidence intervals of its effect were rather wide. Nevertheless, the integration of CTH into the negative binomial regression model (Eq. 8.5) led to a reduction of the AIC, and hence improved the model (Table 3).
The predicted larva density within 1 m soil depth is affected by values of DWT up to 4 m (Figure 2A), which does not seem plausible, since the maximum depth for hibernation is thought to be approximately 1.1 m (Schwerdtfeger [1981]). However, the DWT values used in this model were based on groundwater data simulated for the month of October 2007. In spring, a lower DWT is usually observed, and single high-water events might temporarily result in even smaller DWT values. Furthermore, depending on the specific soil substrate, capillary rise may also reduce the capacity of the first meter of soil to serve as a habitat for larvae. These interpretations are based on the assumption that single temporary high-water events significantly affect the capacity of the larval habitat. The constraint of a constant effect for DWT values of more than 4 m (Eq. 8.51/Figure 3A) was imposed because it is biologically plausible that the effect of DWT remains constant once the water table lies below a certain depth. Higher values of DWT are at least partially the result of intensive groundwater withdrawals in the Hessian Ried (NW-FVA [2013], p. 30 ff.). Hence, the lowering of groundwater in areas where groundwater was formerly available to trees may also affect larva density indirectly. The degradation of forests is a direct result of the lowering of the groundwater table, and the partial dieback of single trees and forests may have improved habitat conditions for the forest cockchafer as well.
A comparison of models showed that the negative binomial distribution was superior to the Poisson and zero-inflated Poisson distributions, which is in accordance with other investigations of overdispersed animal count data (Gray [2005]; Sileshi [2008]; Vaudor et al. [2011]). In many investigations the negative binomial, the Poisson and the zero-inflated Poisson distributions have been compared. In some cases a zero-inflated modification of the negative binomial distribution results in minor, and somewhat doubtful improvements in the models (Gray [2005]; Vaudor et al. [2011]). Especially in cases with the occurrence and abundance of a species resulting from distinct processes, the zero-inflated modification of the negative binomial distribution can result in considerable model improvements (Wenger and Freeman [2008]).
Overdispersion is often assumed to result from a spatial or temporal heterogeneity of the habitat. However, even if conditional distributions are fitted by employing regression approaches (Sileshi [2008]) or stratification (Vaudor et al. [2011]), the negative binomial approach has been found to be superior in many investigations. In this investigation the modeling approach for determining the effect of spatial heterogeneity on overdispersion was much more flexible due to the complex spatial trend function (Table 2). However, even though the dimension of the spatial trend was increased considerably, a significant overdispersion was still evident (Table 2). Hence, at least for our investigation, it can be concluded that even a quite complex linear predictor is not sufficient to cover all sources of overdispersion.
Extended generalized regression models facilitate the estimation of the conditional mean, variance or mixture parameter as functions of covariates. The simplest structure of a linear predictor is to assume the effects of the covariates to be linear (Sileshi [2008]). Yet this assumption must be validated to guarantee unbiased predictions across the whole range of covariates (Hastie and Tibshirani [1990]). Therefore the extension of generalized additive models to overdispersed and zero-inflated count data, such as were used in this investigation, indicates a major advance in the methodology (Barry and Welsh [2002]; Rigby and Stasinopoulos [2005]; Wood [2006]).
In many cases, the response data are autocorrelated because spatially correlated covariates are heterogeneous but unknown or only insufficiently available. Haining et al. ([2009]) presented a simple conditional autoregressive model to deal with autocorrelated count data, which results in the estimation of spatially correlated random effects. Fahrmeir and Echavarría ([2006]) introduced an extensive methodology using structured additive regression (STAR) models for overdispersed and zero-inflated count data. These models make it possible to model non-linear covariate effects, individual or cluster-specific uncorrelated random effects, spatially correlated random effects or 2-dimensional spatial trend surfaces simultaneously. The methodology employed in our investigation (Wood [2006]) offers similar technical possibilities for the Poisson and negative binomial distributions and, in combination with Rigby and Stasinopoulos’s ([2005]) methods, for the zero-inflated Poisson distribution. The inventory plots are located exactly via coordinates, and hence a 2-dimensional spatial trend function is fitted instead of spatially correlated random effects for distinct areas. The observations at the 4 subplots were aggregated due to their proximity, which makes the estimation of uncorrelated random effects at the plot level redundant. Overall, the approach adopted combines an appropriate distribution assumption for count data with non-linear effects of causal covariates and an advanced method for covering spatial autocorrelation.
Finally graphical representations of randomized quantile residuals are powerful validation tools that provide more detailed information about model characteristics than just comparing global statistics. However, so far, in most investigations of count data, validation has been limited to global statistics or simple comparisons of counts (Gray [2005]; Sileshi [2008]; Wenger and Freeman [2008]; Vaudor et al. [2011]).
5 Conclusions
The developed negative binomial regression model can be used to predict the current spatial pattern of larva density in the Hessian Ried. The almost optimal fit by the model allows for the prediction of conditional expectation values but also of conditional quantiles to account for uncertainty in the process of silvicultural decision making. Based on the predictions local forest managers will be able to optimize the spatial pattern of regeneration and forest protection planning. Areas with different risks can be identified by combining the predictions with expert knowledge about critical larva density. Hence areas can be separated where forest protection measures are essential, reasonable or needless to ensure successful regeneration measures. A classification of forest stands can be conducted that will be mainly affected by their regional location within the Hessian Ried. Deviations from the overall spatial pattern will be affected by the spatial pattern of pure clay layer and distance to water table.
In the future, inventories of larvae in the Hessian Ried will be conducted at an interval of 4 years, in the year prior to the year of swarming. In the planning of control measures, it is of particular interest to know whether the spatial distribution of the larvae varies over time or is more or less stationary. Hence, a major objective of the future inventories will be to gain insight into the spatio-temporal pattern of larva density. For this purpose, the methodology developed can be applied to time series data by estimating inventory-specific spatial trends or, once several inventories are available, by integrating a space-time effect (Augustin et al. [2009]).
The overall sampling grid should be optimized in future inventories to enable more plots with a medium to high CTH and low DWT to be assessed. In this context the database could be improved considerably by recording the CTH when excavating instead of modeling it.
Some knowledge exists about the capacity of different stand structures to serve as habitat for forest cockchafer larvae. For example dense young stands provide a less suitable habitat (Schwerdtfeger [1981]). Hence additional information about stand structure and tree species composition should be recorded and its effects on larva density tested. Currently the model predictions might be biased for certain stand structures if relevant effects on larva density do exist. However it is likely that the strong spatial effect will be the most important effect in future model developments also. Finally the 2013 inventory will be used to test the effect of forest protection measures that have been implemented in a sub-area of the Hessian Ried.
Declarations
Acknowledgements
We are thankful to Johannes Sutmöller for providing the groundwater data and to Falko Engel for calculating the percentage clay thickness from soil substrate maps. We would like to thank Peter Gawehn and his team for their comprehensive and careful field work and data collection and Helen Desmond for her extensive language revision. We also thank two anonymous reviewers for valuable comments that improved the quality of the manuscript. We also thank the Hessian State Forest Enterprise, Hessen-Forst, the principal project partner.
References
- Augustin NH, Musio M, Wilpert K, Kublin E, Wood SN, Schumacher M: Modeling spatiotemporal forest health monitoring data. J Am Stat Assoc 2009, 104(487):899–911. doi:10.1198/jasa.2009.ap07058
- Barry SC, Welsh AH: Generalized additive modelling and zero inflated count data. Ecol Model 2002, 157(2–3):179–188. doi:10.1016/S0304-3800(02)00194-1
- Burnham KP, Anderson DR: Multimodel inference: understanding AIC and BIC in model selection. Sociol Method Res 2004, 33(2):261–304. doi:10.1177/0049124104268644
- Buuren S, Fredriks M: Worm plot: simple diagnostic device for modelling growth reference curves. Stat Med 2001, 20:1259–1277. doi:10.1002/sim.746
- Cameron AC, Trivedi PK: Regression-based tests for overdispersion in the Poisson model. J Econometrics 1990, 46(3):347–364. doi:10.1016/0304-4076(90)90014-K
- Duchon J: Splines minimizing rotation-invariant semi-norms in Sobolev spaces. In: Constructive Theory of Functions of Several Variables. Springer, Berlin; 1977.
- Dunn PK, Smyth GK: Randomized quantile residuals. J Comput Graph Stat 1996, 5:236–244.
- Fahrmeir L, Echavarría LO: Structured additive regression for overdispersed and zero-inflated count data. Appl Stochastic Models Bus Ind 2006, 22:351–369. doi:10.1002/asmb.631
- Fahrmeir L, Kneib T, Lang S: Regression. Springer, Berlin; 2007.
- Gray BR: Selecting a distributional assumption for modelling relative densities of benthic macroinvertebrates. Ecol Model 2005, 185:1–12. doi:10.1016/j.ecolmodel.2004.11.006
- Haining R, Law J, Griffith D: Modelling small area counts in the presence of overdispersion and spatial autocorrelation. Comput Stat Data Anal 2009, 53:2923–2937. doi:10.1016/j.csda.2008.08.014
- Hastie TJ, Tibshirani RJ: Generalized Additive Models. Monographs on Statistics and Applied Probability 43. Chapman & Hall, London, New York, Tokyo, Melbourne, Madras; 1990.
- Kleiber C, Zeileis A: Applied Econometrics with R. Springer-Verlag, New York; 2008.
- McCullagh P, Nelder JA: Generalized Linear Models, 2nd edn. Monographs on Statistics and Applied Probability 37. Chapman and Hall, London, New York, Tokyo, Melbourne, Madras; 1989.
- NW-FVA, Nordwestdeutsche Forstliche Versuchsanstalt (Hrsg.) (2013) Waldentwicklungsszenarien für das Hessische Ried, Entscheidungsunterstützung vor dem Hintergrund sich beschleunigt ändernder Wasserhaushalts- und Klimabedingungen und den Anforderungen aus dem europäischen Schutzgebietssystem Natura 2000. Beiträge aus der Nordwestdeutschen Forstlichen Versuchsanstalt, Band 10. Universitätsverlag Göttingen, p 397, [http://webdoc.sub.gwdg.de/univerlag/2013/NWFVA10_HessischesRied.pdf]
- Ott A, Delb H, Mattes J, Schröter H: Erfolgreiche Regulierung eines Nebenflugstammes des Waldmaikäfers. AFZ-DerWald 2006, 61(6):312–314.
- R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2010.
- Rigby RA, Stasinopoulos DM: Generalized additive models for location, scale and shape (with discussion). Appl Statist 2005, 54:507–554.
- Rigby RA, Stasinopoulos DM (2009) A flexible regression approach using GAMLSS in R. Short course booklet, [http://book.gamlss.org/]
- Schwerdtfeger F: Die Waldkrankheiten. 4. neubearbeitete Auflage, Verlag Paul Parey, Hamburg und Berlin; 1981.
- Sileshi G: The excess-zero problem in soil animal count data and choice of appropriate models for statistical inference. Pedobiologia 2008, 52:1–17. doi:10.1016/j.pedobi.2007.11.003
- Vaudor L, Lamouroux N, Olivier J-M: Comparing distribution models for small samples of overdispersed counts of freshwater fish. Acta Oecol 2011, 37:170–178. doi:10.1016/j.actao.2011.01.010
- Venables WN, Ripley BD: Modern Applied Statistics with S. Springer, New York; 2002.
- Wenger SJ, Freeman MC: Estimating species occurrence, abundance, and detection probability using zero-inflated distributions. Ecology 2008, 89:2953–2959. doi:10.1890/07-1127.1
- Wood SN: Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC, Boca Raton; 2006.
- Wood SN, Augustin NH: GAMs with integrated model selection using penalized regression splines and applications to environmental modelling. Ecol Model 2002, 157:157–177. doi:10.1016/S0304-3800(02)00193-X
- Yee TW: The VGAM package for categorical data analysis. J Statist Software 2010, 32(10):1–34. http://www.jstatsoft.org/v32/i10/
- Yee TW, Wild CJ: Vector generalized additive models. J R Statist Soc B 1996, 58(3):481–493.
- Zucchini W, Schmidt M, Gadow K: A model for the diameter-height distribution in an uneven-aged beech forest and a method to assess the fit of such models. Silva Fennica 2001, 35:169–183. doi:10.14214/sf.594
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.