See online Appendix for an additional Table.
Contributions: the authors contributed equally.
This paper is an extended version of the master’s thesis of Dayun Kang.
Conflict of interest: the authors declare no potential conflict of interest.
Scrub typhus, a bacterial febrile disease that commonly occurs in the autumn, can easily be cured if diagnosed early. However, it can lead to serious complications and even death. It is therefore important to identify the risk factors so that outbreaks can be prevented. We analyzed monthly scrub typhus data for all areas of South Korea from 2010 through 2014. We used a 2-stage hierarchical framework because the weather covariates and the scrub typhus data have different spatial resolutions. In the first stage, we obtained administrative-level estimates of the weather data using a spatial model; in the second, we applied a Bayesian zero-inflated spatio-temporal model because the scrub typhus data include excess zero counts. We found that the zero-inflated model with spatio-temporal interaction terms improves fitting and prediction performance. Low humidity and a high proportion of elderly people were significantly associated with scrub typhus incidence.
Scrub typhus is an acute febrile disease spread by the bites of the larvae of trombiculid mites infected with
Patients with scrub typhus have symptoms such as fever, headache, fatigue, swollen lymph nodes and muscle pain. The disease is easily cured with antibiotics (tetracycline or chloramphenicol) when these are administered at an early stage; however, patients who are not treated appropriately can develop complications that can lead to death, such as pneumonia, encephalitis, and multi-organ failure. Identifying the risk factors for scrub typhus is important because it would contribute to the prevention of outbreaks.
Previous studies suggest that meteorological factors and the proportion of elderly people influence the number of scrub typhus cases (Ogawa
In recent years, a few studies have examined the spatial or spatio-temporal distribution of scrub typhus. Kuo
In this paper, we discuss the analysis of monthly scrub typhus incidence data for all administrative districts of South Korea, while also considering the complicated spatio-temporal dependency structures. To the best of our knowledge, this is the first study to adopt a spatio-temporal zero-inflated model for scrub typhus data.
We used meteorological and socioeconomic factors as covariates and proposed a Bayesian hierarchical model that builds flexible spatio-temporal structures by combining prior knowledge with the data at hand. We examined whether a space-time interaction structure should be adopted in analyzing the data, along with the overall spatial and temporal dependency structures. In South Korea, most scrub typhus incidence is concentrated in the south-western regions of the country and in the autumn season because of the harvest and increased outdoor activities. Across the whole country, most of the monthly incidence counts were zero, which leads to over-dispersion. Therefore, we used a zero-inflated Poisson (ZIP) distribution (Lambert,
We used monthly datasets in South Korea from 2010 to 2014 covering 251 administrative districts and 60 months. The basic characteristics of all variables are shown in
We proposed a 2-stage hierarchical framework to overcome the different spatial data resolutions. At the first stage, we predicted weather values for all administrative districts using a spatial model in which projected coordinates of longitude, latitude and weather data are covariates. At the second stage, we fitted a Bayesian spatio-temporal zero-inflated model to the incidence data, using the predicted weather values and the proportion of elderly people as covariates. The detailed framework is shown in
Stage one: spatial modelling for meteorological data. We assumed that the spatial model for each weather variable is as follows:
where W (s,
Here, we estimated the true weather values at about 1,000 locations for each time point in
where
where the random effects
The WinBUGS statistical package (
We additionally considered seven competing models. All models (models 1-8) are listed in the Appendix. Models 1 to 4 are Poisson models, and models 5 to 8 are ZIP models. Models 1 and 5 consider covariates only. Models 2 and 6 additionally contain spatially and temporally uncorrelated random terms. Models 3 and 7 add spatially and temporally correlated random terms. Finally, models 4 and 8 additionally include a spatio-temporal interaction term. We compared the performance of the proposed model (model 8) with that of the competing models (models 1-7) in terms of the deviance information criterion (DIC) and the mean squared prediction error (MSPE). A model with smaller MSPE and DIC values performs better.
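The two comparison criteria can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' WinBUGS code: the function names `mspe` and `dic` are hypothetical, and it assumes the usual definition DIC = mean deviance + pD, which agrees with the Deviance, pD and DIC columns reported in the model performance table.

```python
def mspe(observed, predicted):
    """Mean squared prediction error between observed and predicted counts."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sum(squared_errors) / len(squared_errors)


def dic(mean_deviance, p_d):
    """Deviance information criterion, assumed here to be mean deviance + pD."""
    return mean_deviance + p_d


# (deviance, pD) pairs copied from the model performance table for three models.
reported = {
    "Model 1": (160938, 5.02),
    "Model 4": (22640, 2122.03),
    "Model 8": (22320, 2025.35),
}

dic_values = {name: dic(dev, p_d) for name, (dev, p_d) in reported.items()}

# The model with the smallest DIC is preferred.
best = min(dic_values, key=dic_values.get)
print(best, round(dic_values[best]))
```

Among the three models included here, the smallest DIC belongs to model 8, in line with the ranking reported in the table.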
To examine the prediction performance of the proposed spatial model, we compared the values observed at the monitoring stations and the predicted values for the administrative district in which each station is located. We chose three administrative areas that contain weather monitoring stations: Inje-gun in Gangwon Province, Youngdong-gun in Chungcheong Province and Mungyeong-si in Gyeongsangbuk Province.
We also compared the empirical probability of zero counts from the real data with the estimated probability from the models. Around 73% of the incidence had zero values. In
The parameter estimates of the best model, model 8, are shown in
We compared the observed values with the predicted values from model 8. In
Our investigation of the relationship between weather factors and scrub typhus showed that humidity is a significant risk factor, with a negative association: the number of scrub typhus cases increases as humidity decreases. This negative association can be explained by the fact that the autumn season is relatively dry and the incidence is mostly concentrated in that season. This result is consistent with the negative correlation between relative humidity and scrub typhus incidence shown by Li
In addition, we showed that the higher the proportion of elderly people, the higher the incidence of scrub typhus, which is supported by Ogawa
A zero-inflated negative binomial spatio-temporal model could be considered as an alternative for our data. However, it yields larger DIC and MSPE values (DIC = 25553 and MSPE = 6.44) than the proposed zero-inflated Poisson spatio-temporal model, so the latter performs better.
Since most of the hotspots are in rural areas, interventions targeted at those areas can effectively prevent scrub typhus. A high proportion of the residents in rural areas are senior citizens who are likely to lack information on scrub typhus. Therefore, a key approach would be to provide education to all residents in the endemic areas before the peak season. As an example, Koryung County, South Korea, effectively prevented the disease by educating its residents, especially the elderly. People who had experienced scrub typhus were invited as guest speakers, and information was sent out as soon as the first case of the disease occurred. In addition, the government of Koryung County distributed tick repellent and protective clothing to the residents. As a result, the incidence of scrub typhus in Koryung County decreased compared to previous years. Prevention policies should focus especially on the autumn season because of the harvest and increased outdoor activities.
All models in this study adopted Bayesian methods. Despite their high computational cost, they have advantages over frequentist methods. Whereas confidence intervals in frequentist inference are difficult to interpret, credible intervals in Bayesian inference are straightforward and easy to interpret. In spatial modelling especially, the Bayesian framework makes understanding based on hierarchical models highly intuitive. Combining prior knowledge with real-world data is another benefit of Bayesian inference. Careful selection of appropriate priors is required; we used non-informative priors. To understand how the prior distributions influence the results, we conducted a sensitivity analysis using inverse gamma distributions for the variances. These prior distributions gave very similar results.
We faced a minor support problem in using the weather data as covariates in this study, because the weather data and the scrub typhus data have different spatial resolutions. As a solution, we used a two-stage model that can provide values at locations without monitoring stations from a relatively small number of observations. Owing to this strength, the statistical analysis can be conducted with complete covariates and can identify the significant risk factors. These results can help to prevent and deal with the disease effectively.
Several further tasks remain. First, the model for weather data in the first stage is limited to a spatial model in this study. Adopting a spatio-temporal model for the meteorological data might improve the predictive performance. Combining weather observations with predicted values from numerical models might also enhance the predictive performance. Second, we expect to be able to analyze sex- and age-adjusted individual patient data in the future, but we were unable to obtain this information for the people diagnosed with scrub typhus in this study. Third, because scrub typhus occurs mostly in the autumn, analyzing only autumn data on a daily basis might help to locate detailed trends. In addition, spatio-temporal clustering might be helpful in deriving interventions for each season and could lead to a simulation study to investigate the effects of the interventions.
This study is the first attempt to use a Bayesian spatio-temporal ZIP model to assess the association between the incidence of scrub typhus in Korea and the weather and the proportion of people older than 65 years. Including the spatio-temporal interaction term substantially improved performance: the MSPE fell from 10.15 (model 7) to 0.97 (model 8) and the DIC from 28776 to 24345. This supports the view that spatio-temporal models should be applied to data with spatio-temporal dynamics. Given that many epidemiological data contain spatial and temporal dependencies, our model could serve as a template for the use of spatio-temporal models with epidemiological data.
Map of Tsutsugamushi triangle.
Map of monitoring stations and kriging locations for the weather model. (A) Map of temperature and precipitation stations; (B) Map of humidity stations; (C) Map of kriging locations.
Flowchart of the two-stage model. ZIP, zero-inflated Poisson.
Calibration plots of the weather data model for Inje-gun in Gangwon Province (first row), Youngdong-gun in Chungcheong Province (second row), and Mungyeong-si in Gyeongsangbuk Province (third row). (A, D, G) precipitation; (B, E, H) temperature; (C, F, I) humidity.
Calibration plot of the scrub typhus data model (Model 8).
Comparison of scrub typhus incidence and the predicted values for two representative regions. (A) Gwanak-gu in Seoul City; (B) Ulju-gun in Ulsan City.
Predicted and observed maps of scrub typhus incidence. (A) The incidence of scrub typhus, October 2013; (B) the predicted values, October 2013; (C) the incidence of scrub typhus, October 2014; (D) the predicted values, October 2014.
Summary of variables.
Variables | Description | Mean | Standard deviation | Q1 | Q3 | IQR |
---|---|---|---|---|---|---|
Scrub typhus incidence | Monthly incidence in each administrative area | 2.5 | 8.7 | 0 | 1 | 1 |
Non-zero incidence | Monthly non-zero incidence in each administrative area | 9.3 | 14.8 | 1 | 11 | 10 |
Precipitation | Monthly average precipitation (mm) | 3.4 | 3.7 | 1.0 | 4.2 | 3.2 |
Temperature | Monthly average temperature (°C) | 12.3 | 9.7 | 3.8 | 21.5 | 17.7 |
Humidity | Monthly average relative humidity (%) | 67.2 | 9.2 | 60.0 | 74.7 | 14.7 |
Proportion of elderly people | Proportion of people older than 65 years (%) | 16.0 | 7.5 | 9.9 | 21.8 | 11.9 |
Q1, first quartile; Q3, third quartile; IQR, Q3-Q1.
Model performance.
Distribution | Model | Mean squared prediction error | Deviance | pD | Deviance information criterion | Est.Pr (Y=0) |
---|---|---|---|---|---|---|
Real data | | | | | | 0.730 |
Poisson | Model 1 | 75.07 | 160938 | 5.02 | 160943 | 0.157 |
Poisson | Model 2 | 10.65 | 29633 | 306.58 | 29939 | 0.739 |
Poisson | Model 3 | 10.65 | 29640 | 295.16 | 29935 | 0.740 |
Poisson | Model 4 | 0.96 | 22640 | 2122.03 | 24762 | 0.739 |
Zero-inflated Poisson | Model 5 | 61.24 | 76150 | 4.38 | 76154 | 0.731 |
Zero-inflated Poisson | Model 6 | 10.07 | 28330 | 357.53 | 28688 | 0.740 |
Zero-inflated Poisson | Model 7 | 10.15 | 28410 | 366.04 | 28776 | 0.710 |
Zero-inflated Poisson | Model 8 | 0.97 | 22320 | 2025.35 | 24345 | 0.739 |
Est.Pr (Y=0): the estimated probability of zero counts.
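To illustrate why excess zeros matter, the sketch below evaluates the zero probability under a plain Poisson and under a ZIP distribution. It is a simplified illustration without covariates or random effects, so it does not reproduce the fitted models; the function names and the mixing probability `pi = 0.7` and rate `lam = 3.0` are hypothetical values chosen only for demonstration. The overall mean of 2.5 and the empirical zero proportion of 0.730 are taken from the tables.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing k events with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) count."""
    poisson_part = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + poisson_part if k == 0 else poisson_part


# A single Poisson matching the overall mean incidence (2.5) implies
# far fewer zeros than the 73% observed in the data.
p0_poisson = poisson_pmf(0, 2.5)

# Hypothetical ZIP parameters (assumptions, not estimates from the paper).
p0_zip = zip_pmf(0, pi=0.7, lam=3.0)

print(f"Poisson P(Y=0) = {p0_poisson:.3f}")
print(f"ZIP     P(Y=0) = {p0_zip:.3f}")
```

The extra parameter `pi` lets the ZIP distribution place mass at zero independently of the Poisson rate, which is the flexibility a covariates-only Poisson model lacks.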
Posterior summaries for the spatio-temporal zero-inflated Poisson model.
Parameter | Est. | Standard deviation | Monte Carlo error | 2.50% | Median | 97.50% | Relative risk |
---|---|---|---|---|---|---|---|
Intercept | -6.883 | 0.432 | 0.043 | -7.673 | -6.714 | -6.285 | 0.001 |
Precipitation | 0.023 | 0.017 | 0.001 | -0.008 | 0.022 | 0.055 | 1.023 |
Temperature | -0.016 | 0.011 | 0.001 | -0.041 | -0.017 | 0.007 | 0.984 |
Humidity | -0.098 | 0.007 | 0.001 | -0.110 | -0.099 | -0.085 | 0.907 |
Elderly people proportion | 0.066 | 0.006 | 0.001 | 0.054 | 0.066 | 0.075 | 1.068 |
 | 0.418 | 0.030 | 0.002 | 0.365 | 0.416 | 0.482 | |
 | 0.263 | 0.163 | 0.016 | 0.091 | 0.211 | 0.726 | |
 | 0.770 | 0.058 | 0.004 | 0.666 | 0.767 | 0.892 | |
 | 1.525 | 0.149 | 0.005 | 1.262 | 1.514 | 1.852 | |
 | 0.835 | 0.020 | 0.002 | 0.800 | 0.833 | 0.871 | |
Est.: posterior mean; 2.50% and 97.50%: lower limit and upper limit of 95% credible interval, respectively.
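The relative-risk column appears to be the exponential of the posterior mean of each coefficient, which is the usual interpretation under a log link. The short sketch below checks this assumption against the reported values; the dictionary name and labels are illustrative only.

```python
import math

# Posterior means copied from the posterior summary table.
posterior_means = {
    "Intercept": -6.883,
    "Precipitation": 0.023,
    "Temperature": -0.016,
    "Humidity": -0.098,
    "Elderly people proportion": 0.066,
}

# Assumed relationship: relative risk = exp(posterior mean).
relative_risk = {
    name: round(math.exp(beta), 3) for name, beta in posterior_means.items()
}

for name, rr in relative_risk.items():
    print(f"{name}: {rr}")
```

Under this reading, and assuming the covariates enter the model on their original scales, a one-unit increase in relative humidity corresponds to a relative risk of about 0.907, and a one-unit increase in the proportion of elderly people to about 1.068, with the other covariates held fixed.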