Self-reports, sometimes also called patient reported outcomes, are of tremendous importance in medical and psychological science, where many parameters and symptoms are only assessable via introspection (FDA, 2009). This includes mood, pain, urge to smoke, ruminative thinking and fatigue, to name a few. For these parameters and symptoms self-reports are the method of choice. However, there is a problem with self-reports, namely that they rely to a large extent on the participants’ memory. Unfortunately, autobiographical memory research has shown that recalling information is an active reconstruction process, which is likely to distort past experiences (Gorin and Stone, 2001; Fahrenberg et al., 2007; FDA, 2009). These systematic distortions are referred to as recall bias. To circumvent the problem of systematically distorted memories in self-reports, real-time assessments have been developed. The technique is mostly referred to as ambulatory assessment and it reduces the recall bias as it allows capturing symptoms, experiences and mood in real-time and in the natural environment of the participants (Trull and Ebner-Priemer, 2013) by using electronic diaries filled out on a technical device. Although there are related terms with this method, such as ecological momentary assessment (Stone and Shiffman, 1994), experience sampling (Csikszentmihalyi and Larson, 1987) or real-time data capture (Stone and Broderick, 2007), we adopted the term ambulatory assessment because it captures a wide variety of sampling methods, as well as data structures involved in the assessment of daily life experience (Trull and Ebner-Priemer, 2013). Whereas nearly all ambulatory assessment studies capture self-reports and many assess physiological or behavioural parameters, the location of the participant is rarely captured or assessed. There are only a few exceptions, including the study from Froehlich et al. 
(2006), comparing retrospective, explicit place preferences to real-life travel behaviour. Epstein et al. (2014) incorporated global positioning system (GPS) data and assessed the relation between mood and neighborhood surroundings. Gustafson et al. (2014) tracked the location of patients with alcohol dependency and warned them in real-time, including provision of sophisticated help and feedback, when they were approaching a bar. Location data can be collected continuously (e.g., every minute), whereas electronic diaries have to be collected less frequently and can therefore only capture a sample of the participant’s experience (Shiffman, 2007). The assessment timing must therefore be chosen carefully with regard to the research question(s). The most common sampling strategies are time-based, event-based, or a combination of both. A time-based sampling scheme is suitable for monitoring variations throughout the day, as a questionnaire is triggered on a predefined time schedule, which can be random or follow regular intervals (e.g., daily or hourly). In comparison, an event-based sampling strategy triggers reports on mood only during specific events (e.g., headache) that are registered by the participants themselves (Fahrenberg et al., 2007; Shiffman, 2007). A less commonly applied sampling strategy is the interactive ambulatory assessment, which is coupled to an activity (Ebner-Priemer et al., 2013), such as heart rate (Myrtek, 2004) or the location of the participant (Dorn et al., 2015a). These sampling strategies are especially useful if self-reports during rare events are of interest, for example, during sports events or while walking through a park. If these events are short-lived, a time-based sampling strategy would likely miss them, and if such rare events are not captured, their association with, e.g., mood and well-being cannot be addressed. 
In other words, without a self-report during a visit at a park, one could not address the potentially positive impact the park visit could have on the mood. Furthermore, it is essential to obtain a variance in both parameters of interest (e.g., physical activity and mood). Without variance the parameters’ correlation cannot be addressed (Ebner-Priemer et al., 2013). In addition, Reichert et al. (2016) further discuss sampling strategies for specific, health-related research questions.
In an ongoing project, the psychiatric-epidemiological centre (PEZ) investigates the interactions of epigenetics, the environment, mental health and well-being. The study is a cooperation between the Central Institute of Mental Health (ZI), the Karlsruhe Institute of Technology (KIT) and the GIScience Research Group at Heidelberg University. The project was motivated by epidemiological studies showing negative effects of living in cities on mental health, with an increased risk for schizophrenia, mood and anxiety disorders (Peen et al., 2010). Furthermore, Lederbogen et al. (2011) revealed associations between urbanity and neural social stress processing in humans. Still, it remains unclear how the physical environment accounts for these findings. Previous work within the project presented an ambulatory assessment sampling strategy incorporating land use (Dorn et al., 2015a). An e-diary was triggered when a person moved to a land use differing from that of the previous trigger location. This method increased the number of unique trigger positions as well as the number of triggers at less frequently visited types of land use, and it was argued that the sampling strategy is beneficial when addressing the relationship between the natural surroundings and human well-being.
The current study extends the work by Dorn et al. (2015a) by comparing four ambulatory assessment sampling strategies, which were simulated on the basis of one week of movement data collected on a minute basis from 143 voluntary participants. A time-based ambulatory assessment trigger option, where an e-diary is prompted at certain times of the day, was compared to combinations of time- and interactive trigger options that were activated if a participant moved a certain minimum distance or visited a location where land use and/or population density differed from those at the previous trigger position. The aim was to develop an ambulatory assessment sampling strategy that spatially distributes the self-reports on mood in an urban environment. A further aim was to reduce the number of trigger events outside the study region and to increase the number of trigger events at rarely visited types of land use and in unique city districts.
Materials and Methods
In this study, we simulated ambulatory assessment trigger events by applying different criteria. For this purpose, participant movement data, as well as data on land use and population density were used. The latter data were made available at the administrative district level (Figure 1) and we also investigated whether the parameters were correlated to other socio-economic data, which made it possible to see whether the trigger methods spatially spread the trigger events according to socio-economic variations within a region. The study area comprised the adjacent districts of Rhein-Neckar-Kreis, Mannheim, Heidelberg and Ludwigshafen located in Baden-Württemberg and Rhineland-Palatinate in southwestern Germany (Figure 1). The study region altogether covered about 1400 km² and included both urban and rural areas.
Location data (coordinates) from 143 voluntary participants were collected by the PEZ in collaboration with the KIT using the smartphone app movisensXS (Movisens GmbH, Karlsruhe, Germany). The participants were randomly drawn from local population registers after considering factors such as age, gender and ethnic background. It was also ensured that people in both urban and rural areas were contacted. The data originate from the ZI project Impact of Urbanicity on Genetics, Cerebral Functioning and Structure and Condition in Young People (URGENCY). The participants were adolescents and young adults, and they were given monetary compensation for their contribution. For more details about the recruiting process, the reader is referred to Reichert et al. (2016).
The male participants (n=61) had a mean age of 18.2 years [standard deviation (SD)=6.27 years] and the female participants (n=82) had a mean age of 18.1 years (SD=6.30 years). The participants mainly lived in the municipalities of Mannheim and Eberbach. Mannheim has an urban character with a population of 318,000 inhabitants and a population density reaching up to 44,971 people per km² (Nexiga GmbH, Bonn, Germany). Eberbach has more of a rural character with 14,700 inhabitants and a maximum population density of 10,890 people per km² (Nexiga GmbH). In the study, each person was tracked for seven consecutive days within the study period ranging from September 2014 to April 2015. The coordinates of the provided smartphones were collected according to an algorithm incorporating both time and the smartphone’s motion sensor (Stumpp, 2014). The data were thereafter (when possible) aggregated to minute-based data covering all hours of the day and night. For the purpose of this study, all coordinates outside the time range from 08:00 a.m. to 09:00 p.m. were subsequently removed. The coordinates, along with their accuracy, were collected either via the smartphone’s GPS unit (accuracy ±10 m), WiFi (±40 m) or the global system for mobile communications (GSM) (±200-3000 m). The GPS unit clearly has the highest accuracy but also requires the most battery power. Furthermore, the signal cannot always be obtained. Indoors, for example, the app can automatically switch to the WiFi mode to determine its coordinates. There were also situations where no coordinates at all could be obtained; in the non-filtered data, around 70% of the one-minute time steps had missing coordinates. The participant movement data were processed in two steps. Firstly, timestamps with missing coordinates were removed, as were coordinates with an accuracy worse than 100 m. Secondly, obviously erroneous coordinates implying that the participant moved faster than 300 km/h were removed. The latter was achieved by comparing the timestamps and calculating the distance between consecutive positions. Although the pre-processing was done after the data gathering was completed, it can also be conducted in real-time in order to filter the data immediately after collection.
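The two pre-processing steps described above can be sketched as follows. This is a minimal illustration and not the code used in the study: the record layout (dictionaries with the keys `t`, `lat`, `lon`, `acc`), the function names and the choice to compare each fix with the last retained fix are assumptions made for this example, as is the use of the haversine formula for the distance.

```python
import math
from datetime import datetime, timedelta

MAX_ACCURACY_M = 100.0   # fixes with an accuracy worse than 100 m are dropped
MAX_SPEED_KMH = 300.0    # implied speeds above 300 km/h are treated as errors


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude pairs."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def clean_track(fixes):
    """Filter a time-sorted list of fixes (hypothetical dict layout)."""
    # Step 1: remove missing coordinates and inaccurate fixes.
    step1 = [
        f for f in fixes
        if f["lat"] is not None and f["lon"] is not None
        and f["acc"] is not None and f["acc"] <= MAX_ACCURACY_M
    ]
    # Step 2: remove fixes that imply an implausible speed relative to
    # the last retained fix (assumption: comparison with the last kept fix).
    kept = []
    for f in step1:
        if kept:
            prev = kept[-1]
            hours = (f["t"] - prev["t"]).total_seconds() / 3600.0
            if hours <= 0:
                continue  # duplicate or out-of-order timestamp
            km = haversine_m(prev["lat"], prev["lon"], f["lat"], f["lon"]) / 1000.0
            if km / hours > MAX_SPEED_KMH:
                continue
        kept.append(f)
    return kept
```

Because the filter only looks at the current fix and the last retained one, the same logic could in principle be applied to incoming fixes one at a time, which matches the remark that the pre-processing can also be conducted in real-time.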
Land use data were obtained from the Authoritative Topographic-Cartographic Information System (ATKIS) base digital landscape model (DLM) with a scale of 1:25,000 (http://www.adv-online.de/Geotopography/ATKIS/). This is the official data source for digital topographical geodata in the Federal Republic of Germany. The data were provided by the Baden-Württemberg State Office for Geoinformation and State Development (Landesamt für Geoinformation und Landentwicklung Baden-Württemberg) and the Federal State Office for Surveying and Geo Information Rhineland-Palatinate (Landesamt für Vermessung und Geobasisinformation Rheinland-Pfalz). The Base DLM for Baden-Württemberg is regularly updated and was acquired in early 2014. The Base DLM for Ludwigshafen was current as of October 2013. The data were pre-processed according to Dorn et al. (2015b). During the process, the data were reclassified and topology and overlap issues were resolved. The final data comprised twelve land use/land cover (LULC) categories including urban, residential, industry and recreation, among others (Figure 2). The same figure also includes an example of location data with coordinates collected at various time steps, including on a minute basis. Socio-economic data were used as well. Commercial data on population density, as well as residential building density, household density, apartment rent, apartment inquiry, passerby index, and unemployment percentage for 2012 were acquired from Nexiga GmbH at the district level, where each area comprises about 400 households (Figures 1 and 2). This spatial unit is based on the electoral areas. Altogether, the study region consisted of 1276 districts ranging in size from 0.01 to 15.36 km².
Trigger framework and trigger simulation
An ambulatory assessment trigger framework was set up using a PostgreSQL database (http://www.postgresql.org/) with the PostGIS extension. The Python-based web framework Django (https://www.djangoproject.com/) enables the communication between the smartphones and the spatial database on the server (Figure 3). At regular intervals and via the movisensXS framework, the smartphones send a JSON request (http://www.json.org/JSONRequest.html) with the following data to the server: smartphone ID, latitude, longitude, accuracy, and timestamp. The data are sent for the current location as well as for the location of the latest trigger event. Based on these data, a spatial database query is conducted, and if certain criteria are fulfilled, an ambulatory assessment questionnaire is triggered. The trigger framework was set up and successfully tested. Nonetheless, this study is based on simulated trigger events, which makes it possible to first collect the location data for all participants and then compare different trigger methods using the very same data for every method. The trigger methods were evaluated based on, among other criteria, the percentage of unique trigger positions and the distribution over administrative districts. Pearson’s chi-squared test was furthermore applied in order to compare the spatial distribution over land uses. The four ambulatory assessment sampling strategies/trigger methods were defined as follows.
Method 1: time trigger
Within- and between-day variations of mood can be addressed by triggering the questionnaires at fixed time intervals that do not change between days. With trigger method 1, the participants are prompted to fill in a questionnaire on the hour between 8 a.m. and 9 p.m. regardless of whether or not they are moving (Figure 4).
Method 2: combined time and distance trigger
Trigger method 2 incorporates the movement of the participants. Here, a questionnaire is triggered if a person moves at least 500 m (independent of direction) since the previous trigger event and the amount of time passed since then is at least 40 min (Figure 5). The latter criterion limits the number of trigger events per day. Furthermore, after 100 min a questionnaire is triggered even if the person moves less than 500 m. This criterion guarantees that intra-daily data are collected even if the participant barely moves.
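The criteria of method 2 can be illustrated with a short simulation sketch. It is not the implementation used in the study: it assumes that positions are given as projected (x, y) coordinates in metres, that the first available fix starts the sequence (labelled "start"), and that the distance criterion is checked before the 100 min time limit. All names are hypothetical.

```python
import math
from datetime import datetime, timedelta

MIN_GAP = timedelta(minutes=40)    # minimum time between two trigger events
MAX_GAP = timedelta(minutes=100)   # after this time a questionnaire is always triggered
MIN_DISTANCE_M = 500.0             # minimum distance moved since the previous trigger


def simulate_method2(track):
    """Simulate trigger events for a time-sorted list of (timestamp, (x, y)).

    Returns a list of (timestamp, position, reason) tuples.
    """
    events = []
    for t, pos in track:
        if not events:
            # Assumption: the first fix initialises the sequence.
            events.append((t, pos, "start"))
            continue
        last_t, last_pos, _ = events[-1]
        gap = t - last_t
        moved = math.dist(last_pos, pos)  # straight-line distance, independent of direction
        if gap >= MIN_GAP and moved >= MIN_DISTANCE_M:
            events.append((t, pos, "distance"))
        elif gap >= MAX_GAP:
            events.append((t, pos, "time"))
    return events
```

Recording the reason for each event makes it straightforward to count afterwards how often the distance criterion and how often the time limit caused a trigger.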
Method 3: location-based trigger incorporating land use
This trigger method considers both time and space. Here, a questionnaire is triggered if the participant is inside the study region and moves to an area characterized by land use different from that of the previous trigger position (Figure 6). As for trigger method 2, the amount of time since the latest previous trigger has to be at least 40 min. Furthermore, after 100 min a report is triggered even if the participant does not move to another type of land use or is outside the study region. By using this method, the idea is to collect more data at the types of land use visited less frequently.
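A minimal sketch of the decision logic of method 3 is given below. It assumes that each fix has already been assigned a land use category by a spatial lookup, with `None` marking fixes outside the study region; this representation, the function name and the handling of the first fix are assumptions for illustration and are not taken from the study's implementation.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(minutes=40)    # minimum time between two trigger events
MAX_GAP = timedelta(minutes=100)   # after this time a report is always triggered
OUTSIDE = None                     # hypothetical marker for fixes outside the study region


def simulate_method3(track):
    """Simulate trigger events for a time-sorted list of (timestamp, land_use).

    Returns a list of (timestamp, land_use, reason) tuples.
    """
    events = []
    for t, land_use in track:
        if not events:
            # Assumption: the first fix initialises the sequence.
            events.append((t, land_use, "start"))
            continue
        last_t, last_land_use, _ = events[-1]
        gap = t - last_t
        inside = land_use is not OUTSIDE
        if gap >= MIN_GAP and inside and land_use != last_land_use:
            # Land use differs from that of the previous trigger position.
            events.append((t, land_use, "land_use"))
        elif gap >= MAX_GAP:
            # Time limit reached, regardless of land use or study region.
            events.append((t, land_use, "time"))
    return events
```

In this sketch, a fix outside the study region can only produce an event through the 100 min time limit, which is the mechanism by which the method is meant to reduce triggers outside the region.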
Method 4: location-based trigger incorporating land use and population density
This method is similar to method 3, but in addition to the criteria described above it also triggers according to population density. These data are available at the administrative district level (Figure 1). Population density classes were derived by calculating the population density deciles for the study region. The obtained classes define the trigger thresholds. A questionnaire is triggered if the participant moves to a location with a population density decile class differing from the one at the previous trigger location (Figure 7). The idea is to trigger more frequently at locations differing in socio-economic characteristics. Population density is also used as a proxy for other socio-economic data. Table 1 shows the Pearson correlation between seven socio-economic parameters available for 1276 administrative districts (Figure 1), derived with the R function corr.test in the R package psych (http://personalityproject.org/r/psych/). It can be seen that population density (as expected) has a high correlation with both residential building density (R²=0.73) and household density (R²=0.98). A correlation can also be observed between population density and apartment inquiry (R²=0.43), the passerby index (R²=0.48) and the unemployment rate (R²=0.47). In other words, triggering according to population density would also spread the trigger events over other socio-economic variables. Only population density and apartment rent have no clear correlation (R²=0.25).
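The derivation of population density decile classes can be sketched as follows. The example uses Python's standard library; the interpolation method (`method="inclusive"`), the treatment of values lying exactly on a cut point and the function names are assumptions, since the text does not specify how the deciles were computed.

```python
import bisect
import statistics


def decile_cut_points(densities):
    """Return the nine cut points that split the district densities into ten classes."""
    return statistics.quantiles(densities, n=10, method="inclusive")


def decile_class(density, cut_points):
    """Return the decile class (1 to 10) of a single population density value."""
    return bisect.bisect_right(cut_points, density) + 1


def density_class_changed(current_density, previous_density, cut_points):
    """True if the two locations fall into different decile classes."""
    return decile_class(current_density, cut_points) != decile_class(previous_density, cut_points)
```

In a simulation of method 4, `density_class_changed` would be evaluated for the district of the current fix and the district of the previous trigger position, alongside the land use comparison of method 3.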
Although a framework for real-time spatio-temporal ambulatory assessment trigger events was implemented and successfully tested, this study was not conducted until the movement data for 143 participants had been collected. This allowed us to use the very same input data for all trigger methods compared.
The time-based trigger method (1) resulted on average in 70.91 questionnaires per participant and week (Table 2), i.e. 10.13 trigger events per participant and day. The time- and distance-based trigger method (2), the location-based trigger method incorporating land use (3) and the location-based trigger method incorporating land use and population density (4) triggered 70.95, 63.87 and 67.85 times per participant and week, respectively, which equates to 10.14, 9.12 and 9.69 trigger events per participant and day. The percentage of unique trigger positions per participant and week was 50.43, 58.65, 59.34 and 59.51% for methods 1, 2, 3 and 4, respectively. In addition, the combined time and distance trigger method (2) and the two location-based methods (3 and 4) all increased the percentage of events with a position different from that of the preceding event: the percentages were 50.83, 61.67, 61.28 and 61.30% for methods 1, 2, 3 and 4, respectively. An examination of the results revealed that method 2 triggered according to the distance criterion (the participant had moved 500 m or more since the last signal) in 59.85% of the cases. Consequently, in 40.15% of the cases method 2 triggered due to the time limit of at most 100 min between events. Furthermore, method 3 triggered according to changed land use in 52.89% of the cases and method 4 according to the non-time criteria in 60.93% of the cases. The spatial spreading was also evaluated by incorporating socio-economic data available for administrative districts. On average, method 1 triggered in 10.69 unique districts per participant and week, whereas methods 2, 3 and 4 triggered in 13.42, 13.12 and 13.97 unique districts per participant and week, respectively. Based on the data from all participants at a resolution of up to one minute, Table 3 shows that 46.14% of the time between 08:00 a.m. and 09:00 p.m. was spent in residential areas. This makes residential areas by far the most visited type of land use. The next most commonly visited types of land use were urban regions, areas outside the study region and industrial areas. Further examination of the participants’ movement patterns (not shown) revealed that only a small proportion of the study area was regularly visited. Altogether, the participants spent 76.69% of the time in urban and residential areas, which together covered 13.5% of the study area (Table 3). Other land use types were located further away from the participants’ homes and were only irregularly visited; e.g., farmland, which covers 39.2% of the study area, was hardly visited at all. The results were evaluated according to the number and percentage of trigger events at the different land uses. Importantly, an increase in percentage terms does not necessarily imply a higher number of trigger events, since the total number of trigger events differed between the methods.
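For one participant and week, the two position-based percentages reported above could be computed along the following lines. The exact definitions used in the study are not stated; this sketch assumes that two positions count as identical only if their coordinates are exactly equal, and that the share of changed positions is taken relative to the number of consecutive event pairs. The function name is hypothetical.

```python
def position_metrics(positions):
    """Return (unique_pct, changed_pct) for a list of trigger positions.

    positions: trigger positions in chronological order, e.g. (lat, lon) tuples.
    unique_pct: percentage of distinct positions among all trigger events.
    changed_pct: percentage of events whose position differs from the preceding one.
    """
    n = len(positions)
    if n == 0:
        return 0.0, 0.0
    unique_pct = 100.0 * len(set(positions)) / n
    if n == 1:
        return unique_pct, 0.0
    changed = sum(1 for prev, cur in zip(positions, positions[1:]) if cur != prev)
    changed_pct = 100.0 * changed / (n - 1)
    return unique_pct, changed_pct
```

Averaging these values over all participants would then give per-method figures comparable in kind to those in Table 2, under the stated assumptions.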
The results show that 50.33% of the events of the time-based trigger method (1) were triggered in residential areas, i.e. the most visited type of land use (Table 3). According to Pearson’s chi-squared test, the corresponding percentages for methods 2-4 were significantly lower, reaching 45.58, 46.83 and 48.15%, respectively. The actual number of trigger events was also lower for methods 2-4 in comparison to method 1. For urban areas, the second most visited type of land use, these methods triggered in 28.77-30.54% of the cases. For methods 2 and 4, the values were significantly higher in comparison to method 1. Furthermore, method 1 triggered outside the study region in 9.42% of the cases. For method 2, the corresponding value was 10.54%, and for methods 3 and 4 the value decreased to 6.23% and 5.83%, respectively, a decrease that was statistically significant at the 0.01 level. For the industrial areas, method 1 triggered in 5.04% of the cases. This value was slightly increased for method 2 (5.41%), method 3 (5.87%; P<0.01) and method 4 (5.23%). The participants spent less than 5% of the time at each of the remaining types of land use, where methods 2-4 consistently triggered more frequently than the time-based method 1. With method 2, the increase was found to be statistically significant for farmland and unknown types of land use. Moreover, methods 3 and 4 triggered even more frequently at rarely visited types of land use. Compared to the time-based trigger method (1), the increase was significant for farmland, railway, scrub, forest and unknown types of land use. The remaining types were visited too rarely for appropriate assessment.
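A comparison of this kind can be illustrated with Pearson's chi-squared test on a 2×2 table, for example the trigger events of method 1 versus another method, split into events inside and outside a given land use type. Whether the study used exactly this pairwise 2×2 design is an assumption, and the counts in the usage example are made-up toy values, not study data. The p-value uses the closed form for one degree of freedom, so the sketch needs only the standard library.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson's chi-squared test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Toy example with hypothetical counts (not taken from the study):
# rows = two trigger methods, columns = events in / not in a land use type.
statistic, p_value = chi2_2x2(30, 10, 10, 30)
significant = p_value < 0.01
```

Note that the test operates on counts rather than percentages, which is relevant here because the total number of trigger events differs between the methods.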
The correlation analysis between socio-economic parameters shows that population density is related to several other parameters. The results also show that residential building density has a high correlation with these other parameters. Depending on the data available, this finding suggests that one can also formulate a trigger criterion incorporating building density. If one obtains a spread within this parameter, one would also obtain a larger spread within the other parameters. However, although the socio-economic data originally come from separate sources, they were all obtained from the same data provider, and we cannot rule out that some parameters were used in the downscaling of others. Hence, part of the high correlation, e.g., between population density and household density, might have been caused by one of the parameters having been used to model the other.
Unfortunately, the four trigger methods examined in this study cannot easily be compared in real-time. Inhomogeneities between the groups concerning movement would obstruct a comparison, which is the reason why we chose to simulate the trigger events once the movement data had been collected. Depending on the aims and objectives of a study, all of the applied ambulatory assessment sampling strategies have advantages and disadvantages. For the time-based trigger method (1), the advantages include relatively easy implementation as well as the minimization of possible biases resulting from spatial restrictions. If the aim were to examine within- and between-day variations of mood without accounting for events, activity or locations, a time-based trigger method might be sufficient or even preferable. If instead the aim were to understand the factors behind certain events, or if triggering at certain locations were desired, a combined time- and location-based trigger method might be the better option.
In this study, three interactive ambulatory assessment sampling schemes were applied. One of them (method 2) incorporates both time and distance. A similar trigger method has earlier been applied for evaluating the positive affect related to physical activity (Ebner-Priemer et al., 2013). The present study reveals that the percentage of unique trigger positions is strongly increased when movement criteria are introduced. In addition, the number of trigger events in unique districts increases, as does the number of trigger events at rarely visited types of land use. For the objective of the present study, such an increase in the spatial distribution of trigger events is clearly desirable. Nonetheless, the number of trigger events outside the study region also increases notably. It may therefore be beneficial to include spatial restrictions in the trigger criteria applied.
Both method 3 and 4 include spatial criteria for the purpose of triggering. It can be observed that the amount of unique trigger positions is slightly decreased in comparison to the combined time- and distance trigger method (2). Furthermore, we observed that the amount of trigger events outside the study region was significantly reduced, while the number of trigger events at seldom visited types of land use increases. Considering the defined trigger criteria, it becomes evident that method 3 (incorporating land use) has a better distribution over various land use types, as expected, whereas method 4 (incorporating land use and population density) triggers in more unique districts. Moreover, method 4 triggers more often due to the spatial criteria and less often due to the time limit of 100 min in comparison to method 3. Apart from that, it seems that method 3 and method 4 yield comparable outcomes. For the purpose of this study, however, we favour method 3 because it requires less data and is not influenced by administrative borders/districts or any subjective decisions regarding, e.g., the population density thresholds applied in method 4. Finally, similar studies with participants in other areas could be conducted in order to validate the benefits of incorporating a location-based sampling scheme in an ambulatory assessment.
In an ambulatory assessment study addressing the correlation between the environment and mood, a time-based sampling scheme is likely to miss rare events, such as participant visits to a recreation area. By including real-time capture of the geographical coordinates of the participants, a self-report can be triggered at desired locations. The location-based sampling scheme increases the number of unique trigger locations as well as the amount of reports at rarely visited land uses. The increased variance in the collected data is expected to be of importance when addressing the relationship between the environment and mood.