Julian, Bustos, Kwong, Badilla, Lee, Bischel, and Canales: Quantifying human-environment interactions using videography in the context of infectious disease transmission

Quantifying human-environment interactions using videography in the context of infectious disease transmission


Quantitative data on human-environment interactions are needed to fully understand infectious disease transmission processes and conduct accurate risk assessments. Interaction events occur during an individual’s movement through, and contact with, the environment, and can be quantified using diverse methodologies. Methods that utilize videography, coupled with specialized software, can provide a permanent record of events, collect detailed interactions in high resolution, be reviewed for accuracy, capture events difficult to observe in real time, and gather multiple concurrent phenomena. In the accompanying video, the use of specialized software to capture human-environment interactions for human exposure and disease transmission is highlighted. Use of videography, combined with specialized software, allows for the collection of accurate quantitative representations of human-environment interactions in high resolution. Two specialized programs include the Virtual Timing Device for the Personal Computer, which collects sequential microlevel activity time series of contact events and interactions, and LiveTrak, which is optimized to facilitate annotation of events in real time. Opportunities to annotate behaviors at high resolution using these tools are promising, permitting detailed records that can be summarized to gain information on infectious disease transmission and incorporated into more complex models of human exposure and risk.


Infectious diseases, such as enteric and respiratory diseases, are frequently transmitted through the environment. This process is often conceptualized with the F-diagram, a model of the transmission of pathogens through one of six environmental reservoirs: fluids, fingers, fields, foods, flies, and fomites (Figure 1). The F-diagram describes the movement and fate of pathogens as a process originating from an infected person and moving through the environment to a susceptible person (Kawata, 1978). The process may be described as linear (e.g., open defecation contaminates drinking water; Figure 1, arrows A→D) but is frequently more complicated. For example, hand contamination from changing an infant’s diaper is transferred to stored drinking water during collection and subsequently ingested (Figure 1, arrows B→C→D).

Implicit in the F-diagram, via the arrows connecting the environmental reservoirs, are human-environment interactions, which describe the impacts of a person’s movement through and contacts with the environment on pathogen transmission. The interactions relevant to disease transmission include both those of the infected person and the susceptible person. Human-environment interactions that influence disease transmission occur across all scales, from large (e.g., global migration patterns) (Sorichetta et al., 2016) to small (e.g., a person’s individual contacts with contaminated surfaces) (Julian and Pickering, 2015). Accounting for these interactions helps in the understanding and control of environmentally-transmissible diseases.

Human-environment interactions can be quantified using diverse methodologies that each provide value in understanding infectious disease transmission. Amongst the most common methodologies are: i) mining census data; ii) surveys; iii) diaries; iv) structured observations; v) electronic tracking (e.g., radiofrequency identification, GPS, accelerometers); and vi) videography. Videography methods to capture interactions are playing an increasingly important role in assessing the contribution of environmental pathways of pathogen transmission.

As relatively low-cost and simple methods, surveys, diaries, and structured observations have been broadly applied to understand human behavior. Surveys are retrospective questionnaires, and are increasingly taking advantage of mobile technologies (i.e., mHealth) to disseminate information, survey health, and track disease (World Health Organization, 2011). Diaries encourage self-observation and recording of targeted behaviors (e.g., locations, activities, diet, and/or social networks) (Read et al., 2008). Structured observations shift data collection from the study participants to trained observers who monitor people’s behaviors in real time. Structured observations have been used to, for example, quantify environmental exposures (Teunis et al., 2016) and water, sanitation, and hygiene behaviors (Ram et al., 2010). Despite their widespread use, these tools often introduce bias (e.g., social desirability, recall, reporting, and/or observer bias), require a priori knowledge of which behaviors to target, and provide only limited data resolution.

Automated tracking technologies (e.g., radiofrequency identification devices (RFID), Global Positioning System (GPS) tracking, accelerometers, and pressure sensors) have been employed to study infectious disease transmission. For example, RFID and GPS have been used to monitor and quantify social contacts, and accelerometers have been used to monitor hygiene intervention compliance (Ram et al., 2010). Sensors for recording human-environment interactions can collect large amounts of data over extended time-scales with little investment. However, human behavior monitoring with tracking technologies remains constrained by technological limitations. GPS, for example, provides data on location; accelerometers track local movements; and pressure sensors track changes in flow or volume, which is useful for monitoring water usage. Extending such data to human-environment interactions requires inference from these limited measurements.


Videography offers the opportunity to record and quantify human-environment interactions in high resolution through the collection, annotation, and analysis of digitized data. Unlike surveys, diaries, or structured observations, videos provide a permanent objective record. The videos can be reviewed to ensure accuracy, quantify inter-observer variation of targeted activities, and/or observe multiple concurrent phenomena independently. The use of multiple data collectors annotating the same video may help reduce observer bias. When used appropriately, hidden capture techniques may reduce social desirability and/or reactivity. Videos can also capture data that would be difficult or impossible to observe in real time, such as hand configuration during hand-object contacts (AuYeung et al., 2008).

Since videographic methods may capture personal or private information, culturally sensitive data protection and management plans should be developed and institutional review board approval should be sought.

Video collection

Videos may be collected via third-person perspective, first-person perspective (wearable), or hidden cameras. Depending on the collection method, videographers may need to stay available on-site during recording. Hidden and third-person perspective cameras may involve stationary cameras tracking a single location (e.g., observing hand hygiene stations) or may be carried by a data collector. First-person perspective cameras may be worn either on the head or chest to capture, for example, hand contact events, mouthing events, and/or social contacts (Julian and Pickering, 2015). Limitations to recording length (e.g., storage, battery life) need to be considered in protocols.
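As a rough planning aid for the storage and battery limits noted above, the longest continuous recording can be estimated from card capacity, video bitrate, and battery runtime. The sketch below is illustrative only: the function name and any example figures (such as a 64 GB card recording at 12 Mbit/s with a 2.5 hour battery) are assumptions and are not taken from this study.

```python
def max_recording_hours(storage_gb: float, bitrate_mbps: float, battery_hours: float) -> float:
    """Return the longest continuous recording (hours) allowed by storage and battery.

    Assumes decimal units (1 GB = 8e9 bits, 1 Mbit = 1e6 bits) and a constant bitrate.
    """
    if storage_gb <= 0 or bitrate_mbps <= 0 or battery_hours <= 0:
        raise ValueError("all inputs must be positive")
    storage_bits = storage_gb * 8e9
    storage_seconds = storage_bits / (bitrate_mbps * 1e6)
    storage_hours = storage_seconds / 3600
    # Whichever resource runs out first limits the session.
    return min(storage_hours, battery_hours)
```

With the assumed figures above, the battery rather than the card would be the limiting factor, so protocols might plan for battery swaps before adding storage.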

Video annotation

An important challenge in the use of videography is the need to annotate, or translate, the video into a digital record of events that can be analyzed downstream. Annotation by hand follows methods employed by structured observations, where protocols are developed for the observation and recording of specific, targeted events (e.g., monitoring timing and frequency of handwashing (Ram et al., 2010)). Alternatively, several software packages have been employed to facilitate video annotation.

The recently updated Virtual Timing Device for the Personal Computer [VTDPC] (University of Arizona, Tucson, AZ, USA) was developed to quantify human-environment interactions from previously obtained videos (Zartarian et al., 1997; Ferguson et al., 2006). In brief, a video is annotated by the software user using a palette of customizable categories of actions or objects. For example, an action category may include child’s hand in mouth, object in mouth or touching; an object category may include plastic, food, or soil; and a location category may include indoor, outdoor or more specific locations. The integrated video plays when objects from each category are selected and can be paused by unselecting one or more objects. VTDPC records the items selected in each category as well as the duration of the selection. The resultant time-series of contact events is often referred to as microlevel activity time series [MLATS] (Zartarian et al., 1997; Ferguson et al., 2006). The output is a sequential list of selections from within each category, the duration of the selection, and associated metadata (i.e., date, time, data collector’s ID).
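The output described above (a sequential list of selections per category, each with a duration) can be pictured as a list of simple records. The following is a minimal sketch of such a structure, not the actual VTDPC file format: the class name, field names, and example rows are hypothetical and chosen only to mirror the action, object, and location categories mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MlatsRecord:
    """One row of a hypothetical MLATS export: a selection held for a duration."""
    action: str        # e.g. "hand in mouth", "touching"
    obj: str           # e.g. "plastic", "food", "soil"
    location: str      # e.g. "indoor", "outdoor"
    duration_s: float  # how long this combination stayed selected


def total_duration(records, **criteria):
    """Sum duration (seconds) over records matching every field=value criterion."""
    return sum(
        r.duration_s
        for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    )


# Illustrative rows only; these are not data from the study.
example = [
    MlatsRecord("touching", "soil", "outdoor", 4.0),
    MlatsRecord("hand in mouth", "food", "outdoor", 1.5),
    MlatsRecord("touching", "plastic", "indoor", 6.5),
]
```

Keeping the sequence order intact matters here, because later exposure models rely on which contact followed which.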

LiveTrak (Stanford University, Stanford, CA) was recently developed to complement VTDPC for annotation of events observed in real time (i.e., structured observations). In brief, LiveTrak is a tablet-based software package that provides similar functionality to VTDPC through the a priori creation of a palette containing categories of objects. Like VTDPC, items are selected from each category and the output is a list of selections from within each category with the duration of each selection and associated metadata (i.e., date, time, data collector’s ID).
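One way to think about real-time annotation is as a log of timestamped selection changes from which durations are derived afterwards. The sketch below illustrates that idea under the assumption that each selection lasts until the next one begins; it is not a description of how LiveTrak is implemented, and the function name and input layout are hypothetical.

```python
def events_to_durations(events, end_time):
    """Convert [(timestamp_s, selection), ...] into [(selection, duration_s), ...].

    Each selection is assumed to last until the next event; the final one
    lasts until end_time. Events must already be in chronological order.
    """
    durations = []
    for i, (start, selection) in enumerate(events):
        stop = events[i + 1][0] if i + 1 < len(events) else end_time
        if stop < start:
            raise ValueError("events must be chronological and end before end_time")
        durations.append((selection, stop - start))
    return durations
```

For example, a call such as `events_to_durations([(0.0, "soil"), (2.5, "food")], 4.0)` would pair each selection with the time it remained active.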

Annotation of videos is expensive in terms of labor, in part because of the typical requirement for repeat viewing. For example, 4-5 hours of annotation may be required for one hour of video. The development of computer-vision algorithms for automatic videographic annotation is thus a promising and rapidly advancing field that could alleviate these issues and improve privacy protection. However, automated annotation of the natural movements of objects such as hands in uncontrolled environments remains challenging.
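The 4-5 hour figure can be turned into a simple labor estimate when budgeting a study. The sketch below assumes the ratio scales linearly with video length and with the number of annotators coding the same footage; the function name and the `annotators` parameter are illustrative assumptions.

```python
def annotation_hours(video_hours, ratio_low=4.0, ratio_high=5.0, annotators=1):
    """Return (low, high) estimated person-hours needed to annotate the video.

    The default ratios follow the 4-5 hours of annotation per hour of video
    quoted in the text; linear scaling is an assumption.
    """
    if video_hours < 0 or annotators < 1:
        raise ValueError("video_hours must be >= 0 and annotators >= 1")
    low = video_hours * ratio_low * annotators
    high = video_hours * ratio_high * annotators
    return low, high
```

Under these assumptions, 20 hours of video coded independently by two annotators would call for roughly 160 to 200 person-hours.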

Videographic analysis, like other methods reliant on trained enumerators, is subject to inter- and intra-observer variation (Ferguson et al., 2006). Protocols that include training and inter- or intra-observer comparison are needed to reduce observer error and misclassification.
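Inter-observer comparison is often summarized with a chance-corrected agreement statistic. The text does not prescribe a particular statistic, so the sketch below uses Cohen's kappa as one common choice, assuming two annotators have labeled the same sequence of video segments with one categorical label each.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' labels over the same segments."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed proportion of segments where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)
```

A value near 1 indicates close agreement, while a value near 0 indicates agreement no better than chance, which may signal that annotator training or category definitions need revisiting.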

Data analysis

Analysis of the annotation output can be relatively simple, typically involving the use of summary statistics of frequency and duration of targeted events (Zartarian et al., 1997; Clack et al., 2017). More complex analyses utilize the micro-level activity time series to model sequential events for use in temporal exposure, dose, and risk assessment (Canales and Leckie, 2007; Julian and Pickering, 2015).
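The summary statistics mentioned above can be computed directly from a list of event durations. The following is a minimal sketch, assuming the durations (in seconds) of one targeted event type have already been extracted from the annotation output; the function name and the keys of the returned dictionary are illustrative.

```python
from statistics import mean, median


def summarize_contacts(durations_s, observed_hours):
    """Summarize frequency and duration for one targeted event type."""
    if observed_hours <= 0:
        raise ValueError("observed_hours must be positive")
    n = len(durations_s)
    return {
        "count": n,
        "frequency_per_hour": n / observed_hours,
        # Mean and median are undefined when no events were observed.
        "mean_duration_s": mean(durations_s) if n else None,
        "median_duration_s": median(durations_s) if n else None,
        "total_contact_s": sum(durations_s),
    }
```

Reporting the median alongside the mean is a deliberate choice in this sketch, since contact durations are often skewed by a few long events.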

In the accompanying video we motivate and highlight the use of videography for the field of infectious disease transmission through demonstrations of VTDPC and LiveTrak. We demonstrate the capture of hand contact events using the software, which are otherwise difficult to observe through traditional methods. The example focuses on human-environment interactions at small, local scales. However, videography is readily extensible to other scenarios, including: i) human and/or animal migration patterns, social interactions and/or geographic distributions; ii) behaviors and practices around water, sanitation, hygiene, and/or energy; and/or iii) performance of targeted activities for efficiency and/or safety evaluations.


To conduct accurate health risk assessments and better control infectious disease transmission, a greater understanding of human-environment interactions is needed. Here, we describe using videography and specialized software to better quantify human-environment interactions. Two specialized programs include VTDPC, which collects sequential microlevel activity time series of contact events and interactions, and LiveTrak, which is optimized to facilitate annotation of events in real time. Opportunities to annotate behaviors at high resolution using these tools are promising. Further breakthroughs in computer-vision algorithms for annotation will be useful to fully unleash the power of videography in quantifying human movements for diverse applications. Windows Movie Maker and Microsoft PowerPoint were utilized in creating media for this work.


The authors would like to thank Tristan Schubert for contributions to the accompanying tutorial.



W AuYeung, RA Canales, JO Leckie, 2008. The fraction of total hand surface area involved in young children’s outdoor hand-to-object contacts. Environ Res 108:294-9.


RA Canales, JO Leckie, 2007. Application of a stochastic model to estimate children’s short-term residential exposure to lead. Stochast Environ Res Risk Assess 21:737-45.


L Clack, M Scotoni, A Wolfensberger, H Sax, 2017. ‘First-person view’ of pathogen transmission and hand hygiene–use of a new head-mounted video capture and coding tool. Antimicrob Resist Infect Control 6:108.


AC Ferguson, RA Canales, P Beamer, W AuYeung, M Key, A Munninghoff, KT Lee, A Robertson, JO Leckie, 2006. Video methods in the quantification of children’s exposures. J Expo Sci Environ Epidemiol 16:287-98.


TR Julian, AJ Pickering, 2015. A pilot study on integrating videography and environmental microbial sampling to model fecal bacterial exposures in peri-urban Tanzania. PLoS One 10:e0136158.


K Kawata, 1978. Water and other environmental interventions-the minimum investment concept. Am J Clin Nutr 31:2114-23.


PK Ram, AK Halder, SP Granger, T Jones, P Hall, D Hitchcock, R Wright, B Nygren, MS Islam, JW Molyneaux, SP Luby, 2010. Is structured observation a valid technique to measure handwashing behavior? Use of acceleration sensors embedded in soap to assess reactivity to structured observation. Am J Trop Med Hyg 83:1070-6.


JM Read, KTD Eames, WJ Edmunds, 2008. Dynamic social networks and the implications for the spread of infectious disease. J R Soc Interface 5:1001-7.


A Sorichetta, TJ Bird, NW Ruktanonchai, E Erbach-Schoenberg, C Pezzulo, N Tejedor, IC Waldock, JD Sadler, AJ Garcia, L Sedda, AJ Tatem, 2016. Mapping internal connectivity through human migration in malaria endemic countries. Sci Data 3:160066.


PFM Teunis, HE Reese, C Null, H Yakubu, CL Moe, 2016. Quantifying contact with the environment: behaviors of young children in Accra, Ghana. Am J Trop Med Hyg 94:920-31.


World Health Organization, 2011. mHealth: New horizons for health through mobile technologies: second global survey on eHealth. Available from: www.who.int/goe/publications/goe_mhealth_web.pdf.


VG Zartarian, AC Ferguson, CG Ong, JO Leckie, 1997. Quantifying videotaped activity patterns: video translation software and training methodologies. J Expo Anal Environ Epidemiol 7:535-42.

Figure 1.

The F-diagram, a conceptual model of the transmission of infectious disease from an infected to a susceptible person through environmental reservoirs. Circles in solid black represent environmental reservoirs, dashed curved lines in red represent the impact of interventions, and dotted straight arrows in black represent specific scenarios of pathogen movements through the reservoirs. The curved lines are dashed to represent the incomplete protection of interventions due to insufficient technological effectiveness and/or lack of compliance.


Copyright (c) 2018 Timothy R. Julian, Carla Bustos, Laura Kwong, Alejandro D. Badilla, Julia Lee, Heather N. Bischel, Robert A. Canales

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
© PAGEPress 2008-2018. PAGEPress is a registered trademark property of PAGEPress srl, Italy.