In the spirit of the 50th anniversary of Earth Day, our objective was to follow the Global Earth Challenge’s call for the use of citizen science data that are findable, accessible, interoperable, and reusable (FAIR, globalearthchallenge.earthday.org) [16]. By leveraging open data created by three citizen science programs, we created GMOD with the overall aim of increasing awareness and developing consistent, real-time surveillance of mosquitoes globally, with the final target of reducing the global burden of disease. Partnering with GLOBE Observer, Mosquito Alert, and iNaturalist, our process began with the preparation, processing, and eventual integration of OGC-standard data.
Data sources
GLOBE Observer: Mosquito Habitat Mapper and Land Cover
The Global Learning and Observations to Benefit the Environment (GLOBE) Program is an international science and education program sponsored by NASA and supported by NSF, NOAA, and the U.S. Department of State. Established in 1995, the GLOBE Program was originally implemented in K-12 classrooms. In 2016, the GLOBE Observer mobile application was released, enabling citizen scientists of all ages to participate in NASA science. GLOBE Observer is available for use in 15 languages across 127 countries. Within the application are two tools of primary interest for this research: the Mosquito Habitat Mapper (MHM) and the Land Cover tool. MHM, launched in 2017, supports citizen science efforts to identify, locate, and eliminate breeding sites of medically important mosquitoes from the Aedes (Ae.), Anopheles (An.), and Culex (Cx.) genera. In addition to mosquito specimens, MHM enables users to document suitable habitats (e.g. standing water, tires) along with larval and pupal observations (presence/absence and number). Users are encouraged, but not required, to identify their mosquito submissions to genus. The application guides users through a visual key for identifying mosquito larvae based on anatomical characteristics of the distal segment of the abdomen under magnification. Characteristics evaluated include the presence, absence, and appearance of the siphon, saddle, pecten, tufts, and comb scales.
Unique to MHM, users are prompted to report if they have the ability to mitigate mosquito habitats, by removing water and/or covering containers—a process called source reduction. This feature serves two beneficial purposes by (1) encouraging citizen scientists to be actively involved in community-based mosquito surveillance, and (2) actively reducing the burden of mosquito-borne diseases locally. The GLOBE Land Cover tool enhances characterization of the mosquito habitat observations collected with the MHM tool by allowing users to provide ecological conditions and additional photos of the environment around the targeted habitat. The interface has a built-in compass that orients users to provide six oriented photos for each location: north, south, east, west, up, and down.
Mosquito Alert
Launched in 2014 and available in 18 languages, Mosquito Alert is managed by the following public research institutions: the Centre for Ecological Research and Forestry Applications (CREAF, https://www.creaf.cat/), Pompeu Fabra University (UPF, www.upf.edu), the Catalan Institution for Research and Advanced Studies (ICREA, https://www.icrea.cat/), and the Blanes Center for Advanced Studies (CEAB-CSIC, http://www.ceab.csic.es/en/). Initially focused on controlling diseases spread by Ae. albopictus in Spain, the program has expanded to include citizen science collections targeting Ae. aegypti, Ae. japonicus, Ae. koreicus, and Culex spp. Mosquito Alert prompts users to send observations of adult mosquitoes, breeding habitats, and/or bites encountered. All adult mosquito submissions are validated by a team of entomologists using the accompanying photographs.
iNaturalist
Launched in 2008, iNaturalist is a social network of citizen scientists connected via shared observations of biodiversity, extending to plants, animals, and fungi. Insects comprise a significant portion of all submissions, amounting to ~25% of all observations. Of all insect submissions, mosquitoes total ~78,000 (as of mid-April 2023), representing 407 species globally. iNaturalist identifies observations through crowdsourcing. More specifically, observations classified as “Research Grade” have at least 2/3 agreement among community members on the species, or on the next finest taxonomic level if species-level identification is not possible. Uniquely, iNaturalist also uses tens of millions of photos to train artificial intelligence (AI) models that provide identification suggestions for more than 38,000 taxa [17].
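The two-thirds agreement rule with fallback to a coarser taxonomic level can be illustrated with a minimal sketch. This is a simplification for illustration only, not iNaturalist's actual algorithm: the function name `consensus_taxon` and the representation of each identification as a lineage tuple (coarsest to finest rank) are assumptions introduced here.

```python
from collections import Counter
from fractions import Fraction


def consensus_taxon(lineages, threshold=Fraction(2, 3)):
    """Return the finest taxon supported by at least `threshold` of identifiers.

    `lineages` is a list of tuples ordered from coarsest to finest rank,
    e.g. ("Aedes", "Aedes albopictus"). This layout is hypothetical.
    """
    if not lineages:
        return None
    max_depth = max(len(lineage) for lineage in lineages)
    # Walk from the finest rank toward the coarsest, stopping at the first
    # rank where a single taxon reaches the agreement threshold.
    for depth in range(max_depth - 1, -1, -1):
        counts = Counter(l[depth] for l in lineages if len(l) > depth)
        taxon, votes = counts.most_common(1)[0]
        # Every identifier counts in the denominator, even those who
        # identified only to a coarser rank.
        if Fraction(votes, len(lineages)) >= threshold:
            return taxon
    return None


species_level = consensus_taxon([
    ("Aedes", "Aedes albopictus"),
    ("Aedes", "Aedes albopictus"),
    ("Aedes", "Aedes aegypti"),
])
genus_level = consensus_taxon([
    ("Aedes", "Aedes albopictus"),
    ("Aedes", "Aedes aegypti"),
    ("Aedes",),
])
```

In the first case two of three identifiers agree on the species, so the species is returned; in the second, no species reaches two-thirds, and the sketch falls back to the genus that all three share.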
Application development pipeline
Microsoft Azure Data Factory and storage
Data from each of the three sources are accessed via public URLs and stored in the Microsoft Azure Cloud. Using a scheduled Azure Data Factory pipeline, raw data are accessed daily from their respective source URLs and copied into an Azure Blob storage container called “Raw Data”. GLOBE MHM and Land Cover data are accessed in a standard .csv format. Mosquito Alert data are accessed in a flat-object .json format. iNaturalist data, which use the UTF-16LE Unicode transformation format, are accessed as a lengthy, multi-tiered, nested-object .json format. The iNaturalist data were complex enough that a special request was made to Microsoft to modify the way Azure Data Factory copied .json data. Further, the data wrangling tool used to clean and standardize the iNaturalist data also required modification by its creators so that such a large .json string could be ingested into memory. Under the storage standards, all data copied into the Azure Cloud are backed up daily on a schedule; backups are kept in a “hot” (quick-access) state for 12 months and are moved to “cold” (archived) storage after 18 months.
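The three input formats described above can be parsed with the Python standard library, as in the following minimal sketch. It is not the Azure Data Factory pipeline itself. The function names, the sample payloads, and their field names are hypothetical, and the assumption that the .csv and flat .json sources are UTF-8 encoded is ours rather than something stated by the data providers.

```python
import csv
import io
import json


def parse_csv(raw_bytes):
    """Parse a .csv payload (assumed UTF-8) into a list of row dictionaries."""
    text = raw_bytes.decode("utf-8-sig")  # also tolerates a UTF-8 byte-order mark
    return list(csv.DictReader(io.StringIO(text)))


def parse_json(raw_bytes, encoding="utf-8"):
    """Parse a .json payload, dropping a leading byte-order mark if present."""
    text = raw_bytes.decode(encoding).lstrip("\ufeff")
    return json.loads(text)


# Hypothetical payloads standing in for the three raw sources.
globe_raw = b"site_id,genus\n101,Aedes\n"
alert_raw = b'[{"id": 7, "type": "adult"}]'
inat_raw = json.dumps(
    {"results": [{"id": 9, "taxon": {"name": "Culex pipiens"}}]}
).encode("utf-16-le")

globe_rows = parse_csv(globe_raw)                       # standard .csv
alert_rows = parse_json(alert_raw)                      # flat-object .json
inat_doc = parse_json(inat_raw, encoding="utf-16-le")   # nested UTF-16LE .json
```

Passing the encoding explicitly matters for the nested source: decoding UTF-16LE bytes as UTF-8 would either fail or yield text interleaved with null characters.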
Trifacta: data standardization
Within the Azure Cloud, raw data are imported into a service called Trifacta, an application designed for wrangling large, raw datasets. Following the Open Geospatial Consortium (OGC) standard for data formatting, each data set is cleaned and restructured to match the OGC’s SensorThings format, enabling cross-source data analysis [18]. Every change made to the data structure is tracked and recorded to document data provenance. Within Trifacta, raw data are transformed to the OGC standard, with each transformation documented in a “recipe.” Each recipe contains customized formatting of data, including text and number conversion, creation of new fields, and calculations; the recipes can be found in Additional file 1. The resulting data set from each recipe is then exported in .json format to an Azure Blob storage container called “Derived Data.” Each recipe is scheduled to run daily after the raw data have been copied into the Azure Cloud via Azure Data Factory. Since each dataset follows the OGC standard, all data are stored as nested objects in .json format. To enable ingestion into the Esri ArcGIS® Online service, datasets are run through one more recipe, which presents the data as a flat .json data set.
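The two-stage shape of this transformation, first nesting a source record in a SensorThings-style Observation and then flattening it for ingestion, can be sketched as follows. This is an illustration in Python, not a Trifacta recipe (the actual recipes are in Additional file 1). The entity and property names `Datastream`, `FeatureOfInterest`, `phenomenonTime`, and `result` come from the SensorThings data model; the input field names, the function names, and the underscore-joined flat keys are assumptions made for this example.

```python
def to_sensorthings(record, source):
    """Wrap a flat source record in a nested, SensorThings-style Observation."""
    return {
        "phenomenonTime": record["observed_at"],  # hypothetical input field
        "result": record["genus"],                # hypothetical input field
        "Datastream": {"name": f"{source} mosquito observations"},
        "FeatureOfInterest": {
            "name": f"{source} site",
            "feature": {
                "type": "Point",
                "coordinates": [record["lon"], record["lat"]],
            },
        },
    }


def flatten(obj, prefix="", sep="_"):
    """Collapse nested dictionaries and lists into a single-level dictionary."""
    flat = {}
    items = obj.items() if isinstance(obj, dict) else enumerate(obj)
    for key, value in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


record = {"observed_at": "2021-06-01T10:00:00Z", "genus": "Aedes",
          "lat": 41.39, "lon": 2.17}
nested = to_sensorthings(record, "Mosquito Alert")
flat = flatten(nested)
```

Keeping the nested form as the stored standard and deriving the flat form from it means the flat layout can change to suit the mapping service without touching the standardized data.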
End user display customization and experience: Esri ArcGIS Online applications
This project’s end user visual displays are created within the ArcGIS Online application interface. After Trifacta completes its daily transformation of the raw data, Esri’s ArcGIS Online web application (via ArcGIS® Notebooks) pulls each of the four flattened, OGC-formatted .json data sets (GLOBE MHM, GLOBE Land Cover, Mosquito Alert, and iNaturalist) into the ArcGIS Online application interface and stores them as feature layers. Feature layers are then incorporated into the Web Map application element. This integration step is the first instance where all four data streams are displayed simultaneously. This map serves as the spatial representation of all submissions of mosquito, habitat, and land cover data and is the centerpiece of the overall dashboard visualization. In addition to the feature layers, each data set can be accessed via a RESTful API endpoint provided by Esri ArcGIS Online. An example flow of the steps performed by Python scripts in ArcGIS Notebooks can be found in Additional file 1.
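As a minimal sketch of how a client might address such a REST endpoint, the snippet below builds a query URL for a hosted feature layer. The `query` operation and the `where`, `outFields`, `resultRecordCount`, and `f` parameters belong to the ArcGIS REST API; the layer URL, the `genus` field, and the function name are placeholders, not the project's actual endpoint or schema. No network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(layer_url, where="1=1", out_fields="*", max_records=100):
    """Build a feature-layer query URL; returns a string, sends nothing."""
    params = {
        "where": where,                    # SQL-style attribute filter
        "outFields": out_fields,           # "*" requests all fields
        "resultRecordCount": max_records,  # cap on the number of features returned
        "f": "json",                       # response format
    }
    return f"{layer_url.rstrip('/')}/query?{urlencode(params)}"


# Placeholder layer URL; substitute the real feature layer address.
LAYER_URL = "https://example.com/arcgis/rest/services/mosquito/FeatureServer/0"

url = build_query_url(LAYER_URL, where="genus = 'Aedes'", max_records=10)
query = parse_qs(urlsplit(url).query)
```

The resulting URL can be fetched with any HTTP client, which is what makes the feature layers usable outside the dashboard itself.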
Pop-ups
Within the web map application element, ArcGIS Online provides a variety of options to display and interact with data in the maps. Among these options is the pop-up configuration, a custom information box that “pops up” on any given observation point when clicked by a user. With custom elements coded in the ArcGIS® Arcade expression language, pop-ups display alphanumeric text and media to aid in data exploration (Arcade code examples are found in Additional file 1).
Dashboards
ArcGIS Online includes ArcGIS® Dashboards, a dashboard creation tool that allows users to build their display through a variety of built-in tools, including elements (e.g. headers, widgets, footers, graphs, data selectors, numeric indicators), layouts, and themes. Furthermore, each tool offers several additional customizable options for including media, dynamic text, external links, and other features.
Experience Builder
The ArcGIS Dashboards tool is an excellent method for visually representing data dynamically. However, ArcGIS® Experience Builder provides a full suite of complementary tools that enable deeper data interaction and exploration, providing the look and feel of a true, custom website. The creation and success of GMOD rely on the premise of being free to everyone, intuitive, and, most importantly, accessible. This extends to viewership on all platforms: desktop computer, mobile phone, and tablet screens. Because these screen types vary in resolution and aspect ratio, separate dashboards were created for each (the ArcGIS Dashboards creation tool provides options for different platforms). Experience Builder allows all three screen types to be integrated, so that all platforms share the same web address. Another key feature of Experience Builder is the ability to link external websites as “buttons”. In GMOD, these clickable buttons let users explore the raw and summarized data, view publications relevant to the dashboard, and subscribe to our mailing list (connected through the included application ArcGIS® Survey123).
Hub
The final key tool is ArcGIS® Hub℠, a cloud platform that facilitates rapid sharing and configuration of content. When ArcGIS® Open Data is enabled for an ArcGIS Online organization, Hub can be used to share an authoritative data repository that grants users full access to the data in a variety of formats (.csv, .kml, .shp, GeoJSON, or file geodatabase) derived from the stored feature layers. This feature is critical to our mission of data accessibility, transparency, and equity, all of which are essential in promoting data sharing with those interested in using the data. The data links to Hub are added via customizable “buttons” created using Experience Builder. A summary of this entire workflow is available in Fig. 1.
Fig. 1 Flow diagram of the data pipeline, from source to end-user dashboard product