Households in developing countries often rely on alternative shared water sources that are absent from the datasets of public service providers. This poses a significant challenge to accurately measuring the number of households outside the public service system that use a safe and accessible water source. This article proposes a novel deep learning approach that uses a convolutional neural network to detect and geo-reference communal water points in Google Street View imagery. In a case study of the Agege local government area in Lagos, Nigeria, the model processed 39 kilometres of street network in 26 minutes, detecting 36 previously unregistered water points with 94.7% precision and US$0 in out-of-pocket expenses. In doing so, it presents a highly precise, low-cost, and scalable solution to closing geospatial data gaps on WASH access in developing countries.
The article presents a novel deep learning approach that utilizes a YOLOv5 object detection model to locate shared water points using street-level imagery from Google Street View.
In a pilot of the Agege LGA in Lagos, Nigeria, the model processed 39 kilometres of street network in 26 minutes, successfully detecting 36 previously unregistered water points with 94.7% precision.
Model performance is evaluated in terms of inference speed, cost, and scalability.
As the international community races to achieve the ambitious target of universal access to safe water, sanitation, and hygiene (WASH) services, reliable data are crucial for monitoring progress. Current data collection typically relies on field surveys of households. While this approach is useful for measuring national or state-level progress, survey data often lack the sample size required to be statistically representative at lower administrative levels, concealing the substantial inequalities in water access that exist at the municipal or ward level. This limits the capacity of service providers to optimally target water infrastructure extension to underserved households (Pullman et al. 2014; Ntozini et al. 2015).
Water point mapping (WPM) offers an opportunity to monitor water access at lower administrative levels by compiling a registry of water point coordinate locations (Jiménez & Pérez-Foguet 2011). This practice has scaled through the development of global platforms such as the Water Point Data Exchange (WPDx), which contains nearly 700,000 records of water points compiled primarily from public and donor institutions (Global Water Challenge 2023). However, such databases often fail to maintain complete records of existing public water points and are rarely updated on a regular basis (Yu et al. 2019). Furthermore, many households in developing countries use alternative water sources such as private vendors and community-managed points that are missing from public registries, leading to significant coverage gaps in existing geospatial datasets (Andres et al. 2019; Stoler et al. 2019).
Several studies have used predictive modelling techniques to generate gridded estimates of WASH access. Ajisegiri et al. (2019) applied a model-based geostatistics (MBG) approach, commonly used in infectious disease mapping, to predict access to WASH services in Nigeria at a spatial resolution of 1 km × 1 km. Yu et al. (2019) used a maximum entropy approach to predict the geospatial distribution of drinking water sources in Kenya from the WPDx database, also at a resolution of 1 km × 1 km. Dejito et al. (2021) improved the spatial resolution of WASH access predictions to 250 m × 250 m using satellite imagery, census data, and random forest regression in Colombia. However, these studies estimate access over grid cells; there remains a research gap in using deep learning techniques to identify the precise locations of individual water points in developing countries.
One under-explored data source for WPM and detection is street-level imagery available through platforms such as Google Street View (GSV). Street-level images, paired with recent advancements in computer vision techniques such as convolutional neural networks (CNNs), provide a rich source of visual information on the built environment. CNNs use layers of computational units or ‘neurons’ to process overlapping sections of an image, enabling them to detect and extract visual features from GSV imagery. In urban settings, CNNs have been successfully deployed to identify and predict urban features such as tree coverage and pedestrian behaviour using data derived from GSV images (Cai et al. 2018; Doiron et al. 2022). Computer vision analysis of street-level imagery therefore offers a unique opportunity to detect the actual locations of water points in developing countries, an opportunity that has yet to be explored in the WPM literature.
This article proposes a novel computer vision approach that uses a convolutional neural network to detect and geo-reference publicly accessible water points in GSV imagery in Lagos, Nigeria. In doing so, it presents a scalable, time-efficient, and low-cost method to quantify and locate water points in urban areas of developing countries.
Study area
Lagos, Nigeria is one of the largest coastal megacities in Africa, with a metropolitan area population of 24 million (C40 Cities 2019). Most of the population is unconnected to the public water utility – only 3.1% of Lagos residents receive piped water from on-premises connections and 44.4% of households rely on water sources (piped or non-piped) located off-premises – primarily motorized boreholes and improved hand dug wells (National Bureau of Statistics & UNICEF 2022). Given the limited coverage of the public utility, communal sharing of water resources is an important dynamic of water access in Lagos. More than 7,255 ‘publicly owned’ water points operate across Lagos state, the majority of which are financed by non-governmental entities such as the private sector, communities, or philanthropists, and are shared by members of a community. However, there is also a strong dynamic of sharing privately owned water facilities with neighbours or relatives; an estimated 18,138 water points are considered ‘publicly used’, suggesting a significant number of privately owned water points are shared communally, accounting for 45.5% of all water points in Lagos state (Federal Ministry of Water Resources et al. 2022).
Communal water points in Lagos state often have precarious management structures, with just 41.6% of publicly used water points having a caretaker in place. Consequently, one-third of publicly owned boreholes break down in their first year of operation, of which half remain non-functional for over a year (Federal Ministry of Water Resources et al. 2022). This necessitates proper data collection to continually monitor the reliability, functionality, and safety of communal water points.
The primary existing registry of public water points in Lagos State is the Geo-Referenced Infrastructure and Demographic Data for Development (GRID3) Nigeria Lagos Public Water Points Dataset, which contains a list of 1,180 communal and utility-managed water works, boreholes, and wells with latitude and longitude coordinates compiled between February 2018 and June 2019 (GRID3 Nigeria 2022). However, the GRID3 dataset contains just 16.3% of the total estimated publicly owned water points in Lagos state, illustrating the existing gap in geospatial data (Federal Ministry of Water Resources et al. 2022).
Model overview
The object detection CNN was built using the one-stage YOLOv5s architecture, chosen for its suitability for object detection and its low computational requirements (Ultralytics 2020). One-stage detectors such as the YOLO family of models perform inference in a single pass through the network, which typically makes them faster and less computationally demanding than their two-stage counterparts, at the expense of some accuracy. YOLOv5 is a family of five open-source object detection models, each with the same underlying architecture but varying levels of layer width and depth. The YOLOv5s or ‘Small’ model is optimized to balance accuracy with fast detection, making it well suited to analysing large datasets of GSV images with a high level of precision and without additional computational resources.
Ultimately, YOLOv5s is well suited for detecting water points in street-level imagery for two reasons. First, its rapid inference speed and low computational cost enable its use for analysing large image datasets from the GSV API without requiring expensive hardware. Second, the combined features of the path aggregation network (PANet) and multiple detection head layers allow the model to identify objects at various sizes and resolution levels, enabling the detection of water points located at varying distances and positions within street-level imagery.
The CNN was trained using a novel dataset of 215 labelled images of boreholes with faucets or handpumps, captured from GSV images sampled from across Nigeria (Patel 2023). Each image in the dataset was annotated in Roboflow with a bounding box around the entire water point and randomly allocated to the training (70%), validation (20%), or testing (10%) set (Dwyer et al. 2022). The model was trained for 150 epochs using stochastic gradient descent as the optimizer and a batch size of 16.
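The random 70/20/10 allocation described above can be illustrated with a minimal sketch. In the study the split was performed in Roboflow; the function below is not Roboflow's implementation, and the file names, the fixed seed, and the choice to give any rounding remainder to the test set are assumptions made for illustration.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle items and split them into train, validation, and test lists.

    The test set receives whatever remains after the train and validation
    shares are taken, so the three parts always cover every item once.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(train_frac * len(shuffled))
    n_val = int(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the 215 annotated images.
image_names = [f"image_{i:03d}.jpg" for i in range(215)]
train, val, test = split_dataset(image_names)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the allocation reproducible, so the held-out test images stay the same across training runs.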
YOLOv5s optimizes its weights by minimizing three forms of loss: box loss, objectness loss, and classification loss. The box loss measures the error between the predicted and actual bounding box coordinates and the centre of each object. Objectness loss measures the difference between the predicted and actual probability of an object existing within a given region. Finally, classification loss measures the classification error of the model. Since this model is trained on a single-class dataset, the classification loss is equal to zero and is therefore not reported. The model reached peak performance at epoch 148, where the validation loss was minimized and the precision and recall metrics had stabilized, indicating optimal model parameters. These weights were used for the final model.
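To make the idea of box error concrete, the sketch below computes intersection over union (IoU), a standard measure of overlap between a predicted and a ground-truth box. This is a simplified stand-in: the YOLOv5 implementation is understood to use a more elaborate IoU-based loss, and the corner-coordinate box format used here is an assumption chosen for readability.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max) in pixels.
    Returns a value in [0, 1]; 1 means the boxes coincide exactly.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predicted and labelled boxes for one water point.
predicted = (0, 0, 2, 2)
labelled = (1, 1, 3, 3)
print(iou(predicted, labelled))
```

A loss of the form 1 − IoU falls to zero as the predicted box converges on the labelled one, which is the behaviour the box loss rewards.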
The model generated predictions for each image and appended the records to the metadata CSV file saved during the image extraction process. The final output was a CSV file that contained the following attributes: date of image capture, panoramic ID, latitude, longitude, object class, and confidence level.
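The join between detections and image metadata can be sketched as follows. The column names, the in-memory data structures, and the sample values are hypothetical; the article specifies only which attributes the final CSV contains, not how the file was assembled.

```python
import csv
import io

# Column names are illustrative; the article lists the attributes only.
FIELDS = ["date", "pano_id", "lat", "lon", "object_class", "confidence"]


def append_detections(metadata_rows, detections):
    """Attach detections to image metadata, one output row per detection.

    metadata_rows: list of dicts with keys date, pano_id, lat, lon.
    detections: dict mapping pano_id to a list of (object_class, confidence).
    Images without detections produce no output rows.
    """
    rows = []
    for meta in metadata_rows:
        for object_class, confidence in detections.get(meta["pano_id"], []):
            rows.append(
                {**meta, "object_class": object_class, "confidence": confidence}
            )
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up metadata and detections, for illustration only.
metadata = [
    {"date": "2021-06", "pano_id": "PANO_A", "lat": 6.62, "lon": 3.32},
    {"date": "2021-06", "pano_id": "PANO_B", "lat": 6.63, "lon": 3.33},
]
detections = {"PANO_A": [("water_point", 0.91)]}
rows = append_detections(metadata, detections)
print(to_csv(rows))
```

Keying the join on the panorama ID means each detection inherits the capture date and coordinates of the image in which it was found.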
Model performance
While the model proved to be effective in a restricted study area, covering large urban areas can involve processing up to 1 million images, which demands substantial computing resources, funds, and time (Cai et al. 2018). Model scalability is therefore critical to its usefulness for urban planners, governments, and service providers. This article seeks to identify cost- and time-effective methods of detecting water points in developing countries without the need for expensive hardware.
There are three types of financial costs incurred from deploying the object detection model. The first is the cost of downloading images from GSV. As of September 2023, the Street View Static API uses a tiered pricing model, charging US$0.007 per image for the first 100,000 images and US$0.0056 per image for larger volumes. All accounts are offered a US$200 monthly credit, enabling a single user to download 28,571 images per month free of charge, sufficient to cover 357 km of street network (Google 2023). The second is the cost of acquiring computing hardware, such as central processing units (CPUs) and graphics processing units (GPUs), to conduct inference on the downloaded images. To avoid these costs, inference was conducted in Google Colaboratory (‘Colab’) using a free cloud-hosted GPU, which is better suited than a CPU to computationally intensive tasks such as deep learning. The third is the cost of deploying the deep learning model. For this research, Roboflow was used for image annotation, data pre-processing, model training, and deployment. Roboflow offers both free and premium platforms. The free platform allows inference on 10,000 images per month, with the potential to apply for up to 100,000 credits per month for open-source non-commercial projects related to research and education. The total cost depends on the desired time frame and street network coverage. Processing the estimated 12,000 km of GSV network in Lagos in a single run would cost US$5,316 and require between 139 and 423 h of processing time, which would exceed the limits of most free cloud GPUs. Alternatively, the same coverage can be completed at no cost over a period of 34 months using the monthly credits.
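The download-cost arithmetic can be checked with a short sketch. Three assumptions are made that the text does not state outright: the lower price applies to every image beyond the first 100,000, street coverage requires 80 images per kilometre (the ratio implied by equating 80,000 images with 1,000 km), and one monthly credit is subtracted from a single-run bill. Under these assumptions the sketch reproduces the figures quoted above.

```python
import math

TIER1_PRICE = 0.007     # USD per image for the first 100,000 images
TIER2_PRICE = 0.0056    # USD per image beyond that (assumed for all further images)
TIER1_LIMIT = 100_000
MONTHLY_CREDIT = 200.0  # USD credit per account per month
IMAGES_PER_KM = 80      # assumed from 80,000 images per 1,000 km


def download_cost(n_images, credit=MONTHLY_CREDIT):
    """Cost in USD of downloading n_images in one month, net of the credit."""
    tier1 = min(n_images, TIER1_LIMIT)
    tier2 = max(n_images - TIER1_LIMIT, 0)
    gross = tier1 * TIER1_PRICE + tier2 * TIER2_PRICE
    return max(gross - credit, 0.0)


def free_images_per_month(credit=MONTHLY_CREDIT):
    """Whole images covered by the monthly credit at the first-tier price."""
    return math.floor(credit / TIER1_PRICE)


def months_at_no_cost(n_images):
    """Months needed to download n_images using only the monthly credit."""
    return math.ceil(n_images / free_images_per_month())


lagos_images = 12_000 * IMAGES_PER_KM
print(round(download_cost(lagos_images), 2))     # single-run cost, USD
print(free_images_per_month())                   # free images per month
print(free_images_per_month() // IMAGES_PER_KM)  # free kilometres per month
print(months_at_no_cost(lagos_images))           # months at zero cost
```

The trade-off is explicit in the two functions: `download_cost` prices speed, while `months_at_no_cost` prices patience.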
The choice of hardware is driven by multiple factors including the desired inference speed, particularly for large datasets or high-resolution images. To compare performance across processing units, inference was repeated on the Agege dataset using three GPUs and one cloud tensor processing unit (TPU) available through Colab's free-of-charge and premium subscription models to assess the resulting time efficiency improvement from more sophisticated hardware. Table 1 outlines inference time (in seconds) and speed (in frames per second) using each processing unit on the Agege dataset, which consists of 38.8 km of street network. It further estimates the time required to process 80,000 images or 1,000 km of street network using linear extrapolation (Cai et al. 2018).
Hardware | Subscription | Inference time (s) | Inference speed (frames/s) | Extrapolated inference time for 1,000 km (s)
NVIDIA A100-SXM4 GPU | Premium | 4,829 | 0.63 | 127,163
NVIDIA Tesla V100 SXM2 GPU | Premium | 2,133 | 1.42 | 56,169
NVIDIA Tesla T4 GPU | Free | 2,073 | 1.47 | 54,589
Google TPU v2 | Free | 1,589 | 1.91 | 41,843
For each hardware type, performance was evaluated in terms of total inference time (seconds), inference speed (frames per second), and estimated processing time for 1,000 km of street network using linear extrapolation (seconds).
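The linear extrapolation can be sketched as below. The function names are illustrative, and the calculation assumes that throughput stays constant as the dataset grows. Because the reported speeds are rounded to two decimal places, values computed from them agree with the tabulated 1,000 km times only approximately.

```python
def seconds_for_images(n_images, frames_per_second):
    """Estimated inference time, assuming throughput stays constant."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return n_images / frames_per_second


def to_hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600


# 80,000 images (1,000 km) at the rounded speeds reported in Table 1.
for name, fps in [("A100", 0.63), ("V100", 1.42), ("T4", 1.47), ("TPU v2", 1.91)]:
    print(name, round(seconds_for_images(80_000, fps)))

# Scaling the tabulated 1,000 km times to 12,000 km of street network.
print(round(to_hours(12 * 41_843), 1), round(to_hours(12 * 127_163), 1))
```

Multiplying the fastest and slowest tabulated 1,000 km times by 12 gives roughly 139 and 424 hours, consistent with the processing-time range quoted for the full Lagos street network.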
While the article demonstrates strong performance in the Agege pilot, there are limitations to the interpretability of the model and availability of GSV images that must be considered to properly contextualize the results.
The object detection model used in the study can identify water points but lacks the capability to classify them as ‘public’ or ‘private’ based on visual features alone, particularly given the fluid nature of water point ownership and usage in Nigeria. In the absence of additional data or context, this study treats all identified water points as ‘publicly accessible’ because of the presence of access points with multiple taps or faucets. The model's outputs are only the physical locations of water points, and not qualitative attributes such as ownership and accessibility.
GSV imagery may not be contemporaneous with existing geospatial registries and may therefore contain images of boreholes that are no longer operational or in existence. Accordingly, it is possible that the water points detected by the model were omitted from the GRID3 Nigeria dataset because they were non-functional. While functionality can be visually inferred from cues such as individuals using the facilities, this would require manual inspection to validate. For a comprehensive analysis of water accessibility, water point functionality, and affordability, collecting additional data through household surveys or manual inspection is essential. Furthermore, image metadata on capture dates should be carefully referenced against the data collection period of existing registries to ensure relevance.
GSV has additional dataset-specific limitations for WPM. GSV images are captured along the street network of a city; therefore, the model can only detect points that are visible from the nearest road. This limitation restricts the model's ability to identify water points located in areas not accessible by vehicles or outside the scope of the street network. This includes other types of facilities such as dug wells or piped connections, which are typically sited underground or at a distance from roads and are thus often unobservable from street-level imagery. Thus, when interpreting the results, it is important to consider that the model's detections are not likely to be an exhaustive list of the water points available in each study area. Furthermore, the methodology is restricted to areas with deep GSV coverage. In sub-Saharan Africa, this is primarily limited to major urban areas such as Accra, Nairobi, Kigali, Johannesburg, and Dakar. Finally, the coordinates associated with each water point in the GSV images reference the location where the image was captured, which may not match the exact location of the water point. Additional data sources or ground-truthing efforts may be necessary to validate and refine the locations of water points detected through GSV.
This article proposed a novel deep learning approach to detect and geo-reference publicly accessible water points from GSV images. Using a convolutional neural network built on the YOLOv5s architecture and trained on a custom dataset of street-level imagery, the model detected 36 previously unregistered communal water points in Agege with 94.7% precision and a processing time of less than an hour. This demonstrates that deep learning offers a highly precise, low-cost, and scalable solution to close geospatial data gaps on communal water points in developing countries. While the object detection model and GSV imagery offer valuable tools for mapping and identifying water points, policymakers should approach model outputs with a clear understanding of their limitations. The inability to directly classify qualitative features of water points and the constraints of GSV data coverage underscore the need for complementary data collection methods and a cautious interpretation of findings when measuring the overall level of water access in the study area. Nevertheless, the model's outputs can streamline the surveying process by identifying precise locations for manual inspection, ultimately providing a more accurate understanding of the existing water infrastructure and its implications for water access within the local community.
The author gratefully acknowledges helpful feedback from MIT Senseable City Lab reviewers, DEDP research seminar participants, and two anonymous referees.
All relevant data are included in the paper or its Supplementary Information.
The authors declare there is no conflict.