Data Sources
- Advancing global NWP
Humidex from Advancing global NWP through international collaboration. Weekly mean anomalies have been calculated relative to a 20-year model climatology. Areas where the ensemble forecast is not significantly different from the climatology, according to a Wilcoxon-Mann-Whitney (WMW) test, are blanked.
- All India Health Centres
All India Health Centres Directory as on 7th October, 2016, provided by the Open Government Data (OGD) Platform India.
- Census Estimates
Granular census estimates, either provided to us by local governments or created by our own projection based on census data.
An official census estimate for Patan can be found here.
- EarthEnv-DEM90
Digital elevation model distributed in EHdr format, unprojected and referenced to the WGS84 geodetic datum.
- Forests Derived From ESA WorldCover 2021
The ESA WorldCover 10m 2021 provides a land cover map with 11 classes in GeoTIFF format.
Zanaga, D., Van De Kerchove, R., Daems, D., De Keersmaecker, W., Brockmann, C., Kirches, G., Wevers, J., Cartus, O., Santoro, M., Fritz, S., Lesiv, M., Herold, M., Tsendbazar, N.E., Xu, P., Ramoino, F., Arino, O., 2022. ESA WorldCover 10 m 2021 v200.
- GHS Settlement Model Grid
Publicly available data downloaded as GeoTIFF to categorize buildings into Urban or Rural categories.
- Global Surface Water
A collection of several water indicators provided in a consolidated data set.
Jean-Francois Pekel, Andrew Cottam, Noel Gorelick, Alan S. Belward, High-resolution mapping of global surface water and its long-term changes. Nature 540, 418-422 (2016). (doi:10.1038/nature20584)
- Google-Microsoft Open Buildings (combined and published by VIDA)
Publicly available data contain a catalogue of buildings with specific coordinates and polygons (i.e. shapes of the buildings) for any given country or region.
- Kenya Hospitals from Health Facilities in Sub-Saharan Africa
This master list of health facilities was developed from a variety of government and non-government sources from 50 countries in sub-Saharan Africa. Each data record represents a geolocated health facility.
- Kenya Informal Settlement Boundaries
Slum Dwellers International and SCollective provide geolocated boundaries of slums in Nairobi, Kenya.
- Ookla's Open Data
Open data sets available on a complimentary basis to help people make informed decisions around internet connectivity and internet speed.
- Open Buildings 2.5D Temporal Dataset
Provides worldwide coverage of building height data in raster format at an effective spatial resolution of 4 meters. It is used in the building height calculation process.
- Open Energy Maps
Provides building-level electricity access and demand estimates for Kenya.
- OpenStreetMap (OSM)
Publicly available data contain a catalogue of buildings with specific coordinates and polygons (i.e. shapes of the buildings). Data are downloaded as shapefiles (.shp) from geofabrik.de.
- Overture Maps
Publicly available data contain a catalogue of buildings with specific coordinates and polygons (i.e. shapes of the buildings).
- Sentinel-2 Cloud-Optimized GeoTIFFs
Sentinel-2 satellite images are downloaded from the public S3 bucket Sentinel-2 Cloud-Optimized GeoTIFFs, which contains satellite images of the Earth's surface divided into pre-defined tiles.
- Shelter Associates
Shelter Associates creates granular spatial data by mapping settlement patterns and infrastructure in and around each target slum. This spatial data is supplemented with survey data collected at the household level and analyzed to identify the most vulnerable population and plan targeted interventions.
Slum boundaries are utilized to train the informal settlement detection model.
Data Consolidation
The building footprints obtained from Google-Microsoft Open Buildings (combined and published by VIDA) are merged with select buildings from OpenStreetMap building footprints. The building footprint catalog is further enriched with other data from public sources to provide fine-grained information at the building level. Some examples are provided below; for the full description, please visit the following GitHub repository.
Global Human Settlement Layer (GHSL) provides a publicly available data layer named Settlement Model (SMOD) grid, which is used to classify buildings into Urban/Suburban/Rural categories based on their location.
The Open Buildings 2.5D data layer is used to calculate the height of buildings. Building height is estimated both in meters and as a number of floors, and the gross floor area of each building in square meters is computed from the number of floors. The first floor is assumed to be 4.5 meters tall, and every floor above it 3 meters tall.
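The floor and gross floor area calculation can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names are hypothetical, and counting only complete upper floors (rounding down) is an assumption, since the text does not state a rounding rule. Only the 4.5 m and 3 m floor heights come from the description above.

```python
import math

FIRST_FLOOR_HEIGHT_M = 4.5  # stated in the text
UPPER_FLOOR_HEIGHT_M = 3.0  # stated in the text


def estimate_floors(height_m: float) -> int:
    """Estimate the number of floors from a building height in meters.

    Assumption: only complete upper floors are counted (rounding down).
    """
    if height_m <= 0:
        return 0
    if height_m <= FIRST_FLOOR_HEIGHT_M:
        return 1
    upper_floors = math.floor((height_m - FIRST_FLOOR_HEIGHT_M) / UPPER_FLOOR_HEIGHT_M)
    return 1 + upper_floors


def gross_floor_area_m2(footprint_area_m2: float, height_m: float) -> float:
    """Gross floor area = footprint area multiplied by the estimated floor count."""
    return footprint_area_m2 * estimate_floors(height_m)


# A 100 m2 footprint that is 10.5 m tall: 4.5 m first floor + two 3 m floors.
example_floors = estimate_floors(10.5)
example_gfa = gross_floor_area_m2(100.0, 10.5)
```

A different rounding rule (for example rounding to the nearest floor) would change the result for heights that fall between two floor counts.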
Open Energy Maps provides estimates of electricity access and electricity demand for most of the buildings in our footprint catalog in Kenya. The building footprint catalogs used by Open Energy Maps and Open Building Insights are not identical, so a matching algorithm is required to map footprints from the two sources that describe the same building.
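One common way to match footprints from two catalogs is by spatial overlap. The sketch below is an assumption about how such matching could work, not the algorithm used by Open Building Insights: it simplifies each footprint to an axis-aligned bounding box, scores pairs by intersection over union (IoU), and uses a hypothetical threshold of 0.5. Real footprints are polygons and would normally be handled with a geometry library and a spatial index.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in projected meters


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    overlap_x = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_y = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_x * overlap_y
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def match_footprints(
    ours: Dict[str, Box], theirs: Dict[str, Box], min_iou: float = 0.5
) -> Dict[str, Optional[str]]:
    """For each of our footprints, pick the best-overlapping footprint from the
    other catalog, or None if no candidate reaches min_iou (hypothetical value)."""
    matches: Dict[str, Optional[str]] = {}
    for our_id, our_box in ours.items():
        best_id, best_score = None, min_iou
        for their_id, their_box in theirs.items():
            score = iou(our_box, their_box)
            if score >= best_score:
                best_id, best_score = their_id, score
        matches[our_id] = best_id
    return matches


ours = {"a": (0, 0, 10, 10), "b": (100, 100, 110, 110)}
theirs = {"x": (1, 1, 11, 11), "y": (50, 50, 60, 60)}
matches = match_footprints(ours, theirs)
```

Here building "a" overlaps "x" strongly and is matched to it, while "b" has no counterpart and stays unmatched.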
Classification Models
Open Building Insights classifies buildings as residential, non-residential or industrial based on their use. To achieve this, two machine learning models were created using two different methodologies.
First, a high-quality training data set is required to train a machine learning model. The classification model used by Open Building Insights leverages a training data set derived from publicly available OpenStreetMap (OSM) and Overture Maps data. Both provide an alternative building footprint catalogue that is created by humans and contains auxiliary information about buildings, e.g., their type. Given the nature of human-created data, OSM is sparser than the VIDA-derived data set, and only buildings that carry auxiliary information can serve as labels, so only a very small portion of OSM data is actually used to train the machine learning model.
To increase the size of the training data set, additional labelled buildings are derived from OSM tags and landuse attributes by cross-referencing them to building footprints. For a more detailed description, visit the following GitHub repository.
The first model is based on image recognition and derives building use from openly available
satellite images of building roofs.
A custom neural network architecture based on the DenseNet121 model with preloaded ImageNet
weights is used as the base model for this classification task. DenseNet121 is a
specific variant of the DenseNet architecture, a convolutional neural network (CNN)
designed for image classification tasks. The architecture was first introduced in Huang et al. (2017).
For additional details visit the following
GitHub repository.
The second model is trained to classify buildings based on their metadata, i.e., numeric attributes
describing the building itself and its neighborhood. Examples of such attributes are the area of the
building footprint, the squareness of the building, the number of walls (faces), and the building
density of the neighborhood around the building.
For additional details visit the following
GitHub repository.
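To make the metadata attributes more concrete, the sketch below computes three of them from a footprint polygon. The function name is hypothetical, and the squareness definition used here (16 x area / perimeter squared, which equals 1 for a perfect square) is an assumption; the text does not specify how squareness is measured.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in projected meters


def footprint_features(polygon: List[Point]) -> Dict[str, float]:
    """Compute simple numeric attributes of a footprint given as a list of
    vertices (without repeating the first vertex at the end)."""
    n = len(polygon)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1          # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(twice_area) / 2.0
    # Assumed squareness measure: 1.0 for a square, lower for elongated shapes.
    squareness = 16.0 * area / perimeter ** 2 if perimeter > 0 else 0.0
    return {"area_m2": area, "faces": n, "squareness": squareness}


square = footprint_features([(0, 0), (10, 0), (10, 10), (0, 10)])
long_shed = footprint_features([(0, 0), (20, 0), (20, 5), (0, 5)])
```

Both example footprints cover 100 square meters, but the elongated one scores a lower squareness, which is the kind of signal a metadata-based classifier can use.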
Informal Settlement Detection Model
A custom machine learning model is created to classify 50 m x 50 m urban tiles
as formal or informal. The model is trained to find urban informal settlements based on the
attributes of the built-up area, such as the density of buildings, their sizes and heights.
Informal tiles for training were obtained from publicly available Shelter Associates data;
the organization surveys informal settlements in the State of Maharashtra and
provides informal settlement boundaries.
For more details please visit the following
GitHub repository.
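The tile attributes mentioned above can be illustrated with a small aggregation sketch. It is an assumption about how such features might be derived, not the model's actual feature pipeline: buildings are assigned to tiles by a representative point in projected meters, and the feature names and input keys are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TILE_SIZE_M = 50.0  # tile edge length stated in the text


def tile_features(buildings: List[dict]) -> Dict[Tuple[int, int], dict]:
    """Group buildings into 50 m tiles and summarize each tile.

    Each building is a dict with hypothetical keys: x, y (projected meters),
    area_m2 and height_m.
    """
    groups = defaultdict(list)
    for b in buildings:
        key = (int(b["x"] // TILE_SIZE_M), int(b["y"] // TILE_SIZE_M))
        groups[key].append(b)

    features = {}
    for key, members in groups.items():
        count = len(members)
        total_area = sum(b["area_m2"] for b in members)
        features[key] = {
            "building_count": count,
            "mean_area_m2": total_area / count,
            "mean_height_m": sum(b["height_m"] for b in members) / count,
            "built_up_fraction": min(1.0, total_area / TILE_SIZE_M ** 2),
        }
    return features


tiles = tile_features([
    {"x": 10.0, "y": 10.0, "area_m2": 100.0, "height_m": 3.0},
    {"x": 20.0, "y": 30.0, "area_m2": 50.0, "height_m": 5.0},
    {"x": 60.0, "y": 10.0, "area_m2": 200.0, "height_m": 6.0},
])
```

Feature vectors of this kind, labelled with the surveyed settlement boundaries, could then be passed to any standard classifier.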
Distance Metrics
For action planning, it is important to understand how far buildings are from hospitals or from the road network in general. For example, a building located some distance away from the road network might not be immediately reachable when help is needed.
For this reason, several distances are computed, such as the aforementioned direct (air) distance
to the road network, or the direct (air) distance to the closest forest or water body. For
some features, on the other hand, it makes sense to compute travel distances on the road network, e.g.,
ambulances can take the shortest route from the building to the hospital. Such shortest routes
are computed using the Open Source Routing Machine, and the
travel time is also estimated for travel by car and on foot.
For more details please visit the following
GitHub repository.
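A direct (air) distance between two coordinates can be computed with the haversine formula, as in the sketch below. The function names, facility names and coordinates are hypothetical, and the spherical Earth radius is an approximation; road-network distances and travel times from a routing engine are outside the scope of this example.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_facility(
    building: Tuple[float, float], facilities: Dict[str, Tuple[float, float]]
) -> Tuple[str, float]:
    """Return the name of and air distance to the closest facility."""
    return min(
        ((name, haversine_m(building[0], building[1], lat, lon))
         for name, (lat, lon) in facilities.items()),
        key=lambda item: item[1],
    )


# Hypothetical building and facility coordinates (latitude, longitude).
building = (-1.2921, 36.8219)
facilities = {"Facility A": (-1.3000, 36.8000), "Facility B": (-1.2000, 36.9000)}
name, distance_m = nearest_facility(building, facilities)
```

For a large building catalog, a spatial index would replace the linear scan over facilities.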
Exposure Metrics
Past Humidex data is used to indicate vulnerability to heat events. Past days are
split into four categories based on the severity of heat events, and the split is calculated for every building.
Heat exposure can be estimated from these historical figures.
For more details please visit the following
GitHub repository.
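The categorization of past days can be sketched as a simple binning step. The thresholds and category labels below are hypothetical placeholders, since the text states only that four severity categories are used, and the daily values are invented for illustration.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List

# Hypothetical thresholds and labels; the project's actual values are not given.
THRESHOLDS = [30.0, 40.0, 45.0]
LABELS = ["low", "moderate", "high", "extreme"]


def heat_category(humidex: float) -> str:
    """Map a daily Humidex value to one of four severity categories.
    A value equal to a threshold falls into the higher category."""
    return LABELS[bisect_right(THRESHOLDS, humidex)]


def count_heat_days(daily_humidex: List[float]) -> Dict[str, int]:
    """Count how many past days fall into each category for one location."""
    counts = Counter(heat_category(value) for value in daily_humidex)
    return {label: counts.get(label, 0) for label in LABELS}


# Invented daily values for the location of a single building.
day_counts = count_heat_days([25.0, 31.5, 30.0, 42.0, 47.3, 28.0])
```

Repeating the count for the grid cell containing each building yields per-building figures from which exposure can be estimated.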
Socio-economic Metrics
Building occupancy is estimated by disaggregating official
statistics across buildings in proportion to a selected mix of building attributes, e.g.,
the gross floor area of buildings. The idea of disaggregating census data
to estimate population density on dense grids is not new and has been used
in several papers. Here it is revisited and used to disaggregate population to
residential buildings, estimating their occupancy.
For more details please visit the following
GitHub repository.
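Proportional disaggregation can be illustrated with a short sketch. It uses gross floor area as the only weight, which is a simplifying assumption: the text describes a selected mix of building attributes. The function name, building identifiers and numbers are hypothetical.

```python
from typing import Dict


def disaggregate_population(
    total_population: float,
    gross_floor_area_m2: Dict[str, float],
    is_residential: Dict[str, bool],
) -> Dict[str, float]:
    """Distribute a census total over residential buildings in proportion
    to their gross floor area; non-residential buildings receive zero."""
    weights = {
        building_id: area if is_residential[building_id] else 0.0
        for building_id, area in gross_floor_area_m2.items()
    }
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {building_id: 0.0 for building_id in weights}
    return {
        building_id: total_population * weight / total_weight
        for building_id, weight in weights.items()
    }


# Hypothetical census unit with 1000 inhabitants and three buildings.
occupancy = disaggregate_population(
    1000.0,
    {"a": 100.0, "b": 300.0, "c": 500.0},
    {"a": True, "b": True, "c": False},
)
```

The estimates always sum back to the census total for the unit, which keeps the building-level figures consistent with the official statistics.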
Solar Rooftop Potential
A new U-Net convolutional neural network is designed and trained to derive a normalized
Digital Surface Model (nDSM) from open-source Sentinel-2 satellite imagery,
resampling the data in the process from 10-meter to 50-centimeter spatial resolution.
For more details please visit the following
GitHub repository.
Modeling Urban Growth
Modeling Urban Growth (MUG) is an open-source AI model that predicts patterns of future
urbanization to inform sustainable infrastructure and energy planning. Developed through
the AI Alliance with support from IBM and Sustainable Energy for All, MUG combines
satellite, geographic, and demographic data to forecast where cities are likely to expand.
The model empowers decision-makers to anticipate growth, prioritize resources, and plan
resilient urban systems. MUG is publicly available for retraining and adaptation to any
geography using open data.
MUG is an AI Alliance project, and is publicly available and
open-source on GitHub.
References
A graph-based neural network approach to integrate multi-source data for urban building function classification in Computers, Environment and Urban Systems, Vol. 110, pp. 102094, by Kong B., Ai T., Zou X., Yan X. and Yang M., 2024.
Advances in Small Area Population Estimation in the Absence of National Census Data by Lazar N. A., Boo G., Chamberlain R. H., Nnanatu Ch. Ch., Darin E., Leasure R. D., Yankey O., Gadiaga A., Juran S., de la Rua L., Espey J. and Tatem J. A., preprint.
Deep Neural Network Regression for Normalized Digital Surface Model Generation With Sentinel-2 Imagery in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 16, pp. 8508-8519 by K. Müller et al., 2023.
Deep Residual Learning for Image Recognition by He, Zhang, Ren, Sun, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770-778.
Densely Connected Convolutional Networks by Huang, Liu, van der Maaten, Weinberger, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 2261-2269.
Disaggregating population data for assessing progress of SDGs: methods and applications in International Journal of Digital Earth, vol. 15, no. 1, pp. 2-29, by Qiu Y., Zhao X., Fan D., Li S. and Zhao Y., 2022.
European Commission: Eurostat, Applying the degree of urbanisation - A methodological manual to define cities, towns and rural areas for international comparisons - 2021 edition, Publications Office of the European Union, 2021.
GHS-SMOD R2023A - GHS settlement layers, application of the Degree of Urbanisation methodology (stage I) to GHS-POP R2023A and GHS-BUILT-S R2023A, multitemporal (1975-2030). European Commission, Joint Research Centre (JRC) by Schiavina, Marcello; Melchiorri, Michele; Pesaresi, Martino (2023). Dataset doi: 10.2905/A0DF7A6F-49DE-46EA-9BDE-563437A6E2BA.
High-Resolution Building and Road Detection from Sentinel-2 by W. Sirko, E.A. Brempong, J.T.C. Marcos, A. Annkah, A. Korme, M.A. Hassen, K. Sapkota, T. Shekel, A. Diack, S. Nevo, J. Hickey, J.A. Quinn, 2023.
Multimodal Data Fusion for Estimating Electricity Access and Demand by Stephen J. Lee, 2023.
Predicting building types using OpenStreetMap by Atwal K. S., Anderson T., Pfoser D. et al., 2022.
World Settlement Footprint 3D - A first three-dimensional survey of the global building stock by Esch, Brzoska, Dech, Leutner, Palacios-Lopez, Metz-Marconcini, Marconcini, Roth and Zeidler, 2022.
Source Code
The source code of the solution is available on GitHub under the license specified for the repository.
Data License
Open Building Insights data is shared under the Open Data Commons Open Database License (ODbL) v1.0.