Model Training Approach
A high-quality training data set is required to train a machine learning model. The classification model used by Open Building Insights leverages a training data set derived from publicly available Open Street Map (OSM) and Overture Maps data. Both provide an alternative, human-created building footprint catalogue that contains auxiliary information about buildings, e.g., their type. Because the data are human-created, OSM coverage is sparser than the VIDA-derived data set, and only buildings whose auxiliary type information is filled in can be used, so only a small portion of the OSM data actually contributes to training the machine learning model.
To indicate which buildings were used for model training, the buildings from OSM that have their type filled in are merged into the VIDA-derived building catalogue, with their classification taken from OSM instead of from the model.
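As an illustration, a minimal sketch of this merge using geopandas is shown below; the file names, the type column, and the join predicate are assumptions rather than the exact implementation.

import geopandas as gpd

# Illustrative file names; the actual catalogues come from the sources listed
# under Data Sources.
osm = gpd.read_file("osm_buildings.shp")
vida = gpd.read_file("vida_building_catalogue.geojson")

# Keep only OSM buildings whose type is filled in; only these carry a label.
osm_typed = osm[osm["type"].notna()][["type", "geometry"]]

# Spatial join: buildings in the VIDA-derived catalogue that intersect a typed
# OSM building take their classification from OSM instead of from the model.
merged = gpd.sjoin(vida, osm_typed, how="left", predicate="intersects")
merged["classification_source"] = merged["type"].notna().map(
    {True: "OSM", False: "model"})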
Building Classification
The developed machine learning model is trained on Sentinel-2 satellite images and is capable of identifying various building typologies within a geography. The envisioned priority tree for building classification is shown. By progressing down the classification tree, the model should be able to classify a broader variety of building types. The current model operates at the first priority level and is therefore capable of classifying buildings as either residential or non-residential within the country area of interest.
Model Architecture
A custom neural network architecture based on the DenseNet121 model with preloaded ImageNet weights is chosen as the base model for the given classification task. The DenseNet121 model is a specific variant of the DenseNet architecture, a convolutional neural network (CNN) designed for image classification tasks. The model was first introduced in Huang et al. (2017).
The DenseNet architecture is characterized by multiple dense blocks separated by transition layers, which reduce the feature-map sizes (spatial dimensions, i.e. width and height) through convolution and pooling operations. A sketch of such a sequence taken from the original paper is shown in Figure 9: Example of a DenseNet architecture. A dense block is constructed such that each layer receives input from all preceding layers. Therefore, the feature maps from all previous layers are concatenated and used as input for the current layer. All layers in the DenseNet121 base model are set to be trainable. In this way, during training, the weights of the layers in the model will be updated to better suit our specific task.
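To make the dense connectivity concrete, the following simplified Keras sketch shows a dense block and a transition layer. It omits DenseNet's bottleneck (1x1) convolutions and exact hyperparameters, so it illustrates the pattern rather than reproducing DenseNet121.

from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    # Each layer receives the concatenated feature maps of all preceding layers.
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(growth_rate, kernel_size=3, padding="same")(y)
        x = layers.Concatenate()([x, y])  # dense connectivity
    return x

def transition_layer(x, compression=0.5):
    # Transition layers reduce the number of channels and halve width/height.
    channels = int(x.shape[-1] * compression)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(channels, kernel_size=1)(x)
    return layers.AveragePooling2D(pool_size=2, strides=2)(x)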
To further refine the custom model, additional input is provided in the form of building metadata, namely its footprint area and urban classification (based on the GHS Settlement Model, SMOD); this input is concatenated to the input image at the first layer.
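A simplified Keras sketch of such a model is given below. It assumes a 224x224 RGB input chip and, for brevity, concatenates the two metadata values with the extracted image features rather than with the input image at the first layer; layer sizes and names are illustrative, not the production configuration.

import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet121

# Illustrative input sizes; the actual image chip size may differ.
image_in = layers.Input(shape=(224, 224, 3), name="sentinel2_chip")
meta_in = layers.Input(shape=(2,), name="footprint_area_and_smod")

# DenseNet121 backbone with preloaded ImageNet weights; all layers trainable.
backbone = DenseNet121(include_top=False, weights="imagenet",
                       input_tensor=image_in, pooling="avg")
backbone.trainable = True

# Simplified fusion: concatenate image features with the building metadata
# (the described model injects the metadata at the first layer instead).
x = layers.Concatenate()([backbone.output, meta_in])
x = layers.Dense(128, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid", name="residential_probability")(x)

model = Model(inputs=[image_in, meta_in], outputs=out)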
Training the Model
The main data set for training the model consists of Sentinel-2 satellite images and the part of the building footprint catalogue from Open Street Map (OSM) that contains the building type. Each building in the Sentinel-2 images that is also contained in the OSM data is labelled as either residential or non-residential. The resulting labelled data set is used for training, validation and testing of the model.
The model described in Model Architecture is trained on an imbalanced data set of ~95,000 images containing twice as many residential as non-residential images. Of all labelled images, 70% are used for training (initial fit) and 20% for validating (fine-tuning) the model (see section Final Configuration and Results); the remaining 10% are used for testing (i.e. they consist of independent, unseen data). The model configuration is fine-tuned after validation, as schematically depicted in Figure 10: Schematic overview of the model training process, and saved upon achieving satisfactory performance.
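A minimal sketch of the 70/20/10 split is shown below, assuming scikit-learn and a hypothetical load_labelled_examples() helper that returns the labelled image chips, their metadata and binary labels; stratifying on the labels preserves the 2:1 class imbalance in each split, which is an assumption rather than a documented choice.

from sklearn.model_selection import train_test_split

# Hypothetical helper returning image chips, building metadata and labels.
images, meta, labels = load_labelled_examples()

# 70% for training, 30% held out for validation and testing.
(train_images, hold_images, train_meta, hold_meta,
 train_labels, hold_labels) = train_test_split(
    images, meta, labels, train_size=0.70, stratify=labels, random_state=42)

# Two thirds of the held-out data (20% overall) for validation,
# the remaining third (10% overall) for testing.
(val_images, test_images, val_meta, test_meta,
 val_labels, test_labels) = train_test_split(
    hold_images, hold_meta, hold_labels, train_size=2 / 3,
    stratify=hold_labels, random_state=42)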
The resulting model is then tested on previously unseen data and the model scores are computed. The Adam optimizer with a learning rate of 0.001 and binary cross-entropy are used as the optimization algorithm and loss function, respectively, for training the neural network. Furthermore, a learning rate reduction callback is used: it monitors the validation accuracy and, when it stops improving, reduces the learning rate to help the model converge better. The model is trained for 50 epochs, with the training data shuffled before each epoch.
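Continuing the sketch above, the training setup could look as follows; the ReduceLROnPlateau factor and patience are placeholders, since the source only states that the callback monitors validation accuracy.

import tensorflow as tf

# Adam optimizer with learning rate 0.001 and binary cross-entropy loss.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Reduce the learning rate when validation accuracy stops improving
# (factor and patience are illustrative values).
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_accuracy", factor=0.5, patience=3)

# 50 epochs, shuffling the training data before each epoch.
history = model.fit([train_images, train_meta], train_labels,
                    validation_data=([val_images, val_meta], val_labels),
                    epochs=50, shuffle=True, callbacks=[reduce_lr])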
Data Sources
Sentinel-2 Cloud-Optimized GeoTIFFs: Sentinel-2 satellite images are downloaded from the public S3 bucket Sentinel-2 Cloud-Optimized GeoTIFFs, which contains satellite images of the Earth's surface divided into pre-defined tiles (see the access sketch after this list).
Google-Microsoft Open Buildings (combined and published by VIDA): Publicly available catalogue of buildings with specific coordinates and polygons (i.e. shapes of the buildings) for any given country or region.
GHS Settlement Model Grid: Publicly available data downloaded as GeoTIFF and used to categorize buildings into urban or rural categories.
Open Street Map (OSM): Publicly available catalogue of buildings with specific coordinates and polygons (i.e. shapes of the buildings). Data are downloaded as shapefiles (.shp) from geofabrik.de.
Overture Maps: Publicly available catalogue of buildings with specific coordinates and polygons (i.e. shapes of the buildings).
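As an illustration of reading image chips from the Sentinel-2 Cloud-Optimized GeoTIFF bucket referenced above, the following sketch uses rasterio over HTTP; the object path shown is a placeholder, and the actual tiles and bands depend on the area of interest.

import rasterio
from rasterio.windows import Window

# Placeholder object path; real paths follow the bucket's tile/date layout.
url = ("https://sentinel-cogs.s3.us-west-2.amazonaws.com/"
       "sentinel-s2-l2a-cogs/36/M/YB/2023/1/S2A_36MYB_20230105_0_L2A/TCI.tif")

with rasterio.open(url) as src:
    # The cloud-optimized layout allows reading a small window
    # without downloading the whole tile.
    chip = src.read(window=Window(col_off=0, row_off=0, width=512, height=512))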
References
Densely Connected Convolutional Networks by Huang, Liu, van der Maaten, and Weinberger, 2017.
Deep Residual Learning for Image Recognition by He, Zhang, Ren, and Sun, 2015.