Satellite imagery processing for convolutional neural networks
In this post, we will explain how to create satellite imagery datasets and how to apply a classification model based on convolutional neural networks to them.
Within the field of data science, it is estimated that 80% of the time spent goes into the creation of datasets. Moreover, if we take into account the complexity of geospatial information (and of satellite imagery in particular), this percentage might be even higher.
To provide a solution for this, at Dymaxion Labs we developed a set of open-source libraries that help streamline dataset creation, so we can focus our efforts on implementing the best strategy to solve the problem. In this post, we will work with one of those libraries: Satproc. With this tool, we can prepare the dataset needed to run the burned area detection model described in this post.
The area of interest we are going to work with is located in the north of Corrientes Province, Argentina. The region is rich in biodiversity and, at the same time, is suffering one of the worst droughts in decades. Unfortunately, as of this writing, the Corrientes wildfires have already spread over 10% of its total area.
This guide covers everything from the initial imagery processing to obtain the Normalized Burn Ratio (NBR), to the creation of the datasets needed to work with the neural network. The original image was captured by the Sentinel-2 sensor during the first week of February 2022 over Corrientes Province, Argentina. The image in TIF format can be downloaded from this link.
Determining the Normalized Burn Ratio
How can we calculate the NBR?
The Normalized Burn Ratio enables the detection of burned areas in wide regions. Its formula combines near-infrared (NIR) and short-wave infrared (SWIR) bands.
The typical spectral response curve of healthy vegetation exhibits high reflectance values in the NIR, but low reflectance values in the SWIR. Conversely, when the vegetation suffers the consequences of a fire, significant changes in its spectral signature can be observed: a recently burned area will exhibit low reflectance values in the NIR, but high reflectance values in the SWIR. Thus, the normalized difference between both values is a great indicator of this phenomenon, where high values of NBR denote healthy vegetation, while low values represent bare ground areas or recently burned areas.
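Written out, the index is the normalized difference between the two bands, which constrains its values to the range [-1, 1]:

NBR = (NIR - SWIR) / (NIR + SWIR)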
Severity of the burned area
The difference between the NBR obtained before and after a fire is used to calculate the delta Normalized Burn Ratio. The greater the dNBR, the greater the severity of the damage. Besides, the areas with negative dNBR values could be signaling a vegetation recovery after the event. For further information about this issue, please visit this link.
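In other words, using the index measured before and after the event:

dNBR = NBR_prefire - NBR_postfire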
Since the image we downloaded comes from the Sentinel-2 sensor, we will compute the ratio from band 8A (NIR) and band 12 (SWIR). We will calculate it with the GDAL library, using its gdal_calc.py script:
gdal_calc.py \
-A ./img/s2/sample_corrientes.tif --A_band=8 \
-B ./img/s2/sample_corrientes.tif --B_band=12 \
--calc="(A-B)/(A+B)" \
--outfile=./img/NBR_img/NBR_s2.tif

Calculation of the Normalized Burn Ratio from the Sentinel-2 image.
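As a quick sanity check, gdalinfo (shipped with the same GDAL installation) can report basic statistics of the output raster; since NBR is a normalized difference, all values should fall within [-1, 1]:

# Compute and report basic statistics for the NBR raster
gdalinfo -stats ./img/NBR_img/NBR_s2.tif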
Generating chips
Now that we have obtained the index the model takes as input, we will generate the dataset needed to run the prediction.
To install Satproc, you just need to execute:
pip install pysatproc
When the library is installed, a set of imagery-processing scripts is installed along with it:
- satproc_extract_chips: Extracts chips from raster images, optionally creating a mask for each chip from a vector file containing labels.
- satproc_make_masks: Builds masks from raster images and a vector file containing labels.
- satproc_polygonize: Polygonizes (vectorizes) chip images into a single vector file of polygons.
- satproc_generalize: Generalizes vector files by simplifying and smoothing polygon boundary lines.
- satproc_smooth_stitch: Smooths stitched chips, which helps improve prediction results along chip seams.
- satproc_scale: Changes the value scale of raster images.
- satproc_match_histograms: Matches the histograms of raster images to a reference image.
In this post, we will focus on satproc_extract_chips.
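Its full argument list can be printed at any time with the script's help flag:

satproc_extract_chips -h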
The following are some of the available arguments:
optional arguments:
  -h, --help            show this help message and exit
  --size SIZE           size of image tiles, in pixels (default: 256)
  --step-size STEP_SIZE
                        step size (i.e. stride), in pixels (default: 128)
  --sliding-windows-mode {exact,whole,whole_overlap}
                        mode of sliding windows (default: whole_overlap)
  --labels LABELS       input label shapefile (default: None)
  --label-property LABEL_PROPERTY
                        label property to separate in classes (default: class)
  --classes [CLASSES ...]
                        specify classes order in result mask (default: None)
  --masks {extent,boundary,distance} [{extent,boundary,distance} ...], -m {extent,boundary,distance} [{extent,boundary,distance} ...]
  --mask-type {single,class,instance}
  --aoi AOI             filter by AOI vector file (default: None)
  --within              only create chip if it is within AOI (if provided) (default: False)
  --no-within           create chip if it intersects with AOI (if provided) (default: False)
> To check the online material, please visit this link.
Practical example of chip extraction
In this example, the chips are generated at the size expected by the trained neural network. The model can then be run on them to estimate the burned surface within the area of interest.
satproc_extract_chips \
./img/NBR_img/NBR_s2.tif \
-o ./img/NBR_img/chips/160_80/ \
--size 160 \
--step-size 80
As seen there, the parameters related to labels and the area of interest are not used, since we only seek to run predictions on these images.
The resulting chips are stored in the output directory and named as follows:
Images/<input image name>_<row>_<column>
For example:
NBR_s2_0_0.tif NBR_s2_0_7.tif NBR_s2_1_6.tif NBR_s2_2_5.tif
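As a quick check, you can count the generated chips from the command line. Note that the images/ subdirectory below is an assumption based on the naming pattern above; adjust the path if your Satproc version writes chips elsewhere:

# Count how many chips were extracted in the previous step
ls ./img/NBR_img/chips/160_80/images/ | wc -l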
Other features of Satproc
Now we will go over some features that help you get the most out of the library, depending on the use case.
1. Within an area of interest (AOI)
If we have an image of a particular region, but we wish to extract chips from just a specific area, we need to specify the file path to the vector file corresponding to the area of interest. The command would be the following:
satproc_extract_chips \
./img/NBR_img/NBR_s2.tif \
-o ./img/NBR_img/chips/aoi/160_80/ \
--size 160 \
--step-size 80 \
--aoi ./shp/aoi/aoi_sample.geojson
This image shows the result:
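If you do not have an AOI file yet, a minimal GeoJSON polygon is enough. This is just a sketch: the coordinates below are placeholders in WGS84 longitude/latitude and should be replaced with your own region:

# Write a minimal single-polygon AOI file (placeholder coordinates)
cat > ./shp/aoi/aoi_sample.geojson <<'EOF'
{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "properties": {},
    "geometry": {
      "type": "Polygon",
      "coordinates": [[
        [-58.0, -27.6], [-57.8, -27.6], [-57.8, -27.4],
        [-58.0, -27.4], [-58.0, -27.6]
      ]]
    }
  }]
}
EOF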
2. Contrast improvement
In some cases, highlighting certain attributes of the image can help improve model performance. Usually this step is part of the data preprocessing stage, but it can also be done with Satproc, which can be extremely useful for automating the modeling pipeline.
satproc_extract_chips \
./img/NBR_img/NBR_s2.tif \
-o ./img/NBR_img/chips/160_80/ \
--size 160 \
--step-size 80 \
--rescale \
--rescale-mode percentiles \
--upper-cut 98 \
--lower-cut 2
There are different rescaling modes, which can even be customized by the user. The documentation includes a list of the modes implemented by default.
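To confirm the stretch was applied, you can inspect the actual minimum and maximum of any resulting chip with gdalinfo; the chip filename below is only an example:

# Force computation of the actual min/max values of a rescaled chip
gdalinfo -mm ./img/NBR_img/chips/160_80/images/NBR_s2_0_0.tif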
3. Generating footprints in vector format
This example is useful for obtaining the outline of each resulting chip, for instance to combine it with other sources of spatial information of interest. You just need to add the --write-footprints parameter.
satproc_extract_chips \
./img/NBR_img/NBR_s2.tif \
-o ./img/NBR_img/chips/aoi/160_80/ \
--size 160 \
--step-size 80 \
--write-footprints
The resulting vector layer looks like this:
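From the command line, ogrinfo can print a summary of the resulting layer. The filename below is hypothetical; check the output directory for the file Satproc actually writes:

# Print a summary (-so) of all layers (-al) in the footprints file
ogrinfo -al -so ./img/NBR_img/chips/aoi/160_80/footprints.geojson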
Concluding remarks
When developing imagery classification models, automating the analysis often becomes a challenge once the models have to be applied to large expanses of land.
With the Satproc open-source library, we can achieve this in a standardized way, customizable to the analysis we are executing, without losing sight of the main goal: solving the problem, rather than struggling with the geospatial data needed to find a solution.
You can access the code to replicate this example through this Google Colab.