Exploratory data visualization#
In this tutorial, we will demonstrate how to utilize tools within the scientific Python ecosystem for interactive and exploratory data visualization.
1 | Xarray and rioxarray#
Xarray is an open Python package for processing multi-dimensional data arrays in concise and intuitive ways. Xarray introduces richer labels and attributes for arrays than the NumPy array object provides, making data processing easier.
Rioxarray is an extension of xarray and integrates well with rasterio, a geospatial library for processing raster (image) data. Specifically, rioxarray uses rasterio as the data accessor and convert data between raster files and xarray objects.
Hvplot#
Hvplot is a high-level plotting API in the Python ecosystem. It works with many popular data libraries such as (geo)pandas and xarray, which we’ll see in this and the next hands-on topics. Hvplot can visualize data using various base libraries specified by the user.
Here, we will use Hvplot to interactively examine a satellite image.
🛠️ Prepare the tool and the data#
🚩 Task: Log into Callysto Hub (or other other working environment you prefer). Install rioxarray and hvplot by
!pip install rioxarray hvplot
Note
Installing rioxarray will also install xarray due to its dependency. Note that you will need to install these packages every time you start a fresh server on Callysto.
🚩 Task: Let’s read a real satellite image for this task. First, import rioxarray and hvplot.xarray:
import rioxarray
import hvplot.xarray
The Amazon Web Services (AWS) registry of open data has a great open data set called “Sentinel-2 Cloud-Optimized GeoTIFFs” at https://registry.opendata.aws/sentinel-2-l2a-cogs/. We will use one image file from this data set. Specify the image URL:
image_url = 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/51/R/UH/2023/3/S2B_51RUH_20230311_0_L2A/TCI.tif'
The metadata of this true-color image (TCI) and other associated assets can be viewed on the STAC browser. You can see that the true-color image has a label called COG, which stands for Cloud-Optimized GeoTIFF.
Note
STAC is the acronym of SpatioTemporal Asset Catalogs. STAC provides a common standard for describing geospatial information of a data set. We will further review it during the next lecture.
❓ Question: What is a TIFF file? What is a GeoTIFF file? What is a Cloud-Optimized GeoTIFF file?
🛠️ Load the data#
🚩 Task: Access the data set without downloading by reading it through rioxarray:
# da stands for data array
da = rioxarray.open_rasterio(image_url)
# Entering the object name only will print a quick overview of the data set
da
<xarray.DataArray (band: 3, y: 10980, x: 10980)>
[361681200 values with dtype=uint8]
Coordinates:
* band (band) int64 1 2 3
* x (x) float64 3e+05 3e+05 3e+05 ... 4.098e+05 4.098e+05 4.098e+05
* y (y) float64 2.8e+06 2.8e+06 2.8e+06 ... 2.69e+06 2.69e+06
spatial_ref int64 0
Attributes:
AREA_OR_POINT: Area
OVR_RESAMPLING_ALG: AVERAGE
_FillValue: 0
scale_factor: 1.0
add_offset: 0.0❓ Question: How many dimensions are there in this data array? What are they?
❓ Question: Why are the x and y coordinates so large?
🚩 Task: Use the rasterio accessor to get specific geospatial information as a variable. Try to get the size of the image and the geospatial transformation (as an Affine transformation matrix):
print(f"Grid width: {da.rio.width} pixels")
print(f"Grid height: {da.rio.height} pixels")
print(f"Geospatial transformation (m): \n{da.rio.transform()}")
Grid width: 10980 pixels
Grid height: 10980 pixels
Geospatial transformation (m):
| 10.00, 0.00, 300000.00|
| 0.00,-10.00, 2800020.00|
| 0.00, 0.00, 1.00|
As you can see, no data has been downloaded to the local disk, yet we still can retrieve the metadata shown above. This is why COG is powerful for cloud computing. Practically, when using COG, we will need to download only the necessary part of the data. The TCI.tif file has a size of 276 MB (which can be checked after a manual download), but when you only access the metadata, it only needs to download a few KB of data, so you won’t feel there is any downloading time while you execute these commands.
We will discuss cloud-optimized geotiffs in more detail during the next hands-on topic and lecture.
🛠️ Interactive visualization#
This raster image has ~120 million pixels (10980\(\times\)10980), so it would take a lot of time for any visualization tools to read all pixels and render them on your screen. I have commented out this line, but feel free to try it if you don’t mind waiting:
# da.hvplot.image(x='x', y='y', data_aspect=1)
For this hands-on session, I have examined the data and decided to slice the image to speed up the processing time.
🚩 Task: Crop the data array using the provided extent and visualize this subset using hvplot. Switch to a different band by sliding the bar to the right.
# x and y slices are based on the same UTM projection.
da_subset = da.sel(x=slice(320000, 330000), y=slice(2780000, 2770000))
da_subset.hvplot.image(x='x', y='y', data_aspect=1)
❓ Question: What are the rectangular blocks in the left area of the image?
❓ Question: Can you invert the colormap so bright colors represent high reflectance, which will be easier for readers to interpret?
Hints!
The default colormap for this case is
kbc_r(See https://hvplot.holoviz.org/en/docs/latest/ref/plotting_options/color_colormap.html).Check here about how to specify a different name for a reversed colormap.
🚩 Task: Instead of showing a single band one at a time, visualize the true color image by showing three bands together:
da_subset.hvplot.rgb(x='x', y='y', data_aspect=1, bands='band')
❓ Question: Which band is Red? Which are Green and Blue? How do you know that?
❓ Question: Can you make a profile plot with three profiles, each representing the pixel value of a specific band along the x-axis direction at y = 2774000? The hvplot gridded data tutorial may be useful for this.
Hints!
You can select a line of pixels by
da_subset.sel(y=slice(2774000, 2773960)).Line plots are the type you may want to look for.
Solution
da_subset.sel(y=slice(2774000, 2773960)).hvplot.line(x='x', by='band', groupby='y')
2 | Ipyleaflet#
In the previous section, we selected a subarea of the image by giving the boundary UTM coordinates. Here, we will try a more intuitive and interactive method using ipyleaflet for data slicing.
Ipyleaflet is a Python API for Leaflet, a JavaScript-based library for web maps. Ipyleaflet utilizes the Notebook UI to render web maps, enabling users to interactively explore geospatial content with the aid of additional functionality from IPython/Jupyter, including widgets.
🛠️ Prepare the tool and load the data#
🚩 Task: Log into Callysto Hub (or other other working environment you prefer). Install ipyleaflet by
!pip install ipyleaflet
Note
Note that you will need to install these packages every time you start a fresh server on Callysto. This workflow also requires the pyproj package, but it has already been installed on Callysto.
🚩 Task: Import necessary modules and load the data:
# This is used in the previous section
import rioxarray
import hvplot.xarray
# These are new to this section
from ipyleaflet import Map, basemaps, basemap_to_tiles, Rectangle, GeomanDrawControl
from pyproj import Transformer
# Here we will use the same COG data set from the previous section. And we continue to use rioxarray to load it.
image_url = 'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/51/R/UH/2023/3/S2B_51RUH_20230311_0_L2A/TCI.tif'
da = rioxarray.open_rasterio(image_url)
🛠️ Make a map showing the full extent of the image#
❓ Question: da.rio.bounds() returns the extent of the image in the projected coordinates. What are the coordinates of (1) the lower left corner and (2) the upper right corner?
Solution
The coordinates of the lower left corner are bounds[:2], and the coordinates of the upper right corner are bounds[2:].
❓ Question: What is the EPSG number of the coordinate reference system (CRS) used for this image?
Hint!
da.rio.crs
Note
Take a look at here for additional explanation about EPSG.
We now know that the projected coordinate system is UTM zone 51N.
🚩 Task: To show the extent on a web map, we have to transform the coordinates from UTM 51N to Lat/Lon (a.k.a. EPSG 4326). Execute the code below:
# Construct a coordinate transformer from dataset's CRS to 4326 (using pyproj)
transformer = Transformer.from_crs(da.rio.crs, "epsg:4326")
# Transform the corner coordinates
bounds = da.rio.bounds()
coords = [transformer.transform(x, y) for x,y in [bounds[:2], bounds[2:]]]
coords
[(24.312301875374363, 121.02911850575056),
(25.314010096875364, 122.10386391236358)]
🚩 Task: Based on the previous sample code, Construct the webmap as follows. Note that I added a few more lines to draw a rectangle on the map, indicating the full extent of the satellite image.
# Map center set to the lower left corner of the image
center = coords[0]
# Again, we use OpenStreetMap as the basemap.
m = Map(
basemap=basemap_to_tiles(basemaps.OpenStreetMap.Mapnik),
center=center,
zoom=8
)
# Add the image extent
rectangle = Rectangle(bounds=coords)
m.add_layer(rectangle)
# Enable rectangle draw (and disable the other draws)
draw_control = GeomanDrawControl(polyline={}, polygon={}, circlemarker={})
# draw_control = DrawControl(polyline={}, polygon={}, circlemarker={})
draw_control.rectangle = {
"shapeOptions": {
"fillOpacity": 0.5
}
}
m.add_control(draw_control)
# Display the map
m
Note
If you encounter a JavaScript error, try to refresh the Jupyter page and restart the kernel. For technical details about GeomanDrawControl, see this documentation.
🛠️ Display and subset the image within the selected area#
🚩 Task: Draw a rectangle on the map using the toolbar on the left. We will use this rectangle as the bounding box to slice the image and display the data within the box.
🚩 Task: Get the coordinates of this new bounding box using the following code:
rec_coords = draw_control.data[-1]['geometry']
rec_coords = rec_coords['coordinates']
xy1 = rec_coords[0][0] # lower left
xy2 = rec_coords[0][2] # upper right
print(f"Lower left: {xy1} \nUpper right: {xy2}")
Lower left: [121.451417, 24.529634]
Upper right: [121.676578, 24.71939]
🚩 Task: Convert these corner coordinates back to the data CRS using the following code:
transformer2 = Transformer.from_crs("epsg:4326", da.rio.crs)
slicing_coords = [transformer2.transform(lat, lon) for lon, lat in [xy1, xy2]]
slicing_coords
[(343130.6810593138, 2713746.6874056426),
(366144.36492820125, 2734523.320407676)]
🚩 Task: Use these numbers to slice the data again and plot the image within the new bounding box.
da_subset = da.sel(x=slice(slicing_coords[0][0], slicing_coords[1][0]), y=slice(slicing_coords[1][1], slicing_coords[0][1]))
da_subset
<xarray.DataArray (band: 3, y: 2077, x: 2301)>
[14337531 values with dtype=uint8]
Coordinates:
* band (band) int64 1 2 3
* x (x) float64 3.431e+05 3.431e+05 ... 3.661e+05 3.661e+05
* y (y) float64 2.735e+06 2.735e+06 ... 2.714e+06 2.714e+06
spatial_ref int64 0
Attributes:
AREA_OR_POINT: Area
OVR_RESAMPLING_ALG: AVERAGE
_FillValue: 0
scale_factor: 1.0
add_offset: 0.0da_subset.hvplot.image(x='x', y='y', data_aspect=1, cmap='gray')
🚩 Task: Make a new rectangle selection on the leaflet map and repeat the slicing & visualization steps (by re-executing all cells below). Confirm whether the image matches your desired output!