Getting and analyzing remotely sensed open data#
This last tutorial shows you how to navigate a STAC catalog, search within a cloud-hosted data set, and acquire some popular open satellite data sets.
1 | STAC#
SpatioTemporal Asset Catalog (STAC) is a set of specifications for describing spatiotemporal data sets, such as satellite images and vector data. Satellite data sets described with STAC are more accessible because users no longer need to learn a different search-and-access workflow for each provider (as is typical with current commercial data sets). STAC 1.0.0 was released in May 2021, and the specification has since been adopted by major satellite data providers. You can also check out this blog post by Planet Labs Inc. for more details.
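To give a feel for the specification, a STAC item is simply a GeoJSON Feature with a few STAC-specific fields (id, datetime, assets, etc.). A minimal sketch is shown below; all field values are invented for illustration and do not describe a real scene:

```python
# A minimal, hypothetical STAC item: a GeoJSON Feature plus STAC fields.
# All values below are made up for illustration.
stac_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-20170831",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-95.6, 29.5], [-95.0, 29.5], [-95.0, 30.0],
                         [-95.6, 30.0], [-95.6, 29.5]]],
    },
    "bbox": [-95.6, 29.5, -95.0, 30.0],
    "properties": {"datetime": "2017-08-31T16:27:40Z"},
    "assets": {
        # Each asset points to a downloadable file; its role tells
        # clients how to use it (e.g., "data" vs. "thumbnail").
        "visual": {
            "href": "https://example.com/scene.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["data"],
        }
    },
    "links": [],
}

print(stac_item["assets"]["visual"]["roles"])  # ['data']
```

Because every provider publishes the same structure, one client library can search and load data from any STAC catalog.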
The full STAC specification, along with numerous tutorials, is available on the STAC website. On the STAC Index website, you can also find a collection of STAC catalogs to explore.
🛠️ The Planet disaster data#
For the first half of this tutorial, we will use the Planet disaster data, a small dataset prepared by Planet Labs Inc., to demonstrate how STAC integrates with the data analysis workflow. We will be looking at Hurricane Harvey, one of the most devastating natural disasters in the U.S. Here, we will examine data acquired by the PlanetScope satellite constellation over Houston, TX, U.S., on August 31, 2017, a few days after the hurricane brought record-breaking rainfall. The base URL (i.e., landing page) of this STAC catalog is at https://www.planet.com/data/stac/catalog.json, and there is also a browser you can use to easily navigate through the catalog.
❓ Question: Go to https://www.planet.com/data/stac/browser/disasters/collection.json and navigate to the image item 20170831_162740_ssc1d1. From what constellation does this scene come? What is the spatial resolution of the image?
🛠️ STAC browser#
The user-friendly navigation of a STAC catalog shown above is powered by a tool called STAC browser.
🚩 Task: Go to the landing page of STAC browser and paste in the URL of the Planet disaster data STAC catalog (https://www.planet.com/data/stac/catalog.json). Does it look the same as in the last task?
2 | Intake#
Intake is a Python-based tool for accessing and interacting with local and cloud-hosted data. With the STAC plugin (Intake-STAC) installed, we can use Intake to navigate a STAC catalog, perform basic searches, and load the desired data into memory for further analysis. Intake also provides a Jupyter-based GUI for querying, describing, and visualizing data.
In this tutorial, we will use Intake-STAC to load data from the Planet STAC catalog.
🛠️ Prepare the tool and the data#
🚩 Task: Log into Callysto Hub (or another working environment you prefer). Install intake-stac and other helpful tools. (Intake itself will be installed along with this library as a dependency.)
!pip install intake-stac hvplot rioxarray xarray
Note
You will need to install these packages every time you start a fresh server on Callysto.
🚩 Task: Import intake and hvplot.xarray:
import intake
import hvplot.xarray
🛠️ Check the STAC catalog#
🚩 Task: Use Intake to open the Planet disaster STAC catalog:
catalog_url = "https://www.planet.com/data/stac/catalog.json"
catalog = intake.open_stac_catalog(catalog_url)
catalog
❓ Question: print list(catalog). What data sets does this STAC catalog contain other than the disaster data?
🚩 Task: To access the Planet disaster data, go into one layer deep and get the corresponding STAC collection:
collection = catalog['planet-disaster-data']
collection
❓ Question: print collection.metadata. What license does this data set use?
🚩 Task: Go a few layers deeper as follows, and we will eventually get a STAC item that points to the desired satellite image:
Note
To check the name of each layer, use list(collection), list(collection['hurricane-harvey']), and so on.
item = collection['hurricane-harvey']['hurricane-harvey-0831']['Houston-East-20170831-103f-100d-0f4f-RGB']
item
mosaic:
  args:
    chunks: {}
    urlpath: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.tif
  description: 3 Band RGB Mosaic
  direct_access: allow
  driver: intake_xarray.raster.RasterIOSource
  metadata:
    href: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.tif
    plots:
      geotiff:
        cmap: viridis
        data_aspect: 1
        dynamic: true
        frame_width: 500
        kind: image
        rasterize: true
        x: x
        y: y
    roles:
    - data
    title: 3 Band RGB Mosaic
    type: image/tiff; application=geotiff; profile=cloud-optimized
thumbnail:
  args:
    chunks: {}
    urlpath: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.png
  description: Thumbnail
  direct_access: allow
  driver: intake_xarray.image.ImageSource
  metadata:
    href: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.png
    plots:
      thumbnail:
        bands: channel
        data_aspect: 1
        flip_yaxis: true
        kind: rgb
        x: x
        xaxis: false
        y: y
        yaxis: false
    roles:
    - thumbnail
    title: Thumbnail
    type: image/png
You can see this item has two different assets: mosaic and thumbnail. The thumbnail is also known as the browse image; its purpose is to provide a low-resolution overview so users can preview the scene without downloading large files.
thumbnail = item['thumbnail']
Now you can try to enter thumbnail or thumbnail.describe() for additional information about this image.
🛠️ Examine the thumbnail image#
🚩 Task: To load the thumbnail image, you can either download it from the URL:
thumbnail.urlpath
'https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.png'
or use the to_dask() method to lazily stream it as a Dask array.
da_thumbnail = thumbnail.to_dask()
da_thumbnail
<xarray.DataArray (y: 552, x: 549, channel: 3)>
dask.array<xarray-<this-array>, shape=(552, 549, 3), dtype=uint8, chunksize=(552, 549, 3), chunktype=numpy.ndarray>
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 ... 543 544 545 546 547 548 549 550 551
  * x        (x) int64 0 1 2 3 4 5 6 7 8 ... 540 541 542 543 544 545 546 547 548
  * channel  (channel) int64 0 1 2
🚩 Task: Visualize da_thumbnail using hvplot, as we did in the last tutorial.
Solution
da_thumbnail.hvplot.image(x='x', y='y', data_aspect=1)
🛠️ Examine the full-resolution image#
Now let’s explore the high-resolution mosaic data using the visualization steps we showed in the last tutorial!
🚩 Task: Load the full-resolution mosaicked image (item['mosaic']) into Dask and visualize it using a true-color band combination:
mosaic = item['mosaic']
da_mosaic = mosaic.to_dask()
da_mosaic
<xarray.DataArray (band: 3, y: 22094, x: 21984)>
dask.array<open_rasterio-501c37a1d96e315c20253204d9aec1ce<this-array>, shape=(3, 22094, 21984), dtype=uint8, chunksize=(3, 22094, 21984), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1 2 3
* x (x) float64 -1.066e+07 -1.066e+07 ... -1.058e+07 -1.058e+07
* y (y) float64 3.524e+06 3.524e+06 ... 3.447e+06 3.447e+06
spatial_ref int64 0
Attributes:
AREA_OR_POINT: Area
TIFFTAG_DATETIME: 2017:09:01 15:10:49
TIFFTAG_RESOLUTIONUNIT: 2 (pixels/inch)
TIFFTAG_SOFTWARE: Adobe Photoshop CC 2015.5 (Macintosh)
TIFFTAG_XRESOLUTION: 72
TIFFTAG_YRESOLUTION: 72
scale_factor: 1.0
add_offset: 0.0

🚩 Task: Slice a portion of the image and plot an RGB image of that slice:
da_mosaic_subset = da_mosaic.sel(x=slice(-10641000, -10638000), y=slice(3474000, 3471000))
da_mosaic_subset.hvplot.rgb(x='x', y='y', data_aspect=1)
❓ Question: Can you find which parts of Houston were affected by Hurricane Harvey?
3 | PySTAC-Client#
Intake-STAC is a high-level tool for navigating a STAC catalog and performing common tasks. To implement a more customized workflow, you will need the lower-level PySTAC-Client package; in fact, Intake-STAC is partially built on top of PySTAC-Client.
To showcase PySTAC-Client, we will again focus on Taoyuan International Airport and create a simple animation showing how the airport has changed over the years. We will perform a simple query over the Sentinel-2 Cloud-Optimized GeoTIFFs STAC catalog, hosted on AWS by Element 84.
Note
Not all STAC catalogs can be queried; search is an optional API feature that each data provider decides whether to support. Thankfully, the popular Landsat and Sentinel catalogs in the AWS Open Data registry all support STAC querying.
🛠️ Prepare the tool#
🚩 Task: Log into Callysto Hub (or another working environment you prefer). Install pystac-client and import the necessary modules into Python:
!pip install pystac-client
import intake
import hvplot.xarray
import xarray as xr
import pandas as pd
import pystac_client
import numpy as np
🛠️ Fetch and visualize the data#
🚩 Task: The STAC entry point for the Sentinel-2 Cloud-Optimized GeoTIFFs is https://earth-search.aws.element84.com/v1. Perform the query using PySTAC-Client as follows:
# If you are curious about the content of this STAC catalog, don't forget the STAC browser!
catalog_url = "https://earth-search.aws.element84.com/v1"
catalog = pystac_client.Client.open(catalog_url)
results = catalog.search(
collections=["sentinel-2-l2a"], # Search within the sentinel-2-l2a collection
bbox = [121.21, 25.06, 121.26, 25.09], # Bounding box; [lon-left, lat-bottom, lon-right, lat-top]. Roughly focused on the Taoyuan International Airport.
datetime="2021-09-26/2023-03-31") # Search within this time span.
items = results.item_collection()
print(len(items)) # This shows how many items are available.
120
The output indicates that 120 items are available based on our search criteria.
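As a quick sanity check (an aside, not part of the workflow): the Sentinel-2A/B pair has a nominal combined revisit of about 5 days at this latitude, so over this search window we would expect on the order of a hundred scenes, which is consistent with the 120 items returned:

```python
from datetime import date

# Length of the search window used in the query above
days = (date(2023, 3, 31) - date(2021, 9, 26)).days
print(days)  # 551

# With a nominal 5-day combined revisit for Sentinel-2A/B,
# roughly this many acquisitions are expected over the window:
print(round(days / 5))  # 110
```

The count will not match exactly because adjacent orbital tracks overlap and some scenes may be missing from the archive.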
🚩 Task: Pass items to Intake and try enabling the commented line:
items_intake = intake.open_stac_item_collection(items)
# list(items_intake) # Try this!
# items_intake['S2B_51RUH_20230311_0_L2A'] # Try this!
# items_intake['S2B_51RUH_20230311_0_L2A'].metadata # Try this!
🚩 Task: Each Sentinel-2 item comes with many assets, such as the thumbnail, blue band, red band, NIR band, etc. Taking S2B_51RUH_20230311_0_L2A (the image we used in the previous tutorial) as an example, check which assets are available:
item = items_intake['S2B_51RUH_20230311_0_L2A']
for key in item.keys():
print(key)
Each entry listed above is an asset you can access in this item.
❓ Question: What are aot, nir, thumbnail, rededge1-jp2, and swir22-jp2 bands?
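As an aside, the red and nir bands listed above are commonly combined into the Normalized Difference Vegetation Index, NDVI = (NIR − red) / (NIR + red), which highlights vegetation. A minimal sketch with synthetic reflectance arrays follows; with real data, the inputs would come from to_dask() on the red and nir assets:

```python
import numpy as np

# Synthetic reflectance values standing in for real red/NIR band arrays.
red = np.array([[0.10, 0.30], [0.05, 0.20]])
nir = np.array([[0.50, 0.35], [0.45, 0.22]])

# NDVI = (NIR - red) / (NIR + red); healthy vegetation trends toward 1,
# while bare soil and water sit near or below 0.
ndvi = (nir - red) / (nir + red)
print(ndvi.round(2))
```

The same arithmetic works unchanged on the lazily loaded xarray/Dask arrays used in this tutorial.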
🚩 Task: Retrieve visual, the 3-band true-color combination, from every image in the query result:
Note
This might take a few minutes, depending on the internet connection and CPU speed. If it gets too long, try changing the search criteria to reduce the number of returned items.
# Initialize empty collections
data_list = []
timestamp_list = []

# Loop over each queried image
for scene in items_intake.values():
    timestamp = scene.metadata['s2:generation_time']  # get the timestamp of each image
    da = scene['visual'].to_dask()  # lazily load the data as a Dask array
    da_subset = da.sel(x=slice(320000, 325000), y=slice(2777000, 2772000))  # slice the image near the airport
    # Simple cloud screen: keep the image only if more than 90% of the red-band
    # pixels are unsaturated (255 indicates saturation, e.g., bright clouds).
    # 250000 is the total number of pixels in the sliced data (500 x 500).
    if np.sum(da_subset.values[0, :, :] < 255) / 250000 > 0.9:
        timestamp_list.append(timestamp)
        data_list.append(da_subset)
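The screening step in the loop can be illustrated in isolation: for an 8-bit band, compute the fraction of unsaturated pixels and keep the scene only if that fraction exceeds 0.9. The values below are synthetic, standing in for a real sliced red band:

```python
import numpy as np

# Synthetic 8-bit red band: a 500 x 500 tile with a block of saturated
# (value 255) pixels standing in for a bright cloud.
band = np.full((500, 500), 120, dtype=np.uint8)
band[:100, :100] = 255  # 10,000 saturated pixels out of 250,000 (4%)

# Fraction of unsaturated pixels; keep the scene if it exceeds 0.9
unsaturated_fraction = float(np.sum(band < 255)) / band.size
keep = unsaturated_fraction > 0.9
print(unsaturated_fraction, keep)  # 0.96 True
```

This is a crude heuristic; a production workflow would instead use the scene's cloud-cover metadata or a dedicated cloud mask.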
Now we can merge all the data into a giant 4-D data cube!
datacube = xr.concat(data_list, pd.Index(timestamp_list, name="t"))
datacube = datacube.sortby('t') # Sort the image, ordered by the acquisition time
datacube
<xarray.DataArray (t: 23, band: 3, y: 500, x: 500)>
dask.array<getitem, shape=(23, 3, 500, 500), dtype=uint8, chunksize=(1, 3, 500, 500), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1 2 3
* x (x) float64 3.2e+05 3.2e+05 3.2e+05 ... 3.25e+05 3.25e+05
* y (y) float64 2.777e+06 2.777e+06 ... 2.772e+06 2.772e+06
spatial_ref int64 0
* t (t) object '2021-10-02T06:08:39.000000Z' ... '2023-03-16T06:...
Attributes:
AREA_OR_POINT: Area
OVR_RESAMPLING_ALG: AVERAGE
_FillValue: 0
scale_factor: 1.0
add_offset: 0.0

🚩 Task: Finally, let’s display the data using hvplot’s animation widget. Have fun exploring the data using the display buttons!
datacube.hvplot.rgb(
groupby="t", # adds a widget that switches the view along the t axis.
data_aspect=1,
widget_type="scrubber",
widget_location="bottom",
)
❓ Question: Can you identify any changes in the landscape around Taoyuan International Airport during the search period?