Getting and analyzing remotely sensed open data#

Our last tutorial will show you how to navigate through a STAC catalog, search within a cloud-hosted data set, and acquire some popular open satellite data sets.

1 | STAC#

SpatioTemporal Asset Catalogs (STAC) are a set of specifications for describing spatial-temporal datasets, such as satellite images and vector data. Satellite data sets described in STAC can be more accessible because users do not need to learn how to repeatedly search for and access data (as is typical with current commercial data sets). The first version of STAC was released in May 2021 and became popular among major satellite data providers. You can also check out this blogpost by the Planet Inc. for more details.

The full STAC specification, along with numerous tutorials, is available on the STAC website. Also, on the STAC intex website, you can find a collection of STAC catalogs to explore.

🛠️ The Planet disaster data#

For the first half of this tutorial, we will use the Planet disaster data, a small dataset prepared by Planet Labs Inc., to demonstrate how STAC integrates with the data analysis workflow. We will be looking at Hurricane Harvey, one of the most devastating natural disasters in the U.S. Here, we will examine data acquired by the PlanetScope satellite constellation over Houston, TX, U.S., on August 31, 2017, a few days after the hurricane brought record-breaking rainfall. The base URL (i.e., landing page) of this STAC catalog is at https://www.planet.com/data/stac/catalog.json, and there is also a browser you can use to easily navigate through the catalog.

Question: Go to https://www.planet.com/data/stac/browser/disasters/collection.json and navigate to the image item 20170831_162740_ssc1d1. From what constellation does this scene come? What is the spatial resolution of the image?

🛠️ STAC browser#

The pretty navigation of an STAC catalog is powered by a tool called STAC browser.

🚩 Task: Go to the landing page of STAC browser and paste the URL of the Planet disaster data STAC catalog (https://www.planet.com/data/stac/catalog.json). Does it look the same as the last task?

2 | Intake#

Intake is a Python-based tool for interacting with and accessing local and cloud data. With the STAC plugin installed (Intake-STAC), we can use Intake to navigate a STAC catalog, perform basic searches, and load the desired data into memory for further analysis. Intake also provides a Jupyter-based GUI for querying/describing data using proper visualization.

In this tutorial, we will use Intake-STAC to load data from the Planet STAC catalog.

🛠️ Prepare the tool and the data#

🚩 Task: Log into Callysto Hub (or other other working environment you prefer). Install intake-stac and other convenient tools. (Intake itself will be installed along with this library due to a dependency.)

!pip install intake-stac hvplot rioxarray xarray

Note

You will need to install these packages every time you start a fresh server on Callysto.

🚩 Task: Import intake and hvplot.xarray:

import intake
import hvplot.xarray

🛠️ Check the STAC catalog#

🚩 Task: Use Intake to open the Planet disaster STAC catalog:

catalog_url = "https://www.planet.com/data/stac/catalog.json"
catalog = intake.open_stac_catalog(catalog_url)
catalog

Hide code cell output

planet:
  args:
    stac_obj: https://www.planet.com/data/stac/catalog.json
  description: ''
  driver: intake_stac.catalog.StacCatalog
  metadata:
    description: This catalog serves as the STAC Catalog for data licensed under Creative
      Commons provided by Planet Labs PBC. Details on each asset type, program, and
      associated license are provided in the catalogs below.
    id: planet
    stac_extensions:
    - https://stac-extensions.github.io/stats/v0.2.0/schema.json
    stac_version: 1.0.0
    stats:catalogs:
      count: 12
      extensions:
        https://stac-extensions.github.io/stereo-imagery/v1.0.0/schema.json: 4
        https://stac-extensions.github.io/version/v1.2.0/schema.json: 4
      versions:
        1.0.0: 9
        1.1.0: 3
    stats:collections:
      count: 28
      extensions:
        https://stac-extensions.github.io/version/v1.2.0/schema.json: 3
        https://stac-extensions.github.io/web-map-links/v1.1.0/schema.json: 4
      versions:
        1.0.0: 13
        1.1.0: 15
    stats:items:
      assets:
        application/geo+json: 1423
        application/json: 48
        application/vnd.laszip+copc: 4
        application/x-hdf5: 93
        application/xml: 11
        application/zip: 16
        image/jpeg: 1
        image/png: 108
        image/tiff: 3
        image/tiff; application=geotiff: 2253
        image/tiff; application=geotiff; profile=cloud-optimized: 13995
        text/xml: 4
      count: 10731
      extensions:
        https://planetlabs.github.io/stac-extension/v1.0.0-beta.3/schema.json: 30
        https://stac-extensions.github.io/eo/v1.0.0/schema.json: 2
        https://stac-extensions.github.io/eo/v1.1.0/schema.json: 492
        https://stac-extensions.github.io/file/v2.0.0/schema.json: 16
        https://stac-extensions.github.io/label/v1.0.0/schema.json: 1423
        https://stac-extensions.github.io/pointcloud/v1.0.0/schema.json: 4
        https://stac-extensions.github.io/projection/v1.1.0/schema.json: 400
        https://stac-extensions.github.io/projection/v2.0.0/schema.json: 91
        https://stac-extensions.github.io/raster/v1.1.0/schema.json: 385
        https://stac-extensions.github.io/stereo-imagery/v1.0.0/schema.json: 18
        https://stac-extensions.github.io/view/v1.0.0/schema.json: 124
      versions:
        1.0.0: 3723
        1.1.0: 7008
    title: Planet Labs PBC - Open Data
    type: Catalog

Question: print list(catalog). What data sets does this STAC collection contain other than the disaster data?

🚩 Task: To access the Planet disaster data, go into one layer deep and get the corresponding STAC collection:

collection = catalog['planet-disaster-data']
collection

Hide code cell output

planet-disaster-data:
  args:
    stac_obj: https://www.planet.com/data/stac/disasters/collection.json
  description: '[Planet Disaster Data](https://www.planet.com/disasterdata/) makes
    imagery available directly to the public, volunteers, humanitarian organizations,
    and other coordinating bodies in support of the International Charter for Space
    and Major Disasters. Data is released for individual disaster events, providing
    a 30 day window pre- and post-disaster. Imagery is provided under Creative Commons
    licenses, free of charge, with either CC-BY-SA or CC-BY-NC. This STAC catalog
    provides fully public access of a very small subset of the imagery, as a proof
    of concept.'
  driver: intake_stac.catalog.StacCollection
  metadata:
    catalog_dir: ''

Question: print collection.metadata. What license does this data set use?

🚩 Task: Go a few layers deeper as follows, and we will eventually get a STAC item that points to the desired satellite image:

Note

To check the name of each layer, use list(collection), list(collection['hurricane-harvey']), and so on.

item = collection['hurricane-harvey']['hurricane-harvey-0831']['Houston-East-20170831-103f-100d-0f4f-RGB']
item
mosaic:
  args:
    chunks: {}
    urlpath: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.tif
  description: 3 Band RGB Mosaic
  direct_access: allow
  driver: intake_xarray.raster.RasterIOSource
  metadata:
    href: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.tif
    plots:
      geotiff:
        cmap: viridis
        data_aspect: 1
        dynamic: true
        frame_width: 500
        kind: image
        rasterize: true
        x: x
        y: y
    roles:
    - data
    title: 3 Band RGB Mosaic
    type: image/tiff; application=geotiff; profile=cloud-optimized
thumbnail:
  args:
    chunks: {}
    urlpath: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.png
  description: Thumbnail
  direct_access: allow
  driver: intake_xarray.image.ImageSource
  metadata:
    href: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.png
    plots:
      thumbnail:
        bands: channel
        data_aspect: 1
        flip_yaxis: true
        kind: rgb
        x: x
        xaxis: false
        y: y
        yaxis: false
    roles:
    - thumbnail
    title: Thumbnail
    type: image/png

You can see this asset has two different scenes: mosaic and thumbnail. The thumbnail is also known as the browse image. Its purpose is to provide a low-resolution overview of the image so users can explore as much as possible without downloading large files.

thumbnail = item['thumbnail']

Now you can try to enter thumbnail or thumbnail.describe() for additional information about this image.

🛠️ Examine the thumbnail image#

🚩 Task: To load the thumbnail image, you can either download it from the url:

thumbnail.urlpath
'https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.png'

or use the to_dask() method to lazily stream it as a Dask array.

da_thumbnail = thumbnail.to_dask()
da_thumbnail
/home/whyjz278/.local/lib/python3.10/site-packages/intake_xarray/image.py:474: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  'dims': dict(ds2.dims),
<xarray.DataArray (y: 552, x: 549, channel: 3)>
dask.array<xarray-<this-array>, shape=(552, 549, 3), dtype=uint8, chunksize=(552, 549, 3), chunktype=numpy.ndarray>
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 ... 543 544 545 546 547 548 549 550 551
  * x        (x) int64 0 1 2 3 4 5 6 7 8 ... 540 541 542 543 544 545 546 547 548
  * channel  (channel) int64 0 1 2

🚩 Task: Visualize da_thumbnail using hvplot, as we did in the last tutorial.

🛠️ Examine the full-resolution image#

Now let’s explore the high-resolution mosaic data using the visualization steps we showed in the last tutorial!

🚩 Task: Load the full-resolution, mosaicked image (item[mosaic]) to dask and visualize it using true-color band combination:

mosaic = item['mosaic']
da_mosaic = mosaic.to_dask()
da_mosaic
/home/whyjz278/.local/lib/python3.10/site-packages/intake_xarray/raster.py:107: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  'dims': dict(ds2.dims),
<xarray.DataArray (band: 3, y: 22094, x: 21984)>
dask.array<open_rasterio-501c37a1d96e315c20253204d9aec1ce<this-array>, shape=(3, 22094, 21984), dtype=uint8, chunksize=(3, 22094, 21984), chunktype=numpy.ndarray>
Coordinates:
  * band         (band) int64 1 2 3
  * x            (x) float64 -1.066e+07 -1.066e+07 ... -1.058e+07 -1.058e+07
  * y            (y) float64 3.524e+06 3.524e+06 ... 3.447e+06 3.447e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:           Area
    TIFFTAG_DATETIME:        2017:09:01 15:10:49
    TIFFTAG_RESOLUTIONUNIT:  2 (pixels/inch)
    TIFFTAG_SOFTWARE:        Adobe Photoshop CC 2015.5 (Macintosh)
    TIFFTAG_XRESOLUTION:     72
    TIFFTAG_YRESOLUTION:     72
    scale_factor:            1.0
    add_offset:              0.0
## Slice a portion of the image and plot a RGB image of that slice.
da_mosaic_subset = da_mosaic.sel(x=slice(-10641000, -10638000), y=slice(3474000, 3471000))
da_mosaic_subset.hvplot.rgb(x='x', y='y', data_aspect=1)

Question: Can you find where in Houston is affected by Hurricane Harvey?

3 | PySTAC-Client#

Intake-STAC is a high-level tool for manipulating a STAC catalog and performing common tasks. If one wants to implement a customized workflow, the low-level PySTAC-Client package is necessary. In fact, Intake-STAC partially builds on top of PySTAC-Client.

To showcase a demo using PySTAC-Client, we will again focus on Taoyuan International Airport and create a simple animation showing the airport’s time-lapse over the years. We will perform a simple query over the Sentinel-2 Cloud-Optimized GeoTIFFs STAC catalog, hosted on AWS by Element 84.

Note

Not all STAC catalogs can be queried. This is an optional feature for the data provider to determine whether to adopt. Thankfully, the popular Landsat and Sentinel catalogs on the AWS Open Data Repository all support STAC querying.

🛠️ Prepare the tool#

🚩 Task: Log into Callysto Hub (or other other working environment you prefer). Install pystac-client and import necessary modules into Python:

!pip install pystac-client
import intake
import hvplot.xarray
import xarray as xr
import pandas as pd
import pystac_client
import numpy as np

🛠️ Fetch and visualize the data#

🚩 Task: The STAC entry point for the Sentinel-2 Cloud-Optimized GeoTIFFs is https://earth-search.aws.element84.com/v1. Perform the query using PySTAC-Client as follows:

# If you are curious about the content of this STAC catalog, don't forget the STAC browser!
catalog_url = "https://earth-search.aws.element84.com/v1"
catalog = pystac_client.Client.open(catalog_url)

results = catalog.search(
    collections=["sentinel-2-l2a"],              # Search within the sentinel-2-l2a collection
    bbox = [121.21, 25.06, 121.26, 25.09],       # Bounding box; [lon-left, lat-bottom, lon-right, lat-top]. Roughly focused on the Taoyuan International Airport.
    datetime="2021-09-26/2023-03-31")            # Search within this time span.

items = results.item_collection()

print(len(items))         # This shows how many items are available.
120

The output indicates that 120 items are available based on our search criteria.

🚩 Task: Pass items to Intake and try enabling the commented line:

items_intake = intake.open_stac_item_collection(items)
# list(items_intake)                                       # Try this!
# items_intake['S2B_51RUH_20230311_0_L2A']                 # Try this!
# items_intake['S2B_51RUH_20230311_0_L2A'].metadata        # Try this!

🚩 Task: Each Sentinel-2 item comes with many images, such as thumbnail, blue band, red band, NIR band, etc. Take S2B_51RUH_20230311_0_L2A (the image we used in the previous tutorial) as an example, check what images are there by:

item = items_intake['S2B_51RUH_20230311_0_L2A']
for key in item.keys():
    print(key)

Hide code cell output

aot
blue
coastal
granule_metadata
green
nir
nir08
nir09
red
rededge1
rededge2
rededge3
scl
swir16
swir22
thumbnail
tileinfo_metadata
visual
wvp
aot-jp2
blue-jp2
coastal-jp2
green-jp2
nir-jp2
nir08-jp2
nir09-jp2
red-jp2
rededge1-jp2
rededge2-jp2
rededge3-jp2
scl-jp2
swir16-jp2
swir22-jp2
visual-jp2
wvp-jp2

Each entry listed above is an image you can access in this asset.

Question: What are aot, nir, thumbnail, rededge1-jp2, and swir22-jp2 bands?

🚩 Task: Retrieve visual, the 3-band true-color combination, from every image in the query result:

Note

This might take a few minutes, depending on the internet connection and CPU speed. If it gets too long, try changing the search criteria to reduce the number of returned items.

# Initialize an empty collection
data_list = []
timestamp_list = []

# Loop over each queried image
for scene in items_intake.values():
    timestamp = scene.metadata['s2:generation_time']                          # get time stame of each image
    da = scene['visual'].to_dask()                                            # lazily load the data as Dask array
    da_subset = da.sel(x=slice(320000, 325000), y=slice(2777000, 2772000))    # Slice the image near the airport
    # Now we apply a simple filter to mask our images with cloud cover > 10%. 
    if np.sum(da_subset.values[0, :, :] < 255) / 250000 > 0.9:                # 250000 stands for the total number of the pixels in the sliced data. 255 is a saturated band color.
        timestamp_list.append(timestamp)
        data_list.append(da_subset)

Now we can merge all the data into a giant 4-D data cube!

datacube = xr.concat(data_list, pd.Index(timestamp_list, name="t"))
datacube = datacube.sortby('t')    # Sort the image, ordered by the acquisition time
datacube
<xarray.DataArray (t: 23, band: 3, y: 500, x: 500)>
dask.array<getitem, shape=(23, 3, 500, 500), dtype=uint8, chunksize=(1, 3, 500, 500), chunktype=numpy.ndarray>
Coordinates:
  * band         (band) int64 1 2 3
  * x            (x) float64 3.2e+05 3.2e+05 3.2e+05 ... 3.25e+05 3.25e+05
  * y            (y) float64 2.777e+06 2.777e+06 ... 2.772e+06 2.772e+06
    spatial_ref  int64 0
  * t            (t) object '2021-10-02T06:08:39.000000Z' ... '2023-03-16T06:...
Attributes:
    AREA_OR_POINT:       Area
    OVR_RESAMPLING_ALG:  AVERAGE
    _FillValue:          0
    scale_factor:        1.0
    add_offset:          0.0

🚩 Task: Finally, let’s display the data using Hvplot’s animation widget. Have fun exploring the data using the display buttons!

datacube.hvplot.rgb(
    groupby="t",  # adds a widget that switches the view along the t axis.
    data_aspect=1,
    widget_type="scrubber",
    widget_location="bottom",
)

Question: Can you identify any changes in the landscape around the Taoyuan international airport during the search period?