Module 5. Raster Data
Learning Objectives
Identify two types of raster data
Define remote sensing
Explain the goal of image classification
Explain how K-means clustering is used to create groups
Lecture Slides
Lab Assignment
Quiz 5
Lecture Video
Overview
Raster data is a representation of continuous spatial information as a set of cells arranged in a regular grid pattern. Each cell, or pixel, in the grid contains a single value representing a specific attribute such as elevation, land use, or temperature. Raster data is commonly used in remote sensing, GIS, and mapping applications to store and display satellite images, aerial photos, and other forms of continuous spatial data.
In GIS, raster data is often used to represent remotely sensed data. Remote sensing is the process of measuring some object from a distance. In the case of earth observation remote sensing, sensors measure the energy that is reflected or emitted from objects on the earth's surface. Differences in the properties of those features result in different spectral reflectance signatures that can be used to differentiate them from one another. In contrast to remote sensing, in situ measurements are taken through direct contact with an object. This module will introduce you to raster data types, storage, and analysis.
Benefits of Raster Data
Raster data is often preferred over vector data for mapping continuous phenomena. For example, land cover is a continuous phenomenon that may have indistinct boundaries or transition zones between categories, and it is commonly represented with raster images. Land cover describes the vegetation or human-made features on the surface of the earth. This is in contrast to land use, the human activities that shape the land's surface. For example, forest is a type of land cover, but the land use of a forested area might be residential.
Raster data also benefits from a simple and efficient method of data storage due to its inherent grid structure. For the same reason, raster data is often faster to process than vector data. Finally, raster data provides an easy way to visualize data, either as imagery or as pseudo-color representations.
Types of Raster Data
Aerial photography started with balloons and pigeons but shifted to airplanes after their invention in 1903. The modern remote sensing age began with the launch of Sputnik in 1957, eventually leading to numerous satellites carrying sensors for capturing electromagnetic energy from Earth's surface. Today, most remote sensing is done using these sensors.
Multispectral Data
A multispectral image is one form of the raster data model. It is a very common way to store measurements from multiple portions of the electromagnetic spectrum, typically including wavelengths beyond what can be seen with the naked eye (visible light). Unlike a normal visible-light image, which captures only the red, green, and blue (RGB) wavelengths, a multispectral image captures data in additional wavelengths, such as near-infrared, mid-infrared, and thermal infrared. The electromagnetic spectrum is represented in the video below.
A multispectral raster image is composed of multiple bands, and each band is a raster that represents the measurement for a single portion of the electromagnetic spectrum. Individual bands are combined (stacked) to create a multispectral image. Landsat is a series of earth observation satellites that produce multispectral images. The program has operated since 1972 and provides one of the most comprehensive global records of the planet's surface. Landsat images provide valuable information for a wide range of applications, including agriculture, forestry, mineral exploration, environmental monitoring, and land use planning. Many similar satellites are in operation today.
We characterize multispectral imagery by four dimensions of resolution. Spatial resolution describes the coarseness or fineness of an image. It is defined as the distance between the centroids of adjacent pixels, which equals the ground width of one pixel, and is reported in linear units. For example, Landsat 8 OLI imagery captured in the visible portion of the spectrum has a spatial resolution of 30 meters. Temporal resolution is the time it takes a sensor to revisit a given location at the same view angle in its orbit. The remaining two dimensions describe the spectral properties of the data. First, spectral resolution is the ability of the sensor to distinguish small differences in wavelength. You can think of it as how narrow each band's view of the spectrum is: narrower bands give higher spectral resolution, while wider bands give lower spectral resolution. Second, radiometric resolution is a sensor's ability to distinguish differences in the magnitude of energy received from the ground area of a single pixel. Radiometric resolution is recorded as the bit depth of an image; the larger the bit depth, the more sensitive the sensor is to small variations in energy. The table below compares the resolution and some other attributes of two multispectral sensors.
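The idea of bands stacked into one image can be sketched with NumPy arrays. This is a minimal illustration, assuming NumPy is installed and using small synthetic arrays in place of real Landsat bands; the names red, green, and nir are hypothetical. The (bands, rows, columns) layout shown is one common convention, not the only one.

```python
import numpy as np

# Hypothetical single-band rasters (3 rows x 4 columns). Real bands would be
# read from image files; these synthetic values only illustrate the structure.
red = np.array([[10, 12, 11, 13],
                [14, 15, 16, 12],
                [11, 10, 12, 14]], dtype=np.uint16)
green = red + 5
nir = red * 3

# Stack the 2D bands along a new first axis: shape (bands, rows, cols).
image = np.stack([red, green, nir], axis=0)
print(image.shape)       # -> (3, 3, 4)

# A pixel's spectral signature is its vector of values across all bands.
print(image[:, 1, 2])    # -> [16 21 48]
```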
Table 1. Comparison of Landsat 8 and Sentinel-2 satellite data.
Attribute | Landsat 8 | Sentinel-2
Launch Date | February 11, 2013 | June 23, 2015 (Sentinel-2A)
Operating Agency | U.S. Geological Survey | European Space Agency
Spatial Resolution | 15 m, 30 m, & 100 m depending on band | 10 m, 20 m, & 60 m depending on band
Number of Bands | 11 | 13
Bandwidth (Spectral Resolution) | 20-180 nm depending on band | 15-185 nm depending on band
Radiometric Resolution | 12-bit (delivered as 16-bit products) | 12-bit
Temporal Resolution | 16 days | 5 days (Sentinel-2A and 2B combined)
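The bit depths and pixel sizes in Table 1 can be made concrete with a little arithmetic: a bit depth of n allows 2^n distinct values, and a finer pixel size means many more pixels per unit area. The short Python sketch below illustrates both calculations; the function names are hypothetical helpers, not part of any GIS library.

```python
def gray_levels(bit_depth: int) -> int:
    """Number of distinct values that can be recorded at a given bit depth (2**n)."""
    return 2 ** bit_depth


def pixels_per_square_km(pixel_size_m: float) -> float:
    """How many square pixels with the given edge length (meters) cover 1 km^2."""
    return 1_000_000 / (pixel_size_m ** 2)


for bits in (8, 12, 16):
    print(f"{bits}-bit: {gray_levels(bits)} levels")      # 256, 4096, 65536

for size in (10, 30):
    print(f"{size} m pixels: {pixels_per_square_km(size):.1f} per km^2")  # 10000.0, 1111.1
```

Going from 30 m to 10 m pixels multiplies the pixel count by nine, which is one reason finer spatial resolution increases storage and processing costs.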
Other forms of Geospatial Raster Data
A digital terrain model (DTM) is a digital representation of the elevation of the earth's surface. DTMs are often stored in a raster format called a digital elevation model (DEM), in which each pixel stores the elevation at that location. While rasters are not the only way to represent terrain, this module focuses on raster DEMs. Standard DEMs represent the topographic surface of the earth and show water bodies as flattened surfaces.
Like multispectral imagery, DEMs are characterized by their horizontal (spatial) resolution. In addition, each pixel value is a vertical distance measured relative to a vertical datum. DEMs are typically stored in common raster data formats, such as GeoTIFF. There are several methods for deriving elevation data. Light detection and ranging (LiDAR) data are collected from aircraft using sensors that detect the reflections of a pulsed laser beam. The reflections are recorded as millions of individual points, collectively called a point cloud, that represent the 3D positions of objects on the surface, including buildings, vegetation, and the ground. A bare-earth DEM can be created from the ground points in this point cloud.
The Shuttle Radar Topography Mission (SRTM), an international research effort led by NASA and the National Geospatial-Intelligence Agency, collected digital elevation data between 60 degrees north and 56 degrees south latitude. The radar sensor for this product was flown on the Space Shuttle Endeavour in February 2000. Since 2014, the Version 3 data product has been available from the USGS at 1-arc-second (about 30 meter) resolution.
This section only touches the surface of terrain representation. Terrain is a key input to many of the environmental models created by geospatial data scientists and deserves more in-depth consideration beyond this course.
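Turning a point cloud into a raster DEM means assigning each point to a grid cell and summarizing the elevations in that cell. The sketch below is a deliberately simplified version, assuming the points are already classified as ground returns and using the mean elevation per cell; production workflows typically filter points and interpolate (for example with a TIN or inverse distance weighting), which is not shown. The function name and sample coordinates are hypothetical.

```python
import numpy as np


def grid_ground_points(x, y, z, x_min, y_max, cell_size, n_rows, n_cols):
    """Average the elevations of ground points falling in each raster cell.

    Assumes points are already classified as ground. Cells with no points are
    left as NaN (a real workflow would interpolate them).
    """
    col = np.floor((x - x_min) / cell_size).astype(int)
    row = np.floor((y_max - y) / cell_size).astype(int)  # row 0 is the north edge
    inside = (row >= 0) & (row < n_rows) & (col >= 0) & (col < n_cols)
    row, col, z = row[inside], col[inside], z[inside]

    flat = row * n_cols + col                             # 1D cell index
    sums = np.bincount(flat, weights=z, minlength=n_rows * n_cols)
    counts = np.bincount(flat, minlength=n_rows * n_cols)

    dem = np.full(n_rows * n_cols, np.nan)
    has_points = counts > 0
    dem[has_points] = sums[has_points] / counts[has_points]
    return dem.reshape(n_rows, n_cols)


# Hypothetical ground points (meters) on a 2 x 2 grid of 1 m cells.
x = np.array([0.2, 0.8, 1.5, 0.5])
y = np.array([1.9, 1.1, 1.7, 0.4])
z = np.array([100.0, 102.0, 101.0, 99.0])
dem = grid_ground_points(x, y, z, x_min=0.0, y_max=2.0, cell_size=1.0, n_rows=2, n_cols=2)
print(dem)   # top-left cell averages two points; bottom-right cell has none (NaN)
```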
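The "1-arc-second (about 30 meter)" equivalence is an approximation. As a rough check, assuming a spherical Earth with a mean radius of 6,371 km (real products are referenced to an ellipsoid such as WGS 84, so exact values differ slightly), one arc-second of latitude spans about 30.9 m, and the east-west spacing of a 1-arc-second grid shrinks with the cosine of latitude. The helper name below is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def arcsec_to_meters(arcsec: float, latitude_deg: float = 0.0):
    """Approximate ground spacing of an angular grid interval on a sphere.

    Returns (north_south_m, east_west_m). East-west spacing scales with
    cos(latitude) because meridians converge toward the poles.
    """
    north_south = math.radians(arcsec / 3600) * EARTH_RADIUS_M
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


print(arcsec_to_meters(1))       # roughly 30.9 m in both directions at the equator
print(arcsec_to_meters(1, 45))   # east-west spacing roughly 21.8 m at 45 degrees
```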
Raster Analysis
As mentioned previously, raster data lends itself to efficient analysis due to its fundamental grid-based model. The main form of raster analysis is Map Algebra, where operators are used to transform raster data on a pixel-by-pixel basis. Such transformations can be conducted with a single data layer or by combining multiple layers.
Local Functions
There are four types of local functions that can be applied to raster images. First, mathematical operations apply a function to each pixel of a raster image. Simple arithmetic operators such as add, subtract, multiply, or divide can be applied to a raster band to create a new band. Data normalization is one such operation: raw values are rescaled to values between 0 and 1 or to some other standard range, which makes values measured on different scales comparable across records and layers. Second, logical operations apply Boolean tests to the raster, returning a TRUE or FALSE response for each pixel, typically represented as ones and zeros. Third, reclassification reassigns raster values based on a predetermined set of values. For example, values between 1 and 10 could be reclassified to 1. This is a common method for generalizing raster data. Finally, multi-layer overlay analyses combine data from multiple raster layers, similar to the overlay method for vector data. Overlay is often used to combine categorical data, for example land use and land cover.
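Because local functions treat each pixel independently, all four types map naturally onto element-wise array operations. The sketch below assumes NumPy and uses a small synthetic band; the reclassification breaks and the land use layer are hypothetical. Encoding the overlay as land_cover * 10 + land_use is just one simple convention that works while the land use codes stay below 10.

```python
import numpy as np

band = np.array([[12, 40, 75],
                 [90, 55, 20],
                 [5, 63, 88]], dtype=float)

# 1. Mathematical operation: arithmetic applied to every pixel independently.
scaled = band * 0.5 + 10

# Min-max normalization: rescale raw values to the range 0-1.
normalized = (band - band.min()) / (band.max() - band.min())

# 2. Logical operation: a Boolean test per pixel, stored as 1 (TRUE) or 0 (FALSE).
is_high = (band > 50).astype(np.uint8)

# 3. Reclassification with hypothetical breaks: <30 -> 1, 30-59 -> 2, >=60 -> 3.
reclassified = np.digitize(band, bins=[30, 60]) + 1

# 4. Overlay: combine two categorical layers into unique combination codes.
land_cover = reclassified                    # classes 1-3
land_use = np.array([[1, 1, 2],              # hypothetical classes 1-2
                     [2, 1, 2],
                     [1, 2, 2]])
combined = land_cover * 10 + land_use        # e.g. 32 = land cover 3 with land use 2
print(combined)
```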
Spectral Indices
Spectral indices are mathematical equations that combine two or more spectral bands of an image. Their main function is to highlight spectral differences caused by different ground features. Numerous spectral indices have been developed for hydrological, geological, and vegetation research. One of the best-known is NDVI, the Normalized Difference Vegetation Index. Its usefulness rests on the fact that live green plants absorb red light for photosynthesis while strongly reflecting near-infrared (NIR) radiation. In contrast, water, clouds, and impervious surfaces lack this contrast: their NIR reflectance is similar to or lower than their red reflectance.
NDVI is calculated as the difference between the NIR and red reflectance values divided by their sum: NDVI = (NIR - Red) / (NIR + Red). Because of this form, values range from -1 to 1. Dense green vegetation has high positive NDVI values, while water, clouds, and impervious surfaces produce low values. While NDVI can provide information about land cover, its precise values depend on the individual image and on the environmental conditions at the time the image was acquired, so they are of limited use for quantitative analysis and cannot be compared directly with values from other images.
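The NDVI formula translates directly into array arithmetic. This minimal sketch assumes the NIR and red bands are available as NumPy arrays of reflectance; the sample values are hypothetical, chosen only to show the typical ordering of vegetation, bare or built surfaces, and water, not measured data. Pixels where NIR + Red is zero are returned as NaN to avoid dividing by zero.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), computed pixel by pixel.

    Pixels where NIR + Red == 0 are returned as NaN.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Hypothetical reflectance (0-1) for three pixels: dense vegetation,
# an impervious surface, and water.
nir = np.array([0.50, 0.25, 0.02])
red = np.array([0.05, 0.20, 0.04])
print(ndvi(nir, red).round(2))   # -> [ 0.82  0.11 -0.33]
```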
Neighborhood, Zonal, and Global Analyses
Another strength of raster data is the ability to perform analyses across multiple scales. In this section, we will look at three different scales of analysis.
First, neighborhood (focal) operations are applied to a small subset of pixels within an image at a time. This subset is known as a moving window or filter. At each position, a mathematical operation is applied to the pixels inside the window, and the result is saved to the output raster at the location of the window's central pixel; the window then moves so that the next pixel becomes the central pixel. The analyst controls the window's size, shape, and step (the distance between recalculations). A common moving window is a square of 3 by 3 pixels that moves one pixel at a time. Neighborhood operations are used to produce new data, such as an image texture layer that quantifies the local variation of pixel values.
Second, zonal functions are operations constrained to defined regions (zones). Zonal functions are an excellent choice for summarizing data within established boundaries. For example, zonal statistics can be used to estimate the average vegetation greenness in each of several land management units. Note that the zones may be defined by another raster or by vector data.
Finally, there are many different types of global functions. Global operations consider the entire raster when computing their results. Cost functions, for example, use a distance measure, often weighted by a cost surface, to find the minimal accumulated distance or cost of travelling from source cells across a raster. Viewshed analysis is another example of a global function; it determines which locations on a raster surface are visible to an observer. Other global functions compute fundamental mathematical or statistical values, such as the mean or maximum, for the raster as a whole.
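A moving-window operation can be sketched with NumPy's sliding_window_view, assuming NumPy 1.20 or later. The window is a square of odd size centred on each pixel with a step of one pixel. Edges are handled here by repeating the border pixels (edge padding); this is only one of several choices, and GIS packages may instead shrink the window or write NoData at the edges. The function name focal_stat and the sample DEM are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def focal_stat(raster, size=3, func=np.mean):
    """Apply `func` over a size x size moving window centred on each pixel (step 1).

    `size` should be odd. Border pixels are repeated (edge padding) so the
    output has the same shape as the input.
    """
    pad = size // 2
    padded = np.pad(raster.astype(float), pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))  # (rows, cols, size, size)
    return func(windows, axis=(-2, -1))


dem = np.array([[1, 1, 1, 1],
                [1, 5, 5, 1],
                [1, 5, 5, 1],
                [1, 1, 1, 1]])

smoothed = focal_stat(dem, func=np.mean)  # low-pass (smoothing) filter
texture = focal_stat(dem, func=np.std)    # local variability: a simple texture measure
print(smoothed.round(2))
```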
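Zonal statistics reduce to grouping pixel values by a zone label. The sketch below assumes the zones have already been rasterized onto the same grid as the values (vector zones would first need to be converted to a raster); the greenness values and management-unit IDs are hypothetical.

```python
import numpy as np


def zonal_mean(values, zones):
    """Mean of `values` within each zone ID of a same-shaped integer `zones` raster."""
    return {int(z): float(values[zones == z].mean()) for z in np.unique(zones)}


# Hypothetical vegetation greenness raster and management-unit IDs (1, 2, 3).
greenness = np.array([[0.8, 0.7, 0.2],
                      [0.6, 0.5, 0.1],
                      [0.3, 0.4, 0.0]])
units = np.array([[1, 1, 2],
                  [1, 1, 2],
                  [3, 3, 2]])
print(zonal_mean(greenness, units))   # approximately {1: 0.65, 2: 0.1, 3: 0.35}
```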
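A cost-distance surface is a good example of a global function: the output value of each cell can depend on cells anywhere in the raster. The sketch below uses Dijkstra's algorithm, one standard way to compute least accumulated cost. It is simplified under stated assumptions: moves are limited to the four orthogonal neighbors, the cell size is 1, and moving between two adjacent cells costs the average of their cost values (one common convention). The friction surface is hypothetical.

```python
import heapq

import numpy as np


def cost_distance(cost, sources):
    """Least accumulated cost from any source cell to every cell (4-connected moves).

    Moving between adjacent cells costs the average of their two cost values.
    `sources` is a list of (row, col) tuples.
    """
    rows, cols = cost.shape
    acc = np.full((rows, cols), np.inf)
    heap = []
    for r, c in sources:
        acc[r, c] = 0.0
        heapq.heappush(heap, (0.0, r, c))

    while heap:
        d, r, c = heapq.heappop(heap)
        if d > acc[r, c]:
            continue  # stale entry: a cheaper path to this cell was already found
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + (cost[r, c] + cost[nr, nc]) / 2
                if nd < acc[nr, nc]:
                    acc[nr, nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return acc


# Hypothetical friction surface: the high-cost centre cell is expensive to cross.
friction = np.array([[1, 1, 1],
                     [1, 9, 1],
                     [1, 1, 1]], dtype=float)
print(cost_distance(friction, sources=[(0, 0)]))
```

Paths route around the expensive centre cell, so the far corner costs 4 while the centre itself costs 6.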
Image Classification
In addition to applying mathematical operations to an image, it is possible to derive categorical information about features on the earth's surface. Image classification is the process of transforming remote sensing imagery into thematic maps based on the spectral information represented in the digital image. Image classification uses statistical methods to assign image pixels to thematic categories. A common classification goal is to identify regions belonging to land use categories like 'industrial' and 'residential.'
Two broad categories of classification methods exist, grouped by how much analyst input they require. Supervised classification uses labeled training data, such as pixels whose classes are known, to build and test a classification model. For example, you might use known locations of certain tree species to train a statistical classifier for detecting those species in an image. Unsupervised classification does not use predefined examples of the classes. Instead, it relies on patterns that emerge from the spectral data. In unsupervised classification, the analyst often performs additional processing to improve the thematic accuracy of the map, such as merging similar categories.
Clustering is an example of unsupervised learning. Clustering splits the full dataset into groups of similar observations. A common unsupervised classification method is K-means, a centroid-based clustering method. K-means seeks k clusters and assigns each pixel to the nearest cluster centroid so that the sum of squared distances between pixels and their cluster centroids is minimized. The analyst predefines k, the number of clusters; the computer handles the rest of the clustering process.
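To make the supervised workflow concrete, the sketch below trains and applies a minimum-distance-to-means classifier, one of the simplest supervised methods. It is chosen here only for illustration; the module does not prescribe a particular classifier, and a real workflow would also hold out labeled pixels to test the model. The training pixels, class codes (1 = forest, 2 = urban, 3 = water), and function names are hypothetical, and NumPy is assumed.

```python
import numpy as np


def train_class_means(pixels, labels):
    """Mean spectral vector of each class, computed from labeled training pixels."""
    classes = np.unique(labels)
    means = np.array([pixels[labels == k].mean(axis=0) for k in classes])
    return classes, means


def classify_min_distance(pixels, classes, means):
    """Assign each pixel to the class whose mean is nearest (Euclidean distance)."""
    # distances has shape (n_pixels, n_classes)
    distances = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(distances, axis=1)]


# Hypothetical training pixels as (red, NIR) reflectance with known labels.
train_pixels = np.array([[0.05, 0.50], [0.06, 0.45],   # 1 = forest
                         [0.20, 0.25], [0.22, 0.28],   # 2 = urban
                         [0.04, 0.02], [0.03, 0.03]])  # 3 = water
train_labels = np.array([1, 1, 2, 2, 3, 3])

classes, means = train_class_means(train_pixels, train_labels)
new_pixels = np.array([[0.07, 0.48], [0.02, 0.01], [0.21, 0.30]])
print(classify_min_distance(new_pixels, classes, means))   # -> [1 3 2]
```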
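The K-means procedure can be sketched with Lloyd's algorithm, which alternates two steps: assign each pixel to its nearest centroid, then move each centroid to the mean of its assigned pixels, until the centroids stop moving. This is a minimal illustration assuming NumPy, random initialization from k distinct pixels, and a hypothetical 2-band image; production tools add refinements such as multiple restarts. Note that the cluster ID numbers are arbitrary, which is one reason the analyst must still label and possibly merge clusters afterward.

```python
import numpy as np


def nearest_centroid(pixels, centroids):
    """Index of the nearest centroid for each pixel (squared Euclidean distance)."""
    d2 = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(pixels, k, n_iter=100, seed=0):
    """Lloyd's K-means. `pixels` has shape (n_pixels, n_bands); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct pixels chosen at random.
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_centroid(pixels, centroids)           # assignment step
        new_centroids = np.array([                              # update step
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break                                               # centroids stopped moving
        centroids = new_centroids
    return nearest_centroid(pixels, centroids), centroids


# Hypothetical 2-band image (red, NIR), 2 rows x 3 columns, stored as (bands, rows, cols).
image = np.array([[[0.05, 0.06, 0.20],
                   [0.04, 0.22, 0.21]],    # red band
                  [[0.50, 0.48, 0.25],
                   [0.52, 0.27, 0.24]]])   # NIR band
n_bands, rows, cols = image.shape
pixels = image.reshape(n_bands, -1).T      # shape (6, 2): one row per pixel
labels, centroids = kmeans(pixels, k=2)
cluster_map = labels.reshape(rows, cols)   # back to image layout
print(cluster_map)
```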
This description of supervised and unsupervised classification is very simplified. Classification methods are a complex topic and require extensive knowledge of data preparation and handling. These topics and more are covered extensively in courses on remote sensing and machine learning. If you are interested in learning more about these topics, it is advisable to enroll in GGIS 477 or GGIS 527.
Reading
Williams, C. (2019). Raster Formats and Sources. In J. P. Wilson (Ed.), The Geographic Information Science & Technology Body of Knowledge (4th Quarter 2019 Edition). DOI: 10.22224/gistbok/2019.4.11
NASA. What is Remote Sensing? Earthdata: Open Access for Open Science. https://www.earthdata.nasa.gov/learn/backgrounders/remote-sensing