# Module 4. Vector Data

## Learning Objectives

* Describe what vector data is and identify types of vector data.
* Identify an appropriate spatial operation for a given task or question.
* Explain different types of joins and their potential uses.
* Differentiate between Spatial Operations and Spatial Analysis.

## Lecture Slides

{% embed url="<https://docs.google.com/presentation/d/1QYvU35azmg_JKDrBp2isYGyVewnsg99kCgAJYOLHK3U/edit?usp=sharing>" %}
Lecture 4. Vector Data
{% endembed %}

## Assignments

* [ ] Lab Assignment
* [ ] Quiz 4
* [ ] Lecture Video

## Overview

As we learned earlier, geospatial data can be stored in vector data models. Vector data is represented by **vertices**: discrete geometric coordinates (x, y). Vertices can be connected to one another through **arcs, or edges,** each defined by two end nodes. Points are zero-dimensional locations, each consisting of a single vertex; a tree could be represented by a point. Lines are one-dimensional features made of two or more connected vertices; rivers are often represented by lines in GIS. Finally, polygons are two-dimensional features made of three or more vertices connected into a closed ring. Building footprints are commonly stored as polygon features.
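
The three geometry types can be sketched as small Python classes. This is a minimal illustration, not the data model of any particular GIS library: the class names `Point`, `Line`, and `Polygon` and the `dimension` attribute are hypothetical, and a polygon is assumed to be stored as a closed ring whose first vertex is repeated at the end.

```python
from dataclasses import dataclass
from typing import ClassVar, List, Tuple

# A vertex is one (x, y) coordinate pair.
Vertex = Tuple[float, float]


@dataclass
class Point:
    vertex: Vertex                 # a point is a single vertex
    dimension: ClassVar[int] = 0   # zero-dimensional


@dataclass
class Line:
    vertices: List[Vertex]         # two or more connected vertices
    dimension: ClassVar[int] = 1   # one-dimensional

    def __post_init__(self):
        if len(self.vertices) < 2:
            raise ValueError("a line needs at least two vertices")


@dataclass
class Polygon:
    vertices: List[Vertex]         # vertices forming a closed ring
    dimension: ClassVar[int] = 2   # two-dimensional

    def __post_init__(self):
        # Close the ring by repeating the first vertex if it is not already closed.
        if self.vertices and self.vertices[0] != self.vertices[-1]:
            self.vertices = self.vertices + [self.vertices[0]]
        if len(set(self.vertices)) < 3:
            raise ValueError("a polygon needs at least three distinct vertices")


tree = Point((2.0, 3.0))
river = Line([(0.0, 0.0), (1.0, 0.5), (2.5, 0.7)])
building = Polygon([(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)])
```

The validation in `__post_init__` mirrors the definitions in the text: a line cannot exist with fewer than two vertices, and a polygon needs at least three distinct vertices to enclose an area.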

### Benefits of Vector Data

Both raster and vector data models have strengths and weaknesses, which we discussed briefly in the previous module. Vector data is a strong choice when the features you want to represent have discrete boundaries, such as buildings. Vector data can also store geometry without loss of positional detail, so geographic locations can be recorded precisely. The vector format also makes it easy to attach multiple attributes to each feature. For example, you might have a vector dataset of building footprints where each footprint has attributes describing its use, size, and age. This is more difficult to capture in raster data models. Finally, vector data can record the topological relationships between features.
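
As an illustration of storing several attributes per feature, the sketch below keeps building footprints as records loosely modeled on GeoJSON features (a `geometry` ring plus a `properties` dictionary). The buildings, their attributes, and the units are hypothetical. Size is derived from the geometry with the shoelace formula, which assumes projected, planar coordinates.

```python
# Hypothetical building footprints: each feature pairs a closed polygon ring
# (planar coordinates in meters) with several attributes.
buildings = [
    {"geometry": [(0, 0), (10, 0), (10, 8), (0, 8), (0, 0)],
     "properties": {"use": "residential", "year_built": 1925}},
    {"geometry": [(20, 0), (35, 0), (35, 12), (20, 12), (20, 0)],
     "properties": {"use": "commercial", "year_built": 1980}},
    {"geometry": [(40, 0), (46, 0), (46, 6), (40, 6), (40, 0)],
     "properties": {"use": "residential", "year_built": 1962}},
]


def ring_area(ring):
    """Shoelace formula: area of a closed ring in planar coordinates."""
    total = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(total) / 2


# Derive a size attribute from each footprint's geometry.
for b in buildings:
    b["properties"]["area_m2"] = ring_area(b["geometry"])

# Attribute query: residential buildings built before 1950.
old_homes = [
    b for b in buildings
    if b["properties"]["use"] == "residential" and b["properties"]["year_built"] < 1950
]
```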

The two most common data structures used for vector data are the spaghetti model and the topological data model. The **spaghetti model** represents points, lines, and polygons as strings of (x, y) coordinate pairs. This model has no inherent structure linking features to one another, hence the name spaghetti model. Each polygon is stored as its own set of (x, y) coordinate pairs, so when two polygons share a border, that border is recorded twice, once for each polygon. **Topological data models** include inherent information about the spatial relationships between vector features.
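
The duplication in the spaghetti model is easy to see in a toy example. In this sketch, two hypothetical square parcels, A and B, share the border from (2, 0) to (2, 2). Stored spaghetti-style, that border appears in both coordinate lists; a simplified, hypothetical arc-based topological structure stores it once and records which polygon lies on each side of it.

```python
# Spaghetti model: each polygon is an independent closed list of (x, y) pairs.
spaghetti = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)],
}


def edges(coords):
    """Edges of a vertex sequence as direction-independent vertex pairs."""
    return {frozenset(pair) for pair in zip(coords, coords[1:])}


shared_border = frozenset({(2, 0), (2, 2)})

# Edges stored in both polygons: the duplicated shared border.
duplicated = edges(spaghetti["A"]) & edges(spaghetti["B"])
spaghetti_copies = sum(shared_border in edges(r) for r in spaghetti.values())

# Simplified topological alternative: each arc is stored once, with a direction
# and the polygons on its left and right ("outside" marks the exterior).
arcs = [
    {"coords": [(2, 0), (2, 2)], "left": "A", "right": "B"},
    {"coords": [(2, 2), (0, 2), (0, 0), (2, 0)], "left": "A", "right": "outside"},
    {"coords": [(2, 0), (4, 0), (4, 2), (2, 2)], "left": "B", "right": "outside"},
]
topological_copies = sum(shared_border in edges(a["coords"]) for a in arcs)
```

Here `spaghetti_copies` is 2 while `topological_copies` is 1: the arc structure stores the shared border a single time and keeps the neighboring polygons as attributes of that arc.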

### Topology

**Topology** is a set of rules and behaviors that establish how points, lines, and polygons share coincident geometry. Topology is used to ensure data quality by describing how features are connected to one another and how they are positioned relative to one another. There are four main types of topological relationships: adjacency, connectivity, containment, and coincidence. **Adjacency (Contiguity)** is the relationship between two or more polygons that share a common boundary; because each arc has a direction, the polygons on its left and right sides can be recorded. **Connectivity** is the relationship between arcs that meet at shared nodes. **Containment (Area)** refers to the creation of boundaries by connecting arcs to form polygons. **Coincidence** occurs when features occupy the same (x, y) coordinates but are not connected to one another. For example, street lines often have coincident geometry with census block boundaries.
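
Adjacency, connectivity, and containment can all be read from an arc-node table without comparing coordinates. The sketch below uses a simplified, hypothetical arc-node structure for two adjacent parcels; the field names (`from`, `to`, `left`, `right`) are illustrative and not the schema of any particular GISystem.

```python
from collections import defaultdict

# Hypothetical arc-node table for two adjacent square parcels, A and B.
# Node n1 is at (2, 0) and node n2 at (2, 2). Each arc runs from one node
# to another, so the polygon on its left and right side can be recorded;
# "outside" marks the area beyond the parcels.
arcs = [
    {"id": 1, "from": "n1", "to": "n2", "left": "A", "right": "B"},        # shared border
    {"id": 2, "from": "n2", "to": "n1", "left": "A", "right": "outside"},  # rest of A's edge
    {"id": 3, "from": "n1", "to": "n2", "left": "B", "right": "outside"},  # rest of B's edge
]


def adjacency(arcs):
    """Adjacency: polygons on opposite sides of the same arc are neighbors."""
    return {
        frozenset((a["left"], a["right"]))
        for a in arcs
        if "outside" not in (a["left"], a["right"])
    }


def connectivity(arcs):
    """Connectivity: the arcs that meet at each node."""
    nodes = defaultdict(set)
    for a in arcs:
        nodes[a["from"]].add(a["id"])
        nodes[a["to"]].add(a["id"])
    return dict(nodes)


def boundary(arcs, polygon):
    """Containment: the arcs that together enclose a polygon."""
    return {a["id"] for a in arcs if polygon in (a["left"], a["right"])}
```

Calling `adjacency(arcs)` reports A and B as neighbors, `connectivity(arcs)` shows that all three arcs meet at both nodes, and `boundary(arcs, "A")` returns arcs 1 and 2.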

Many GISystems have tools available for evaluating topological relationships and ensuring data integrity. Topological integrity is important when trying to perform an analysis based on spatial relationships between features.

### Topological Errors

Topological errors commonly occur during digitization or raster-to-vector conversion. They can cause errors in analyses when the relationships between features are incorrectly defined. For example, network analyses, such as finding shortest paths, require valid topology so that the algorithm can identify the elements of a network and how they connect. **Error propagation** arises when inaccuracies in the original data are carried through to the output layer. Because topological errors in the input can lead to errors in the final product, it is important to evaluate the topological accuracy of data before performing these analyses.
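
One common topological error is an undershoot, where a digitized line stops just short of the line it should connect to and leaves a dangling endpoint. A minimal check, assuming lines are stored as simple start/end segments, counts how many segments touch each endpoint: nodes touched only once are dangles, and dangles within a small snapping tolerance of another node are likely undershoots. The street data and the 0.5-unit tolerance are hypothetical. Legitimate dead ends, such as cul-de-sacs, also appear as dangles, so flagged nodes are candidates for review rather than confirmed errors, and this simplified check compares only node-to-node distances, so it would miss a line that stops short of the middle of another segment.

```python
import math
from collections import Counter

# Hypothetical street centerlines as (start, end) segments in projected units.
# The last street should end at the junction (10, 0) but stops at (10.2, 0).
streets = [
    ((0.0, 0.0), (10.0, 0.0)),
    ((10.0, 0.0), (10.0, 10.0)),
    ((10.0, 10.0), (0.0, 10.0)),
    ((0.0, 10.0), (0.0, 0.0)),
    ((20.0, 0.0), (10.2, 0.0)),
]


def dangles(segments):
    """Endpoints touched by exactly one segment."""
    counts = Counter(point for seg in segments for point in seg)
    return {point for point, n in counts.items() if n == 1}


def likely_undershoots(segments, tolerance):
    """Dangling endpoints that lie within `tolerance` of some other node."""
    nodes = {point for seg in segments for point in seg}
    return [
        (d, other)
        for d in sorted(dangles(segments))
        for other in sorted(nodes)
        if other != d and math.dist(d, other) <= tolerance
    ]
```

Both ends of the last street are dangles, but only (10.2, 0) lies within the tolerance of another node, so `likely_undershoots(streets, 0.5)` flags it as probably meant to snap to (10, 0). The far end at (20, 0) is left as a plausible dead end.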

### Vector Analysis

Vector data analysis encompasses a wide range of spatial analyses that use geometric objects (points, lines, polygons). It falls under the umbrella term **geoprocessing**: a collection of operations that perform spatially explicit analyses on one or more datasets to create a new dataset. A common geoprocessing operation in GIS is buffering. **Buffering** creates a zone around a point, line, or polygon that encompasses all areas within a specified distance of a feature or collection of features. For example, an analyst might create a 1-mile buffer around a school and then determine how many parks are located within that buffer. This information can be used to make decisions about school siting, land use planning, and community development.
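
For point features, a buffer query reduces to a distance test. The sketch below approximates a circular buffer as a polygon with a fixed number of vertices and selects the parks whose location falls within the buffer distance of a school. It assumes projected, planar coordinates measured in miles; the school and park locations are hypothetical, and a real analysis would typically use a GIS buffer tool and the park polygons rather than single points.

```python
import math


def point_buffer(center, radius, segments=32):
    """Approximate a circular buffer around a point as a closed polygon ring."""
    cx, cy = center
    ring = [
        (cx + radius * math.cos(2 * math.pi * i / segments),
         cy + radius * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]
    return ring + [ring[0]]  # repeat the first vertex to close the ring


def within_distance(center, features, radius):
    """Names of point features located within `radius` of `center`."""
    return [name for name, xy in features.items() if math.dist(center, xy) <= radius]


school = (0.0, 0.0)  # hypothetical projected coordinates, in miles
parks = {"Riverside": (0.4, 0.3), "Hilltop": (1.2, 0.9), "Elm": (-0.6, -0.7)}

buffer_ring = point_buffer(school, 1.0)
nearby_parks = within_distance(school, parks, 1.0)
```

Riverside (0.5 miles away) and Elm (about 0.92 miles) fall inside the 1-mile buffer, while Hilltop (1.5 miles) does not. Note that the polygon approximation sits slightly inside the true circle between its vertices; more `segments` reduce that gap.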

**Overlay** analysis is another form of geoprocessing in which multiple data layers are overlaid to find combinations of data attributes. An example of using overlay analysis in GIS is determining the areas of land that are vulnerable to flooding. In this case, a GIS analyst would overlay a map of flood zones on top of a map of land use. This would allow the analyst to see which areas, such as residential neighborhoods or industrial parks, are located in flood-prone areas and may be at risk for flooding. There are many different types of geoprocessing tools available in GISystems. The table below outlines some of the most common ones.
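
The flood-zone example can be sketched as an intersect-style overlay. To keep the geometry simple, this sketch assumes every land-use zone and flood zone is an axis-aligned rectangle `(xmin, ymin, xmax, ymax)`, so the overlap of two features is itself a rectangle; real overlay tools handle arbitrary polygons. All names and coordinates are hypothetical. Each output record combines an attribute from each layer with the area of their overlap.

```python
# Hypothetical layers of axis-aligned rectangles: (xmin, ymin, xmax, ymax).
land_use = {
    "residential": (0, 0, 4, 4),
    "industrial": (4, 0, 8, 4),
    "park": (8, 0, 12, 4),
}
flood_zones = {"100-year": (3, 1, 9, 3)}


def intersect_rect(a, b):
    """Overlap of two rectangles, or None if they do not overlap."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)


def overlay(layer_a, layer_b):
    """Pair every overlapping feature from the two layers with its overlap area."""
    results = []
    for name_a, rect_a in layer_a.items():
        for name_b, rect_b in layer_b.items():
            piece = intersect_rect(rect_a, rect_b)
            if piece is not None:
                area = (piece[2] - piece[0]) * (piece[3] - piece[1])
                results.append((name_a, name_b, area))
    return results


at_risk = overlay(land_use, flood_zones)
```

Each tuple in `at_risk` carries the land-use type, the flood zone, and the overlapping area, which is the attribute combination that overlay analysis is meant to produce.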

| Operation        | Description                                                                         | Example                                                                                                    |
| ---------------- | ----------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| Buffer           | Create a zone extending a given distance around a feature                           | Isolating riparian zones along a river                                                                     |
| Overlay          | Overlay multiple data layers to find attribute combinations                         | Habitat suitability                                                                                        |
| Clip             | Cut a dataset to the boundary of another dataset                                    | Reduce data size by keeping only features within a study area                                              |
| Merge            | Combine two datasets of the same type                                               | Combine two location point files (Dunkin Donuts and Starbucks) to evaluate the location of coffee shops    |
| Dissolve         | Unify boundaries based on a common attribute value                                  | Combine census blocks that share the same income category                                                  |
| Intersect        | Overlap analysis that outputs only the areas where all input layers overlap         | Combine residential locations with a flood zone map to evaluate the parcels within different flood zones  |
| Union            | Combine two layers while maintaining all input feature boundaries and attributes    | Combine land use and zoning data into one dataset                                                          |
| Erase/Difference | Remove the parts of a dataset that fall within another dataset                      | Erase the forest fire extent from a land cover dataset to examine the extent of fragmentation              |
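
Three of the operations in the table, merge, clip, and erase, are simple to sketch for point data. The functions below are hypothetical illustrations: `merge` concatenates layers of the same type, and `clip` and `erase` keep the points inside or outside a boundary polygon, tested with the standard ray-casting point-in-polygon method. Points lying exactly on the boundary may be classified either way by this simple test, and the coffee-shop locations and study area are made up.

```python
def point_in_polygon(pt, ring):
    """Ray casting: inside if a ray cast to the right crosses the ring an odd number of times."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def merge(*layers):
    """Combine layers of the same feature type into one list."""
    return [feature for layer in layers for feature in layer]


def clip(features, boundary):
    """Keep only the features that fall inside the boundary polygon."""
    return [f for f in features if point_in_polygon(f["xy"], boundary)]


def erase(features, boundary):
    """Remove the features that fall inside the boundary polygon."""
    return [f for f in features if not point_in_polygon(f["xy"], boundary)]


# Hypothetical coffee-shop point layers and a square study area (closed ring).
dunkin = [{"brand": "Dunkin", "xy": (1, 1)}, {"brand": "Dunkin", "xy": (6, 2)}]
starbucks = [{"brand": "Starbucks", "xy": (2, 3)}, {"brand": "Starbucks", "xy": (9, 9)}]
study_area = [(0, 0), (5, 0), (5, 5), (0, 5), (0, 0)]

merged = merge(dunkin, starbucks)
inside = clip(merged, study_area)    # shops within the study area
outside = erase(merged, study_area)  # shops outside the study area
```

For polygon and line layers, clip and erase must also cut geometries at the boundary, which is why GIS tools implement them with full geometric intersection rather than a point test.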

## GIScience of Vector Data

GIScience research related to vector data aims to develop new methods and techniques for managing, analyzing, and visualizing vector data, enhancing the quality and accuracy of vector data, and utilizing the information within vector data to make informed decisions.

Geographic Information Science addresses vector data from a variety of perspectives. Spatial data modeling research develops models for representing and storing vector data. Spatial data analysis focuses on developing new spatial statistical methods. Spatial data integration involves developing methods for integrating vector data from multiple sources, while machine learning research develops methods for extracting information from geographic data using machine learning algorithms. Another active area of research concerning vector data is uncertainty quantification and visualization, which examines topics such as error propagation and uncertainty modeling.

## Readings

You may need to obtain these from the [University of Illinois Library](https://www.library.illinois.edu/search-tools/).

Ricker, B. A., Rickles, P. R., Fagg, G. A., & Haklay, M. E. (2020). Tool, toolmaker, and scientist: Case study experiences using GIS in interdisciplinary research. *Cartography and Geographic Information Science*, *47*(4), 350-366. <https://doi.org/10.1080/15230406.2020.1748113>

Heikinheimo, V., Tenkanen, H., Bergroth, C., Järv, O., Hiippala, T., & Toivonen, T. (2020). Understanding the use of urban green spaces from user-generated geographic information. *Landscape and Urban Planning*, *201*, 103845. <https://doi.org/10.1016/j.landurbplan.2020.103845>