1. Overview of Spatial Data#

This chapter explains fundamental concepts of spatial data, defines some related terminology, and describes the features of spatial data that are particular to Altibase.

The Concept of Spatial Data#

Spatial Data#

The term "spatial data" refers to the representation of multidimensional data, such as points, lines and surfaces, as a list of numbers using a particular coordinate system. A typical example of spatial data is electronic map data, which is used to represent the topography of the real world in a coordinate system.

There are two fundamentally different kinds of spatial data: raster data and vector data.

In a vector model, points mark individual coordinates, and lines built from points mark the borders between different real-world features. The location of each feature on a map is specified and maintained using a consistent coordinate system. Points, lines and polygons are used to represent geographical features that are irregularly distributed in the real world: points represent individual locations, lines represent one-dimensional features such as roads, and polygons represent two-dimensional features such as forests.

In a raster model, space is uniformly divided into units known as pixels or cells. The location of a geographical aspect or set of coordinates is defined as a matrix of the pixels and cells in which the aspect or set of coordinates exists. The level of detail that it is possible to represent using a raster model depends on the cell size. The area in each cell cannot be divided any further; that is, all of the attributes that apply to the cell apply uniformly to the entire area within the cell. All cells are identical in size. A raster model typically consists of millions of cells.

The basic units in a vector model are points, lines and polygons. Compared to a raster model, a vector model typically consists of far fewer basic units, and moreover they are not uniform in size. A vector file typically contains thousands of elements, the specific location of each element being defined by one or more consecutive coordinate values. Vector data is easier to manipulate on a computer than raster data because the number of data items is lower, and additionally because the model is easier to adjust for different scales and yields more precise results when the scale is changed.
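The difference in the number of basic units can be illustrated with a minimal Python sketch. It stores one rectangular "forest" both as a vector polygon and as a raster grid. The names `VectorPolygon` and `rasterize_rectangle`, the 8 x 8 grid and the cell size of 1.0 are assumptions made for this illustration and are not part of Altibase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorPolygon:
    """A polygon stored as an ordered ring of (x, y) vertices (hypothetical type)."""
    ring: tuple  # the first and last vertex are the same point


def rasterize_rectangle(x_min, y_min, x_max, y_max, cell_size, width, height):
    """Return a height x width grid of 0/1 cells for an axis-aligned rectangle.

    A cell is set to 1 when its centre lies inside the rectangle, so the
    attribute applies uniformly to the whole cell.
    """
    grid = []
    for row in range(height):
        cy = (row + 0.5) * cell_size
        line = []
        for col in range(width):
            cx = (col + 0.5) * cell_size
            inside = x_min <= cx <= x_max and y_min <= cy <= y_max
            line.append(1 if inside else 0)
        grid.append(line)
    return grid


# The same feature in both models.
forest_vector = VectorPolygon(
    ring=((2.0, 2.0), (6.0, 2.0), (6.0, 5.0), (2.0, 5.0), (2.0, 2.0))
)
forest_raster = rasterize_rectangle(2.0, 2.0, 6.0, 5.0,
                                    cell_size=1.0, width=8, height=8)

vector_units = len(forest_vector.ring)                # coordinate pairs stored
raster_units = sum(len(row) for row in forest_raster)  # cells stored
```

Halving the cell size in this sketch quadruples the number of raster cells, while the vector ring stays at the same five coordinate pairs.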

The advantages and disadvantages of vector and raster data are shown in the following table:

Type: Vector data
Advantages:
• Easier to represent real-world phenomena using data structures
• More space-efficient data storage
• Easier construction of topological relationships
• More precise graphical representation
• Enables normalization of locations and attributes
Disadvantages:
• Complicated data structures
• Difficulty overlapping map layers
• Different topological form of respective units
• Requires expensive equipment
• Complicated spatial operations

Type: Raster data
Advantages:
• Makes spatial analysis easier
• Simple and clear data structure
• Consistent topological form of all units
• Easy to overlap map layers
• Low-cost technology and rapid development
• Easier integration with remote sensing data
Disadvantages:
• Difficulty constructing topological relationships
• Creating a projection is time-consuming
• Large amounts of graphical data
• Large amounts of data loss upon data compression
• Quality of output can be low

Table 1-1 Advantages and Disadvantages of Vector and Raster Data

The Characteristics of Spatial Data#

Spatial data is generally characterized by its indefinite form and its large volume.

Spatial data is said to be indefinite because the content and structure of each spatial object are represented in different ways depending on the type of the spatial object. Furthermore, even two objects of the same type can differ in the number of points, sub-objects, etc. they are comprised of, causing them to have different forms and lengths.

Spatial data is generally voluminous. The size of spatial data can exceed the size of a storage page in a DBMS. For example, it can take megabytes (MB) of storage just to store the data for the coastline of a single nation.

Because of these characteristics, there are many considerations that do not apply to non-spatial data that must be taken into account to store and manipulate spatial data efficiently.

Spatial Data Models#

There are two conventional spatial data models: the field-based model, in which a space is represented using a series of attributes, and the entity-based model, in which a space is represented as an object (or entity) made up of a set of points. In the field-based model, in which each point in space has one or more attributes, spaces are defined in a manner similar to the use of a continuous function in the Cartesian coordinate system.

However, the field-based model does not take the concept of objects, which is important to the entity-based model, into account. The objects that are usually the most important in the entity-based model include points, which indicate the locations of objects; lines, which indicate two connected points; polylines, which indicate a series of connected lines; and polygons.

The spatial data model that was put forth by the Open Geospatial Consortium (OGC) is a conceptual model. That is, the model was defined in terms of spatial schema, and is an abstract data model rather than one that was intended for use in a particular implementation. This spatial schema forms the basis for the ISO 19107 Spatial Schema standard and is used in many other OGC specifications such as the OpenGIS Simple Features Specification and the GML Implementation Specification.

A spatial schema consists of two packages. The geometry package is a geometric object model that provides a quantitative means of describing the spatial characteristics of features, which are abstractions of real-world phenomena. The topology package provides a model for describing the relationships between geometric objects.

Based on the data models in the abstract specification, the OGC proposed more concrete OpenGIS geometry models in specifications intended for use in practical implementation, such as the OpenGIS Simple Features Specification.

The Geometry class is the root class and is an abstract class. Its subclasses include Point, Curve, Surface and GeometryCollection. All geometric objects are defined in relation to a spatial reference system, which defines the coordinate space. The classes that can actually be instantiated are the Point, LineString, Polygon, GeometryCollection, MultiPoint, MultiLineString and MultiPolygon classes. The rest of the classes are defined as abstract classes.
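The split between abstract and instantiable classes can be sketched in Python as follows. This is only an illustration of the hierarchy described above, not the OGC specification or the Altibase implementation. The `srid` attribute, the method names `geometry_type` and `dimension`, and the constructor signatures are assumptions.

```python
from abc import ABC, abstractmethod


class Geometry(ABC):
    """Abstract root class. srid names the spatial reference system (assumed field)."""

    def __init__(self, srid=0):
        self.srid = srid

    @abstractmethod
    def geometry_type(self):
        ...

    @abstractmethod
    def dimension(self):
        ...


class Point(Geometry):
    def __init__(self, x, y, srid=0):
        super().__init__(srid)
        self.x, self.y = x, y

    def geometry_type(self):
        return "Point"

    def dimension(self):
        return 0


class Curve(Geometry):
    """Still abstract: geometry_type() is left unimplemented."""

    def dimension(self):
        return 1


class LineString(Curve):
    def __init__(self, points, srid=0):
        super().__init__(srid)
        self.points = list(points)

    def geometry_type(self):
        return "LineString"


class Surface(Geometry):
    """Still abstract: geometry_type() is left unimplemented."""

    def dimension(self):
        return 2


class Polygon(Surface):
    def __init__(self, rings, srid=0):
        super().__init__(srid)
        self.rings = [list(ring) for ring in rings]

    def geometry_type(self):
        return "Polygon"


class GeometryCollection(Geometry):
    def __init__(self, members, srid=0):
        super().__init__(srid)
        self.members = list(members)

    def geometry_type(self):
        return "GeometryCollection"

    def dimension(self):
        # Returning 0 for an empty collection is an arbitrary choice in this sketch.
        return max((m.dimension() for m in self.members), default=0)


class MultiPoint(GeometryCollection):
    def geometry_type(self):
        return "MultiPoint"


class MultiLineString(GeometryCollection):
    def geometry_type(self):
        return "MultiLineString"


class MultiPolygon(GeometryCollection):
    def geometry_type(self):
        return "MultiPolygon"
```

In this sketch, calling `Curve()` or `Surface()` raises a `TypeError`, which mirrors the statement that only the seven listed classes can be instantiated.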

Spatial Database Systems#

A spatial database contains both non-spatial data, typically represented using characters and numbers, and spatial data, which are represented as coordinate values for spatial objects. The geographic objects (usually spatial objects) that are managed in a spatial database have both general and geometric attributes, and include topological information, that is, information about the spatial relationships between objects.

In order to store and manage such spatial data efficiently, a spatial database system must be able to represent and control non-spatial and spatial data at the logical level, and save and manage the data efficiently at the physical level. Therefore, in a spatial database system, it must be possible to represent data logically, and to provide functions for performing spatial operations on spatial data.

On the physical level, it must be possible to store data efficiently, and to provide spatial indexing so that spatial data can be accessed efficiently. In addition, it must be possible to efficiently store and manage the topological and general attributes of the spatial data that describe spatial objects.

Spatial data modeling is a technique that defines how spatial data is represented, and must support spatial data types such as points, lines and surfaces which can be used to describe real-world features. These spatial data types must be as simple and precise as possible, even when they are used to represent compound spatial objects. Additionally, the use of spatial operators must be supported with all spatial data types.

Spatial operators are those that are used to perform various kinds of spatial analysis, including the analysis of the topological relationships between spatial objects, so that spatial queries can be processed efficiently. For this reason, spatial database systems must support a variety of useful spatial operators.

Spatial Index#

In general, complex and expensive operations on geometric figures are required in order to process spatial queries. Traditionally, point object operations involve sequential scanning and evaluation to determine which points are included in which objects. This incurs high processing costs because of frequent, repeated disk access and the repetitive assessment of geometric conditions.

Therefore, efficient spatial access methods have been developed in order to reduce the number of objects that actually need to be processed in order to analyze large amounts of stored spatial data. A "spatial access method" is a method in which a spatial index is used to reduce the number of objects that need to be processed when processing a spatial query. Such a method must satisfy three requirements: a time complexity requirement, whereby it must be possible to conduct a search in sub-linear time; a space complexity requirement, whereby the size of a spatial index must be smaller than the size of the indexed data; and a dynamic update requirement, whereby an object can be added to or removed from a spatial index without a great reduction in performance. A typical spatial access method that satisfies these requirements is one that uses an R-tree.

The R-tree, a height-balanced hierarchical multi-dimensional index structure that was originally designed for use with secondary storage, is a generalization of the B-tree for use with multi-dimensional data spaces. Like a B-tree, an R-tree is height-balanced, meaning that all leaf nodes are at the same depth, and references to objects exist only at leaf nodes. The R-tree, which uses an MBR (Minimum Bounding Rectangle) to represent a spatial object, was designed to reduce the number of nodes that must be visited in order to locate a spatial object. In addition, because the R-tree is a fully dynamic structure, insert and delete operations can be intermixed with searches on the same R-tree, and there is no need to periodically reorganize the tree structure.
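The role of the MBR can be illustrated with a short Python sketch. It is not an R-tree: it scans every object linearly and only shows the cheap rectangle test that an R-tree applies at each node to discard objects before any exact geometric test is run. The function names and the sample objects are hypothetical.

```python
def mbr(points):
    """Return the minimum bounding rectangle as (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_intersects(a, b):
    """True when two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_candidates(objects, query_window):
    """Keep only the objects whose MBR intersects the query window.

    The survivors are candidates only; an exact geometric test would
    still be needed to confirm each one.
    """
    return [name for name, pts in objects.items()
            if mbr_intersects(mbr(pts), query_window)]


objects = {
    "road": [(0, 0), (4, 1), (9, 3)],
    "lake": [(20, 20), (25, 22), (23, 27)],
    "park": [(5, 5), (8, 5), (8, 9), (5, 9)],
}
candidates = filter_candidates(objects, query_window=(3, 0, 6, 6))
```

An R-tree arranges such rectangles hierarchically, so that a whole subtree can be skipped when the MBR of its parent node fails this test.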

Spatial Reference Systems#

In a spatial reference system, mutual associations are drawn between locations in real-world space and spatial objects that are defined on the basis of coordinates in mathematically expressed vector space. Spatial references can be realized using either spatial coordinates or identifiers.

A spatial reference system can be used to define a coordinate system for use with spatial data and the range of geographic area to which each data item refers.

Coordinate System#

A coordinate system is used to specify the relative location of objects in a given area (for example, all or part of the earth's surface).

A reference system that represents a location on the earth using longitude and latitude values is, together with an ellipsoid that represents the shape of the earth, called a geodetic reference system. In Korea, the Korean Geodetic System is used; it is based on the Bessel ellipsoid and on longitude, latitude and azimuth values determined through astronomical observations. As GPS surveying has become more common, the demand for universal location reference systems has grown, giving rise to international geodetic systems that can be used in the same way all over the world. International geodetic systems can be classified as ITRF (International Terrestrial Reference Frame), WGS (World Geodetic System) and PZ (Parametry Zemli) systems, depending on the reference frame and ellipsoid on which they are based.

Reference Ellipsoids#

A reference ellipsoid is a mathematical representation of the shape of the earth. It is the solid obtained by rotating an ellipse about its minor axis, with the axis lengths chosen to approximate the shape of the earth. It is used as a reference model on which to base coordinate values. Various ellipsoids are used around the world, depending on the conditions in each country. One ellipsoid that is used worldwide is the WGS84 ellipsoid.
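As a small worked example, an ellipsoid is commonly described by its semi-major axis and its flattening, from which the semi-minor axis and the eccentricity follow. The Python sketch below uses the published defining parameters of the WGS84 ellipsoid; the function names are hypothetical.

```python
# Defining parameters of the WGS84 ellipsoid.
WGS84_SEMI_MAJOR_M = 6378137.0           # semi-major (equatorial) axis a, in metres
WGS84_INVERSE_FLATTENING = 298.257223563  # 1 / f


def semi_minor_axis(a, inverse_flattening):
    """Semi-minor (polar) axis: b = a * (1 - f)."""
    f = 1.0 / inverse_flattening
    return a * (1.0 - f)


def first_eccentricity_squared(inverse_flattening):
    """Square of the first eccentricity: e^2 = f * (2 - f)."""
    f = 1.0 / inverse_flattening
    return f * (2.0 - f)


b = semi_minor_axis(WGS84_SEMI_MAJOR_M, WGS84_INVERSE_FLATTENING)
e2 = first_eccentricity_squared(WGS84_INVERSE_FLATTENING)
```

The polar axis comes out roughly 21 km shorter than the equatorial axis, which is the flattening that a sphere cannot express.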

Geoids#

A geoid is a model of the shape of the earth that is based on mean sea level, imagined as extending under the continents as if the ocean were able to penetrate them. The surface of the geoid is everywhere perpendicular to the direction of gravity. In other words, a geoid is a surface on which the earth's gravity potential is the same at every point, and it serves as the reference surface for heights, that is, the surface at 0 m above mean sea level. Although the shape of a geoid is irregular due to differences in the gravitational pull of the earth in different locations, it can be approximated by an ellipsoid, which has a regular surface.

The Datum#

A datum is a parameter or a set of parameters that is either used unchanged or used as the basis for the calculation of other parameters. In surveying, it is a reference (i.e. a point, line, or surface) that is determined based on an observation and used as the basis for calibrating other observations. A datum that consists of longitude, latitude and azimuth values is also referred to as a "horizontal datum" or "horizontal geodetic datum". In addition, a datum defines an origin, scale and direction according to the axes of the terrestrial coordinate system, and can be a geodetic datum, a vertical datum or an engineering datum. When a datum is changed, the coordinates of its spatial data also change.

Projection#

Projection is the process in which three-dimensional information about all or part of the earth is converted to a two-dimensional map by projecting it onto a surface that can be flattened without stretching, such as a cylinder, a cone or a plane. Everyday paper maps, as well as the numerical maps that are used for 2D GIS, are created using this projection process. Because shape, distance, direction, scale and area are inevitably distorted during the projection process, it is important to choose a projection method that minimizes the distortion within the area to be projected.

In Korea, the Transverse Mercator (TM) projection method is used to create topographical maps. In TM projection, a geographic area is projected by bringing it into contact with a cylinder along its central meridian (e.g. 127° East longitude), so the area close to the central meridian is projected accurately. This projection method is thus frequently used when projecting areas that are extensive in the north-south direction. As one moves further from the central meridian, the distortion of distance, area, scale and direction becomes greater.
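The growth of distortion away from the central meridian can be sketched with the spherical form of the Transverse Mercator equations. This is a simplification: it assumes a spherical earth with a mean radius of 6,371,000 m, a scale factor of 1 on the central meridian and an origin on the equator, whereas national mapping uses ellipsoidal formulas, so the numbers are illustrative only. The function name `tm_forward` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumption: mean radius of a spherical earth


def tm_forward(lat_deg, lon_deg, lon0_deg=127.0, k0=1.0, radius=EARTH_RADIUS_M):
    """Spherical Transverse Mercator: return (x, y, k).

    x and y are the easting and northing in metres, measured from the
    point where the central meridian lon0_deg crosses the equator.
    k is the point scale factor (1.0 means no scale distortion).
    Valid only while |cos(lat) * sin(lon - lon0)| < 1.
    """
    phi = math.radians(lat_deg)
    dlam = math.radians(lon_deg - lon0_deg)
    b = math.cos(phi) * math.sin(dlam)
    x = k0 * radius * math.atanh(b)
    y = k0 * radius * math.atan2(math.tan(phi), math.cos(dlam))
    k = k0 / math.sqrt(1.0 - b * b)
    return x, y, k


# Scale factor at latitude 37.5 N for points 0, 1, 2 and 3 degrees east
# of the central meridian at 127 E.
scale_factors = [tm_forward(37.5, 127.0 + offset)[2] for offset in range(4)]
```

In this sketch the scale factor is exactly 1 on the central meridian and increases with every degree of longitude away from it, which is why the method suits areas that extend north to south rather than east to west.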

The Characteristics of Spatial Data in Altibase#

The method by which spatial data is processed in Altibase has the following characteristics:

  • Since Altibase features high-performance memory database technology and includes a spatial data model, it is superior not only for existing GIS applications, but also for the next-generation high-speed spatial data processing systems that will be required in the ubiquitous data environment. Moreover, because the spatial model is integrated with a conventional SQL-based RDBMS model, existing DBMS development environments and expertise can be utilized, leading to increased productivity.

  • In Altibase, spatial data (i.e. information about the location of a particular area) can be stored, managed and analyzed alongside traditional character and numeric data. This allows users to create, analyze and utilize spatial data such as information about the locations of office buildings or the size of flooded areas.

  • The capability of Altibase has been expanded through the inclusion of a series of advanced spatial data types for representing geometric objects such as points, lines and polygons, as well as numerous functions and features for working with these data types. Because it is so easy to integrate business data and spatial data, users can readily increase the range of business logic in their database applications.

Spatial Data Terminology#

Closed#

If the start and end points of a LINESTRING object are the same, the spatial object is said to be "closed". In order for a spatial object to be considered to be a closed object, all of the elements that the spatial object consists of must be closed. For a more detailed explanation of this concept, please refer to the description of the ISCLOSED function.
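The definition can be expressed as a minimal Python sketch. It is only an illustration of the rule stated above, not the implementation of the ISCLOSED function, and the function names and the coordinate-list representation are assumptions.

```python
def is_closed_linestring(points):
    """True when a non-empty coordinate list starts and ends at the same point."""
    return len(points) > 0 and points[0] == points[-1]


def is_closed_multilinestring(linestrings):
    """A multi-part object is closed only if every one of its elements is closed."""
    return all(is_closed_linestring(ls) for ls in linestrings)


ring = [(0, 0), (4, 0), (4, 3), (0, 0)]   # start point equals end point
open_path = [(0, 0), (4, 0), (4, 3)]      # start point differs from end point

ring_closed = is_closed_linestring(ring)
path_closed = is_closed_linestring(open_path)
mixed_closed = is_closed_multilinestring([ring, open_path])
```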

Compound Object#

Compound objects are those spatial objects that are made up of two or more spatial objects.

Dimension#

The dimension of a figure, object or space is the minimum number of real numbers needed to represent the location of any point within it. A point has zero dimensions, a straight line has one dimension, a plane has two dimensions, and a body that has volume has three dimensions. Note that n-dimensional space is possible, as there is no limit on the number of dimensions.

Element#

An element is any of the common figures that comprise a spatial object. A single spatial object can be constructed from elements such as points, lines or surfaces.

Empty#

A spatial object in which no elements exist is said to be "empty". Unlike NULL, which implies that the contents of an object are unknown, the empty state indicates definitively that the object contains nothing. For a more detailed explanation of this concept, please refer to the description of the ISEMPTY function.

Line#

A line can be thought of as a trace of the continuous movement of a point. Next to the point, the line is the simplest figure. It has a location and a length, but no width or thickness. Lines can be straight or curved.

Multiple Object#

Multiple objects are spatial objects that are made up of more than one spatial object of the same form. They comprise the MultiPoint, which consists only of Points; the MultiLineString, which consists only of LineStrings; and the MultiPolygon, which consists only of Polygons.

Point#

This is the simplest spatial object. It has a location, but no size.

Scale#

Scale is the ratio between a distance shown on a map or picture and the corresponding distance in the real world. A scale value is typically expressed as a ratio, which enables the same units to be used for measurements on the map and in the real world. If, for example, the scale of a map is 1:25,000, one unit of distance on the map represents 25,000 of the same units in the real world. In other words, 1 cm on the map represents 25,000 cm in the real world. Note that the scale of a map describes horizontal distances only, and is not applicable to measurements of area or height.
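The arithmetic in the example above can be written as a short Python sketch. The function names are hypothetical, and the conversion applies to horizontal distances only, as noted above.

```python
def map_to_ground(map_distance, scale_denominator):
    """Convert a distance measured on the map to the real-world distance.

    The result is in the same unit as map_distance.
    """
    return map_distance * scale_denominator


def ground_to_map(ground_distance, scale_denominator):
    """Convert a real-world distance to the distance drawn on the map."""
    return ground_distance / scale_denominator


# On a 1:25,000 map, 1 cm on the map is 25,000 cm on the ground.
ground_cm = map_to_ground(1, 25_000)
ground_m = ground_cm / 100  # centimetres to metres

# A real-world distance of 1 km (100,000 cm), expressed in map centimetres.
map_cm = ground_to_map(100_000, 25_000)
```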

Simple#

If the elements of a spatial object have no anomalous points, such as points of self-intersection or self-tangency, the spatial object is considered to be "simple". For a more detailed explanation of this concept, please refer to the description of the ISSIMPLE function.

Spatial Object#

A spatial object consists of both spatial data, which are used to represent the real world, and operations, such as procedures, methods and functions, related to this data.

Surface#

A surface can be thought of as a trace of the continuous movement of a line, or as a two-dimensional form that cannot be described using a single point or line. Surfaces are classified as either planes or curved surfaces, depending on whether they are flat or deviate from a plane. Generally, the term "surface" is understood to refer to a planar surface, unless otherwise specified.