Monday, August 29, 2011

Smallworld Technical Paper No. 10 - Discrete Geometry with Seamless Topology in a GIS

by R. G. Newell and M. Doe

Abstract

Modern GIS systems go to great lengths to model a seamless world containing a rich topological model. However, there is a requirement, particularly among utility companies such as electricity and telecommunications companies to model their network in both geometric and, possibly, multiple schematic coordinate systems. The schematics may hold an alternative representation or a detail of par of the network. This paper describes an elegant method of modelling such situations so that topology is handled seamlessly and data common to the alternative

representations is not replicated. The technique, known as 'multiple worlds , is a natural and straightforward extension of the spatial indexing techniques used to handle a seamless database efficiently. One of the benefits of the approach is that any network analysis application written to work in a single coordinate world will work unchanged in a multiple world situation. Applications include the integration of electricity substation schematics within a geographic database, facilities management, multi-sheet schematic applications such as power systems diagrams as well as P&ID diagrams in chemical plant applications.

Introduction

The work reported in this paper was carried out as a result of the requirement to model complex network applications, particularly in the electricity supply and telecommunications industries. The ideas presented have been developed within Smallworld GIS. Smallworld GIS is one of a number of systems whose origins lay more in database management foundations than in file and CAD foundations (Chance et al. 1990, Easterfield et al.1990). In such systems, the underlying principle is that the GIS manages a single logical database that manages both spatial and aspatial data together. The principle extends to cases where the data is held in multiple existing (legacy) databases.

One of the benefits of this approach is that the spatial data can be modelled as it really is - a seamless carpet of topologically structured geometry. Such an architecture requires different solutions to those implemented in tile-based systems. In particular, spatial organisation is very important so the system gives a good response time for spatial queries on very large seamless databases. There are many successful spatial indexing schemes (Abel 1983, Abel 1984, Batty 1990, Bundock 1987, Guttman 1984, Kriegel 1990, Libera 1986, Newell 1991, Samet 1989, Seeger 1988, Smith 1990). In this paper we briefly describe the method employed in Smallworld GIS. For a fuller explanation see Newell et al. 1991.

Although much has been written about managing spatial data within one coordinate system, less has been written on managing data in multiple coordinate systems, including cases where there is a topological structure, but possibly no corresponding spatial coordinate data: for example, a complex junction where a large number of inputs are connected to corresponding outputs using a connection'matrix'.

A common example of multiple coordinate systems in the electricity supply and telecommunications industries is schematic diagrams of parts of a network where there is too much detail to represent the structure in the 'geographic' coordinate system. Such a case arises with substation schematics. Figure 1 shows such an example, where a substation at Barnes Road has 3 separate representations, one geographic and 2 schematic, as well as a duct containing cables represented geographically and as a cross-section.

[Figure not yet available]

Figure 1: Geographic, schematic and cross-section representations

On the face of it, one might think that a tile-based architecture is more suited to handling such situations, since presumably such systems have ways of handling connections across tile boundaries. We hope to go on to show in this paper an elegant approach that maintains connections across multiple coordinate systems without incorporating any other compromises into the system.

Handling Topology

Smallworld GIS models the world as a collection of objects that have relationships and attributes. Complex object relationships are modelled by one-to-one, one-to-many or many-to-many relations that are defined in a CASE tool. Object attributes are aspatial (represented as numbers or characters for example) and spatial (represented as points, chains and areas). An aspatial attribute may be defined as a 'logical' attribute; such an attribute is not defined in the database, it is always derived by a method on the object when needed.

[Figure not yet available]

Figure 2: Smallworld Object Structure

Topological relationships are modelled by points, chains and areas sharing nodes, links or polygons. A chain consists of a linear sequence of links (xy or xyz coordinate strings) and an area consists of a contiguous set of polygons. The two level structure allows for a much richer modelling of topology than a layer based approach. Figure 2 shows an idealised representation of the structure underpinning the Smallworld topology model. One object may have multiple geometries and many objects may share one geometry.

Spatial Organisation

The Smallworld GIS Version Managed Datastore holds all of its spatial and aspatial data seamlessly; there are no tiles or layers. The basic storage structure is tabular, but unlike a conventional relational database, the access language is the object-oriented language, Smallworld Magik. Fast spatial retrievals are achieved by means of an improved quadtree indexing technique, which neither complicates nor compromises the logical data structure described above. The primary key of all spatial data contains an automatically generated 'spatial key'. The method both clusters data spatially on disk and provides an efficient index; Figure 3a illustrates how a simple quadtree index can sometimes produce an inefficient index. Figure 3b illustrates the overlapping quadtree index in which a small overlap produces additional digits in the key.

[Figure not yet available]

Figure 3a: Traditional Quadtree Index

[Figure not yet available]

Figure 3b: Smallworld Overlapping Quadtree Index

Because the spatial key is not necessarily unique for two objects in proximity, it is extended by an automatically generated unique identifier (see figure 3c).

       Quadtree key Unique id       Figure 3c: Unique spatial key 

Multiple Worlds

Although it is essential in many cases to model a seamless real world with a seamless geographic database, there are also cases where some information needs to be handled in a different coordinate system - a different world. Examples include applications where part of the model is represented as a drawing schematic, either as an alternative representation or as an expanded detail of part of the geographic world. An example of the latter is a schematic that represents the internal structure of an electricity substation. In such cases, although the geometry may be spread across different coordinate systems, it is important to handle the topology seamlessly, so that applications such as network analysis can work across multiple worlds unhindered by modelling artifacts.

Implementation

Smallworld GIS handles multiple worlds as an extension to the quadtree indexing scheme described above. The spatial index can be extended to include extra high order bits. In this way topological tracing operations work in an identical manner regardless of whether the geometry is in a single coordinate system or multiple coordinate systems. Figure 4 shows how the spatial key is extended to handle multiple worlds,

       World Id Quadtree Key  Unique id       Figure 4: Structure of Smallworld spatial       index. 

A further extension of this idea allows the data to be partitioned according to various non-spatial criteria. In Smallworld GIS, this technique is used to control both drawing priority and to implement the idea of multiple worlds. A number of high order bits are reserved to store a world ID. The next block of bits is reserved to store drawing priority (see figure 5).

The clustering properties of the spatially indexed database will ensure that all geometry in a particular world will be clustered on disc and it is therefore very efficient to scan for geometry within one world.

The spatial scanners, used by the GIS in hit-searching and in screen refresh, can be constrained to find data which has a particular world ID. Hence, viewing and interaction with the spatial database is made world-specific.

Types of World

There may be many types of world in a GIS. Each world type typically has a different meaning for the geometry it contains.

Geographic worlds represent the location of objects in the real world. This geometry may or may not have associated topology.

Schematic worlds have geometry which represents the existence of objects and possibly their topology, but does not carry any real-world positional information.

Cross Section worlds are essentially geographic, i.e. geometry represents real world locations of objects, but use a coordinate system in which the x axis represents distance along some arbitrary direction and the y axis represents vertical position.

Worlds and Universes

There are a fixed number of bits available to be shared between the spatial index, drawing priority and world ID.

The number of bits used for the spatial index (together with the overall extents of the scanned area) determines the size of the smallest quad-tree box and hence the resolution of the quad-tree. This, in turn, affects the efficiency of hit-searching and graphics re-draws. In a world which is densely populated and which has many geographically small objects, a relatively fine resolution would be required. Conversely, a world with very little geometry, or with very sparse geometry, can be scanned with coarser resolution.

The number of bits used for the world ID and for drawing priority determines the total number of worlds and priorities that can exist in the database. Hence, there are competing requirements which have to be juggled to achieve a compromise allocation of bits.

Using the topmost few bits of the spatial index to store a 'universe id', each 'universe' uses a different allocation of the rest of the bits. This allows each universe to contain worlds with certain characteristics. A geographic universe may contain just one world and hence uses its remaining bits to obtain maximum spatial resolution. A substation internals universe or cross-section universe would need to contain a large number of worlds, but each world is small and has little geometry in it. We can therefore use up a large number of bits to store the world ID, leaving relatively few for the spatial index without performance loss. A universe for network schematics lies somewhere between the previous two examples: there may be a relatively small number of schematic diagrams and the diagrams will be relatively sparsely populated compared with the geographic world. Figure 5 shows the full structure of the spatial key.

  Universe  World  Drawing      Quadtree  Unique   Id        Id     priority     Key       id 

Figure 5: Complete structure of spatial key

Alternative Representations of Objects and Networks

An object may be represented in several worlds and it may take a different form in each. For example, an electricity substation is represented in the geographic network by an area which is its boundary. In a schematic diagram, however, the substation is represented as a collection of transformers, busbars and switches.

A cable or pipe may be represented by its centre line (chain) geometry in the geographic world, but by a point in the world owned by a cross section object. The cable may appear in many cross sections.

A network may also have more than one representation. An electricity network, for example, may be represented in the geographic world by a connected network of cables, overhead lines and joints, and in a schematic world as a connected network of load sections. Both collections of geometry describe the same network.

Connections Between Worlds

Topological connections from one world to another may be made with a special object called a hypernode. This object has two point geometries, one in each world. This allows network traces to jump from world to world. For example, a network trace in an electricity network may arrive at a substation, jump into its internals world follow the geometry related topology inside it, then jump back out into the network again.

Hypernodes are a simple example of a multi-pin device and can be given special behaviour to change the trace state, or to prevent certain types of trace from jumping.

Complex Objects

An object may have a very simple representation in the network but may have considerable internal topological complexity which is hidden from the outside world. This internal topology may be implemented in a number of ways:

Objects with Internals worlds

Each object owns its own internals world. The internal topology of the object is associated with geometry in that world.

For example, an electricity substation has components: busbars, transformers and switches, which have their geometry in the owning substation's internals world. These components form an internal topological network for the substation.

Multi-pin-devices

These are objects with several connection pins (point geometries). The number of pins may be fixed, as in a simple switch or junction box, or dynamically variable - pins may be added or removed as required by the user.

External connections to the network are made to these pins, but the internal topology of the object is encapsulated as behaviour on the object itself. This topology may be implemented as a hard coded set of rules, which may refer to attributes of the object itself or to other conditions, or as a connectivity table which may be updated by the user.

For example, the following pseudo-code describes a method on a simple switch object which responds to the message 'connectivity_trace' from the network tracer.

      method simple_switch.connectivity_trace(pin)       if self.closed       then if pin = input_pin            then return output pin            else return input_pin            endif       else            return unset       endif 

A multi-input/output junction box may have an associated connectivity table. When asked for output pins connected to a particular input pin, the junction box will perform a lookup of this table and return a number of pins. Records may be added to or removed from this table as connections are made or broken in the junction box.

Objects with trace behavior

These are objects with simple, geometry related topology but which have behaviour which modifies that simple topology in some way.

For example, a road_junction may have a single node in the road network, which may be connected to several roads. However, when asked for exit roads given a particular approach road, the junction object overrides the simple topology (in which all roads which join that junction are connected) and may take into account such things as one way streets or restricted access. The road junction may also assign various costs to these transitions; turning right (in the UK) may be a more costly move than going straight on or turning left.

As with the multi-pin device, this internal topological complexity may be defined algorithmically or by table lookup, or some combination of the two. There may be a dependence on various internal external parameters such as the day of the week, or the angle between approach road and exit road at a roundabout.

Encapsulating topology inside objects in this way allows us to simplify the geometry of topologically complex networks. In a fibre-optic cable network, for instance, each cable may contain 256 fibres and at each junction, the connections between individual fibres must be modelled. However, the cable can be modelled by a single chain and when network traces are carried out, the current fibre number is carried as an attribute of the trace. The junction objects can use this attribute to determine connectivity and will modify it when passing back output cables. Hence, the network tracer will ask the junction "I have arrived along this cable on fibre number 125. Give me connected output cables and corresponding fibre numbers".

Conclusion

The approach described in this paper to handling topologically structured data in multiple coordinate systems is a simple and natural extension to the spatial indexing techniques employed to organise a single large seamless geographic database. The method is elegant in that it adds no complexity to the topological capabilities of the system implemented for a single coordinate world. Even though geometry is stored seamlessly, the spatial key implemented organises the data physically so that data belonging to each coordinate system is stored contiguously in different parts of the base datastore tables. The approach has been highly effective for implementing complex models for a range of network applications.

References

Abel, D.J. & Smith, J.L. (1983): A Data Structure and Query Algorithm Based on a Linear Key for a Rectangle Retrieval Problem, Computer Vision, Graphics, and Image Processing 24,1, October 1983.

Abel,D.J. & Smith, J.L. (1984): A Data Structure and Query Algorithm for a Database of Areal Entities, The Australian Computer Journal, Vol 16, No 4.

Batty, P. (1990): Exploiting Relational Database Technology in GIS, Mapping Awareness magazine, Volume 4 No 6, July/August 1990.

Bundock, M. (1987): An Integrated DBMS Approach to Geographical Information Systems, Autocarto 8 Conference Proceedings, Baltimore, March/April 1987.

Chance, A., Newell, R.G. & Theriault, D.G. (1990). An Object-Oriented GIS - Issues and Solutions, EGIS '90 Conference Proceedings, Amsterdam,, April 1990.

Easterfield, M.E., Newell, R.G. & Theriault, D.G. (1990): Version Management in GIS Applications and Techniques, EGIS '90 Conference Proceedings, Amsterdam, April 1990.

Guttman, A. (1984): R-trees: A Dynamic Index Structure for Spatial Searching, Proceedings of ACM SIGMOD Conference on Management of Data, Boston, June 1984.

Kriegel, H., Schiwietz, M., Schneider, R., Seeger, B. (1990): Performance Comparison of Point and Spatial Access Methods, in Design and Implementation of Large Spatial Databases: Proceedings of the First Symposium SSD '89, Santa Barbara, July 1989.

Libera, F.D. & Gosen, F. (1986): Using B-trees to Solve Geographic Range Queries, The Computer Journal, Vol 29, No 2.

Newell, R.G., Easterfield, M. and Theriault, D.G. (1991): Integration of Spatial Objects in a GIS, Proceedings of Auto-carto 10, Baltimore March 1991, Volume 7, pp 408-415.

Samet, H. (1989): The Design and Analysis of Spatial Data Structures, Addison Wesley, 1990, ISBN 0-201-50255-0.

Seeger, B. & Kriegel,H. (1988): Techniques for Design and Implementation of Efficient Spatial Access Methods, Proceedings of the 14th VLDB Conference, Los Angeles, California, 1988.

Smith, T. R. and Gao, P. (1990): Experimental Performance Evaluations on Spatial Access Methods Proceedings of the 4th Spatial Data Handling Symposium, Zurich 1990, Vol 2, p991.

No comments: