Monday, August 29, 2011

Smallworld Technical Paper No. 13 - The Wide Area Connection

by John Rowland, Grampian Regional Council

Abstract

GIS data is voluminous and demanding of bandwidth, and therefore normally requires high speed network links. This has served to constrain "real time" wide area distribution of GIS data. In conjunction with British Telecom, Gandalf Digital Communications Ltd and Smallworld Systems Ltd, Grampian Regional Council believes it has been able to implement a realistic solution to this problem using Smallworld's recently developed "Persistent Cache" functionality running over British Telecom "Kilostream" links.

This paper:

  • briefly explains Grampian Regional Council's requirement for wide area GIS;
  • overviews wide area communication options;
  • explains the basic concept of intelligent bridging;
  • describes the key features of the Smallworld System which have been used to implement wide area connections;
  • reviews experience to date;
  • briefly considers what the future may hold.

Grampian Regional Council & its Corporate GIS

Grampian Regional Council administers a land area of approximately 8,000 km² which is home to a population of 530,000, half of whom live in the City of Aberdeen. As with other Scottish Regional Councils, its responsibilities include the provision of water, drainage, roads, economic development, strategic planning, fire, police, education and social services.

The Council's main headquarters is Woodhill House in Aberdeen, although some departments also operate from a number of divisional and other offices located throughout the Region.

In 1992 the Council commenced implementation of a Corporate GIS which was installed in Woodhill House for use by four departments (Economic Development and Planning, Property, Roads and Water Services). The Council selected Smallworld GIS running under UNIX as its core system. At present all departments share a single corporate GIS database which is managed by a Sun MP630 file server. With ongoing data capture this database continues to increase in size; at the time of writing it held 4 GB (gigabytes) of GIS data.

Responsibility for maintaining this database and for ongoing implementation of the system on behalf of user departments is vested in a six-person team called the "GIS Unit". To date the Council has acquired a total of twenty-nine GIS "seats" from Smallworld, with more on the way. Ten of these seats have recently been acquired by the Department of Water Services for installation at six different office locations remote from Woodhill House (see figure 1).

Until recently it had not been viable for the Council to operate its Corporate GIS over a wide area network. However, Smallworld's recently developed Persistent Cache database management software, combined with "state of the art" network bridge technology, has enabled the Council to implement wide area connections using leased British Telecom Kilostream lines. At the time of writing two of the Department of Water Services' remote offices have been connected to the main file server in Woodhill House.

[ Figure 1 not available ]

Wide Area Connection Components

The wide area connection has four key components: a physical communication link (British Telecom 64Kbps Kilostream in the first instance), intelligent bridging (Gandalf LANLine), a Smallworld version managed GIS database and Smallworld Persistent Cache software (2).

Physical communication links

A 500m x 500m Ordnance Survey vector tile of an urban area typically contains around 250Kbytes of uncompressed data. In order to pass such a tile over a network and display it in a total elapsed time of less than 45 seconds, the network has to pass data at a speed in excess of 44Kbps (kilobits per second). To view 1 km² of similar data (four such tiles) in the same time, the speed would have to increase to in excess of 180Kbps.

This should not be a problem over a local area network with a bandwidth of 10Mbps (megabits per second). However, if all there is between office locations is a public telephone network, a couple of high speed modems operating at 14.4Kbps and the inherent "dial up" delay of analogue communications, then there clearly is a problem.
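The bandwidth figures above follow from simple arithmetic. The Python sketch below reproduces them from the paper's own numbers (a 250 Kbyte tile and a 45 second target) and, for comparison, the time to move one tile over a 14.4Kbps modem and a 10Mbps LAN. It is a rough model under stated assumptions: all 45 seconds are treated as transfer time (display time and protocol overhead are ignored), and the function names are illustrative only.

```python
def required_kbps(kbytes, seconds, bytes_per_kbyte=1000):
    """Line speed (kilobits per second) needed to move `kbytes` in `seconds`."""
    return kbytes * bytes_per_kbyte * 8 / seconds / 1000


def transfer_seconds(kbytes, line_kbps, bytes_per_kbyte=1000):
    """Seconds to move `kbytes` over a line of `line_kbps`, ignoring protocol overhead."""
    return kbytes * bytes_per_kbyte * 8 / (line_kbps * 1000)


TILE_KBYTES = 250    # one 500m x 500m urban OS vector tile (figure from the paper)
TARGET_SECONDS = 45  # acceptable elapsed time (figure from the paper)

print(round(required_kbps(TILE_KBYTES, TARGET_SECONDS), 1))            # 44.4
print(round(required_kbps(4 * TILE_KBYTES, TARGET_SECONDS), 1))        # 177.8
print(round(required_kbps(4 * TILE_KBYTES, TARGET_SECONDS, 1024), 1))  # 182.0

print(round(transfer_seconds(TILE_KBYTES, 14.4)))       # 139 seconds on a 14.4Kbps modem
print(round(transfer_seconds(TILE_KBYTES, 10_000), 1))  # 0.2 seconds on a 10Mbps LAN
```

With 1,000-byte kilobytes the 1 km² figure comes out at about 178Kbps; counting 1,024 bytes per kilobyte gives about 182Kbps, which is consistent with the paper's "in excess of 180Kbps".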

There is no alternative but to seek a digital communications link. Depending upon what you are prepared to pay, digital links can provide effective line speeds from 64Kbps up to in excess of 8Mbps with minimal "dial up" delay. They can be either ISDN ("pay when you use") dial up links or dedicated "Kilostream" or "Megastream" leased lines.

ISDN

In the United Kingdom ISDN (Integrated Services Digital Network) is available either as ISDN2, providing an effective 128Kbps line speed using two 64Kbps channels, or as ISDN30, providing an effective 1.92Mbps using thirty 64Kbps channels. At the time of writing British Telecom ISDN2 socket installations were being charged at approximately £400 per site, line rent at £84 per quarter and transmission at normal telephone call rates.

Leased Lines

Leased lines normally incur an initial installation charge and a subsequent annual rental charge which varies according to distance from the nearest digital exchange. At the time of writing British Telecom were charging £900 per site to install 64Kbps "Kilostream". The annual line rent of a link between two sites varies according to distance and proximity to BT exchanges; some indicative figures are quoted in the ISDN2 v Kilostream comparison below.

In contrast, 2Mbps "Megastream2" currently costs £6,200 per site plus £750 per link for a first installation, and 8Mbps "Megastream8" £9,734 per site plus £2,625 per link. Line rents vary according to the distance between BT exchanges; for example, if two exchanges were 50km apart, Megastream2 would currently cost £15,740 per annum to rent and Megastream8 £55,108 per annum. Even the most optimistic GIS cost benefit analysis may have difficulty in justifying expenditure of this magnitude!
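To put the Megastream figures in perspective, the following Python sketch adds up a first year's cost for a single link between two sites, using the BT prices quoted above for exchanges 50km apart. It assumes that a point-to-point link involves two sites and one link charge, and it ignores bridges and any other equipment costs.

```python
def first_year_cost(per_site, per_link, annual_rent, sites=2, links=1):
    """Installation at every site and on every link, plus one year's line rent (£)."""
    return sites * per_site + links * per_link + annual_rent


# BT prices quoted in the paper, for two exchanges 50km apart
megastream2 = first_year_cost(per_site=6_200, per_link=750, annual_rent=15_740)
megastream8 = first_year_cost(per_site=9_734, per_link=2_625, annual_rent=55_108)

print(f"Megastream2: £{megastream2:,}")  # Megastream2: £28,890
print(f"Megastream8: £{megastream8:,}")  # Megastream8: £77,201
```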

Despite current talk of information super highways, it is little surprise that many multi site GIS installations are still reliant upon tapes, discs and couriers to transfer data between individual sites.

The wide area connections to Grampian Regional Council's six Water Service remote offices are being implemented using a single 64Kbps Kilostream channel to each office.

[ Figure 2 not available ]

Intelligent Bridging

Bridge or gateway devices are needed to connect the physical wide area communication link between two remote sites to the local area networks (LANs) at those sites.

A bridge is effectively a filter which joins two network segments such that data will only pass through the bridge to a second segment if it is destined for a device connected to it. Bridges are commonly used to segment local area ethernets so that unwanted data packets are not allowed to flow along segments where they are not needed.

In a UNIX environment bridging is achieved using the IP (Internet Protocol) part of the TCP/IP protocol (1). Every device connected to an ethernet has its own unique IP address. A data packet being transmitted from one device to another always carries with it the IP address of the device to which it is being sent. In the case of a data packet which is broadcast to all devices on a network the IP address is coded so as to indicate that it needs to be delivered to every device.
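The filtering decision described above can be sketched in a few lines of Python using the standard `ipaddress` module. The remote network address range is hypothetical, and the function illustrates the principle rather than the logic of any particular bridge. (Conventional Ethernet bridges make the equivalent decision on hardware MAC addresses; the paper describes it in terms of IP addresses, which is followed here.)

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.ip_address("255.255.255.255")


def should_forward(dest_ip, remote_network):
    """Decide whether a packet should cross the bridge to the remote segment.

    Forward it if its destination lies on the remote network (this includes
    that network's own broadcast address) or if it is a limited broadcast
    addressed to every device.
    """
    dest = ipaddress.ip_address(dest_ip)
    if dest == LIMITED_BROADCAST:
        return True
    return dest in ipaddress.ip_network(remote_network)


REMOTE = "192.168.20.0/24"  # hypothetical address range of a remote office LAN
print(should_forward("192.168.20.17", REMOTE))    # True
print(should_forward("192.168.10.5", REMOTE))     # False: stays on the local segment
print(should_forward("192.168.20.255", REMOTE))   # True: directed broadcast
print(should_forward("255.255.255.255", REMOTE))  # True: limited broadcast
```

Broadcasts cross the bridge regardless of their origin, so on a slow wide area link they consume bandwidth that ordinary filtering cannot save.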

Gateways are special devices for transferring data between two different networks which adhere to different network protocols. As such they actually have to restructure the data packets which pass through them and are therefore inherently slower than bridges.

Wide area physical communication links between sites are nearly always slower than the local area networks they connect together, hence bridge or gateway devices are needed to prevent unwanted local area traffic from escaping to and causing congestion on the physical wide area link. Bridges supplied by Gandalf and other vendors for this purpose incorporate a number of intelligent features to enhance their performance.

Data Compression

Data compression algorithms are used to compress transferred data so as to achieve actual throughput which exceeds the quoted bandwidth of the physical wide area communication link. The degree of compression depends upon the extent to which the data is already compressed. For example, tests at Grampian Regional Council indicate that their Gandalf "LANLine" bridges operating over 64Kbps Kilostream are able to compress raw NTF files by ratios in excess of 3:1 and already compressed TIFF files by ratios of around 2:1, thus achieving effective data throughput in excess of 192Kbps for raw NTF and around 128Kbps for TIFF. Even higher compression ratios of up to 8:1 can be achieved with these devices.
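The effective throughput figures are simply the line speed multiplied by the compression ratio. The Python sketch below shows that arithmetic and, as a rough illustration of why already compressed data gains less, measures the ratio achieved by the standard `zlib` module on repetitive text and on random bytes. `zlib` is a stand-in chosen for convenience; the algorithm used inside the Gandalf LANLine bridges is not described in this paper, and the sample record format is hypothetical.

```python
import os
import zlib

LINE_KBPS = 64  # one 64Kbps Kilostream channel


def effective_kbps(compression_ratio, line_kbps=LINE_KBPS):
    """Throughput seen by users when the bridge achieves `compression_ratio`:1."""
    return line_kbps * compression_ratio


def measured_ratio(payload: bytes) -> float:
    """Compression ratio achieved by zlib (a stand-in for the bridge's own algorithm)."""
    return len(payload) / len(zlib.compress(payload))


print(effective_kbps(3))  # 192: raw NTF at 3:1
print(effective_kbps(2))  # 128: TIFF at 2:1

# Hypothetical repetitive text records compress well...
records = b"15,LINE,382716.25,801234.50\n" * 2000
print(measured_ratio(records) > 3)  # True
# ...whereas random bytes, standing in for already compressed data, do not.
print(round(measured_ratio(os.urandom(10_000)), 2))  # about 1.0
```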

[ Figure 3 not available ]

Transparent Automatic Dial Up

Bridges built specifically for connecting local area networks to "dial up" links such as ISDN embody an "automatic dial up" facility whereby (for UNIX networking) the bridge is configured with a table which maps different network IP addresses to the phone numbers at which those networks can be reached. Thus packets emanating from a "departure" site will cause their interconnecting bridge to automatically dial up the phone number of the "destination" site.
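A dial table of this kind is essentially a lookup from destination network to phone number. The Python sketch below is a minimal illustration under assumed values: the network address ranges and the placeholder numbers are hypothetical, and real bridges are configured through their own management interfaces rather than code like this.

```python
import ipaddress

# Hypothetical dial table: remote network -> number dialled to reach its bridge.
DIAL_TABLE = {
    ipaddress.ip_network("192.168.20.0/24"): "<number of remote site A>",
    ipaddress.ip_network("192.168.30.0/24"): "<number of remote site B>",
}


def number_to_dial(dest_ip):
    """Return the number the bridge should dial for a packet, or None if no call is needed."""
    dest = ipaddress.ip_address(dest_ip)
    for network, number in DIAL_TABLE.items():
        if dest in network:
            return number
    return None  # destination is local (or unknown): do not dial


print(number_to_dial("192.168.30.7"))  # <number of remote site B>
print(number_to_dial("192.168.10.4"))  # None
```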

ISDN bridges will normally also have a configurable "time out" period which specifies how long an ISDN connection should remain open after the last packet has been transmitted. For example, if the time out were set to 30 seconds then the connection would close whenever there was a gap of 30 seconds between transmitted packets. Given that an ISDN connection can be dialled up in as little as 5 seconds, it is quite feasible to make several very short connections during the course of the working day and incur only a relatively small phone bill.
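The effect of the time out can be modelled by grouping packet transmission times into calls: a call opens with the first packet and closes once the time out period passes without another packet. The Python sketch below is a simplified model that counts calls and total connected time (which is what drives the phone bill); it ignores the few seconds of dial up delay, and the packet times in the example are invented for illustration.

```python
def connection_sessions(packet_times, timeout=30.0):
    """Group packet transmission times (seconds) into ISDN calls.

    A call is opened by a packet when no call is open, and is cleared once
    `timeout` seconds pass without a further packet.
    Returns (number_of_calls, total_seconds_connected).
    """
    calls = 0
    connected = 0.0
    call_start = 0.0
    call_end = None  # time at which the open call will time out
    for t in sorted(packet_times):
        if call_end is None or t > call_end:
            if call_end is not None:
                connected += call_end - call_start  # previous call timed out
            calls += 1
            call_start = t
        call_end = t + timeout
    if call_end is not None:
        connected += call_end - call_start
    return calls, connected


print(connection_sessions([0, 10, 20, 100, 105]))  # (2, 85.0)
```

Five packets produce two calls here: the gap between 20 and 100 seconds exceeds the 30 second time out, so the first call is cleared and a second one is dialled.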

Automatic dial up and subsequent timed out disconnection are totally transparent to the user; the ISDN bridge thus provides a virtual permanent connection.

Bandwidth On Demand

ISDN2 incorporates two individual 64Kbps channels which can be used either in parallel to achieve an effective 128Kbps bandwidth (with compression, actual throughput will be even higher) or separately to send data to two different destinations at the same time. Similarly, Kilostream can be installed in multiples of 64Kbps channels and used in much the same way.

"Bandwidth on demand" characteristics of local to wide area bridges enable individual ISDN and Kilostream channels to be automatically opened and closed to different destinations according to actual traffic volumes. With the Gandalf "LANLine" bridges it is also possible to mix and match Kilostream and ISDN together such that an ISDN connection can be opened when a single permanent Kilostream channel becomes overloaded.

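One simple way to picture bandwidth on demand is as a rule that opens extra ISDN channels alongside a permanent Kilostream channel as the offered load rises. The Python sketch below assumes a hypothetical 80% utilisation threshold and ignores compression; it illustrates the idea and is not the algorithm used by the Gandalf LANLine bridges.

```python
KILOSTREAM_KBPS = 64    # one permanent Kilostream channel
ISDN_CHANNEL_KBPS = 64  # each ISDN2 channel
ISDN_CHANNELS = 2       # ISDN2 provides two channels


def isdn_channels_needed(offered_kbps, high_water=0.8):
    """ISDN channels to open alongside the permanent Kilostream channel.

    Channels are added until the offered load fits within `high_water`
    (a hypothetical utilisation threshold) of the total capacity.
    """
    for extra in range(ISDN_CHANNELS + 1):
        capacity = KILOSTREAM_KBPS + extra * ISDN_CHANNEL_KBPS
        if offered_kbps <= high_water * capacity:
            return extra
    return ISDN_CHANNELS  # saturated: every channel is open and traffic queues


print(isdn_channels_needed(40))   # 0
print(isdn_channels_needed(60))   # 1
print(isdn_channels_needed(150))  # 2
```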
Virtual Extended Local Area Networks

The net effect of state of the art intelligent bridging used in conjunction with digital wide area communication links such as ISDN and Kilostream is to create a virtual extended local area network. In a UNIX environment, client workstations located at one site can access server devices at another site several kilometres away as if both were connected to the same local area network, albeit with degraded performance if the volume of data being transferred between sites exceeds the available bandwidth of the physical wide area link.

Not only does this permit remote offices to access main office data, it also allows them to output data to peripheral devices, such as expensive large format electrostatic plotters, located in the main office.

Database Version Management

In order to understand how Persistent Cache is being used to provide Grampian Regional Council's "wide area connection", it is first necessary to provide a brief explanation of the Council's implementation of Smallworld's version managed database.

Smallworld Version Management permits several versions of the database to exist simultaneously. In Grampian's case these versions are organised hierarchically as illustrated by figure 2. There is a single definitive "top alternative" which is normally never written to directly. Each department is then provided with its own version of the "top alternative", and these departmental alternatives are again normally never written to directly. Instead, all users who are required to write to the database are each provided with their own "personal writable alternative".

For routine data capture work, users are usually asked to update their departmental alternative on a daily basis by "posting up" their own personal alternative to it. This has to be preceded by a "merge down" of all changes which have already been posted to their departmental alternative. Once all personal versions have been "merged and posted", a departmental administrator then ensures that the department's own alternative is "merged and posted" to the "top" definitive alternative, thereby inheriting changes and updates made by other departments.
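The merge and post cycle can be illustrated with a toy model of the alternative hierarchy. The Python sketch below is hypothetical and is not the Smallworld API (Smallworld itself is programmed in Magik): each alternative holds its own view of the records plus a set of local edits, `merge_down` inherits changes already posted to the parent, and `post_up` merges down first and then pushes local edits up one level. Conflict detection, which a real version managed database must handle, is omitted, so local edits simply win. The record keys and values are invented.

```python
class Alternative:
    """Toy model of a version managed alternative (hypothetical, not the Smallworld API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # A new alternative starts with a copy of its parent's view of the data.
        self.records = dict(parent.records) if parent else {}
        self.changed = set()  # keys edited here and not yet posted upwards

    def write(self, key, value):
        self.records[key] = value
        self.changed.add(key)

    def merge_down(self):
        """Inherit changes already posted to the parent, keeping local edits."""
        for key, value in self.parent.records.items():
            if key not in self.changed:
                self.records[key] = value

    def post_up(self):
        """Merge down first, then push local edits to the parent."""
        self.merge_down()
        for key in self.changed:
            self.parent.records[key] = self.records[key]
        if self.parent.parent is not None:
            # The parent now holds edits that it must itself post later.
            self.parent.changed |= self.changed
        self.changed.clear()


top = Alternative("top")
top.write("os tile 1", "OS map edition 1")          # initial load (hypothetical data)
water = Alternative("water services", top)
gis_unit = Alternative("GIS unit", top)
user = Alternative("data capture user", water)

user.write("water main 101", "150mm ductile iron")  # routine data capture
user.post_up()                                      # merge down, then post to department
water.post_up()                                     # administrator posts department to top

gis_unit.write("os tile 1", "OS map edition 2")     # GIS Unit updates the map base
gis_unit.post_up()
water.merge_down()
user.merge_down()

print(top.records["water main 101"])  # 150mm ductile iron
print(user.records["os tile 1"])      # OS map edition 2
```

The last two lines show both directions at work: the user's water main reaches the top alternative once the department posts, and the GIS Unit's map base update reaches the user after the department and the user merge down.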

Grampian Regional Council's GIS Unit is responsible for maintaining the Ordnance Survey map base and other shared corporate datasets such as a number of different gazetteers. Within the alternative structure the GIS Unit is treated as just another department; thus departments, and in turn end users, have their map base maintained for them by virtue of the "post and merge" procedures.

Within the UNIX file system the GIS database is held in a set of files storing different types of data (e.g. geometrical points, lines, areas, associated attributes etc). Database alternatives can be created either to be located entirely within a file set held in a single directory, or to reside in a separate sub directory with the same file structure. Thus the UNIX file system can, if desired, be configured to mirror the database alternative structure totally or partially (figure 3). This in turn implies that different alternative versions of the database can be stored on different storage devices on the same network.
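The mapping of alternatives onto directories can be sketched as a path calculation: an alternative created in its own sub directory adds a level to the path, while any other alternative shares the file set of its nearest ancestor. The Python sketch below is a hypothetical model; the root directory, the alternative and user names, and the choice of which alternatives get their own sub directory are all assumptions for illustration.

```python
from pathlib import PurePosixPath


def alternative_directory(root, alternative_path, own_directory):
    """Directory holding an alternative's files (toy model, not the Smallworld API).

    alternative_path: alternative names from the top down, e.g. ["water", "john"].
    own_directory: alternatives (written as "parent/child" paths) that were
    created in their own sub directory; every other alternative shares the
    file set of its nearest ancestor's directory.
    """
    directory = PurePosixPath(root)
    for depth in range(1, len(alternative_path) + 1):
        if "/".join(alternative_path[:depth]) in own_directory:
            directory /= alternative_path[depth - 1]
    return directory


ROOT = "/gis/corporate"                  # hypothetical database directory
OWN_DIRECTORY = {"water", "water/john"}  # hypothetical: mirror only the Water Services branch

print(alternative_directory(ROOT, ["water", "john"], OWN_DIRECTORY))  # /gis/corporate/water/john
print(alternative_directory(ROOT, ["roads", "mary"], OWN_DIRECTORY))  # /gis/corporate
```

Because any of these sub directories could be a mount point for a different disc, this kind of mapping is what allows different alternatives to live on different storage devices.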

Persistent Cache

Smallworld's Persistent Cache software (2) enables all or a subset of a GIS database to be cached to a local disc attached to a client workstation, which is in turn configured to be a local cache server to both itself and other clients. By maintaining a copy of frequently accessed data in the local cache, it provides an elegant and transparent way of giving large systems high performance over low speed communication links.

In figure 4, workstation A is a local cache server located at a remote site along with client workstation B. GIS read transactions generated by workstations A and B look first to the local cache to retrieve data. If the requested data has not been cached it is retrieved from the main file server via the wide area connection and then cached.

The local cache has a configurable operating capacity; once this capacity has been filled, old cached data is deleted from the cache on a "least recently used" basis. The cache capacity can be set large or small depending upon the size of the required database subset. If need be (local disc space permitting) it could be set large enough to replicate the original database.

When using Persistent Cache, remote site users are able to retrieve cached data very quickly and uncached data at the speed of the wide area connection. Hence if a subset of the main database is cached there will be occasions when read transactions may suddenly appear to slow down as data is retrieved over the wide area connection.

Write transactions write directly to the user's alternative every time a database record is inserted, updated or deleted; the changed data is then copied back to the local cache.
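The read and write behaviour described above corresponds to a read-through cache with least recently used eviction and write-through updates. The Python sketch below models it with `collections.OrderedDict`; capacity is counted in blocks rather than disc space, and `fetch_remote` and `write_remote` are hypothetical callables standing in for the main file server. It illustrates the caching principle only and makes no claim about how Persistent Cache is implemented internally.

```python
from collections import OrderedDict


class PersistentCacheSketch:
    """Read-through LRU cache in front of a slow wide area link (toy model)."""

    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity          # maximum number of cached blocks
        self.fetch_remote = fetch_remote  # hypothetical: read from the main file server
        self.blocks = OrderedDict()       # block id -> data, least recently used first
        self.remote_reads = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        self.remote_reads += 1                 # miss: haul over the wide area link
        data = self.fetch_remote(block_id)
        self._store(block_id, data)
        return data

    def write(self, block_id, data, write_remote):
        write_remote(block_id, data)  # the user's alternative is updated first
        self._store(block_id, data)   # then the change is copied into the cache

    def _store(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


server = {f"block {i}": f"data {i}" for i in range(10)}  # hypothetical server contents
cache = PersistentCacheSketch(capacity=3, fetch_remote=server.__getitem__)
for b in ["block 1", "block 2", "block 1", "block 3", "block 4", "block 1"]:
    cache.read(b)
print(cache.remote_reads)  # 4
print(list(cache.blocks))  # ['block 3', 'block 4', 'block 1']
```

With a capacity of three blocks, six reads cost four trips over the wide area link; the eviction of `block 2` shows the least recently used rule at work.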

At appropriate intervals, remote site users initiate merging and posting of their changed data with higher order alternative versions. The merge and post processes are run on whichever machine holds the alternatives concerned. The local cache is updated wherever new "merged down" change data falls within a geographical area that is already held in cache.
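Deciding which merged down changes need to reach the local cache is, in essence, a test of whether each change overlaps geography that is already cached. The Python sketch below uses simple bounding box intersection; the `Box` type, the coordinates and the record identifiers are all hypothetical, and the paper does not describe how the real cache tracks its geographical blocks.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned extent in map coordinates (metres); hypothetical type."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Box") -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


def changes_for_cache(merged_changes, cached_extents):
    """Return ids of merged down changes that overlap geography already cached.

    merged_changes: list of (record_id, Box); cached_extents: list of Box.
    """
    return [record_id for record_id, box in merged_changes
            if any(box.intersects(extent) for extent in cached_extents)]


cached = [Box(390000, 800000, 395000, 805000)]        # hypothetical cached block
changes = [
    ("main 7", Box(391000, 801000, 391200, 801050)),  # inside the cached block
    ("main 8", Box(420000, 830000, 420100, 830100)),  # elsewhere in the Region
]
print(changes_for_cache(changes, cached))  # ['main 7']
```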

Because alternative versions of the database can be mapped onto different UNIX directories (see figures 2 and 3), users' alternatives can be held either on the main server back at headquarters or somewhere locally at the remote site. This provides organisations with a high degree of flexibility as to how they operate over wide area connections.

Holding Remote Site Alternatives on Main Server

If users' alternatives are located at headquarters then all write data is passed over the wide area connection whenever a database record is inserted or updated. In a data capture environment this implies that relatively small amounts of data are passed frequently over the wide area connection.

Database commits and alternative version posting are processed back on the main server, and therefore no data is passed over the wide area connection. Similarly, the merge process (merging down of changed data from higher order alternatives) is also undertaken on the main server; however, the amount of changed data passed back across the wide area connection will depend upon the volume of merged down changed data which maps onto currently cached "geography". By holding all remote site alternative change data on the main server, remote site users do not need to be concerned with data backup and other routine system administration tasks, which can all be undertaken back at headquarters.

[ Figure (diagram) not available ]

Holding Remote Site Alternatives Locally

By holding users' alternative change data locally, no write data is passed over the wide area connection until the locally held alternative versions are merged with, and posted to, higher order versions located back on the main file server. If daily merging and posting is undertaken, this implies a daily transfer of a larger volume of change data over the wide area connection.

The volume of changed data merged back down to the locally held alternatives is entirely dependent upon the amount of data which has recently been posted to the top (definitive) version of the database by other users. This could be considerable if, say, a new batch of Ordnance Survey maps had recently been loaded.

Populating the Local Cache

The local cache is essentially an extended reflection of the data which a local client workstation holds in memory. It is therefore composed of a subset of object class layers for "blocks" of geographical extent. For example, Grampian's Water Service divisional offices cache background map and water supply object class layers for all or part of their divisional areas of operation.

Upon initial creation the local cache is "empty" and must be populated. Users can be left to do this during the course of natural usage: upon first access all data is "hauled" over the wide area connection and then cached. This could be a little tedious if two or more users at the local site are simultaneously hauling data over a 64Kbps line. Instead, they could arrange to "zoom out" to a large extent of geography as they leave for home, so that the area in which they wish to work has been cached by the time they return the following morning.

Alternatively, initial cache data can be written to tape by staff back at headquarters and then copied into the local cache in order to "kick start" it.
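The cost of hauling can be estimated from the paper's own figures. The Python sketch below works out how long one user, or two users sharing the line, would take to haul 1 km² of urban OS data (four 250 Kbyte tiles) over a 64Kbps Kilostream channel. It assumes the line is divided evenly between users and ignores protocol overhead (compression can be supplied as a parameter); the 14 hour overnight period in the last example is likewise an assumption.

```python
def haul_seconds(kbytes, line_kbps=64.0, users=1, compression_ratio=1.0):
    """Seconds for each of `users` to haul `kbytes` when they share the line evenly."""
    per_user_kbps = line_kbps * compression_ratio / users
    return kbytes * 8 / per_user_kbps  # Kbytes to Kbits, divided by Kbits per second


KM2_URBAN_KBYTES = 4 * 250  # four 500m x 500m tiles (figures from the paper)

print(haul_seconds(KM2_URBAN_KBYTES))           # 125.0 seconds for one user
print(haul_seconds(KM2_URBAN_KBYTES, users=2))  # 250.0 seconds each for two users

# Data that could be hauled during a hypothetical 14 hour overnight "zoom out"
overnight_kbytes = 64 * 14 * 3600 / 8
print(round(overnight_kbytes / 1000))  # 403 (Mbytes, before compression)
```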

Grampian Regional Council's Wide Area Connection

Kilostream v. ISDN2

Although the Council already had some operational wide area communication links, it was decided that the Corporate GIS would have its own dedicated links because of difficulties in extending heavily subscribed existing facilities to sites where GIS was required. The lowest cost option able to provide acceptable performance was therefore sought. This turned out to be a choice between ISDN2 and single channel Kilostream. Capital installation costs were very similar for both (approximately £2,500 per site); however, in the case of ISDN2, ongoing running costs varied considerably according to the degree of use.

For total daily connection times of less than about four hours per working day, ISDN2 is cheaper to operate than fixed fee Kilostream, as illustrated below for a notional 247 working days per year at current British Telecom day rate call charges:

[ Figure (cost notes) not available ]

The above costings indicate that the most cost effective option depends upon the nature of GIS use at the remote site. If there is a low level of write transactions at a site where a significant proportion of the database is held in the local cache, then ISDN2 provides a very flexible and potentially inexpensive wide area link. However, if there is a high level of regular write transactions or considerable regular "hauling" of uncached data throughout the working day, then Kilostream is going to be the more viable option.
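Because the cost table is not available, the break-even point cannot be reproduced exactly here, but its structure can be. The Python sketch below uses the paper's 247 working days and £84 per quarter ISDN2 line rent, and leaves the Kilostream annual rent and the hourly day rate call charge as inputs to be supplied. It assumes that both ends of the link need an ISDN2 line (`lines=2`) and that one call is charged per hour of connection; installation costs are ignored because the paper reports them as very similar for both options.

```python
WORKING_DAYS = 247   # notional working days per year (from the paper)
QUARTERLY_RENT = 84  # £ per ISDN2 line per quarter (from the paper)


def isdn2_annual_cost(hours_per_day, call_rate_per_hour, lines=2):
    """Annual ISDN2 running cost: line rent for `lines` lines plus day rate calls (£)."""
    return lines * 4 * QUARTERLY_RENT + WORKING_DAYS * hours_per_day * call_rate_per_hour


def break_even_hours(kilostream_annual_rent, call_rate_per_hour, lines=2):
    """Daily connection time at which ISDN2 costs the same as a fixed rent Kilostream link.

    Below this figure ISDN2 is cheaper; above it, Kilostream is.
    """
    fixed = lines * 4 * QUARTERLY_RENT
    return (kilostream_annual_rent - fixed) / (WORKING_DAYS * call_rate_per_hour)
```

Given the Kilostream rent for a particular pair of sites and the hourly call charge, `break_even_hours` returns the daily connection time at which the two options cost the same; the paper reports this as about four hours per working day for the tariffs it used.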

Because it was known that the first two Water Service offices to be connected were "heavy" GIS users (they had previously been using GIS in a standalone capacity), and there still appeared to be technical problems handling broadcast messages over ISDN, it was decided to adopt Kilostream for the first wide area connections.

Experience to date

Initial use indicates that the successful operation of the wide area links depends more upon operational management than upon technical factors. The two remote sites connected to date each comprise two locally networked GIS workstations currently used for data capture work. By its very nature data capture work does not involve frequent extended panning across the map base; hence "hauling" of uncached data has not been a problem, given a relatively large capacity cache which was pre-populated prior to installation.

Data transfer across the wide area connection performs rather like a motorway contraflow, inasmuch as if there is very little traffic on the motorway then, ignoring speed limits, traffic flows virtually as quickly as if there were no contraflow. However, as the volume of traffic increases, the actual throughput speed decreases in almost exponential proportion.

1 km² of inner city water data takes only slightly longer to display when retrieved over the wide area connection than when retrieved straight from cache. However, 1 km² of inner city water data plus all Landline OS data takes significantly longer to display.

Grampian's two Water Service offices have been configured so that local users' alternative versions are stored back on the main server; consequently data is passed over the Kilostream every time a record is inserted or updated. Users have noticed a degradation in write transaction time when they both write simultaneously. The degree of degradation is acceptable, but it does indicate that sites with a number of writing users may need either to store their alternative versions locally or to be provided with access to additional communication channels over the wide area link.

The conclusion to date is that the nature of GIS usage needs to be understood in order to specify and configure a wide area connection for optimum performance.

[ Figure 4 not available ]

What Of The Future

Grampian Regional Council believes that it has been able to implement wide area networked GIS at realistic cost using technology which is available today. It has been proven that a single channel Kilostream link operating at 64Kbps is adequate for the scale of present implementation. Furthermore this has been achieved with a great deal of "behind the scenes" activity which is totally transparent to the user.

The computer press makes great play of cheap high speed local and wide area ATM (Asynchronous Transfer Mode) networks being the way of the future (3); however, the technology is not yet available, and until it is, it is difficult to see how GIS data can be viably transferred between different systems in anything like real time.

In the longer term the Council is keen to reduce the cost of providing wide area connections to more marginal GIS users by using ISDN2 instead of Kilostream. It is also keen to exploit the potential for transfer of data between different organisations using ISDN. The cost of operating ISDN2 between locations over 35 miles apart is the same no matter whether they are 36 or 500 miles apart. Unlike "fixed" Kilostream links, ISDN connections can be made between any two locations which can dial to one another.

Persistent Cache has also been seen as a way of relieving congestion on heavily used local area networks. The Council is currently planning a six-seat GIS sub network in its headquarters which will use Persistent Cache to reduce the volume of GIS data carried over the building's main backbone LAN.

Acknowledgements

The authors wish to thank British Telecom, Gandalf Digital Communications Limited, Grampian Regional Council and Smallworld Systems Limited for their support and assistance in compiling this paper. Particular thanks go to Alistair Reid, Andrew Swanson and George Wallace of Grampian Regional Council for their part in installing wide area connection components and Andrew Reid of Gandalf for his enthusiastic support, also to the staff of the Department of Water Services for acting as "test drivers".

References

1. SOUTHERTON A. Modern UNIX, Chapter 4, Wiley 1992.

2. NEWELL R.G., BATTY P.M. GIS databases are different. Proceedings of the AGI 93 Conference, Part 3.

3. UNIX NEWS No. 56, October 1993. ATM is the wave of the future, pp. 63-65.

Copyright © 1996 Smallworld Systems, Inc. All rights reserved
