Monday, August 29, 2011

Smallworld Technical Paper No. 12 - AM/FM Data Modelling For Utilities

by Peter M. Batty

Abstract

Data modelling for AM/FM is more complex than for traditional applications for a variety of reasons, in particular because of the need to model spatial and topological relationships. This paper examines the general differences between data modelling for AM/FM and other applications, and looks at a range of common modelling issues in utility applications, including the efficient handling of various types of tracing and network analysis, outage management, and generation and maintenance of schematics.

Introduction

There are some significant differences between data modelling for traditional applications and GIS or AM/FM, which arise from the need to model spatial and topological relationships. Utility applications have some additional complications compared to many GIS applications, due to the need to model complex networks, and these complications are not well handled by traditional GIS data models (the term GIS data model is used in two different contexts in this paper: one is the system data model used by the GIS vendor to model spatial and topological aspects of data stored in the system, and the other is the user-specific data model developed on top of the core system for a specific application).

This paper starts by discussing the general differences between data modelling for GIS and other applications, from a user application perspective. It goes on to look at a variety of modelling issues relevant to utility applications, primarily from a system data model point of view. In particular a number of non-traditional modelling techniques which offer benefits for utility applications are examined. A more detailed look is then taken at a fairly complex application which is common to most utilities, outage management, considering how the system data model features which have been discussed can be applied in practice, and what sort of trade-offs need to be considered in the design. Finally, the creation and maintenance of the user-specific data model is discussed, and in particular the use of CASE tools is examined.

Why Is GIS Data Modelling Different?

A traditional data modelling exercise goes through two major stages: the production of a logical model, and then a physical model. The logical model represents a model of the real world and is completely independent of how the system is physically implemented. The physical model represents the actual data structures (typically tables) which are used to implement the logical model. By far the most common way of representing a data model is using some form of entity-relationship diagram in which the objects (or entities) which need to be modelled are displayed as boxes, and the relationships between them are represented by lines.

Dangers in Trying to Represent Spatial Relationships

A data model for a traditional application explicitly records all relationships between entities which may be of interest to the application. This provides a big trap for the unwary person producing a GIS logical data model, because in a typical GIS application there are many relationships which will be derived implicitly from the spatial characteristics of objects. For example, in a utility application you might want to know which facilities are in a given work area, which are in a given tax area, which cables are underneath roads, and so on. In a traditional logical model this would result in defining relationships between work area and all facilities, between tax area and all facilities, between cable and road, and so on. It is possible to very quickly end up with a logical model in which almost everything is related to everything, like the following:

[ Figure not available ]

It should be fairly obvious that this is not a very helpful model, although the author has seen a number of logical models like this. The second variant of the not terribly useful logical model is the following, which again the author has seen in real situations:

[ Figure not available ]

In this case the designer has realised that the previous logical model is not useful, and has tried to show that objects are related by their location. While this model represents a little better what we are trying to model, it is still not very helpful. Firstly, objects are in the majority of cases not related because they have exactly the same location, but because their locations have some more complex relationship between them, such as one crosses the other, one is inside the other, or one is within 100 feet of the other. This could in theory be modelled by defining suitable relationships between locations, but in practice it is totally impractical to maintain explicit relationships between all locations. Secondly, since all objects are shown as having a relationship to location on this diagram, this again clutters things up (if non-spatial relationships were also shown on this diagram it would be very confusing). If we understand that any object may have a spatial component to it, and that through this it can be related to any other object with a spatial component, then it is generally clearer not to try to show this on our logical model.

Although the previous two models were somewhat simplified examples, as a general rule it makes sense to omit spatial relationships from a logical model. This is not an absolute rule though. In particular there are some relationships which could be derived spatially but which could also be regarded as more explicit. A common example is a hierarchical aggregation of areas. For example, in a utility there is usually a hierarchy of organiz- ational areas, such as local office areas, which are aggregated into districts, which are aggregated into regions (use of the terms district, division and region, and their position in the hierarchy, seems to vary from one utility to another). Since a district is always the aggregation of a number of local office areas, this is quite a strong relationship and it may be useful to show this on the logical model (whether this relationship is modelled explicitly in the GIS is a separate implementation decision which will be discussed later).

Representation of Topological Relationships

Similar issues apply to the representation of topo-logical relationships: which objects can be connected to which other objects? As with spatial relation-ships, topological relationships can be quite exten-sive, especially in a utility application, and show-ing them explicitly may clutter up the data model diagram. However, there is a better case for explicitly showing topological relationships than spatial relationships. The main one is that in a utility application there are usually definite rules which are associated with topological relationships, for example a cable can be connected to a trans-former but not to a gas main. This is not generally the case with spatial relationships, for example you would typically not prohibit a cable from crossing a gas main (there may be some such spatial constraints, but they are normally far fewer, and too complex to represent on a data model diagram).

It is very important to make sure that topol-ogical relationships (or connectivity rules) are documented in some form, but this does not necessarily need to be done by showing them on the data model diagram. There are various other options, such as creating a separate entity-relationship type diagram just showing topol-ogical relationships, using a matrix or table to show which pairs of objects can be connected, or just listing the valid connections for an object with the rest of the documentation about that object.

A diagrammatic or tabular representation of connectivity rules can generally only convey high level information about the rules. Rules can be quite complex, so it is often necessary to include some additional description in addition to a diagram. For example, a rule might be that it is only valid to connect two primary electrical conductors if the phases of one are a subset of the other - so a cable with phase BC could be connected to a cable with phase B, but not to a cable with phase AB. Another rule might be that it is not possible to connect two gas mains with different diameters unless they have a suitable fitting in between them.

Some people might debate whether defining these sort of complex rules is part of the data model or part of the application. There is a strong argu-ment for including these as part of the data model, since enforcing them is fundamental to the integrity of the database. Also, when using an object-oriented approach to design and development, which is widely recognized as being of benefit in complex applications like GIS, the behaviour of an object is an important part of its definition and when taking this approach, connectivity rules should definitely be regarded as part of the data model.

In summary, it is difficult to give a hard and fast rule for how to document connectivity rules. In many cases it may be appropriate to display the high level rules on the data model diagram (ignoring complex constraints), if this can be done without detracting from the clarity of the diagram. Whether this is practical depends on the number of topological relationships and the number of explicit relationships which need to be shown on the diagram. In any case, it is likely that addit-ional documentation will be required for each object to explain more complex aspects of the rules.

Constraints Imposed by the System Data Model

We have already mentioned that we are really discussing two separate types of data model in this paper: the application-specific or user data model, which has been the main topic of discussion so far, and the system data model which is provided by the GIS vendor.

It is the system data model which handles fundamental issues such as how spatial data and topological relationships are modelled. When implementing a GIS, you typically cannot change the system data model. This puts constraints on the way that you do data modelling, which at times may hinder you but at other times (hopefully most of the time!) should help you. It helps you because the GIS vendor has implemented a model and functionality which handles the most complex fundamental aspects of the system. It also gives you a frame of reference in which to think about how to specify spatial and topological relationships, which as we have seen already can be difficult to do if you start with a blank sheet of paper. For example, most systems use some variant on the traditional GIS model in which an object can be a point, line or area. As you define the objects in your data model, you categorise each one as a point, line or area, which makes the modelling process much simpler than if you did not have this framework to work within.

However, while this simplification generally helps you, there is a danger that the system data model may not be sufficiently rich to handle the complexity of what you want to model. The experience of this author is that the traditional point-line-area model has a number of shortcomings, especially for modelling utility networks. The next section of this paper looks at various aspects of the system data model, and in particular looks at some examples of non-traditional modelling approaches which offer benefits for utility applications.

System Data Model Issues

This section considers various aspects of the system data model provided by the GIS vendor which are important in being able to model utility networks effectively.

Sheet-based Versus Continuous Models

At a fundamental level, it is very important that the database is seamless, so that an object like a cable is always stored as a single object regardless of how long it is, and it does not have to be split into mul-tiple objects because it crosses arbitrary map sheet or tile boundaries. This significantly simplifies application development and data maintenance.

Most systems now provide this capability at a basic level. However, it is important to consider how you can access these objects as a user or application developer. Because of the very large data volumes in typical GIS databases, most systems only allow you to work on a subset of the database at one time for analysis or update purposes. Being able to access the whole database at once without constraints offers significant advantages for many applications, such as network analysis and outage management.

Providing this seamless access to a database which may be tens or even hundreds of gigabytes in size, and obtaining good performance, is obviously not a simple task. There are two key issues in achieving this. The first is a good spatial indexing mechanism. The most common approach is to use some form of quadtree index. There is extensive literature on this topic - for example see Samet, 1990. The second key issue is the under-lying DBMS architecture. The server-oriented archi-tecture used by standard commercial DBMSs like Oracle, Ingres and Sybase is fundamentally unsuitable for providing the required performance in a networked environment with many users. A client-oriented DBMS architecture can provide an order of magnitude better performance for this sort of application. This DBMS architecture is actually far more important in achieving good performance in multi-user networked environments than the spatial indexing mechanism used, but it has recei-ved far less attention in the literature. See Newell and Batty, 1993, for more details on this topic.

In summary, the important things to look for in a system data model in this area are that it is seam-less, that it allows unconstrained access to the whole database for update or analysis, and that it delivers good performance in a production environment.

Spatial Object Versus Spatial Attribute Model

As we mentioned earlier, most system data models are based on some variant of the point-line-area model, in which each object belongs to one of these categories. Some systems have extended this model, for example by adding a "control point feature" which is like a point, but has two connection nodes. This is suitable for modelling certain electrical objects like transformers and simple switches. However, the basic approach is still that each object has a (single) spatial type. Each object will then have a number of alpha-numeric attributes defined for it (size, material, equipment number, etc.).

A much more flexible model, especially for utility applications, can be obtained by looking at the spatial aspects of an object in a different way. Instead of insisting that an object has a single spatial type, we can allow an object to have multiple spatial attributes, each of which has a spatial type such as point, line or area. This simple step of moving spatial information from an object level to an attribute level gives lots of new modelling possibilities. We will look at some examples to illustrate this.

Depending on the application, you may wish to regard a road either as a line or as an area. If doing some kind of route analysis, you will be interested in tracing along its centreline. If looking at access to properties along the road, you will be interested in the right of way area associated with the road. With the traditional spatial object model you would need to model these two things as separate objects, and typically you would need to write some specific code to create and maintain the relationship between these two objects. With the spatial attribute model, you can simply give the road two spatial attributes, a centerline which is linear, and a right of way which is an area.

A very common requirement in utility applicat-ions is to be able to display an object at a location which is offset from the location where it really exists. For example, many electric utilities display transformers offset from the cable to which they are attached. This can be very simply handled by the spatial attribute model: one spatial attribute can be used to store the actual location, and another spatial attribute can store the location where its picture is to be displayed. This applies to many situations, for example where multiple conductors are running through the same duct, and you may want to dis-play each conductor offset by a different amount.

Another area in which the spatial attribute model greatly simplifies modelling is in the hand-ling of multiple representations of the same object. It is particularly common in utility applications for the same object to appear on multiple different types of map, including various schematics. Again this can be handled simply by defining multiple spatial attributes on an object: one which repres-ents the actual location, and additional ones which represent the position of the object in each type of schematic representation. There are additional modelling techniques which can be useful in handling schematics, such as the use of multiple worlds: we will return to this subject later.

Basic Network Topology Modelling

The way in which network topology is modelled is obviously of fundamental importance to utility applications. With the traditional model, a linear object typically has two nodes, one at each end. Connected objects are defined as those sharing a node, so other objects can only be connected to the end of a linear object. If an new service line needs to be connected in the middle of a cable, for example, the cable must be split into two separate cables to model the connectivity correctly. This can lead to having to split something which is really a single object into many different objects, typically replicating the attributes on every instance, which causes problems in terms of data storage, data maintenance and performance. The following diagram shows the sort of situation we are talking about:

[ Figure (drawing) not available ]

These issues can be overcome by using a two level linear network model. With this approach, every high level linear object - we call this a chain - is made up of one or more (continuous) low level linear objects - we call these links. In the above drawing, the main (secondary) cable geometry would be a single chain consisting of nine links, and each service cable geometry would be a chain with just a single link. The links define the connectivity, but the chains are the spatial attributes associated with an object - so we can just have a single secondary cable object with one set of attributes, rather than having to create nine secondary cable objects. This approach obviously allows points to be connected in the middle of a chain too, for example a single primary conductor could have many transformer connection points along its length.

Modelling Complex Network Topology

Tracing through a linear network is a fundamental requirement for many utility applications. Common requirements for controlling a trace include stopp-ing at specified objects, possibly qualified by attribute, for example stopping at all open valves or switches. Another important requirement for electrical networks in particular is being able to do directional tracing - upstream or downstream.

For simple networks, these requirements can be met by the traditional linear network model consisting of links and nodes, where a link runs between two nodes, and two links are connected if they share a common node. However, utility networks often include objects whose connectivity cannot be easily modelled using this simple model, in particular various types of switching or control facilities. For example, consider the following diagram of a transfer switch:

[ Figure (diagram) not available ]

This switch has three connections, one input and two outputs. The switch can be in one of two positions. In position 1, shown in the drawing, current from the input goes out on output 1, and in position 2 it goes out on output 2.

There are several things which can help us produce an elegant solution to the problem of modelling these complex objects. The first is the spatial attribute model: we could model our transfer switch with three point geometries (spatial attributes), to represent the input connection and the two output connections. The second is that we need some way of defining the behaviour of this object within a trace - the trace needs to know that if it reaches the input connection point, then if the position attribute of the transfer switch is equal to "Position 1" then it should continue tracing from the output 1 connection point, and otherwise it should continue tracing from the output 2 connection point.

This is a situation where object-oriented programming is very useful. With conventional procedural programming, we would need to modify our tracing code each time we wanted to handle a special case object like this, which makes it impossible to create a general purpose trace routine, and causes support and maintenance problems. In an object-oriented system, we define the trace behaviour on each object, so that the general trace code does not need to be modified and new object behaviour can be introduced very easily. For example, our trace code could be set up to check whether any object it hit had a special method defined called trace_outputs, and if it did then it would call this method to get a list of nodes from which it continue the trace (a method is similar to a function in a procedural programming language - see Batty, 1993, for more information on object-oriented programming in GIS). The following is an example of what this method would look like for the transfer switch:

method transfer_switch(trace_input)         if trace_input = input_connection         then                 if self.position = 1                 then                          return {output_connection1}                 else                         return {output_connection2}                 endif         elif trace_input = output_connection1 and                  self.position = 1         then                  return {input_connection}         elif trace_input = output_connection2 and                  self.position = 2         then                  return {input_connection}         else                 return {}         endif  endmethod  

This method can contain any kind of program-ming logic, so the behaviour can be extremely sophisticated if necessary. This gives us an elegant way of handling the transfer switch.

The transfer switch is a fairly simple example - we also need to model more complex devices such as the following switch cabinet (this particular example is an S&C model PMH9):

This is displayed as a single object on the map, but internally it contains four switches which can be operated independently. Each switch controls current on three phases (A, B and C). The left hand diagram shows all three phases combined, and the right hand diagram shows the phases separately. The two switches on the left hand side are group operated switches, which means that they are either open on all three phases or closed on all three phases, and the two switches on the right hand side are fuse switches, which can be independently open or closed on each phase. The trace behaviour we want needs to recognise the positions of all these switches and derive the correct output for a given input appropriately.

The behaviour of the PMH9 switch cabinet is significantly more complex than that of the transfer switch, so we really need something more to help us model that. For this we will introduce another couple of new concepts: multiple worlds and hypernodes.

[ Figure (diagram) not available ]

A good approach to this problem is to model the internal structure of the switch cabinet as a separate set of GIS objects with their own attributes and topology - in this case we need to model bus- bars, fuse switches and group operated switches. We will lay these out in a schematic representation as in the diagram above (the simpler left hand representation is sufficient providing that we store separate attributes for the switch position on each phase, and that the tracing function used can stop based on complex predicates involving these attributes). An issue we need to resolve is where to place these objects - they provide more detail than we really want in our main geographic data-base. This is one area where the concept of multiple worlds is useful. A world is an independent coordinate system within the same database. Many different worlds can be created, and it is possible for a world to be related to an object. In this case we create a new world which we think of as being owned by the switch cabinet. We then create the objects representing the internals of the switch cabinet in its internals world. These can be created from a list of standard templates, or objects can be created and edited individually. These objects are connected using ordinary topological rules, and normal trace constraints will apply, like not going through open switches.

The one thing which is still missing is a link between the cables which are connected to the switch cabinet, which exist in the main GIS world, and the internal switches and busbars, which exist in a separate world belonging to the switch cabinet. This is where the hypernode comes in. A hyper-node is an object which has two point geometry attributes, and special tracing behaviour defined on it, similar to that which we defined on the transfer switch. In this case the special behaviour is quite simple - it just says that if the trace hits a point belonging to a hypernode, then it should continue tracing from the other point belonging to the hyper-node. In this way a hypernode can be used to make a trace jump ("through hyperspace" - hence the name!) from one point to another. The two points (or "ends") of a hypernode can be in different worlds. Hence we can add hypernodes which connect the cables coming into the switch cabinet to the appropriate connection points in the internal model. The nice thing about this approach is that we do not have to define any special trace behaviour on any objects, even though we are modelling some very complex behaviour. We simply specify that the switch cabinet is an object which has internals, and all the special behaviour we need is already defined on the hypernode, which is a standard system object. For a more detailed discussion of the use of multiple worlds, see Newell and Doe, 1994.

Schematics

We have already mentioned that it is a common requirement in utility applications for an object to have multiple representations, appearing not only on one or more types of geographical map, but also potentially on various types of schematic diagram. Several of the modelling techniques which have already been discussed are very useful in handling schematics. Multiple spatial attributes can be used to store the location of the object in each schematic. The geometry for each different schematic can be stored in a different world, to provide a clean separation between each schematic and the geographic representation.

In some types of schematic there may not be a one to one correspondence between objects in the geographic world and objects in a schematic. For example, many cables shown in the geographic world may be combined into a single line section in a schematic. The spatial attribute model is not sufficient to handle this case: we need to be able to handle (explicit) relationships between objects. The ability to define and maintain explicit relation-ships of various types (such as one to many and many to many) is important for many aspects of data modelling. The system should be able to automatically enforce rules relating to relationships, like referential integrity, or more complex rules. For example, in the case of a schematic line section which is related to multiple cables there must be a mechanism for ensuring that the schematic is updated appropriately if any of the cables are modified. These data modelling require-ments can be met using a DBMS feature known as a trigger, which is discussed in the next section.

Maintaining a Complex Model

In a GIS there are often complex relationships which need to be maintained, and complex rules which need to be validated whenever certain objects are updated. It is difficult to consistently validate rules via specific code in the application, because we need to ensure that validation is always done, whichever mechanism is used to update the record. For example, whether the record is created or updated by a data translator, or via one of a number of interactive menus, we always want to make sure that the same validation is done. This can be implemented by the use of a DBMS which supports triggers. A trigger is a function (or method in object-oriented terms) which is invoked whenever a specified object or attribute is inserted, updated or deleted. Ideally, it should be possible to invoke the full range of GIS functions from within a trigger, and it should be possible to cause the current transaction to be rolled back if an invalid condition is found within a trigger.

Triggers can be used for a wide variety of functions. At a very simple level, for example, a trigger could be defined to create a given anno-tation at a standard offset from a certain type of point whenever the point was inserted or updated. A trigger could also be used to implement complex connectivity rules, for example checking that the phases of two connected cables are compatible, and returning an error condition which will roll back the current transaction if they are not. A trigger could also implement more complex functionality such as updating an associated schematic geometry when the geographical representation of a record is updated.

An Application Example

This section looks briefly at some design issues involved in a common utility application, outage management, as an example of the sort of trade-offs which need to be considered when designing a data model.

Outage Management

We will consider the design of an outage management system for a radial electricity network. The basic idea of this application is to record calls from customers whose power is out and from this information predict which device is most probably causing the outage. The customer calls may be entered into a separate system by the telephone operators, and then passed on to the GIS for outage analysis. In a hierarchical network, several customers will be fed from a single transformer. A number of transformers will typically be on a section of network which is isolated by a fuse switch (i.e. if that fuse switch is out, all customers served from all transformers downstream of that fuse switch will be out). Further up the hierarchy there will be other devices such as reclosers which are possible causes of an outage.

In order to predict which device or devices are the likely cause of an outage, we need to look at the pattern of calls. If we receive several calls from customers served by the same transformer, then we would predict that the transformer was the probable cause of the outage. However, if we had predicted several transformers beneath the same fuse switch, then we would change our prediction to say that it was most likely that the fuse switch was out rather than all of the individual transformers. Exactly how many devices need to be predicted before a device further upstream is predicted depends on the type of device upstream and a range of other factors. A detailed discussion of all the design requirements is beyond the scope of this paper. However, it is sufficient to know that for any predictable device on the network, we need to be able to efficiently identify the switchable devices immediately downstream of that device, the transformers directly fed from this device, and if the device is predicted as being out we need to be able to calculate the number of customers and the total load (kVA) downstream of that device.

To meet the requirement of being able to efficiently identify the immediately downstream predictable devices and transformers from a given device, we have several options. We could dynamically trace downstream each time we needed this information. This could potentially involve tracing downstream for some distance, which could be a performance issue. A second possibility is that we could construct a separate network, using a similar approach to that which was discussed for schematics, which contained only the devices we were interested in for outage management, connected by linear objects which could be formed from an aggregation of several cables in the detailed geographic network. We would still have to do a trace each time we needed the downstream devices, using the "outage network" but performance should be better as the network is simpler. However, creating a separate network has a data storage overhead, and we need to write some application code (probably a set of triggers), which creates and maintains the second network automatically. A third option is that we could maintain a set of explicit relationships which models the hierarchy of predictable devices. This would again have some storage overhead, and would need some triggers writing to maintain the hierarchy, but would probably give the best performance of any of the approaches.

So we have (at least) three possible approaches to this problem. The simplest one in terms of the data model and application development require-ments is likely to be the least efficient in terms of performance. This is a case where prototyping is very useful to try the different approaches. This application was recently implemented by the author, and the first approach which was tried as a prototype was the first option described, which was most attractive because of its simplicity. The performance obtained with this approach was tested and found to be good, so it was decided that it was not worth prototyping the other options. This sort of situation occurs quite frequently, where you have a choice of deriving complex relationships between objects on the fly, or of creating more explicit relationships which will give you better performance when querying the relationship, but which has overheads in terms of data storage, application development and performance of updates. On a case by case basis you need to evaluate the pros and cons of the different options, and this may often involve prototyping some of the options.

Managing The Data Model

This section briefly discusses technology which can help in the development and maintenance of a data model, in particular the use of CASE technology.

Problems in Designing and Maintaining a Data Model

One of the largest costs in most large GIS implementations is the cost of customising the system to meet an organisation's specific requirements. In turn, designing and maintaining the data model for the GIS is typically one of the most significant elements of this customisation. Data modelling is a complex task for most applications, but this is particularly true for GIS as we have seen already. GIS projects typically have quite long life cycles, and the technology is relatively new to most users, which means that at the outset of the project they typically do not realise the full capabilities of the system. Both of these things contribute to the fact that requirements are very likely to change during the course of the project, and these often require the data model to change. Also as new applications are developed and added to the system, there may well be requirements for further changes to the data model. Since GIS projects involve capturing large amounts of data, it is critical that these changes to the data model can be made without losing any data which is already stored in the system.

With most traditional GIS software, it has been very difficult to address these issues. Typically the design of the data model takes a long time and is difficult to subsequently change. This usually means that a very long period of time is spent at the beginning of the project doing requirements analysis and data model design to try to make sure that it is exactly right (which of course it never will be) before any other work begins, as it is so difficult to make changes subsequently. This section briefly discusses how a CASE tool can be used to address these issues, and how this in turn radically changes the way in which one can approach the problem of customising a GIS.

What is a CASE Tool?

The acronym CASE stands for Computer Aided Software Engineering, and it is used to describe a variety of computer-based tools which can be used to assist in the design and development of computer programs. Such tools have been developed for various purposes, including the analysis and documentation of procedures and data flows, and the design and documentation of a data model. It is the latter function which we consider here: the use of a tool which can be used to define a graphical representation of a data model, in the form of an entity-relationship diagram which we discussed earlier. By clicking on individual objects it is possible to define more detailed information about them, such as what attributes they have, what the types of these attributes are, and so on. With some CASE tools, it is possible to automatically generate code which will create the data structures which have been designed.

Compatibility Between the CASE Tool and the DBMS

For doing a high level conceptual design, you do not necessarily need a close correspondence between the CASE tool being used and the DBMS which will eventually be used to implement the actual system. However, if the CASE tool is to be used to do the physical database design and to actually generate code, then clearly a much closer link is required between the CASE tool and the DBMS, and application development environment, used to implement the actual system.

This requirement can really be split into two. The first is that the CASE tool supports all the data modelling concepts supported by the DBMS: all its datatypes, all the types of relationships it supports, and other concepts which we have discussed such as triggers. In particular for GIS the CASE tool needs to understand the way in which spatial information and relationships are stored in the DBMS. The CASE tool may extend to covering aspects of the user interface of the application, such as which fields are visible to the user and what sort of interface is used to edit objects and their individual fields.

The second requirement is that the CASE tool must be able to generate code in an appropriate format which will implement the data model which has been designed. Providing that common data modelling concepts are supported, it is obviously technically possible for one CASE tool to output code in multiple formats suitable for different DBMSs.

These requirements are to a certain extent independent of each other. If the first one is met but not the second then it is at least possible to use the CASE tool to do the physical design for the system, but the code to implement this then has to be written manually. On the other hand, it is possible to have a CASE tool which supports a subset of the concepts supported by the DBMS (with GIS, for example, a CASE tool which supports alphanumeric datatypes but not spatial datatypes), which could be made to output code in an appropriate format for the DBMS, but further development work would then be required in the DBMS environment (outside the CASE tool) to incorporate any of these additional concepts in the application. In this situation it is likely to be much more difficult to usefully maintain the data model using the CASE tool after its initial creation, since the CASE tool does not know about any of the changes which have been made to the data model within the DBMS environment. Clearly, the CASE tool is much more useful if it meets both of these requirements in full.

Maintenance of the Data Model

While the requirements in the previous section demand a very close link between the CASE tool and the DBMS, the biggest benefits the author has found from the use of a CASE tool come from taking this integration one stage further and providing the ability to update the data model of an existing DBMS which is already populated with data, without losing any existing data. This is particularly important in GIS projects, since as we mentioned, they tend to last for a long time, requirements are particularly prone to change during the course of the project, and typically the database will contain large amounts of data when these changes have to be made.

It is highly desirable to be able to make data model changes on a test version of the database so that they can be tested before applying them to the production version. Ideally one would like to be able to do this without replicating data in the master database.

Use of an Incremental Development Methodology

If suitable tools are available to allow the data model of a populated database to be easily changed, this can significantly change the approach which is taken to an application development project. Instead of taking the traditional approach of trying to completely design the data model at the beginning of the project, which typically takes a very long time, it is possible to start with a much simpler core data model and develop it over time in parallel with the development of application prototypes. This allows benefits to be delivered to users much more quickly. For a more detailed discussion of the use of CASE tools with GIS, see Kendrick and Batty, 1994.

Conclusion

This paper has covered a range of issues relating to GIS and AM/FM data modelling. We started by considering how best to represent spatial and topological relationships when designing an application data model, which is not specifically covered by any of the common approaches to data model design. We then considered how new features in the GIS system data model could help the user model certain things more easily. From this perspective it is important that GIS vendors continue to look at enhancing their system data models rather than just continuing to use the simple point-line-area spatial object model which is still the most commonly used approach. It is also important for users to consider the system data model of any system which they are evaluating. Finally, we discussed CASE technology which can simplify the creation and maintenance of a data model, and which in doing so can radically change the approach which is taken to a GIS development project, by allowing the data model to be developed incrementally rather than having to completely design it at the beginning of the project.

References

Batty, P.M, 1993: Object-Orientation - some objectivity please!: Proceedings of GIS 93 Conference, Birmingham, UK.

Kendrick, G., and Batty, P.M., 1994: Use of an Integrated Case Tool for GIS Customisation: Proceedings of EGIS 94.

Newell, R.G., and Batty, P.M., 1994: GIS databases are different: Proceedings of AM/FM Conference XVII, pp 279-288.

Newell, R.G., and Doe, M., 1994: Discrete Geometry with Seamless Topology in a GIS.

Samet, H., 1990: The design and analysis of spatial data structures: Addison-Wesley, Reading, Massachussetts, 493 p.

No comments: