Monday, August 29, 2011

Smallworld Technical Paper No. 7 - Object-orientation: Some Objectivity, Please!

by Peter Batty, Senior Applications Consultant, Smallworld Systems

Abstract

"Object" would appear, to most people, to be a fairly innocuous, uninteresting word. Strange, then, that the objective of many GIS vendors these days appears to be to mention the word object as frequently as possible, objectivity being no object in labelling a system as object-oriented, to the extent that the word object is in danger of becoming an object of ridicule in the industry. Many people object to this rather objectionable state of affairs, where the word object is used as frequently and in as many different contexts as in this abstract, so the object of this paper is to try to introduce some objectivity into the use of the word object in relation to GIS.

The paper explains the difference between object-based systems, object-oriented interfaces, object-oriented programming and object-oriented databases. It concentrates in particular on explaining object-oriented programming, using real examples from a GIS application which the author has just implemented in an object-oriented environment.

The paper also asks the question, "So what?" Even if the user can work out in what respects a particular system is object-oriented, need they be concerned about the answer?

Peter Batty is a Senior Applications Consultant with Smallworld Systems of Cambridge, England. He has seven years of experience in the GIS industry. Most of this time was spent working for IBM in both the UK and the USA, and he moved to Smallworld Systems in 1992. His experience includes a wide range of GIS development and implementation projects in many different industries and countries. He has written and presented many articles and papers on technical issues in GIS. He has a BA in Mathematics and an MSc in Computing from Oxford University.

Outline

The term object-orientation has become so widely used in GIS that describing a system as object-oriented has become fairly meaningless. Rather than try to produce a single definition of what constitutes an object-oriented system, this paper attempts to outline the various ways in which the terms object and object-oriented are used in GIS, and produces a summary checklist which can be used to clarify exactly what a vendor means when they describe their system as object-oriented.

Topics which will be covered include the following:

  • Object-based (as opposed to sheet-based or tile-based) systems
  • Object-centred (as opposed to geometry-centred) systems
  • Object-oriented user interfaces
  • Object-oriented programming
  • Object-oriented databases

The first two topics are largely a question of defining terminology and are fairly straightforward to understand. The third topic is rather vague, and is only briefly discussed. The fourth area of object-oriented programming is somewhat more complex to understand and it is this area that this paper will primarily focus on. The final topic of object-oriented databases is not well defined, and various definitions are discussed. Most of the (sensible) definitions of an object-oriented database relate back to object-oriented programming, which emphasises the importance of understanding this area in order to make sense of all the other definitions of object-orientation which one may meet.

The following sections explain each of the above terms, and in each case discuss the relevance of the topic to GIS.

Object-based Rather Than Map-based or Tile-based Systems

Perhaps the lowest level of functionality which is sometimes described as object-oriented in the context of GIS is what this author would describe as an object-based or feature-based approach to storing geographic data. Many GIS, especially those derived from CAD systems, split the database into map sheets, tiles, or geographic partitions. In such systems, any feature which crosses a map or tile boundary needs to be physically stored as multiple geometric objects, although the system will usually contain some functionality to make this split largely invisible to the end user.

In contrast, an object-based system does not partition the database into tiles, and stores geographic objects or features as a fundamental unit in the database. In such a system, linear or area objects never need to be artificially split because they cross a tile boundary.

Although, as mentioned above, systems which use a tile-based approach can make this reasonably transparent to the end user, extra code is required to do this, and typically application development is more complex because of the need to handle special cases where objects are split. There are also potential data integrity problems in trying to ensure that all parts of the object are correctly maintained. It is therefore generally agreed that an object-based approach is preferable to a tile-based approach. The main reason for using a tile-based approach is that this makes it simpler to achieve reasonable performance. However, modern spatial indexing techniques make it possible to get very good performance with a seamless object-based approach.

An object-based system as described in this section has nothing to do with the term object-orientation as it is used in the rest of the computer industry. However, in the experience of this author, many of the GIS which are described as object-oriented by their vendors fall into this category.

Object-centred Systems (as Opposed to Geometry-centred Systems)

Another sense in which the word object is used in relation to GIS data modelling is the term "object-centred" used by Newell (1). He contrasts what he terms object-centred and geometry-centred data models for use in GIS.

A geometry-centred model is one in which the primary classification of objects is a geometric one - for example each object is either a point, line or area. Each of these geometric types is then subdivided into classes which represent objects in the real world, for example a line might represent a road, a river or a gas pipe, and have appropriate attributes associated with it in each case.

In contrast, in an object-centred model, the primary classification of objects is based on the real world - so an object might be a road or a school. This object has multiple attributes, which may be either alphanumeric or geometric. Hence an object could have multiple different geometries of different types with this model. For example, a road might have a line geometry representing its centreline, which could be used for route tracing applications, and an area geometry representing its extent, which might be used in cadastral applications. This approach also facilitates generalisation, by allowing multiple geometric representations of an object which can be used at different scales.

In general, an object-centred data model provides a number of advantages over a geometry-centred model. However, as with the previous case, this use of the word object has nothing to do with object-orientation. At the risk of confusing the issue, it is possible to have an object-oriented system which is either geometry-centred or object-centred as defined in this section (real-world-object-centred might be a more accurate, if rather long, description for the latter data modelling approach). These two approaches are just different ways of classifying the objects within the system.
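The contrast between the two data models can be sketched in code. The following is a minimal illustration in Python rather than Magik, and all of the class and field names (Line, Road, feature_type, extent and so on) are invented for this sketch; it is not how any particular GIS stores its data.

```python
from dataclasses import dataclass, field


# Geometry-centred: the primary classification is the geometric type,
# and the real-world meaning is a sub-classification of that type.
@dataclass
class Line:
    feature_type: str                      # e.g. "road", "river", "gas pipe"
    points: list
    attributes: dict = field(default_factory=dict)


# Object-centred: the primary classification is the real-world object,
# which may own several geometries of different types.
@dataclass
class Road:
    name: str
    centre_line: list                      # line geometry, e.g. for route tracing
    extent: list                           # area geometry, e.g. for cadastral use


geometry_centred = Line("road", [(0, 0), (100, 0)], {"name": "High Street"})

object_centred = Road(
    name="High Street",
    centre_line=[(0, 0), (100, 0)],
    extent=[(0, -5), (100, -5), (100, 5), (0, 5)],
)
```

In the second form the road is a single object that happens to have two geometries, whereas in the first form a second geometry for the same road would have to be a separate, unrelated record.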

Object-oriented User Interfaces

The term object-oriented interface is a somewhat nebulous one. Some people use it to describe any graphical user interface which makes use of windows, icons, etc., such as Microsoft Windows or X-windows. However, almost all GIS use a standard windowing system, so this is hardly a distinguishing factor in comparing different systems.

Some people use the term object-oriented interface in more specific ways, for example to describe the sort of interface used by systems such as the Macintosh, where the general approach is to select an object first and then choose an action to be carried out upon it, rather than choosing an action first and then an object. However, there is no general agreement on a precise definition of an object-oriented interface.

It is also true to say that, while user interfaces are obviously important in GIS, they are not really a major consideration in terms of the use of object-orientation in GIS, so we will not consider them any further here.

Object-oriented Programming

As stated earlier, understanding object-oriented programming is really the key to understanding definitions of object-orientation in relation to other areas such as databases, so it is this area that this paper will focus on.

One of the main challenges in explaining object-oriented programming is to find examples which are detailed enough to show how it can give significant benefits in practice, without being too long and difficult to understand. This section attempts to provide some such examples, based on real GIS applications.

The language used in the following examples is Smallworld Magik. This paper will just explain the minimum amount about the language syntax which is necessary to understand the examples, since the aim is to explain the important concepts of object-oriented programming in general, rather than any specific language. For a more detailed introduction to the Magik language, see (2).

First we will introduce the basic ideas of object-oriented programming: objects, classes, messages and methods. We will then look in turn at the concepts of encapsulation, polymorphism and inheritance, which are defined by most authors to be the key things which characterise an object-oriented programming language.

Objects, Classes, Methods and Messages

Somewhat predictably, the idea of an object is central to object-oriented programming. An object is an item of data, very much like a variable (or constant) in a conventional programming language. Every object belongs to an object class, which is analogous to a data type in a conventional language. So for example, the number 1 is an object belonging to the class integer, and the letter x is an object belonging to the class character. These basic classes are defined as part of the system, as are slightly more complicated classes analogous to other data types in conventional languages, such as arrays.

However, one of the most important things about object classes is that new classes can be defined by the programmer, based on existing classes. For example, we could define a coordinate class, specifying that each coordinate has an x component and a y component, each of which is a floating point number. This is similar to defining a structure in a conventional programming language such as C. We can access the components of an object as shown in the following example, which creates a new coordinate object with an x coordinate of 100 and a y coordinate of 200, and then prints out the x and y coordinates separately:

c << coordinate.new(100, 200)
write("x = ", c.x)
write("y = ", c.y)

This would produce the output:

x = 100
y = 200

The first line creates a new coordinate object and stores this in a variable called c (<< is the Magik assignment operator, like = in C or := in Pascal). In the second line, the expression c.x sends the message x to the object c, and this returns the value 100 (we will look further at messages in a moment). Components of an object, like x and y in this example, are known as slots in Magik or instance variables in Smalltalk.

As an aside, in Magik a slot does not have a fixed type - we could store a character string like "Hello" in the x component of a coordinate if we wanted to (although this might not be a good idea in this case, we will look at examples later where this capability is very useful). In some other languages like C++, the type (or class) of a slot has to be declared in advance and cannot be changed (this is known as static typing, and is often loosely called strong typing).

Now we will look at messages and methods. So far everything we have discussed relating to objects has an equivalent in conventional languages, like C, which support the definition of composite data types or structures. However, a key difference in an object-oriented programming language is that an object class not only defines the data stored in objects of that class (as we have just briefly discussed), but it also defines all the functions which can operate on objects of that class. These functions are known as methods in an object-oriented system. Data in an object can only be accessed via methods defined on its object class. The significance of this will be discussed in the next section on encapsulation. A method is invoked on an object by sending a message to that object, which causes a method of the same name to be invoked. The distinction between messages and methods can be confusing at first, but the same message could be sent to objects of different classes and result in different methods being executed, because the method was defined differently on each class. This will be discussed further in the section on polymorphism.

To finish this section, we will look at a few examples of invoking methods on objects by sending messages to them. The Magik syntax for sending a message to an object is of the form

object_name.message_name

Many methods will return a value (more accurately, they return an object). For example, suppose we had an object called a_road. The following example shows several Magik expressions in the left hand column and the object which is returned on the right:

a_road.name                  "High Street" (a character string object)

a_road.centre_line           A chain object (a chain is a basic geometric
                             object in the Smallworld GIS)

a_road.centre_line.length    255.0 (the length of the road centre-line in
                             metres - this example shows how we can send
                             another message to an object which is returned
                             from another method . . . expressions like this
                             are evaluated from left to right).

As well as simply returning objects, as shown so far, methods can change data or cause other actions. For example, the message draw() will invoke a method which draws an object on all current windows in the GIS:

a_road.draw()

Parameters can be passed to a method - for example, the method draw_on() will draw an object on a specified window:

a_road.draw_on(a_window)

Some methods create or change objects. In our first example we saw the method new(), which creates a new object:

c << coordinate.new(100, 200)

There is a special message syntax for assigning data to slots, as in the following example:

c.y << 300

This sets the y component of the coordinate c to 300.
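For readers more familiar with mainstream languages, the coordinate examples in this section can be approximated in Python. This is only a rough analogue and not Magik: the class name Coordinate is invented for the sketch, and ordinary Python attribute access stands in for sending the messages x and y to the object.

```python
class Coordinate:
    """Hypothetical Python stand-in for the coordinate class."""

    def __init__(self, x, y):
        # x and y play the role of the slots described in the text
        self.x = x
        self.y = y


c = Coordinate(100, 200)   # compare: c << coordinate.new(100, 200)
print("x =", c.x)
print("y =", c.y)

c.y = 300                  # compare: c.y << 300
```

As with Magik slots, Python attributes have no fixed type, so nothing prevents a later assignment such as c.x = "Hello".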

Encapsulation

We mentioned in passing in the previous section that the only way that data within an object (i.e. data in a slot) can be accessed or changed is via methods defined on that object's class. This is known as encapsulation, and we will consider its significance in this section. The most important thing about encapsulation is that it provides a well-defined and strictly enforced external interface to an object. This makes it possible to change the internal implementation of an object without affecting any of the other code which uses the object. This is a great advantage when building large and complex systems. We will look at some examples of how encapsulation could be used.

First consider the coordinate example we have already looked at. Our coordinate object class has two slots, and we have methods x and y which allow us to access these slots, and methods x<< and y<< which allow us to directly assign values to those slots. For some operations it may be more convenient to work with coordinates expressed in terms of a polar coordinate system, as a radius and angle. We could define two new methods on the coordinate object class, called radius and angle, as follows:

method coordinate.radius
  return sqrt(x*x + y*y)
endmethod

method coordinate.angle
  return atan2(y, x)
endmethod

Now executing the following . . .

c << coordinate.new(3, 4)
write("x = ", c.x, " y = ", c.y,
      " radius = ", c.radius, " angle = ", c.angle)

Would produce this output . . .

x = 3 y = 4 radius = 5 angle = 0.9273

Notice that there is no visible difference between the methods which directly access data in the slots (x and y) and the methods which access derived data (radius and angle). At the moment we have no way of setting the radius or angle directly though, as we have not defined methods to do this. However, we could do this as follows:

method coordinate.radius << new_radius
  # self.angle tells this object to send the message angle to itself
  current_angle << self.angle
  x << new_radius * cos(current_angle)
  y << new_radius * sin(current_angle)
endmethod

method coordinate.angle << new_angle
  current_radius << self.radius
  x << current_radius * cos(new_angle)
  y << current_radius * sin(new_angle)
endmethod

We can now change and access radius and angle just as though they were slots. If we now discovered that our application was using the polar form of the coordinates much more than the cartesian form, we could redefine our coordinate object to have slots called radius and angle instead of x and y, for efficiency, and define appropriate methods x, y, x<< and y<< so that all the methods which were previously defined were still available. Any existing programs using coordinates would run without any change, even though the underlying implementation of coordinate has completely changed. Note that slot access and update methods need not exist for all slots in an object, so slots can be hidden from the external programming interface for an object.

Encapsulation is a technique which can actually be used in non-object-oriented languages, but it is usually not enforced by the language itself. For example, one could define a coordinate data structure in a language like C, and define functions called set_x, set_y, get_x and get_y, which were analogous to the slot access methods we have described. However, we are entirely reliant on the discipline of the programmers who use this data structure: whenever they access or update a coordinate, they must use the specially provided access functions for doing so, rather than accessing the underlying data structure directly. An object-oriented system strictly enforces this principle of encapsulation.
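The change of internal representation described in this section can be sketched in Python. This is an illustrative analogue rather than Magik: the class below is invented for the sketch, it stores the polar form internally, and it still answers x, y, radius and angle through the same external interface. Note one difference from the strict enforcement described above: Python hides the _radius and _angle attributes only by naming convention.

```python
import math


class Coordinate:
    """Stores radius and angle internally, but still answers x and y."""

    def __init__(self, x, y):
        self._radius = math.hypot(x, y)
        self._angle = math.atan2(y, x)

    @property
    def x(self):
        return self._radius * math.cos(self._angle)

    @x.setter
    def x(self, new_x):
        y = self.y                      # read y before the state changes
        self._radius = math.hypot(new_x, y)
        self._angle = math.atan2(y, new_x)

    @property
    def y(self):
        return self._radius * math.sin(self._angle)

    @y.setter
    def y(self, new_y):
        x = self.x                      # read x before the state changes
        self._radius = math.hypot(x, new_y)
        self._angle = math.atan2(new_y, x)

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, new_radius):
        self._radius = new_radius

    @property
    def angle(self):
        return self._angle

    @angle.setter
    def angle(self, new_angle):
        self._angle = new_angle


c = Coordinate(3, 4)
c.radius = 10    # the direction is kept, so x and y scale to about 6 and 8
```

Code that only ever uses c.x, c.y, c.radius and c.angle cannot tell which pair of values is actually stored, which is the point the section makes about changing an implementation without affecting its callers.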

Polymorphism

Polymorphism is the ability for the same variable to refer at different times to different classes of object. We have found this particularly useful in GIS applications, where there is often a requirement to handle heterogeneous groups of objects. We can send a message to an object without knowing its class, and the appropriate method for that class will be invoked on the object.

We will consider as an example a function to carry out Quality Assurance (QA) on electrical network data which has just been captured. We have a set of rules such as the following, for each object class which is relevant:

  • All low voltage (LV) joints must have at least 1, and no more than 4, cables connected to them.
  • All pole mounted transformers must have at least 1, and no more than 2, lines or cables connected to each of the low voltage and high voltage connections.

If we find an object which does not satisfy all the specified rules, we want to tell the user, highlight the object, and change the currently displayed area so that the object is in the centre of the screen. The way in which we will implement this function is to define a method called valid? on each relevant object class, which returns true or false depending on whether the object satisfies all the rules or not.

At capture time we run interactive checks to ensure that the only object which can be connected to an LV joint is an LV cable, so all we need to check at this stage is the number of objects which are connected to the LV joint. In this particular data model, an LV joint has a single point geometry called location. Thus our validation method can be written as follows:

method lv_joint.valid?
  num_cables << self.location.all_connected_geometry.size
  if num_cables < 1 or num_cables > 4 then
    return false
  else
    return true
  endif
endmethod

In the first line, self.location returns the point geometry associated with this lv_joint object. We then send this point object the message all_connected_geometry, which returns a set containing all the geometries which are connected to that object. In this case we do not wish to look at the individual items in this set, we just want to know the size of the set, so we just send the set the message size. All these methods are already defined in the standard class libraries (i.e. class definitions and methods) which are provided with the system. This last object which is returned (the size of the set, i.e. the number of cables connected to this joint) is assigned to the variable num_cables. We then do a simple test on the value of num_cables to check whether this is valid or not, and return true or false accordingly.

We can define a similar method for pole mounted transformers as follows. This is slightly more complicated since this object has two point geometries, called lv_connection and hv_connection. These represent the distinct connection points for low voltage and high voltage cables or lines belonging to this transformer. Again we validate interactively that only LV cables and LV lines can be connected to the lv_connection, and that only HV cables and HV lines can be connected to the hv_connection, so we just need to check the number of objects connected to each of these geometries.

method pm_transformer.valid?
  num_lv_conns << self.lv_connection.all_connected_geometry.size
  num_hv_conns << self.hv_connection.all_connected_geometry.size
  if min(num_lv_conns, num_hv_conns) < 1 or
     max(num_lv_conns, num_hv_conns) > 2 then
    return false
  else
    return true
  endif
endmethod

This method is very similar in principle to the last one, except that in this case we have to check the connections to each of the two geometries belonging to the object.

We can now define our QA validation function as follows:

method qa_menu.validate_objects()
  for an_object over grs.objects_inside_area(current_qa_area)
  loop
    if not an_object.valid? then
      grs.current_object << an_object
      an_object.goto()
      grs.show_message("Invalid object found")
      return
    endif
  endloop
  grs.show_message("QA completed successfully")
endmethod

This is the complete code for this application. We define the validation function as a method on an object called qa_menu, which is a menu we have created from which the user will initiate the QA function by pressing a button. The qa_menu object has a couple of slots which are referred to in this method. The first is called grs, which is the graphics system we are currently running - this is quite a complex object which is essentially the whole GIS application, which has slots referring to the current database, all the menus displayed, the currently selected object, etc. There is also a slot which stores the QA area we are currently working within. We just want to check objects inside this area, so we send the message objects_inside_area() to the graphics system. This is a special type of method called an iterator method - it returns objects one at a time to the loop which follows. Inside the loop, we send the message valid? to each object which is found inside the area. This is where polymorphism is important - even though we don't know the type of object which has been returned (we could find out, but we don't need to), we can send it the same message, valid?, and the appropriate method called valid? gets invoked depending on the class of the object. This example should clarify the difference between a message and a method. We have defined two distinct methods called valid?, one on the class lv_joint and one on the class pm_transformer. However, we can send exactly the same message called valid? to an object of either class and the appropriate method will be invoked.

If the object fails the validation test then we make it the current object in the graphics system, which causes its geometry to be highlighted and its object class and attributes to be displayed. We then send the object the message goto(), which causes this object to be displayed in the current graphics view, and finally we display an alert message to the user and exit from the loop (and the method) with a return statement.

The great beauty of this approach is that we can add a new object class to our application, define a method called valid? for it, and the validation code will work immediately without requiring any changes. You don't even need to compile or link anything. In contrast, with a conventional procedural language it would be very hard to write an equivalent QA function which could be extended to accommodate new object classes and rules without having to modify the source code of the validation routine itself. Allowing customers to directly modify product source code is highly undesirable for a software vendor (and indeed for the customer, as it makes support and problem resolution much more difficult), so it is much easier to produce systems which can be easily and cleanly extended in an object-oriented environment like the one we are discussing.
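The extension property described above can be sketched in Python. This is an analogue and not the Magik application: the class names, the is_valid method and the simplified connection counts are stand-ins for the paper's examples, and the Pole class with its height rule is purely hypothetical, added only to show a class being introduced later.

```python
class LvJoint:
    def __init__(self, num_cables):
        self.num_cables = num_cables

    def is_valid(self):
        # At least 1 and no more than 4 cables connected
        return 1 <= self.num_cables <= 4


class PmTransformer:
    def __init__(self, num_lv_conns, num_hv_conns):
        self.num_lv_conns = num_lv_conns
        self.num_hv_conns = num_hv_conns

    def is_valid(self):
        # At least 1 and no more than 2 connections on each side
        return all(1 <= n <= 2 for n in (self.num_lv_conns, self.num_hv_conns))


def first_invalid(objects):
    """Return the first object that fails its own rules, or None."""
    for an_object in objects:
        # The same message goes to every object; its class picks the method.
        if not an_object.is_valid():
            return an_object
    return None


# A class added later (hypothetical rule) needs only its own is_valid();
# first_invalid() is not modified.
class Pole:
    def __init__(self, height_m):
        self.height_m = height_m

    def is_valid(self):
        return self.height_m > 0


bad_joint = LvJoint(5)
network = [LvJoint(2), PmTransformer(1, 2), Pole(9.5), bad_joint]
result = first_invalid(network)   # result is bad_joint
```

The loop never inspects the class of an object, so supporting a further class is a matter of defining one more is_valid method.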

As another aside, one important point which has been touched on in passing is that the Magik programming environment is interactive. One can be running the GIS, modify a validation method like those above while the system is running, and immediately test the effects of the change without having to compile or link anything. The same is true of Smalltalk, but not of C++, which requires you to compile, link, and re-run your application before you can test the change. Having an interactive programming environment makes a huge difference to development productivity.

Inheritance

The third main area which characterises object-oriented programming is inheritance. Inheritance allows new object classes to be defined in terms of existing object classes, inheriting both data structure (i.e. definition of slots) and behaviour (definition of methods) from the defining parent class or superclass. A class which inherits from another class is said to be a subclass of its parent. It is possible to define additional slots and additional methods on a subclass. It is also possible to define a method in a subclass with the same name as a method in its parent class, and this new method will override the method from the parent class. We will look at examples of all these things shortly. In overview though, the inheritance mechanism provides a very powerful way of writing generic code which can be shared by many classes, whilst at the same time allowing any differences from this generic behaviour to be easily defined in subclasses. This results in much smaller amounts of code overall, which again greatly helps the reliability and maintainability of a system.

The value of inheritance is most apparent in quite complicated systems, so it is difficult to illustrate its full benefits in a short paper such as this. To illustrate the basic concept of inheritance though, we will return to our coordinate example. Suppose that for some applications we need to handle 3-D coordinates, and that for the most part these will be used in the same way as 2-D coordinates (displayed on 2-D maps etc), but that in some cases 3-D coordinates will have additional, or different, behaviour.

First we will define a few examples of behaviour on 2-D coordinates. We will assume that we have the access methods x and y which we used before.

We can define a method to measure the distance between two coordinates as follows:

method coordinate.distance_to(another_coordinate)   dx << self.x - another_coordinate.x   dy << self.y - another_coordinate.y   return sqrt(dx*dx + dy*dy) endmethod 

We could also define a method to check whether a coordinate was inside a bounding box (this is a horizontal rectangular area, often used for initial area comparisons in a GIS, which is defined by its bottom left corner (xmin, ymin) and its top right corner (xmax, ymax)):

method coordinate.inside?(a_bounding_box)
  bb << a_bounding_box
  if self.x >= bb.xmin and self.x <= bb.xmax and
     self.y >= bb.ymin and self.y <= bb.ymax
  then
    return true
  else
    return false
  endif
endmethod

There would obviously be a lot more methods defined on a coordinate in practice, but these will suffice for this example. A simple example of creating and using some coordinates and related objects is as follows:

# Create a coordinate
c1 << coordinate.new(5, 5)

# Create another coordinate
c2 << coordinate.new(5, 15)

# Create a bounding box
bb << bounding_box.new(0, 0, 10, 10)

# Check if c1 is inside the box
write(c1.inside?(bb))

# Check if c2 is inside the box
write(c2.inside?(bb))

# Calculate the distance from c1 to c2
write(c1.distance_to(c2))

This would produce the following output:

True
False
10

We can now define a subclass of coordinate called 3d_coordinate which inherits from coordinate and has an additional slot called z. This will immediately inherit all the methods we have defined on coordinate, so the operations we have defined above will still work in the same way, accessing the x and y coordinate of the 3d_coordinate and ignoring the z coordinate. We could define a new method to calculate the 3d distance between two 3d coordinates as follows:

method 3d_coordinate.3d_distance_to(another_3d_coordinate)
  dx << self.x - another_3d_coordinate.x
  dy << self.y - another_3d_coordinate.y
  dz << self.z - another_3d_coordinate.z
  return sqrt(dx*dx + dy*dy + dz*dz)
endmethod

In this way we can easily extend the behaviour of existing classes. We can also modify the behaviour of a subclass relative to its parent by overriding methods. We will look at a different example to illustrate this. Magik allows multiple inheritance, i.e. inheritance from more than one parent class. It is possible to define special classes called mixins, which do not have any slots but are just used to define behaviour (methods) which can be inherited by other classes.
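The coordinate subclassing described above can be approximated in Python. This is a sketch under stated assumptions, not Magik: the class and method names are invented for the illustration, and the bounding box is passed as four plain numbers rather than as an object.

```python
import math


class Coordinate:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_to(self, other):
        return math.hypot(self.x - other.x, self.y - other.y)

    def is_inside(self, xmin, ymin, xmax, ymax):
        return xmin <= self.x <= xmax and ymin <= self.y <= ymax


class Coordinate3D(Coordinate):
    """Inherits the x and y slots and all 2-D behaviour, and adds z."""

    def __init__(self, x, y, z):
        super().__init__(x, y)
        self.z = z

    def distance_3d_to(self, other):
        dx = self.x - other.x
        dy = self.y - other.y
        dz = self.z - other.z
        return math.sqrt(dx * dx + dy * dy + dz * dz)


p = Coordinate3D(0, 0, 0)
q = Coordinate3D(3, 4, 12)

print(p.distance_to(q))            # inherited 2-D method, ignores z: 5.0
print(p.distance_3d_to(q))         # new 3-D method: 13.0
print(q.is_inside(0, 0, 10, 10))   # inherited bounding box test: True
```

Nothing in Coordinate had to change to support the subclass, and any code that works with 2-D coordinates accepts a Coordinate3D unchanged.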

The example we will consider is a data conversion application. We will look at defining methods which specify how objects are interactively created. Within Smallworld GIS, there is standard functionality provided to allow the user to create and manipulate an object called a trail, which is just a general piece of geometry. The trail is a multi-point line, and functions are provided to add points to the trail, move and delete them, generate points by raster line following, etc. The geometry in the trail is used to define the geometry of objects which are added to the system, such as cables or poles. Point objects can be defined either with a single point trail, for an object with fixed orientation, or with a two point trail for an object with variable orientation, where the first point defines the centre of the object and the direction from the first to the second point defines the orientation of the object. This is illustrated in the following diagram:

[Figure not available at this time]

There are various types of behaviour common to point objects in this data capture application, so we define a class called point_object, on which we will define behaviour common to point objects which can be inherited by application objects such as joints, poles and transformers.

We will define a general method for creating the main geometry of a point object from a trail which will cover both of the cases above. It turns out to be useful to allow a point object to be added at the end of a long trail in certain situations, for example when digitising linear objects such as cables. We will therefore specify that point objects without orientation will be added at the location of the last point in the trail with an orientation of zero, whilst point objects with orientation will be added at the last but one point in the trail, and the orientation of the last trail segment will define the orientation of the object. This is illustrated in the following diagram:

[Figure not available at this time]

In this application, when the user presses the insert button, a new object of the current type is created with no geometry, and then this object is sent the message create_geometry_from_trail(), so that the appropriate default geometry will be created from the current trail. Since we have two different sets of behaviour, point objects with and without orientation, we can define two new classes called point_object_with_orientation and point_object_without_orientation on which we can define the appropriate behaviour to create geometry from the trail. Both of these classes inherit from point_object, so that any behaviour which applies to every point object (with or without orientation) can be defined on the point_object class, and it will automatically be inherited by these two subclasses.

We now define our methods as follows:

method point_object_without_orientation.create_geometry_from_trail(grs)
  new_point << point.new_at(grs.trail.coords.last)
  self.default_geometry << new_point
endmethod

method point_object_with_orientation.create_geometry_from_trail(grs)
  trail << grs.trail
  if trail.size > 1 then
    new_point << point.new_at(trail.coords[trail.size - 1])
    new_point.orientation << trail.segment_angle
  else
    new_point << point.new_at(trail.coords.last)
  endif
  self.default_geometry << new_point
endmethod

The first method creates a new point at the last coordinate in the trail. This is done by sending the graphics system object, grs, the message trail which returns a trail object. This in turn is sent the message coords, which returns a vector (array) of coordinates, and this is sent the message last, which returns the last element of any ordered collection. So we now have a coordinate, and we create a new point at this coordinate (a point has more information than a coordinate, such as an orientation, and information on other geometries which are connected to that point). No orientation is specified for the point here, since the default orientation is zero, which is what we want in this case. We then assign the default geometry of the new object to the point we have created. This assignment causes user-definable rules to be invoked to connect this geometry to other specified geometries within a given tolerance, as appropriate.

The second method is similar, but in this case we define the location of the point to be at the last but one point of the trail, provided that the trail has more than one point. To do this we use the indexing method [n], which accesses the nth element of any ordered collection. We also assign an orientation to the point, which we obtain by sending the trail the standard message segment_angle, which returns the angle of the last segment in the trail. If there is only one point in the trail, we create the new point in the same way as for a point object without orientation.

When we define application point objects like joints, poles and transformers, each of them will inherit either from point_object_with_orientation or point_object_without_orientation. We could also define a new create_geometry_from_trail() method on any of these specific objects if we wished its behaviour to be different in terms of how its geometry was created from the trail. For example, we might wish to regard a substation as a point object with orientation, since like the other point objects we have considered it is a valid end point for a cable, so it shares behaviour in this respect. However, we wish to represent the primary geometry of a substation as a rectangular area geometry, of a size which depends on the voltage level of the substation.

We would like to define the location of the substation by placing a point at its bottom left corner and placing a second point to indicate its angle. This could be done with the following method:

method substation.create_geometry_from_trail(grs)
  trail << grs.trail

  # Define the bottom left corner and the angle
  # from the trail
  if trail.size > 1 then
    base_coord << trail.coords[trail.size-1]
    orientation << trail.segment_angle
  else
    base_coord << trail.coords.last
    orientation << 0
  endif

  # Set the substation size (in mm) depending
  # on the voltage
  if self.voltage = "LV" then
    xsize << 5000
    ysize << 3000
  else
    xsize << 12000
    ysize << 8000
  endif

  # Now create the relevant area geometry
  new_area << area.new_rectangle(base_coord, xsize, ysize, orientation)
  self.default_geometry << new_area
endmethod

So now our substation object has all the behaviour of a point object with orientation, except for the way in which its geometry is created from the trail. It can be seen that in this way inheritance gives us a very powerful technique for sharing code between object classes - we only need to write additional code for a new object class where its behaviour differs from its parent class. This example also illustrates the flexibility of a "real world object centred" data model rather than a "geometry centred" data model: we can define a range of objects to be regarded as "point objects" for the purposes of this application, even though they have different geometry types.
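The whole hierarchy described above can be sketched in Python to show the same message producing different geometry depending on the class of the receiver. This is a rough illustration under several assumptions: the Trail class, the tuple representation of geometry, the is_valid_cable_end method and the choice of parent class for Pole are all hypothetical, the trail is passed in directly rather than obtained from a graphics system object, and angles are in radians. Python sequences are indexed from zero, so the last but one point is written coords[-2].

```python
import math


class Trail:
    """Hypothetical stand-in for the trail: an ordered list of (x, y) points."""

    def __init__(self, coords):
        self.coords = list(coords)

    def segment_angle(self):
        # Angle of the last segment in radians; zero if there is no segment.
        if len(self.coords) < 2:
            return 0.0
        (x1, y1), (x2, y2) = self.coords[-2], self.coords[-1]
        return math.atan2(y2 - y1, x2 - x1)


class PointObject:
    """Behaviour shared by every point object."""

    default_geometry = None

    def is_valid_cable_end(self):
        return True


class PointObjectWithoutOrientation(PointObject):
    def create_geometry_from_trail(self, trail):
        # Last point of the trail, orientation left at zero.
        self.default_geometry = ("point", trail.coords[-1], 0.0)


class PointObjectWithOrientation(PointObject):
    def create_geometry_from_trail(self, trail):
        if len(trail.coords) > 1:
            # Last but one point, oriented along the last segment.
            self.default_geometry = ("point", trail.coords[-2], trail.segment_angle())
        else:
            self.default_geometry = ("point", trail.coords[-1], 0.0)


class Pole(PointObjectWithoutOrientation):
    """Inherits everything; no code of its own is needed."""


class Substation(PointObjectWithOrientation):
    def __init__(self, voltage):
        self.voltage = voltage

    def create_geometry_from_trail(self, trail):
        # Override: same placement rule, but an area geometry is produced.
        if len(trail.coords) > 1:
            base, angle = trail.coords[-2], trail.segment_angle()
        else:
            base, angle = trail.coords[-1], 0.0
        xsize, ysize = (5000, 3000) if self.voltage == "LV" else (12000, 8000)
        self.default_geometry = ("rectangle", base, xsize, ysize, angle)


trail = Trail([(0, 0), (10, 0), (10, 5)])
for obj in (Pole(), Substation("LV")):
    # The same message is sent; each class supplies its own behaviour.
    obj.create_geometry_from_trail(trail)
    print(type(obj).__name__, obj.default_geometry, obj.is_valid_cable_end())
```

Substation writes only the one method in which it differs from its parent; is_valid_cable_end is inherited unchanged from PointObject.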

Object-oriented Databases

Whilst there is a reasonable degree of agreement as to what constitutes object-oriented programming, as described in the previous section, there is less agreement as to what constitutes an object-oriented database. Some people, including some well known figures in the GIS industry, seem to use the term for any database which can store "blobs" (binary large objects, such as images), in addition to traditional data types such as numbers and character strings.

An alternative definition is that it is a system which provides a persistent store for objects in an object-oriented programming environment, so that objects continue to exist when a program finishes running. To be regarded as a proper database management system (DBMS), such a system should also support multi-user access to the data, and handle the associated issues of concurrent update, and also provide other standard database functions such as security, backup and recovery. The interface to objects in such an object-oriented database, from a programming point of view, is usually exactly the same as the interface to non-persistent objects.

The advantages of an object-oriented DBMS are essentially an extension of those for object-oriented programming: with such a DBMS it is possible to use all the same data modelling techniques on objects which need to be stored in the database.

However, whilst there is general agreement in the computer industry that object-oriented programming is a good thing, database experts seem to be divided over the virtues of object-oriented databases. There is a well-developed theory behind relational databases, a lot of experience has been gained with them, and there are established standards such as SQL. No such formal theory has been developed for object-oriented databases. There are still some unresolved issues with object-oriented databases, such as providing a general query language and optimising queries. There is therefore a school of thought which says that rather than regarding object-oriented databases as something completely independent of current database technology, relational database systems should be extended to accommodate object-oriented ideas, and to allow them to be used within an object-oriented programming environment.

Smallworld has taken an approach which uses a version managed relational database management system, with an object-oriented programming interface added to it. Accessing objects in the database is very similar to accessing non-database objects, but with a couple of restrictions. The first restriction is that slots in database objects have to have a fixed type (class) which is declared in advance, as with any relational database. This is in contrast to non-database objects in Magik, whose slots can be used to store any object of any class. The second restriction is that in the current version of the system, behavioural inheritance (inheritance of methods) is supported on database objects, but structural inheritance (inheritance of slots) is not. However, exploratory work has been done on structural inheritance and it is planned to support this in a future release of the product.
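The first restriction can be illustrated with a small Python sketch that contrasts a slot whose class is declared in advance with an ordinary, untyped slot. This is not how Smallworld implements it; the TypedSlot descriptor and the PipeRecord and Scratch classes are hypothetical names introduced only to show the difference in behaviour.

```python
class TypedSlot:
    """Hypothetical descriptor: a slot whose class is declared in advance."""

    def __init__(self, slot_type):
        self.slot_type = slot_type

    def __set_name__(self, owner, name):
        # Store the value under a private attribute on the instance.
        self.name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.name, None)

    def __set__(self, obj, value):
        if not isinstance(value, self.slot_type):
            raise TypeError(
                f"slot {self.name[1:]} expects {self.slot_type.__name__}, "
                f"got {type(value).__name__}"
            )
        setattr(obj, self.name, value)


class PipeRecord:
    """Stands in for a database object: slot types are fixed."""

    length = TypedSlot(float)
    material = TypedSlot(str)


class Scratch:
    """Stands in for a non-database object: slots accept anything."""


pipe = PipeRecord()
pipe.length = 12.5
try:
    pipe.length = "twelve"
except TypeError as err:
    print("rejected:", err)

scratch = Scratch()
scratch.value = 12.5
scratch.value = "twelve"  # accepted: no declared type
```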

A database table is regarded as a collection in Magik. A collection is a general class which stores a group of objects. There are many standard subclasses of collection which are provided with the system, such as sets, arrays, ordered collections, etc. Database tables form a class called ds_collection (datastore collection). There are various standard methods which apply to all collections, for example size, which returns the number of elements in a collection, and elements(), which is an iterator method which returns all the elements of the collection in turn.

The following example gives a brief flavour of how database objects can be accessed. Suppose that we have a cost attribute in pipes, which we wish to set in all pipes based on other attributes in the pipe (in reality we would probably set this interactively, triggered by any change in the pipe, but this sort of batch update is a reasonable illustration of access to the database).

for p over pipe_table.elements() loop
  material_cost << material_table.at(p.material).unit_cost
  p.cost << p.length * material_cost
endloop

In this example we loop over each of the objects (records) in the pipe table. For each one we obtain the material cost by looking in another table. The method at() returns the database object at a specific primary key value in a table, which is a very efficient means of accessing a record. We can also access records using generic SQL-like predicates. The record object returned has slots like any other object, so we access the unit_cost slot of the material record returned. We then assign the cost attribute of the current pipe database record to the pipe length multiplied by the material cost. Since this is a database object, this value is automatically stored in the database.
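The access pattern described here (a collection offering size, an elements() iterator and an at() lookup by primary key) can be imitated in Python with an in-memory table. This is a sketch under stated assumptions: the Table, Material and Pipe classes and the sample values are hypothetical, and the automatic write-through to the database is not modelled, so the assignment to cost only changes the object in memory.

```python
from dataclasses import dataclass


class Table:
    """Hypothetical in-memory stand-in for a ds_collection, keyed by primary key."""

    def __init__(self, key_field, records):
        self._rows = {getattr(r, key_field): r for r in records}

    @property
    def size(self):
        return len(self._rows)

    def elements(self):
        # Iterator yielding each record in turn.
        yield from self._rows.values()

    def at(self, key):
        # Direct lookup by primary key.
        return self._rows[key]


@dataclass
class Material:
    name: str
    unit_cost: float


@dataclass
class Pipe:
    id: int
    material: str
    length: float
    cost: float = 0.0


material_table = Table("name", [Material("PVC", 2.0), Material("steel", 5.0)])
pipe_table = Table("id", [Pipe(1, "PVC", 10.0), Pipe(2, "steel", 4.0)])

for p in pipe_table.elements():
    material_cost = material_table.at(p.material).unit_cost
    p.cost = p.length * material_cost

print([(p.id, p.cost) for p in pipe_table.elements()])  # [(1, 20.0), (2, 20.0)]
```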

Summary

We have discussed a number of uses of the term object-oriented in relation to GIS. The following is a summary set of questions which you should ask of anyone who calls their system object-oriented in order to clarify what they mean:

  1. Does it store objects as a fundamental unit in the database, with no need to split objects across tile boundaries or partitions? This is what we called an object-based system: we would not call such a system object-oriented.
  2. Does it have a "real world object centred" data model rather than a "geometry centred" model, as described above? The answer to this question has no bearing on whether or not a system is object-oriented.
  3. Does it provide an object-oriented programming environment which supports the following:
    a) Encapsulation
    b) Polymorphism
    c) Inheritance
  4. Does it provide a set of standard class libraries which can be extended by the customer?
  5. Does it provide a database system which supports each of the previous concepts?

We will not be pedantic about trying to specify a precise set of answers to these questions which mean that a system is object-oriented or not, since this seems a rather pointless exercise.

Conclusion: So What?

Even if you can obtain answers to these questions, what does all this mean? To an end user of the system, it really makes little direct difference. When sitting in front of a system, you cannot tell whether or not it is object-oriented. The primary benefits of object-orientation are in ease of customisation and maintenance of the system, so the person who really sees the benefits of object-orientation is the application developer. In turn, this of course benefits the end user, who can expect to see applications delivered, and bugs fixed, much more quickly in an object-oriented system. It is Smallworld's experience after several years' work developing a GIS using an object-oriented programming environment that this approach is significantly more productive than traditional approaches to development.

References

1. Richard G. Newell. Practical experiences of using object-orientation to implement a GIS, Proceedings of GIS/LIS 92.

2. Arthur Chance, Richard G. Newell and David G. Theriault. An Overview of Smallworld Magik, Smallworld Technical Paper No. 9.
