Brown Bag: Geoprocessing Semantics

In today’s brown bag we tried to pin down what exactly we mean by geoprocessing. The discussion originated in a paper we are currently writing for next year’s AGILE conference.

Geoprocessing is sometimes simply explained as the transformation of geospatial data using a common desktop GIS. This view limits geoprocesses to basic operators like applying a buffer, clipping data, merging or intersecting layers, or converting between different formats. We refer to such operations as geoprocessing primitives, since they can’t be decomposed into further subprocesses. In this sense, even sophisticated geoprocesses like an unsupervised classification of a satellite image or a Kriging interpolation of temperature values are primitives.

This paragraph raised some discussion. Is it really clever to distinguish between primitives and composites? In the end every computer process is composed of other processes, until we arrive at basic arithmetic, which we can also call the atomic functions. An interpolation as given in the example is not a primitive. Inverse Distance Weighting involves distance calculation, which itself is a primitive (and is in turn composed of multiplication, division, subtraction, and addition). Still, the distinction is needed, and we argued that a primitive should probably not be defined by looking at the individual processes it is composed of. In this sense, it is perhaps better to identify primitives by trying to specify the dependency between input and output (using functional descriptions).
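To make the decomposition argument concrete, here is a minimal sketch of Inverse Distance Weighting in Python. The function names, the choice of Euclidean distance, and the default power of 2 are our assumptions for illustration, not something fixed in the paper. Internally `idw` is built from a distance function, which in turn is built from arithmetic, yet a caller only sees the dependency between input (samples, target) and output (an interpolated value).

```python
import math


def distance(p, q):
    """Euclidean distance between two 2D points (itself built from arithmetic)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def idw(samples, target, power=2.0):
    """Inverse Distance Weighting at `target`.

    `samples` is a list of ((x, y), value) pairs. The weighting exponent
    `power` is an assumed default, not a value taken from the discussion.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for point, value in samples:
        d = distance(point, target)
        if d == 0.0:
            # The target coincides with a sample: return its value directly
            # to avoid dividing by zero.
            return value
        weight = 1.0 / d ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

For example, `idw([((0, 0), 10.0), ((2, 0), 20.0)], (1, 0))` interpolates halfway between two equally distant samples. From the outside the function looks like one step, which is the sense in which a functional description can treat it as a primitive.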

Two Geoprocessing Profiles are identical if their implementations produce equal output. This may sound trivial for geoprocessing primitives such as a point-in-polygon test. It always returns true if the point is within the boundaries of the polygon. But computing the polygon’s center already depends on the implementation and can vary significantly. Geoprocessing Profiles can be equivalent to other profiles if they share the same purpose and semantics.

A Geoprocessing Profile describes common geoprocessing types. Identity of a geoprocess is an interesting issue. In the discussion we clarified this a bit more: the parameters (input and output) of a geoprocess can have real-world semantics, and the geoprocess itself has a purpose. If two geoprocesses have similar semantics and share the same purpose, we call them equivalent. If all implementations of either geoprocess also produce the same values, we call the two profiles identical. Two web services are equivalent if they both implement the same profile (but the implementations can vary). If two web services are identical, they are one (e.g. one algorithm deployed into two web services). Of course, the same algorithm may be implemented in different programming languages.
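The polygon-center case can be sketched in a few lines of Python. Both functions below share the purpose "compute the center of a polygon", but they are different implementations: one averages the vertices, the other computes the area-weighted centroid with the shoelace formula. All names here (`vertex_mean_center`, `area_centroid`, `agree_on_samples`) are hypothetical, and comparing outputs on sample inputs can only give evidence for identity, never a proof.

```python
import math


def vertex_mean_center(polygon):
    """Center as the plain average of the vertex coordinates."""
    n = len(polygon)
    return (sum(x for x, _ in polygon) / n, sum(y for _, y in polygon) / n)


def area_centroid(polygon):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for i in range(len(polygon)):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % len(polygon)]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # The centroid divides by 6 * area, which is 3 * (2 * area).
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


def agree_on_samples(impl_a, impl_b, samples, tol=1e-9):
    """True if both implementations return the same center for every sample."""
    return all(
        math.isclose(a, b, abs_tol=tol)
        for polygon in samples
        for a, b in zip(impl_a(polygon), impl_b(polygon))
    )


# A 2x2 square, and the same square with one extra vertex on its bottom edge.
SQUARE = [(0, 0), (2, 0), (2, 2), (0, 2)]
SQUARE_EXTRA_VERTEX = [(0, 0), (1, 0), (2, 0), (2, 2), (0, 2)]
```

On `SQUARE` alone the two implementations agree, but the extra vertex pulls the vertex average downwards while the area centroid stays put. In the terminology above, the two would be equivalent (same purpose and semantics) but not identical (different values).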

Galton explains the mutual dependence of geographic objects and processes. An object enacts a process, and objects are identified by the processes they are enacting. The process of flowing water depends on matter (the water) and potential energy (which is in Galton’s view matter as well). On the other hand, we understand a water body to be a river if there is water flowing down a river bed. This is also true for dried-out rivers which carry water only irregularly. The potential of water flow matters here. The streamflow model can be understood accordingly: the input data are, amongst others, about the objects enacting the geographic process of water flow.

Well, I expected that not everyone would agree with this. But we agreed on the following: A process has semantics if it simulates reality. Every process can be simulated in reality (arithmetic with apples in a basket), but only few processes simulate reality (a streamflow model simulates the flow of water). We already have problems describing (explaining identity, equivalence, hierarchy) and making sense of geographic processes. For computer processes it’s even worse. We therefore argue that a solution which proposes to describe and search for geoprocesses is doomed to fail. Galton’s mutual dependence of geographic objects and geographic processes can provide a way out. Just as we can describe geographic processes by identifying the geographic objects (basically everything that is not a process) which enact the process, we can describe computer processes by identifying the parameters consumed and produced by the process. If a geoprocess aims to simulate a geographic process, the input parameters have to represent all geographic objects enacting the geographic process. A user searching for a geoprocess does not have the geographic process in mind, but a geographic object enacting (or enacted by) the process. Hence, we should look for data semantically annotated with the geographic object, and then discover all geoprocesses which produce such data.
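The discovery idea in the last sentences can be sketched as a lookup over annotated parameters. This is only an illustration under our own assumptions: the class `GeoprocessDescription`, the function `processes_producing`, the catalog entries, and the concept labels are all hypothetical, and a real system would annotate parameters with concepts from an ontology rather than plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoprocessDescription:
    """A geoprocess described only by the annotations of its parameters."""
    name: str
    inputs: frozenset   # geographic objects represented by the input data
    outputs: frozenset  # geographic objects represented by the output data


def processes_producing(concept, catalog):
    """Names of all geoprocesses whose output is annotated with `concept`."""
    return [p.name for p in catalog if concept in p.outputs]


# Hypothetical catalog; names and annotations are made up for illustration.
CATALOG = [
    GeoprocessDescription(
        name="streamflow_model",
        inputs=frozenset({"river_bed", "precipitation"}),
        outputs=frozenset({"river"}),
    ),
    GeoprocessDescription(
        name="temperature_interpolation",
        inputs=frozenset({"weather_station"}),
        outputs=frozenset({"temperature_field"}),
    ),
]
```

A user interested in rivers would then call `processes_producing("river", CATALOG)` and never has to name or describe the process itself. The search goes through the geographic object and arrives at the geoprocess via its output.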

Posted Friday, November 6th, at 6:00 AM.