Spatial Data Integration: A Case Study of Map Conflation with Census Bureau and Local Government Data
 
 
 
 

Hoseok Kang

Civil and Environmental Engineering and Geodetic Science

The Ohio State University, Columbus, Ohio 43210, USA.

kang.77@osu.edu
 
 

University Consortium for Geographic Information Science (UCGIS)

Summer Assembly, June 2001





 
 
 
 

Abstract
 
 

As spatial data resources become more abundant, it becomes more important for local governments to be able to integrate those data effectively to obtain valuable information. The Census Bureau's TIGER map data sets play an important role for local governments because they anchor demographic data that can be used for statistical and spatial analysis and that will be used for the apportionment and allocation of federal funds. Many local governments now maintain their own large-scale spatial data in the form of countywide digital orthophotos, parcel maps for managing their properties, and road centerline maps. Local governments have realized that there are discrepancies among these spatial data sets, and that correcting them manually by overlaying the TIGER and local government data is tedious work. The data come from different sources, are at different scales, and have different positional accuracies. From the viewpoint of local governments, it is critical to examine Census Bureau map data and to have errors removed before the census enumeration is conducted. Otherwise, the errors may remain for an entire decade, until the spatial database is next updated for the following census. The objective of this paper is to develop and implement an efficient and practical map conflation procedure for local government and Census Bureau map data.
 
 

Many efforts have been made to improve the positional accuracy and shape fidelity of TIGER's linear and area features. This research employs conflation procedures, which integrate imagery and map data sets from different sources, to form a new positionally accurate and topologically consistent spatial data set. The data integration system is based on geometric principles rather than on matching attribute information. It is implemented in ArcView as an interactive cartographic system. The system consists of extensive geometric, topological, and statistical matching strategies for spatial objects; visual verification against countywide digital orthophotos for quality control and editing; piecewise linear homeomorphisms for rubber-sheeting; and linear feature matching.
 
 

With the help of this research tool, administrators in Delaware County, Ohio, successfully updated the county's Census 2000 Collection Blocks, corrected inaccurate addresses, and identified missing housing units and their locations. This research allows local governments to correlate their detailed in-house parcel data with demographic data at the block level, which permits intricate statistical, sociological, and spatial analyses of growth and change patterns. The success of these methods for combining spatial information should encourage more extensive use of map conflation in the future.
 
 
 



 
 
 

1. Introduction
 
 

This paper describes a map conflation algorithm as one strategy for integrating spatial data that come from multiple sources, are at different scales, and have different positional accuracies. The proposed algorithm does not try to provide a general model for spatial data integration, but it does attempt to provide a general model of map conflation for local governments. As spatial data resources become more abundant, it becomes more important to be able to relate them effectively to other valuable attribute files, such as census attribute data. To accomplish this, however, the geography of census tracts, block groups, or blocks must be tied to local and more accurate sources, typically digital photography, road centerline coverages, parcel boundaries, and so on.
 
 

Delaware County, Ohio, near the Columbus metropolitan area, is the fastest growing county in the state. The county's primary interest in conflating the census block boundaries to its GIS base is to generate Census 2000 Collection Blocks (from the block coverage) for every additional address submitted to the Census Bureau as part of the Local Update of Census Addresses (LUCA) program. The goal of the LUCA program is to give local governments an opportunity to review and correct the Census Bureau's Master Address File (MAF). A more accurate address file leads to a more accurate population count, so that the county can receive more federal and state funds. Second, map conflation allows the county to correlate block-level census data with parcel-level geography, so that the county and others can carry out intricate statistical, sociological, and spatial analyses of growth and change patterns.
 
 

There are multiple reasons for the large differences between the geography of the Census block boundaries and Delaware County's GIS base. The first reason is the scale of the original coverage. While the Census block boundaries were created from 1:100,000-scale paper maps, the digital orthophotos are captured at three scales, 1:1,200, 1:2,400, and 1:4,800, as are the road centerline and other county coverages used in the conflation process. Jagged effects cannot be avoided when a small-scale map is enlarged to a larger scale, because the map is only magnified and no detail is added. The second reason is map purpose. Because the purpose of the Census enumeration is to provide very accurate attribute information rather than accurate positional information, positional accuracy has not been emphasized in its mapping. From the viewpoint of local government, however, positional accuracy matters as much as attribute information, because local governments must manage property. The third reason is generalization. Because the real world is too complex for immediate and direct understanding and space on a paper map is limited, generalization must be applied when a paper map is created. Recovering the original feature from a generalized one without extra information is an ill-posed problem. This can be described as a conflict between analog-based and digital-based data.
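A short calculation makes the scale gap concrete. The sketch below assumes, purely for illustration, a plotting and digitizing precision of 0.5 mm on the map (an assumed value, not one reported for TIGER or the county data) and converts it to a ground distance at each of the scales named above.

```python
# Ground distance represented by a fixed map distance at each source scale.
# ASSUMPTION: 0.5 mm is an illustrative plotting/digitizing precision chosen
# for this example; it is not a figure reported for TIGER or the county data.
MAP_PRECISION_MM = 0.5

SCALE_DENOMINATORS = {
    "Census block source maps": 100_000,
    "County data, 1:1,200": 1_200,
    "County data, 1:2,400": 2_400,
    "County data, 1:4,800": 4_800,
}


def ground_distance_m(map_mm: float, scale_denominator: int) -> float:
    """Ground distance (metres) covered by map_mm millimetres on a 1:scale_denominator map."""
    return map_mm * scale_denominator / 1000.0


for label, denom in SCALE_DENOMINATORS.items():
    metres = ground_distance_m(MAP_PRECISION_MM, denom)
    print(f"{label:<26} {metres:6.1f} m")
```

Under this assumption the same 0.5 mm corresponds to 50 m on the ground at 1:100,000 but only 0.6 m to 2.4 m at the county scales, a difference of roughly 20 to 80 times.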
 
 

Many other factors may make map conflation complex and hard to define. Because map conflation is an ill-posed problem, as discussed above, it is extremely difficult to find a general model or a fully automated system for it. The Delaware County census block conflation stages are based on Alan Saalfeld's method [1], without its iteration. Linear feature matching after the rubber-sheet transformation is adopted to compensate for leaving out the iteration process. Additional background on this research can be found in [2].
 
 
 
 

 

2. Map Conflation Algorithm
 
 

Much research on map conflation has been published [1][3][4][5]. The first map conflation system was developed by the Census Bureau to transfer attributes from its own digital cartographic files to USGS linework files. In this prototype system, both inputs were data layers of the same feature category. The proposed conflation strategy is as follows [1].
 
 

1. Identify a few matching pairs of point features.

2. Rubber-sheet one map to bring it into exact alignment on the few matching pairs identified in step 1.

3. Repeat, until no new matches are found:

    A. Compute all nearest neighbor pairs as candidate matches.

    B. Apply configuration measures to confirm, disallow, or defer match classification of the candidates found in the previous step. If no new candidates are confirmed, relax the configuration match/similarity criteria and reapply them.

    C. Rubber-sheet to align the confirmed matches of the previous step.
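The control flow of step 3 can be illustrated with a small skeleton. It is a sketch under stated assumptions, not Saalfeld's implementation: the configuration measure of step 3B is replaced by a hypothetical mutual-nearest-neighbour test within a distance tolerance that doubles whenever no new pair is confirmed, and the rubber-sheeting of steps 2 and 3C is replaced by a plain mean translation (`align`) to keep the example short. All function names and tolerance values are illustrative.

```python
import math

Point = tuple[float, float]


def nearest(p: Point, candidates: dict[int, Point]) -> tuple[int, float]:
    """Id and distance of the candidate point closest to p."""
    best = min(candidates, key=lambda k: math.dist(p, candidates[k]))
    return best, math.dist(p, candidates[best])


def align(src: dict[int, Point], matches: dict[int, int],
          ref: dict[int, Point]) -> dict[int, Point]:
    """PLACEHOLDER for rubber-sheeting: shift every source point by the mean
    offset of the confirmed pairs (a plain translation, not a piecewise map)."""
    n = len(matches)
    dx = sum(ref[b][0] - src[a][0] for a, b in matches.items()) / n
    dy = sum(ref[b][1] - src[a][1] for a, b in matches.items()) / n
    return {k: (x + dx, y + dy) for k, (x, y) in src.items()}


def conflate(src: dict[int, Point], ref: dict[int, Point],
             seed: dict[int, int], tol: float = 1.0,
             max_tol: float = 8.0) -> dict[int, int]:
    """Iterative matching loop; returns a map from source id to reference id."""
    matches = dict(seed)                       # step 1: a few known pairs
    moved = align(src, matches, ref)           # step 2: align on those pairs
    while True:                                # step 3: repeat until no new matches
        free_src = {k: p for k, p in moved.items() if k not in matches}
        used = set(matches.values())
        free_ref = {k: p for k, p in ref.items() if k not in used}
        if not free_src or not free_ref:
            break
        new = {}
        for a, p in free_src.items():          # 3A: nearest-neighbour candidates
            b, d = nearest(p, free_ref)
            back, _ = nearest(free_ref[b], free_src)
            if back == a and d <= tol:         # 3B: hypothetical configuration test
                new[a] = b
        if not new:
            if tol >= max_tol:
                break                          # nothing found even at the loosest tolerance
            tol *= 2                           # 3B: relax the criterion and retry
            continue
        matches.update(new)
        moved = align(src, matches, ref)       # 3C: re-align on all confirmed pairs
    return matches
```

Each pass either confirms at least one new pair or loosens the tolerance, and the tolerance is capped by `max_tol`, so the loop always terminates. Replacing `align` with a piecewise linear transformation gives the structure of the strategy listed above.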
 
 

One main concept of the above strategy is that if some points and their destination locations are known, other related points can be transformed by piecewise linear rubber-sheeting transformations using simplicial coordinates [3]. This concept is very similar to image morphing with triangulation in the computer graphics community, where match pairs are selected manually to change one image into another gradually. For map conflation, however, the critical issue is how match pairs are found and confirmed automatically. If wrong match pairs are selected and processed, they produce gross errors that cannot be removed in subsequent stages of the iteration process.
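The piecewise linear rubber-sheeting idea can be sketched as follows. This is a minimal illustration, assuming the matched pairs have already been triangulated and the triangles are passed in as index triples into the source (`src`) and destination (`dst`) point lists; the function names are hypothetical. A point's simplicial (barycentric) coordinates are computed in the source triangle that contains it and then reused in the corresponding destination triangle. The linear search over triangles is the simplest possible point location and makes no attempt at the speed-ups that motivate [3].

```python
Point = tuple[float, float]
Triangle = tuple[int, int, int]


def barycentric(p: Point, a: Point, b: Point, c: Point) -> tuple[float, float, float]:
    """Simplicial (barycentric) coordinates of p with respect to triangle abc.
    Assumes the triangle is non-degenerate (non-zero area)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    l1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    l2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return l1, l2, 1.0 - l1 - l2


def rubber_sheet(p: Point, src: list[Point], dst: list[Point],
                 triangles: list[Triangle], eps: float = 1e-12):
    """Map p with the affine map of the source triangle containing it.
    src[i] and dst[i] are the two positions of the i-th matched pair."""
    for i, j, k in triangles:
        l1, l2, l3 = barycentric(p, src[i], src[j], src[k])
        if min(l1, l2, l3) >= -eps:            # p lies inside (or on) this triangle
            return (l1 * dst[i][0] + l2 * dst[j][0] + l3 * dst[k][0],
                    l1 * dst[i][1] + l2 * dst[j][1] + l3 * dst[k][1])
    return None                                # p is outside the triangulated area
```

With a unit square split into triangles (0, 1, 2) and (0, 2, 3) and every control point moved to twice its coordinates, the point (0.5, 0.25) maps to (1.0, 0.5). Moving control point 3, which is not a vertex of that point's triangle, leaves the result unchanged; this locality is the property of traditional rubber-sheeting discussed in the next paragraph.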
 
 

In the proposed algorithm for the Delaware County map conflation project, candidate match pairs are selected on the basis of extensive geometric and topological relationships. First, multiple candidates are selected by proximity. For each candidate point, its neighborhood, its connectivity, the proximity of its connected points, and its spider function are examined, and the best-fitting candidate is selected. To confirm match pairs, the algorithm uses a semi-automatic quality control process, which also picks match pairs from the orthophotos where reference data do not exist.

A weighted piecewise linear transformation could be used in the transformation step. In the traditional piecewise linear rubber-sheet transformation, the edges of each triangle play an important role: a point inside a triangle is transformed using only that triangle's three vertices, so its position relative to the triangle's edges (imaginary control lines) is preserved no matter how far neighboring control points move or how close the point lies to control points outside its triangle. A transformation that took the movement and closeness of neighboring control points into account would follow the rubber-sheet metaphor more completely, because it would depend on all control points (a global transformation). Considering all control points is, however, inefficient and impractical, because the control points close to the point being transformed carry most of the weight. A natural neighbor algorithm could be adopted to find and weight the relevant control points [6]. This is ongoing research and has not been applied to or implemented in the proposed algorithm; here, traditional rubber-sheeting is used.

After the transformation, linear feature matching is conducted. Matching linear features is not an easy task when the geometry and topology of the two graphs differ.
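The multi-criteria selection of candidate match pairs can be sketched as a weighted score over the reference nodes found within a search radius. This is a hypothetical illustration, not the project's code: it implements only proximity, connectivity (node degree), and the spider function, which it interprets, as an assumption, as a comparison of the directions of the edges incident to each node. The neighborhood and connected-point proximity criteria are omitted, and the values in `WEIGHTS` are invented for the example. Lower scores are better.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    x: float
    y: float
    bearings: list[float]          # directions (degrees) of the edges leaving the node

    @property
    def degree(self) -> int:       # connectivity = number of incident edges
        return len(self.bearings)


# Hypothetical weights for illustration; the paper does not report its values.
WEIGHTS = {"proximity": 0.4, "connectivity": 0.3, "spider": 0.3}


def angle_gap(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def spider_cost(s: Node, r: Node) -> float:
    """ASSUMED spider measure: mean gap from each source edge to the closest
    reference edge, scaled to 0..1 (0 = identical edge directions)."""
    if not s.bearings or not r.bearings:
        return 1.0
    gaps = [min(angle_gap(a, b) for b in r.bearings) for a in s.bearings]
    return sum(gaps) / (180.0 * len(gaps))


def best_candidate(s: Node, refs: dict[int, Node],
                   radius: float) -> Optional[tuple[int, float]]:
    """(id, score) of the best reference node within `radius` (> 0), or None."""
    scored = []
    for rid, r in refs.items():
        d = math.hypot(s.x - r.x, s.y - r.y)
        if d > radius:             # only nearby nodes become candidates
            continue
        conn = abs(s.degree - r.degree) / max(s.degree, r.degree, 1)
        score = (WEIGHTS["proximity"] * d / radius
                 + WEIGHTS["connectivity"] * conn
                 + WEIGHTS["spider"] * spider_cost(s, r))
        scored.append((score, rid))
    if not scored:
        return None
    score, rid = min(scored)       # lowest combined dissimilarity wins
    return rid, score
```

For example, with a radius of 5, a source node at (0, 0) with edges at 0°, 90°, and 180° prefers a reference node at (1, 0) with the same three edges (score about 0.08) over a closer node at (0.5, 0) with edges at 45° and 225° (score about 0.215). This is the kind of decision the topological criteria are meant to make where proximity alone would choose wrongly.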
Candidate linear features are defined by two end nodes that belong to confirmed match pairs. Several criteria are used to measure the similarity between a census feature and a county feature, and if the similarity test is passed, the census feature is replaced by the county feature. Finally, an editing step in the final QC process aligns the unresolved features. The steps of the proposed algorithm for the Delaware County map conflation project are as follows.
 
 

1. Identify control points from Census block coverage.

2. Find match pairs in the Delaware County GIS base using multiple criteria.

    - Multiple candidates based on proximity.

    - Proximity.

    - Neighborhood.

    - Connectivity.

    - Spider function.

    - Weight of each criterion.

3. Semi-automatic quality control for match pairs.

4. Triangulation of match pairs.

5. Weighted (ongoing research) or non-weighted piecewise linear transformation.

6. Linear feature matching based on match pairs.

    - same end points

    - number of edges

    - distance (between polylines, between points)

    - curvature

7. Quality control (edit).
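Step 6 can be sketched as a sequence of pass/fail tests on a (rubber-sheeted) census polyline and a candidate county polyline. The sketch is hypothetical: the thresholds are invented, "number of edges" is interpreted as the number of polyline segments, the distance between polylines is a vertex-based approximation of the Hausdorff distance, and curvature is summarized by the total turning angle.

```python
import math

Point = tuple[float, float]


def point_segment_dist(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))                  # clamp the projection to the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def point_line_dist(p: Point, line: list[Point]) -> float:
    return min(point_segment_dist(p, line[i], line[i + 1]) for i in range(len(line) - 1))


def vertex_hausdorff(l1: list[Point], l2: list[Point]) -> float:
    """Vertex-based approximation of the Hausdorff distance between two polylines."""
    return max(max(point_line_dist(p, l2) for p in l1),
               max(point_line_dist(p, l1) for p in l2))


def total_turning(line: list[Point]) -> float:
    """Sum of absolute turning angles (radians): a crude curvature measure."""
    total = 0.0
    for a, b, c in zip(line, line[1:], line[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        d = abs(h2 - h1) % (2 * math.pi)
        total += min(d, 2 * math.pi - d)
    return total


def lines_match(census: list[Point], county: list[Point],
                census_ends: tuple[int, int], county_ends: tuple[int, int],
                pairs: dict[int, int], max_dist: float = 5.0,
                max_edge_diff: int = 3, max_turn_diff: float = math.pi / 2) -> bool:
    """True if the county polyline passes every (hypothetical) similarity test."""
    # Same end points: both census end nodes must belong to confirmed match
    # pairs whose county partners are the county line's end nodes.
    if {pairs.get(n) for n in census_ends} != set(county_ends):
        return False
    # Number of edges, interpreted here as a loose segment-count check.
    if abs((len(census) - 1) - (len(county) - 1)) > max_edge_diff:
        return False
    # Distance between the polylines.
    if vertex_hausdorff(census, county) > max_dist:
        return False
    # Curvature, compared through the total turning angle.
    return abs(total_turning(census) - total_turning(county)) <= max_turn_diff
```

The tests run from cheapest to most expensive, so most non-matching candidates are rejected by the end-node check before any geometry is computed.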
 
 
 
 
 

3. Implementation and Result
 
 

The above procedures were implemented using the following spatial data.
 

Source Map:

1990 TIGER block coverage (in ArcView shapefile format)
 
 

Reference Frames:

DALIS Project's Digital Orthophotos and Road Centerline, Township, Municipality, and Subdivision Coverages

USGS Railroad and Hydrographic (Polygon and Arc) Coverages
 
 

There were 2,206 selected match pairs, and ultimately 1,318 points were selected after quality control. Most rejected pairs came from hydrographic areas, where finding matching pairs is difficult even for a human operator: natural shapes are irregular, map generalization has been applied heavily, and riverbanks may have been altered by natural erosion.
 
 

ArcView is used to visualize the map conflation. Quality control processing, which verifies matched pairs and edits unresolved features, was implemented in ArcView, with some Avenue programs added to support the processing. The other operations were implemented as C++ programs. USGS data were used only as a visual aid in the QC process. A logical design of the conflation process is provided.
 
 

Figure 1 outlines the steps of the conflation process.
 


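Figure 1. Diagram of the map conflation process (steps: find match pairs based on multiple criteria, quality control, piecewise linear transformation, linear feature matching, manual edit, auto edit, final result)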



The proposed map conflation system is largely automated, but some human interaction is still needed: visual verification of machine-proposed match pairs during QC and some manual editing at the final stage. Even so, the approach reduced the types of editing required from line stretching, line translation, and line rotation to line stretching only.
 
 
 
 
 
 

4. Conclusions and Future Research
 

This paper has presented the conflation stages of the Delaware County map conflation project. A fully automated solution was attempted, but because some problems proved hard to automate, a small amount of human interaction remains necessary in the proposed system. The system's operations consist of automatically detecting match pairs, applying quality control, triangulation, piecewise linear rubber-sheeting, linear feature matching, and, finally, quality control for unresolved features. The system benefits county-level GIS and mapping managers by drastically reducing the human interaction required for map data integration. Although the system provided reliable results for Delaware County, Ohio, more research is needed to find solutions that can be automated even more fully. Planned future research will investigate the following:

- How to optimize the number of control points (balancing the number of control points between oversampled and undersampled areas).

- How to use image processing to find interesting points or features on orthophotos.

- How to weight piecewise linear transformations.

- How to do more advanced linear feature matching.
 
 
 
 
 

5. References

[1] Alan Saalfeld, 1993, Conflation: Automated Map Compilation, Center for Automation Research, CAR-TR-670 (CS-TR-3066), University of Maryland, College Park.

[2] Shoreh Elhami and Hoseok Kang, 2000, Lessons Learned: Addressing and LUCA in Delaware County, Ohio, ESRI User Conference 2000; also presented at the URISA 2000 Conference.

[3] Alan Saalfeld, 1985, A Fast Rubber-Sheeting Transformation Using Simplicial Coordinates, The American Cartographer, Vol. 12, No. 2, pp. 169-173.

[4] Maureen P. Lynch and Alan Saalfeld, 1985, Conflation: Automated Map Compilation - A Video Game Approach, AUTOCARTO 7 Proceedings, pp. 343-352.

[5] Alan Saalfeld, 1987, Joint Triangulations and Triangulation Maps, Proceedings of the 3rd Annual ACM Symposium on Computational Geometry, Waterloo, Ontario, Canada, pp. 195-204.

[6] Kokichi Sugihara, 1999, Surface Interpolation Based on New Local Coordinates, Computer-Aided Design, pp. 51-58.
 
 
 
