with Census Bureau and Local Government Data
Hoseok Kang
Civil and Environmental Engineering and Geodetic Science
The Ohio State University, Columbus, Ohio 43210, USA.
kang.77@osu.edu
University Consortium for Geographic Information Science (UCGIS)
Summer Assembly, June 2001
Abstract
As spatial data resources
become more abundant, it becomes more important for local governments to be
able to effectively integrate those data to get valuable information. The
Census Bureau's TIGER map data sets play an important role for local governments
because they anchor demographic data that can be used for statistical and
spatial analysis and that will be used for apportionment and allocation of
federal funds. Nowadays, many local governments maintain their own large-scale
spatial data in the form of countywide digital orthophotos, parcel maps to
manage their properties, and road centerline maps. Local governments have
realized that there are discrepancies in these various spatial data sets,
and it is tedious work to correct them manually by overlaying the TIGER and
local government data. The data come from different sources, are at different
scales, and have different positional accuracies. From the viewpoint of local
governments, it is critical to examine Census Bureau map data and to have
errors removed prior to conducting the census enumeration. Otherwise, the
errors may remain for an entire decade, until the spatial database is next
updated for the following census. The objective of this paper is the
development and implementation of an efficient and practical map conflation
procedure for local government and Census Bureau map data.
Many efforts have been
made to improve positional accuracy and shape fidelity of linear and area
features of TIGER. This research employs conflation procedures, integrating
imagery and map data sets from different sources, to form a new positionally
accurate and topologically consistent spatial data set. This data integration
system is based on geometric principles rather than on attribute information
matching. It is implemented in ArcView as an interactive cartographic system.
The system consists of extensive geometric, topological, and statistical
matching strategies for spatial objects, visual verification of countywide
digital orthophotos for quality control and editing, piecewise linear
homeomorphism functions for rubber-sheeting, and linear feature matching.
With the help of this
research tool, administrators in Delaware County, Ohio successfully updated
the county's 2000 collection blocks, corrected inaccurate addresses and identified
missing housing units and their locations. This research allows local
governments to correlate their in-house detailed parcel data with demographic
data at the block level. This new information permits very interesting and
intricate statistical, sociological, and spatial analysis on growth and change
patterns. The success of these methods for combining spatial information
to facilitate analysis will lead to more extensive use of these map conflation
methods in the future.
1. Introduction
This paper describes a map conflation algorithm
as one strategy for integrating spatial data that come
from multiple sources, are at different scales, and have different positional
accuracies. The proposed algorithm does not try to provide a general model
for spatial data integration, but it does attempt to provide a general model
of map conflation for local governments. As spatial data resources become
more abundant, it becomes more important to be able to effectively relate
those data sources with other valuable attribute information files such as
census attribute data. However, in order to accomplish this task, the geography
of the census tracts, block groups, or blocks must be tied to local,
more accurate sources, typically digital orthophotos, road centerline
coverages, parcel boundaries, and so on.
Delaware County, Ohio, close to the Columbus metropolitan
area, is the fastest growing county in the state. The county's primary interest
in conflating the census block boundaries to its GIS base is to generate
Census 2000 Collection Blocks (from the block coverage) for every additional
address submitted to the Census Bureau as part of the Local Update of Census
Addresses (LUCA) program. The goal of the LUCA program is to give local
governments an opportunity to review and correct the Census Bureau's
Master Address File (MAF). A corrected MAF allows the population to be counted
more accurately, so that the county can receive more federal and state funds.
A secondary interest is that map conflation allows the county to correlate
block-level census data with parcel-level geography, so that the county and
others can carry out detailed statistical, sociological, and spatial analysis
of growth and change patterns.
There are multiple reasons for the extensive differences
between the geography of Census block boundaries and Delaware County's GIS
base. The first reason has to do with the scale of the early coverage. While
Census block boundaries were created from 1:100,000-scale paper maps, the
digital orthophotos were captured at three scales (1:1,200, 1:2,400, and 1:4,800),
as were the road centerline and other county coverages used in the conflation
process. Jagged effects cannot be avoided when a small-scale map is simply
enlarged to a larger scale (i.e., the small-scale map is magnified
without adding detail). The second reason relates to map
purpose. Since the purpose of Census enumeration is to provide very accurate
attribute information rather than accurate positional information, positional
accuracy has not been emphasized in its mapping. From the viewpoint
of local government, however, positional accuracy is as important as
attribute information because of property management. The third reason
relates to generalization. Since the real world is too complex for our immediate
and direct understanding and there is limited space on a paper map, generalization
must be applied when a paper map is created. It is an ill-posed problem to
recover the original feature from the generalized one without extra information.
This can be described as a conflict between analog-based and digital-based data.
There may be many other factors that make map conflation
complex and not easily defined. Since map conflation is an ill-posed problem,
as discussed above, it is extremely difficult to find a general model and
a fully automated system to implement it. The Delaware County census block conflation
stages are based on Alan Saalfeld's method [1], but without iteration. Linear
feature matching after the rubber-sheet transformation is adopted to compensate
for leaving out the iteration process. Additional background on this
research can be found in [2].
2. Map Conflation Algorithm
Much research on map conflation has been published
[1][3][4][5]. The first map conflation system was developed by the Bureau
of the Census to transfer attributes from its own digital cartographic files
to USGS linework files. In this prototype system, the same category of spatial
data layers is used. The proposed conflation strategy is as follows [1].
1. Identify a few matching pairs of point features.
2. Rubber-sheet one map to bring it into exact alignment on the few matching pairs
identified in step 1.
3. Repeat, until no new matches are found:
A. Compute all nearest neighbor pairs as candidate matches.
B. Apply configuration measures to confirm, disallow, or defer match classification
of candidates found in the previous step. If no new candidates are confirmed, relax the
configuration match/similarity criteria and reapply them.
C. Rubber-sheet to align the confirmed matches of the previous step.
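
Steps A and B of the iteration can be illustrated with a short Python sketch. It is not the Census Bureau implementation: the function names, the `max_dist` threshold, and the eight-sector encoding are assumptions made here for illustration. In particular, the spider code is assumed to be a bit mask recording which angular sectors around an intersection contain an outgoing street segment, which is one way the spider function is commonly described.

```python
import math

def nearest(p, pts):
    """Index of the point in pts closest to p (brute-force search)."""
    return min(range(len(pts)), key=lambda i: math.dist(p, pts[i]))

def mutual_nearest_pairs(source, reference, max_dist):
    """Step A: keep (i, j) only when source[i] and reference[j] choose each
    other as nearest neighbors and lie within max_dist (hypothetical cutoff)."""
    pairs = []
    for i, p in enumerate(source):
        j = nearest(p, reference)
        if nearest(reference[j], source) == i and math.dist(p, reference[j]) <= max_dist:
            pairs.append((i, j))
    return pairs

def spider_code(node, neighbors, sectors=8):
    """Step B helper: bit mask of the angular sectors occupied by the edges
    that leave node toward each point in neighbors."""
    width = 2 * math.pi / sectors
    code = 0
    for q in neighbors:
        angle = math.atan2(q[1] - node[1], q[0] - node[0]) % (2 * math.pi)
        # Shift by half a sector so axis-aligned streets fall in sector centers.
        sector = int(((angle + width / 2) % (2 * math.pi)) // width)
        code |= 1 << sector
    return code
```

A candidate pair from `mutual_nearest_pairs` could then be confirmed when the two nodes have equal spider codes, and deferred otherwise. For example, a four-way intersection with streets leaving east, north, west, and south yields the code 85 in this encoding.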
One main concept of the above strategy is that if some
points and their destination locations are known, other related points can
be transformed by piecewise linear rubber-sheeting transformations using
simplicial coordinates [3]. This concept is very similar to image morphing
using triangulation in the computer graphics community, where match pairs are manually
selected to change one image into another gradually. However, from the
viewpoint of map conflation, how match pairs are found and confirmed automatically
is critical. If wrong match pairs are selected and processed, there
will be gross, unremovable errors in subsequent stages of the iteration process.
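
The piecewise linear transformation described above can be sketched with simplicial (barycentric) coordinates. The Python sketch below is illustrative only: the function names and the tolerance `eps` are assumptions, the triangulation of the matched control points is taken as given, and points outside every triangle are simply left unchanged.

```python
def simplicial_coords(p, tri):
    """Simplicial (barycentric) coordinates of point p in triangle tri."""
    (ax, ay), (bx, by), (cx, cy) = tri
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    s = ((by - cy) * (p[0] - cx) + (cx - bx) * (p[1] - cy)) / det
    t = ((cy - ay) * (p[0] - cx) + (ax - cx) * (p[1] - cy)) / det
    return s, t, 1.0 - s - t

def rubber_sheet(p, triangle_pairs, eps=1e-9):
    """Map p through the first source triangle that contains it.

    triangle_pairs is a list of (source_triangle, destination_triangle)
    tuples whose vertices are matched control points in the same order.
    """
    for src_tri, dst_tri in triangle_pairs:
        s, t, u = simplicial_coords(p, src_tri)
        if min(s, t, u) >= -eps:  # p is inside or on the edge of src_tri
            (ax, ay), (bx, by), (cx, cy) = dst_tri
            # Reuse the same weights on the destination vertices.
            return (s * ax + t * bx + u * cx, s * ay + t * by + u * cy)
    return p  # outside the triangulated area: left unchanged (assumption)
```

Because a point keeps the same three weights before and after the move, control points land exactly on their matches, and points on a shared triangle edge are mapped identically from either side, so the transformation is continuous across the triangulation.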
Candidate match pairs are selected based on extensive
geometric and topological relationships in the algorithm proposed for the Delaware
County map conflation project. First, multiple candidates are selected based
on proximity. For each candidate point, the neighborhood, connectivity,
proximity of connected points, and spider function are examined. Among
those candidates, the best-fitting one is selected. To confirm
match pairs, the proposed map conflation algorithm uses a semi-automatic quality
control process. This process also picks match pairs from the orthophotos where
reference data do not exist. A weighted piecewise linear transformation can
be considered for the transformation step. In traditional piecewise linear
rubber-sheeting, the edges of each triangle play an important role
in the transformation. One characteristic of this transformation is that
the relationship between the edges (imaginary control lines) of a triangle and
the points inside that triangle is maintained regardless of how
neighboring control points move and how close a transformed point is to
neighboring control points. The transformation would follow the meaning of
rubber-sheeting more completely if the movement and closeness of neighboring
control points were considered, because the transformation would then depend
on all control points (a global transformation).
However, it is inefficient or impractical to consider all control points
for the transformation, because most of the weight lies in the control points
close to the point being transformed. A natural neighbor algorithm could be adopted
to find the weighted control points [6]. However, this is ongoing research and
has not been applied or implemented in the proposed algorithm; here, traditional
rubber-sheeting is used. After transformation, linear feature matching is
conducted. Finding two corresponding linear features is not an easy task in two
complicated graphs whose geometry and topology differ. Candidate
linear features are created from the two end nodes that are used in match pairs.
Several criteria are used to determine the similarity between them, and one
feature replaces the other if the similarity condition is met. Finally, an editing
process is carried out in the final QC stage to align unresolved features. The
algorithmic steps proposed for the Delaware County map conflation project are
as follows.
1. Identify control points from Census block coverage.
2. Find match pairs from the Delaware GIS base using multiple criteria.
- Multiple candidates based on proximity.
- Proximity.
- Neighborhood.
- Connectivity.
- Spider function.
- Weight of each criterion.
3. Semi-automatic quality control for match pairs.
4. Triangulation of match pairs.
5. Weighted (ongoing research) or non-weighted piecewise linear transformation.
6. Linear feature matching based on match pairs.
- same end points
- edge #
- distance (between polylines, between points)
- curvature
7. Quality control (edit).
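
The linear feature matching in step 6 can be illustrated with a minimal Python sketch. It is not the project's C++ implementation: all function names and tolerance values are hypothetical, "edge #" is assumed to mean the number of segments in a polyline, the distance criterion is approximated by a symmetric vertex-to-polyline distance, and curvature is approximated by the total turning angle.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg2 = dx * dx + dy * dy
    if seg2 == 0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg2))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def max_vertex_distance(line, other):
    """Largest distance from any vertex of line to the polyline other."""
    return max(
        min(point_segment_distance(p, a, b) for a, b in zip(other, other[1:]))
        for p in line
    )

def total_turning(line):
    """Sum of absolute turning angles (radians), a crude curvature proxy."""
    total = 0.0
    for a, b, c in zip(line, line[1:], line[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        d = abs(h2 - h1)
        total += min(d, 2 * math.pi - d)
    return total

def lines_match(src, ref, end_tol=1.0, dist_tol=1.0, turn_tol=0.5, max_edge_diff=1):
    """Apply the four criteria in turn; every tolerance is an illustrative value."""
    # 1. Same end points (within end_tol).
    if math.dist(src[0], ref[0]) > end_tol or math.dist(src[-1], ref[-1]) > end_tol:
        return False
    # 2. Similar number of edges.
    if abs(len(src) - len(ref)) > max_edge_diff:
        return False
    # 3. Distance between the polylines, checked in both directions.
    if max(max_vertex_distance(src, ref), max_vertex_distance(ref, src)) > dist_tol:
        return False
    # 4. Similar curvature.
    return abs(total_turning(src) - total_turning(ref)) <= turn_tol
```

When `lines_match` accepts a pair, the source feature would be replaced by the reference feature; rejected pairs would be left for the final editing step.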
3. Implementation and Result
The above procedures were implemented using the following
spatial data.
Source Map:
1990 TIGER Block coverage (in ArcView shapefile format)
Reference Frames:
DALIS Project's Digital Orthophotos, Road Center Line, Townships, Municipalities, and Subdivision coverages.
USGS's Railroad and Hydrographic (Polygon and Arc) Coverages
There were 2206 selected matched pairs, and ultimately
1318 points were retained after quality control. Most rejected pairs
were in hydrographic areas, where finding matching pairs is difficult even
for humans because the natural shapes are irregular, map generalization
has been applied heavily, and riverbanks may have been changed by natural
erosion.
ArcView is used to visualize the map conflation.
Quality control processing for verifying matched pairs and editing
unresolved features was implemented in ArcView, with some Avenue programs
added to assist processing. Other operations were implemented in C++ programs.
USGS data were used only as a visual aid in the QC process. A
logical design of the conflation process is provided.
To show the results of each process, a diagram
of the conflation steps is provided in the following figure.
Figure 1. Clickable diagram for map conflation
The proposed map conflation system offers an automated
approach, but some human interaction is still needed: visual verification of
machine-proposed matched pairs at the QC stage and some manual editing at the final stage.
However, this approach reduced the types of editing required (line stretching,
line translation, and line rotation) to line stretching only.
4. Conclusions and Future Research
In this paper, the conflation stages for the Delaware County map conflation project were presented. A fully automated solution was attempted, but because parts of the problem are hard, a small amount of human interaction remained necessary in the proposed system. The system's operations consist of automatically detecting match pairs, applying quality control, triangulation, piecewise linear rubber-sheeting, linear feature matching, and finally quality control for unresolved features. The system benefits county-level GIS and mapping managers by drastically reducing the human interaction needed for map data integration. Even though the system provided reliable results for Delaware County, Ohio, more research is needed to find solutions that can be more fully automated. Planned future research will investigate the following:
- How to optimize the number of control points (balancing the number of control points between oversampled and undersampled areas).
- How to use image processing to find interesting points or features on orthophotos.
- How to weight piecewise linear transformations.
- How to do more advanced linear feature matching.
5. References
[1] Alan Saalfeld, 1993, Conflation: Automated Map Compilation,
Center for Automation Research,
CAR-TR-670 (CS-TR-3066), University of Maryland, College Park.
[2] Shoreh Elhami and Hoseok Kang, 2000, Lessons Learned: Addressing
and LUCA in Delaware County, Ohio, ESRI User Conference 2000.
Also appeared in the URISA 2000 conference.
[3] Alan Saalfeld, 1985, A Fast Rubber-Sheeting Transformation
Using Simplicial Coordinates, The
American Cartographer, Vol. 12, No. 2, pp. 169-173.
[4] Maureen P. Lynch and Alan Saalfeld, 1985, Conflation: Automated Map
Compilation - A Video Game Approach, AUTOCARTO 7 Proceedings, pp. 343-352.
[5] Alan Saalfeld, 1987, Joint Triangulations and Triangulation Maps, Proceedings
of the 3rd Annual ACM Symposium on Computational Geometry, Waterloo,
Ontario, Canada, pp. 195-204.
[6] Kokichi Sugihara, 1999, Surface Interpolation Based on New Local Coordinates,
Computer-Aided Design, pp. 51-58.