Disease Cluster Detection Methods: The Impact of Choice of Shape on the Power of Statistical Tests

Geoffrey H. Smith
Department of Geography
University of Iowa
Email: geoffrey-smith@uiowa.edu
Abstract
An effective method for detecting disease clusters should identify true clusters as often as possible while rarely identifying false ones. Recent advances in cluster detection methods have relied on circles as the basic shape for these analyses. These methods have been tested on both actual and synthetic data, but not on a credible data set containing a non-circular cluster. The objective of this study is to determine whether cluster detection methods based on circles can also detect such non-circular clusters. To test the ability of these methods to detect true clusters and reject false ones, a synthetic data set was constructed. In a region with a non-uniform distribution of 8,689 people at risk, 72 deaths were simulated both inside a sinuous cluster (1,020 people at risk in a 0.5-mile buffer of a transportation route) and outside it (the remaining 7,669 people at risk). Three methods for identifying disease clusters were tested on these data: Openshaw's GAM/K, Kulldorff and Nagarwalla's Spatial Scan Statistic, and Rushton and Lolonis' significance map. I report the results of each method with respect to three key criteria: sensitivity, specificity, and positive predictive value. Because no method successfully identified the sinuous cluster, new methods for detecting non-circular clusters are discussed.

Introduction

Geographers and other investigators of public health make maps of mortality and disease incidence for a multitude of reasons. They may be interested simply in the extent of disease found in the community, or in linkages between disease incidence and certain environmental factors. They may also be interested in the locations of areas of high and low disease incidence. Areas of high incidence are sometimes called "clusters" or "hot spots." A cluster is an area with an excess number of observed cases relative to the expected number of cases (the number of cases one would expect if the people in the area had the same risk as a reference population). One problem in detecting these clusters is that when the shape of the analysis window does not match the shape of the cluster, areas without elevated risk are included in the analysis as background noise, which makes the cluster harder to detect.

Many cluster detection methods designed for heterogeneous populations have so far relied on circles to detect clusters, and their power to detect non-circular clusters may well be lower than their power to detect circular clusters (Kulldorff and Nagarwalla 1995, Openshaw et al. 1999(2), Rushton and Lolonis 1996, Wartenberg and Greenberg 1990). Yet the likelihood that a real cluster is roughly circular may be quite low. Hypothetical examples of non-circular clusters include a cluster underneath a plume of smoke that follows the prevailing wind direction from a factory, or a cluster that follows the path of a river, highway, or watershed (Abler et al. 1971, Chakraborty and Armstrong 1995, Gould 1993, Tango 2000).

Cluster detection methods have previously been tested on actual data, whose underlying clusters are of unknown shape (Fotheringham and Zhan, Hill et al. 2000, Kulldorff and Nagarwalla 1995), and synthetic data, whose underlying clusters are usually circular (Alexander and Boyle 1996, Tango 2000). For example, Alexander and Boyle (1996) generated synthetic clusters based on a certain number of "parents" whose related cases were dispersed in a circular pattern. However, some tests of statistical power on sinuous clusters have been reported. Tango (2000) has tested cluster detection methods on a sinuous cluster, albeit with a high relative risk (RR = 5). The purpose of this paper is to investigate how well presently available circular cluster detection methods detect non-circular clusters and to propose an improved method for detecting non-circular clusters.

Background

Cluster detection methods are designed to distinguish "true" clusters from "false," or chance, clusters. When investigating clusters, the investigator should be concerned with the cluster detection method's Type I and Type II error rates as well as its statistical power. Making Type I errors means reporting "false" clusters and unnecessarily alarming the public. Making Type II errors means missing true clusters and possibly endangering the public's health. For a given method and data set, these error rates trade off against each other: the higher the probability of making a Type I error, the higher the power and the lower the probability of making a Type II error. A useful way to compare cluster detection methods and other exploratory data analysis methods is by their susceptibility to either Type I or Type II errors. Biostatisticians and epidemiologists frequently use summary statistics such as sensitivity, specificity, and positive predictive value to evaluate screening and diagnostic tests (Neutra et al. 1992, Gordis 1996). Sensitivity is the proportion of true positives that a screening test successfully detects. Specificity is the proportion of true negatives that a screening test correctly classifies as negative, and positive predictive value is the proportion of true positives among all positive results (Figure 1). Any screening or diagnostic test must have high sensitivity and specificity to be valid, but investigators should also weigh the relative costs of Type I and Type II errors. By changing the detection threshold, the investigator can trade sensitivity against specificity and positive predictive value, and so shift the balance between Type I and Type II error rates.
 
 
"Truth"
|  | Condition present | Condition not present |  |
| Screening test positive | tp (true positive) | fp (false positive) | PPV = tp/(tp+fp) |
| Screening test negative | fn (false negative) | tn (true negative) | NPV = tn/(fn+tn) |
|  | Sensitivity = tp/(tp+fn) | Specificity = tn/(fp+tn) |  |
Figure 1: Evaluating Screening and Diagnostic Tests (after Gordis 1996)
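The four measures in Figure 1 follow directly from the cell counts. A minimal Python sketch is shown below; the function names are illustrative and do not come from any package. As a check, the GAM/K counts at Zmax greater than or equal to 0.0075 (Table 4) reproduce the reported sensitivity, specificity, and positive predictive value.

```python
import math


def _ratio(numerator, denominator):
    """Proportion, or NaN when the denominator is zero (e.g. nothing flagged)."""
    return numerator / denominator if denominator else math.nan


def screening_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV (as proportions) from a 2 x 2 table."""
    sensitivity = _ratio(tp, tp + fn)  # share of true cases the test flags
    specificity = _ratio(tn, fp + tn)  # share of non-cases the test leaves unflagged
    ppv = _ratio(tp, tp + fp)          # share of flagged results that are true cases
    npv = _ratio(tn, fn + tn)          # share of unflagged results that are non-cases
    return sensitivity, specificity, ppv, npv


# GAM/K counts at Zmax >= 0.0075 (Table 4)
sens, spec, ppv, npv = screening_measures(tp=397, fp=645, fn=626, tn=7019)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} PPV={ppv:.2%}")
```

This prints sensitivity=38.81% specificity=91.58% PPV=38.10%. The NaN guard matters at the extremes of a threshold sweep: when every person is flagged there are no negatives, so NPV is undefined.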

Comparison of the Results of Cluster Detection Methods on Synthetic Data Sets

The best way to measure the susceptibility of cluster detection methods to Type I and Type II errors is to test them on synthetic data, since the locations of the actual clusters are known and can be assessed. Counting true positives, false positives, false negatives, and true negatives requires exactly this knowledge. In the early 1990s the International Agency for Research on Cancer (IARC) tested several cluster detection methods on fifty fabricated data sets, each with a known number of circular clusters. The study incorporated two quadrat count methods (that of the Information & Statistics Division of the National Health Service of Scotland (ISD) and Potthoff-Whittinghill) and three "distance-based" methods (GAM/K, Cuzick-Edwards, and Besag-Newell). Because the IARC used synthetic data sets with known "true" clusters, the editors of the study were able to assess the validity of the distance-based cluster detection methods. Although the Besag-Newell method had the highest sensitivity (92%), it also produced many false positives, and its positive predictive value was only 36% (Table 1). GAM/K had a sensitivity of 80% and a positive predictive value of 87%, whereas Cuzick-Edwards had a sensitivity of 42% and a positive predictive value of 66%. GAM/K also had the lowest average "distance to the 'parents'," or centers of the clusters, which means that the centers of the clusters GAM/K detected were closest to the "true" clusters in the non-random simulated data sets (6.42 km compared with 32.9 km for Besag-Newell and 15.8 km for Cuzick-Edwards). The mean distance for Besag-Newell was large because of the many false positives this method reported. Alexander and Boyle noted that GAM/K had trouble identifying clusters centered less than 5 km from "built-up" or urbanized areas (13/29 detected). However, of the five methods compared (distance and quadrat count methods), GAM/K was the only one with a sensitivity over 40% for detecting clusters in data sets with 10-18 cases.

Table 1: Results of the International Agency for Research on Cancer's Comparisons of Distance-Based Cluster Detection Methods
| Method | Sensitivity | Positive Predictive Value | Average Distance to Parents (km) |
| Besag-Newell | 92% | 36% | 32.9 |
| Cuzick-Edwards | 42% | 66% | 15.8 |
| GAM/K | 80% | 87% | 6.42 |
Source: Alexander and Boyle 1996

Cuzick and Edwards' method, which uses the k nearest neighbors to compare cases with controls randomly selected from the population data, did not perform well (Cuzick and Edwards 1996). The Geographical Analysis Machine (GAM) maps clusters whose observed incidence is significantly greater than the expected value. GAM measures the observed incidence of disease in circles of increasing radii centered on the points of a grid overlay. The data to be analyzed can consist either of points representing the geocoded street addresses of individuals at risk or of small areas, such as post code areas or census enumeration districts, mapped on an x-y coordinate system. GAM counts the observed cases in a series of circles whose radii increase in a number of steps set by the user, and then performs a statistical test (Poisson, bootstrap, or Monte Carlo simulation) at a user-defined significance level to compare the observed counts with the expected number of cases. The strength of the resulting clusters is indicated by Zmax, a measurement of "excess" (observed - expected) (Openshaw et al. 1999 (2)). The Zmax values are interpolated with an Epanechnikov kernel, using the circle radius as the bandwidth, to create a density surface. The investigator can choose a cluster detection rule (Zmax > x) in order to vary the Type I and Type II error rates to which GAM/K, like all other cluster detection methods, is susceptible.
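The circle-counting idea behind GAM can be sketched in a few lines. This is not Openshaw's GAM/K code: it assumes point data for people and cases, uses a Poisson upper-tail test only (no bootstrap or Monte Carlo option), and omits the Epanechnikov kernel smoothing of Zmax. The names `gam_scan` and `poisson_upper_tail`, and the default `alpha` and `min_pop` values, are illustrative assumptions.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):  # accumulate P(X <= observed - 1)
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def gam_scan(people, cases, centres, radii, alpha=0.05, min_pop=50):
    """Return circles whose case count is significantly above expectation.

    people, cases: lists of (x, y) points (population at risk and cases).
    centres: grid points; radii: circle radii tried at each grid point.
    Each hit is (cx, cy, radius, observed, expected).
    """
    overall_rate = len(cases) / len(people)
    hits = []
    for cx, cy in centres:
        for r in radii:
            r2 = r * r
            pop = sum(1 for x, y in people if (x - cx) ** 2 + (y - cy) ** 2 <= r2)
            if pop < min_pop:
                continue  # too few people at risk for a stable rate
            obs = sum(1 for x, y in cases if (x - cx) ** 2 + (y - cy) ** 2 <= r2)
            expected = pop * overall_rate  # expected cases if risk were uniform
            if poisson_upper_tail(obs, expected) < alpha:
                hits.append((cx, cy, r, obs, expected))
    return hits
```

Every circle is tested separately at level `alpha`, which is exactly what exposes GAM to the multiple-comparisons criticism discussed below.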

The editors of the IARC study felt that distance-based methods may be worth performing after a global test has indicated the presence of clustering somewhere in the study region. Despite the variability in the distance-based methods' sensitivity and positive predictive value, the editors did not advocate any single method over the others (Alexander and Boyle 1996(2)). Openshaw (1996b), for his part, argued that global test results are unstable and depend on the "size of the study region, location of study region boundaries, the nature of the clustering process, and the scale of analysis" (p. 163).

Since the IARC study, researchers at the Center for Computational Geography (CCG) at Leeds, UK, have constructed various GAM derivatives to address more complicated cluster detection problems. These include GAM/K-T, which explores temporal as well as spatial permutations; MAPEXplorer (MAPEX), which uses genetic algorithms to detect clusters; and Geographical Data Miner (GDM/1), an expansion of MAPEX that handles event characteristics and includes GIS coverages (Openshaw et al. 1999(1)). These three derivatives and GAM/K itself were tested on synthetic data with circular clusters generated by a method similar to that used by the IARC (Alexander et al. 1996). The results of these studies, shown in Table 2, indicate that MAPEX performed better than the other three methods, although the authors did not indicate how close the detected clusters were to the synthetic clusters.

Table 2: Results of CCG Comparison of 4 Cluster Detection Methods on Synthetic Data Sets
 
| Method | Sensitivity |
| GAM/K | 64.81% |
| GAM/K-T | 59.26% |
| MAPEX | 77.78% |
| GDM/1 | 55.56% |
Source: Modified from Openshaw et al. 1999(1)

The Problem of Multiple Comparisons

The most frequently cited criticism of GAM by statisticians is its susceptibility to the problem of multiple testing, or multiple comparisons (Kulldorff and Nagarwalla 1995). Because so many hypotheses are tested, about 100α% of the tests of true null hypotheses will be labeled "significant" at level α by chance alone, producing false positives. Responding to criticisms of GAM's susceptibility to multiple testing problems, Openshaw et al. (1999 (2)) stated that GAM was designed not as a confirmatory tool but as an exploratory one that generates potential clusters. Moreover, the multiple tests performed by GAM are not independent, because the circles overlap. Openshaw also offered a geographical argument: if the clusters are not randomly located but are in fact located in plausible locations, then the presence of these clusters might not result from chance. For example, what is the probability of a small area near a putative environmental hazard having a disease rate x times the regional average? If a cluster detection method finds a cluster near a putative environmental hazard, and the rate in that cluster is greater than those generated by Monte Carlo simulations or according to some hypothesis test, then the cluster may not be spurious, and chance may not be the only factor in its genesis. However, to avoid circular reasoning, the focused test cannot be based solely on the data tested by the cluster detection method; some other a priori knowledge must inform this subsequent test.
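The arithmetic behind the multiple-comparisons criticism can be illustrated with a small simulation. The sketch assumes independent tests whose p-values are uniformly distributed under the null, which GAM's overlapping circles do not satisfy (as Openshaw notes), so it shows only the scale of the problem and what a Bonferroni adjustment (alpha divided by the number of tests) does to it. The function name is hypothetical.

```python
import random


def count_false_alarms(m_tests, alpha, seed=1):
    """Count 'significant' results among m_tests true-null tests.

    Under a true null hypothesis a valid test's p-value is Uniform(0, 1),
    so each test is declared significant with probability alpha.
    Returns (unadjusted count, Bonferroni-adjusted count).
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(m_tests)]
    unadjusted = sum(p < alpha for p in p_values)
    bonferroni = sum(p < alpha / m_tests for p in p_values)
    return unadjusted, bonferroni


unadjusted, bonferroni = count_false_alarms(m_tests=10_000, alpha=0.05)
print(unadjusted, bonferroni)
```

With 10,000 tests at alpha = 0.05 the unadjusted count should land near 500 (100α% of the tests), whereas the Bonferroni count has an expected value of only 0.05 and is usually zero.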

Turnbull et al. (1990) modified GAM's algorithm by replacing the circles of increasing radii with spatially adaptive filters: circles whose radii depend on population counts. The circles in each calculation all contain the same population but have different radii depending on the local population density (Turnbull et al. 1990, Talbot et al. 2000). Kulldorff and Nagarwalla (1995) assert that Bonferroni-type procedures, which adjust for multiple comparisons and which cannot be used with GAM, can be used with Turnbull's program as long as the population size is held constant. Seeing a need for a "unique test statistic" to "assess quantitatively the overall significance of the results" of a cluster detection test like GAM or that of Turnbull et al., Kulldorff and Nagarwalla (1995) developed a likelihood ratio test built on ideas from Turnbull and Openshaw. The "likelihood ratio" quantifies the relative strength of rare disease clusters. Kulldorff's spatial scan statistic performs a limited number of significance tests: it tests the null hypothesis against the alternative hypothesis that the most likely, the second most likely, and the third most likely circular clusters are not due to chance.
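To make the likelihood ratio concrete, the sketch below uses the Poisson form of the scan statistic's log likelihood ratio as it is commonly written; whether this matches the exact model used by Kulldorff and Nagarwalla (1995) is an assumption. Candidate zones (for example, all circles around each centroid) are supplied by the caller, the most likely zone is the one with the largest ratio, and significance comes from ranking that maximum among maxima computed on data simulated under the null. The names `poisson_llr` and `scan_test` are illustrative, and secondary clusters are not computed.

```python
import math
import random


def poisson_llr(c, e, total_cases):
    """Poisson log likelihood ratio for a zone with c observed and e expected cases.

    Only excess risk counts (c > e); the cases outside the zone,
    total_cases - c, are compared with total_cases - e expected there.
    """
    if e <= 0 or c <= e:
        return 0.0
    llr = c * math.log(c / e)
    outside = total_cases - c
    if outside > 0:
        llr += outside * math.log(outside / (total_cases - e))
    return llr


def scan_test(region_pops, region_cases, zones, n_reps=999, seed=1):
    """Most likely zone, its log likelihood ratio and a Monte Carlo p-value.

    region_pops, region_cases: population and case counts for each small region.
    zones: candidate clusters, each a list of region indices (e.g. circles).
    """
    total_pop = sum(region_pops)
    n_cases = sum(region_cases)

    def best_zone(cases):
        best, best_z = 0.0, None
        for zone in zones:
            c = sum(cases[i] for i in zone)
            e = n_cases * sum(region_pops[i] for i in zone) / total_pop
            llr = poisson_llr(c, e, n_cases)
            if llr > best:
                best, best_z = llr, zone
        return best, best_z

    observed, zone = best_zone(region_cases)
    rng = random.Random(seed)
    regions = range(len(region_pops))
    exceed = 0
    for _ in range(n_reps):
        # Null: each case lands in a region with probability proportional to its population
        sim = [0] * len(region_pops)
        for idx in rng.choices(regions, weights=region_pops, k=n_cases):
            sim[idx] += 1
        if best_zone(sim)[0] >= observed:
            exceed += 1
    return zone, observed, (exceed + 1) / (n_reps + 1)
```

Because only the single maximum is compared with its null distribution, one test replaces the many separate tests that GAM performs.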

Kulldorff compared the performance of his program with those of Openshaw and Turnbull on the same data set of actual leukemia cases in upstate New York. Openshaw found four clusters whose significance was not tested, Turnbull found one significant cluster, and Kulldorff found two significant clusters. Kulldorff's major criticism of Openshaw's method is that it cannot test for significance or "likelihood," because Bonferroni methods cannot be used with it to mitigate the effects of multiple testing (Kulldorff and Nagarwalla 1995). The results of GAM, Turnbull et al., and Kulldorff and Nagarwalla on the upstate New York cancer incidence data are unsatisfactory in that sensitivity, specificity, and positive predictive value cannot be computed without knowing which clusters are "true" and which are "false." Furthermore, because real data were used, we have no knowledge of the shapes of the actual clusters. The spatial scan statistic has recently been compared with Tango's method (Tango 2000) on synthetic data containing both circular and sinuous clusters. Although Tango did not report the spatial scan statistic's performance on the sinuous cluster, he did note that "Kulldorff's scan test tends to identify, as the most likely cluster, a much larger cluster than expected from the observed disease map by absorbing neighbouring regions with non-elevated risks of disease occurrence. Of course, it depends on the distances and the spatial relations among clusters."

A Comparison of Three Methods for Detecting a Sinuous Cluster

The IARC study compared the validity of three different distance-based cluster detection methods by measuring their sensitivities and positive predictive values on synthetic data sets. The CCG compared the validity of four GAM-based methods by measuring their sensitivities on synthetic data sets generated by the same procedure used in the IARC study. However, to the author's knowledge, GAM/K and Kulldorff's spatial scan statistic have never been compared on synthetic data, and thus their relative validity has not been assessed. Furthermore, none of these cluster detection methods has been tested on non-circular clusters, although Tango (2000) compared his method with the spatial scan statistic on a group of three clusters, one of which was sinuous but had an exceedingly high relative risk (RR = 5). Our intention is to compare the power of GAM/K, Kulldorff's spatial scan statistic, and Rushton and Lolonis' (1996) significance map on a simulated sinuous cluster with a lower relative risk (RR = 2). I intend to compare Tango's method with the other three methods at a later date. The locations of deaths and populations are analyzed by three different cluster detection methods: the spatial scan statistic, GAM/K, and the significance map. The validity of the screening tests is compared by measuring each test's sensitivity, specificity, and positive predictive value. Because the spatial scan statistic is designed to test the likelihood of the most likely circular cluster, we do not expect it to have high sensitivity, specificity, and positive predictive value. Although GAM/K and the significance map use circles to count observed and expected cases, they may still achieve reasonable values of sensitivity, specificity, and positive predictive value.

Our cluster model follows the hot spot model described by Wartenberg and Greenberg (1990), in which a subregion of the study area has a greater relative risk than the remainder of the study area. In our case the relative risk is set at two, and the hot spot is a buffer around a sinuous linear feature in the study area. The population at risk consisted of individual births in Polk County, Iowa, during 1993 and 1994 (n = 8,689) (Figure 2). To simulate a sinuous hot spot with a relative risk of 2, we chose the Union Pacific (Iowa Interstate R.R. Ltd.) railroad as the health hazard and created a buffer around it with an arbitrary width of one-half mile. The buffer contains 1,020 people, or 11.7% of the at-risk population. Our hot spot model simulates infant mortality in Des Moines, Iowa, in 1993-94, when 72 infants died, for a rate of 8.3/1,000. Seventy-two individual deaths were generated from the 8,689 individuals at risk using a modification of the uniform probability distribution, in which each person at risk outside the hot spot has a weight, or risk, of 1 and each person inside the hot spot has a weight of 2.
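The text does not spell out how the weighted draw was implemented. One way to draw 72 distinct people, with hot-spot residents twice as likely to be chosen, is the Efraimidis-Spirakis key method sketched below; it is an assumed substitute for the author's procedure, and `simulate_deaths` is a hypothetical name. Each person receives the key u^(1/w) for a uniform random u and weight w, and the people with the largest keys are selected, which is equivalent to drawing people one at a time without replacement with probability proportional to their weights.

```python
import heapq
import random


def simulate_deaths(in_hot_spot, n_deaths, relative_risk=2.0, seed=None):
    """Pick n_deaths distinct people; hot-spot residents get weight relative_risk.

    in_hot_spot: one boolean per person at risk.
    Efraimidis-Spirakis sampling: each person gets the key u ** (1 / weight),
    u ~ Uniform(0, 1), and the people with the largest keys are taken. This is
    equivalent to sequential weighted draws without replacement.
    """
    rng = random.Random(seed)
    keyed = []
    for idx, inside in enumerate(in_hot_spot):
        weight = relative_risk if inside else 1.0
        keyed.append((rng.random() ** (1.0 / weight), idx))
    return [idx for _, idx in heapq.nlargest(n_deaths, keyed)]


# 1,020 people inside the half-mile buffer and 7,669 outside it
people = [True] * 1020 + [False] * 7669
deaths = simulate_deaths(people, n_deaths=72, seed=42)
inside_deaths = sum(people[i] for i in deaths)
print(inside_deaths, 72 - inside_deaths)
```

Across repeated runs, roughly 72 × 2,040 / 9,709 ≈ 15 deaths fall inside the buffer on average, so the 16 deaths in the realization used in this study are typical.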


Figure 2: At-Risk Population with Sinuous Health Hazard

Sensitivity is measured as the population at risk detected by the significance test at a given significance level that is actually in the hot spot (true positives (tp)) as a percentage of all of the population at risk that is in the hot spot (true positives plus false negatives (tp + fn)) (Figure 1). The specificity is measured as the population at risk not detected by the significance test (true negatives (tn)) at a given significance level as a percentage of all of the population at risk that is not in the hot spot (false positives plus true negatives (fp + tn)). The positive predictive value is measured as the population at risk detected by the significance test at a given significance level that is actually in the hot spot (true positives (tp)) as a percentage of all of the population at risk that is detected by the significance test (true positives plus false positives (tp + fp)).

The Synthetic Hot Spot

Although the total number of deaths for the entire study area was predetermined, the locations of the deaths were not. Sixteen deaths occurred within the hot spot (rate = 15.7/1,000, expected = 8.466, p < 0.01) and 56 deaths occurred outside it (rate = 7.3/1,000, expected = 63.653, p = 0.186), for an overall rate of 8.3/1,000 (Figure 3).

A density distribution of disease rates can be computed when the data set consists of individual deaths represented as geocoded points drawn from a population at risk that is also represented as geocoded points. A 0.4-mile grid was superimposed on the deaths and the population at risk, and overlapping circular filters with 0.8-mile radii were generated following Rushton and Lolonis (1996) using DMAP, a freely available software package that computes disease rates for input into a GIS (Figure 4, University of Iowa 1997). A disease rate was calculated for each point of the 0.4-mile grid that had at least 50 people at risk within its filter.
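The spatial-filter calculation can be sketched as follows. This is not DMAP: it is a brute-force illustration that assumes projected x-y coordinates in miles, starts the grid at the minimum x and y of the population, and skips grid points whose filter holds fewer than 50 people at risk, mirroring the settings described above. `filter_rates` is a hypothetical name.

```python
def filter_rates(people, deaths, spacing=0.4, radius=0.8, min_pop=50):
    """Death rate per 1,000 at each grid point, using a circular filter.

    people, deaths: lists of (x, y) coordinates in the same units (assumed miles).
    Returns {(grid_x, grid_y): rate}; grid points with fewer than min_pop
    people at risk inside the filter are left out.
    """
    min_x = min(x for x, _ in people)
    min_y = min(y for _, y in people)
    nx = int((max(x for x, _ in people) - min_x) / spacing) + 1
    ny = int((max(y for _, y in people) - min_y) / spacing) + 1
    r2 = radius * radius
    rates = {}
    for i in range(nx):
        for j in range(ny):
            gx, gy = min_x + i * spacing, min_y + j * spacing
            at_risk = sum(1 for x, y in people if (x - gx) ** 2 + (y - gy) ** 2 <= r2)
            if at_risk < min_pop:
                continue  # too few people for a stable rate
            dead = sum(1 for x, y in deaths if (x - gx) ** 2 + (y - gy) ** 2 <= r2)
            rates[(gx, gy)] = 1000.0 * dead / at_risk
    return rates
```

Because the filter radius (0.8 mile) is twice the grid spacing (0.4 mile), neighbouring filters overlap, which smooths the resulting rate surface.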

Figure 3:  Synthetic Deaths
 


Figure 4: The regular lattice grid and the spatial filter areas used to measure mortality rates in the study area

An investigator can interpolate point data onto a surface using a variety of spatial interpolation algorithms (Burrough 1986). Because these algorithms differ, the resulting map will differ slightly depending on which algorithm is chosen. In this case the resulting grid was interpolated by using a Triangulated Irregular Network (TIN) to create a Digital Elevation Model (DEM) in a commercially available GIS (Figure 5, Caliper Corporation 1994). Although this map shows areas of high rates, these rates must be tested to determine how likely it is that chance processes alone could generate similar results.
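Within a TIN, each value is interpolated linearly from the three vertices of the triangle that contains it. The sketch below shows that step with barycentric weights for a single triangle; building the triangulation itself (commonly a Delaunay triangulation) is assumed to have been done already, and the commercial GIS used here may differ in detail. `tin_interpolate` is a hypothetical name.

```python
def tin_interpolate(p, triangle, values):
    """Linearly interpolate a value at point p inside one TIN triangle.

    triangle: three (x, y) vertices; values: the rates at those vertices.
    Returns None if p lies outside the triangle.
    """
    (x1, y1), (x2, y2), (x3, y3) = triangle
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    px, py = p
    # Barycentric weights: how much each vertex contributes at p
    w1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    w2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        return None  # outside this triangle; another facet of the TIN applies
    return w1 * values[0] + w2 * values[1] + w3 * values[2]
```

Because the surface is planar within each triangle, a TIN honours the computed rates exactly at grid points and can only change slope along triangle edges, one reason different interpolators yield slightly different maps.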


Figure 5: Geographical Distribution of Disease Rates Defined by the 0.8 Mile Filter

Spatial Scan Statistic

The Spatial Scan Statistic (Kulldorff and Nagarwalla 1995) detected no clusters at the 0.05 significance level. Its three most likely clusters (p = 0.119, 0.148, and 0.995) would be identified only at much less stringent significance levels, and only two of them contained any true positives (Table 3). The locations of the three clusters are shown in Figure 6.

Table 3: Results of Spatial Scan Statistic
 
| Cluster | p-value | Radius (ft) | tp | fp | fn | tn | Sensitivity | Specificity | PPV |
| A | 0.119 | 460 | 0 | 7 | 1020 | 7662 | 0.00% | 99.91% | 0.00% |
| B | 0.148 | 98 | 2 | 7 | 1018 | 7662 | 0.20% | 99.91% | 22.22% |
| C | 0.995 | 295 | 9 | 7 | 1011 | 7662 | 0.88% | 99.91% | 56.25% |

Notes:  tp (true positive), fp (false positive), fn (false negative), tn (true negative)


Figure 6: Results of Spatial Scan Statistic
Note: All clusters have radii of less than 500 feet.

The spatial scan statistic did not reject the null hypothesis of constant risk in favor of the alternative of a single circular cluster. As Tango (2000) points out, if there are actually many small clusters in the study area, the spatial scan statistic will detect one large cluster that encompasses the small clusters along with surrounding areas that do not have elevated risk. In such a case, the scan statistic would have a high sensitivity but a low specificity and a low positive predictive value. In these results, however, the spatial scan statistic never achieved a high sensitivity, because of the shape of the actual cluster and its low relative risk.

GAM/K Results

Next, the data set was analyzed with Openshaw's Geographical Analysis Machine (GAM/K). Following the user's settings, GAM/K searched for clusters with radii between 500 and 2000 meters and ignored circles with fewer than 50 people at risk. Sequential Monte Carlo testing was used to test for significance, although Poisson probability and the bootstrap method were other available options. The output of this analysis consisted of a 500 x 500 raster grid of Zmax scores ranging from 0 to 0.253. Because GAM/K is a form of exploratory spatial analysis, methods of representing the results are discretionary, and the investigator can adjust the display of the results based on Zmax (Openshaw 1996a). For example, if a cluster identification rule of Zmax greater than or equal to 0.145 is applied, the specificity would be 100% and the positive predictive value would be 100%, but the sensitivity of the test would be only 5.12% (Figures 7 and 8(A), Table 4). Sensitivity, specificity, and positive predictive value are measured by classifying the population at risk in the same way as for the spatial scan statistic results. The investigator can change the cluster identification rule according to the relative costs of Type I and Type II errors. To achieve the best joint sensitivity and positive predictive value, the investigator can use the Zmax at which the sensitivity and positive predictive value curves in Figure 7 intersect (Zmax greater than or equal to 0.0075). This rule results in a sensitivity of 38.81%, a specificity of 91.58%, and a positive predictive value of 38.10% (Table 4, Figure 8(B)). To achieve the best joint sensitivity and specificity, the investigator can use the Zmax at which the sensitivity and specificity curves in Figure 7 intersect (Zmax greater than or equal to 0.000001). This rule results in a sensitivity of 65.40%, a specificity of 69.10%, and a positive predictive value of 22.03% (Table 4, Figure 8(D)).
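Reading the curve intersections can be approximated numerically. The sketch assumes each person at risk has already been assigned the score (Zmax, or a significance level) at that person's location, plus a flag for whether the person lies in the hot spot. It tabulates the three measures for a list of candidate thresholds and returns the threshold at which two chosen measures are closest, a discrete stand-in for reading the intersection off Figure 7. The function names are hypothetical.

```python
import math


def sweep_thresholds(scores, in_hot_spot, thresholds):
    """(threshold, sensitivity, specificity, PPV) for the rule 'flag if score >= threshold'."""
    rows = []
    for t in thresholds:
        tp = fp = fn = tn = 0
        for score, truth in zip(scores, in_hot_spot):
            flagged = score >= t
            if flagged and truth:
                tp += 1
            elif flagged:
                fp += 1
            elif truth:
                fn += 1
            else:
                tn += 1
        sens = tp / (tp + fn) if tp + fn else math.nan
        spec = tn / (fp + tn) if fp + tn else math.nan
        ppv = tp / (tp + fp) if tp + fp else math.nan
        rows.append((t, sens, spec, ppv))
    return rows


def closest_crossing(rows, a, b):
    """Threshold whose measures in columns a and b (1=sens, 2=spec, 3=PPV) are closest."""
    usable = [r for r in rows if not (math.isnan(r[a]) or math.isnan(r[b]))]
    return min(usable, key=lambda r: abs(r[a] - r[b]))[0]
```

Calling `closest_crossing(rows, 1, 3)` approximates the sensitivity/PPV intersection and `closest_crossing(rows, 1, 2)` the sensitivity/specificity intersection; a finer list of thresholds gives a closer approximation.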

Figure 7: Selected Results of Varying Zmax of GAM/K
 
Table 4: Results of Selected Zmax levels of GAM/K
| Zmax | True pos. | False pos. | False neg. | True neg. | Sensitivity | Specificity | PPV |
| 0.246340 | 4 | 0 | 1011 | 7672 | 0.39% | 100.00% | 100.00% |
| 0.145277 | 52 | 0 | 963 | 7672 | 5.12% | 100.00% | 100.00% |
| 0.06948 | 93 | 6 | 922 | 7666 | 9.16% | 99.92% | 93.94% |
| 0.031582 | 137 | 38 | 878 | 7634 | 13.50% | 99.50% | 78.29% |
| 0.012633 | 193 | 267 | 822 | 7405 | 19.01% | 96.52% | 41.96% |
| 0.007500 | 397 | 645 | 626 | 7019 | 38.81% | 91.58% | 38.10% |
| 0.003790 | 488 | 1068 | 535 | 6596 | 47.70% | 86.06% | 31.36% |
| 0.000632 | 649 | 1500 | 374 | 6164 | 63.44% | 80.43% | 30.20% |
| 0.000001 | 669 | 2368 | 354 | 5296 | 65.40% | 69.10% | 22.03% |
| 0.000000 | 1023 | 7664 | 0 | 0 | 100.00% | 0.00% | 11.78% |


Figure 8:  Results of GAM/K at selected levels of Zmax

Significance map

Next, the significance of the rates (0.4-mile grid and 0.8-mile filter) was tested against 1,000 Monte Carlo simulations. These results were interpolated using a Triangulated Irregular Network (TIN) to create a Digital Elevation Model (DEM), the same method used for mapping the rates (Figure 5). Because testing the rates against 1,000 simulations is, like GAM/K, a form of exploratory spatial analysis, methods of representing the results are also discretionary, and the investigator can adjust the results based on the level of significance. Sensitivity, specificity, and positive predictive value are measured by classifying the population at risk in the same way as for the spatial scan statistic and GAM/K results. For example, if a significance level of 95% were applied, the specificity would be 99.84% and the positive predictive value would be 92.00%, but the sensitivity of the test would be only 13.53% (Figure 9, Table 5, Figure 10(A)). To achieve the best joint sensitivity and positive predictive value, the investigator can use the significance level at which the sensitivity and positive predictive value curves in Figure 9 intersect, which is 77.5%. A significance level of 77.5% results in a sensitivity of 35.20%, a specificity of 91.35%, and a positive predictive value of 35.13% (Table 5, Figure 10(C)). Similarly, to achieve the best joint sensitivity and specificity, the investigator can use the significance level at which the sensitivity and specificity curves in Figure 9 intersect, which is 45%. A significance level of 45% results in a sensitivity of 64.80%, a specificity of 63.58%, and a positive predictive value of 19.14% (Table 5, Figure 10(D)).
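One way to obtain the significance levels used above is to redistribute the 72 deaths at random among the 8,689 people at risk many times and record, for each filter, how often the simulated count falls below the observed count. The sketch assumes exactly that null model (deaths drawn uniformly without replacement); the rule that ties do not count in the filter's favour is also an assumption. Because each filter's population is fixed, comparing counts is equivalent to comparing rates. `monte_carlo_significance` is a hypothetical name.

```python
import random


def monte_carlo_significance(filter_members, n_people, observed, n_deaths,
                             n_sims=1000, seed=1):
    """Per-filter significance level (%) from random redistributions of deaths.

    filter_members: list of sets of person indices inside each filter.
    observed: observed death count in each filter.
    Null model (assumed): n_deaths people chosen uniformly without replacement.
    A filter's level is the percentage of simulations whose count is strictly
    below the observed count, so ties do not count in the filter's favour.
    """
    rng = random.Random(seed)
    below = [0] * len(filter_members)
    for _ in range(n_sims):
        dead = set(rng.sample(range(n_people), n_deaths))
        for k, members in enumerate(filter_members):
            if len(dead & members) < observed[k]:
                below[k] += 1
    return [100.0 * b / n_sims for b in below]
```

A filter whose observed count exceeds the simulated count in at least 950 of 1,000 runs would be reported at the 95% level; lowering that cut-off is what moves the significance map down the rows of Table 5.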

Figure 9: Results of Varying Significance of 1,000 Monte Carlo Simulations for Rates of 0.8 mile Spatial Filter
 
 
Table 5: Selected Results of Varying Significance of 1,000 Monte Carlo Simulations for Rates of 0.8 mile Spatial Filter
| Significance Level | True pos. | False pos. | False neg. | True neg. | Sensitivity | Specificity | PPV |
| 95% (A) | 138 | 12 | 882 | 7657 | 13.53% | 99.84% | 92.00% |
| 85% | 255 | 221 | 765 | 7448 | 25.00% | 97.12% | 53.57% |
| 77.5% (C) | 359 | 663 | 661 | 7006 | 35.20% | 91.35% | 35.13% |
| 60% | 559 | 1716 | 461 | 5953 | 54.80% | 77.62% | 24.57% |
| 55% | 598 | 1986 | 422 | 5683 | 58.63% | 74.10% | 23.14% |
| 45% (D) | 661 | 2793 | 359 | 4876 | 64.80% | 63.58% | 19.14% |
| 35% | 702 | 3901 | 318 | 3768 | 68.82% | 49.13% | 15.25% |
| 25% | 763 | 5324 | 257 | 2345 | 74.80% | 30.58% | 12.53% |
| 15% | 959 | 6633 | 61 | 1036 | 94.02% | 13.51% | 12.63% |
| 5% | 1019 | 7131 | 1 | 538 | 99.90% | 7.02% | 12.50% |


Figure 10:  Results of Significance Map (0.8 Mile Filter) at Selected Levels of Significance

Conclusions

By varying the Zmax level of the GAM/K results and the significance level of the results from the spatial scan statistic and the significance map, the investigator can affect the Type I and Type II error rates. If the investigator is more concerned with Type I errors than with Type II errors, then the maximum positive predictive value should be sought (Table 6). If the investigator wants to balance Type I and Type II errors, then a compromise between maximum sensitivity and maximum positive predictive value (Table 7, Figure 11) or between maximum sensitivity and maximum specificity (Table 8) is suggested. When attempting to maximize sensitivity and positive predictive value, GAM/K has the highest sensitivity (38.81%), but the spatial scan statistic has the highest specificity (99.91%) and PPV (56.25%) (Table 7). The measurements of the significance map method are close to those of GAM/K, and the differences may be too small to be significant. The significance map and GAM/K results are sensitive to the choice of interpolation method and would differ slightly if different interpolation algorithms were used. The spatial scan statistic's sensitivity in this study was unacceptably low. This most likely reflects the hypothesis it tests: the presence of one circular cluster in the study area. I propose to test Turnbull's method (1990) on the same cluster to test the hypothesis that the spatial scan statistic's poor results were due to the hypothesis it tests and not to its basic method.
 
 
Table 6: Comparison of the Three Cluster Detection Methods: Maximize Only Positive Predictive Value
| Test | Level | tp | fp | fn | tn | Sensitivity | Specificity | PPV |
| SaTScan | 0.995 | 9 | 7 | 1011 | 7662 | 0.88% | 99.91% | 56.25% |
| GAM/K | 0.1453 | 52 | 0 | 963 | 7672 | 5.12% | 100.00% | 100.00% |
| Significance Map | 95% | 138 | 12 | 882 | 7657 | 13.53% | 99.84% | 92.00% |

 
Table 7: Comparison of the Three Cluster Detection Methods: Maximize Sensitivity and Positive Predictive Value
| Test | Level | tp | fp | fn | tn | Sensitivity | Specificity | PPV |
| SaTScan | 0.995 | 9 | 7 | 1011 | 7662 | 0.88% | 99.91% | 56.25% |
| GAM/K | 0.007500 | 397 | 645 | 626 | 7019 | 38.81% | 91.58% | 38.10% |
| Significance Map | 77.5% | 359 | 663 | 661 | 7006 | 35.20% | 91.35% | 35.13% |


Figure 11: Comparison of GAM/K and Significance Map: Maximize Sensitivity and Positive Predictive Value
 
 
Table 8: Comparison of the Three Cluster Detection Methods: Maximize Sensitivity and Specificity
| Test | Level | tp | fp | fn | tn | Sensitivity | Specificity | PPV |
| SaTScan | 0.995 | 9 | 7 | 1011 | 7662 | 0.88% | 99.91% | 56.25% |
| GAM/K | 0.000001 | 669 | 2368 | 354 | 5296 | 65.40% | 69.10% | 22.03% |
| Significance Map | 45% | 661 | 2793 | 359 | 4876 | 64.80% | 63.58% | 19.14% |

Compared with the IARC results, none of the three methods in this study performed as well on this sinuous cluster as GAM/K did on circular clusters (sensitivity 80%, PPV 87%, Alexander and Boyle 1996(2)). The circular clusters in the IARC study were modeled on cancer, which is a rarer disease than infant mortality, the disease on which the cluster in this study was modeled.  At present, the spatial scan statistic has an unsatisfactory sensitivity (0.88%) when attempting to detect a sinuous cluster, and GAM/K and the spatial filter method have low positive predictive values when attempting to detect a sinuous cluster (Tables 7 and 8).

The spatial filter method's results might improve if the filter were spatially adaptive (Silverman 1986). In practice, how does one create an a priori conceivable risk region? One should not commit to a single size or shape of filter. For example, the filters could be ellipses: the investigator could vary the lengths of the major and minor axes as well as the rotation of the elliptical filter (Figure 12). Although this process is more computationally demanding than using circular filters, certain ellipse rotations and sizes would capture an area that more closely reflects the area of a sinuous cluster. Similar variations on a rectangle might also prove fruitful.
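As a sketch of how such an elliptical filter could be parameterized, the Python code below tests whether a point lies inside an ellipse with a given center, semi-axes and rotation, enumerates a family of filters by varying axis lengths and rotation, and computes the rate inside one filter. The function names, the parameter grid, the planar coordinates, and the `(x, y, population, cases)` input format are all illustrative assumptions, not the implementation used in this study.

```python
import math
from itertools import product

def in_ellipse(px, py, cx, cy, a, b, theta):
    """True if (px, py) lies inside the ellipse centred at (cx, cy) with
    semi-major axis a, semi-minor axis b, rotated theta radians from the x-axis."""
    dx, dy = px - cx, py - cy
    # Rotate the point by -theta so the ellipse's axes align with x and y
    u = dx * math.cos(theta) + dy * math.sin(theta)
    v = -dx * math.sin(theta) + dy * math.cos(theta)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

def filter_family(semi_major, semi_minor, n_angles):
    """Enumerate (a, b, theta) filter shapes. Only a >= b is kept, and angles
    span [0, pi) because an ellipse rotated by pi is the same ellipse."""
    angles = [k * math.pi / n_angles for k in range(n_angles)]
    return [(a, b, t) for a, b, t in product(semi_major, semi_minor, angles) if a >= b]

def filter_rate(points, cx, cy, a, b, theta):
    """Cases per 1,000 at-risk people inside one elliptical filter.
    points: iterable of (x, y, population, cases) tuples (hypothetical format)."""
    pop = cases = 0
    for x, y, p, c in points:
        if in_ellipse(x, y, cx, cy, a, b, theta):
            pop += p
            cases += c
    return 1000.0 * cases / pop if pop else None

# Three semi-major lengths, two semi-minor lengths, eight rotations
shapes = filter_family([0.5, 1.0, 2.0], [0.25, 0.5], 8)
print(len(shapes), "filter shapes per grid point")
```

The number of filters evaluated is the product of grid points, shapes and rotations, which is the added computational burden relative to circles; a circle corresponds to the special case a = b, where rotation has no effect.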


Figure 12: Possible orientations and sizes of a modifiable spatial filter

Would a researcher have been able to detect the sinuous cluster using one or a combination of the spatial scan statistic, GAM/K, or the significance map, but without prior knowledge of the cluster? For example, even if a researcher had no a priori knowledge that the railroad might be a hazard, the results from the significance map at the 95% and 90% significance levels may have been sufficient to warrant further tests (Figure 10), as might the results from GAM/K at Zmax = 0.0075 (Figure 8). Would a researcher looking at the results of the significance map at the 90% significance level (Figure 10(B)) have noticed that four of the five potential clusters lie within one mile of the railroad line? Any subsequent test must be carefully constructed so as not to use "gerrymandered" regions. Throwing caution and good sense to the wind, one could use the results of GAM/K or the significance map alone to construct a non-rational risk region that would be found statistically significant in a subsequent focused test. However, one can test the rates in rational risk regions, such as spatially adaptive filters. One could use rational regions of varying shape or size, such as watersheds, or other logical linear features in the study area, such as buffers of highways (Figure 13, Table 9). One could then use Monte Carlo simulation to estimate how often the rate in the filter would occur by chance under the null hypothesis. Accordingly, we propose a hierarchical form of analysis in which exploratory analysis leads to hypothesized hazards. Subsequently, focused tests, such as Diggle's method (1990), are performed on the hypothesized hazards and not solely on the original data.
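The Monte Carlo step can be sketched as follows, under the simplifying assumption that, under the null hypothesis of equal risk, each of the 72 deaths falls inside a candidate region with probability equal to that region's share of the 8,689 people at risk (no age or other adjustment). The function name `monte_carlo_p`, the number of simulations, and the seed are illustrative choices rather than the procedure used in this study.

```python
import random

def monte_carlo_p(region_pop, total_pop, total_cases, observed, n_sims=9999, seed=1):
    """Estimate P(cases in region >= observed) under the null hypothesis of
    equal risk everywhere, by redistributing total_cases at random."""
    rng = random.Random(seed)
    share = region_pop / total_pop  # chance that a single case falls in the region
    extreme = 0
    for _ in range(n_sims):
        simulated = sum(1 for _ in range(total_cases) if rng.random() < share)
        if simulated >= observed:
            extreme += 1
    # Count the observed data as one of the realizations (a common convention)
    return (extreme + 1) / (n_sims + 1)

# Railroad hot spot: 1,020 at risk, 16 of the 72 deaths observed inside it
print(monte_carlo_p(1020, 8689, 72, 16))
# River buffer: 642 at risk, 7 deaths observed inside it
print(monte_carlo_p(642, 8689, 72, 7))
```

Because the total number of deaths is held fixed at 72, this is a binomial (conditional) null rather than the Poisson model behind Table 9, so its estimates will be close to, but not identical with, the probabilities reported there.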


Figure 13: Spatially Adaptive Linear Features
 
Table 9: Results of Spatially Adaptive Features
| Area | Population | Rate (per 1,000) | Expected | Observed | Poisson probability |
|---|---|---|---|---|---|
| Hot spot | 1020 | 15.7 | 8.466 | 16 | < 0.01 |
| Non-hot spot | 7669 | 7.3 | 63.653 | 56 | 0.186 |
| I-235 buffer | 1406 | 10.0 | 11.651 | 14 | 0.197 |
| River buffer | 642 | 10.9 | 5.320 | 7 | 0.169 |
| Neighborhood A | 1856 | 9.2 | 15.379 | 17 | 0.284 |
| Study Area | 8689 | 8.3 | 72 | 72 | |
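The Poisson probabilities in Table 9 can be checked with a few lines of Python. Recomputing them suggests (this is our reading; the text does not define the column) that for the I-235 and river buffers the table reports P(X > observed), that is, one minus the Poisson cumulative probability at the observed count: this reproduces 0.197 and 0.169. The more common upper-tail convention, P(X ≥ observed), gives larger values. The helper names below are illustrative.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), building each term from the previous one
    to avoid computing large factorials."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        total += term
    return total

def prob_exceeds(observed, expected):
    """P(X > observed) under a Poisson model with mean `expected`."""
    return 1.0 - poisson_cdf(observed, expected)

def prob_at_least(observed, expected):
    """P(X >= observed), the conventional upper-tail p-value."""
    return 1.0 - poisson_cdf(observed - 1, expected)

# (area, expected, observed) taken from Table 9
for area, expected, observed in [("I-235 buffer", 11.651, 14), ("River buffer", 5.320, 7)]:
    print(f"{area}: P(X > obs) = {prob_exceeds(observed, expected):.3f}, "
          f"P(X >= obs) = {prob_at_least(observed, expected):.3f}")
```

Which tail convention is used matters near the 0.05 boundary, as in the 1.5-mile railroad buffer of Table 10, so it is worth stating explicitly when such tables are reported.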

If the researcher tested the significance of the observed cases in a one-mile buffer of the railroad, she would find significance at the 0.05 level (Table 10). Similarly, the researcher could perform a variety of focused tests on buffers of the railroad. However, the alpha levels of these tests would have to be adjusted for multiple comparisons.
 
Table 10: Results of Variable Railroad Filters
| Filter size (mi) | Population | Rate (per 1,000) | Expected | Observed | Probability |
|---|---|---|---|---|---|
| 0.5 | 1020 | 15.7 | 8.466 | 16 | < 0.01 |
| 0.8 | 1604 | 14.3 | 13.291 | 23 | < 0.01 |
| 1.0 | 2087 | 12.4 | 17.294 | 26 | 0.018 |
| 1.5 | 3206 | 10.9 | 26.566 | 35 | 0.047 |
| Study Area | 8689 | 8.3 | 72 | 72 | |
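The multiple-comparison adjustment for the railroad buffers can be illustrated with a Bonferroni correction. This is one common choice, and a conservative one here because the nested buffers yield positively correlated tests; the paper does not say which adjustment it would use. The sketch enters the "< 0.01" entries of Table 10 as 0.01, their upper bound, and the function name `bonferroni` is illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (per-test threshold, list of significance decisions) for a
    family of tests under the Bonferroni correction."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]

# Railroad buffers from Table 10; '< 0.01' is entered as 0.01 (an upper bound)
buffers = [0.5, 0.8, 1.0, 1.5]
p_values = [0.01, 0.01, 0.018, 0.047]

threshold, decisions = bonferroni(p_values)
print(f"per-test alpha = {threshold}")
for size, p, significant in zip(buffers, p_values, decisions):
    print(f"{size} mi buffer: p = {p} -> {'significant' if significant else 'not significant'}")
```

With four buffers the per-test level becomes 0.05 / 4 = 0.0125, so under this correction the 0.5- and 0.8-mile buffers remain significant while the 1.0- and 1.5-mile buffers, significant individually at 0.05, no longer are.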

Cluster detection tests that use circles as their shapes of analysis can detect sinuous clusters if the tests are used for exploratory analysis rather than for confirmatory hypothesis testing. Researchers can then construct hypotheses about potential hazards and test them by applying focused cluster detection tests to the hypothesized regions. The contribution of this paper is to report the sensitivity, specificity, and positive predictive value of circle-based cluster detection methods on a sinuous cluster. Shapes of analysis (filters) should have higher power when they coincide with the shapes and sizes of potential clusters. Methods should not depend on a single shape, but some shapes might prove more successful than others in detecting certain types of clusters.

References

Abler, R., Adams, J.S., and Gould, P. 1971. Spatial organization: the geographer's view of the world. Englewood Cliffs, N.J.: Prentice-Hall.

Alexander, F.E. and Boyle, P. 1996 (1). Introduction. In Methods for Investigating Localized Clustering of Disease, ed. F.E. Alexander and P. Boyle. Lyon, France: International Agency for Research on Cancer.

Alexander, F.E. and Boyle, P. 1996 (2). Overview of Results. In Methods for Investigating Localized Clustering of Disease, ed. F.E. Alexander and P. Boyle. Lyon, France: International Agency for Research on Cancer.

Alexander, F.E. and Boyle, P. 1996 (3). Editorial Comments. In Methods for Investigating Localized Clustering of Disease, ed. F.E. Alexander and P. Boyle. Lyon, France: International Agency for Research on Cancer.

Alexander, F.E. and Cuzick, J. 1992. Methods for the assessment of disease clusters. In Geographical and Environmental Epidemiology: Methods for Small-Area Studies, ed. P. Elliott, J. Cuzick, D. English, and R. Stern. Oxford: Oxford University Press.

Alexander, F.E., Williams, J., Maisonneuve, P., and Boyle, P. 1996. The simulated data-sets. In Methods for Investigating Localized Clustering of Disease, ed. F.E. Alexander and P. Boyle. Lyon, France: International Agency for Research on Cancer.

Barker, D.J. and Osmond, C. 1986. Infant mortality, childhood nutrition, and ischaemic heart disease in England and Wales. Lancet 1 (8489): 1077-1081.

Besag, J. and Newell, J. 1991. The detection of clusters in rare diseases. Journal of the Royal Statistical Society, Series A 154(1): 143-155.

Boyle, P., Walker, A.M., and Alexander, F.E. 1996. Historical aspects of leukaemia clusters. In Methods for Investigating Localized Clustering of Disease, ed. F.E. Alexander and P. Boyle. Lyon, France: International Agency for Research on Cancer.

Burrough, P. A. 1986. Principles of geographical information systems for land resources assessment. Oxford: Clarendon Press.

Caliper Corporation. 1994. TransCAD. Newton, MA.

Chakraborty, J. and Armstrong, M.P. 1995. A composite plume approach to assessing community vulnerability to hazardous material accidents. Proceedings of the Urban and Regional Information Systems Association Annual Conference (Washington, D.C.), Vol. 1, 249-261.

Choynowski, M. 1959. Maps based on probabilities. Journal of the American Statistical Association 54: 385-388.

Cliff, A.D. and Haggett, P. 1996. The impact of GIS on epidemiological mapping and modelling. In Spatial Analysis: Modelling in a GIS Environment, ed. P. Longley and M. Batty, 321-343. Cambridge: John Wiley & Sons.

Cuzick, J. and Edwards, R. 1996. Clustering methods based on k nearest neighbour distributions. In Methods for Investigating Localized Clustering of Disease, ed. F.E. Alexander and P. Boyle. Lyon, France: International Agency for Research on Cancer.

Diggle, P.J. 1990. A point process modelling approach to raised incidence of a rare phenomenon in the vicinity of a prespecified point. Journal of the Royal Statistical Society, Series A 153: 349-362.

Gordis, L. 1996. Epidemiology. Philadelphia: W.B. Saunders.

Gould, P. 1993. The slow plague: a geography of the AIDS pandemic. Oxford: Blackwell Publishers.

Hill, E.G., Ding, L., and Waller, L.A. 2000. A comparison of three tests to detect general clustering of a rare disease in Santa Clara County, California. Statistics in Medicine 19: 1363-1378.

Kulldorff, M. and Nagarwalla, N. 1995. Spatial disease clusters: detection and inference. Statistics in Medicine 14: 799-810.

Mayer, J.D. 1982. Relations between two traditions of medical geography: health systems planning and geographical epidemiology. Progress in Human Geography 216-230.

Neutra, R., Swan, S., and Mack, T. 1992. Clusters galore: insights about environmental clusters from probability theory. The Science of the Total Environment 127: 187-200.

Openshaw, S., Charlton, M., Craft, A.W., Birch, J.M. 1988. Investigation of leukaemia clusters by use of a geographical analysis machine. Lancet 272-273.

Openshaw, S. 1996a. Using a geographical analysis machine to detect the presence of spatial clustering and the location of clusters in synthetic data. In Methods for Investigating Localized Clustering of Disease, ed. F.E. Alexander and P. Boyle. Lyon, France: International Agency for Research on Cancer.

Openshaw, S. 1996b. Editorial Comments: The GAM-K. In Methods for Investigating Localized Clustering of Disease, ed. F.E. Alexander and P. Boyle. Lyon, France: International Agency for Research on Cancer.

Openshaw, S., Turner, A., Turton, I., and Macgill, J. 1999 (1). Testing space-time and more complex hyperspace geographical analysis tools. Online at <http://www.ccg.leeds.ac.uk/smart/hyper.html>

Openshaw, S., Turton, I., Macgill, J. 1999 (2). Using the geographical analysis machine to analyze limiting long-term illness census data. Geographical & Environmental Modeling 3 (1): 83-99.

Rosner, B. 1995. Fundamentals of Biostatistics, 4th ed. Belmont, CA: Duxbury Press.

Rothman, K.J. 1990. Keynote Presentation: A Sobering Start for the Cluster Busters' Conference. American Journal of Epidemiology 132 (Supp.): S6-S13.

Rushton, G. and Lolonis, P. 1996. Exploratory spatial analysis of birth defect rates in an urban population. Statistics in Medicine 15: 717-726.

Silverman, B.W. 1986. Density estimation for statistics and data analysis. New York: Chapman and Hall.

Sun, Y. 2000. Template Shapes and Crime "Hot Spots" in Buffalo - A GIS Approach. Paper presented at the Association of American Geographers Annual Meeting, Pittsburgh, PA.

Tagnon, I., Blot, W.J., Stroube, R.B., Day, N.E., Morris, L.E., Peace, B.B., and Fraumeni, J.F. Jr. 1980. Mesothelioma associated with the shipbuilding industry in coastal Virginia. Cancer Research 40: 3875-3879.

Tango, T. 2000. A test for spatial disease clustering adjusted for multiple testing. Statistics in Medicine 19: 191-204.

Turnbull, B.W., Iwano, E.J., Burnett, W.S., Howe, H.L., and Clark, L.C. 1990. Monitoring for clusters of disease: application to leukemia incidence in upstate New York. American Journal of Epidemiology 132: S136-S143.

University of Iowa. 1997. Demo: Spatial Analysis of Health Data. <http://www.uiowa.edu/~geog/health/index11.html>

Wartenberg, D. and Greenberg, M. 1990. Detecting disease clusters: the importance of statistical power. American Journal of Epidemiology 132: S156-S166.