Comparing Algorithms for Generating Ensembles to Detecting Gerrymandering

Jonas Eichenlaub, 2022, Bowdoin College
Faculty Mentors: Jack O’Brien (Bowdoin), Gregory Herschlag (Duke), Jonathan Mattingly (Duke)

[What follows is a short summary of work done as a summer research project in 2021. The longer report can be found here.]

Gerrymandering is a pervasive issue in American politics and over the last decade mathematicians have made notable contributions towards its detection and avoidance. One focus of these efforts has been using computers to generate an ensemble of potential legislative maps and then comparing a real map against this distribution of maps. Vote counts from recent elections can then be plugged into both the real and simulated maps to understand whether the number of districts a given party wins under the real map is statistically likely under the distribution of generated maps. In other words, if one party wins far more seats in the real map than the mean number of seats they win in the simulations, it is likely the real map is the result of gerrymandering. Since gerrymandering is often hard to precisely define, it is easy for policymakers to circumvent any one metric of how gerrymandered a district is; the strength of this technique is that it does not rely on a single metric, instead wholistically comparing the real map with the ensemble of possible maps.

A prevalent way of creating such an ensemble of simulated maps is with a recombination (or ReCom) algorithm, which involves encoding a state as a graph with voting precincts as nodes. The graph is partitioned into districts according to constraints and preferences within the algorithm to make sure that the edges that are cut to create connected components (partitions or districts) with roughly equal populations, minimal municipality splitting, and whatever other legal criteria legislative maps must meet. By cutting different edges of the original graph, different maps are formed, which collectively create an ensemble of possible maps. Duke Professors Jonathan Mattingly and Gregory Herschlag have proposed formalizing such an approach by placing a recombination algorithm into a Metropolis-Hastings Markov Chain, where the proposal of a new map is separated from the acceptance of that proposal into the ensemble. Instead of making decisions about how maps are added to the ensemble based on the structure of the algorithm, Mattingly and Herschlag’s work applies the existing mathematical theory on Metropolis-Hastings chains to justify their approach. Each map represents a stage in the chain, and new maps are proposed with the merge-split technique described above. Maps are then added to the chain based on an acceptance probability that quantifies how well the proposed map fits within the legal criteria, which is designed to be changeable to meet the specific policy parameters of different states. By making the choices within the algorithm explicit, Mattingly and Herschlag offer an approach to detecting gerrymandering that is more defensible when used as evidence in legal challenges to gerrymandered maps.
Part of their contribution is to sample maps from possible partitions of a state rather than from possible spanning trees of the graph of the state, since each partition of a state can be formed by a different number of spanning tree configurations. My research this summer focused on determining whether sampling from partitions generates different ensembles of maps than that from the spanning tree method; more specifically, I investigated whether algorithms that sample from the space of spanning trees are less likely to generate maps that draw district lines along the border between areas with high and low population density (i.e., along the borders of cities). I ran a version of Prof. Mattingly and Herschlag’s algorithm that sampled from the partition space and a version that sampled from the spanning tree space on a variety of “test states” that I constructed. I represented each test state simply as a lattice graph, with a few nodes on the lattice replaced with smaller, denser lattice graphs representing cities. The “city” nodes along the edge of the denser lattice all had edges connecting to the same neighboring “noncity” node, which meant that drawing a district line between these two areas required cutting far more edges than drawing it between other sections of the graph (which normally would only require one cut); the motivating idea behind my investigation was that the spanning tree algorithm views this as costlier and is less likely to select maps with district lines the run between the city and noncity nodes. I varied the placement, size, and number of these “cities” to gain insight into how often each of the two versions of the algorithm cut the edges between the city and noncity nodes.

Notably, I created a test city that was a 9×9 grid, with the center node replaced by a 4×4 city. I ran both versions of the algorithm on this test state with 3, 4, 6, 8, and 12 districts. For each of the two ensembles, I then count the number of times that edges of the graph that connect a node in a city to another node fell on a legislative district border across all maps in that ensemble; this number is then divided by a tally of the total number of edges that fell on district borders. This statistic is the fraction of edges that cross district borders that go from city to anywhere that looks at the proportion of edges that are cut which connect one city node to either a noncity node or another city node. The difference between the results of this statistic from running the two algorithms on each scenario is reported in the table below, with positive results indicating it is higher under the gamma equals one algorithm.

Number of Districts346812
Ideal District Pop320002400016000120008000
DeltaCity-to-anywhere0.136720.287270.065280.062110.02969

I found that the algorithm that samples uniformly from spanning trees is less likely to make districts that have edges along the border of cities, but only under certain situations. The two main factors that determine whether such bias appears in an ensemble are the district population size and the city population size. When the city population fits perfectly within a single or multiple districts, as is the case when there are 4 districts in the above test state, the spanning tree algorithm is significantly more likely to create maps that do not break up cities compared to the partition-space algorithm; hence, a hidden bias in the original merge-split approach is evident. However, if the city population does not fit evenly into one or more districts, as is the case when there are 6, 8, or 12 districts, both algorithms are forced to break up the city and there is not a notable difference between the ensembles they generate. These results reinforce why it is important to use Mattingly and Herschlag’s approach, as their formalized algorithm does not automatically keep municipalities within the same district in a hidden and uncontrollable manner. Instead, it allows for policymakers to explicitly decide how much they want to preserve cities by incorporating this preference into the acceptance probability. Given that the redistricting process from the 2020 Census is currently underway, this insight could find application in the upcoming legal challenges to the inevitably gerrymandered maps that politicians propose.