Comparing Algorithms for Generating Ensembles to Detecting Gerrymandering

Jonas Eichenlaub, 2022, Bowdoin College
Faculty Mentors: Jack O’Brien (Bowdoin), Gregory Herschlag (Duke), Jonathan Mattingly (Duke)

[What follows is a short summary of work done as a summer research project in 2021. The longer report can be found here.]

Gerrymandering is a pervasive issue in American politics and over the last decade mathematicians have made notable contributions towards its detection and avoidance. One focus of these efforts has been using computers to generate an ensemble of potential legislative maps and then comparing a real map against this distribution of maps. Vote counts from recent elections can then be plugged into both the real and simulated maps to understand whether the number of districts a given party wins under the real map is statistically likely under the distribution of generated maps. In other words, if one party wins far more seats in the real map than the mean number of seats they win in the simulations, it is likely the real map is the result of gerrymandering. Since gerrymandering is often hard to precisely define, it is easy for policymakers to circumvent any one metric of how gerrymandered a district is; the strength of this technique is that it does not rely on a single metric, instead wholistically comparing the real map with the ensemble of possible maps.

A prevalent way of creating such an ensemble of simulated maps is with a recombination (or ReCom) algorithm, which involves encoding a state as a graph with voting precincts as nodes. The graph is partitioned into districts according to constraints and preferences within the algorithm to make sure that the edges that are cut to create connected components (partitions or districts) with roughly equal populations, minimal municipality splitting, and whatever other legal criteria legislative maps must meet. By cutting different edges of the original graph, different maps are formed, which collectively create an ensemble of possible maps. Duke Professors Jonathan Mattingly and Gregory Herschlag have proposed formalizing such an approach by placing a recombination algorithm into a Metropolis-Hastings Markov Chain, where the proposal of a new map is separated from the acceptance of that proposal into the ensemble. Instead of making decisions about how maps are added to the ensemble based on the structure of the algorithm, Mattingly and Herschlag’s work applies the existing mathematical theory on Metropolis-Hastings chains to justify their approach. Each map represents a stage in the chain, and new maps are proposed with the merge-split technique described above. Maps are then added to the chain based on an acceptance probability that quantifies how well the proposed map fits within the legal criteria, which is designed to be changeable to meet the specific policy parameters of different states. By making the choices within the algorithm explicit, Mattingly and Herschlag offer an approach to detecting gerrymandering that is more defensible when used as evidence in legal challenges to gerrymandered maps.
Part of their contribution is to sample maps from possible partitions of a state rather than from possible spanning trees of the graph of the state, since each partition of a state can be formed by a different number of spanning tree configurations. My research this summer focused on determining whether sampling from partitions generates different ensembles of maps than that from the spanning tree method; more specifically, I investigated whether algorithms that sample from the space of spanning trees are less likely to generate maps that draw district lines along the border between areas with high and low population density (i.e., along the borders of cities). I ran a version of Prof. Mattingly and Herschlag’s algorithm that sampled from the partition space and a version that sampled from the spanning tree space on a variety of “test states” that I constructed. I represented each test state simply as a lattice graph, with a few nodes on the lattice replaced with smaller, denser lattice graphs representing cities. The “city” nodes along the edge of the denser lattice all had edges connecting to the same neighboring “noncity” node, which meant that drawing a district line between these two areas required cutting far more edges than drawing it between other sections of the graph (which normally would only require one cut); the motivating idea behind my investigation was that the spanning tree algorithm views this as costlier and is less likely to select maps with district lines the run between the city and noncity nodes. I varied the placement, size, and number of these “cities” to gain insight into how often each of the two versions of the algorithm cut the edges between the city and noncity nodes.

Notably, I created a test city that was a 9×9 grid, with the center node replaced by a 4×4 city. I ran both versions of the algorithm on this test state with 3, 4, 6, 8, and 12 districts. For each of the two ensembles, I then count the number of times that edges of the graph that connect a node in a city to another node fell on a legislative district border across all maps in that ensemble; this number is then divided by a tally of the total number of edges that fell on district borders. This statistic is the fraction of edges that cross district borders that go from city to anywhere that looks at the proportion of edges that are cut which connect one city node to either a noncity node or another city node. The difference between the results of this statistic from running the two algorithms on each scenario is reported in the table below, with positive results indicating it is higher under the gamma equals one algorithm.

Number of Districts346812
Ideal District Pop320002400016000120008000

I found that the algorithm that samples uniformly from spanning trees is less likely to make districts that have edges along the border of cities, but only under certain situations. The two main factors that determine whether such bias appears in an ensemble are the district population size and the city population size. When the city population fits perfectly within a single or multiple districts, as is the case when there are 4 districts in the above test state, the spanning tree algorithm is significantly more likely to create maps that do not break up cities compared to the partition-space algorithm; hence, a hidden bias in the original merge-split approach is evident. However, if the city population does not fit evenly into one or more districts, as is the case when there are 6, 8, or 12 districts, both algorithms are forced to break up the city and there is not a notable difference between the ensembles they generate. These results reinforce why it is important to use Mattingly and Herschlag’s approach, as their formalized algorithm does not automatically keep municipalities within the same district in a hidden and uncontrollable manner. Instead, it allows for policymakers to explicitly decide how much they want to preserve cities by incorporating this preference into the acceptance probability. Given that the redistricting process from the 2020 Census is currently underway, this insight could find application in the upcoming legal challenges to the inevitably gerrymandered maps that politicians propose.

Evaluating the Enacted Georgria Congressional Redistricting Plan

We analyze the 2022 enacted Georgia congressional plan. We compare this districting plan with an ensemble generated from non-partisan redistricting policies. We find that (i) the districts are atypically polarized which leads to highly non-responsive elections (typically electing 5 Democratic and 9 Republican representatives), and that (ii) Democrats have been packed in the urban Atlanta area of the state and cracked from the North-Eastern part of the state, in both cases reducing their overall voting power.

The details of our findings are available here.

Evaluating the Enacted North Carolina Redistricting Plans

We analyze the enacted North Carolina congressional plan (CST-13), the enacted North Carolina House General Assembly plan (SL 2021-175), and the currently proposed North Carolina Senate General Assembly plan (SST-13).

We compare these three districting plans with the ensembles we generated from non-partisan redistricting policies. See here for the congressional analysis and here for the general assembly analysis; these documents contain information on the non-partisan criteria considered along with the resulting ensemble of maps.

A comparison of the enacted CST-13 congressional map with a non-partisan ensemble of maps under a range of historic elections.

As an example, we look at the number of elected Democrats that would have occurred under a range of historic elections.  Under each election, we compare the ensemble of non-partisan maps with the enacted map.  The enacted congressional plan elects 4 Democratic representatives under a large range of statewide Democratic vote fractions.  This level of non-responsiveness is not seen in the ensemble of plans.

For detailed analysis, see our reports:

Nonpartisan Distribution on NC Congressional Plan using 2020 Census.

We have performed an analysis of the geopolitical landscape of the North Carolina State using the 2020 Census data. We provide a
summary plot derived from an explicit distribution on redistricting
plans. The distribution favors plans with compact districts which keep counties intact.

As the character of the votes considered swing from more Republican to more Democrat, we see a gradual increase in the number of seats won by the democratic party. This responsiveness to the changing opinion of the electorate is consistent with what has been observed when maps are drawn without partisan considerations; either by sampling or by bipartisan committees.

A more detailed analysis can be found in “Analysis of the Geopolitical Landscape for the 2020 North Carolina Congressional Districts.

The details of the distributions are given in the accompanying
document “Methods used to Analyze 2020 North Carolina State Congressional Redistricting Landscape.”

The Geopolitical Landscape of the North Carolina General Assembly

We have investigated two possible policies for the current redistricting cycle in North Carolina.  Both policies respect the county cluster rule, respect one-person-one-vote, and prioritize compact districting plans.  One of the policies preserves municipalities when redistricting and the other does not consider municipal preservation.  This is in response to the North Carolina Joint Redistricting Committee’s criteria in which they state they “may” consider.

We use these policies to generate two probability distributions (i.e. we quantify preferences between redistricting plans), and we reveal the typical partisan behavior of non-partisan maps drawn according to these policies.

We outline our methods in this document.

We then reveal the expected partisan behavior of the State House and State Senate under a variety of historic elections.

The expected behavior may be compared with the enacted and proposed plans.

We have also released the underlying data used to construct our figures in the above document.

Greg Herschlag and Jonathan Mattingly

North Carolina’s 2020 County Clusters for the General Assembly

As discussed in our previous post, county clusters are used in the North Carolina General Assembly districts to reduce splitting counties. With the release of the 2020 census data, we can now establish all possible county clusters.

We report our findings in a document written with Christopher Cooper (Western Carolina University), Blake Esselstyn (FrontWater LLC and Mapfigure Consulting), and Rebecca Tippett (Carolina Demography, UNC at Chapel Hill).

The document is linked HERE.

To summarize, we find that 36 of 50 state Senate districts will reside county clusters that are now completely determined.  The remaining districts reside in four regions of the state, each with two possible choices of county clusters.  When a county cluster only contains a single district, the district is the cluster.  In the Senate, ten of the 50 districts are determined via clustering.

In the state House, 107 of 120 districts will reside in county clusters that are completely determined.  The remaining districts reside in three regions of the state, each with two possible choices of county clusters.  Eleven of the 120 districts are determined via clustering.

We have also examined incumbency within the new clusters.  Unless incumbents change address, we find that 4 Senate districts must contain two incumbents (i.e. 4 districts will “double bunk” incumbents).  We also find that 5 House districts must contain two incumbents.

We stress that the county clustering process adds a constraint to the redistricting process.  Largely, the county clusters do not determine districts and there is still room for the legislature to draw either fair or gerrymandered maps.

Gregory Herschlag
Jonathan Mattingly

County Clustering: Looking towards the 2020 Census


The decennial redistricting cycle is an important moment in our democracy. Drawing districts can have a dramatic effect on how our elections are interpreted and the people’s ability to hold their representatives accountable.

Today, roughly 1 month before the 2020 Census is release, we and 3 other North Carolinians are releasing an analysis around the coming 2020 Census and how it will affect the county clusters used to draw the legislative districts for the N.C. State Legislature.

The study can be found HERE

For more information on what county clusters are and how they affect districting, see here and here.

We do not intend our analysis to have much predictive value.  A main takeaway from the study is that the clusters are very sensitive to fluctuations in the county populations; the official county populations will not be released until mid-august.  Nonetheless, it is clear that the current county clusters will change in the coming redistricting cycle and that there will likely be multiple options for which clusters we implement. We hope that our note will inform public discussion around this process.

Quoting from the study, the main takeaways are:

    1. We expect that county clusters based on the 2020 Census population will be very different from those used in the last decade.
    2. There is significant variation in the county clusters across the different population estimates for 2020. Hence, caution should be exercised when drawing conclusions from the specifics of the maps we have included. We note the few clusters which seem to be stable across the population estimates.
    3. Using the 2010 population figures we found 2 possible clusterings for the N.C. Senate. The number of clusterings for the 2020 population estimates ranged from 12 to 33.
    4. Using the 2010 population figures we found 4 possible clusterings for the N.C. House. The number of clusterings for the 2020 population estimates ranged from 2  to 16.

Some members of the study team will be releasing further commentary on what we have seen here during the coming month.  We expect to release a similar discussion once the official 2020 Census population figures are released.  Please watch this site and the following venues for further commentary:

Jonathan Mattingly
Gregory Herschlag

Guidance on the Ensemble Method for Analysing Redistricting Plans

[TLDR : See this PDF for guidance:  Ensemble Method Guidance ]

As the country gears up for the next redistricting cycle we have attempted to collect our thoughts and guidance on using the ensemble method to critique a particular redistricting plan. This summarizes our current thinking stretching back to 2013-2014 as one of the originators of the Ensemble Method; it reflects our experience gained through providing expert guidance, including live testimony, to the court in a number of cases.

The document begins with the following summary:

The ensemble of maps translates the specific redistricting design criteria into an understanding of what a redistricting plan should look like if the specified criteria are followed without ulterior motive. The criteria are formalized in a distribution on redistricting plans from which a sufficiently rich ensemble is drawn to capture the properties of the distribution. If the criteria are non-partisan then the ensemble gives a non-partisan normative collection against which other plans can be compared for their partisan and non-partisan properties.

The rest of the text can be found in this PDF document: Ensemble Method Guidance.

A slide deck on ensemble analysis, indices and county clusters

With the upcoming redistricting cycle, there are a number of questions about how ensembles can, will, and should be used to audit and restrict gerrymandering.

We have prepared a slide deck illuminating how the ensembles have been used to legally challenge plans during the last redistricting cycle.  We have also examined some of the new indices that are being put forward as measures of gerrymandering.  Under a certain set of votes, it is possible to “game” these indices, in the sense that plans can be typical of an ensemble according to multiple indices while still being gerrymanders.

Finally, for the North Carolina general assembly, we demonstrate the fragility and possible choices when it comes to county clustering.

The slides are available here.