Country for Old Men

For 140 years the oldest inaugurated U.S. President was "Old Granny" William Henry Harrison, age 68.

It turns out this was too old for 1841, and he succumbed to a bacterial infection from the White House water supply his first month in office.

Doctors misdiagnosed him with pneumonia, which people attributed to his long, cold inauguration, and the White House kept up the tradition of pumping its water from downstream of the local poop swamp, poisoning first families for another 20 years.

Frankly, I'm not sure doctors back then were very smart, and their whimsical approach to medicine (not to mention hygiene) probably killed more patients than many diseases. As McHugh and Mackowiak (2014) note, the laudanum Harrison's physician administered "might have converted a serious illness into a fatal one."

After holding for nearly a century and a half, Harrison's record has been broken only four times (including all of the last three elections, if you include the president-elect).

fig 0: Record-breaking inaugurations

Assuming he's still alive, Trump will be 78 in January 2025, beating Biden's record (78 in 2021) by 159 days.

N.B. This doesn't count presidents who aged into the oldest president record. By that metric Harrison's record was broken by Buchanan in 1860.

N.B. This chart code assumes there are no leap years and every year is exactly 365.24 days long. Don't worry about it.
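The 159-day gap is easy to check with plain date arithmetic. This is a minimal sketch, not the chart code itself: the birth and inauguration dates are typed in by hand, the helper name is made up, and 365.24 is the same flat year length the note above mentions.

```python
from datetime import date

DAYS_PER_YEAR = 365.24  # flat year length, as in the note above

# (birth date, inauguration date), entered by hand for this sketch
PRESIDENTS = {
    "W. H. Harrison": (date(1773, 2, 9), date(1841, 3, 4)),
    "Biden": (date(1942, 11, 20), date(2021, 1, 20)),
    "Trump (2nd term)": (date(1946, 6, 14), date(2025, 1, 20)),
}

def age_in_days(birth, inauguration):
    """Exact number of days between birth and inauguration."""
    return (inauguration - birth).days

ages = {name: age_in_days(*dates) for name, dates in PRESIDENTS.items()}
gap = ages["Trump (2nd term)"] - ages["Biden"]

for name, days in ages.items():
    print(f"{name}: {days} days, about {days / DAYS_PER_YEAR:.2f} years")
print(f"Trump minus Biden: {gap} days")
```

The gap is computed on exact day counts, so the flat year length only affects the rounded "years" figure, not the comparison.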

Modern medicine has ushered in a golden era of elderly presidents. Lots of ink has been spilled on this phenomenon, but it's almost all on the so what? side with nothing on the how come?

Our friend Harrison's story suggests the reason it took 200 years to get a 70-year-old president is simply medical, and as medicine got better at preserving the elderly a wave of old presidents became inevitable. Since Harrison's time the average American life expectancy has approximately doubled; is that sufficient to fully explain the phenomenon? This could be called the biological theory of old presidents; it is our null hypothesis.

The alternative is a political theory of old presidents and might go something like this: there is less individual agency in politics nowadays, so it's no longer absurd to fill seats with mentally incapacitated party apparatchiks. The specifics are out of scope here; all that matters is that something sociological is happening too.

Hints

Let's start with a simple visualization. Is there an obvious trend in how old new presidents are?

fig 1: Years since inauguration: before, during, and after holding office (and in one case between terms)

Wikidata SPARQL Query

A query for everyone who held the position (P39) of US President (Q11696), with some extra properties. Wikidata Link

 
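# Everyone who has held the position (P39) of US President (Q11696)
SELECT ?person ?personLabel ?birth ?death ?termN ?start ?end WHERE {
    ?person wdt:P569 ?birth;                # date of birth
            p:P39 ?term.                    # position-held statement
    OPTIONAL { ?person wdt:P570 ?death. }   # date of death, absent if still alive
    ?term ps:P39 wd:Q11696;                 # the position is US President
          pq:P580 ?start;                   # start of the term
          pq:P1545 ?termN.                  # series ordinal of the presidency
    OPTIONAL { ?term pq:P582 ?end. }        # end of the term, absent if ongoing
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?person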

First impressions: none of the trends look very monotonic. The age-at-inauguration started around 60 and trended younger for a while. Lots of founding fathers also had pretty long retirements compared to their successors. After the Civil War the blue and purple bars seem to trend upward, with larger gains around the 1970s.

fig 2: Term & death timeline, with linear regressions

N.B. The purple "Death" curve does not count people who are still alive

The inauguration age \(f_1\) trends up, but not by much, and not nearly as much as the overall lifespan \(f_2\). It's hard to look closer without better data:

Only counting dead presidents in the lifespan line \(f_2\) excludes a statistically significant population that is old and still alive. This is right censoring.
The most recently born president who has died (H.W. Bush) was born in 1924. Therefore we've excluded a full 100 years of people (the century with the best actuarial data!)
A V-shape has emerged: after Harrison, inauguration ages go down for a while before increasing again. Retirements also seem shorter during this period. I expect lifespan trends to be mostly monotonic, so the Vs make me think we're measuring a lot of noise, and possibly some specific cultural shifts (perhaps voters with a memory of Harrison preferred younger candidates).
If you stop the count before Reagan broke Harrison's record, the first-order linear regression becomes \( f'_{1}(x) = -0.0813 \cdot x + 77.4 \). The slope is negative, meaning the average lifetime of inaugurated U.S. presidents actually shrank over the first 200 years. Either our small dataset is misleading us or we should RETVRN to Fecal Irrigation.
Since 1900 term lengths appear to be getting longer.

Looks like 45 data points over 250 years is not nearly enough.
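For reference, a first-order regression like the lines in fig 2 is ordinary least squares on (x, y) pairs. The sketch below is an assumption about the method, not the code behind the chart: `fit_line` and its `cutoff` parameter are hypothetical names, and the sample points are a handful of well-known inauguration ages rather than the full dataset.

```python
def fit_line(points, cutoff=None):
    """Least-squares fit of y = slope * x + intercept.

    points: iterable of (x, y) pairs.
    cutoff: if given, only points with x < cutoff are used.
    """
    pts = [(x, y) for x, y in points if cutoff is None or x < cutoff]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# (inauguration year, age): Washington, W. H. Harrison, Lincoln,
# T. Roosevelt, Kennedy, Reagan. A tiny illustrative subset only.
sample = [(1789, 57), (1841, 68), (1861, 52), (1901, 42), (1961, 43), (1981, 69)]

print(fit_line(sample))               # all six points
print(fit_line(sample, cutoff=1981))  # stop the count before Reagan
```

With so few points, dropping a single late outlier moves the slope a lot, which is the same fragility the full 45-point dataset suffers from.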

More Offices

Broadening the scope of the query to include other federal offices gives us more significant data. We can focus only on the age of a politician while they're in office, and since each datapoint is now an average the curves are smoother.

I'm using wikidata because I really like SPARQL and the immense scope makes it easy to tinker with different queries. However there are mistakes. I've corrected some manually (mostly the obvious outliers) and most of my queries have sanity checking to exclude bogus rows but some errors remain, particularly in obscure figures from over a century ago. For our purposes this is good enough.

fig 3: Average age per year per branch (w/ linear regressions)

N.B.

Regressions are done on yearly averages. One person's impact is weighted by the number of years they're in office, and their per-year contribution is weighted by how many days they were in that office that year.

The N values include overlap (e.g. many senators also served in the house).

If the background color makes the colors hard to discern, open the image directly to see it with a plain white background. This applies to all graphs on this page.
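The weighting described in these notes can be sketched as follows. This is one possible reading, not the code behind fig 3: the function name is hypothetical, each term is assumed to be a (birth, start, end) triple of dates with an exclusive end, and a member's age for a given year is taken at the midpoint of the days they served that year.

```python
from collections import defaultdict
from datetime import date

DAYS_PER_YEAR = 365.24

def yearly_average_age(terms):
    """Average age in office per calendar year, weighted by days served.

    terms: iterable of (birth, start, end) dates; end is exclusive.
    """
    weighted_age = defaultdict(float)
    weight = defaultdict(int)
    for birth, start, end in terms:
        for year in range(start.year, end.year + 1):
            lo = max(start, date(year, 1, 1))
            hi = min(end, date(year + 1, 1, 1))
            days = (hi - lo).days
            if days <= 0:
                continue  # no service in this calendar year
            mid = lo + (hi - lo) / 2  # midpoint of the slice served this year
            age = (mid - birth).days / DAYS_PER_YEAR
            weighted_age[year] += age * days
            weight[year] += days
    return {year: weighted_age[year] / weight[year] for year in sorted(weight)}

# Two made-up members: one serves all of 2000, one only the second half.
terms = [
    (date(1950, 1, 1), date(2000, 1, 1), date(2001, 1, 1)),
    (date(1960, 1, 1), date(2000, 7, 1), date(2001, 1, 1)),
]
print(yearly_average_age(terms))
```

The older member counts roughly twice as much toward the year 2000 average because they served about twice as many days of it.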

This makes the trends clearer. You can see a strong increase as a specific congress ages, followed by declines that are gradual as members retire or sharp in some election years.

Clearly the major age changes happen in bursts, probably due to specific political events that are hard to generalize. This is interesting (and merits further inquiry), but it doesn't answer our main question. Just because the age in the senate was level in the 1950s and grew rapidly in the 1980s doesn't mean it doesn't track population trends in the long run. The proximal causes of a spike are certainly political, but in the long run the age may have increased anyway.

We need a long-term lifespan metric to compare these politician curves to.

Lifetime data

A more rigorous way to compare life expectancy across groups is with a Survival function \( S(t) \), which estimates the probability a member of a group will live to age \( t \). We'll approximate it as \( \hat{S} \) with the Kaplan-Meier Estimator, used frequently for medical trials (published in the most-cited statistics paper).
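The estimator is short enough to sketch from scratch, which helps show what right censoring does to the curve. This is an illustrative implementation under my own naming (`kaplan_meier`, `median_survival_time`), with made-up ages; it is not the code used for the figures.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate as a list of (t, S_hat(t)) at each death time.

    durations: age at death, or current age for people still alive.
    observed:  True if the death was observed, False if right-censored.
    """
    events = sorted(zip(durations, observed))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose duration equals t
        while i < len(events) and events[i][0] == t:
            deaths += 1 if events[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored people leave the risk set without lowering the curve
        at_risk -= removed
    return curve

def median_survival_time(curve):
    """First time at which S_hat drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Made-up ages; the person aged 65 is still alive (censored)
ages = [60, 65, 70, 80, 90]
died = [True, False, True, True, True]
print(median_survival_time(kaplan_meier(ages, died)))
```

If too much of a group is still alive, the curve never reaches 0.5 and there is no median to report.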

The lifelines python package provides an implementation, including a logrank test function. It also computes the median survival time (\( \hat{S} ( MST ) = .5 \)), the "half life" of a population.

For example, fig 4: Survival functions of US Senators, segmented by 40-year generations. Although people are categorized by the start of their first term, the survival functions measure time-to-death, not time-to-retirement. The logrank test is used to compare consecutive generations (p < 0.05 means two survival functions are distinct).

fig 4.1: US Representatives

fig 4.2: US Presidents ( low sample size )

N.B. Note the wide confidence intervals lifelines generates because the sample sizes are so small.

fig 4.3: US Mayors ( starting 1650 )

N.B. Note the dates go back to the 17th century and I've used 60-year generations instead. The colors do not correlate to the same time periods as the other graphs.

A few things to note about these graphs:

The x axis starts at age 35; the graph is uninteresting before that. Lines that don't converge nicely to \(y = 0\) mostly suggest much of the population is still alive, but sometimes also signify the dataset contains some members who are missing a death date on wikipedia and eluded my vetting.
There's very little clear difference between the 17th, 18th and early 19th centuries. Almost all the gains were made in the 20th century (when germ theory had really taken off).

Ivan Illich on this "medical watershed":

The year 1913 marks a watershed in the history of modern medicine. Around that year a patient began to have more than a fifty-fifty chance that a graduate of a medical school would provide him with a specifically effective treatment (if, of course, he was suffering from one of the standard diseases recognized by the medical science of the time). Many shamans and herb doctors familiar with local diseases and remedies and trusted by their clients had always had equal or better results.
Ivan Illich, Tools for Conviviality, 1973, page 1. No citation provided.
The confidence interval mostly reflects sample size: fig 4.2: US Presidents has huge margins of error because of its tiny N's, and the tail end of the last generation is shaky and uncertain because much of that population is still alive. I don't think this is useful information and it makes lines harder to distinguish, so I will omit it from survival charts below.
The curves in later generations have a steeper tail. Clearly we've made a lot of progress in keeping octogenarian politicians alive.

This is how we'll quantify the lifespan of populations.

The Joe Blow Index

Unfortunately we can't just compare presidents to the average American lifetime. The set of politicians excludes everyone who died too young to be elected, which introduces a significant bias. Child mortality (< 5 y/o) in the 18th century was close to 50%; this has a significant effect on the average lifespan (~40 then) but no effect on the politician statistics. If we had actuarial tables from back then this would be much easier.

Instead we'll pull lifetime data from other, similar occupations for comparison, building a dataset of regular, "white-collar" Average Joes. We can use Wikidata again to bulk-request Americans with specific occupations organized by birth and filter by lifespan.

The most popular occupations are: politician, lawyer, writer, baseball player, painter, actor, judge, and businessperson. This is hand-wavy, but we want a group that is demographically comparable: similar gender breakdown, similar class representation, similar geographical distribution, similar lifestyle. We won't include baseball players since athletes could plausibly live either longer or shorter than white-collar workers. We won't include politicians and judges, who are mostly already represented by the \(S_{politician}\) curve we're measuring. These professions are more heavily modern-weighted than politicians, but this is fine since we'll break them into generations before comparing.

The mix I chose to represent "normal" people who are probably demographically and lifestyle-wise similar to politicians is painters, journalists, writers, lawyers, actors, university teachers, historians, and business people. The breakdown is here:

fig 5: "White Collar" Americans born after 1700, who lived to at least 35. wikidata query
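A bulk request like this can be assembled from a list of occupation ids. The sketch below only builds the query text; it does not reproduce the linked query. The function name is hypothetical, the property ids (P106 occupation, P27 country of citizenship, P569/P570 birth and death) are standard Wikidata properties, and the two occupation ids in the usage line are my best recollection for painter and writer and should be checked on Wikidata before use.

```python
def build_occupation_query(occupation_qids, min_birth_year=1700):
    """Return a SPARQL query for US citizens with any of the given occupations."""
    values = " ".join(f"wd:{qid}" for qid in occupation_qids)
    return f"""
SELECT DISTINCT ?person ?birth ?death WHERE {{
  VALUES ?occupation {{ {values} }}
  ?person wdt:P106 ?occupation;   # occupation
          wdt:P27 wd:Q30;         # country of citizenship: United States
          wdt:P569 ?birth.        # date of birth
  OPTIONAL {{ ?person wdt:P570 ?death. }}   # missing if still alive
  FILTER(YEAR(?birth) >= {min_birth_year})
}}
""".strip()

# Assumed ids: Q1028181 (painter), Q36180 (writer). Verify before running.
print(build_occupation_query(["Q1028181", "Q36180"]))
```

The "lived to at least 35" filter is left out here; it is simpler to apply after download, once birth and death dates are parsed.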

The set of US politicians includes presidents and their cabinets, congress, mayors, and state legislators: 46k individuals in total.

Seems like enough. Now we can divide them into populations based on their birth year and derive survival functions.
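Splitting by birth year is a simple bucketing step. A minimal sketch, assuming each person is reduced to a (birth_year, age, observed) triple where `observed` is False for people still alive; the function names and the sample rows are invented for illustration.

```python
from collections import defaultdict

def generation_label(birth_year, width=50, origin=1750):
    """Label of the fixed-width generation containing birth_year."""
    start = origin + ((birth_year - origin) // width) * width
    return f"{start}-{start + width}"

def split_by_generation(people, width=50, origin=1750):
    """Group (birth_year, age, observed) rows into {label: (ages, flags)}."""
    groups = defaultdict(lambda: ([], []))
    for birth_year, age, observed in people:
        ages, flags = groups[generation_label(birth_year, width, origin)]
        ages.append(age)
        flags.append(observed)
    return dict(groups)

# Made-up rows: two people from the first generation, one still-living modern one
people = [(1776, 70.0, True), (1790, 81.5, True), (1960, 64.0, False)]
for label, (ages, flags) in sorted(split_by_generation(people).items()):
    print(label, ages, flags)
```

Each (ages, flags) pair is the input a survival estimator needs for one generation.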

fig 6: Politicians vs Normals, Survival function comparison across 50-year generation

fig 6-1: Politicians vs Normals, Median Survival Times and logrank test results:

population	politician MST	normal person MST	p
1750-1800	70.52	70.28	0.99
1800-1850	71.80	72.25	0.01
1850-1900	73.85	75.17	<0.005
1900-1950	86.46	84.96	<0.005
1950-2000			<0.005

This looks definitive. In the 18th century politicians had an essentially indistinguishable lifespan from normal "white-collar" Americans. Over time this has changed. Although average Joes are clearly living longer, our lifespans haven't increased commensurately with politicians'. The divide is clearly growing. The last (well, the latest, time will tell if it's the last) generation of politicians has a clear advantage in longevity. A whopping 90% of them live past 75. Incredible.
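The p column in fig 6-1 comes from a logrank test, which asks whether two groups die at the same rate at every age. The figures rely on the lifelines implementation; the from-scratch version below is only a sketch of the standard two-group test (chi-square with one degree of freedom), with a hypothetical function name and made-up ages.

```python
import math

def logrank_p_value(times_a, observed_a, times_b, observed_b):
    """Two-group logrank test; returns the p-value.

    times_*:    age at death, or current age if still alive.
    observed_*: True if the death was observed, False if censored.
    """
    rows = [(t, died, 0) for t, died in zip(times_a, observed_a)]
    rows += [(t, died, 1) for t, died in zip(times_b, observed_b)]
    death_times = sorted({t for t, died, _ in rows if died})
    o_minus_e = 0.0  # observed minus expected deaths in group A
    variance = 0.0
    for t in death_times:
        n_a = sum(1 for u, _, g in rows if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in rows if u >= t and g == 1)  # at risk in B
        d_a = sum(1 for u, died, g in rows if u == t and died and g == 0)
        d_b = sum(1 for u, died, g in rows if u == t and died and g == 1)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

# Made-up groups: one dies in its 50s, the other in its 80s
early = list(range(50, 60))
late = list(range(80, 90))
all_died = [True] * 10
print(logrank_p_value(early, all_died, late, all_died))
```

A small p says the two curves are distinguishable; it does not say which group lives longer, so it has to be read alongside the median survival times.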

Caveats

There are some limitations to my approximations.

First, there's a fundamental issue with using wikidata for lifetime approximation: by definition, it includes people of a certain notability, and this I think introduces a bias toward people who lived longer and had more resources. Our Joe Blow Index then must be an overestimate of average lifespans, meaning the difference between politicians and the rest of us should be greater than what I've measured.

The gender breakdown in Congress also has an effect. Women live significantly longer than men, so as Congress increasingly includes women its life expectancy goes up. Across these generations the politicians set has gone from 100% to 77% male; a comparable effect appears in our Joe Blows, who have gone from 97% to 61%. So although the greater number of women accounts for some of the lifespan increase across generations, it does not contribute to the difference between politicians and normals; if we limited our Joe Blows to only include as many women as are in the politician set, the difference would not decrease.

The occupations in the Joe Blow Index are probably less demanding, and the people in them are less likely to be the target of assassinations. Once again this advantages the workers, and suggests that if anything I've underestimated the true disparity.

Retirement

All the survival functions so far have measured time-to-death, and the disparity above reflects lifespans. Part of the spirit of the original inquiry concerns the ages of politicians in office, and we can apply some of these tools to retirement age as well.

Wikidata has lots of birth and death data for Congresspeople but is often missing term properties like start and end dates. The congress-legislators dataset is much more complete; it also includes wikidata entity ids and divides its data between current and historical legislators. Joining them gives us enough information to derive survival functions for each congress:

fig 7: Longevity per Congress

N.B.

Median survival times are calculated for each population that is half retired/deceased; averages are only included when the entire population is retired/deceased.

Whether a politician is retired is determined by which congress-legislators dataset they're in. Their date of retirement is the end of the last term they were in office. (This is why almost all median retirement times are (close to) multiples of 2 years, since the majority of members served some number of full 2- or 6-year terms.) Whether they retired voluntarily, died in office, or were voted out is not considered.

Averages almost always exceed medians because of the outliers who survive or remain in office for a very long time.
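The retirement rule in these notes turns each legislator into a (duration, observed) pair, the same shape a survival estimator takes for deaths. A minimal sketch, assuming each term is a dict with ISO-formatted `start` and `end` strings (my assumption about the field names) and that tenure is measured from the first term's start; gaps between non-consecutive terms are counted as time served, which is a simplification.

```python
from datetime import date

DAYS_PER_YEAR = 365.24

def tenure_observation(terms, is_current, today):
    """Return (years_in_congress, retired) for one legislator.

    terms:      list of dicts with ISO 'start' and 'end' date strings.
    is_current: True if the record came from the current-legislators file.
    today:      cutoff date used for members still in office.
    """
    first_start = min(date.fromisoformat(t["start"]) for t in terms)
    last_end = max(date.fromisoformat(t["end"]) for t in terms)
    # Sitting members are right-censored at the cutoff date
    stop = today if is_current else last_end
    years = (stop - first_start).days / DAYS_PER_YEAR
    return years, not is_current

# Made-up records: a retired two-term representative and a sitting member
retired = [
    {"start": "2001-01-03", "end": "2003-01-03"},
    {"start": "2003-01-03", "end": "2005-01-03"},
]
sitting = [{"start": "2019-01-03", "end": "2025-01-03"}]
print(tenure_observation(retired, False, date(2024, 1, 3)))
print(tenure_observation(sitting, True, date(2024, 1, 3)))
```

As in the notes, nothing here distinguishes voluntary retirement from losing an election or dying in office.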

This illustrates that time spent in Congress is increasing. But in contrast with the lifetime curves, it's clear that most of the lifespan gains are spent retired. This is visible for presidents in figure 1.

For more insight on the "Average Time to Retirement" line, see Congressional Careers: Service Tenure and Patterns of Member Service, 1789-2023 (2023) from the Congressional Research Service, in particular Figure 1. They attribute the rise to a post-Civil War "professionalization" of Congress and an increase in the rate politicians run for and win reelection.
