Corkboard

Mapping political connections with web automation, AI and Obsidian
Author

Lukas Kikuchi, Sean Greaves

Published

July 31, 2025

TL;DR

Researching connections between named entities quickly becomes overwhelming as the number of entities increases.

To address this, we built a pipeline that combines web scraping with LLMs to identify websites where a meaningful connection between two entities is likely.

🚨 Disclaimer: The pipeline surfaces potentially relevant connections from online sources; each connection should then be independently verified.

For this demo, we ran the pipeline on 858 House of Lords peers and 34 neoliberal, libertarian, or alt-right institutions cited in Quinn Slobodian’s Hayek’s Bastards to uncover any meaningful links between them. The results of this curated web search are archived within an Obsidian vault, which we make free to download (~1MB), open and explore.

🔎 If you have your own list of entities to research and would like help, please reach out: adu@autonomy.work

The internet has made one-man intelligence agencies of all of us. Armed with some names and too much free time, digging around and connecting the dots is a relatively frictionless if time-consuming experience. On occasion, you might find something interesting. Yet it is common to run into hard scaling limits when attempting to map the connections between entities within large networks.

What do we mean by networks? Almost anything can be represented as a network. You could make a network by mapping connections between politicians and offshore companies, topics and writers, locations and events, and so on.

In this blog, we focus on a single use case:

Given a list of entities, such as people or organisations, are any of them meaningfully connected to one another, and if so, how?

In a world where everything is connected if you look hard enough, what do we mean by meaningfully connected? Well, if prompting an LLM we might say something like this:

A connection is meaningful when it reflects significant cooperation, influence, shared control, or material advantage, for example:

  • Ownership stakes, major investments, or funding flows
  • Leadership, board, advisory, or employment roles that link the entities
  • Contractual partnerships, joint ventures, or co‑development deals
  • Lobbying, advocacy, or coordination aimed at affecting policy or regulation
  • Legal proceedings or rulings directly involving both parties
  • Operational integration (e.g., supply‑chain dependencies, technology licensing)
  • Any other clear, substantive relationship beyond mere co‑mention or coincidence

However, even a well-crafted prompt won’t coax today’s LLMs into finding every meaningful tie within a large roster of names. Some providers have equipped their models with the ability to run web searches to patch their blind spots, but a few dozen queries barely dent the problem. If we wanted to gather context from the web on every possible relationship that could exist within a set of 200 entities, we would need to run 19,900 web searches, one for each unordered pair. This doesn’t even account for the multiple naming conventions for a single entity, which would significantly increase the number of searches we’d need to execute.
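The arithmetic behind that figure is the number of unordered pairs among n entities, n(n-1)/2. A minimal sketch follows; the `aliases_per_entity` parameter is our own simplifying assumption (every entity having the same number of name variants) and is only there to show how quickly aliases inflate the count.

```python
from math import comb

def pairwise_searches(n_entities: int, aliases_per_entity: int = 1) -> int:
    """Unordered entity pairs, multiplied by the alias combinations per pair.

    Assumes (for illustration only) that every entity has the same
    number of aliases.
    """
    return comb(n_entities, 2) * aliases_per_entity ** 2

print(pairwise_searches(200))     # 19900
print(pairwise_searches(200, 3))  # 179100
```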

Thankfully, automating tens of thousands of web searches is now relatively cheap and fast, often completing in minutes or hours. In our case, the real challenge lies in sifting through the search results for specific evidence of a meaningful connection. Even search results containing exact‑string matches for both entities often prove irrelevant, either because of spurious co-occurrence or because an ambiguous name requires named entity disambiguation:

Spurious Co-occurrence

Both names show up in the same source, but not in relation to one another. They might sit in two unrelated articles that merely share a webpage or appear in a long list of people. In such cases, the co‑mention does not satisfy our “meaningful connection” criteria.

Named Entity Disambiguation

A single name can point to multiple real‑world actors, so we need extra context to tell which one the text refers to, for example David Cameron the Yale political scientist versus David Cameron the former UK prime minister.

In our experience, named entity disambiguation is the more difficult challenge to solve at scale. Accurately distinguishing between two entities with the same name often requires consulting multiple sources and, at times, drawing on prior knowledge or domain expertise.

By contrast, filtering out cases of spurious co‑occurrence is generally more straightforward to automate. From a workflow perspective, an LLM only needs to assess the relationship between two entities within a single text, and that text can typically be included in its entirety (or in key excerpts) within a single prompt.

Combined with asynchronous processing, it becomes feasible to filter thousands of websites in minutes, at costs in the range of tens to hundreds of pounds. This process identifies a smaller subset of websites where a meaningful connection between the two entities appears in the main body of text. The more complex task of resolving named entity disambiguation can then be handled by a human analyst, working only on this reduced set.
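As a rough illustration of the asynchronous filtering described above, here is a minimal sketch using Python’s asyncio. The `judge_page` function is a hypothetical stand-in for an LLM call (it just checks whether both names share a sentence), and `max_concurrency` is an assumed knob for respecting provider rate limits; neither reflects the exact code in our pipeline.

```python
import asyncio

async def judge_page(text: str, entity_a: str, entity_b: str) -> bool:
    """Hypothetical placeholder for an LLM call.

    Here: True if both names appear in the same sentence.
    """
    await asyncio.sleep(0)  # stands in for network latency
    return any(entity_a in s and entity_b in s for s in text.split("."))

async def filter_pages(pages, entity_a, entity_b, max_concurrency=20):
    """Judge all pages concurrently, capped at max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url, text):
        async with semaphore:
            return url, await judge_page(text, entity_a, entity_b)

    results = await asyncio.gather(*(bounded(u, t) for u, t in pages.items()))
    return [url for url, keep in results if keep]

# Toy pages with made-up names and URLs.
pages = {
    "https://example.org/a": "Alice Smith chairs the Example Institute.",
    "https://example.org/b": "Alice Smith spoke. Elsewhere, the Example Institute met.",
}
kept = asyncio.run(filter_pages(pages, "Alice Smith", "Example Institute"))
print(kept)  # ['https://example.org/a']
```

The second page is dropped because the two names never share a sentence, which is the toy equivalent of a spurious co-occurrence.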

To illustrate the potential of this use case, we developed a pipeline to perform this task and devised a case study to test-drive it.

Hayek’s Bastards and the House of Lords

What connections exist between the House of Lords and some of the institutions pivotal in cultivating today’s alt‑right politics from strands of neoliberalism?

We set about identifying meaningful connections between 858 House of Lords peers and a selection of institutions from Quinn Slobodian’s recent book, Hayek’s Bastards. Why would we do this and who are these bastards?

Hayek’s Bastards offers a particular history of today’s alt‑right that suggests it did not spring up in opposition to neoliberalism but rather mutated from its core, evolving into a strain obsessed with racial hierarchy and hereditarian ideas of intelligence. Slobodian charts this evolution through a web of neoliberal think tanks, lobby groups and billionaire benefactors, chiefly in the United States, who re‑engineered the market absolutism of Friedrich Hayek and Ludwig von Mises into a politics of exclusion. Except for outliers such as Britain’s Institute of Economic Affairs, most of these actors are American, prompting us to ask how far their influence and networks extend within the UK’s political elite.

The following institutions from Hayek’s Bastards were chosen to cross reference against the set of peers from the House of Lords:

| Name | Description | Country |
| --- | --- | --- |
| American Enterprise Institute | centre-right think-tank | US |
| American Immigration Control Foundation | immigration reduction campaign group | US |
| American Renaissance | white supremacist magazine | US |
| Atlas Network | libertarian NGO | US |
| Bradley Foundation | conservative funder | US |
| British Eugenics Society | non-profit learned society | UK |
| Carthage Foundation | right-of-centre funder | US |
| Cato Institute | libertarian think-tank | US |
| Center for Libertarian Studies | libertarian anarcho-capitalist non-profit | US |
| Competitive Enterprise Institute | libertarian think-tank | US |
| Conservative Political Action Conference | political conference | US |
| Eigentümlich Frei | new-right publisher | Germany |
| Federation for American Immigration Reform | anti-immigration non-profit | US |
| Friedrich Hayek Society | non-profit members association | Germany |
| Heritage Foundation | conservative think-tank | US |
| Hoover Institution | public policy think-tank | US |
| Human Diversity Foundation | race science publisher | US |
| Institute of Economic Affairs | right-wing free market think-tank | UK |
| Institute of Humane Studies | non-profit promoting liberalism | US |
| Institut für Staatspolitik | new-right think tank | Germany |
| Junge Freiheit | conservative nationalist newspaper | Germany |
| John Randolph Club | paleoconservative think-tank | US |
| Mises Institute | non-profit promoting Austrian economics | US |
| Manhattan Institute | conservative think-tank | US |
| Mankind Quarterly | journal | UK/US |
| Mercatus Center | libertarian, free-market-oriented think tank | US |
| Mont Pèlerin Society | liberal academic society | US |
| Noontide Press | far-right publisher | US |
| Pioneer Fund | race science funder | US |
| Property and Freedom Society | anarcho-capitalist political organisation | Turkey |
| Quarterly Journal of Austrian Economics | peer-reviewed academic journal | US |
| Rockford Institute | conservative think-tank | US |
| Washington Summit Publishers | white nationalist publisher | US |
| VDARE | far right website | US |

Methodology

Our pipeline implements the following steps.

1. Assemble Source Lists

The list of current House of Lords peers was sourced from They Work for You whilst the list of institutions from Hayek’s Bastards was sourced from reading the book.

2. Expand & Normalise Names

To capture every relevant hit, we expanded each entity’s name to all common variants. Searching “Lord Cameron of Chipping Norton”, for instance, surfaces different results than “David Cameron”. For every entity we ran a quick web query (e.g., "Lord Cameron of Chipping Norton" peer conservative "House of Lords"), passed the snippets to an LLM, and let it return the full alias list, e.g. “David William Donald Cameron”, “David W. D. Cameron” etc. This single step expanded our original 858 peers into 3,459 distinct names, dramatically widening the search space.

3. Construct Search Queries

We paired every peer alias with every institution name, producing 152,592 unique search strings. For example, to probe links between Lord Frost and the Heritage Foundation we searched "Lord Frost" "Heritage Foundation".
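Building these strings is a simple cross product. The sketch below assumes the aliases are already available as a flat list; the function name `build_queries` and the tiny input lists are ours, for illustration only.

```python
from itertools import product

def build_queries(peer_aliases, institution_names):
    """Pair every alias with every institution as two exact-phrase terms."""
    queries = {
        f'"{alias}" "{institution}"'
        for alias, institution in product(peer_aliases, institution_names)
    }
    return sorted(queries)  # the set removes duplicate strings

aliases = ["Lord Frost", "Lord Cameron of Chipping Norton", "David Cameron"]
institutions = ["Heritage Foundation", "Cato Institute"]

queries = build_queries(aliases, institutions)
for query in queries:
    print(query)
```

With three aliases and two institutions this prints six query strings, including "Lord Frost" "Heritage Foundation".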

4. Execute Web Searches

Running the 152,592 queries produced roughly 100K hits, yet a hit count tells us nothing about the strength or nature of any link. Google snippets give only a headline, URL, and a few teaser lines. This is usually far too thin to judge a real connection. Take these results:

  1. Last Exit to Freedom? Britain After Brexit and the Future of …
    Lord Frost began life as a professional diplomat but entered the … Secondary Navigation. Sign-up for weekly texts from The Heritage Foundation …

  2. Ex-Tory Brexit minister Lord Frost rejects party’s claims over Europe …
    Ex-Tory Brexit minister Lord Frost rejects party’s claims over Europe-wide customs scheme – as it happened … Heritage Foundation Photograph: …

  3. Project 2025: Tory Candidates Have Ties to Group Drafting …
    Ex-Tory Brexit minister Lord Frost rejects party’s claims over Europe-wide customs scheme – as it happened … Heritage Foundation Photograph: …

All three mention both names, yet none clarifies how (or even whether) they relate. To verify any relationship we clearly need to scrape the full page content for every result that cited both entities.

5. Retrieve & Filter Pages

Not every page will surrender its text. Paywalls, logins, and anti‑bot defences routinely shut scrapers out. For instance, the most common domain within the search results was Scribd, a website for sharing documents that requires an account and offers no public API. After testing the most popular domains, we attempted to scrape the 32,601 URLs that seemed at least partly accessible. From those we successfully pulled usable text from 22,279 pages.
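One way to find the domains worth testing first is to tally hosts across all result URLs, then drop the ones that turn out to be inaccessible. This is a minimal sketch with made-up URLs; the helper names and the `blocked` set are assumptions, not our production code.

```python
from collections import Counter
from urllib.parse import urlparse

def host_of(url: str) -> str:
    """Lower-cased host name with a leading 'www.' removed."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def domain_counts(urls):
    """How many result URLs each domain contributes."""
    return Counter(host_of(u) for u in urls)

def drop_blocked(urls, blocked):
    """Keep only URLs whose domain is not known to block scrapers."""
    return [u for u in urls if host_of(u) not in blocked]

urls = [
    "https://www.paywalled.example/doc/1",
    "https://paywalled.example/doc/2",
    "https://open.example/article",
]
counts = domain_counts(urls)
print(counts.most_common(1))  # [('paywalled.example', 2)]

scrapable = drop_blocked(urls, blocked={"paywalled.example"})
print(scrapable)  # ['https://open.example/article']
```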

Data Release

As we were unable to retrieve relevant text from the majority of search results, we are releasing this data for any curious researchers to explore.

📂 Download edges.zip (~13MB)

6. Craft LLM Prompts

The key obstacle when prompting LLMs is context control: the model needs enough surrounding text to judge whether two entities are truly connected, but not so much that the prompt becomes unnecessarily costly or cluttered. With shorter texts we can simply include the entire article within a prompt. However, web‑scraped pages within our dataset range from a six‑word job title to sprawling, million‑word tomes that mention our targets only in passing. Feeding all of that to the LLM is both inefficient and unfocused.

We therefore devised a scalable prompt‑building strategy that distills each page to the minimum context required for reasonably reliable inference, regardless of length.

To keep prompts both focused and economical, we rely on a lightweight three‑snippet strategy, always passing the model the following sections from each article:

  1. Lead paragraph: usually the first 100–150 words to orient the model on the subject matter and tone.

  2. Entity‑A window: a symmetric slice of text (± W words) centred on the first mention of entity A.

  3. Entity‑B window: the same‑sized slice around the first mention of entity B.

The window width W balances cost and context; smaller windows save tokens but may miss clues whilst larger ones capture more evidence at higher cost. Because the three snippets are concatenated in order (lead → A → B), the model receives a compact narrative arc that mirrors how a human reader might be introduced to these entities.
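The three‑snippet idea can be sketched in a few lines. This version counts words rather than tokens and matches entity names with a naive case-insensitive comparison; the function names, the default `w=50` and the 120‑word lead are illustrative assumptions.

```python
def find_first(words, entity):
    """Index of the word where the entity's first mention starts, or -1."""
    target = entity.lower().split()
    lowered = [w.lower().strip('.,;:"()') for w in words]
    for i in range(len(lowered) - len(target) + 1):
        if lowered[i:i + len(target)] == target:
            return i
    return -1

def window(words, centre, w):
    """Symmetric slice of +/- w words around index centre."""
    start = max(0, centre - w)
    return " ".join(words[start:centre + w + 1])

def three_snippets(text, entity_a, entity_b, w=50, lead_words=120):
    """Return [lead, A-window, B-window]; a window is skipped if not found."""
    words = text.split()
    snippets = [" ".join(words[:lead_words])]
    for entity in (entity_a, entity_b):
        idx = find_first(words, entity)
        if idx >= 0:
            snippets.append(window(words, idx, w))
    return snippets

# Synthetic article: filler words around one mention of each entity.
text = ("intro " * 200 + "Lord Frost spoke. " + "filler " * 300
        + "The Heritage Foundation hosted.")
snips = three_snippets(text, "Lord Frost", "Heritage Foundation", w=5)
print(snips[1])
print(snips[2])
```

A real implementation would count tokens with the model’s tokenizer and handle possessives and other name variants, which this sketch ignores.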

To decide which chunks to keep, we first set a global token budget G as the maximum number of tokens we can spend on context. With G fixed, the following table shows how we choose what to send to the LLM:

| Scenario | Decision rule | Context sent to the LLM |
| --- | --- | --- |
| Short article | Article length ≤ G | Entire article |
| Entities close together (early) | Offset from start to later first‑mention ≤ G | One continuous chunk from start to entity mentions |
| Entities far apart | Offset from start to later first‑mention > G | If entity A is near the top → 2 chunks: combined (lead + A‑window) and B‑window. Otherwise → 3 chunks: lead, A‑window, B‑window |

Anchoring every decision to G (total allowance) and W (per‑entity context) improves the likelihood that each prompt supplies just enough relevant information without overrunning token limits.
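The decision rules can be written as a small function that returns the (start, end) token spans to send. Several details here are assumptions made for the sake of a runnable sketch: entity A is taken to be whichever entity is mentioned first, “near the top” means A’s window begins inside the lead, the lead is 120 tokens, and the close-together case simply sends the first G tokens.

```python
def plan_chunks(n_tokens, pos_a, pos_b, G, W, lead=120):
    """Choose which token spans of an article to include in the prompt.

    pos_a / pos_b are token offsets of each entity's first mention.
    Returns a list of (start, end) spans. Assumptions are noted inline.
    """
    first, second = sorted((pos_a, pos_b))

    if n_tokens <= G:                 # short article: send everything
        return [(0, n_tokens)]
    if second <= G:                   # both first mentions fit in the budget
        return [(0, G)]               # assumption: spend the whole budget

    def win(pos):
        return (max(0, pos - W), min(n_tokens, pos + W))

    if win(first)[0] <= lead:         # assumption: "near the top"
        return [(0, win(first)[1]), win(second)]
    return [(0, lead), win(first), win(second)]

print(plan_chunks(500, 10, 400, G=1000, W=50))      # [(0, 500)]
print(plan_chunks(5000, 100, 800, G=1000, W=50))    # [(0, 1000)]
print(plan_chunks(5000, 100, 3000, G=1000, W=50))   # [(0, 150), (2950, 3050)]
print(plan_chunks(5000, 2000, 4000, G=1000, W=50))  # [(0, 120), (1950, 2050), (3950, 4050)]
```

The sketch does not merge overlapping windows or re-check the combined length against G in the multi-chunk cases, both of which a fuller version would need.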

7. Infer Entity Connections

After the LLM flagged passages that it judged to depict a meaningful link between two entities, we extracted the named entities in those passages, together with any other entities suggested as relevant, and matched each one to its canonical Wikipedia entry. This grounding step standardised the names and anchored our graph to reliable, publicly maintained IDs.

8. Archive Results in Obsidian

All curated data was archived to an Obsidian vault, providing an interactive workspace where you can browse search hits and navigate the entity graph.
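Obsidian vaults are folders of Markdown files, and its graph view is built from `[[wikilinks]]` between notes. As a hedged sketch of how one search hit could be written out, the function below creates a note that links to both entities; the file-naming scheme, note layout and placeholder text are hypothetical and not the exact format of our vault.

```python
import tempfile
from pathlib import Path

def write_hit_note(vault, peer, institution, url, synopsis):
    """Write one search hit as a Markdown note linking both entities."""
    vault = Path(vault)
    vault.mkdir(parents=True, exist_ok=True)
    note = vault / f"{peer} x {institution}.md"  # hypothetical naming scheme
    body = (
        f"# {peer} and {institution}\n\n"
        f"Entities: [[{peer}]] [[{institution}]]\n\n"  # wikilinks feed the graph
        f"Source: {url}\n\n"
        f"{synopsis}\n"
    )
    note.write_text(body, encoding="utf-8")
    return note

# Write a toy note into a temporary folder and read it back.
with tempfile.TemporaryDirectory() as vault_dir:
    path = write_hit_note(
        vault_dir,
        "Lord Frost",
        "Heritage Foundation",
        "https://example.org/article",
        "Placeholder synopsis text.",
    )
    note_text = path.read_text(encoding="utf-8")
    print(note_text)
```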

Data Release

To explore the dataset, first download Obsidian, then download and unzip the dataset below, and finally follow the instructions to create a vault from an existing folder (in this case the folder you just unzipped).

📂 Download corkboard_vault.zip (~1MB)

Every hit comes with an LLM‑generated synopsis and a summary receipt that records the model used to generate the synopsis, the fraction of the article included within the prompt, and whether this fraction of text was supplied as a single continuous block or split into windows. This metadata gives researchers an additional measure to judge the reliability of each synopsis before deciding to open the original page. Each synopsis should be interpreted only as a guide to what may exist on the webpage, and absolutely not as fact, since it was generated by an LLM. Similarly, the content of each webpage should be carefully judged as to whether or not the information is reliable:

Single search result in Obsidian with synopsis

After manually removing links that appeared to refer to other persons or organisations, we identified potentially meaningful connections between 20 organisations from Hayek’s Bastards and 152 peers from the House of Lords.

Obsidian graph view of connections between nodes

| Most Cited Organisations | Connections |
| --- | --- |
| Institute of Economic Affairs | 95 |
| Heritage Foundation | 42 |
| American Enterprise Institute | 24 |
| Atlas Network | 23 |
| Cato Institute | 20 |

| Most Cited Peers | Connections |
| --- | --- |
| Lord Hannan of Kingsclere | 11 |
| Lord Elliott of Mickle Fell | 10 |
| Lord Moynihan of Chelsea | 9 |
| Lord Gove | 8 |
| Lord Hintze | 7 |