Researching connections between named entities quickly becomes overwhelming as the number of entities increases.
To address this, we built a pipeline that combines web scraping with LLMs to identify websites where a meaningful connection between two entities is likely.
🚨 Disclaimer: The pipeline surfaces potentially relevant connections from online sources; each connection should then be independently verified.
For this demo, we ran the pipeline on 858 House of Lords peers and 34 neoliberal, libertarian, or alt-right institutions cited in Quinn Slobodian’s Hayek’s Bastards to uncover any meaningful links between them. The results of this curated web search are archived within an Obsidian vault, which we make free to download (~1MB), open and explore.
🔎 If you have your own list of entities to research and would like help, please reach out: adu@autonomy.work
The internet has made one-man intelligence agencies of all of us. Armed with some names and too much free time, digging around and connecting the dots is a relatively frictionless if time-consuming experience. On occasion, you might find something interesting. Yet it is common to run into hard scaling limits when attempting to map the connections between entities within large networks.
What do we mean by networks? Almost anything can be represented as a network. You could make a network by mapping connections between politicians and offshore companies, topics and writers, locations and events etc.
In this blog, we focus on a single use case:
Given a list of entities, such as people or organisations, are any of them meaningfully connected to one another, and if so, how?
In a world where everything is connected if you look hard enough, what do we mean by meaningfully connected? If we were prompting an LLM, we might define it like this:
A connection is meaningful when it reflects significant cooperation, influence, shared control, or material advantage, for example:
- Ownership stakes, major investments, or funding flows
- Leadership, board, advisory, or employment roles that link the entities
- Contractual partnerships, joint ventures, or co‑development deals
- Lobbying, advocacy, or coordination aimed at affecting policy or regulation
- Legal proceedings or rulings directly involving both parties
- Operational integration (e.g., supply‑chain dependencies, technology licensing)
- Any other clear, substantive relationship beyond mere co‑mention or coincidence
However, even a well-crafted prompt won’t coax today’s LLMs into finding every meaningful tie within a large roster of names. Some providers have equipped their models with the ability to run web searches to patch their blind spots, but a few dozen queries barely dent the problem. If we wanted to gather context from the web on every possible relationship within a set of 200 entities, we would need to run 19,900 web searches. This doesn’t even account for the multiple naming conventions for a single entity, which would significantly increase the number of searches we’d need to execute.
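The scaling arithmetic here is simple combinatorics: checking every unordered pair among n entities requires n choose 2 searches, which grows quadratically. A quick sketch:

```python
from math import comb

def pairwise_searches(n: int) -> int:
    """One web search per unordered pair of entities."""
    return comb(n, 2)

print(pairwise_searches(200))  # 19900, the figure quoted above
```

Doubling the entity list roughly quadruples the number of searches, which is why alias variants compound the problem so quickly.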
Thankfully, automating tens of thousands of web searches is relatively cheap and fast nowadays, taking perhaps minutes or hours. In our case, the real challenge lies in sifting through the search results for specific evidence of a meaningful connection. Even search results containing exact‑string matches for both entities often prove irrelevant, due either to spurious co-occurrence or to named-entity ambiguity:
- Spurious co-occurrence: both names show up in the same source, but not in relation to one another. They might sit in two unrelated articles that merely share a webpage, or appear in a long list of people. In such cases, the co‑mention does not satisfy our “meaningful connection” criteria.
- Ambiguous named entities: a single name can point to multiple real‑world actors, so we need extra context to tell which one the text refers to, e.g. David Cameron the Yale political scientist versus David Cameron the former UK prime minister.
In our experience, named entity disambiguation is the more difficult challenge to solve at scale. Accurately distinguishing between two entities with the same name often requires consulting multiple sources and, at times, drawing on prior knowledge or domain expertise.
By contrast, filtering out cases of spurious co‑occurrence is generally more straightforward to automate. From a workflow perspective, an LLM only needs to assess the relationship between two entities within a single text, whose content can typically be included in its entirety (or in key excerpts) within a single prompt.
Combined with asynchronous processing, it becomes feasible to filter thousands of websites in minutes, at costs in the range of tens to hundreds of pounds. This process identifies a smaller subset of websites where a meaningful connection between the two entities appears in the main body of text. The more complex task of resolving named entity disambiguation can then be handled by a human analyst, working only on this reduced set.
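A minimal sketch of how such concurrent filtering might be structured: a trivial stand-in `judge_connection` coroutine (a real pipeline would call an LLM here with the “meaningful connection” criteria as a prompt) and a semaphore to cap in-flight requests.

```python
import asyncio

async def judge_connection(text: str, a: str, b: str) -> bool:
    """Stand-in for an LLM call: here we merely require both names to
    appear in the same sentence. A real pipeline would send the text
    to a model along with the 'meaningful connection' criteria."""
    await asyncio.sleep(0)  # simulate request latency
    return any(a in s and b in s for s in text.split("."))

async def filter_pages(pages: list[tuple[str, str, str]], limit: int = 20):
    """Concurrently judge (text, entity_a, entity_b) triples,
    keeping at most `limit` requests in flight at once."""
    sem = asyncio.Semaphore(limit)
    async def bounded(page):
        text, a, b = page
        async with sem:
            return (a, b, await judge_connection(text, a, b))
    results = await asyncio.gather(*(bounded(p) for p in pages))
    return [(a, b) for a, b, ok in results if ok]
```

With a semaphore-bounded `gather`, thousands of pages can be dispatched without overwhelming the LLM provider’s rate limits.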
To illustrate the potential for this use-case, we developed a pipeline to perform this task and devised a case-study to test drive it.
Hayek’s Bastards and the House of Lords
What connections exist between the House of Lords and some of the institutions pivotal in cultivating today’s alt‑right politics from strands of neoliberalism?
We set about identifying meaningful connections between 858 House of Lords peers and a selection of institutions from Quinn Slobodian’s recent book, Hayek’s Bastards. Why would we do this and who are these bastards?
Hayek’s Bastards offers a particular history of today’s alt‑right that suggests it did not spring up in opposition to neoliberalism but rather mutated from its core, evolving into a strain obsessed with racial hierarchy and hereditarian ideas of intelligence. Slobodian charts this evolution through a web of neoliberal think tanks, lobby groups and billionaire benefactors, chiefly in the United States, who re‑engineered the market absolutism of Friedrich Hayek and Ludwig von Mises into a politics of exclusion. Except for outliers such as Britain’s Institute of Economic Affairs, most of these actors are American, prompting us to ask how far their influence and networks extend within the UK’s political elite.
The following institutions from Hayek’s Bastards were chosen to cross-reference against the set of peers from the House of Lords:
Name | Description | Country |
---|---|---|
American Enterprise Institute | centre-right think-tank | US |
American Immigration Control Foundation | immigration reduction campaign group | US |
American Renaissance | white supremacist magazine | US |
Atlas Network | libertarian NGO | US |
Bradley Foundation | conservative funder | US |
British Eugenics Society | non-profit learned society | UK |
Carthage Foundation | right-of-centre funder | US |
Cato Institute | libertarian think-tank | US |
Center for Libertarian Studies | libertarian anarcho-capitalist non-profit | US |
Competitive Enterprise Institute | libertarian think-tank | US |
Conservative Political Action Conference | political conference | US |
Eigentümlich Frei | new-right publisher | Germany |
Federation for American Immigration Reform | anti-immigration non-profit | US |
Friedrich Hayek Society | non-profit members association | Germany |
Heritage Foundation | conservative think-tank | US |
Hoover Institution | public policy think-tank | US |
Human Diversity Foundation | race science publisher | US |
Institute of Economic Affairs | right-wing free market think-tank | UK |
Institute for Humane Studies | non-profit promoting liberalism | US |
Institut für Staatspolitik | new-right think tank | Germany |
Junge Freiheit | conservative nationalist newspaper | Germany |
John Randolph Club | paleoconservative think-tank | US |
Mises Institute | non-profit promoting Austrian economics | US |
Manhattan Institute | conservative think-tank | US |
Mankind Quarterly | journal | UK/US |
Mercatus Center | libertarian, free-market-oriented think tank | US |
Mont Pèlerin Society | liberal academic society | US |
Noontide Press | far-right publisher | US |
Pioneer Fund | race science funder | US |
Property and Freedom Society | anarcho-capitalist political organisation | Turkey |
Quarterly Journal of Austrian Economics | peer-reviewed academic journal | US |
Rockford Institute | conservative think-tank | US |
Washington Summit Publishers | white nationalist publisher | US |
VDARE | far right website | US |
Methodology
Our pipeline implements the following steps.
1. Assemble Source Lists
The list of current House of Lords peers was sourced from TheyWorkForYou, whilst the list of institutions from Hayek’s Bastards was sourced from reading the book.
2. Expand & Normalise Names
To capture every relevant hit, we expanded each entity’s name to all common variants. Searching “Lord Cameron of Chipping Norton”, for instance, surfaces different results than “David Cameron”. For every entity we ran a quick web query (e.g., "Lord Cameron of Chipping Norton" peer conservative "House of Lords"), passed the snippets to an LLM, and let it return the full alias list, e.g. “David William Donald Cameron”, “David W. D. Cameron” etc. This single step expanded our original 858 peers into 3,459 distinct names, dramatically widening the search space.
3. Construct Search Queries
We paired every peer alias with every institution name, producing 152,592 unique search strings. For example, to probe links between Lord Frost and the Heritage Foundation we searched "Lord Frost" "Heritage Foundation".
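This pairing step can be sketched as a cross product of alias lists and institution names, with both terms quoted to force exact matches (the alias data below is illustrative):

```python
from itertools import product

def build_queries(aliases: dict[str, list[str]], institutions: list[str]) -> list[str]:
    """Pair every alias of every entity with every institution name,
    quoting both so the search engine requires exact phrase matches."""
    all_aliases = [a for names in aliases.values() for a in names]
    return [f'"{a}" "{i}"' for a, i in product(all_aliases, institutions)]

queries = build_queries(
    {"Lord Frost": ["Lord Frost", "David Frost"]},
    ["Heritage Foundation", "Cato Institute"],
)
print(len(queries))  # 2 aliases x 2 institutions = 4 queries
```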
4. Execute Web Searches
Running the 152,592 queries produced roughly 100K hits, yet a hit count tells us nothing about the strength or nature of any link. Google snippets give only a headline, URL, and a few teaser lines. This is usually far too thin to judge a real connection. Take these results:
All three mention both names, yet none clarifies how (or even whether) they relate. To verify any relationship we clearly need to scrape the full page content for every result that cited both entities.
5. Retrieve & Filter Pages
Not every page will surrender its text. Paywalls, logins, and anti‑bot defences routinely shut scrapers out. For instance, the most common domain within the search results was Scribd, a document-sharing website that requires an account and offers no public API. After testing the most popular domains, we attempted to scrape the 32,601 URLs that seemed at least partly accessible. From those we successfully pulled usable text from 22,279 pages.
As we were unable to retrieve relevant text from the majority of search results, we are releasing this data for any curious researchers to explore.
6. Craft LLM Prompts
The key obstacle when prompting LLMs is context control; the model needs enough surrounding text to judge whether two entities are truly connected, but not so much that the prompt becomes unnecessarily costly or cluttered. With shorter texts we can simply include the entire article within a prompt. However web‑scraped pages within our dataset range from a six‑word job title to sprawling, million‑word tomes that mention our targets only in passing. Feeding all of that to the LLM is both inefficient and unfocused.
We therefore devised a scalable prompt‑building strategy that distills each page to the minimum context required for reasonably reliable inference, regardless of length.
To keep prompts both focused and economical, we rely on a lightweight three‑snippet strategy, always passing the model the following sections from each article:
- Lead paragraph: usually the first 100–150 words, to orient the model on the subject matter and tone.
- Entity‑A window: a symmetric slice of text (± W words) centred on the first mention of entity A.
- Entity‑B window: the same‑sized slice around the first mention of entity B.
The window width W balances cost and context; smaller windows save tokens but may miss clues whilst larger ones capture more evidence at higher cost. Because the three snippets are concatenated in order (lead → A → B), the model receives a compact narrative arc that mirrors how a human reader might be introduced to these entities.
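A minimal sketch of the window extraction, using naive word-level matching on the first word of the entity’s name (a real implementation would need fuzzier matching across alias variants):

```python
def entity_window(text: str, entity: str, w: int = 50) -> str:
    """Return a slice of roughly ±w words centred on the first
    mention of `entity`, or "" if the entity is not found."""
    words = text.split()
    first_word = entity.split()[0].lower()
    for i, word in enumerate(words):
        if word.lower().strip(".,;:()") == first_word:
            start = max(0, i - w)
            end = i + len(entity.split()) + w
            return " ".join(words[start:end])
    return ""
```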
To decide which chunks to keep, we first set a global token budget G as the maximum number of tokens we can spend on context. With G fixed, the following table shows how we choose what to send to the LLM:
Scenario | Decision rule | Context sent to the LLM |
---|---|---|
Short article | Article length ≤ G | Entire article |
Entities close together (early) | Offset from start to later first‑mention ≤ G | One continuous chunk from start to entity mentions |
Entities far apart | Offset from start to later first‑mention > G | If entity A is near the top → 2 chunks: combined (lead + A‑window) and B‑window; otherwise → 3 chunks: lead, A‑window, B‑window |
Anchoring every decision to G (total allowance) and W (per‑entity context) improves the likelihood that each prompt supplies just enough relevant information without overrunning token limits.
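The decision table above might be implemented roughly as follows, assuming token positions of each entity’s first mention are known; the lead length, G, and W values are illustrative, not the exact parameters we used:

```python
def build_context(tokens: list[str], pos_a: int, pos_b: int,
                  G: int = 1500, W: int = 150) -> list[str]:
    """Pick the chunk(s) of an article to send to the LLM:
    the whole article if short, one continuous chunk if both
    first mentions fall early, otherwise lead + entity windows."""
    LEAD = 120                        # ~100-150-word lead paragraph
    later, first = max(pos_a, pos_b), min(pos_a, pos_b)
    window = lambda p: tokens[max(0, p - W):p + W]
    if len(tokens) <= G:              # short article: send everything
        return [" ".join(tokens)]
    if later <= G:                    # both mentions early: one chunk
        return [" ".join(tokens[:later + W])]
    if first <= LEAD + W:             # entity A near the top: merge lead + A-window
        return [" ".join(tokens[:first + W]), " ".join(window(later))]
    return [" ".join(tokens[:LEAD]), " ".join(window(first)), " ".join(window(later))]
```

The returned list length (1, 2, or 3 chunks) corresponds directly to the three scenarios in the table.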
7. Infer Entity Connections
After the LLM flagged passages that it judged as depicting a meaningful link between two entities, we extracted the named entities mentioned in those passages, along with any further entities the LLM suggested, and matched each one to its canonical Wikipedia entry. This grounding step standardised the names and anchored our graph to reliable, publicly maintained IDs.
8. Archive Results in Obsidian
All curated data was archived to an Obsidian vault, providing an interactive workspace where you can browse search hits and navigate the entity graph.
To explore the dataset, first download Obsidian, then download and unzip the dataset below, and follow the instructions to create a vault from an existing folder (in this case, the folder you just unzipped).
Every hit comes with an LLM‑generated synopsis and a summary receipt recording the model used to generate the synopsis, the fraction of the article included within the prompt, and whether that text was supplied as a single continuous block or split into windows. This metadata gives researchers an additional measure by which to judge the reliability of each synopsis before deciding to open the original page. Because each synopsis was generated by an LLM, it should be interpreted only as a guide to what may exist on the webpage, never as fact. Similarly, the content of each webpage should itself be judged carefully for reliability:

After manually removing links that appeared to refer to other persons or organisations, we identified potentially meaningful connections between 20 organisations from Hayek’s Bastards and 152 peers from the House of Lords.

Most Cited Organisations | Connections |
---|---|
Institute of Economic Affairs | 95 |
Heritage Foundation | 42 |
American Enterprise Institute | 24 |
Atlas Network | 23 |
Cato Institute | 20 |
Most Cited Peers | Connections |
---|---|
Lord Hannan of Kingsclere | 11 |
Lord Elliott of Mickle Fell | 10 |
Lord Moynihan of Chelsea | 9 |
Lord Gove | 8 |
Lord Hintze | 7 |