Corkboard – The Autonomy Data Unit Blog

TL;DR

Researching connections between named entities quickly becomes overwhelming as the number of entities increases.

To address this, we built a pipeline that combines web scraping with LLMs to identify websites where a meaningful connection between two entities is likely.

🚨 Disclaimer: The pipeline surfaces potentially relevant connections from online sources; each connection should then be independently verified.

For this demo, we ran the pipeline on 858 House of Lords peers and 34 neoliberal, libertarian, or alt-right institutions cited in Quinn Slobodian’s Hayek’s Bastards to uncover any meaningful links between them. The results of this curated web search are archived within an Obsidian vault, which we make free to download (~1MB), open and explore.

🔎 If you have your own list of entities to research and would like help, please reach out: adu@autonomy.work

The internet has made one-man intelligence agencies of all of us. Armed with some names and too much free time, digging around and connecting the dots is a relatively frictionless if time-consuming experience. On occasion, you might find something interesting. Yet it is common to run into hard scaling limits when attempting to map the connections between entities within large networks.

What do we mean by networks? Almost anything can be represented as a network. You could make a network by mapping connections between politicians and offshore companies, topics and writers, locations and events etc.

In this blog, we focus on a single use case:

Given a list of entities, such as people or organisations, are any of them meaningfully connected to one another, and if so, how?

In a world where everything is connected if you look hard enough, what do we mean by meaningfully connected? Well, if prompting an LLM we might say something like this:

A connection is meaningful when it reflects significant cooperation, influence, shared control, or material advantage, for example:

Ownership stakes, major investments, or funding flows

Leadership, board, advisory, or employment roles that link the entities

Contractual partnerships, joint ventures, or co‑development deals

Lobbying, advocacy, or coordination aimed at affecting policy or regulation

Legal proceedings or rulings directly involving both parties

Operational integration (e.g., supply‑chain dependencies, technology licensing)

Any other clear, substantive relationship beyond mere co‑mention or coincidence

However, even a well-crafted prompt won’t coax today’s LLMs into finding every meaningful tie within a large roster of names. Some providers have equipped their models with the ability to run web searches to patch their blind spots, but a few dozen queries barely dent the problem. If we wanted to gather context from the web on every possible pairwise relationship within a set of 200 entities, we would need to run 19,900 web searches. This doesn’t even account for the multiple naming conventions for a single entity, which would significantly increase the number of searches we’d need to execute.
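The 19,900 figure is the number of unordered pairs among 200 entities, n(n − 1) / 2. A short Python sketch shows how quickly that count grows (the function name is ours, for illustration only):

```python
from math import comb


def pair_count(n_entities: int) -> int:
    """Number of unordered entity pairs, i.e. n * (n - 1) / 2."""
    return comb(n_entities, 2)


for n in (50, 200, 1000):
    # One web search per pair, before any name variants are considered.
    print(f"{n} entities -> {pair_count(n)} pairwise searches")
```

Because the count grows quadratically, a fivefold increase in entities means roughly twenty-five times as many searches.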

Thankfully, automating tens of thousands of web searches is now relatively cheap and fast, taking minutes to hours. In our case, the real challenge lies in sifting through the search results for specific evidence of a meaningful connection. Even search results containing exact‑string matches for both entities often prove irrelevant, either because of spurious co-occurrence or because a name is ambiguous and requires named entity disambiguation:

Spurious Co-occurrence

Both names show up in the same source, but not in relation to one another. They might sit in two unrelated articles that merely share a webpage or appear in a long list of people. In such cases, the co‑mention does not satisfy our “meaningful connection” criteria.

Named Entity Disambiguation

A single name can point to multiple real‑world actors, so we need extra context to tell which one the text refers to, e.g. David Cameron the Yale political scientist versus David Cameron the former UK prime minister.

In our experience, named entity disambiguation is the more difficult challenge to solve at scale. Accurately distinguishing between two entities with the same name often requires consulting multiple sources and, at times, drawing on prior knowledge or domain expertise.

By contrast, filtering out cases of spurious co‑occurrence is generally more straightforward to automate. From a workflow perspective, an LLM only needs to assess the relationship between two entities within a single text, which can typically be included in its entirety (or in key excerpts) within a single prompt.

Combined with asynchronous processing, it becomes feasible to filter thousands of websites in minutes, at costs in the range of tens to hundreds of pounds. This process identifies a smaller subset of websites where a meaningful connection between the two entities appears in the main body of text. The more complex task of resolving named entity disambiguation can then be handled by a human analyst, working only on this reduced set.
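As a rough illustration of the asynchronous filtering idea, the sketch below checks many pages concurrently while capping the number of in-flight requests. It is not our production code: `judge_page` is a hypothetical stand-in for the LLM call (here replaced by a crude same-sentence check), and `max_concurrency` is an assumed parameter.

```python
import asyncio


async def judge_page(text: str, entity_a: str, entity_b: str) -> bool:
    """Hypothetical stand-in for an LLM call.

    A real pipeline would send a prompt to a model here; this placeholder
    only checks whether both names appear in the same sentence.
    """
    await asyncio.sleep(0)  # stands in for network latency
    return any(entity_a in s and entity_b in s for s in text.split("."))


async def filter_pages(pages, entity_a, entity_b, max_concurrency=20):
    """Return the URLs whose text is judged to link the two entities."""
    semaphore = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def guarded(url, text):
        async with semaphore:
            return url, await judge_page(text, entity_a, entity_b)

    results = await asyncio.gather(*(guarded(u, t) for u, t in pages.items()))
    return [url for url, keep in results if keep]


def run_filter(pages, entity_a, entity_b):
    """Synchronous entry point for scripts."""
    return asyncio.run(filter_pages(pages, entity_a, entity_b))


if __name__ == "__main__":
    demo = {
        "page-1": "Lord Frost spoke at the Heritage Foundation. Other news.",
        "page-2": "Lord Frost resigned. Later, the Heritage Foundation met.",
    }
    print(run_filter(demo, "Lord Frost", "Heritage Foundation"))
```

The semaphore is the main design choice: it keeps thousands of pages from hitting a rate-limited API at once while still overlapping the waiting time of individual calls.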

To illustrate this use case, we developed a pipeline to perform the task and devised a case study to test-drive it.

Hayek’s Bastards and the House of Lords

What connections exist between the House of Lords and some of the institutions pivotal in cultivating today’s alt‑right politics from strands of neoliberalism?

We set about identifying meaningful connections between 858 House of Lords peers and a selection of institutions from Quinn Slobodian’s recent book, Hayek’s Bastards. Why would we do this and who are these bastards?

Hayek’s Bastards offers a particular history of today’s alt‑right that suggests it did not spring up in opposition to neoliberalism but rather mutated from its core, evolving into a strain obsessed with racial hierarchy and hereditarian ideas of intelligence. Slobodian charts this evolution through a web of neoliberal think tanks, lobby groups and billionaire benefactors, chiefly in the United States, who re‑engineered the market absolutism of Friedrich Hayek and Ludwig von Mises into a politics of exclusion. Except for outliers such as Britain’s Institute of Economic Affairs, most of these actors are American, prompting us to ask how far their influence and networks extend within the UK’s political elite.

The following institutions from Hayek’s Bastards were chosen to cross reference against the set of peers from the House of Lords:

Name	Description	Country
American Enterprise Institute	centre-right think-tank	US
American Immigration Control Foundation	immigration reduction campaign group	US
American Renaissance	white supremacist magazine	US
Atlas Network	libertarian NGO	US
Bradley Foundation	conservative funder	US
British Eugenics Society	non-profit learned society	UK
Carthage Foundation	right-of-centre funder	US
Cato Institute	libertarian think-tank	US
Center for Libertarian Studies	libertarian anarcho-capitalist non-profit	US
Competitive Enterprise Institute	libertarian think-tank	US
Conservative Political Action Conference	political conference	US
Eigentümlich Frei	new-right publisher	Germany
Federation for American Immigration Reform	anti-immigration non-profit	US
Friedrich Hayek Society	non-profit members association	Germany
Heritage Foundation	conservative think-tank	US
Hoover Institution	public policy think-tank	US
Human Diversity Foundation	race science publisher	US
Institute of Economic Affairs	right-wing free market think-tank	UK
Institute for Humane Studies	non-profit promoting liberalism	US
Institut für Staatspolitik	new-right think tank	Germany
Junge Freiheit	conservative nationalist newspaper	Germany
John Randolph Club	paleoconservative think-tank	US
Mises Institute	non-profit promoting Austrian economics	US
Manhattan Institute	conservative think-tank	US
Mankind Quarterly	journal	UK/US
Mercatus Center	libertarian, free-market-oriented think tank	US
Mont Pèlerin Society	liberal academic society	US
Noontide Press	far-right publisher	US
Pioneer Fund	race science funder	US
Property and Freedom Society	anarcho-capitalist political organisation	Turkey
Quarterly Journal of Austrian Economics	peer-reviewed academic journal	US
Rockford Institute	conservative think-tank	US
Washington Summit Publishers	white nationalist publisher	US
VDARE	far right website	US

Methodology

Our pipeline implements the following steps.

1. Assemble Source Lists

The list of current House of Lords peers was sourced from They Work for You, whilst the list of institutions was compiled by reading Hayek’s Bastards.

2. Expand & Normalise Names

To capture every relevant hit, we expanded each entity’s name to all common variants. Searching “Lord Cameron of Chipping Norton”, for instance, surfaces different results than “David Cameron”. For every entity we ran a quick web query (e.g., "Lord Cameron of Chipping Norton" peer conservative "House of Lords"), passed the snippets to an LLM, and let it return the full alias list, e.g. “David William Donald Cameron”, “David W. D. Cameron” etc. This single step expanded our original 858 peers into 3,459 distinct names, dramatically widening the search space.

3. Construct Search Queries

We paired every peer alias with every institution name, producing 152,592 unique search strings. For example, to probe links between Lord Frost and the Heritage Foundation we searched "Lord Frost" "Heritage Foundation".
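The pairing step amounts to a Cartesian product of the two name lists, with each name wrapped in quotes to request an exact-phrase match. A minimal sketch, with a hypothetical function name and toy inputs:

```python
from itertools import product


def build_queries(peer_aliases, institution_names):
    """Pair every alias with every institution as two exact-phrase terms."""
    return [
        f'"{alias}" "{institution}"'
        for alias, institution in product(peer_aliases, institution_names)
    ]


queries = build_queries(
    ["Lord Frost", "David Frost"],
    ["Heritage Foundation", "Cato Institute"],
)
for query in queries:
    print(query)
```

The number of queries is simply the product of the two list lengths, which is why alias expansion inflates the search space so quickly.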

4. Execute Web Searches

Running the 152,592 queries produced roughly 100K hits, yet a hit count tells us nothing about the strength or nature of any link. Google snippets give only a headline, URL, and a few teaser lines. This is usually far too thin to judge a real connection. Take these results:

Last Exit to Freedom? Britain After Brexit and the Future of …
Lord Frost began life as a professional diplomat but entered the … Secondary Navigation. Sign-up for weekly texts from The Heritage Foundation …
Ex-Tory Brexit minister Lord Frost rejects party’s claims over Europe …
Ex-Tory Brexit minister Lord Frost rejects party’s claims over Europe-wide customs scheme – as it happened … Heritage Foundation Photograph: …
Project 2025: Tory Candidates Have Ties to Group Drafting …
Ex-Tory Brexit minister Lord Frost rejects party’s claims over Europe-wide customs scheme – as it happened … Heritage Foundation Photograph: …

All three mention both names, yet none clarifies how (or even whether) they relate. To verify any relationship we clearly need to scrape the full page content for every result that cited both entities.

5. Retrieve & Filter Pages

Not every page will surrender its text. Paywalls, logins, and anti‑bot defences routinely shut scrapers out. For instance, the most common domain within the search results was Scribd, a website for sharing documents that requires an account and offers no public API. After testing the most popular domains, we attempted to scrape the 32,601 URLs that seemed at least partly accessible. From those we successfully pulled usable text from 22,279 pages.

Data Release

As we were unable to retrieve relevant text from the majority of search results, we are releasing this data for any curious researchers to explore.

📂 Download edges.zip (~13MB)

6. Craft LLM Prompts

The key obstacle when prompting LLMs is context control: the model needs enough surrounding text to judge whether two entities are truly connected, but not so much that the prompt becomes unnecessarily costly or cluttered. With shorter texts we can simply include the entire article within a prompt. However, web‑scraped pages within our dataset range from a six‑word job title to sprawling, million‑word tomes that mention our targets only in passing. Feeding all of that to the LLM is both inefficient and unfocused.

We therefore devised a scalable prompt‑building strategy that distills each page to the minimum context required for reasonably reliable inference, regardless of length.

To keep prompts both focused and economical, we rely on a lightweight three‑snippet strategy for pages that are too long to send whole, passing the model the following sections from each article:

Lead paragraph: usually the first 100–150 words to orient the model on the subject matter and tone.
Entity‑A window: a symmetric slice of text (± W words) centred on the first mention of entity A.
Entity‑B window: the same‑sized slice around the first mention of entity B.

The window width W balances cost and context; smaller windows save tokens but may miss clues whilst larger ones capture more evidence at higher cost. Because the three snippets are concatenated in order (lead → A → B), the model receives a compact narrative arc that mirrors how a human reader might be introduced to these entities.
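The three snippets can be sketched in a few lines of Python. This is an illustration under simplifying assumptions rather than our exact implementation: text is split on whitespace, words stand in for tokens, matching is case-insensitive on exact words, and the function names and default values are hypothetical.

```python
PUNCTUATION = '.,;:!?()"\'“”'


def find_first_mention(words, entity):
    """Index of the word where the entity phrase first starts, or None."""
    target = entity.lower().split()
    lowered = [w.lower().strip(PUNCTUATION) for w in words]
    for i in range(len(lowered) - len(target) + 1):
        if lowered[i:i + len(target)] == target:
            return i
    return None


def entity_window(words, entity, w):
    """Slice of +/- w words around the first mention of the entity."""
    idx = find_first_mention(words, entity)
    if idx is None:
        return ""
    start = max(0, idx - w)
    end = min(len(words), idx + len(entity.split()) + w)
    return " ".join(words[start:end])


def three_snippets(text, entity_a, entity_b, w=50, lead_words=120):
    """Return [lead paragraph, entity-A window, entity-B window]."""
    words = text.split()
    return [
        " ".join(words[:lead_words]),
        entity_window(words, entity_a, w),
        entity_window(words, entity_b, w),
    ]
```

Note that this simple matcher would miss possessives such as “Frost’s” and does not merge windows that overlap; a fuller version would need to handle both.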

To decide which chunks to keep, we first set a global token budget G as the maximum number of tokens we can spend on context. With G fixed, the following table shows how we choose what to send to the LLM:

Scenario	Decision rule	Context sent to the LLM
Short article	Article length ≤ G	Entire article
Entities close together (early)	Offset from start to later first‑mention ≤ G	One continuous chunk from start to entity mentions
Entities far apart	Offset from start to later first‑mention > G	If entity A is near the top: 2 chunks (combined lead + A‑window, then B‑window); otherwise: 3 chunks (lead, A‑window, B‑window)

Anchoring every decision to G (total allowance) and W (per‑entity context) improves the likelihood that each prompt supplies just enough relevant information without overrunning token limits.
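The decision table can be expressed as a small function that returns the word spans to send. The sketch below makes several assumptions that are ours, not part of the table: lengths are measured in words as a proxy for tokens, the earlier-mentioned entity is treated as entity A, “near the top” means the A‑window would touch the lead paragraph, and all names and defaults are hypothetical.

```python
def choose_spans(n_words, first_a, first_b, g, w, lead_words=120):
    """Return a list of (start, end) word spans to include in the prompt.

    n_words: article length in words
    first_a, first_b: word index of each entity's first mention
    g: global budget; w: per-entity window half-width
    """
    # Short article: send everything.
    if n_words <= g:
        return [(0, n_words)]

    early, late = sorted((first_a, first_b))

    # Entities close together (early): one continuous chunk from the start.
    # The trailing +w of context means this can slightly exceed g.
    if late <= g:
        return [(0, min(n_words, late + w))]

    late_span = (max(0, late - w), min(n_words, late + w))

    # Entities far apart, first entity near the top: merge lead and A-window.
    if early - w <= lead_words:
        return [(0, min(n_words, early + w)), late_span]

    # Entities far apart, both deep in the text: lead, A-window, B-window.
    return [(0, lead_words), (max(0, early - w), min(n_words, early + w)), late_span]
```

For example, with `g=500` and `w=50`, a 5,000-word page mentioning the entities at words 1,000 and 3,000 yields three spans: the lead plus one window around each mention.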

7. Infer Entity Connections

After the LLM flagged passages that it judged as depicting a meaningful link between two entities, we extracted the named entities mentioned in those passages, together with any further entities suggested as relevant, and matched each one to its canonical Wikipedia entry. This grounding step standardised the names and anchored our graph to reliable, publicly maintained IDs.

8. Archive Results in Obsidian

All curated data was archived to an Obsidian vault, providing an interactive workspace where you can browse search hits and navigate the entity graph.

Data Release

To explore the dataset, first download Obsidian, then download and unzip the dataset below, and finally follow the instructions to create a vault from an existing folder (in this case, the folder you just unzipped).

📂 Download corkboard_vault.zip (~1MB)

Every hit comes with an LLM‑generated synopsis and a summary receipt that records the model used to generate the synopsis, the fraction of the article included within the prompt, and whether this fraction of text was supplied as a single continuous block or split into windows. This metadata gives researchers an additional measure to judge the reliability of each synopsis before deciding to open the original page. Each synopsis should be interpreted only as a guide to what may exist on the webpage, and absolutely not as fact, as it was generated by an LLM. Similarly, the content of each webpage should be carefully judged as to whether or not the information is reliable:

Single search result in Obsidian with synopsis
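The summary receipt described above could be modelled as a small record. The sketch below is illustrative only: the class name, field names, and the use of word counts are our assumptions for this example, not the exact format stored in the vault.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryReceipt:
    model: str                # model that generated the synopsis
    fraction_included: float  # share of the article sent in the prompt
    continuous: bool          # True if sent as one block, False if windowed


def make_receipt(model, n_words_total, spans):
    """Build a receipt from the (start, end) word spans that were sent."""
    sent = sum(end - start for start, end in spans)
    fraction = min(1.0, sent / n_words_total) if n_words_total else 0.0
    return SummaryReceipt(model, round(fraction, 3), len(spans) == 1)


receipt = make_receipt("example-model", 1000, [(0, 100), (400, 500)])
print(receipt)
```

A low fraction combined with windowed input is a signal that the synopsis rests on a small part of the page and deserves extra scrutiny.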

After manually removing links that appeared to refer to other persons or organisations, we identified potentially meaningful connections between 20 organisations from Hayek’s Bastards and 152 peers from the House of Lords.

Obsidian graph view of connections between nodes

Most Cited Organisations	Connections
Institute of Economic Affairs	94
Heritage Foundation	42
American Enterprise Institute	22
Atlas Network	22
Cato Institute	18

Most Cited Peers	Connections
Lord Hannan of Kingsclere	11
Lord Elliott of Mickle Fell	10
Lord Moynihan of Chelsea	9
Lord Gove	8
Lord Hintze	7