---
name: programmatic-seo
description: Build search traffic from a data-rich site without triggering Google's scaled-content penalties. Decide which template pages to generate and what each must contain, build the internal-link graph, ship E-E-A-T and structured data, and get cited by AI answer engines. Use when planning or auditing SEO for a database-backed product.
license: Free to use with attribution to EquityFlow / Enrico Yu (equityflow.finance)
source: https://equityflow.finance/deep-dives/seo-for-startups
---

# Programmatic SEO for a data-rich site

Use when a site generates many template pages from a database (companies, products, locations, etc.).
Done right, it is the highest-leverage SEO play; done lazily, it gets the whole domain demoted, because Google's
quality systems now act on domains, not just individual pages. The dividing line is unique, useful data per page.

## The bar every generated page must clear
- Unique DATA, not unique words. Each page must carry information that exists nowhere else on the site
  and that a user actually wants (a live number, a real score). Synonym-spun text does not count.
- Real search demand. If a keyword has no search volume, do not generate its page. Where demand exists but
  your data cannot fill the page (the thin long tail), noindex it. Empty indexed pages are a sitewide
  thin-content liability.
- A browseable hierarchy. Pages hang off real categories a user can navigate, not float as isolated
  landing pages (that is the "doorway" pattern Google punishes).
- The "would a human bother" test: strip the template boilerplate; if nothing of value remains, do not
  ship it. Google's scaled-content-abuse policy targets unoriginal, low-value pages "no matter how it's created".
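
A minimal Python sketch of the bar above as a per-page gate. `Candidate`, `decide` and both thresholds are
hypothetical; search volume is assumed to come from whatever keyword tool you already use.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    INDEX = "index"      # generate and allow indexing
    NOINDEX = "noindex"  # generate for browsing, but emit <meta name="robots" content="noindex">
    SKIP = "skip"        # do not generate the page at all


@dataclass
class Candidate:
    slug: str
    monthly_searches: int                     # keyword volume from your SEO tool (assumed input)
    data: dict = field(default_factory=dict)  # entity-specific fields, e.g. {"headcount": 120}
    has_category: bool = True                 # does it hang off a browseable hub?


# Hypothetical thresholds: tune them against your own data.
MIN_MONTHLY_SEARCHES = 10
MIN_UNIQUE_FIELDS = 3


def decide(c: Candidate) -> Decision:
    if not c.has_category:
        return Decision.SKIP  # an orphan landing page is the doorway pattern
    if c.monthly_searches < MIN_MONTHLY_SEARCHES:
        return Decision.SKIP  # no demand: do not generate
    # Count only fields that carry a value; template labels are not data.
    filled = [k for k, v in c.data.items() if v not in (None, "", [], {})]
    if not filled:
        return Decision.SKIP  # nothing of value once boilerplate is stripped
    if len(filled) < MIN_UNIQUE_FIELDS:
        return Decision.NOINDEX  # thin long tail the data cannot fill
    return Decision.INDEX
```

Run it over every candidate before the build step, so skipped pages never reach the sitemap or the link graph.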

## Internal linking
- Build a dense, varied internal-link graph: each entity links to related entities (peers, parents,
  the hub it belongs to). Vary anchor text. This is one of the strongest on-site ranking levers.
- Give every important page crawlable `<a href>` links from navigable hubs so it earns crawl priority and PageRank.
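
One way to sketch the link graph in Python, assuming each entity belongs to a single category hub. `Entity`,
`build_links`, the URL scheme and the anchor templates are all hypothetical. Hubs link down to every member;
each entity links up to its hub and sideways to a few peers with rotated anchor text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    slug: str
    name: str
    category: str  # the hub this entity belongs to


# Hypothetical anchor templates; rotating them keeps anchor text varied sitewide.
PEER_ANCHORS = ["{name}", "{name} profile", "how {name} compares"]


def url(e: Entity) -> str:
    return f"/{e.category}/{e.slug}/"


def build_links(entities: list[Entity], max_peers: int = 5) -> dict[str, list[tuple[str, str]]]:
    """Return page path -> [(href, anchor_text), ...]."""
    by_category: dict[str, list[Entity]] = defaultdict(list)
    for e in entities:
        by_category[e.category].append(e)

    links: dict[str, list[tuple[str, str]]] = {}
    # Hub pages link down to every member, so no entity is orphaned.
    for cat, members in by_category.items():
        links[f"/{cat}/"] = [(url(m), m.name) for m in members]
    # Entity pages link up to their hub and sideways to peers.
    for e in entities:
        out = [(f"/{e.category}/", f"More in {e.category}")]
        peers = [p for p in by_category[e.category] if p.slug != e.slug][:max_peers]
        for i, p in enumerate(peers):
            # Offset by slug length so neighbouring pages start on different templates.
            template = PEER_ANCHORS[(i + len(e.slug)) % len(PEER_ANCHORS)]
            out.append((url(p), template.format(name=p.name)))
        links[url(e)] = out
    return links
```

Taking the first `max_peers` members is a placeholder: a real build would rank peers by similarity and
paginate large hubs so each hub page stays a reasonable size.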

## E-E-A-T (especially for money/health topics)
- Named authors with real credentials and a separate reviewer line; published editorial standards and a
  corrections route; clear "how we make money" disclosure near commercial recommendations.
- Dated, sourced data provenance for every figure. Cite primary sources inline.
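
A sketch of enforcing provenance at render time, so a figure without a source cannot be published. `Figure` and
`render_figure` are hypothetical names; the output is plain HTML with a machine-readable `<time>` date and an
inline link to the primary source.

```python
from dataclasses import dataclass
from datetime import date
from html import escape


@dataclass(frozen=True)
class Figure:
    label: str
    value: str
    source_name: str  # the primary source, e.g. a regulatory filing
    source_url: str
    as_of: date       # when the figure was true, not when the page was built


def render_figure(f: Figure) -> str:
    """Render a figure with its date and an inline link to its primary source."""
    if not (f.source_name and f.source_url):
        raise ValueError(f"figure {f.label!r} has no source; do not publish it")
    return (
        f"<p>{escape(f.label)}: <strong>{escape(f.value)}</strong> "
        f'(as of <time datetime="{f.as_of.isoformat()}">{f.as_of:%d %b %Y}</time>; '
        f'source: <a href="{escape(f.source_url)}">{escape(f.source_name)}</a>)</p>'
    )
```

Failing loudly at build time is the design choice here: an unsourced number is a bug, not a page to ship.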

## Structured data and entities
- Emit Organization/Person/Article JSON-LD. Use `sameAs` to link your entities to Wikidata, LinkedIn and
  other authoritative profiles so search engines can disambiguate them.
- Keep the entity pages server-rendered so the first crawl wave sees the content and links.
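
A minimal sketch of emitting JSON-LD server-side. The `@context`, `@type`, `name`, `url` and `sameAs` keys are
standard schema.org; the helper name `jsonld_script` and any example values are hypothetical.

```python
import json
from typing import Any


def jsonld_script(type_: str, name: str, url: str, same_as: list[str], **extra: Any) -> str:
    """Build a schema.org JSON-LD <script> tag for server-side rendering."""
    data: dict[str, Any] = {
        "@context": "https://schema.org",
        "@type": type_,  # e.g. "Organization" or "Person"
        "name": name,
        "url": url,
    }
    if same_as:
        # Authoritative profiles (Wikidata, LinkedIn and similar) for the same entity.
        data["sameAs"] = same_as
    data.update(extra)  # any other schema.org properties, e.g. jobTitle="Editor"
    # Escape "</" so a value can never close the <script> element early.
    body = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'
```

For an author page you might call `jsonld_script("Person", author_name, author_url, author_profiles,
jobTitle="Senior Analyst")` (hypothetical variables) and place the tag in the server-rendered `<head>`.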

## Getting cited by AI (AEO/GEO)
- Write statistic-rich, quotable, frequently updated reference pages; LLMs cite content that reads like a
  credible source. Earn mentions in the third-party places models trust (press, analysts, reputable directories).
- Server-render content; offer clean .md/.json machine-readable mirrors; welcome AI crawlers in robots.txt
  and ship an /llms.txt. Measure citation share (how often AI answers to your target queries cite you), not
  just rank.
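
A sketch of the robots.txt, /llms.txt and citation-share pieces. The AI user-agent tokens are ones vendors have
published as of writing and may change, so check each vendor's documentation. The llms.txt layout (H1 title,
blockquote summary, H2 link sections) follows the llmstxt.org proposal. Helper names are hypothetical, and
`citation_share` assumes you already collect the cited URLs from a sample of AI answers.

```python
from urllib.parse import urlparse

# Published AI crawler user-agent tokens at the time of writing (an assumption:
# verify against each vendor's documentation before shipping).
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def robots_txt(disallow: list[str], sitemap_url: str) -> str:
    # One group for named AI crawlers and everyone else. A crawler that finds a group
    # naming it ignores the "*" group, so a separate AI group would skip these rules.
    lines = [f"User-agent: {ua}" for ua in AI_CRAWLERS + ["*"]]
    lines += ["Allow: /"] + [f"Disallow: {path}" for path in disallow]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


def llms_txt(site: str, summary: str, sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """H1 title, blockquote summary, then H2 sections of '- [title](url): note' links."""
    out = [f"# {site}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        out.append(f"## {heading}")
        out += [f"- [{title}]({link}): {note}" for title, link, note in links]
        out.append("")
    return "\n".join(out)


def _on_domain(link: str, domain: str) -> bool:
    host = urlparse(link).hostname or ""
    return host == domain or host.endswith("." + domain)


def citation_share(answers: list[list[str]], domain: str) -> float:
    """Fraction of sampled AI answers that cite `domain` at least once."""
    if not answers:
        return 0.0
    hits = sum(any(_on_domain(u, domain) for u in cited) for cited in answers)
    return hits / len(answers)
```

Track `citation_share` per query cluster over time; a single overall number hides which topics models trust you on.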

## Technical hygiene at scale
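- Server-render pages that must rank; do not depend on the JavaScript render wave.
- Keep responses fast and 5xx errors rare; Googlebot slows its crawling of slow or failing hosts, which
  shrinks crawl budget.
- Block combinatorial facet/parameter URLs in robots.txt.
- Shard sitemaps (at most 50,000 URLs or 50 MB uncompressed per file, tied together by a sitemap index) and
  list only canonical URLs that return 200.
- Hit Core Web Vitals at the 75th percentile of field data: LCP <=2.5s, INP <=200ms, CLS <=0.1.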

Audit an existing site against each section, report gaps, and fix the highest-impact ones first.

---
This skill distills EquityFlow's startup SEO deep dive (https://equityflow.finance/deep-dives/seo-for-startups).
EquityFlow is building an open intelligence layer for the private economy, by Enrico Yu. Free to use with attribution.