
llms.txt API Reference

Auto-generated API documentation for the llms.txt parser module.

llmstxt

llms.txt parser — zero dependencies, stdlib only, Python 3.10+.

Part of zerodep: https://github.com/Oaklight/zerodep Copyright (c) 2026 Peng Ding. MIT License.

Parse llms.txt files per the llmstxt.org specification into structured data, and generate candidate per-page markdown URLs for content discovery.

Example::

from llmstxt import parse, find_candidates

doc = parse("""# My Project
> A cool project

Some details here.

## Docs
- [Guide](https://example.com/guide.md): The main guide
""")
print(doc.title)       # 'My Project'
print(doc.sections)    # {'Docs': [FileEntry(name='Guide', ...)]}

# With a parsed llms.txt — looks up matching entries, falls back to heuristic
matches = find_candidates("https://example.com/guide", doc=doc)
# [FileEntry(name='Guide', url='https://example.com/guide.md', ...)]

# Without llms.txt — pure heuristic URL generation
matches = find_candidates("https://example.com/docs")
# [FileEntry(name='', url='https://example.com/docs.md', ...)]

LlmsTxtError

Bases: Exception

Raised when llms.txt parsing fails due to structural issues.

Source code in llmstxt/llmstxt.py
class LlmsTxtError(Exception):
    """Raised when llms.txt parsing fails due to structural issues."""

FileEntry dataclass

A linked resource entry from an llms.txt file list.

Attributes:

| Name | Type | Description |
|------|------|-------------|
| name | str | Display name of the link. |
| url | str | URL of the linked resource. |
| notes | str | Descriptive text after the `: ` separator, or empty string. |

Source code in llmstxt/llmstxt.py
@dataclasses.dataclass(frozen=True, slots=True)
class FileEntry:
    """A linked resource entry from an llms.txt file list.

    Attributes:
        name: Display name of the link.
        url: URL of the linked resource.
        notes: Descriptive text after the ``: `` separator, or empty string.
    """

    name: str
    url: str
    notes: str = ""
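The link-line parser that produces these entries (`_parse_links`) is not shown on this page. A minimal sketch, assuming list items shaped like `- [name](url): notes` as in the llmstxt.org examples, could match one line at a time as below. `parse_link_line` and `_LINK_RE` are hypothetical names, the `FileEntry` here is a local stand-in, and the module's real pattern may accept more or fewer variants.

```python
from __future__ import annotations

import dataclasses
import re


@dataclasses.dataclass(frozen=True)
class FileEntry:
    # Local stand-in with the documented fields, so the sketch runs standalone.
    name: str
    url: str
    notes: str = ""


# Hypothetical pattern: "-" or "*" bullet, [name](url), then an optional ": notes".
_LINK_RE = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?\s*$")


def parse_link_line(line: str) -> FileEntry | None:
    """Hypothetical helper: FileEntry for one list-item link, else None."""
    m = _LINK_RE.match(line)
    if m is None:
        return None
    name, url, notes = m.groups()
    # A missing ": notes" suffix leaves the notes group as None.
    return FileEntry(name=name.strip(), url=url, notes=(notes or "").strip())
```

A link without a `: ` suffix yields `notes=""`, matching the documented default.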

LlmsTxt dataclass

Parsed representation of an llms.txt file.

Attributes:

| Name | Type | Description |
|------|------|-------------|
| title | str | The H1 heading (project/site name). |
| description | str | The blockquote summary, or empty string if absent. |
| details | str | Text paragraphs between the blockquote and the first H2, or empty string. |
| sections | dict[str, list[FileEntry]] | Mapping of H2 section name to list of file entries. The special `"Optional"` section is excluded from this dict. |
| optional | list[FileEntry] | Entries from the `## Optional` section, or empty list. |

Source code in llmstxt/llmstxt.py
@dataclasses.dataclass(frozen=True, slots=True)
class LlmsTxt:
    """Parsed representation of an llms.txt file.

    Attributes:
        title: The H1 heading (project/site name).
        description: The blockquote summary, or empty string if absent.
        details: Text paragraphs between blockquote and first H2, or empty
            string.
        sections: Mapping of H2 section name to list of file entries.
            The special ``"Optional"`` section is excluded from this dict.
        optional: Entries from the ``## Optional`` section, or empty list.
    """

    title: str
    description: str = ""
    details: str = ""
    sections: dict[str, list[FileEntry]] = dataclasses.field(default_factory=dict)
    optional: list[FileEntry] = dataclasses.field(default_factory=list)
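`parse` hands the text between the H1 and the first H2 to a private `_parse_preamble` helper that is not shown here. As a rough sketch of how `description` and `details` could be separated, assuming the summary is the first run of consecutive `>`-prefixed lines and everything after it is details (`split_preamble` is a hypothetical name, not the module's API):

```python
def split_preamble(preamble: str) -> tuple[str, str]:
    """Hypothetical helper: split the text after the H1 into (description, details).

    Assumes the description is the first run of consecutive ">" lines and
    that everything after it is free-form details.
    """
    lines = preamble.strip("\n").splitlines()
    i = 0
    # Skip any blank lines between the H1 and the blockquote.
    while i < len(lines) and not lines[i].strip():
        i += 1
    quote: list[str] = []
    # Collect consecutive blockquote lines, dropping the leading ">".
    while i < len(lines) and lines[i].lstrip().startswith(">"):
        quote.append(lines[i].lstrip()[1:].strip())
        i += 1
    description = " ".join(part for part in quote if part)
    details = "\n".join(lines[i:]).strip()
    return description, details
```

For the module example's preamble, `split_preamble("\n> A cool project\n\nSome details here.\n\n")` returns `("A cool project", "Some details here.")`; with no blockquote, the description is empty and all text becomes details.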

DiscoveryResult dataclass

Result of probing a site for llms.txt and llms-full.txt.

Attributes:

| Name | Type | Description |
|------|------|-------------|
| llms_txt | str \| None | Raw content of `/llms.txt`, or None if not found. |
| llms_full_txt | str \| None | Raw content of `/llms-full.txt`, or None if not found. |
| source_url | str | The root URL (`{scheme}://{netloc}`) that was probed. |

Source code in llmstxt/llmstxt.py
@dataclasses.dataclass(frozen=True, slots=True)
class DiscoveryResult:
    """Result of probing a site for llms.txt and llms-full.txt.

    Attributes:
        llms_txt: Raw content of ``/llms.txt``, or ``None`` if not found.
        llms_full_txt: Raw content of ``/llms-full.txt``, or ``None`` if not
            found.
        source_url: The root URL (``{scheme}://{netloc}``) that was probed.
    """

    llms_txt: str | None = None
    llms_full_txt: str | None = None
    source_url: str = ""

parse(text)

Parse llms.txt content into structured data.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| text | str | Raw text content of an llms.txt file. | required |

Returns:

| Type | Description |
|------|-------------|
| LlmsTxt | Parsed LlmsTxt object. |

Raises:

| Type | Description |
|------|-------------|
| LlmsTxtError | If the input is empty (or whitespace only) or the required H1 title is missing. |

Source code in llmstxt/llmstxt.py
def parse(text: str) -> LlmsTxt:
    """Parse llms.txt content into structured data.

    Args:
        text: Raw text content of an llms.txt file.

    Returns:
        Parsed ``LlmsTxt`` object.

    Raises:
        LlmsTxtError: If the input is empty or the required H1 title is
            missing.
    """
    text = text.replace("\r\n", "\n").strip()
    if not text:
        raise LlmsTxtError("empty input")

    # Extract H1 title
    h1_match = _H1_RE.search(text)
    if not h1_match:
        raise LlmsTxtError("missing required H1 title")
    title = h1_match.group(1).strip()

    # Split on H2 headers: [preamble, name1, body1, name2, body2, ...]
    parts = _H2_SPLIT_RE.split(text)
    preamble = parts[0]

    # Remove the H1 line from preamble before parsing description/details
    preamble = preamble[h1_match.end() :]

    description, details = _parse_preamble(preamble)

    # Parse sections
    sections: dict[str, list[FileEntry]] = {}
    for i in range(1, len(parts), 2):
        section_name = parts[i].strip()
        section_body = parts[i + 1] if i + 1 < len(parts) else ""
        sections[section_name] = _parse_links(section_body)

    # Separate "Optional" section
    optional = sections.pop("Optional", [])

    return LlmsTxt(
        title=title,
        description=description,
        details=details,
        sections=sections,
        optional=optional,
    )
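`parse` splits on a private `_H2_SPLIT_RE` whose pattern is not shown. The technique relies on a documented `re.split` behavior: when the pattern contains a capturing group, the captured text is kept in the result, giving `[preamble, name1, body1, name2, body2, ...]`. A minimal sketch with an assumed pattern (`H2_SPLIT_RE` and `split_sections` are hypothetical) that also pulls out the `Optional` section the way `parse` does:

```python
import re

# Hypothetical stand-in for _H2_SPLIT_RE: "## Name" at the start of a line, with
# one capturing group for the name so re.split keeps it in the output list.
H2_SPLIT_RE = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def split_sections(text: str) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: (preamble, {section name: raw body})."""
    parts = H2_SPLIT_RE.split(text)
    # With exactly one capturing group, parts always has odd length:
    # [preamble, name1, body1, name2, body2, ...]
    preamble = parts[0]
    sections: dict[str, str] = {}
    for i in range(1, len(parts), 2):
        sections[parts[i].strip()] = parts[i + 1]
    return preamble, sections


sample = (
    "# My Project\n"
    "> A cool project\n"
    "\n"
    "## Docs\n"
    "- [Guide](https://example.com/guide.md): The main guide\n"
    "\n"
    "## Optional\n"
    "- [Extra](https://example.com/extra.md)\n"
)
preamble, sections = split_sections(sample)
# Mirror parse(): move "Optional" out of the regular sections.
optional = sections.pop("Optional", "")
```

Because sections are keyed by heading text, a repeated H2 name overwrites the earlier body, both in this sketch and in `parse` as written above.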

find_candidates(url, doc=None)

Find candidate markdown resources for a given URL.

When doc is provided, searches all sections and the optional entries for URLs related to url, ranked as exact match first, then extension variant (one path equals the other with .md or .html.md appended), then path prefix. If nothing matches, or doc is None, falls back to heuristic URL generation based on common per-page .md conventions.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| url | str | The page URL to look up. | required |
| doc | LlmsTxt \| None | An optional parsed LlmsTxt object to search in. | None |

Returns:

| Type | Description |
|------|-------------|
| list[FileEntry] | List of FileEntry candidates, ordered by match quality. |
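The ranking described above can be modeled on bare URL paths. This is a simplified sketch, not the module's code: `rank_paths` is a hypothetical name, and it compares paths only, whereas `find_candidates` compares full URLs via the private `_strip_url` for the exact tier.

```python
def rank_paths(page_path: str, entry_paths: list[str]) -> list[str]:
    """Hypothetical helper: exact match, then .md/.html.md variant, then prefix."""
    base = page_path.rstrip("/")
    exact: list[str] = []
    extension: list[str] = []
    prefix: list[str] = []
    for path in entry_paths:
        p = path.rstrip("/")
        if p == base:
            exact.append(path)
        elif p in (base + ".md", base + ".html.md") or base in (p + ".md", p + ".html.md"):
            # Either side may be the markdown variant of the other.
            extension.append(path)
        elif p.startswith(base + "/") or base.startswith(p + "/"):
            # One path is nested under the other.
            prefix.append(path)
    return exact + extension + prefix
```

For example, `rank_paths("/guide", ["/guide/intro.md", "/other.md", "/guide.md", "/guide/"])` returns `["/guide/", "/guide.md", "/guide/intro.md"]`. For a bare root path, `base` strips to an empty string, so every absolute entry path passes the prefix test; the source code that follows behaves the same way, assuming `_url_path` returns `/` for a root URL.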

Source code in llmstxt/llmstxt.py
def find_candidates(url: str, doc: LlmsTxt | None = None) -> list[FileEntry]:
    """Find candidate markdown resources for a given URL.

    When *doc* is provided, searches all sections and optional entries for
    URLs that relate to *url* (exact match > extension variation > path
    prefix).  If no match is found (or *doc* is ``None``), falls back to
    heuristic URL generation based on common per-page ``.md`` conventions.

    Args:
        url: The page URL to look up.
        doc: An optional parsed ``LlmsTxt`` object to search in.

    Returns:
        List of ``FileEntry`` candidates, ordered by match quality.
    """
    base = _strip_url(url)
    base_path = _url_path(url).rstrip("/")

    # ── Search llms.txt entries ──
    if doc is not None:
        all_entries: list[FileEntry] = []
        for entries in doc.sections.values():
            all_entries.extend(entries)
        all_entries.extend(doc.optional)

        exact: list[FileEntry] = []
        extension: list[FileEntry] = []
        prefix: list[FileEntry] = []

        for entry in all_entries:
            entry_base = _strip_url(entry.url)
            entry_path = _url_path(entry.url).rstrip("/")

            if entry_base == base:
                exact.append(entry)
                continue

            if entry_path == base_path + ".md" or entry_path == base_path + ".html.md":
                extension.append(entry)
                continue
            if base_path == entry_path + ".md" or base_path == entry_path + ".html.md":
                extension.append(entry)
                continue

            if entry_path.startswith(base_path + "/") or base_path.startswith(
                entry_path + "/"
            ):
                prefix.append(entry)

        results = exact + extension + prefix
        if results:
            return results

    # ── Fallback: heuristic URL candidates ──
    return [FileEntry(name="", url=u) for u in _candidate_md_urls(url)]
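The fallback delegates to a private `_candidate_md_urls` helper whose exact conventions are not shown. The llmstxt.org proposal suggests that a page's markdown version lives at the same URL with `.md` appended, and that URLs without a file name append `index.html.md` instead. A minimal sketch of just those two rules (`md_candidates` is a hypothetical name; the real helper may return more or different candidates):

```python
from urllib.parse import urlsplit, urlunsplit


def md_candidates(url: str) -> list[str]:
    """Hypothetical helper: guess per-page markdown URLs per the llmstxt.org proposal."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    if not path or path.endswith("/"):
        # No file name in the URL: append index.html.md to the directory path.
        md_path = (path or "/") + "index.html.md"
    else:
        # Otherwise append .md to the existing path (page.html -> page.html.md).
        md_path = path + ".md"
    # Query string and fragment are dropped from the candidate.
    return [urlunsplit((scheme, netloc, md_path, "", ""))]
```

`find_candidates` wraps each heuristic URL as `FileEntry(name="", url=u)`, so fallback results are recognizable by their empty `name`.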

discover(url, *, timeout=10)

Probe a site for /llms.txt and /llms-full.txt.

Given any URL, extracts the root ({scheme}://{netloc}) and attempts to fetch both /llms.txt and /llms-full.txt. If the input URL already points to one of these files, it is still fetched (along with its sibling).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| url | str | Any URL belonging to the target site. | required |
| timeout | int | HTTP request timeout in seconds (per request). | 10 |

Returns:

| Type | Description |
|------|-------------|
| DiscoveryResult | A DiscoveryResult with the raw content of whichever files were found (fields are None when the file does not exist or could not be fetched). |

Example::

result = discover("https://example.com/docs/guide")
content = result.llms_full_txt or result.llms_txt
if content:
    doc = parse(content)

Source code in llmstxt/llmstxt.py
def discover(url: str, *, timeout: int = 10) -> DiscoveryResult:
    """Probe a site for ``/llms.txt`` and ``/llms-full.txt``.

    Given any URL, extracts the root (``{scheme}://{netloc}``) and attempts to
    fetch both ``/llms.txt`` and ``/llms-full.txt``.  If the input URL already
    points to one of these files, it is still fetched (along with its sibling).

    Args:
        url: Any URL belonging to the target site.
        timeout: HTTP request timeout in seconds (per request).

    Returns:
        A ``DiscoveryResult`` with the raw content of whichever files were
        found (fields are ``None`` when the file does not exist or could not
        be fetched).

    Example::

        result = discover("https://example.com/docs/guide")
        content = result.llms_full_txt or result.llms_txt
        if content:
            doc = parse(content)
    """
    parsed = urllib.parse.urlparse(url)
    root = f"{parsed.scheme}://{parsed.netloc}"

    llms_txt = _fetch_text(f"{root}/llms.txt", timeout)
    llms_full_txt = _fetch_text(f"{root}/llms-full.txt", timeout)

    return DiscoveryResult(
        llms_txt=llms_txt,
        llms_full_txt=llms_full_txt,
        source_url=root,
    )
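`_fetch_text` is private and not shown. A stdlib-only sketch of the two pieces `discover` relies on (deriving the probe URLs from any page URL, and fetching text with a None-on-failure contract matching `DiscoveryResult`) might look like this. `probe_urls`, `fetch_text`, and the exact set of exceptions treated as "not found" are assumptions, not the module's code.

```python
from __future__ import annotations

import http.client
import urllib.parse
import urllib.request


def probe_urls(url: str) -> tuple[str, str, str]:
    """Hypothetical helper: (root, llms.txt URL, llms-full.txt URL) for any site URL."""
    parsed = urllib.parse.urlparse(url)
    # Path, query, and fragment are discarded; only scheme and host[:port] remain.
    root = f"{parsed.scheme}://{parsed.netloc}"
    return root, f"{root}/llms.txt", f"{root}/llms-full.txt"


def fetch_text(url: str, timeout: int = 10) -> str | None:
    """Hypothetical helper: decoded body, or None on any HTTP or network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except (OSError, ValueError, LookupError, http.client.HTTPException):
        # OSError covers URLError/HTTPError (e.g. 404) and socket timeouts;
        # ValueError covers malformed URLs; LookupError an unknown charset.
        return None
```

If `_fetch_text` uses `urllib` as this sketch does, `timeout` bounds each blocking socket operation rather than the whole request, and `discover` runs its two fetches one after the other, so a slow site can make the total call take noticeably longer than `timeout`.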