llms.txt API Reference¶

Auto-generated API documentation for the llms.txt parser module.

`llmstxt` ¶

llms.txt parser — zero dependencies, stdlib only, Python 3.10+.

Parse llms.txt files per the llmstxt.org specification into structured data, and generate candidate per-page markdown URLs for content discovery.

Example::

from llmstxt import parse, find_candidates

doc = parse("""# My Project
> A cool project

Some details here.

## Docs
- [Guide](https://example.com/guide.md): The main guide
""")
print(doc.title)       # 'My Project'
print(doc.sections)    # {'Docs': [FileEntry(name='Guide', ...)]}

# With a parsed llms.txt — looks up matching entries, falls back to heuristic
matches = find_candidates("https://example.com/guide", doc=doc)
# [FileEntry(name='Guide', url='https://example.com/guide.md', ...)]

# Without llms.txt — pure heuristic URL generation
matches = find_candidates("https://example.com/docs")
# [FileEntry(name='', url='https://example.com/docs.md', ...)]

`LlmsTxtError` ¶

Bases: Exception

Raised when llms.txt parsing fails due to structural issues.

Source code in llmstxt/llmstxt.py

class LlmsTxtError(Exception):
    """Raised when llms.txt parsing fails due to structural issues."""

`FileEntry` `dataclass` ¶

A linked resource entry from an llms.txt file list.

Attributes:

Name	Type	Description
`name`	`str`	Display name of the link.
`url`	`str`	URL of the linked resource.
`notes`	`str`	Descriptive text after the `:` separator, or empty string.

Source code in llmstxt/llmstxt.py

@dataclasses.dataclass(frozen=True, slots=True)
class FileEntry:
    """A linked resource entry from an llms.txt file list.

    Attributes:
        name: Display name of the link.
        url: URL of the linked resource.
        notes: Descriptive text after the ``: `` separator, or empty string.
    """

    name: str
    url: str
    notes: str = ""

`LlmsTxt` `dataclass` ¶

Parsed representation of an llms.txt file.

Attributes:

Name	Type	Description
`title`	`str`	The H1 heading (project/site name).
`description`	`str`	The blockquote summary, or empty string if absent.
`details`	`str`	Text paragraphs between blockquote and first H2, or empty string.
`sections`	`dict[str, list[FileEntry]]`	Mapping of H2 section name to list of file entries. The special `"Optional"` section is excluded from this dict.
`optional`	`list[FileEntry]`	Entries from the `## Optional` section, or empty list.

Source code in llmstxt/llmstxt.py

@dataclasses.dataclass(frozen=True, slots=True)
class LlmsTxt:
    """Parsed representation of an llms.txt file.

    Attributes:
        title: The H1 heading (project/site name).
        description: The blockquote summary, or empty string if absent.
        details: Text paragraphs between blockquote and first H2, or empty
            string.
        sections: Mapping of H2 section name to list of file entries.
            The special ``"Optional"`` section is excluded from this dict.
        optional: Entries from the ``## Optional`` section, or empty list.
    """

    title: str
    description: str = ""
    details: str = ""
    sections: dict[str, list[FileEntry]] = dataclasses.field(default_factory=dict)
    optional: list[FileEntry] = dataclasses.field(default_factory=list)

`DiscoveryResult` `dataclass` ¶

Result of probing a site for llms.txt and llms-full.txt.

Attributes:

Name	Type	Description
`llms_txt`	`str \| None`	Raw content of `/llms.txt`, or `None` if not found.
`llms_full_txt`	`str \| None`	Raw content of `/llms-full.txt`, or `None` if not found.
`source_url`	`str`	The root URL (`{scheme}://{netloc}`) that was probed.

Source code in llmstxt/llmstxt.py

@dataclasses.dataclass(frozen=True, slots=True)
class DiscoveryResult:
    """Result of probing a site for llms.txt and llms-full.txt.

    Attributes:
        llms_txt: Raw content of ``/llms.txt``, or ``None`` if not found.
        llms_full_txt: Raw content of ``/llms-full.txt``, or ``None`` if not
            found.
        source_url: The root URL (``{scheme}://{netloc}``) that was probed.
    """

    llms_txt: str | None = None
    llms_full_txt: str | None = None
    source_url: str = ""

`parse(text)` ¶

Parse llms.txt content into structured data.

Parameters:

Name	Type	Description	Default
`text`	`str`	Raw text content of an llms.txt file.	required

Returns:

Type	Description
`LlmsTxt`	Parsed `LlmsTxt` object.

Raises:

Type	Description
`LlmsTxtError`	If the input is empty or the required H1 title is missing.

Source code in llmstxt/llmstxt.py

def parse(text: str) -> LlmsTxt:
    """Parse llms.txt content into structured data.

    Args:
        text: Raw text content of an llms.txt file.

    Returns:
        Parsed ``LlmsTxt`` object.

    Raises:
        LlmsTxtError: If the input is empty or the required H1 title is
            missing.
    """
    text = text.replace("\r\n", "\n").strip()
    if not text:
        raise LlmsTxtError("empty input")

    # Extract H1 title
    h1_match = _H1_RE.search(text)
    if not h1_match:
        raise LlmsTxtError("missing required H1 title")
    title = h1_match.group(1).strip()

    # Split on H2 headers: [preamble, name1, body1, name2, body2, ...]
    parts = _H2_SPLIT_RE.split(text)
    preamble = parts[0]

    # Remove the H1 line from preamble before parsing description/details
    preamble = preamble[h1_match.end() :]

    description, details = _parse_preamble(preamble)

    # Parse sections
    sections: dict[str, list[FileEntry]] = {}
    for i in range(1, len(parts), 2):
        section_name = parts[i].strip()
        section_body = parts[i + 1] if i + 1 < len(parts) else ""
        sections[section_name] = _parse_links(section_body)

    # Separate "Optional" section
    optional = sections.pop("Optional", [])

    return LlmsTxt(
        title=title,
        description=description,
        details=details,
        sections=sections,
        optional=optional,
    )
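
The section loop above relies on `re.split` keeping captured groups in its result, which yields the alternating `[preamble, name1, body1, ...]` list. The module's `_H1_RE` and `_H2_SPLIT_RE` are not shown in this reference, so the patterns in this minimal sketch are assumptions chosen to reproduce that shape.

```python
import re

# Assumed patterns; the module's real _H1_RE and _H2_SPLIT_RE are not shown.
H1_RE = re.compile(r"^# (.+)$", re.MULTILINE)
H2_SPLIT_RE = re.compile(r"^## (.+)$", re.MULTILINE)

text = """# My Project
> A cool project

## Docs
- [Guide](https://example.com/guide.md): The main guide

## Optional
- [Changelog](https://example.com/changelog.md)
"""

title = H1_RE.search(text).group(1).strip()

# With one capture group, re.split keeps the captured heading text, giving
# [preamble, name1, body1, name2, body2, ...].
parts = H2_SPLIT_RE.split(text)
section_names = [parts[i].strip() for i in range(1, len(parts), 2)]
section_bodies = [parts[i + 1] for i in range(1, len(parts), 2)]
```

Since `parse` stores sections in a dict keyed by name, two H2 headings with the same text would leave only the later one.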

`find_candidates(url, doc=None)` ¶

Find candidate markdown resources for a given URL.

When `doc` is provided, searches all sections and optional entries for URLs that relate to `url`, ranked as exact match, then extension variation, then path prefix. If no match is found (or `doc` is `None`), falls back to heuristic URL generation based on common per-page `.md` conventions.

Parameters:

Name	Type	Description	Default
`url`	`str`	The page URL to look up.	required
`doc`	`LlmsTxt \| None`	An optional parsed `LlmsTxt` object to search in.	`None`

Returns:

Type	Description
`list[FileEntry]`	List of `FileEntry` candidates, ordered by match quality.

Source code in llmstxt/llmstxt.py

def find_candidates(url: str, doc: LlmsTxt | None = None) -> list[FileEntry]:
    """Find candidate markdown resources for a given URL.

    When *doc* is provided, searches all sections and optional entries for
    URLs that relate to *url* (exact match > extension variation > path
    prefix).  If no match is found (or *doc* is ``None``), falls back to
    heuristic URL generation based on common per-page ``.md`` conventions.

    Args:
        url: The page URL to look up.
        doc: An optional parsed ``LlmsTxt`` object to search in.

    Returns:
        List of ``FileEntry`` candidates, ordered by match quality.
    """
    base = _strip_url(url)
    base_path = _url_path(url).rstrip("/")

    # ── Search llms.txt entries ──
    if doc is not None:
        all_entries: list[FileEntry] = []
        for entries in doc.sections.values():
            all_entries.extend(entries)
        all_entries.extend(doc.optional)

        exact: list[FileEntry] = []
        extension: list[FileEntry] = []
        prefix: list[FileEntry] = []

        for entry in all_entries:
            entry_base = _strip_url(entry.url)
            entry_path = _url_path(entry.url).rstrip("/")

            if entry_base == base:
                exact.append(entry)
                continue

            if entry_path == base_path + ".md" or entry_path == base_path + ".html.md":
                extension.append(entry)
                continue
            if base_path == entry_path + ".md" or base_path == entry_path + ".html.md":
                extension.append(entry)
                continue

            if entry_path.startswith(base_path + "/") or base_path.startswith(
                entry_path + "/"
            ):
                prefix.append(entry)

        results = exact + extension + prefix
        if results:
            return results

    # ── Fallback: heuristic URL candidates ──
    return [FileEntry(name="", url=u) for u in _candidate_md_urls(url)]

`discover(url, *, timeout=10)` ¶

Probe a site for `/llms.txt` and `/llms-full.txt`.

Given any URL, extracts the root (`{scheme}://{netloc}`) and attempts to fetch both `/llms.txt` and `/llms-full.txt`. If the input URL already points to one of these files, it is still fetched (along with its sibling).

Parameters:

Name	Type	Description	Default
`url`	`str`	Any URL belonging to the target site.	required
`timeout`	`int`	HTTP request timeout in seconds (per request).	`10`

Returns:

Type	Description
`DiscoveryResult`	A `DiscoveryResult` with the raw content of whichever files were found (fields are `None` when the file does not exist or could not be fetched).

Example::

result = discover("https://example.com/docs/guide")
content = result.llms_full_txt or result.llms_txt
if content:
    doc = parse(content)

Source code in llmstxt/llmstxt.py

def discover(url: str, *, timeout: int = 10) -> DiscoveryResult:
    """Probe a site for ``/llms.txt`` and ``/llms-full.txt``.

    Given any URL, extracts the root (``{scheme}://{netloc}``) and attempts to
    fetch both ``/llms.txt`` and ``/llms-full.txt``.  If the input URL already
    points to one of these files, it is still fetched (along with its sibling).

    Args:
        url: Any URL belonging to the target site.
        timeout: HTTP request timeout in seconds (per request).

    Returns:
        A ``DiscoveryResult`` with the raw content of whichever files were
        found (fields are ``None`` when the file does not exist or could not
        be fetched).

    Example::

        result = discover("https://example.com/docs/guide")
        content = result.llms_full_txt or result.llms_txt
        if content:
            doc = parse(content)
    """
    parsed = urllib.parse.urlparse(url)
    root = f"{parsed.scheme}://{parsed.netloc}"

    llms_txt = _fetch_text(f"{root}/llms.txt", timeout)
    llms_full_txt = _fetch_text(f"{root}/llms-full.txt", timeout)

    return DiscoveryResult(
        llms_txt=llms_txt,
        llms_full_txt=llms_full_txt,
        source_url=root,
    )
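
The root computation can be tried without any network access. The sketch below repeats the `urlparse` step from `discover`; `site_root` and `probe_urls` are hypothetical helper names introduced here for illustration.

```python
from urllib.parse import urlparse


def site_root(url: str) -> str:
    """Same root computation as discover(): scheme and netloc only."""
    parsed = urlparse(url)
    return f"{parsed.scheme}://{parsed.netloc}"


def probe_urls(url: str) -> tuple[str, str]:
    """Hypothetical helper listing the two URLs discover() would request."""
    root = site_root(url)
    return f"{root}/llms.txt", f"{root}/llms-full.txt"


# Path, query string, and fragment are all dropped.
targets = probe_urls("https://example.com/docs/guide?x=1#intro")

# The port is part of netloc, so it is carried into the root.
with_port = site_root("http://localhost:8000/llms.txt")

# Without a scheme, urlparse puts everything in the path and the root
# degenerates to "://".
no_scheme = site_root("example.com/docs")
```

Given the last case, callers may want to make sure the URL includes a scheme before passing it to `discover`.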
