llms.txt API Reference
Auto-generated API documentation for the llms.txt parser module.
llmstxt
llms.txt parser — zero dependencies, stdlib only, Python 3.10+.
Part of zerodep: https://github.com/Oaklight/zerodep
Copyright (c) 2026 Peng Ding. MIT License.
Parse llms.txt files per the llmstxt.org specification into structured data, and generate candidate per-page markdown URLs for content discovery.
Example::

    from llmstxt import parse, find_candidates

    doc = parse("""# My Project
    > A cool project
    Some details here.
    ## Docs
    - [Guide](https://example.com/guide.md): The main guide
    """)
    print(doc.title)     # 'My Project'
    print(doc.sections)  # {'Docs': [FileEntry(name='Guide', ...)]}

    # With a parsed llms.txt: looks up matching entries, falls back to heuristic
    matches = find_candidates("https://example.com/guide", doc=doc)
    # [FileEntry(name='Guide', url='https://example.com/guide.md', ...)]

    # Without llms.txt: pure heuristic URL generation
    matches = find_candidates("https://example.com/docs")
    # [FileEntry(name='', url='https://example.com/docs.md', ...)]

LlmsTxtError
FileEntry (dataclass)
A linked resource entry from an llms.txt file list.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Display name of the link. |
| url | str | URL of the linked resource. |
| notes | str | Descriptive text after the |
Source code in llmstxt/llmstxt.py
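
The three attributes correspond to one markdown list item of the form `- [name](url): notes`, as in the module example. The following is a minimal sketch of that mapping, not the library's code: `FileEntrySketch`, `parse_link_line`, the regular expression, and the empty-string default for `notes` are all assumptions made for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


# Stand-in for illustration only; the real FileEntry lives in llmstxt.
@dataclass
class FileEntrySketch:
    name: str
    url: str
    notes: str = ""  # assumed default when no text follows the link


# Assumed pattern: "- [name](url)" optionally followed by ": notes".
_LINK_RE = re.compile(
    r"^\s*[-*]\s*\[(?P<name>[^\]]*)\]\((?P<url>[^)\s]+)\)\s*(?::\s*(?P<notes>.*))?$"
)


def parse_link_line(line: str) -> FileEntrySketch | None:
    """Return an entry for a markdown link list item, or None if it does not match."""
    m = _LINK_RE.match(line)
    if m is None:
        return None
    return FileEntrySketch(m["name"].strip(), m["url"], (m["notes"] or "").strip())
```

Lines that are not link list items (headings, paragraphs) return `None` in this sketch, so a caller can skip them.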
LlmsTxt (dataclass)
Parsed representation of an llms.txt file.
Attributes:
| Name | Type | Description |
|---|---|---|
| title | str | The H1 heading (project/site name). |
| description | str | The blockquote summary, or empty string if absent. |
| details | str | Text paragraphs between blockquote and first H2, or empty string. |
| sections | dict[str, list[FileEntry]] | Mapping of H2 section name to list of file entries. The special |
| optional | list[FileEntry] | Entries from the |
DiscoveryResult (dataclass)
Result of probing a site for llms.txt and llms-full.txt.
Attributes:
| Name | Type | Description |
|---|---|---|
| llms_txt | str \| None | Raw content of |
| llms_full_txt | str \| None | Raw content of |
| source_url | str | The root URL ( |
parse(text)
Parse llms.txt content into structured data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Raw text content of an llms.txt file. | required |
Returns:
| Type | Description |
|---|---|
| LlmsTxt | Parsed |
Raises:
| Type | Description |
|---|---|
| LlmsTxtError | If the required H1 title is missing. |
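
The layout that `parse` reads (H1 title, blockquote summary, detail paragraphs, H2 sections with link lists) can be illustrated with a simplified line-based sketch. This is not the library's parser: `parse_sketch` and `SketchError` are hypothetical names, the result is a plain dict rather than `LlmsTxt`, list items are kept as raw strings, and the sketch assumes the H1 must be the first non-blank line.

```python
class SketchError(ValueError):
    """Hypothetical stand-in for LlmsTxtError."""


def parse_sketch(text: str) -> dict:
    """Split llms.txt-style text into title, description, details, and sections."""
    lines = [ln.rstrip() for ln in text.strip().splitlines()]
    # Assumption: the required H1 is the first non-blank line.
    if not lines or not lines[0].startswith("# "):
        raise SketchError("missing required H1 title")

    result = {"title": lines[0][2:].strip(), "description": "", "details": "", "sections": {}}
    details = []
    current = None  # name of the H2 section being filled, if any

    for ln in lines[1:]:
        if ln.startswith("## "):
            current = ln[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            if ln.startswith("- "):
                result["sections"][current].append(ln[2:])  # raw list item text
        elif ln.startswith(">"):
            quote = ln[1:].strip()
            result["description"] = (result["description"] + " " + quote).strip()
        elif ln:
            details.append(ln)

    result["details"] = "\n".join(details)
    return result
```

Raising on a missing title mirrors the documented `LlmsTxtError` behavior; every other field falls back to an empty value.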
find_candidates(url, doc=None)
Find candidate markdown resources for a given URL.
When doc is provided, searches all sections and optional entries for
URLs that relate to url (exact match > extension variation > path
prefix). If no match is found (or doc is None), falls back to
heuristic URL generation based on common per-page .md conventions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | str | The page URL to look up. | required |
| doc | LlmsTxt \| None | An optional parsed | None |
Returns:
| Type | Description |
|---|---|
| list[FileEntry] | List of |
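
The match priority described above (exact match > extension variation > path prefix, then a heuristic fallback) can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's algorithm: `rank_sketch`, `_score`, and `_stem` are hypothetical helpers, the function works on plain URL strings instead of `FileEntry` objects, "path prefix" is assumed to mean entries located under the page's path, and the fallback covers only the single `.md` suffix shown in the module example.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def _stem(url: str) -> str:
    """Drop query, fragment, trailing slash, and the last segment's extension."""
    parts = urlsplit(url)
    head, _, last = parts.path.rstrip("/").rpartition("/")
    if "." in last:
        last = last.rsplit(".", 1)[0]
    return f"{parts.scheme}://{parts.netloc}{head}/{last}"


def _score(url: str, candidate: str) -> int | None:
    """Lower is better; None means the candidate is unrelated to url."""
    target = url.rstrip("/")
    if candidate.rstrip("/") == target:
        return 0  # exact match
    if _stem(candidate) == _stem(url):
        return 1  # same page, different extension (guide vs guide.md)
    if candidate.startswith(target + "/"):
        return 2  # candidate lives under the page's path (assumed meaning)
    return None


def rank_sketch(url: str, known_urls: list[str]) -> list[str]:
    """Return related known URLs by priority, or a heuristic guess if none match."""
    scored = [(s, c) for c in known_urls if (s := _score(url, c)) is not None]
    matches = [c for _, c in sorted(scored, key=lambda pair: pair[0])]
    # Fallback mirrors the module example: append ".md" to the page URL.
    return matches or [url.rstrip("/") + ".md"]
```

Because the sort is stable, entries with the same priority keep the order in which they were listed.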
discover(url, *, timeout=10)
Probe a site for /llms.txt and /llms-full.txt.
Given any URL, extracts the root ({scheme}://{netloc}) and attempts to
fetch both /llms.txt and /llms-full.txt. If the input URL already
points to one of these files, it is still fetched (along with its sibling).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | str | Any URL belonging to the target site. | required |
| timeout | int | HTTP request timeout in seconds (per request). | 10 |
Returns:
| Type | Description |
|---|---|
| DiscoveryResult | A … found (fields are … be fetched). |
Example::

    result = discover("https://example.com/docs/guide")
    content = result.llms_full_txt or result.llms_txt
    if content:
        doc = parse(content)
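
The root extraction step (`{scheme}://{netloc}`) can be sketched with the standard library alone. `probe_urls_sketch` is a hypothetical helper that only builds the two probe URLs and performs no network requests; rejecting non-absolute URLs with `ValueError` is a choice made for this sketch, not documented behavior of `discover`.

```python
from urllib.parse import urlsplit


def probe_urls_sketch(url: str) -> tuple[str, str, str]:
    """Return (root, llms.txt URL, llms-full.txt URL) for the site hosting url."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        # Sketch-only behavior: refuse URLs without a scheme and host.
        raise ValueError(f"not an absolute URL: {url!r}")
    root = f"{parts.scheme}://{parts.netloc}"
    return root, f"{root}/llms.txt", f"{root}/llms-full.txt"
```

Any page URL on the same host yields the same pair of probe URLs, including a URL that already points at `/llms.txt`.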