XML API Reference
Auto-generated API documentation for the XML module.
xml
XML ↔ dict converter with fault-tolerant LLM tag extraction — zero-dep, stdlib only, Python 3.10+.
Part of zerodep: https://github.com/Oaklight/zerodep Copyright (c) 2026 Peng Ding. MIT License.
Provides xmltodict-compatible parse / unparse for bidirectional XML ↔
dict conversion, plus extract_tags for fault-tolerant extraction of XML-like
tags from LLM output (unclosed tags, malformed nesting, streaming truncation).
Standard layer (xmltodict-compatible)::

    d = parse('<root><name>Alice</name><age>30</age></root>')
    # {'root': {'name': 'Alice', 'age': '30'}}
    xml_str = unparse({'root': {'name': 'Alice', 'age': '30'}})
    # '<?xml version="1.0" encoding="utf-8"?>\n<root><name>Alice</name><age>30</age></root>'

Lenient layer (LLM tag extraction)::

    tags = extract_tags('<answer>42</answer>', 'answer')
    # [ExtractedTag(tag='answer', content='42', attrs={}, is_closed=True)]
    tags = extract_tags('Here is my thinking <thinking>let me reason')
    # [ExtractedTag(tag='thinking', content='let me reason', attrs={}, is_closed=False)]

XMLError
ParsingInterrupted
ExtractedTag
dataclass
A tag extracted from text (possibly malformed XML).
Attributes:
| Name | Type | Description |
|---|---|---|
| tag | str | Tag name (e.g. …). |
| content | str | Text content between open and close tags. |
| attrs | dict[str, str] | Dictionary of attributes on the opening tag. |
| is_closed | bool | True if a matching close tag was found. |
Source code in xml/xml.py
parse(xml_input, *, encoding=None, process_namespaces=False, namespace_separator=':', disable_entities=True, process_comments=False, xml_attribs=True, attr_prefix='@', cdata_key='#text', force_cdata=False, cdata_separator='', postprocessor=None, dict_constructor=dict, strip_whitespace=True, force_list=None, comment_key='#comment')
Parse an XML document into a Python dict.
Compatible with xmltodict.parse(). Attributes are prefixed with
attr_prefix (default "@"), text content is stored under cdata_key
(default "#text"), and same-name siblings auto-coalesce into lists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| xml_input | str \| bytes \| IO[bytes] | XML string, bytes, or file-like object. | required |
| encoding | str \| None | Character encoding override. | None |
| process_namespaces | bool | Expand namespace URIs in element names. | False |
| namespace_separator | str | Separator between namespace and local name. | ':' |
| disable_entities | bool | Block entity declarations for security (XXE). | True |
| process_comments | bool | Include XML comments in the output. | False |
| xml_attribs | bool | Include element attributes in the output. | True |
| attr_prefix | str | Prefix for attribute keys in the output dict. | '@' |
| cdata_key | str | Key for text content in the output dict. | '#text' |
| force_cdata | bool | Always wrap text content in a dict with cdata_key. | False |
| cdata_separator | str | Separator for joining multiple text nodes. | '' |
| postprocessor | Callable \| None | Callable … | None |
| dict_constructor | type | Dict class to use (default dict). | dict |
| strip_whitespace | bool | Strip whitespace from text nodes. | True |
| force_list | bool \| tuple[str, ...] \| Callable \| None | Force list creation: bool, tuple of tag names, or callable. | None |
| comment_key | str | Key for XML comments in the output dict. | '#comment' |
Returns:
| Type | Description |
|---|---|
| dict \| None | Parsed dict, or None for empty documents. |
Raises:
| Type | Description |
|---|---|
| XMLError | If the XML is malformed. |
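The output conventions described for parse (prefixed attribute keys, text stored under cdata_key, same-name siblings coalescing into lists) can be illustrated with a small standard-library sketch. This is not the module's implementation: `sketch_parse` and `element_to_dict` are hypothetical names, the code is built on `xml.etree.ElementTree`, and it ignores namespaces, comments, `force_list`, and the other options listed above.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem, attr_prefix="@", cdata_key="#text"):
    """Convert one element to an xmltodict-style value (hypothetical helper)."""
    node = {}
    # Attributes become keys that carry the attribute prefix.
    for name, value in elem.attrib.items():
        node[attr_prefix + name] = value
    # Children are converted recursively; repeated tag names coalesce into a list.
    for child in elem:
        value = element_to_dict(child, attr_prefix, cdata_key)
        if child.tag in node:
            existing = node[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                node[child.tag] = [existing, value]
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        # A leaf without attributes collapses to its text (None when empty).
        return text or None
    if text:
        node[cdata_key] = text
    return node


def sketch_parse(xml_text):
    """Parse a string and wrap the result under the root tag name."""
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_dict(root)}


doc = sketch_parse('<root><item id="1">a</item><item>b</item><name>Alice</name></root>')
print(doc["root"]["item"])  # a list, because <item> appears twice
```

The sketch collapses attribute-free leaves to plain strings, which is why the second `item` is a string while the first is a dict holding both the `@id` attribute and the `#text` content.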
Source code in xml/xml.py
unparse(input_dict, *, output=None, encoding='utf-8', full_document=True, short_empty_elements=False, pretty=False, indent='\t', newl='\n', attr_prefix='@', cdata_key='#text', preprocessor=None, namespace_separator=':', namespaces=None, comment_key='#comment')
Convert a Python dict into an XML string.
Compatible with xmltodict.unparse(). Keys prefixed with attr_prefix
(default "@") become element attributes, cdata_key (default
"#text") values become text content.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_dict | dict | Dictionary to convert. | required |
| output | IO[str] \| None | File-like object to write to. If None, return a string. | None |
| encoding | str | Output encoding (used in XML declaration). | 'utf-8' |
| full_document | bool | Include … | True |
| short_empty_elements | bool | Use … | False |
| pretty | bool | Pretty-print with indentation. | False |
| indent | str | Indentation string (used when pretty is True). | '\t' |
| newl | str | Newline string (used when pretty is True). | '\n' |
| attr_prefix | str | Prefix for attribute keys in the input dict. | '@' |
| cdata_key | str | Key for text content in the input dict. | '#text' |
| preprocessor | Callable \| None | Callable … | None |
| namespace_separator | str | Separator between namespace prefix and local name. | ':' |
| namespaces | dict[str, str] \| None | Dict mapping namespace URIs to prefixes. | None |
| comment_key | str | Key for XML comments in the input dict. | '#comment' |
Returns:
| Type | Description |
|---|---|
| str \| None | XML string if output is None, otherwise None. |
Raises:
| Type | Description |
|---|---|
| XMLError | If the dict cannot be serialized. |
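The reverse mapping described for unparse (keys starting with attr_prefix become attributes, the cdata_key value becomes text content) can be sketched with the standard library alone. This is an illustrative approximation rather than the module's code: `sketch_unparse` and `emit` are hypothetical names, and pretty-printing, namespaces, comments, and the preprocessor hook are left out.

```python
from xml.sax.saxutils import escape, quoteattr


def emit(tag, value, attr_prefix="@", cdata_key="#text"):
    """Serialize one key/value pair into an XML fragment (hypothetical helper)."""
    if isinstance(value, list):
        # A list repeats the same tag once per item.
        return "".join(emit(tag, item, attr_prefix, cdata_key) for item in value)
    attrs, text, children = "", "", ""
    if isinstance(value, dict):
        for key, item in value.items():
            if key.startswith(attr_prefix):
                # Prefixed keys turn into attributes on the opening tag.
                attrs += f" {key[len(attr_prefix):]}={quoteattr(str(item))}"
            elif key == cdata_key:
                text = escape(str(item))
            else:
                children += emit(key, item, attr_prefix, cdata_key)
    elif value is not None:
        text = escape(str(value))
    return f"<{tag}{attrs}>{text}{children}</{tag}>"


def sketch_unparse(data, encoding="utf-8"):
    """Serialize a single-root dict, prepending an XML declaration."""
    if len(data) != 1:
        raise ValueError("document must have exactly one root")
    root, value = next(iter(data.items()))
    return f'<?xml version="1.0" encoding="{encoding}"?>\n' + emit(root, value)


print(sketch_unparse({"root": {"name": "Alice", "age": "30"}}))
```

Escaping goes through `escape` and `quoteattr` so that characters such as `&` and `<` in values do not produce malformed output.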
Source code in xml/xml.py
extract_tags(text, tag=None, *, first_only=False)
Extract XML-like tags from text, tolerating malformed XML.
Designed for extracting structured tags from LLM output where the XML may be incomplete, improperly nested, or truncated mid-stream.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Raw text containing XML-like tags. | required |
| tag | str \| None | If provided, only extract tags with this name. If None, extract all top-level tags found. | None |
| first_only | bool | If True, return after finding the first match. | False |
Returns:
| Type | Description |
|---|---|
| list[ExtractedTag] | List of ExtractedTag objects. |
Example::

    >>> extract_tags('<answer>42</answer>', 'answer')
    [ExtractedTag(tag='answer', content='42', attrs={}, is_closed=True)]
    >>> extract_tags('Thinking... <thought>hmm')
    [ExtractedTag(tag='thought', content='hmm', attrs={}, is_closed=False)]
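To make the tolerant behavior concrete, here is a minimal regex-based sketch of one way such extraction could work. It is an assumption-laden illustration, not the module's algorithm: `Tag` is a hypothetical stand-in for ExtractedTag, `sketch_extract` is a hypothetical name, only double-quoted attributes are recognized, and nested tags of the same name are not balanced.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Tag:  # hypothetical stand-in for ExtractedTag
    tag: str
    content: str
    attrs: dict = field(default_factory=dict)
    is_closed: bool = True


# Opening tag with optional double-quoted attributes; closing tags never match.
OPEN_RE = re.compile(r'<([A-Za-z_][\w.-]*)((?:\s+[\w:.-]+\s*=\s*"[^"]*")*)\s*>')
ATTR_RE = re.compile(r'([\w:.-]+)\s*=\s*"([^"]*)"')


def sketch_extract(text, tag=None, first_only=False):
    """Collect tags left to right, accepting a missing close tag."""
    found, pos = [], 0
    while True:
        match = OPEN_RE.search(text, pos)
        if match is None:
            break
        name = match.group(1)
        if tag is not None and name != tag:
            pos = match.end()  # not the requested tag: keep scanning after it
            continue
        close = f"</{name}>"
        end = text.find(close, match.end())
        closed = end != -1
        if not closed:
            end = len(text)  # truncated output: take everything that remains
        attrs = dict(ATTR_RE.findall(match.group(2)))
        found.append(Tag(name, text[match.end():end], attrs, closed))
        if first_only:
            break
        pos = end + len(close) if closed else end
    return found


print(sketch_extract('Thinking... <thought>hmm'))
```

Treating a missing close tag as "content runs to the end of the text" is what lets the sketch return a partial result for streamed or truncated output, with `is_closed=False` signalling that the tag was never terminated.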