duck.contrib.sitemap

Sitemap builder for Duck.

Class-based sitemap builder that walks the application’s RouteRegistry and builds an XML sitemap using Duck’s component system (duck.html.components.to_component).

Example

    from duck.contrib.sitemap import SitemapBuilder

    builder = SitemapBuilder(
        server_url=None,  # Passing None resolves the server URL automatically
        save_to_file=True,
        filepath="/etc/sitemap.xml",
        extra_urls=["/about", "https://example.com/contact"],
        exclude_patterns=["^/admin", "https://example.com/secret", "^/api/.*"],
        default_priority=0.5,
        default_changefreq="weekly",
    )
    xml = builder.build(return_content=True)

Module Contents

Classes

SitemapBuilder

Build an XML sitemap for a Duck application.

Data

DEFAULT_EXCLUDES

to_component

API

duck.contrib.sitemap.DEFAULT_EXCLUDES

None

class duck.contrib.sitemap.SitemapBuilder(server_url: str = None, filepath: Optional[str | pathlib.Path] = None, save_to_file: bool = True, extra_urls: Optional[Iterable[str]] = None, exclude_patterns: Optional[Iterable[str]] = None, default_priority: Optional[float] = 0.5, default_changefreq: Optional[str] = 'monthly', apply_default_excludes: bool = True, excludes_ignorecase: bool = True)

Build an XML sitemap for a Duck application.

The builder walks RouteRegistry.url_map, filters out dynamic or regex-like routes, supports explicit extra URLs, supports exclude patterns (absolute or relative, plain or regex), and emits a sitemap using Duck components.

Initialization

Initialize the builder.

Parameters:
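  • server_url – Server URL used for the sitemap's absolute URLs. If None, it is resolved automatically.

  • filepath – Optional path where the sitemap XML is saved.

  • save_to_file – Whether to persist the sitemap to disk. Requires filepath to be set.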

  • extra_urls – Extra URL strings (absolute or path) to include in addition to the registered routes.

  • exclude_patterns – URL strings or regex patterns to exclude. Absolute excludes match against the final URL; non-absolute excludes match against the registered route path and the final URL.

  • default_priority – Default priority value for URLs (0.0 to 1.0). If None, the priority element is omitted.

  • default_changefreq – Default changefreq value for URLs (e.g., "daily", "weekly"). If None, the changefreq element is omitted.

  • apply_default_excludes – Whether to apply the default exclude patterns in addition to exclude_patterns. Defaults to True.

  • excludes_ignorecase – Whether to use re.IGNORECASE when compiling exclude patterns. Defaults to True.

_REGEX_META_CHARS

'[\^\$\*\+\?\[\]\(\)\]'

__slots__

('server_url', 'filepath', 'save_to_file', 'extra_urls', 'exclude_patterns', 'default_priority', 'de…

_build_url_component(url_obj: duck.utils.urlcrack.URL, lastmod_iso: str, changefreq: Optional[str], priority: Optional[float])

Construct a component for the url element of a given URL.

Parameters:
  • url_obj – The URL object to include.

  • lastmod_iso – ISO formatted last modified date string.

  • changefreq – Optional changefreq value.

  • priority – Optional priority between 0.0 and 1.0.

Returns:

Component instance for the url element.
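To illustrate the shape of the entry this method produces, the sketch below builds an equivalent url element with the standard library's xml.etree.ElementTree rather than Duck components. The function name build_url_element and the one-decimal priority formatting are assumptions made for illustration; only the rule that None values omit their elements comes from the parameter descriptions above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_url_element(
    loc: str,
    lastmod_iso: str,
    changefreq: Optional[str] = None,
    priority: Optional[float] = None,
) -> ET.Element:
    """Hypothetical stand-in for _build_url_component using ElementTree."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = lastmod_iso
    # Optional children are skipped entirely when the value is None.
    if changefreq is not None:
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        # One decimal place is an assumed format, not Duck's documented one.
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return url


element = build_url_element("https://example.com/about", "2024-01-01", "weekly", 0.5)
print(ET.tostring(element, encoding="unicode"))
```

Calling the function without changefreq and priority yields an entry that contains only the loc and lastmod children.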

_collect_extra_urls(existing_set: Set[str]) → List[duck.utils.urlcrack.URL]

Normalize and filter explicitly provided extra URLs.

Parameters:

existing_set – Set of absolute URL strings already collected.

Returns:

A list of extra absolute URL objects to include.
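The role of existing_set can be pictured with a small de-duplication sketch. This is an assumption about the behaviour, not Duck's implementation: collect_extra_urls is a hypothetical helper that works on plain strings, takes the server URL as an explicit argument, and uses urllib.parse.urljoin in place of duck.utils.urlcrack.URL.

```python
from typing import Iterable, List, Set
from urllib.parse import urljoin


def collect_extra_urls(
    extra_urls: Iterable[str], existing: Set[str], server_url: str
) -> List[str]:
    """Hypothetical sketch: resolve extras and drop ones already collected."""
    result: List[str] = []
    seen = set(existing)  # copy so the caller's set is not mutated
    base = server_url.rstrip("/") + "/"
    for raw in extra_urls:
        absolute = urljoin(base, raw)  # absolute inputs pass through unchanged
        if absolute in seen:
            continue  # already present among registered or earlier extra URLs
        seen.add(absolute)
        result.append(absolute)
    return result


print(collect_extra_urls(
    ["/about", "/contact", "https://example.com/contact"],
    {"https://example.com/about"},
    "https://example.com",
))
```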

_collect_registered_urls() → List[duck.utils.urlcrack.URL]

Collect absolute URLs from RouteRegistry that are valid sitemap candidates.

Returns:

A list of absolute URL objects derived from registered routes.

_is_excluded(full_url_str: str, registered_route_pattern: str) → bool

Decide whether a candidate URL should be excluded.

Excludes in self.exclude_patterns can be:

  • absolute URL strings (or regexes) which match the full URL,

  • relative paths or patterns matched against registered route pattern or full URL,

  • plain strings (exact match) or regex patterns.

Parameters:
  • full_url_str – The absolute URL string to evaluate.

  • registered_route_pattern – The registered route string or compiled pattern string to use for relative-match comparisons.

Returns:

True if the URL should be excluded.
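The matching rules listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: is_excluded is a hypothetical free function, a pattern is treated as absolute when it contains "://", and the REGEX_META heuristic for telling regexes from plain strings is invented for the example.

```python
import re
from typing import Iterable

# Assumed heuristic for "this pattern is a regex"; Duck's own check may differ.
REGEX_META = re.compile(r"[\^$*+?\[\]()]")


def is_excluded(
    full_url: str, route: str, patterns: Iterable[str], ignorecase: bool = True
) -> bool:
    """Hypothetical sketch of the exclusion decision."""
    flags = re.IGNORECASE if ignorecase else 0
    for pattern in patterns:
        # Absolute patterns are compared with the full URL only;
        # relative ones are tried against the route and the full URL.
        targets = [full_url] if "://" in pattern else [route, full_url]
        if REGEX_META.search(pattern):
            hit = any(re.search(pattern, t, flags) for t in targets)
        else:
            # Plain strings must match a target exactly.
            hit = any(re.fullmatch(re.escape(pattern), t, flags) for t in targets)
        if hit:
            return True
    return False


patterns = ["^/admin", "https://example.com/secret", "^/api/.*"]
print(is_excluded("https://example.com/admin/users", "/admin/users", patterns))
print(is_excluded("https://example.com/about", "/about", patterns))
```

With these patterns, the admin route is excluded through the relative regex, while /about matches nothing and is kept.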

static _looks_like_regex(path: str) → bool

Return True if path contains characters that look like a regex.

Parameters:

path – Registered route string.

Returns:

True if the string contains regex meta characters.
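A check of this kind might look like the sketch below. The character class is an assumption chosen for illustration and is not a copy of _REGEX_META_CHARS; looks_like_regex is a hypothetical module-level equivalent of the static method.

```python
import re

# Assumed set of meta characters; the library's _REGEX_META_CHARS may differ.
REGEX_META = re.compile(r"[\^$*+?\[\]()]")


def looks_like_regex(path: str) -> bool:
    """Return True if the route string contains any assumed regex meta character."""
    return REGEX_META.search(path) is not None


for route in ["/about", "/sitemap.xml", "^/admin", "/api/.*"]:
    print(route, looks_like_regex(route))
```

Note that a literal dot is deliberately left out of the assumed set, so ordinary paths such as /sitemap.xml are not flagged.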

_to_absolute_url(raw: str) → duck.utils.urlcrack.URL

Convert a raw URL or path into an absolute URL object.

Parameters:

raw – Absolute URL string or path.

Returns:

An absolute URL object.

Return type:

URL
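As a rough picture of the conversion, the sketch below resolves a path against a server URL with urllib.parse. It is an assumption about the behaviour rather than Duck's code: to_absolute_url is a hypothetical function that returns a plain string instead of a duck.utils.urlcrack.URL and receives the server URL as an argument.

```python
from urllib.parse import urljoin, urlsplit


def to_absolute_url(raw: str, server_url: str) -> str:
    """Hypothetical sketch: return raw unchanged if absolute, else join it to server_url."""
    if urlsplit(raw).scheme:
        return raw  # already absolute, e.g. "https://example.com/contact"
    # Treat everything else as a path rooted at the server.
    path = raw if raw.startswith("/") else "/" + raw
    return urljoin(server_url, path)


print(to_absolute_url("/about", "https://example.com"))
print(to_absolute_url("https://example.com/contact", "https://example.com"))
```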

build(return_content: bool = True) → Optional[str]

Build the sitemap XML.

Parameters:

return_content – If True, return the sitemap XML as a string. If False, return None (but still save to file if configured).

Returns:

The sitemap XML string when return_content is True, otherwise None.
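The return contract described above (always save when configured, return the XML only on request) can be sketched with the standard library. Everything here is illustrative: build_sitemap is a hypothetical function, it takes ready-made absolute URL strings instead of walking a RouteRegistry, and it uses xml.etree.ElementTree rather than Duck components. The namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterable, Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(
    urls: Iterable[str],
    lastmod_iso: str,
    filepath: Optional[Path] = None,
    return_content: bool = True,
) -> Optional[str]:
    """Hypothetical sketch of assembling, saving, and returning a sitemap."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod_iso
    xml = '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(
        urlset, encoding="unicode"
    )
    # Saving is independent of whether the content is returned.
    if filepath is not None:
        Path(filepath).write_text(xml, encoding="utf-8")
    return xml if return_content else None


print(build_sitemap(["https://example.com/", "https://example.com/about"], "2024-01-01"))
```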

duck.contrib.sitemap.to_component

None