duck.utils.path

Utilities for path operations, e.g. path sanitization, manipulation, and joining.

Module Contents

Functions

build_absolute_uri

Builds an absolute URL from the provided root_url and path.

is_absolute_url

Checks whether a URL is a complete URL, including the scheme (e.g. ‘https’).

is_good_url_path

Validates whether the URL path conforms to RFC 3986. Only specific special characters are allowed, and the path is also checked for disallowed characters such as space, tilde (~), etc.

joinpaths

Returns the joined paths, making sure every path is included in the final result, unlike os.path.join, which discards earlier components when a later one is absolute.

normalize_url_path

Normalizes a URL path by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and string quotes), replacing backslashes, and lowercasing the scheme.

paths_are_same

Checks if two paths point to the same location, handling case-insensitivity and different separators.

replace_hostname

Replaces the hostname in a URL.

sanitize_path_segment

Sanitizes a path segment to prevent directory traversal attacks (same as normalize_url_path).

url_normalize

Normalizes a URL by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and string quotes), replacing backslashes, and lowercasing the scheme.

Data

URL_PATH_REGEX

API

duck.utils.path.URL_PATH_REGEX

'^[a-zA-Z0-9-._~:/?#\[\]@!$&\'()*+,;=%]*$'

duck.utils.path.build_absolute_uri(root_url: str, path: str, normalization_ignore_chars: Optional[List[str]] = None) → str

Builds an absolute URL from the provided root_url and path.

Parameters:
  • root_url – The root URL that the path is joined to.

  • path – The path to join with the root URL.

  • normalization_ignore_chars – List of characters to ignore when normalizing the url path. By default, all unsafe characters are stripped.
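To illustrate the idea of joining a root URL and a path, here is a minimal standard-library sketch. `sketch_build_absolute_uri` is a hypothetical stand-in, not Duck's implementation; it assumes the root URL's own path is kept and that exactly one slash separates the two parts, and it skips the character-stripping step described above.

```python
from urllib.parse import urlsplit, urlunsplit


def sketch_build_absolute_uri(root_url: str, path: str) -> str:
    """Hypothetical stand-in: join a root URL and a path with a single slash."""
    parts = urlsplit(root_url)
    # Assumption: keep the root's own path and avoid doubled slashes at the seam.
    joined = parts.path.rstrip("/") + "/" + path.lstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, joined, "", ""))


print(sketch_build_absolute_uri("https://example.com/", "/about"))
```

The real function may treat query strings, fragments, or unsafe characters differently.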

duck.utils.path.is_absolute_url(url: str)

Checks whether a URL is a complete URL, including the scheme (e.g. ‘https’).
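One plausible way to express this check with the standard library is shown below. `sketch_is_absolute_url` is a hypothetical name, and requiring both a scheme and a network location is an assumption; Duck's own check may be stricter or looser.

```python
from urllib.parse import urlsplit


def sketch_is_absolute_url(url: str) -> bool:
    """Hypothetical stand-in: True when the URL has both a scheme and a host part."""
    parts = urlsplit(url)
    # Assumption: "absolute" means scheme and netloc are both present.
    return bool(parts.scheme and parts.netloc)


print(sketch_is_absolute_url("https://example.com/a"))
print(sketch_is_absolute_url("/a/b"))
```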

duck.utils.path.is_good_url_path(url_path: str) → bool

Validates whether the URL path conforms to RFC 3986. Only specific special characters are allowed, and the path is also checked for disallowed characters such as space, tilde (~), etc.

Parameters:

url_path – The URL path string to validate.

Returns:

True if the URL is in the specified format and has no disallowed characters, False otherwise.

Return type:

bool
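A whitelist check of this kind can be sketched with a regular expression. The pattern below is an assumption built from the RFC 3986 unreserved and reserved characters plus "%"; it is not necessarily identical to URL_PATH_REGEX, and `sketch_is_good_url_path` is a hypothetical name. Any extra disallowed-character checks that Duck performs are not reproduced here.

```python
import re

# Assumed character set: RFC 3986 unreserved + gen-delims + sub-delims + "%".
SKETCH_URL_PATH_PATTERN = re.compile(r"[a-zA-Z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def sketch_is_good_url_path(url_path: str) -> bool:
    """Hypothetical stand-in: True when every character is in the allowed set."""
    # fullmatch requires the whole string to consist of allowed characters.
    return SKETCH_URL_PATH_PATTERN.fullmatch(url_path) is not None


print(sketch_is_good_url_path("/static/css/app.css?v=1"))
print(sketch_is_good_url_path("/a b"))
```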

duck.utils.path.joinpaths(path1: Union[str, pathlib.Path], path2: Union[str, pathlib.Path], *more)

Returns the joined paths, making sure every path is included in the final result, unlike os.path.join, which discards earlier components when a later one is absolute.
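The difference from os.path.join can be seen in a small sketch. `sketch_joinpaths` is a hypothetical stand-in that strips leading separators from every part after the first; this is one assumed way to get the behaviour, not Duck's actual code. `posixpath` is used so the output is the same on every platform; on POSIX systems os.path behaves identically.

```python
import os
import posixpath


def sketch_joinpaths(path1, path2, *more) -> str:
    """Hypothetical stand-in: join paths without letting any part discard earlier ones."""
    parts = [os.fspath(p) for p in (path1, path2, *more)]
    # Strip leading separators so no later part is treated as absolute.
    tail = [p.lstrip("/\\") for p in parts[1:]]
    return posixpath.join(parts[0], *tail)


# The plain join drops "/var/www" because "/static" is absolute.
print(posixpath.join("/var/www", "/static"))
print(sketch_joinpaths("/var/www", "/static", "css"))
```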

duck.utils.path.normalize_url_path(url_path: str, ignore_chars: Optional[List[str]] = None) → str

Normalizes a URL path by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and string quotes), replacing backslashes, and lowercasing the scheme.
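The listed steps can be sketched as follows. `sketch_normalize_url_path` is a hypothetical stand-in; the set of stripped characters is an assumption chosen for illustration, and the ignore_chars parameter is not modelled.

```python
import re


def sketch_normalize_url_path(url_path: str) -> str:
    """Hypothetical stand-in for the normalization steps described above."""
    path = url_path.replace("\\", "/")      # backslashes become forward slashes
    path = re.sub(r"[<>\"'`]", "", path)    # assumed set of disallowed characters
    path = re.sub(r"/{2,}", "/", path)      # collapse consecutive slashes
    return "/" + path.strip("/")            # one leading slash, no trailing slash


print(sketch_normalize_url_path("static//css\\app.css/"))
```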

duck.utils.path.paths_are_same(path1, path2)

Checks if two paths point to the same location, handling case-insensitivity and different separators.
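A comparison that ignores case and separator style might look like the sketch below. `sketch_paths_are_same` is a hypothetical stand-in; it uses `ntpath` so that the result is the same on any platform, and it does not touch the filesystem, so symlinks are not resolved. Whether Duck's function does more than this is not stated here.

```python
import ntpath


def sketch_paths_are_same(path1: str, path2: str) -> bool:
    """Hypothetical stand-in: compare paths after normalizing case and separators."""

    def canon(path: str) -> str:
        # normpath collapses "." and ".." parts; normcase lowercases the path
        # and converts forward slashes to backslashes.
        return ntpath.normcase(ntpath.normpath(path))

    return canon(path1) == canon(path2)


print(sketch_paths_are_same("C:/Users/Me/../Me/file.txt", "c:\\users\\me\\FILE.TXT"))
```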

duck.utils.path.replace_hostname(url: str, hostname: str) → str

Replaces the hostname in a URL.

If the URL has no scheme (e.g. https) or is only a URL path, it is returned unmodified.

Parameters:
  • url – The target URL.

  • hostname – The new hostname or domain.
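The behaviour described above can be approximated with urllib.parse. `sketch_replace_hostname` is a hypothetical stand-in; keeping the original port and dropping any user info are assumptions made for this sketch, not documented behaviour.

```python
from urllib.parse import urlsplit, urlunsplit


def sketch_replace_hostname(url: str, hostname: str) -> str:
    """Hypothetical stand-in: swap the host part of an absolute URL."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return url  # no scheme or only a URL path: leave it untouched
    netloc = hostname
    if parts.port is not None:
        netloc = f"{hostname}:{parts.port}"  # assumption: keep the original port
    return urlunsplit(parts._replace(netloc=netloc))


print(sketch_replace_hostname("https://old.example.com:8443/a?b=1", "new.example.org"))
print(sketch_replace_hostname("/a/b", "new.example.org"))
```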

duck.utils.path.sanitize_path_segment(segment)

Sanitizes a path segment to prevent directory traversal attacks (same as normalize_url_path).

Parameters:

segment – The path segment to sanitize.

Returns:

The sanitized path segment.

Return type:

str

duck.utils.path.url_normalize(url: str, ignore_chars: Optional[List[str]] = None) → str

Normalizes a URL by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and string quotes), replacing backslashes, and lowercasing the scheme.
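For a full URL, the same steps apply to the path component while the scheme is lowercased. The sketch below is a hypothetical stand-in (`sketch_url_normalize`) built on urllib.parse; it assumes the host, query, and fragment are passed through unchanged and does not model ignore_chars or the removal of disallowed characters.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def sketch_url_normalize(url: str) -> str:
    """Hypothetical stand-in: lowercase the scheme and tidy the path of a URL."""
    parts = urlsplit(url.replace("\\", "/"))       # backslashes become forward slashes
    path = re.sub(r"/{2,}", "/", parts.path)        # collapse consecutive slashes
    path = "/" + path.strip("/")                    # one leading slash, no trailing slash
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc, path, parts.query, parts.fragment)
    )


print(sketch_url_normalize("HTTPS://Example.com//a///b/"))
```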