`duck.utils.path`

Module for path operations, e.g. path sanitization, manipulation, and joining.

Module Contents

Functions

`build_absolute_uri`	Builds an absolute URL from the provided root_url and path.
`is_absolute_url`	Checks whether a URL is a complete URL, including the scheme (e.g. ‘https’).
`is_good_url_path`	Validates whether the URL path conforms to RFC 3986. Only specific special characters are allowed. Also checks for disallowed characters such as space, tilde (~), etc.
`joinpaths`	Joins paths so that every component appears in the final path, unlike `os.path.join`, which discards earlier components when a later one is absolute.
`normalize_url_path`	Normalizes a URL path by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and quote characters), replacing backslashes, and lowercasing the scheme.
`paths_are_same`	Checks if two paths point to the same location, handling case-insensitivity and different separators.
`replace_hostname`	Replaces the hostname in a URL.
`sanitize_path_segment`	Sanitizes a path segment to prevent directory traversal attacks (same as `normalize_url_path`).
`url_normalize`	Normalizes a URL by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and quote characters), replacing backslashes, and lowercasing the scheme.

Data

URL_PATH_REGEX

API

duck.utils.path.URL_PATH_REGEX: ‘^[a-zA-Z0-9-._~:/?#\[\]@!$&'()*+,;=%]*$’

duck.utils.path.build_absolute_uri(root_url: str, path: str, normalization_ignore_chars: Optional[List[str]] = None) → str

Builds an absolute URL from the provided root_url and path.

Parameters:

root_url – The root URL to which the path is joined.
path – The path to join with the root URL.
normalization_ignore_chars – List of characters to ignore when normalizing the URL path. By default, all unsafe characters are stripped.
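
To illustrate the idea of joining a root URL and a path, here is a minimal standalone sketch. It does not call `duck.utils.path`; the function name `build_absolute_uri_sketch` is hypothetical, and the sketch skips the path normalization step that the real function is described as performing.

```python
def build_absolute_uri_sketch(root_url: str, path: str) -> str:
    """Hypothetical stand-in: join a root URL and a path on exactly one slash."""
    # Trim the slash on both sides of the seam so the result never contains
    # a doubled or a missing separator.
    return root_url.rstrip("/") + "/" + path.lstrip("/")


print(build_absolute_uri_sketch("https://example.com/", "/docs/intro"))
# prints https://example.com/docs/intro
```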

duck.utils.path.is_absolute_url(url: str): Checks whether a URL is a complete URL, including the scheme (e.g. ‘https’).
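
One common way to perform this kind of check with the standard library is shown below. This is a sketch under the assumption that "complete" means both a scheme and a host are present; `is_absolute_url_sketch` is a hypothetical name and may differ from what the library actually does.

```python
from urllib.parse import urlsplit


def is_absolute_url_sketch(url: str) -> bool:
    """Hypothetical stand-in: True when the URL has both a scheme and a host."""
    parts = urlsplit(url)
    return bool(parts.scheme and parts.netloc)


print(is_absolute_url_sketch("https://example.com/docs"))  # True
print(is_absolute_url_sketch("/docs/intro"))               # False
```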

duck.utils.path.is_good_url_path(url_path: str) → bool

Validates whether the URL path conforms to RFC 3986. Only specific special characters are allowed. Also checks for disallowed characters such as space, tilde (~), etc.

Parameters: url_path – The URL path string to validate.
Returns: True if the URL path is in the specified format and has no disallowed characters, False otherwise.
Return type: bool
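
The validation described above can be approximated with a whitelist regular expression. The sketch below is an assumption-laden illustration: the character class is the RFC 3986 unreserved and reserved sets plus "%", and `ALLOWED_PATH` and `is_good_url_path_sketch` are hypothetical names. Any extra checks the library applies are not reproduced here.

```python
import re

# Assumed whitelist: RFC 3986 unreserved characters, gen-delims,
# sub-delims, and "%" for percent-encoded octets.
ALLOWED_PATH = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def is_good_url_path_sketch(url_path: str) -> bool:
    """Hypothetical stand-in: True when every character is in the whitelist."""
    return ALLOWED_PATH.fullmatch(url_path) is not None


print(is_good_url_path_sketch("/docs/intro"))    # True
print(is_good_url_path_sketch("/docs/my file"))  # False (contains a space)
```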

duck.utils.path.joinpaths(path1: Union[str, pathlib.Path], path2: Union[str, pathlib.Path], *more): Joins paths so that every component appears in the final path, unlike os.path.join, which discards earlier components when a later one is absolute.
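
The difference from `os.path.join` is easiest to see with an absolute second component. The sketch below uses `posixpath` so the result is the same on every platform; `joinpaths_sketch` is a hypothetical stand-in and not the library's implementation.

```python
import posixpath


def joinpaths_sketch(path1, path2, *more):
    """Hypothetical stand-in: join paths without dropping earlier components."""
    # Strip leading separators from every later component so none of them
    # is treated as absolute and allowed to discard what came before it.
    later = [str(p).lstrip("/") for p in (path2, *more)]
    return posixpath.join(str(path1), *later)


# posixpath.join("/srv/app", "/static/css") returns "/static/css":
# the first component is silently discarded.
print(joinpaths_sketch("/srv/app", "/static/css"))
# prints /srv/app/static/css
```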

duck.utils.path.normalize_url_path(url_path: str, ignore_chars: Optional[List[str]] = None) → str: Normalizes a URL path by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and quote characters), replacing backslashes, and lowercasing the scheme.
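
The normalization steps listed above could look roughly like the following. This is a sketch, not the library's code: `normalize_url_path_sketch` and the `UNSAFE` character set are assumptions chosen for illustration, and the exact set of characters the library removes is not documented here.

```python
import re
from typing import List, Optional

# Assumed set of unsafe characters; the library's real set may differ.
UNSAFE = set("<>\"'`")


def normalize_url_path_sketch(url_path: str, ignore_chars: Optional[List[str]] = None) -> str:
    """Hypothetical stand-in for the normalization steps described above."""
    keep = set(ignore_chars or [])
    path = url_path.replace("\\", "/")  # backslashes become forward slashes
    # Drop unsafe characters unless the caller asked to keep them.
    path = "".join(ch for ch in path if ch not in UNSAFE or ch in keep)
    path = re.sub(r"/{2,}", "/", path)  # collapse consecutive slashes
    return "/" + path.strip("/")        # one leading slash, no trailing slash


print(normalize_url_path_sketch("docs//guide\\intro/"))
# prints /docs/guide/intro
```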

duck.utils.path.paths_are_same(path1, path2): Checks if two paths point to the same location, handling case-insensitivity and different separators.
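
A comparison of this kind usually reduces both paths to one canonical form first. The sketch below assumes that case is always ignored and that backslashes and forward slashes are equivalent; `paths_are_same_sketch` is a hypothetical name, and the library may instead follow the host platform's rules.

```python
import posixpath


def paths_are_same_sketch(path1: str, path2: str) -> bool:
    """Hypothetical stand-in: compare paths after canonicalizing both."""

    def canonical(p: str) -> str:
        # Unify separators, collapse "." segments and duplicate slashes,
        # then fold case (an assumption: treats all paths as case-insensitive).
        return posixpath.normpath(p.replace("\\", "/")).lower()

    return canonical(path1) == canonical(path2)


print(paths_are_same_sketch("C:\\Users\\Me\\", "c:/users/me"))  # True
print(paths_are_same_sketch("/srv/app", "/srv/other"))          # False
```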

duck.utils.path.replace_hostname(url: str, hostname: str) → str

Replaces the hostname in a URL.

If the URL has no scheme (e.g. https) or is only a URL path, it is returned unmodified.

Parameters:

url – The target URL.
hostname – The new hostname or domain.
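
The behavior described above, including the no-op for scheme-less URLs, can be sketched with `urllib.parse`. `replace_hostname_sketch` is a hypothetical name; keeping the port and dropping any user info are choices made for this illustration, not documented behavior of the library.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_hostname_sketch(url: str, hostname: str) -> str:
    """Hypothetical stand-in: swap the host, leave everything else intact."""
    parts = urlsplit(url)
    # Without a scheme and a host there is nothing to replace.
    if not parts.scheme or not parts.netloc:
        return url
    # Assumption: keep an explicit port, drop any "user:password@" prefix.
    netloc = hostname if parts.port is None else f"{hostname}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(replace_hostname_sketch("https://old.example/a?x=1", "new.example"))
# prints https://new.example/a?x=1
print(replace_hostname_sketch("/a/b", "new.example"))
# prints /a/b
```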

duck.utils.path.sanitize_path_segment(segment)

Sanitizes a path segment to prevent directory traversal attacks (same as normalize_url_path).

Parameters: segment – The path segment to sanitize.
Returns: The sanitized path segment.
Return type: str
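
As a general illustration of directory traversal prevention, the sketch below removes the pieces that let a segment climb out of its parent directory. It is an assumption about the technique, not a description of the library: `sanitize_path_segment_sketch` is a hypothetical name, and the documented function is stated to behave like normalize_url_path.

```python
def sanitize_path_segment_sketch(segment: str) -> str:
    """Hypothetical stand-in: drop pieces that could escape the parent directory."""
    pieces = segment.replace("\\", "/").split("/")
    # Empty pieces come from leading or doubled slashes; "." and ".." are
    # the traversal components themselves.
    safe = [p for p in pieces if p not in ("", ".", "..")]
    return "/".join(safe)


print(sanitize_path_segment_sketch("../../etc/passwd"))
# prints etc/passwd
```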

duck.utils.path.url_normalize(url: str, ignore_chars: Optional[List[str]] = None) → str: Normalizes a URL by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and quote characters), replacing backslashes, and lowercasing the scheme.
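
For a full URL, the same cleanup has to leave the `//` after the scheme alone. One way to do that, sketched here with `urllib.parse`, is to split the URL and normalize only the path component. `url_normalize_sketch` is a hypothetical name, and removal of disallowed characters is omitted for brevity.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def url_normalize_sketch(url: str) -> str:
    """Hypothetical stand-in: normalize the scheme and path of a URL."""
    parts = urlsplit(url)
    path = parts.path.replace("\\", "/")  # backslashes become forward slashes
    path = re.sub(r"/{2,}", "/", path)    # collapse consecutive slashes
    path = "/" + path.strip("/")          # one leading slash, no trailing slash
    return urlunsplit((parts.scheme.lower(), parts.netloc, path, parts.query, parts.fragment))


print(url_normalize_sketch("HTTPS://Example.com//a///b/"))
# prints https://Example.com/a/b
```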
