duck.utils.path

Module for path operations, e.g. path sanitization, manipulation, and joining.

Module Contents

Functions

build_absolute_uri

Builds an absolute URL from the provided root_url and path.

is_absolute_url

Checks whether a URL is a complete URL, including the scheme (e.g. ‘https’).

is_good_url_path

Validates if the URL path conforms to RFC 3986 standards. Only allows specific special characters. Also checks for disallowed characters like space, tilde (~), etc.

joinpaths

Joins paths and makes sure every path is included in the final path, unlike os.path.join, which discards earlier components when a later one is absolute.

normalize_url_path

Normalizes a URL path by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and string quotes), replacing backslashes, and lowercasing the scheme.

paths_are_same

Checks if two paths point to the same location, handling case-insensitivity and different separators.

replace_hostname

Replaces the hostname in a URL.

sanitize_path_segment

Sanitizes a path segment to prevent directory traversal attacks (same as normalize_url_path).

url_normalize

Normalizes a URL by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and string quotes), replacing backslashes, and lowercasing the scheme.

Data

URL_PATH_REGEX

API

duck.utils.path.URL_PATH_REGEX

'^[a-zA-Z0-9-._~:/?#\[\]@!$&\'()*+,;=%]*$'

duck.utils.path.build_absolute_uri(root_url: str, path: str, normalization_ignore_chars: Optional[List[str]] = None) → str

Builds an absolute URL from the provided root_url and path.

Parameters:
  • path – The path to join with the root URL.

  • normalization_ignore_chars – List of characters to ignore when normalizing the URL path. By default, all unsafe characters are stripped.
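
As a rough illustration of the joining step only, the sketch below combines a root URL and a path with exactly one slash between them. `build_absolute_uri_sketch` is a hypothetical helper, not the library's implementation, and it does not model the normalization controlled by normalization_ignore_chars.

```python
def build_absolute_uri_sketch(root_url: str, path: str) -> str:
    """Hypothetical helper: join a root URL and a path with a single slash."""
    # Drop the root's trailing slashes and the path's leading slashes,
    # then place exactly one separator between the two pieces.
    return root_url.rstrip("/") + "/" + path.lstrip("/")


print(build_absolute_uri_sketch("https://example.com/", "/docs/index.html"))
# https://example.com/docs/index.html
```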

duck.utils.path.is_absolute_url(url: str)

Checks whether a URL is a complete URL, including the scheme (e.g. ‘https’).
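
One way to express this kind of check with the standard library is shown below. It is a minimal sketch under the assumption that "complete" means both a scheme and a host are present. `is_absolute_url_sketch` is a hypothetical name, and the library's own rules may differ.

```python
from urllib.parse import urlsplit


def is_absolute_url_sketch(url: str) -> bool:
    """Hypothetical helper: True when the URL has both a scheme and a host."""
    parts = urlsplit(url)
    return bool(parts.scheme and parts.netloc)


print(is_absolute_url_sketch("https://example.com/a"))  # True
print(is_absolute_url_sketch("/a/b"))                   # False
```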

duck.utils.path.is_good_url_path(url_path: str) → bool

Validates if the URL path conforms to RFC 3986 standards. Only allows specific special characters. Also checks for disallowed characters like space, tilde (~), etc.

Parameters:

url_path – The URL path string to validate.

Returns:

True if the URL is in the specified format and has no disallowed characters, False otherwise.

Return type:

bool
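
The validation described above can be approximated with a character-class check. This is a sketch, not the library's code: the pattern is assumed to be the RFC 3986 unreserved, gen-delims, and sub-delims sets plus "%". The extra rejection of space and tilde is taken from the description above, and `looks_like_good_url_path` is a hypothetical name.

```python
import re

# Assumed pattern: RFC 3986 unreserved, gen-delims, and sub-delims, plus "%".
RFC3986_CHARS = re.compile(r"[a-zA-Z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")

# Assumption from the description: these are rejected even though RFC 3986
# treats "~" as an unreserved character.
EXTRA_DISALLOWED = {" ", "~"}


def looks_like_good_url_path(url_path: str) -> bool:
    """Hypothetical helper: character-level validation of a URL path."""
    if any(ch in EXTRA_DISALLOWED for ch in url_path):
        return False
    # fullmatch requires every character to belong to the allowed class.
    return RFC3986_CHARS.fullmatch(url_path) is not None


print(looks_like_good_url_path("/docs/index.html?x=1"))  # True
print(looks_like_good_url_path("/a b"))                  # False
```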

duck.utils.path.joinpaths(path1: Union[str, pathlib.Path], path2: Union[str, pathlib.Path], *more)

Joins paths and makes sure every path is included in the final path, unlike os.path.join, which discards earlier components when a later one is absolute.
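
The difference from os.path.join can be seen in a small sketch. `posixpath` is used so the result is the same on every platform. `join_all` is a hypothetical stand-in that shows the idea, not the actual behavior of joinpaths in every edge case.

```python
import posixpath


def join_all(first, second, *more):
    """Hypothetical helper: join paths without letting an absolute part reset the result."""
    parts = [str(p) for p in (first, second, *more)]
    result = parts[0]
    for part in parts[1:]:
        # Strip leading separators so posixpath.join cannot discard `result`.
        result = posixpath.join(result, part.lstrip("/"))
    return result


# The standard join drops "/var/www" because "/static" is absolute.
print(posixpath.join("/var/www", "/static"))   # /static
print(join_all("/var/www", "/static", "css"))  # /var/www/static/css
```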

duck.utils.path.normalize_url_path(url_path: str, ignore_chars: Optional[List[str]] = None) → str

Normalizes a URL path by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and string quotes), replacing backslashes, and lowercasing the scheme.
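
The listed steps could be sketched as follows. The set of disallowed characters and the order of the steps are assumptions, and `normalize_path_sketch` is a hypothetical helper rather than the library's implementation. Scheme lowercasing is left out because the input here is a bare path.

```python
import re

# Assumed set of characters to strip; the library's actual list may differ.
DISALLOWED = set("<>\"'")


def normalize_path_sketch(url_path, ignore_chars=None):
    """Hypothetical helper illustrating the normalization steps on a URL path."""
    keep = set(ignore_chars or [])
    path = url_path.replace("\\", "/")            # backslashes -> slashes
    path = "".join(ch for ch in path
                   if ch not in DISALLOWED or ch in keep)
    path = re.sub(r"/{2,}", "/", path)            # collapse repeated slashes
    return "/" + path.strip("/")                  # one leading, no trailing slash


print(normalize_path_sketch("static\\css//<main>.css/"))
# /static/css/main.css
```

Passing `ignore_chars=["<", ">"]` to this sketch keeps the angle brackets in place.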

duck.utils.path.paths_are_same(path1, path2)

Checks if two paths point to the same location, handling case-insensitivity and different separators.
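
A comparison of this kind can be sketched with the standard library. The example below applies Windows-style rules (case-insensitive, both separators accepted) through `ntpath` on every platform, which is an assumption made for illustration. `same_path_sketch` is a hypothetical name and the library may resolve paths differently.

```python
import ntpath


def same_path_sketch(path1, path2) -> bool:
    """Hypothetical helper: compare paths ignoring case and separator style."""
    def norm(p):
        # normpath collapses redundant separators and trailing slashes;
        # normcase lowercases and converts "/" to "\\".
        return ntpath.normcase(ntpath.normpath(str(p)))

    return norm(path1) == norm(path2)


print(same_path_sketch("C:/Users/Duck/", "c:\\users\\duck"))  # True
print(same_path_sketch("C:/Users/Duck", "C:/Users/Goose"))    # False
```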

duck.utils.path.replace_hostname(url: str, hostname: str) → str

Replaces the hostname in a URL.

If the URL has no scheme (e.g. https) or is only a URL path, no modifications are made.

Parameters:
  • url – The target URL.

  • hostname – The new hostname or domain.
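
A minimal sketch of this behavior using `urllib.parse` follows. Keeping the original port and dropping any user info are assumptions of the sketch, and `replace_hostname_sketch` is a hypothetical name, not the library's implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_hostname_sketch(url: str, hostname: str) -> str:
    """Hypothetical helper: swap the host of an absolute URL."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        # No scheme or host (e.g. a bare URL path): return it unchanged.
        return url
    netloc = hostname
    if parts.port is not None:
        # Assumption: the original port is preserved.
        netloc = f"{hostname}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(replace_hostname_sketch("https://old.example.com:8443/a?b=1#c", "new.example.com"))
# https://new.example.com:8443/a?b=1#c
print(replace_hostname_sketch("/only/a/path", "new.example.com"))
# /only/a/path
```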

duck.utils.path.sanitize_path_segment(segment)

Sanitizes a path segment to prevent directory traversal attacks (same as normalize_url_path).

Parameters:

segment – The path segment to sanitize.

Returns:

The sanitized path segment.

Return type:

str
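
To show why sanitizing matters, the sketch below joins an untrusted segment onto a root directory before and after removing traversal components. `sanitize_segment_sketch` is a hypothetical helper. Dropping "..", ".", and empty components is one possible strategy and is not confirmed to be what the library does.

```python
import posixpath


def sanitize_segment_sketch(segment: str) -> str:
    """Hypothetical helper: drop components that could climb out of a root."""
    cleaned = segment.replace("\\", "/")
    parts = [p for p in cleaned.split("/") if p not in ("", ".", "..")]
    return "/".join(parts)


root = "/srv/files"
untrusted = "../../etc/passwd"

# Unsanitized: the joined path escapes the root directory.
print(posixpath.normpath(posixpath.join(root, untrusted)))
# /etc/passwd

# Sanitized: the joined path stays under the root directory.
print(posixpath.join(root, sanitize_segment_sketch(untrusted)))
# /srv/files/etc/passwd
```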

duck.utils.path.url_normalize(url: str, ignore_chars: Optional[List[str]] = None) → str

Normalizes a URL by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and string quotes), replacing backslashes, and lowercasing the scheme.
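
For a full URL, the same idea can be sketched by splitting the URL first so that only the path component is rewritten. `url_normalize_sketch` is a hypothetical helper that covers scheme lowercasing, backslash replacement, and slash cleanup. It does not model character stripping or ignore_chars.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def url_normalize_sketch(url: str) -> str:
    """Hypothetical helper: lowercase the scheme and tidy slashes in the path."""
    parts = urlsplit(url.replace("\\", "/"))
    path = re.sub(r"/{2,}", "/", parts.path)   # collapse repeated slashes
    if len(path) > 1:
        path = path.rstrip("/")                # drop trailing slashes
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc, path, parts.query, parts.fragment)
    )


print(url_normalize_sketch("HTTPS://Example.com//a///b/"))
# https://Example.com/a/b
```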