duck.utils.path
Module for path operations, e.g. path sanitization, manipulation, and joining.
Module Contents
Functions
build_absolute_uri | Builds an absolute URL from the provided root_url and path. |
is_absolute_url | Checks whether a URL is a complete URL, including a scheme (e.g. ‘https’). |
is_good_url_path | Validates if the URL path conforms to RFC 3986 standards. Only allows specific special characters. Also checks for disallowed characters like space, tilde (~), etc. |
joinpaths | Returns the joined paths, making sure every path is included in the final path, unlike os.path.join. |
normalize_url_path | Normalizes a URL path by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and quote characters), replacing backslashes, and lowercasing the scheme. |
paths_are_same | Checks if two paths point to the same location, handling case-insensitivity and different separators. |
replace_hostname | Replaces the hostname in a URL. |
sanitize_path_segment | Sanitizes a path segment to prevent directory traversal attacks (same as normalize_url_path). |
url_normalize | Normalizes a URL by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and quote characters), replacing backslashes, and lowercasing the scheme. |
Data
API
- duck.utils.path.URL_PATH_REGEX
‘^[a-zA-Z0-9-._~:/?#\[\]@!$&\\()*+,;=%]*$’
- duck.utils.path.build_absolute_uri(root_url: str, path: str, normalization_ignore_chars: Optional[List[str]] = None) → str
Builds an absolute URL from the provided root_url and path.
- Parameters:
root_url – The root URL that the path is joined onto.
path – The path to join with the root URL.
normalization_ignore_chars – List of characters to ignore when normalizing the URL path. By default, all unsafe characters are stripped.
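To make the joining step concrete, here is a minimal standalone sketch. The helper name `absolute_uri_sketch` is hypothetical; it assumes plain concatenation with a single separating slash and leaves out the path normalization that `normalization_ignore_chars` controls, so it may differ from what duck actually does.

```python
def absolute_uri_sketch(root_url: str, path: str) -> str:
    """Hypothetical stand-in: join a root URL and a path with exactly one slash."""
    # Assumption: the root URL carries no query string or fragment.
    return root_url.rstrip("/") + "/" + path.lstrip("/")


print(absolute_uri_sketch("https://example.com/", "/docs/intro"))
# https://example.com/docs/intro
```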
- duck.utils.path.is_absolute_url(url: str)
Checks whether a URL is a complete URL, including a scheme (e.g. ‘https’).
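One way to picture this check is with the standard library's URL parser. The sketch below is an illustration only: `looks_absolute` is a hypothetical name, and requiring both a scheme and a host is an assumption rather than duck's documented rule.

```python
from urllib.parse import urlsplit


def looks_absolute(url: str) -> bool:
    """Hypothetical stand-in: True when the URL has both a scheme and a host."""
    parts = urlsplit(url)
    return bool(parts.scheme) and bool(parts.netloc)


print(looks_absolute("https://example.com/docs"))  # True
print(looks_absolute("/docs/index.html"))          # False
```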
- duck.utils.path.is_good_url_path(url_path: str) → bool
Validates if the URL path conforms to RFC 3986 standards. Only allows specific special characters. Also checks for disallowed characters like space, tilde (~), etc.
- Parameters:
url_path – The URL path string to validate.
- Returns:
True if the URL is in the specified format and has no disallowed characters, False otherwise.
- Return type:
bool
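A whitelist check of this kind can be sketched with a single regular expression. The character set below is an assumption based on the RFC 3986 unreserved and reserved characters plus ‘%’; it is not necessarily identical to URL_PATH_REGEX, and `path_looks_valid` is a hypothetical name.

```python
import re

# Assumed whitelist for illustration: RFC 3986 unreserved + reserved characters and '%'.
ALLOWED_PATH_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")


def path_looks_valid(url_path: str) -> bool:
    """Hypothetical stand-in: True when every character is in the whitelist."""
    return ALLOWED_PATH_CHARS.fullmatch(url_path) is not None


print(path_looks_valid("/static/css/site.css"))  # True
print(path_looks_valid("/my file.txt"))          # False (contains a space)
print(path_looks_valid("/a<b"))                  # False (contains "<")
```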
- duck.utils.path.joinpaths(path1: Union[str, pathlib.Path], path2: Union[str, pathlib.Path], *more)
Returns the joined paths, making sure every path is included in the final path, unlike os.path.join, which discards earlier components when a later one is absolute.
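The difference from the standard join is easiest to see side by side. This is a sketch under assumptions: `join_keep_all` is a hypothetical helper, and it uses `posixpath` so the output is the same on every platform; duck's own handling of separators may differ.

```python
import posixpath


def join_keep_all(path1, path2, *more) -> str:
    """Hypothetical stand-in: join paths so that no earlier component is dropped."""
    result = str(path1)
    for part in (path2, *more):
        # Strip leading slashes so the join cannot restart from an absolute path.
        result = posixpath.join(result, str(part).lstrip("/"))
    return result


# The standard join restarts at the absolute second argument.
print(posixpath.join("/srv/app", "/static/logo.png"))  # /static/logo.png
# The sketch keeps both components.
print(join_keep_all("/srv/app", "/static/logo.png"))   # /srv/app/static/logo.png
```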
- duck.utils.path.normalize_url_path(url_path: str, ignore_chars: Optional[List[str]] = None) → str
Normalizes a URL path by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and quote characters), replacing backslashes, and lowercasing the scheme.
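The listed steps can be sketched in a few lines. Everything here is illustrative: `normalize_path_sketch` and `UNSAFE_CHARS` are hypothetical names, the unsafe set is an assumed subset, and replacing backslashes with forward slashes is an assumption about what "replacing" means.

```python
import re
from typing import List, Optional

# Assumed set of unsafe characters, for illustration only.
UNSAFE_CHARS = set("<>\"'")


def normalize_path_sketch(url_path: str, ignore_chars: Optional[List[str]] = None) -> str:
    """Hypothetical stand-in for the normalization steps described above."""
    unsafe = UNSAFE_CHARS - set(ignore_chars or [])
    path = url_path.replace("\\", "/")                      # assumption: backslash becomes "/"
    path = "".join(ch for ch in path if ch not in unsafe)  # drop unsafe characters
    path = re.sub(r"/{2,}", "/", path)                      # collapse repeated slashes
    return "/" + path.strip("/")                            # one leading slash, no trailing slash


print(normalize_path_sketch("//static\\css//<site>.css/"))       # /static/css/site.css
print(normalize_path_sketch("/a/<b>", ignore_chars=["<", ">"]))  # /a/<b>
```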
- duck.utils.path.paths_are_same(path1, path2)
Checks if two paths point to the same location, handling case-insensitivity and different separators.
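A comparison that ignores case and separator style can be sketched with the standard library. `same_location` is a hypothetical name, and using Windows-style rules from `ntpath` on every platform is an assumption made so the example behaves identically everywhere.

```python
import ntpath


def same_location(path1: str, path2: str) -> bool:
    """Hypothetical stand-in: compare paths after normalizing separators and case."""
    def canonical(path: str) -> str:
        # normpath resolves "." and ".." and unifies separators; normcase lowercases.
        return ntpath.normcase(ntpath.normpath(path))

    return canonical(path1) == canonical(path2)


print(same_location("C:/Data/File.txt", "c:\\data\\file.txt"))  # True
print(same_location("a/b", "a/c"))                              # False
```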
- duck.utils.path.replace_hostname(url: str, hostname: str) → str
Replaces the hostname in a URL.
If the URL has no scheme (e.g. https) or is only a URL path, it is returned unmodified.
- Parameters:
url – The target URL.
hostname – The new hostname or domain.
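The behavior described above can be approximated with `urllib.parse`. This is a sketch, not duck's implementation: `swap_hostname` is a hypothetical name, and keeping the original port while dropping any user info is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_hostname(url: str, hostname: str) -> str:
    """Hypothetical stand-in: replace the host of an absolute URL."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return url  # no scheme or host: return the input unmodified
    # Assumption: an explicit port on the original URL is preserved.
    netloc = hostname if parts.port is None else f"{hostname}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(swap_hostname("https://old.example.com:8443/a?b=1", "new.example.com"))
# https://new.example.com:8443/a?b=1
print(swap_hostname("/a/b", "new.example.com"))
# /a/b
```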
- duck.utils.path.sanitize_path_segment(segment)
Sanitizes a path segment to prevent directory traversal attacks (same as normalize_url_path).
- Parameters:
segment – The path segment to sanitize.
- Returns:
The sanitized path segment.
- Return type:
str
- duck.utils.path.url_normalize(url: str, ignore_chars: Optional[List[str]] = None) → str
Normalizes a URL by removing consecutive slashes, adding a leading slash, removing trailing slashes, removing disallowed characters (e.g. “<” and quote characters), replacing backslashes, and lowercasing the scheme.
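For a full URL, the same cleanup applies to the path while the scheme is lowercased. The sketch below is a rough illustration: `normalize_url_sketch` is a hypothetical name, it assumes backslashes become forward slashes, and it leaves out the removal of disallowed characters and the `ignore_chars` option.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_url_sketch(url: str) -> str:
    """Hypothetical stand-in: lowercase the scheme and tidy the slashes in the path."""
    parts = urlsplit(url.replace("\\", "/"))   # assumption: backslash becomes "/"
    path = re.sub(r"/{2,}", "/", parts.path)   # collapse repeated slashes
    path = "/" + path.strip("/")               # one leading slash, no trailing slash
    return urlunsplit((parts.scheme.lower(), parts.netloc, path, parts.query, parts.fragment))


print(normalize_url_sketch("HTTPS://Example.com//a//b/"))
# https://Example.com/a/b
```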