Skip to content

Utils

The utils module provides utility functions for handling XML data in the context of OAI-PMH services.

This module includes functions essential for parsing and transforming XML data obtained from OAI-PMH responses. These utilities facilitate the extraction of namespaces and conversion of XML elements into more accessible data structures.

Functions:

Name Description
log_response

Log the details of an HTTP response.

remove_none_values

Remove keys from the dictionary where the value is None.

filter_dict_except_resumption_token

Filter keys from the dictionary, if resumption token is not None.

get_namespace

Extracts the namespace from an XML element.

xml_to_dict

Converts an XML tree or element into a dictionary representation.

filter_dict_except_resumption_token(d)

Filter out keys with None values from a dictionary, with special handling for 'resumptionToken'.

If 'resumptionToken' is present and not None, and there are other non-None keys, log a warning and retain only 'resumptionToken' and 'verb' keys. Otherwise, return a dictionary excluding any keys with None values.

Parameters:

Name Type Description Default
d dict[str, Any | None]

The dictionary to filter.

required

Returns:

Type Description
dict[str, Any]

dict[str, Any]: A filtered dictionary based on the defined criteria.

Source code in src/oaipmh_scythe/utils.py
def filter_dict_except_resumption_token(d: dict[str, Any | None]) -> dict[str, Any]:
    """Filter out keys with None values from a dictionary, with special handling for 'resumptionToken'.

    If 'resumptionToken' is present and not None, and there are other non-None keys, log a warning and
    retain only 'resumptionToken' and 'verb' keys. Otherwise, return a dictionary excluding any keys
    with None values.

    Args:
        d (dict[str, Any | None]): The dictionary to filter.

    Returns:
        dict[str, Any]: A filtered dictionary based on the defined criteria.
    """
    allowed_keys = ("verb", "resumptionToken")
    resumption_token_present = d["resumptionToken"] is not None
    non_empty_keys = [k for k, v in d.items() if v is not None and k not in allowed_keys]
    if resumption_token_present and resumption_token_present:
        logger.warning(
            "`resumption_token` should not be used in combination with other parameters. Dropping %s", non_empty_keys
        )
        return {k: v for k, v in d.items() if k in allowed_keys}
    return d

get_namespace(element)

Return the namespace URI of an XML element.

Extracts and returns the namespace URI from the tag of the given XML element. The namespace URI is enclosed in curly braces at the start of the tag. If the element does not have a namespace, None is returned.

Parameters:

Name Type Description Default
element _Element

The XML element from which to extract the namespace.

required

Returns:

Type Description
str | None

The namespace URI as a string if the element has a namespace, otherwise None.

Source code in src/oaipmh_scythe/utils.py
def get_namespace(element: etree._Element) -> str | None:
    """Return the namespace URI of an XML element.

    Extracts and returns the namespace URI from the tag of the given XML element.
    The namespace URI is enclosed in curly braces at the start of the tag.
    If the element does not have a namespace, `None` is returned.

    Args:
        element: The XML element from which to extract the namespace.

    Returns:
        The namespace URI as a string if the element has a namespace, otherwise `None`.
    """
    match = re.search(r"(\{.*\})", element.tag)
    return match.group(1) if match else None

log_response(response)

Log the details of an HTTP response.

This function logs the HTTP method, URL, and status code of the response for debugging purposes. It uses the 'debug' logging level to provide detailed diagnostic information.

Parameters:

Name Type Description Default
response Response

The response object received from an HTTP request.

required

Returns:

Type Description
None

None

Source code in src/oaipmh_scythe/utils.py
def log_response(response: httpx.Response) -> None:
    """Log the details of an HTTP response.

    This function logs the HTTP method, URL, and status code of the response for debugging purposes.
    It uses the 'debug' logging level to provide detailed diagnostic information.

    Args:
        response: The response object received from an HTTP request.

    Returns:
        None
    """
    logger.debug(
        "[http] Response: %s %s - Status %s", response.request.method, response.request.url, response.status_code
    )

remove_none_values(d)

Remove keys from the dictionary where the value is None.

Parameters:

Name Type Description Default
d dict[str, Any | None]

The input dictionary.

required

Returns:

Type Description
dict[str, Any]

A new dictionary with the same keys as the input dictionary but none values have been removed.

Source code in src/oaipmh_scythe/utils.py
def remove_none_values(d: dict[str, Any | None]) -> dict[str, Any]:
    """Remove keys from the dictionary where the value is `None`.

    Args:
        d: The input dictionary.

    Returns:
        A new dictionary with the same keys as the input dictionary but none values have been removed.
    """
    return {key: value for key, value in d.items() if value is not None}

xml_to_dict(tree, paths=None, nsmap=None, strip_ns=False)

Convert an XML tree to a dictionary, with options for custom XPath and namespace handling.

This function takes an XML element tree and converts it into a dictionary. The keys of the dictionary are the tags of the XML elements, and the values are lists of the text contents of these elements. It offers options to apply specific XPath expressions, handle namespaces, and optionally strip namespaces from the tags in the resulting dictionary.

Parameters:

Name Type Description Default
tree _Element

The root element of the XML tree to be converted.

required
paths list[str] | None

An optional list of XPath expressions to apply on the XML tree. If None or not provided, the function will consider all elements in the tree.

None
nsmap dict[str, str] | None

An optional dictionary for namespace mapping, used to provide shorter, more readable paths in XPath expressions. If None or not provided, no namespace mapping is applied.

None
strip_ns bool

A boolean flag indicating whether to remove namespaces from the element tags in the resulting dictionary. Defaults to False.

False

Returns:

Type Description
dict[str, list[str | None]]

A dictionary where each key is an element tag (with or without namespace, based on

dict[str, list[str | None]]

strip_ns) and each value is a list of strings representing the text content of

dict[str, list[str | None]]

each element with that tag.

Source code in src/oaipmh_scythe/utils.py
def xml_to_dict(
    tree: etree._Element, paths: list[str] | None = None, nsmap: dict[str, str] | None = None, strip_ns: bool = False
) -> dict[str, list[str | None]]:
    """Convert an XML tree to a dictionary, with options for custom XPath and namespace handling.

    This function takes an XML element tree and converts it into a dictionary. The keys of the
    dictionary are the tags of the XML elements, and the values are lists of the text contents
    of these elements. It offers options to apply specific XPath expressions, handle namespaces,
    and optionally strip namespaces from the tags in the resulting dictionary.

    Args:
        tree: The root element of the XML tree to be converted.
        paths: An optional list of XPath expressions to apply on the XML tree. If None or not
            provided, the function will consider all elements in the tree.
        nsmap: An optional dictionary for namespace mapping, used to provide shorter, more
            readable paths in XPath expressions. If None or not provided, no namespace
            mapping is applied.
        strip_ns: A boolean flag indicating whether to remove namespaces from the element tags
            in the resulting dictionary. Defaults to False.

    Returns:
        A dictionary where each key is an element tag (with or without namespace, based on
        `strip_ns`) and each value is a list of strings representing the text content of
        each element with that tag.
    """
    paths = paths or [".//"]
    nsmap = nsmap or {}
    fields = defaultdict(list)
    for path in paths:
        elements = tree.findall(path, nsmap)
        for element in elements:
            tag = re.sub(r"\{.*\}", "", element.tag) if strip_ns else element.tag
            fields[tag].append(element.text)
    return dict(fields)