Client
The client module provides a client interface for interacting with OAI-PMH services.
This module defines the Scythe class, which facilitates the harvesting of records, identifiers, and sets from OAI-PMH compliant repositories. It handles various OAI-PMH requests, manages pagination with resumption tokens, and supports customizable error handling and retry logic.
Scythe
A client for interacting with OAI-PMH interfaces, facilitating the harvesting of records, identifiers, and sets.
The Scythe class is designed to simplify the process of making OAI-PMH requests and processing the responses. It supports various OAI-PMH verbs and handles pagination through resumption tokens, error handling, and retry logic.
Attributes:
Name | Type | Description |
---|---|---|
endpoint | | The base URL of the OAI-PMH service. |
http_method | | The HTTP method to use for requests (either 'GET' or 'POST'). |
iterator | | The iterator class to be used for iterating over responses. |
max_retries | | The maximum number of retries for a request in case of failures. |
retry_status_codes | | The HTTP status codes on which to retry the request. |
default_retry_after | | The default wait time (in seconds) between retries if no 'retry-after' header is present. |
class_mapping | | A mapping from OAI verbs to classes representing OAI items. |
encoding | | The character encoding for decoding responses. Defaults to the server's specified encoding. |
auth | | Optional authentication credentials for accessing the OAI-PMH interface. |
timeout | | The timeout (in seconds) for HTTP requests. |
Examples:
>>> with Scythe("https://zenodo.org/oai2d") as scythe:
...     records = scythe.list_records()
...     for record in records:
...         print(record)
Source code in src/oaipmh_scythe/client.py
client: httpx.Client
property
Provide a reusable HTTP client instance for making requests.
This property ensures that an httpx.Client instance is created and maintained for the lifecycle of the Scythe instance. It handles the creation of the client and ensures that a new client is created if the existing one is closed.
Returns:
Type | Description |
---|---|
Client | A reusable HTTP client instance for making HTTP requests. |
close()
Close the internal HTTP client if it exists and is open.
This method is responsible for explicitly closing the httpx.Client instance used by the Scythe class. It should be called when the client is no longer needed, to ensure proper cleanup and release of resources.
Note
It's recommended to call this method at the end of operations or when the Scythe instance is no longer in use, especially if it's not being used as a context manager.
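The lifecycle described for `client` and `close()` can be sketched with a stand-in connection class. This is a minimal illustration of the lazy-create, recreate-if-closed pattern and is not the library's code: `FakeConnection` and `Harvester` are hypothetical names, and `FakeConnection` only stands in for `httpx.Client`.

```python
from typing import Optional


class FakeConnection:
    """Hypothetical stand-in for httpx.Client; only tracks open/closed state."""

    def __init__(self) -> None:
        self.is_closed = False

    def close(self) -> None:
        self.is_closed = True


class Harvester:
    """Hypothetical owner of a lazily created, reusable connection."""

    def __init__(self) -> None:
        self._client: Optional[FakeConnection] = None

    @property
    def client(self) -> FakeConnection:
        # Create the connection on first access, or replace it if it was closed.
        if self._client is None or self._client.is_closed:
            self._client = FakeConnection()
        return self._client

    def close(self) -> None:
        # Close only if a connection exists and is still open.
        if self._client is not None and not self._client.is_closed:
            self._client.close()

    def __enter__(self) -> "Harvester":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()


with Harvester() as harvester:
    first = harvester.client
    reused = harvester.client is first  # the same object on repeated access

closed_on_exit = first.is_closed        # the with-block called close()
replacement = harvester.client          # a fresh connection replaces the closed one
```

Using the object as a context manager makes the cleanup automatic; outside a `with` block, `close()` has to be called explicitly.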
get_record(identifier, metadata_prefix='oai_dc')
Issue a GetRecord request to the OAI server.
Send a request to the OAI server to retrieve a specific record. The request is constructed with the provided identifier and metadata prefix. The method then processes and returns the relevant OAIResponse or Record object using an iterator.
Ref: https://openarchives.org/OAI/openarchivesprotocol.html#GetRecord
Parameters:
Name | Type | Description | Default |
---|---|---|---|
identifier | str | A unique identifier for the record to be retrieved from the OAI server. | required |
metadata_prefix | str | The metadata format to be returned for the record. Defaults to "oai_dc". | 'oai_dc' |
Returns:
Type | Description |
---|---|
OAIResponse \| Record | An OAIResponse or Record object representing the requested record. |
Raises:
Type | Description |
---|---|
CannotDisseminateFormat | If the specified metadata_prefix is not supported by the OAI server for the requested record. |
IdDoesNotExist | If the specified identifier does not correspond to any record in the OAI server. |
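The exception names above mirror the error codes that the OAI-PMH protocol reports inside an `<error>` element. The following sketch shows one way such a code could be turned into a Python exception using only the standard library. The exception classes here are local stand-ins, `raise_for_oai_error` is a hypothetical helper, and the sample XML is made up for illustration; none of this is the library's implementation.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


class OAIError(Exception):
    """Hypothetical base class for errors reported in an OAI-PMH response."""


class IdDoesNotExist(OAIError):
    pass


class CannotDisseminateFormat(OAIError):
    pass


# Map protocol error codes to the local stand-in exception classes.
ERRORS = {
    "idDoesNotExist": IdDoesNotExist,
    "cannotDisseminateFormat": CannotDisseminateFormat,
}


def raise_for_oai_error(xml_text: str) -> None:
    """Raise the matching exception if the response contains an <error> element."""
    root = ET.fromstring(xml_text)
    error = root.find(f"{OAI_NS}error")
    if error is None:
        return
    # Unknown codes fall back to the generic base class.
    exc_class = ERRORS.get(error.get("code", ""), OAIError)
    raise exc_class(error.text or "")


# Made-up sample response for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <error code="idDoesNotExist">No matching identifier</error>
</OAI-PMH>"""

try:
    raise_for_oai_error(SAMPLE)
except IdDoesNotExist as exc:
    message = str(exc)
```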
get_retry_after(http_response)
Determine the appropriate time to wait before retrying a request, based on the server's response.
Check the status code of the provided HTTP response. If it's 503 (Service Unavailable), attempt to parse the 'retry-after' header to find the suggested wait time. If parsing fails or a different status code is received, use the default retry time.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
http_response | Response | The HTTP response received from the server. | required |
Returns:
Type | Description |
---|---|
int \| float | The number of seconds to wait before retrying the request. |
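The decision described above can be sketched as a small standalone function. This is an illustration under stated assumptions, not the library's code: `retry_after_seconds` is a hypothetical name, the headers are passed as a plain dictionary with lower-cased keys, and only a numeric 'retry-after' value is understood.

```python
def retry_after_seconds(status_code: int, headers: dict[str, str], default: float) -> float:
    """Return the wait time before a retry, in seconds."""
    if status_code == 503:
        try:
            return float(headers["retry-after"])
        except (KeyError, ValueError):
            # Header missing, or not a plain number of seconds (e.g. an HTTP date).
            return default
    # Any other status code uses the configured default.
    return default


wait = retry_after_seconds(503, {"retry-after": "30"}, default=60)
```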
harvest(query)
Perform an HTTP request to the OAI server with the given parameters.
Send an OAI-PMH request to the server using the specified parameters. Handle retry logic for failed requests based on the configured retry settings and response status codes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query | dict[str, str] | A dictionary containing the request parameters. | required |
Returns:
Type | Description |
---|---|
OAIResponse | An OAIResponse object encapsulating the server's response. |
Raises:
Type | Description |
---|---|
HTTPError | If the HTTP request fails after the maximum number of retries. |
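A retry loop of the kind described above might look like the sketch below. It is a generic illustration and makes several assumptions: `request_with_retries` and `FakeResponse` are hypothetical names, the request is injected as a callable so the example runs without a network, a fixed wait is used between attempts, and `RuntimeError` stands in for the `HTTPError` named above.

```python
import time
from typing import Callable


class FakeResponse:
    """Hypothetical response object exposing only a status code."""

    def __init__(self, status_code: int) -> None:
        self.status_code = status_code


def request_with_retries(
    send: Callable[[], FakeResponse],
    max_retries: int,
    retry_status_codes: tuple[int, ...],
    wait_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> FakeResponse:
    """Call send() once, then retry up to max_retries times on retryable statuses."""
    response = send()
    for _ in range(max_retries):
        if response.status_code not in retry_status_codes:
            break
        sleep(wait_seconds)
        response = send()
    if response.status_code >= 400:
        # Still failing after the last permitted attempt.
        raise RuntimeError(f"request failed with status {response.status_code}")
    return response


# Simulate a server that answers 503 twice before succeeding.
statuses = iter([503, 503, 200])
waits: list[float] = []
result = request_with_retries(
    send=lambda: FakeResponse(next(statuses)),
    max_retries=3,
    retry_status_codes=(503,),
    wait_seconds=5,
    sleep=waits.append,  # record the waits instead of actually sleeping
)
```

Injecting `sleep` keeps the sketch fast to run and makes the waiting behaviour observable.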
identify()
Issue an Identify request to the OAI server.
Send a request to identify the OAI server and retrieve its information. This includes details such as the repository name, the base URL, the protocol version, and other relevant data about the OAI server. It's useful for understanding the capabilities and configuration of the server.
Ref: https://openarchives.org/OAI/openarchivesprotocol.html#Identify
Returns:
Type | Description |
---|---|
Identify | An object encapsulating the server's identify response, which contains various pieces of information about the OAI server. |
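To make the shape of an Identify response concrete, the sketch below parses a made-up sample with the standard library. `parse_identify` is a hypothetical helper and does not reflect how the Identify object is built internally; the element names (`repositoryName`, `baseURL`, `protocolVersion`) come from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Made-up sample response; real repositories return more elements.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>https://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text: str) -> dict[str, str]:
    """Collect the child elements of <Identify> into a tag-to-text mapping."""
    root = ET.fromstring(xml_text)
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        raise ValueError("response has no Identify element")
    # Strip the "{namespace}" prefix that ElementTree puts on each tag.
    # Repeated elements would overwrite each other in this simple mapping.
    return {child.tag.split("}", 1)[-1]: (child.text or "") for child in identify}


info = parse_identify(SAMPLE)
```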
list_identifiers(from_=None, until=None, metadata_prefix='oai_dc', set_=None, resumption_token=None, ignore_deleted=False)
Issue a ListIdentifiers request to the OAI server.
Send a request to list record identifiers from the OAI server. This method allows filtering records based on date range, set membership, and metadata format. It also supports pagination through resumption tokens and has an option to ignore deleted records.
Ref: https://openarchives.org/OAI/openarchivesprotocol.html#ListIdentifiers
Parameters:
Name | Type | Description | Default |
---|---|---|---|
from_ | str \| None | An optional date string specifying the start of a date range for harvesting records. | None |
until | str \| None | An optional date string specifying the end of a date range for harvesting records. | None |
metadata_prefix | str | The metadata format for the records to be harvested. Defaults to "oai_dc". | 'oai_dc' |
set_ | str \| None | An optional set identifier to restrict the harvest to records within a specific set. | None |
resumption_token | str \| None | An optional token for pagination, used to continue a request for the next page of identifiers. | None |
ignore_deleted | bool | If True, skip records flagged as deleted in the response. | False |
Yields:
Type | Description |
---|---|
OAIResponse \| Header | An iterator over OAIResponse or Header objects, each representing an individual record identifier or response from the server. |
Raises:
Type | Description |
---|---|
BadResumptionToken | If the provided resumption token is invalid or expired. |
CannotDisseminateFormat | If the specified metadata_prefix is not supported by the OAI server. |
NoRecordsMatch | If no records match the provided criteria. |
NoSetHierarchy | If set-based harvesting is requested but the OAI server does not support sets. |
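The Python-style parameters above correspond to the protocol's request arguments `from`, `until`, `metadataPrefix`, `set`, and `resumptionToken`; the trailing underscores avoid the reserved word `from` and the built-in `set`. The sketch below shows one plausible translation into a query dictionary. `build_list_query` is a hypothetical helper, not part of the documented API. It follows the protocol rule that `resumptionToken` is an exclusive argument, so it is sent without the other filters.

```python
from typing import Optional


def build_list_query(
    verb: str,
    from_: Optional[str] = None,
    until: Optional[str] = None,
    metadata_prefix: str = "oai_dc",
    set_: Optional[str] = None,
    resumption_token: Optional[str] = None,
) -> dict[str, str]:
    """Translate Python-style arguments into OAI-PMH request parameters."""
    if resumption_token is not None:
        # The protocol treats resumptionToken as an exclusive argument.
        return {"verb": verb, "resumptionToken": resumption_token}
    query = {"verb": verb, "metadataPrefix": metadata_prefix}
    optional = {"from": from_, "until": until, "set": set_}
    # Leave out filters that were not supplied.
    query.update({key: value for key, value in optional.items() if value is not None})
    return query


first_page = build_list_query("ListIdentifiers", from_="2024-01-01", set_="physics")
next_page = build_list_query("ListIdentifiers", resumption_token="abc")
```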
list_metadata_formats(identifier=None)
Issue a ListMetadataFormats request to the OAI server.
Send a request to list the metadata formats available from the OAI server. This can be done for the entire repository or for a specific record if an identifier is provided. The method constructs a query and yields an iterator over OAIResponse or MetadataFormat objects, each representing a different metadata format or response from the server.
Ref: https://openarchives.org/OAI/openarchivesprotocol.html#ListMetadataFormats
Parameters:
Name | Type | Description | Default |
---|---|---|---|
identifier | str \| None | An optional unique identifier for a specific record to query available metadata formats. If None, all metadata formats available in the repository are listed. | None |
Yields:
Type | Description |
---|---|
OAIResponse \| MetadataFormat | An iterator over OAIResponse or MetadataFormat objects, each representing an individual metadata format or response from the server. |
Raises:
Type | Description |
---|---|
IdDoesNotExist | If the specified identifier does not correspond to any record in the OAI server. |
NoMetadataFormats | If there are no metadata formats available for the requested record or repository. |
list_records(from_=None, until=None, metadata_prefix='oai_dc', set_=None, resumption_token=None, ignore_deleted=False)
Issue a ListRecords request to the OAI server.
Send a request to list records from the OAI server, allowing for selective harvesting based on date range, set membership, and metadata format. This method supports pagination via resumption tokens and can optionally ignore records marked as deleted.
Ref: https://openarchives.org/OAI/openarchivesprotocol.html#ListRecords
Parameters:
Name | Type | Description | Default |
---|---|---|---|
from_ | str \| None | An optional date string specifying the start of a date range for harvesting records. | None |
until | str \| None | An optional date string specifying the end of a date range for harvesting records. | None |
metadata_prefix | str | The metadata format for the records to be harvested. Defaults to "oai_dc". | 'oai_dc' |
set_ | str \| None | An optional set identifier to restrict the harvest to records within a specific set. | None |
resumption_token | str \| None | An optional token for pagination, used to continue a request for the next page of records. | None |
ignore_deleted | bool | If True, skip records flagged as deleted in the response. | False |
Yields:
Type | Description |
---|---|
OAIResponse \| Record | An iterator over OAIResponse or Record objects, each representing an individual record or response from the server. |
Raises:
Type | Description |
---|---|
BadArgument | If the arguments provided do not conform to the expectations of the OAI server. |
BadResumptionToken | If the provided resumption token is invalid or expired. |
CannotDisseminateFormat | If the specified metadata_prefix is not supported by the OAI server. |
NoRecordsMatch | If no records match the provided criteria. |
NoSetHierarchy | If set-based harvesting is requested but the OAI server does not support sets. |
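Pagination via resumption tokens and the skipping of deleted records can be illustrated with a standalone loop over raw XML pages. This is a sketch, not the library's iterator: `iter_record_ids` and `fetch_page` are hypothetical names, the pages are made-up samples served from a dictionary instead of a network, and deleted records are recognized by the `status="deleted"` header attribute defined in the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def iter_record_ids(
    fetch_page: Callable[[Optional[str]], str],
    ignore_deleted: bool = False,
) -> Iterator[str]:
    """Yield record identifiers page by page until no resumption token remains."""
    token: Optional[str] = None
    while True:
        root = ET.fromstring(fetch_page(token))
        for header in root.iterfind(".//oai:record/oai:header", NS):
            if ignore_deleted and header.get("status") == "deleted":
                continue
            yield header.findtext("oai:identifier", default="", namespaces=NS)
        token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
        if not token:
            # A missing or empty resumptionToken marks the last page.
            break


def page(records: str, token: str) -> str:
    """Wrap record fragments in a minimal, made-up ListRecords response."""
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"{records}<resumptionToken>{token}</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )


# Two sample pages keyed by the token that requests them (None = first request).
PAGES = {
    None: page(
        "<record><header><identifier>oai:example:1</identifier></header></record>"
        '<record><header status="deleted"><identifier>oai:example:2</identifier></header></record>',
        "page-2",
    ),
    "page-2": page(
        "<record><header><identifier>oai:example:3</identifier></header></record>",
        "",
    ),
}

live_ids = list(iter_record_ids(PAGES.__getitem__, ignore_deleted=True))
all_ids = list(iter_record_ids(PAGES.__getitem__))
```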
list_sets(resumption_token=None)
Issue a ListSets request to the OAI server.
Send a request to list all sets defined in the OAI server. Sets are used to categorize records in the OAI repository. This method allows for the retrieval of these sets, optionally using a resumption token to handle pagination.
Ref: https://openarchives.org/OAI/openarchivesprotocol.html#ListSets
Parameters:
Name | Type | Description | Default |
---|---|---|---|
resumption_token | str \| None | An optional token for pagination, used to continue a request for the next batch of sets. | None |
Yields:
Type | Description |
---|---|
OAIResponse \| Set | An iterator over OAIResponse or Set objects, each representing an individual set or response from the server. |
Raises:
Type | Description |
---|---|
BadResumptionToken | If the provided resumption token is invalid or expired. |
NoSetHierarchy | If the OAI server does not support sets or has no set hierarchy available. |
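As an illustration of how sets categorize records, the sketch below reads `setSpec` and `setName` from a made-up ListSets response and derives the parent of a hierarchical set. In the OAI-PMH specification a `setSpec` is a colon-separated path, so `music:jazz` sits below `music`. `parse_sets` and `parent_spec` are hypothetical helpers and do not describe the Set objects this method yields.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Made-up sample response for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>music</setSpec><setName>Music collection</setName></set>
    <set><setSpec>music:jazz</setSpec><setName>Jazz recordings</setName></set>
  </ListSets>
</OAI-PMH>"""


def parse_sets(xml_text: str) -> dict[str, str]:
    """Map each setSpec to its setName."""
    root = ET.fromstring(xml_text)
    return {
        item.findtext("oai:setSpec", default="", namespaces=NS): item.findtext(
            "oai:setName", default="", namespaces=NS
        )
        for item in root.iterfind(".//oai:set", NS)
    }


def parent_spec(set_spec: str) -> Optional[str]:
    """Return the parent setSpec, or None for a top-level set."""
    parent, _, _ = set_spec.rpartition(":")
    return parent or None


sets = parse_sets(SAMPLE)
```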