FileDownloader
- class FileDownloader(file_url: str, meta_filepath: str, dst_dir: str, filename: str | None = None, auto_unzip: bool = True, expire_hours=12, expiry_time_format='%Y/%m/%d, %H:%M:%S', large_file_hint=False, timeout=(None, None), http_client: HttpClient = HttpClient.REQUESTS)
Bases: object
Class for managing a single file download.
Methods
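__init__(file_url, meta_filepath, dst_dir, ...) – Initialize a downloader for one remote file and its metadata.
check_if_expire_date_need_update() – Return whether only the metadata expiry timestamp should be refreshed.
check_if_file_need_update() – Decide whether the target file should be downloaded again.
download_file_and_update_metadata() – Download the target file and refresh its metadata cache.
update_metadata() – Write or refresh the JSON metadata file for the current download.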
- __init__(file_url: str, meta_filepath: str, dst_dir: str, filename: str | None = None, auto_unzip: bool = True, expire_hours=12, expiry_time_format='%Y/%m/%d, %H:%M:%S', large_file_hint=False, timeout=(None, None), http_client: HttpClient = HttpClient.REQUESTS) -> None
Initialize a downloader for one remote file and its metadata.
- Parameters:
file_url – Source URL for the target file.
meta_filepath – Path to the JSON metadata file used to store cache fields such as URL, expiry, ETag, and SHA-256.
dst_dir – Local destination directory where the downloaded file is stored.
filename – Optional output filename. If None, the client chooses a name (typically derived from the URL).
auto_unzip – If True (default), unzip compressed downloads when the active HTTP client supports it.
expire_hours – Number of hours added to datetime.now() when writing the metadata expiry field.
expiry_time_format – datetime.strftime/strptime format used for metadata expiry values.
large_file_hint – If True, fetch headers early to estimate the file size and prefer large-file transfer logic for big downloads.
timeout – Network timeout tuple passed to helper requests, conventionally (connect_timeout, read_timeout).
http_client – HTTP backend selection, either HttpClient.REQUESTS or HttpClient.AIOHTTP.
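The interaction between expire_hours and expiry_time_format can be sketched with the standard library alone. This is a minimal illustration, assuming the expiry is computed as datetime.now() plus expire_hours and later read back with strptime using the same format string; make_expiry and parse_expiry are hypothetical helpers, not part of the FileDownloader API.

```python
from datetime import datetime, timedelta
from typing import Optional

# Defaults taken from the FileDownloader signature.
EXPIRE_HOURS = 12
EXPIRY_TIME_FORMAT = "%Y/%m/%d, %H:%M:%S"


def make_expiry(now: datetime, expire_hours: float = EXPIRE_HOURS,
                fmt: str = EXPIRY_TIME_FORMAT) -> str:
    """Format now + expire_hours as an expiry string (hypothetical helper)."""
    return (now + timedelta(hours=expire_hours)).strftime(fmt)


def parse_expiry(expiry: Optional[str],
                 fmt: str = EXPIRY_TIME_FORMAT) -> Optional[datetime]:
    """Parse an expiry string back; None means missing or invalid (hypothetical helper)."""
    try:
        return datetime.strptime(expiry, fmt) if expiry else None
    except ValueError:
        return None


now = datetime(2024, 1, 1, 8, 0, 0)
stamp = make_expiry(now)
print(stamp)                       # 2024/01/01, 20:00:00
print(parse_expiry(stamp) > now)   # True: still valid
print(parse_expiry("2024-01-01"))  # None: does not match the format
```

Because the same format string is used for writing and parsing, changing expiry_time_format between runs would make existing metadata unparsable, which a reader of this sketch would see as an "invalid" expiry.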
- check_if_expire_date_need_update()
Return whether only the metadata expiry timestamp should be refreshed.
This helper is typically called after check_if_file_need_update() has fetched the remote state. It returns True when the remote content is unchanged, so the local file can be kept while the metadata expiry value is extended.
Match conditions (either is sufficient):
SHA-256 path: self.new_sha256 is available and equals self.meta_sha256.
ETag fallback path: self.new_etag is available and equals self.meta_etag.
- Returns:
True if content identity is unchanged and the expiry metadata should be updated, otherwise False.
- Return type:
bool
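A minimal sketch of the two match conditions as a standalone function. The name expiry_only_refresh and its arguments are hypothetical stand-ins for the new_* and meta_* attributes. The function follows the "either is sufficient" wording literally, so a matching ETag returns True even when both SHA-256 values are present and differ; the real method may order or gate the checks differently.

```python
from typing import Optional


def expiry_only_refresh(new_sha256: Optional[str], meta_sha256: Optional[str],
                        new_etag: Optional[str], meta_etag: Optional[str]) -> bool:
    """Return True when content identity is unchanged (hypothetical helper)."""
    # SHA-256 path: a freshly obtained digest that equals the cached one.
    if new_sha256 and new_sha256 == meta_sha256:
        return True
    # ETag fallback path: a fresh ETag that equals the cached one.
    if new_etag and new_etag == meta_etag:
        return True
    # Nothing matched, or nothing was available to compare.
    return False


print(expiry_only_refresh("abc", "abc", None, None))     # True  (SHA-256 match)
print(expiry_only_refresh(None, "abc", '"v1"', '"v1"'))  # True  (ETag fallback)
print(expiry_only_refresh(None, None, None, None))       # False (nothing to compare)
```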
- check_if_file_need_update()
Decide whether the target file should be downloaded again.
- Returns:
True when the file should be downloaded or re-downloaded, False when the existing local file can be reused.
- Return type:
bool
- Decision flow:
If the metadata file is missing, return True.
If the stored url differs from self.file_url (or is missing), return True.
If the metadata expiry is still valid, return False.
If the expiry has passed (or is invalid or missing), compare the remote content state:
  - Prefer SHA-256 comparison when available.
  - Fall back to ETag comparison if SHA-256 cannot be obtained.
  - If neither reliable value is available, return True.
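The decision flow above can be sketched as a pure function over the parsed metadata. load_metadata, file_needs_update, and the two callables that stand in for the remote SHA-256 and ETag lookups are hypothetical. The sketch also assumes that a matching remote value means "no download needed" and that an expiry is valid only while it lies strictly in the future. Unlike the documented method, it does not store the fetched values on an object for later use by check_if_expire_date_need_update().

```python
import json
import os
from datetime import datetime
from typing import Callable, Optional

EXPIRY_TIME_FORMAT = "%Y/%m/%d, %H:%M:%S"


def load_metadata(meta_filepath: str) -> Optional[dict]:
    """Return the parsed metadata, or None when the metadata file is missing."""
    if not os.path.exists(meta_filepath):
        return None
    with open(meta_filepath, encoding="utf-8") as fh:
        return json.load(fh)


def file_needs_update(meta: Optional[dict], file_url: str, now: datetime,
                      get_remote_sha256: Callable[[], Optional[str]],
                      get_remote_etag: Callable[[], Optional[str]],
                      fmt: str = EXPIRY_TIME_FORMAT) -> bool:
    # Missing metadata file: download.
    if meta is None:
        return True
    # Stored URL missing or different: download.
    if meta.get("url") != file_url:
        return True
    # Expiry still in the future: reuse the local file.
    try:
        if datetime.strptime(meta.get("expiry") or "", fmt) > now:
            return False
    except (TypeError, ValueError):
        pass  # an invalid or missing expiry counts as expired
    # Expired: prefer SHA-256, fall back to ETag.
    remote_sha256 = get_remote_sha256()
    if remote_sha256:
        return remote_sha256 != meta.get("sha256")
    remote_etag = get_remote_etag()
    if remote_etag:
        return remote_etag != meta.get("etag")
    # Neither value is available: download to be safe.
    return True


meta = {"url": "https://example.com/data.zip", "expiry": "2024/01/01, 06:00:00",
        "etag": '"v1"', "sha256": "abc"}
now = datetime(2024, 1, 1, 8, 0, 0)  # metadata expired two hours ago
print(file_needs_update(meta, meta["url"], now, lambda: "abc", lambda: None))   # False
print(file_needs_update(meta, meta["url"], now, lambda: None, lambda: '"v2"'))  # True
```

Passing the remote lookups as callables keeps the sketch testable offline and means no network request is made while the cached expiry is still valid.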
- download_file_and_update_metadata()
Download the target file and refresh its metadata cache.
The method selects an HTTP backend from self.http_client and then chooses a download strategy based on size:
If self.large_file_hint is True, it first retrieves the response headers and stores self.file_size.
If self.file_size is known and greater than 20 MB, it uses fetch_large_file.
Otherwise, it uses fetch_file.
After a successful download call, self.new_etag is updated from the client response, and update_metadata() is called to write the metadata fields (URL, expiry, ETag, SHA-256) to self.meta_filepath.
- Raises:
Exception – Propagates exceptions raised by network/header retrieval, download client calls, or metadata writing.
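The size-based strategy choice reduces to a small rule. In this sketch, choose_fetch_strategy and the get_content_length callable (standing in for the header request) are hypothetical, 20 MB is assumed to mean 20 * 1024 * 1024 bytes, and the file size is treated as unknown unless large_file_hint triggers the header probe. Backend selection via http_client is not modeled.

```python
from typing import Callable, Optional

# Assumption: "20 MB" read as 20 * 1024 * 1024 bytes; the real threshold may differ.
LARGE_FILE_THRESHOLD = 20 * 1024 * 1024


def choose_fetch_strategy(large_file_hint: bool,
                          get_content_length: Callable[[], Optional[int]]) -> str:
    """Return the name of the fetch routine the documented rules would pick."""
    file_size: Optional[int] = None
    if large_file_hint:
        # Only probe the headers when the caller hinted at a large download.
        file_size = get_content_length()
    if file_size is not None and file_size > LARGE_FILE_THRESHOLD:
        return "fetch_large_file"
    return "fetch_file"


print(choose_fetch_strategy(True, lambda: 50 * 1024 * 1024))   # fetch_large_file
print(choose_fetch_strategy(False, lambda: 50 * 1024 * 1024))  # fetch_file (no header probe)
print(choose_fetch_strategy(True, lambda: None))               # fetch_file (size unknown)
```

Gating the header request on the hint avoids an extra round trip for the common small-file case, at the cost of never using the large-file path when the hint is omitted.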
- update_metadata()
Write or refresh the JSON metadata file for the current download.
The metadata file at self.meta_filepath is created (including parent directories) and overwritten with these fields:
url: self.file_url
expiry: current time plus self.expire_hours, formatted with self.expiry_time_format
etag: self.new_etag
sha256: self.new_sha256
If self.new_sha256 is not already populated, the SHA-256 is retrieved from the remote resource before the metadata is written.
- Raises:
Exception – Propagates exceptions raised while fetching SHA-256, creating directories, or writing the metadata file.
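The metadata write can be sketched with json and os from the standard library. write_metadata is a hypothetical helper that takes the cached values as arguments instead of reading them from self, and get_remote_sha256 stands in for the remote SHA-256 lookup; the indent=2 layout is an arbitrary choice, not something the documentation specifies.

```python
import json
import os
import tempfile
from datetime import datetime, timedelta
from typing import Callable, Optional


def write_metadata(meta_filepath: str, file_url: str, etag: Optional[str],
                   sha256: Optional[str],
                   get_remote_sha256: Callable[[], Optional[str]],
                   expire_hours: float = 12,
                   fmt: str = "%Y/%m/%d, %H:%M:%S",
                   now: Optional[datetime] = None) -> dict:
    """Overwrite meta_filepath with url/expiry/etag/sha256 (hypothetical helper)."""
    if not sha256:
        # Fetch the remote SHA-256 only when it is not already known.
        sha256 = get_remote_sha256()
    if now is None:
        now = datetime.now()
    meta = {
        "url": file_url,
        "expiry": (now + timedelta(hours=expire_hours)).strftime(fmt),
        "etag": etag,
        "sha256": sha256,
    }
    parent = os.path.dirname(meta_filepath)
    if parent:
        os.makedirs(parent, exist_ok=True)  # create parent directories
    with open(meta_filepath, "w", encoding="utf-8") as fh:
        json.dump(meta, fh, indent=2)  # "w" truncates, so old fields are replaced
    return meta


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache", "meta.json")
    written = write_metadata(path, "https://example.com/data.zip", '"v1"', None,
                             lambda: "abc123", now=datetime(2024, 1, 1, 8, 0, 0))
    with open(path, encoding="utf-8") as fh:
        print(json.load(fh) == written)  # True
    print(written["sha256"], written["expiry"])  # abc123 2024/01/01, 20:00:00
```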