FileDownloader

class FileDownloader(file_url: str, meta_filepath: str, dst_dir: str, filename: str | None = None, auto_unzip: bool = True, expire_hours=12, expiry_time_format='%Y/%m/%d, %H:%M:%S', large_file_hint=False, timeout=(None, None), http_client: HttpClient = HttpClient.REQUESTS)

Bases: object

Manages the download of a single remote file, together with the JSON metadata used to decide when that file must be downloaded again.

Methods

__init__

Initialize a downloader for one remote file and its metadata.

check_if_expire_date_need_update

Return whether only the metadata expiry timestamp should be refreshed.

check_if_file_need_update

Decide whether the target file should be downloaded again.

download_file_and_update_metadata

Download the target file and refresh its metadata cache.

update_metadata

Write or refresh the JSON metadata file for the current download.

__init__(file_url: str, meta_filepath: str, dst_dir: str, filename: str | None = None, auto_unzip: bool = True, expire_hours=12, expiry_time_format='%Y/%m/%d, %H:%M:%S', large_file_hint=False, timeout=(None, None), http_client: HttpClient = HttpClient.REQUESTS) -> None

Initialize a downloader for one remote file and its metadata.

Parameters:
  • file_url – Source URL for the target file.

  • meta_filepath – Path to the JSON metadata file used to store cache fields such as URL, expiry, ETag, and SHA-256.

  • dst_dir – Local destination directory where the downloaded file is stored.

  • filename – Optional output filename. If None, the client chooses a name (typically derived from the URL).

  • auto_unzip – If True (default), unzip compressed downloads when supported by the active HTTP client.

  • expire_hours – Number of hours to add to datetime.now() when writing metadata expiry.

  • expiry_time_format – datetime.strftime/strptime format used for metadata expiry values.

  • large_file_hint – If True, fetch headers early to estimate file size and prefer large-file transfer logic for big downloads.

  • timeout – Network timeout tuple passed to helper requests. Conventionally (connect_timeout, read_timeout).

  • http_client – HTTP backend selection, either HttpClient.REQUESTS or HttpClient.AIOHTTP.
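As a rough illustration of how expire_hours and expiry_time_format interact, the sketch below formats an expiry value with the default format string and parses it back. It uses only the standard library; the fixed timestamp and the variable names are assumptions made for the example and are not part of FileDownloader.

```python
from datetime import datetime, timedelta

# Default values taken from the constructor signature.
EXPIRY_TIME_FORMAT = "%Y/%m/%d, %H:%M:%S"
EXPIRE_HOURS = 12

# Hypothetical moment at which metadata is written (fixed so the result is stable).
written_at = datetime(2024, 3, 5, 9, 30, 0)

# The expiry is the write time plus expire_hours, rendered with the format string.
expiry_text = (written_at + timedelta(hours=EXPIRE_HOURS)).strftime(EXPIRY_TIME_FORMAT)
print(expiry_text)  # 2024/03/05, 21:30:00

# The same format string must be able to parse the stored value back.
parsed = datetime.strptime(expiry_text, EXPIRY_TIME_FORMAT)
```

A custom expiry_time_format should keep enough precision to round-trip through strptime; a date-only format, for example, would drop the hour component of the expiry.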

check_if_expire_date_need_update()

Return whether only the metadata expiry timestamp should be refreshed.

This helper is typically called after check_if_file_need_update() has fetched remote state. It returns True when remote content is unchanged and therefore the local file can be kept while extending the metadata expiry value.

Match conditions (either is sufficient):

  • SHA-256 path: self.new_sha256 is available and equals self.meta_sha256.

  • ETag fallback path: self.new_etag is available and equals self.meta_etag.

Returns:

True if content identity is unchanged and expiry metadata should be updated, otherwise False.

Return type:

bool
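The match conditions above can be sketched as a standalone function. This is a minimal illustration, not the library's code: the function name is hypothetical, the values are passed as arguments instead of being read from self, and "available" is assumed to mean a non-empty string.

```python
from typing import Optional


def expiry_only_refresh(
    new_sha256: Optional[str],
    meta_sha256: Optional[str],
    new_etag: Optional[str],
    meta_etag: Optional[str],
) -> bool:
    """Return True when the remote content matches the cached identity."""
    # SHA-256 path: a remote digest is available and equals the stored one.
    if new_sha256 and new_sha256 == meta_sha256:
        return True
    # ETag fallback path: a remote ETag is available and equals the stored one.
    if new_etag and new_etag == meta_etag:
        return True
    # No identity value matched, so the content may have changed.
    return False
```

Because either match is sufficient, a missing SHA-256 does not prevent an expiry-only refresh as long as the ETag still matches.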

check_if_file_need_update()

Decide whether the target file should be downloaded again.

Returns:

True when the file should be downloaded/re-downloaded, False when the existing local file can be reused.

Return type:

bool

Decision flow:
  1. If the metadata file is missing, return True.

  2. If the stored url differs from self.file_url (or is missing), return True.

  3. If the metadata expiry is still valid, return False.

  4. If expired (or expiry is invalid/missing), compare remote content state:

       • Prefer SHA-256 comparison when available.

       • Fall back to ETag comparison if SHA-256 cannot be obtained.

       • If neither reliable value is available, return True.
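The decision flow can be approximated with the standalone sketch below. It is written under stated assumptions: the function name and the remote_sha256, remote_etag, and now parameters are hypothetical, the remote values are passed in where the real method would fetch them itself, and the metadata keys (url, expiry, etag, sha256) follow the fields listed for update_metadata().

```python
import json
import os
from datetime import datetime
from typing import Optional


def needs_download(
    meta_filepath: str,
    file_url: str,
    remote_sha256: Optional[str] = None,
    remote_etag: Optional[str] = None,
    expiry_time_format: str = "%Y/%m/%d, %H:%M:%S",
    now: Optional[datetime] = None,
) -> bool:
    """Return True when the file should be downloaded again."""
    now = now or datetime.now()

    # 1. No metadata file: nothing is cached yet.
    if not os.path.isfile(meta_filepath):
        return True
    with open(meta_filepath, encoding="utf-8") as fh:
        meta = json.load(fh)

    # 2. The stored URL is missing or differs from the requested one.
    if meta.get("url") != file_url:
        return True

    # 3. The expiry parses and is still in the future: reuse the local file.
    try:
        expiry = datetime.strptime(meta.get("expiry", ""), expiry_time_format)
    except (TypeError, ValueError):
        expiry = None  # a missing or malformed expiry is treated as expired
    if expiry is not None and now < expiry:
        return False

    # 4. Expired: compare remote state, preferring SHA-256 over ETag.
    if remote_sha256:
        return remote_sha256 != meta.get("sha256")
    if remote_etag:
        return remote_etag != meta.get("etag")
    return True  # no reliable identity value is available
```

Checking expiry before contacting the server means that, in this sketch, a fresh cache entry costs no network round trip; only an expired entry triggers the SHA-256 or ETag comparison.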

download_file_and_update_metadata()

Download the target file and refresh its metadata cache.

The method selects an HTTP backend from self.http_client and then chooses a download strategy based on file size:

  • If self.large_file_hint is True, it first retrieves response headers and stores self.file_size.

  • If self.file_size is known and greater than 20 MB, it uses fetch_large_file.

  • Otherwise, it uses fetch_file.

After a successful download call, self.new_etag is updated from the client response and update_metadata() is called to write metadata fields (URL, expiry, ETag, SHA-256) to self.meta_filepath.

Raises:

Exception – Propagates exceptions raised by network/header retrieval, download client calls, or metadata writing.
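The size-based choice between fetch_file and fetch_large_file described above can be sketched as follows. The helper name and the constant are hypothetical, and the sketch assumes "20 MB" means 20 × 1024 × 1024 bytes; the documentation does not say whether the library uses binary or decimal megabytes.

```python
from typing import Optional

# Assumption: the 20 MB threshold is interpreted as 20 MiB.
LARGE_FILE_THRESHOLD_BYTES = 20 * 1024 * 1024


def choose_fetch_method(file_size: Optional[int]) -> str:
    """Return the name of the client method to use for a given file size."""
    # A size is only known when large_file_hint caused headers to be fetched.
    if file_size is not None and file_size > LARGE_FILE_THRESHOLD_BYTES:
        return "fetch_large_file"
    # Unknown size, or a size at or below the threshold.
    return "fetch_file"
```

With large_file_hint left at False the size stays unknown, so this sketch always falls through to fetch_file, which matches the behavior described in the list above.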

update_metadata()

Write or refresh the JSON metadata file for the current download.

The metadata file at self.meta_filepath is created (including parent directories) and overwritten with these fields:

  • url: self.file_url

  • expiry: current time plus self.expire_hours formatted with self.expiry_time_format

  • etag: self.new_etag

  • sha256: self.new_sha256

If self.new_sha256 is not already populated, SHA-256 is retrieved from the remote resource before writing metadata.

Raises:

Exception – Propagates exceptions raised while fetching SHA-256, creating directories, or writing the metadata file.
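A minimal sketch of the metadata write described for update_metadata() is shown below. The function name, its parameters, and the JSON indentation are assumptions made for illustration; the remote SHA-256 lookup is omitted, so the digest is passed in directly.

```python
import json
import os
from datetime import datetime, timedelta
from typing import Optional


def write_metadata(
    meta_filepath: str,
    file_url: str,
    etag: Optional[str],
    sha256: Optional[str],
    expire_hours: float = 12,
    expiry_time_format: str = "%Y/%m/%d, %H:%M:%S",
    now: Optional[datetime] = None,
) -> dict:
    """Create or overwrite the JSON metadata file and return the record written."""
    now = now or datetime.now()
    record = {
        "url": file_url,
        # Current time plus expire_hours, rendered with the configured format.
        "expiry": (now + timedelta(hours=expire_hours)).strftime(expiry_time_format),
        "etag": etag,
        "sha256": sha256,
    }
    # Create parent directories first so a fresh cache location works.
    parent = os.path.dirname(meta_filepath)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # Mode "w" overwrites any previous metadata.
    with open(meta_filepath, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
    return record
```

Called with a fixed now of 2024-01-01 06:00:00 and the default 12 hours, the stored expiry would be "2024/01/01, 18:00:00".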