general_function

Auxiliary functions

build_non_existing_dirs(file_path: str)

Build non-existing directories for a given file path.

Parameters:

file_path (str) – The file path.

Returns:

True if directories were created successfully.

Return type:

bool
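The behaviour described above can be sketched with the standard library. This is a minimal illustration, not the package's implementation: the name `build_dirs_sketch` is hypothetical, and it assumes that only the parent directories of `file_path` are created, never the file itself.

```python
import os
import tempfile


def build_dirs_sketch(file_path: str) -> bool:
    """Create the parent directories of file_path if they are missing (hypothetical helper)."""
    parent = os.path.dirname(file_path)
    if parent:
        # exist_ok=True keeps the call safe when the directory already exists.
        os.makedirs(parent, exist_ok=True)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "a", "b", "result.csv")
    created = build_dirs_sketch(target)
    parent_exists = os.path.isdir(os.path.dirname(target))
    file_exists = os.path.exists(target)
```

After the call, the nested folders exist while `result.csv` itself has not been written.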

camel_to_snake(camel_str: str) → str

Convert a camelCase string to snake_case.

Parameters:

camel_str (str) – The camelCase string.

Returns:

The snake_case string.

Return type:

str
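One common way to perform this conversion is a single regular-expression substitution. The sketch below is an assumption about the approach, not the package's code, and `camel_to_snake_sketch` is a hypothetical name. It splits before every uppercase letter, so acronyms such as `HTTPServer` would be split letter by letter.

```python
import re


def camel_to_snake_sketch(camel_str: str) -> str:
    """Insert "_" before each uppercase letter (except the first character), then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", camel_str).lower()


simple = camel_to_snake_sketch("camelCase")
longer = camel_to_snake_sketch("nodeIdList")
leading_upper = camel_to_snake_sketch("Already")
```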

convert_list_to_string(list_data: list) → str

Convert a list to a comma-separated string.

Parameters:

list_data (list) – The list to convert.

Returns:

The comma-separated string.

Return type:

str
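A minimal sketch of such a conversion follows. The name `list_to_string_sketch` is hypothetical, and two details are assumptions because the description does not settle them: items are converted with `str`, and the separator is a bare comma with no trailing space.

```python
def list_to_string_sketch(list_data: list) -> str:
    """Join the items of a list into one comma-separated string (hypothetical helper)."""
    # str() lets non-string items such as numbers be joined as well.
    return ",".join(str(item) for item in list_data)


mixed = list_to_string_sketch([1, "a", 2.5])
empty = list_to_string_sketch([])
```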

dict_to_duckdb(data: dict[str, DataFrame], file_path: str)

Save a dictionary of Polars DataFrames as a DuckDB file.

Parameters:
  • data (dict[str, pl.DataFrame]) – The dictionary of Polars DataFrames.

  • file_path (str) – The DuckDB file path.

dict_to_gpkg(data: dict, file_path: str, srid: int = 2056)

Save a dictionary of Polars DataFrames as a GeoPackage file.

Parameters:
  • data (dict) – The dictionary of Polars DataFrames.

  • file_path (str) – The GeoPackage file path.

  • srid (int, optional) – The spatial reference system identifier (SRID). Defaults to 2056 (SWISS_SRID).

dictionary_key_filtering(dictionary: dict, key_list: list) → dict

Filter a dictionary by a list of keys.

Parameters:
  • dictionary (dict) – The dictionary to filter.

  • key_list (list) – The list of keys to keep.

Returns:

The filtered dictionary.

Return type:

dict
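The filtering described above can be expressed as a dictionary comprehension. This is a sketch under assumptions, with the hypothetical name `filter_keys_sketch`: keys listed in `key_list` but absent from the dictionary are silently ignored, and the input dictionary is left unchanged.

```python
def filter_keys_sketch(dictionary: dict, key_list: list) -> dict:
    """Return a new dict holding only the entries whose key is in key_list."""
    wanted = set(key_list)  # set membership tests are O(1)
    return {key: value for key, value in dictionary.items() if key in wanted}


source = {"a": 1, "b": 2, "c": 3}
filtered = filter_keys_sketch(source, ["a", "c", "missing"])
```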

download_from_switch(switch_folder_path: str, switch_link: str, switch_pass: str, local_folder_path: str = '.cache', download_anyway: bool = False)

Download files from a SWITCH directory to a local folder.

Parameters:
  • switch_folder_path (str) – The SWITCH folder path.

  • switch_link (str) – The public link to the SWITCH folder.

  • switch_pass (str) – The password for the SWITCH folder.

  • local_folder_path (str, optional) – The local folder path. Defaults to “.cache”.

  • download_anyway (bool, optional) – Whether to download files even if they already exist locally. Defaults to False.

duckdb_to_dict(file_path: str) → dict

Load a DuckDB file into a dictionary of Polars DataFrames.

Parameters:

file_path (str) – The DuckDB file path.

Returns:

The dictionary of Polars DataFrames.

Return type:

dict

extract_archive(file_name: str, extracted_folder: str | None = None, force_extraction: bool = False) → None

Extract an archive file to a specified folder.

Parameters:
  • file_name (str) – The name of the archive file.

  • extracted_folder (Optional[str], optional) – The folder to extract the files to. Defaults to None.

  • force_extraction (bool, optional) – Whether to force extraction even if the folder already exists. Defaults to False.
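The interplay of `extracted_folder` and `force_extraction` can be sketched with `shutil.unpack_archive`. This is not the package's implementation, and `extract_archive_sketch` is a hypothetical name. It assumes that a missing `extracted_folder` defaults to a folder named after the archive, and that extraction is skipped whenever that folder already exists.

```python
import os
import shutil
import tempfile
import zipfile
from typing import Optional


def extract_archive_sketch(
    file_name: str,
    extracted_folder: Optional[str] = None,
    force_extraction: bool = False,
) -> None:
    if extracted_folder is None:
        # Assumption: default to a sibling folder named after the archive.
        extracted_folder = os.path.splitext(file_name)[0]
    if os.path.isdir(extracted_folder) and not force_extraction:
        return  # already extracted, nothing to do
    shutil.unpack_archive(file_name, extracted_folder)


def read_text(path: str) -> str:
    with open(path) as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "data.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("hello.txt", "hello")

    extract_archive_sketch(archive)
    extracted = os.path.join(tmp, "data", "hello.txt")
    first_content = read_text(extracted)

    with open(extracted, "w") as handle:
        handle.write("edited")
    extract_archive_sketch(archive)  # folder exists, so this call is skipped
    skipped_content = read_text(extracted)

    extract_archive_sketch(archive, force_extraction=True)  # overwrites the file
    forced_content = read_text(extracted)
```

The second call leaves the edited file untouched, while the forced call restores the archived content.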

generate_log(name: str, log_level: str = 'info') → Logger

Generate a logger with the specified name and log level.

Parameters:
  • name (str) – The name of the logger.

  • log_level (str, optional) – The log level. Defaults to “info”.

Returns:

The generated logger.

Return type:

logging.Logger
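A logger factory of this kind is typically a thin wrapper around the standard `logging` module. The sketch below, with the hypothetical name `generate_log_sketch`, assumes that the level is given as a case-insensitive name and that a single stream handler is attached. The message format is illustrative only.

```python
import logging


def generate_log_sketch(name: str, log_level: str = "info") -> logging.Logger:
    logger = logging.getLogger(name)
    # Translate a name such as "info" into the numeric constant logging.INFO.
    level = getattr(logging, log_level.upper(), None)
    if not isinstance(level, int):
        raise ValueError(f"Unknown log level: {log_level!r}")
    logger.setLevel(level)
    if not logger.handlers:
        # Attach a handler only once so repeated calls do not duplicate output.
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = generate_log_sketch("demo_pipeline", "debug")
same_log = generate_log_sketch("demo_pipeline", "debug")
```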

generate_uuid(base_value: str, base_uuid: UUID | None = None, added_string: str = '') → str

Generate a UUID based on a base value, base UUID, and an optional added string.

Parameters:
  • base_value (str) – The base value for generating the UUID.

  • base_uuid (uuid.UUID, optional) – The base UUID for generating the UUID. Defaults to None.

  • added_string (str, optional) – The optional added string. Defaults to “”.

Returns:

The generated UUID.

Return type:

str
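The parameters suggest a deterministic, name-based UUID. One way to obtain that is `uuid.uuid5`, as in the sketch below. Everything here is an assumption rather than the package's method: the hypothetical `generate_uuid_sketch` uses `base_uuid` as the namespace, falls back to `uuid.NAMESPACE_DNS`, and prepends `added_string` to `base_value`.

```python
import uuid
from typing import Optional


def generate_uuid_sketch(
    base_value: str,
    base_uuid: Optional[uuid.UUID] = None,
    added_string: str = "",
) -> str:
    # Assumption: base_uuid acts as the namespace of a name-based (version 5) UUID.
    namespace = base_uuid if base_uuid is not None else uuid.NAMESPACE_DNS
    return str(uuid.uuid5(namespace, added_string + base_value))


first = generate_uuid_sketch("node_1")
second = generate_uuid_sketch("node_1")
prefixed = generate_uuid_sketch("node_1", added_string="grid_")
namespaced = generate_uuid_sketch("node_1", base_uuid=uuid.UUID(int=1))
```

With this construction the same inputs always yield the same identifier, which is useful when identifiers must be reproducible across runs.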

initialize_output_file(file_path: str)

Initialize an output file by creating necessary directories and removing the file if it already exists.

Parameters:

file_path (str) – The path of the file to initialize.
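The two steps named above, creating the directories and removing a stale file, can be sketched as follows. `initialize_output_file_sketch` is a hypothetical stand-in written with the standard library only.

```python
import os
import tempfile


def initialize_output_file_sketch(file_path: str) -> None:
    parent = os.path.dirname(file_path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # create missing directories
    if os.path.exists(file_path):
        os.remove(file_path)  # drop the stale output file


with tempfile.TemporaryDirectory() as tmp:
    output = os.path.join(tmp, "out", "result.duckdb")

    initialize_output_file_sketch(output)
    dir_created = os.path.isdir(os.path.dirname(output))

    with open(output, "w") as handle:
        handle.write("stale")
    initialize_output_file_sketch(output)
    stale_removed = not os.path.exists(output)
```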

modify_string(string: str, format_str: dict) → str

Modify a string by replacing substrings according to a format dictionary.

  • The input may contain regular expressions.

  • Replacements are applied in the order of the dictionary keys.

Parameters:
  • string (str) – Input string.

  • format_str (dict) – Dictionary containing the substrings to be replaced and their replacements.

Returns:

Modified string.

Return type:

str
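The effect of ordered replacement can be illustrated with `re.sub`. The sketch assumes that each dictionary key is treated as a regular-expression pattern and each value as its replacement. `modify_string_sketch` is a hypothetical name, not the package's function.

```python
import re


def modify_string_sketch(string: str, format_str: dict) -> str:
    # Dictionaries preserve insertion order, so patterns are applied in that order.
    for pattern, replacement in format_str.items():
        string = re.sub(pattern, replacement, string)
    return string


cleaned = modify_string_sketch("Line  12-A", {r"\s+": "_", r"\d+": "N"})

# The order matters: the output of one replacement feeds the next one.
chained = modify_string_sketch("a", {"a": "b", "b": "c"})
unchained = modify_string_sketch("a", {"b": "c", "a": "b"})
```

In the last two calls the same rules give different results because the second rule sees the output of the first.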

pl_to_dict(df: DataFrame) → dict

Convert a Polars DataFrame with two columns into a dictionary. The first column is assumed to contain the keys and the second column the values. Keys must be unique; null values are filtered out.

Parameters:

df (pl.DataFrame) – Polars DataFrame with two columns.

Returns:

Dictionary representation of the DataFrame.

Return type:

dict

Raises:

ValueError – If the DataFrame does not have exactly two columns or if the keys are not unique.

pl_to_dict_with_tuple(df: DataFrame) → dict

Convert a Polars DataFrame with two columns into a dictionary where the first column contains tuples as keys and the second column contains the values.

Parameters:

df (pl.DataFrame) – Polars DataFrame with two columns.

Returns:

Dictionary representation of the DataFrame with tuples as keys.

Return type:

dict

Raises:

ValueError – If the DataFrame does not have exactly two columns.

Example:

>>> import polars as pl
>>> data = {'key': [[1, 2], [3, 4], [5, 6]], 'value': [10, 20, 30]}
>>> df = pl.DataFrame(data)
>>> pl_to_dict_with_tuple(df)
{(1, 2): 10, (3, 4): 20, (5, 6): 30}

scan_folder(folder_name: str, extension: str | list[str] | None = None, file_names: str | None = None) → list[str]

Scan a folder and return a list of file paths with specified extensions or names.

Parameters:
  • folder_name (str) – The folder to scan.

  • extension (Optional[Union[str, list[str]]], optional) – The file extensions to filter by. Defaults to None.

  • file_names (Optional[str], optional) – The file names to filter by. Defaults to None.

Returns:

List of file paths.

Return type:

list[str]
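A folder scan with these filters could look like the sketch below. Several points are assumptions, since the description leaves them open: the scan is recursive, `extension` is matched against the end of the file name, `file_names` is compared with the exact base name, and a file must pass every filter that is provided. `scan_folder_sketch` is a hypothetical name.

```python
import os
import tempfile
from typing import List, Optional, Union


def scan_folder_sketch(
    folder_name: str,
    extension: Optional[Union[str, List[str]]] = None,
    file_names: Optional[str] = None,
) -> List[str]:
    if isinstance(extension, str):
        extension = [extension]  # normalise a single extension to a list
    matches = []
    for root, _dirs, files in os.walk(folder_name):
        for name in files:
            if extension is not None and not name.endswith(tuple(extension)):
                continue
            if file_names is not None and name != file_names:
                continue
            matches.append(os.path.join(root, name))
    return sorted(matches)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.csv", "b.txt", os.path.join("sub", "c.csv")):
        with open(os.path.join(tmp, rel), "w") as handle:
            handle.write("x")

    csv_found = [os.path.relpath(p, tmp) for p in scan_folder_sketch(tmp, extension=".csv")]
    named_found = [os.path.relpath(p, tmp) for p in scan_folder_sketch(tmp, file_names="b.txt")]
    all_found = scan_folder_sketch(tmp)
```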

scan_switch_directory(oc: Client, local_folder_path: str, switch_folder_path: str, download_anyway: bool) → list[str]

Scan a directory on the SWITCH server and return a list of file paths.

Parameters:
  • oc (owncloud.Client) – The ownCloud client.

  • local_folder_path (str) – The local folder path.

  • switch_folder_path (str) – The SWITCH folder path.

  • download_anyway (bool) – Whether to download files even if they already exist locally.

Returns:

List of file paths.

Return type:

list[str]

snake_to_camel(snake_str: str) → str

Convert a snake_case string to CamelCase.

Parameters:

snake_str (str) – The snake_case string.

Returns:

The CamelCase string.

Return type:

str
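The description does not say whether the first letter of the result is capitalised, so the sketch below exposes both variants through a hypothetical `upper_first` flag. Neither the flag nor the name `snake_to_camel_sketch` belongs to the documented function.

```python
def snake_to_camel_sketch(snake_str: str, upper_first: bool = True) -> str:
    """Join the underscore-separated parts, capitalising each part after the first."""
    parts = [part for part in snake_str.split("_") if part]
    if not parts:
        return ""
    head = parts[0].capitalize() if upper_first else parts[0].lower()
    return head + "".join(part.capitalize() for part in parts[1:])


upper_camel = snake_to_camel_sketch("node_id_list")
lower_camel = snake_to_camel_sketch("node_id_list", upper_first=False)
```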

table_to_gpkg(table: DataFrame, gpkg_file_name: str, layer_name: str, srid: int = 2056)

Save a Polars DataFrame as a GeoPackage file. As GeoPackage does not support list columns, the list columns are joined into a single string separated with a comma.

Parameters:
  • table (pl.DataFrame) – The Polars DataFrame.

  • gpkg_file_name (str) – The GeoPackage file name.

  • layer_name (str) – The layer name.

  • srid (int, optional) – The spatial reference system identifier (SRID). Defaults to 2056 (SWISS_SRID).