polars_function

cast_boolean(col: Expr) → Expr

Cast a column to boolean based on predefined replacements.

Parameters:

col (pl.Expr) – The column to cast.

Returns:

The column cast to boolean.

Return type:

pl.Expr

cast_float(float_str: Expr) → Expr

Cast a string column to float, modifying the string format as needed.

Parameters:

float_str (pl.Expr) – The string column to cast.

Returns:

The column cast to float.

Return type:

pl.Expr

cast_to_utc_timestamp(timestamp: Expr, initial_time_zone: str = 'Europe/Zurich') → Expr

Convert a timestamp column to UTC from the specified initial time zone.

Parameters:
  • timestamp (pl.Expr) – The timestamp column to convert.

  • initial_time_zone (str, optional) – The initial time zone of the timestamps. Defaults to “Europe/Zurich”.

Returns:

The timestamp column converted to UTC.

Return type:

pl.Expr

concat_list_of_list(col_list: Expr) → Expr

Concatenate a column of lists into a list containing sublists.

Parameters:

col_list (pl.Expr) – The column of lists to concatenate.

Returns:

The concatenated list column.

Return type:

pl.Expr

cum_count_duplicates(cols_names: str | list[str]) → Expr

Calculate the cumulative count of duplicate values in the specified column(s) of a DataFrame, assigning strictly negative values to roughly the first half of the occurrences of each value and strictly positive values to the remaining ones.

Parameters:

cols_names (Union[str, list[str]]) – The name or names of the column(s) to check for duplicates.

Returns:

A Polars expression containing the cumulative count of duplicates.

Return type:

pl.Expr

Example:

>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 1, 2, 3, 4, 4, 4]})
>>> df.with_columns(
...     cum_count_duplicates(cols_names="a").alias("cum_count")
... )
shape: (7, 2)
┌─────┬────────────┐
│ a   ┆ cum_count  │
│ --- ┆ ---        │
│ i64 ┆ i64        │
╞═════╪════════════╡
│ 1   ┆ -1         │
│ 1   ┆ 1          │
│ 2   ┆ 1          │
│ 3   ┆ 1          │
│ 4   ┆ -1         │
│ 4   ┆ 1          │
│ 4   ┆ 2          │
└─────┴────────────┘

digitize_col(col: Expr, min: float, max: float, nb_state: int) → Expr

Digitize a column into discrete states based on the specified number of states.

Parameters:
  • col (pl.Expr) – The column to digitize.

  • min (float) – The minimum value of the column.

  • max (float) – The maximum value of the column.

  • nb_state (int) – The number of discrete states.

Returns:

The digitized column.

Return type:

pl.Expr

generate_random_uuid(col: Expr) → Expr

Generate a random UUID.

Returns:

The generated UUID.

Return type:

pl.Expr

generate_uuid_col(col: Expr, base_uuid: UUID | None = None, added_string: str = '') → Expr

Generate UUIDs for a column based on a base UUID and an optional added string.

Parameters:
  • col (pl.Expr) – The column to generate UUIDs for.

  • base_uuid (uuid.UUID, optional) – The base UUID for generating the UUIDs. Defaults to None.

  • added_string (str, optional) – The optional added string. Defaults to “”.

Returns:

The column with generated UUIDs.

Return type:

pl.Expr

get_meta_data_string(metadata: Expr) → Expr

Convert metadata to a JSON string, excluding keys with None values.

Parameters:

metadata (pl.Expr) – The metadata column.

Returns:

The metadata column as JSON strings.

Return type:

pl.Expr

get_transfo_admittance(rated_v: Expr, rated_s: Expr, oc_current_ratio: Expr) → Expr

Get the transformer admittance based on the open-circuit test.

Parameters:
  • rated_v (pl.Expr) – The rated voltage column; it indicates which side of the transformer the parameters are associated with.

  • rated_s (pl.Expr) – The rated power column [VA].

  • oc_current_ratio (pl.Expr) – The ratio between the current measured with the transformer secondary open-circuited and the rated current [%].

Returns:

The transformer admittance column [Siemens].

Return type:

pl.Expr
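
The open-circuit test gives the no-load current as a percentage i0 of the rated current, so the magnitude of the magnetizing admittance can be estimated as |Y| = (i0 / 100) * S_r / V_r**2, where S_r / V_r**2 is the base admittance on the side whose rated voltage is used. The sketch below assumes this textbook relation; the helper name is hypothetical and the library's exact formula is not shown here. It uses only arithmetic operators, so the same function accepts plain floats or pl.Expr columns.

```python
def transfo_admittance_sketch(rated_v, rated_s, oc_current_ratio):
    # |Y| = (i0 / 100) * S_r / V_r**2, with i0 in %, S_r in VA and V_r in V.
    return oc_current_ratio / 100 * rated_s / rated_v**2


# Example: 1 MVA transformer, 1 kV side, 2 % no-load current.
print(transfo_admittance_sketch(1_000.0, 1_000_000.0, 2.0))  # 0.02 (siemens)
```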

get_transfo_conductance(rated_v: Expr, iron_losses: Expr) → Expr

Get the transformer conductance based on the measured iron losses.

Parameters:
  • rated_v (pl.Expr) – The rated voltage column; it indicates which side of the transformer the parameters are associated with.

  • iron_losses (pl.Expr) – The iron losses column [W].

Returns:

The transformer conductance column [Siemens].

Return type:

pl.Expr
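
Iron losses are dissipated in the shunt conductance at rated voltage, P_fe = G * V_r**2, so G = P_fe / V_r**2. A sketch under that textbook assumption; the helper name is hypothetical and, being operator-only, it works on floats and pl.Expr alike.

```python
def transfo_conductance_sketch(rated_v, iron_losses):
    # G = P_fe / V_r**2, with P_fe in W and V_r in V.
    return iron_losses / rated_v**2


# Example: 1 kW iron losses on a 1 kV side.
print(transfo_conductance_sketch(1_000.0, 1_000.0))  # 0.001 (siemens)
```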

get_transfo_imaginary_component(module: Expr, real: Expr) → Expr

Get the imaginary component of a transformer quantity (reactance in Ohm or susceptance in Siemens) from its magnitude (module) and real component.

Parameters:
  • module (pl.Expr) – The magnitude (module) column [Ohm or Siemens].

  • real (pl.Expr) – The real component column [Ohm or Siemens].

Returns:

The transformer imaginary component column [Ohm or Siemens].

Return type:

pl.Expr
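
Given a magnitude and its real part, the imaginary part follows from |Z|**2 = R**2 + X**2, that is X = sqrt(|Z|**2 - R**2), and likewise B = sqrt(|Y|**2 - G**2) for admittances. A sketch assuming that relation, with a hypothetical helper name:

```python
def transfo_imaginary_sketch(module, real):
    # Imaginary part from magnitude and real part: sqrt(module**2 - real**2).
    return (module**2 - real**2) ** 0.5


print(transfo_imaginary_sketch(5.0, 3.0))  # 4.0
```

With plain floats, a real part larger than the magnitude (for example from measurement noise) produces a complex number, whereas a Polars column produces NaN.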

get_transfo_impedance(rated_v: Expr, rated_s: Expr, voltage_ratio: Expr) → Expr

Get the transformer impedance (or the resistance, if the real part of the voltage ratio is given) based on the short-circuit test.

Parameters:
  • rated_v (pl.Expr) – The rated voltage column; it indicates which side of the transformer the parameters are associated with.

  • rated_s (pl.Expr) – The rated power column [VA].

  • voltage_ratio (pl.Expr) – The ratio between the input voltage that must be applied to reach rated current with the transformer secondary short-circuited and the rated voltage [%].

Returns:

The transformer impedance column [Ohm].

Return type:

pl.Expr
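
The short-circuit voltage u_k is the percentage of rated voltage needed to drive rated current with the secondary shorted, so |Z| = (u_k / 100) * V_r**2 / S_r, where V_r**2 / S_r is the base impedance. Passing the resistive part u_r instead would yield the resistance, which appears to be what the "real part" remark refers to. A sketch under these textbook assumptions, with a hypothetical helper name:

```python
def transfo_impedance_sketch(rated_v, rated_s, voltage_ratio):
    # |Z| = (u_k / 100) * V_r**2 / S_r, with u_k in %, V_r in V and S_r in VA.
    return voltage_ratio / 100 * rated_v**2 / rated_s


# Example: 1 MVA transformer, 1 kV side, 6 % short-circuit voltage.
print(transfo_impedance_sketch(1_000.0, 1_000_000.0, 6.0))  # 0.06 (ohm)
```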

get_transfo_resistance(rated_v: Expr, rated_s: Expr, copper_losses: Expr) → Expr

Get the transformer resistance based on the measured copper losses.

Parameters:
  • rated_v (pl.Expr) – The rated voltage column; it indicates which side of the transformer the parameters are associated with.

  • rated_s (pl.Expr) – The rated power column [VA].

  • copper_losses (pl.Expr) – The copper losses column [W].

Returns:

The transformer resistance column [Ohm].

Return type:

pl.Expr
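
In a three-phase transformer the copper losses at rated current satisfy P_cu = 3 * I_r**2 * R with I_r = S_r / (sqrt(3) * V_r), which simplifies to R = P_cu * V_r**2 / S_r**2 (a single-phase unit with I_r = S_r / V_r gives the same result). A sketch assuming this textbook relation and a line-to-line V_r; the helper name is hypothetical.

```python
def transfo_resistance_sketch(rated_v, rated_s, copper_losses):
    # R = P_cu * V_r**2 / S_r**2, with P_cu in W, V_r in V and S_r in VA.
    return copper_losses * rated_v**2 / rated_s**2


# Example: 1 MVA transformer, 1 kV side, 10 kW copper losses.
print(transfo_resistance_sketch(1_000.0, 1_000_000.0, 10_000.0))  # 0.01 (ohm)
```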

keep_only_duplicated_list(data: Expr) → Expr

Return a boolean Polars expression indicating which rows of a list column are duplicates, after sorting each list and joining its elements with an underscore. This is useful for identifying rows of a DataFrame whose lists contain the same elements as another row, no matter the position of the elements.

Parameters:

data (pl.Expr) – A Polars expression representing a list column.

Returns:

A boolean Polars expression indicating whether the concatenated list of elements is duplicated.

Return type:

pl.Expr

keep_only_first_unique_list(data: Expr) → Expr

Return a boolean Polars expression indicating which rows of a list column are unique, after sorting each list and joining its elements with an underscore. This is useful for identifying rows of a DataFrame whose concatenated list does not appear in any other row. The function returns True for rows with a unique concatenated list and False for rows whose concatenated list is duplicated.

Parameters:

data (pl.Expr) – A Polars expression representing a list column.

Returns:

A boolean Polars expression indicating whether the concatenated list of elements is unique.

Return type:

pl.Expr

linear_interpolation_for_bound(x_col: Expr, y_col: Expr) → Expr

Perform linear interpolation for boundary values in a column.

Parameters:
  • x_col (pl.Expr) – The x-axis column.

  • y_col (pl.Expr) – The y-axis column to interpolate.

Returns:

The interpolated y-axis column.

Return type:

pl.Expr

linear_interpolation_using_cols(df: DataFrame, x_col: str, y_col: list[str] | str) → DataFrame

Perform linear interpolation on specified columns of a DataFrame.

Parameters:
  • df (pl.DataFrame) – The DataFrame containing the data.

  • x_col (str) – The name of the x-axis column.

  • y_col (Union[list[str], str]) – The name(s) of the y-axis column(s) to interpolate.

Returns:

The DataFrame with interpolated y-axis columns.

Return type:

pl.DataFrame

list_to_list_of_tuple(list_col: Expr) → Expr

Convert a list of lists to a list of tuples.

Parameters:

list_col (pl.Expr) – A Polars expression representing a list of lists.

Returns:

A Polars expression representing a list of tuples.

Return type:

pl.Expr

modify_string_col(string_col: Expr, format_str: dict) → Expr

Modify string columns based on a given format dictionary.

Parameters:
  • string_col (pl.Expr) – The string column to modify.

  • format_str (dict) – The format dictionary containing the string modifications.

Returns:

The modified string column.

Return type:

pl.Expr

parse_date(date_str: str | None, default_date: datetime) → datetime

Parse a date string and return a datetime object.

Parameters:
  • date_str (str, optional) – The date string to parse.

  • default_date (datetime) – The default date to return if the date string is None.

Returns:

The parsed datetime object.

Return type:

datetime

Raises:

ValueError – If the date format is not recognized.
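
A sketch of this behaviour using `datetime.strptime` and a list of candidate formats. The formats in `DATE_FORMATS` and the helper name are hypothetical, since the accepted formats are not documented here.

```python
from datetime import datetime
from typing import Optional

# Hypothetical list of accepted formats, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y", "%Y-%m-%d %H:%M:%S")


def parse_date_sketch(date_str: Optional[str], default_date: datetime) -> datetime:
    if date_str is None:
        return default_date
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(date_str, fmt)
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognized date format: {date_str!r}")


print(parse_date_sketch("31.12.2023", datetime(2000, 1, 1)))
```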

parse_timestamp(timestamp_str: Expr, item: str | None, keep_string_format: bool = False, convert_to_utc: bool = False, initial_time_zone: str = 'Europe/Zurich') → Expr

Parse a timestamp column based on a given item.

Parameters:
  • timestamp_str (pl.Expr) – The timestamp column.

  • item (str, optional) – The item to parse.

  • keep_string_format (bool, optional) – Whether to keep the string format. Defaults to False.

  • convert_to_utc (bool, optional) – Whether to convert the timestamp to UTC. Defaults to False.

  • initial_time_zone (str, optional) – The initial time zone of the timestamps. Defaults to “Europe/Zurich”.

Returns:

The parsed timestamp column.

Return type:

pl.Expr

Raises:

ValueError – If the timestamp format is not recognized.

replace_null_list(col: Expr, default_value: list | str | int | float | None = None) → Expr

Replace null values in a list column with a specified value.

Parameters:
  • col (pl.Expr) – The list column to modify.

  • default_value (Optional[Union[list, str, int, float]], optional) – The default value for nulls. Defaults to None.

Returns:

The modified list column.

Return type:

pl.Expr