polars_function

cast_boolean(col: Expr) -> Expr

Cast a column to boolean based on predefined replacements.

Parameters:

col (pl.Expr) – The column to cast.

Returns:

The column cast to boolean.

Return type:

pl.Expr

cast_float(float_str: Expr) -> Expr

Cast a string column to float, modifying the string format as needed.

Parameters:

float_str (pl.Expr) – The string column to cast.

Returns:

The column cast to float.

Return type:

pl.Expr
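The string clean-up that cast_float applies is not listed in this reference. As a rough per-value illustration only, the sketch below assumes three hypothetical rules (trim whitespace, drop apostrophe thousands separators, accept a decimal comma); the helper name `cast_float_value` is invented for this example and is not part of the module.

```python
def cast_float_value(float_str):
    """Normalize one numeric string and convert it to float (None stays None)."""
    if float_str is None:
        return None
    # Assumed clean-up rules, not taken from the library:
    # trim whitespace, drop apostrophe thousands separators, accept a decimal comma.
    cleaned = float_str.strip().replace("'", "").replace(",", ".")
    return float(cleaned)


print(cast_float_value(" 1'234,5 "))
```

The real function works on a whole Polars expression rather than on single Python values.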

cast_to_utc_timestamp(timestamp: Expr, initial_time_zone: str = 'Europe/Zurich') -> Expr

Convert a timestamp column to UTC from the specified initial time zone.

Parameters:
  • timestamp (pl.Expr) – The timestamp column to convert.

  • initial_time_zone (str, optional) – The initial time zone of the timestamps. Defaults to “Europe/Zurich”.

Returns:

The timestamp column converted to UTC.

Return type:

pl.Expr

concat_list_of_list(col_list: Expr) -> Expr

Concatenate a column of lists into a single list whose elements are the original sublists.

Parameters:

col_list (pl.Expr) – The column of lists to concatenate.

Returns:

The concatenated list column.

Return type:

pl.Expr

cum_count_duplicates(cols_names: str | list[str]) -> Expr

Calculate the cumulative count of duplicate values in the specified column(s) of a DataFrame, assigning strictly positive values to half of the counts and strictly negative values to the other half.

Parameters:

cols_names (Union[str, list[str]]) – The name(s) of the column(s) to check for duplicates.

Returns:

A Polars expression giving the cumulative count of duplicates.

Return type:

pl.Expr

Example:

>>> df = pl.DataFrame({"a": [1, 1, 2, 3, 4, 4, 4]})
>>> df.with_columns(
...    cum_count_duplicates(cols_names="a").alias("cum_count")
... )
shape: (7, 2)
┌─────┬────────────┐
│ a   ┆ cum_count  │
│ --- ┆ ---        │
│ i64 ┆ i64        │
╞═════╪════════════╡
│ 1   ┆ -1         │
│ 1   ┆ 1          │
│ 2   ┆ 1          │
│ 3   ┆ 1          │
│ 4   ┆ -1         │
│ 4   ┆ 1          │
│ 4   ┆ 2          │
└─────┴────────────┘

digitize_col(col: Expr, min: float, max: float, nb_state: int) -> Expr

Digitize a column into discrete states based on the specified number of states.

Parameters:
  • col (pl.Expr) – The column to digitize.

  • min (float) – The minimum value of the column.

  • max (float) – The maximum value of the column.

  • nb_state (int) – The number of discrete states.

Returns:

The digitized column.

Return type:

pl.Expr
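The binning rule used by digitize_col is not spelled out here. One common reading, shown below as a per-value sketch, splits the range from min to max into nb_state equal-width bins and returns the bin index. The helper name `digitize_value`, the zero-based indexing, and the clamping of out-of-range values are all assumptions of this example.

```python
def digitize_value(value, min_value, max_value, nb_state):
    """Return the index (0 .. nb_state - 1) of the equal-width bin holding value."""
    width = (max_value - min_value) / nb_state
    index = int((value - min_value) // width)
    # Clamp so that value == max_value (or out-of-range input) stays in range.
    return max(0, min(nb_state - 1, index))


print([digitize_value(v, 0.0, 10.0, 5) for v in (0.0, 4.9, 10.0)])
```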

generate_random_uuid(col: Expr) -> Expr

Generate a random UUID.

Returns:

The generated UUID.

Return type:

pl.Expr

generate_uuid_col(col: Expr, base_uuid: UUID | None = None, added_string: str = '') -> Expr

Generate UUIDs for a column based on a base UUID and an optional added string.

Parameters:
  • col (pl.Expr) – The column to generate UUIDs for.

  • base_uuid (uuid.UUID, optional) – The base UUID for generating the UUIDs.

  • added_string (str, optional) – The optional added string. Defaults to “”.

Returns:

The column with generated UUIDs.

Return type:

pl.Expr
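A base UUID combined with a string suggests name-based (deterministic) UUIDs. The sketch below illustrates that idea for a single value with the standard library's `uuid.uuid5`. Whether the module uses UUID version 5, how it combines the value with added_string, and the fallback namespace are assumptions; `generate_uuid_value` is a hypothetical helper.

```python
import uuid


def generate_uuid_value(value, base_uuid=None, added_string=""):
    """Derive a deterministic UUID string from one value."""
    # Assumption: fall back to the DNS namespace when no base UUID is supplied.
    namespace = base_uuid if base_uuid is not None else uuid.NAMESPACE_DNS
    # Assumption: the added string is simply appended to the value.
    return str(uuid.uuid5(namespace, f"{value}{added_string}"))


print(generate_uuid_value("line_1", added_string="_node"))
```

With this approach the same input always maps to the same UUID, which keeps identifiers stable across runs.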

get_meta_data_string(metadata: Expr) -> Expr

Convert metadata to a JSON string, excluding keys with None values.

Parameters:

metadata (pl.Expr) – The metadata column.

Returns:

The metadata column as JSON strings.

Return type:

pl.Expr
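For a single metadata record, the described behaviour can be sketched with the standard `json` module. The helper name `metadata_to_json` is hypothetical, and the assumption that only top-level keys are filtered is not confirmed by this reference.

```python
import json


def metadata_to_json(metadata):
    """Serialize a dict to JSON, dropping top-level keys whose value is None."""
    filtered = {key: value for key, value in metadata.items() if value is not None}
    return json.dumps(filtered)


print(metadata_to_json({"owner": "grid_a", "comment": None}))
```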

get_transfo_admittance(rated_v: Expr, rated_s: Expr, oc_current_ratio: Expr) -> Expr

Get the transformer admittance based on the open-circuit test.

Parameters:
  • rated_v (pl.Expr) – The rated voltage column, which indicates the side of the transformer the parameters are associated with.

  • rated_s (pl.Expr) – The rated power column [VA].

  • oc_current_ratio (pl.Expr) – The ratio between the current measured when the transformer secondary is open and the rated current [%].

Returns:

The transformer admittance column [Siemens].

Return type:

pl.Expr
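The textbook relation for the open-circuit test is Y = (i0 / 100) · S / V², with i0 the no-load current in percent. The scalar sketch below applies it to one transformer. That the module uses exactly this expression is an assumption, and `transfo_admittance` is a hypothetical name.

```python
def transfo_admittance(rated_v, rated_s, oc_current_ratio):
    """Admittance [S] from rated voltage [V], rated power [VA], no-load current [%]."""
    return (oc_current_ratio / 100.0) * rated_s / rated_v**2


# 400 V side, 100 kVA, 2 % no-load current.
print(transfo_admittance(400.0, 100_000.0, 2.0))
```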

get_transfo_conductance(rated_v: Expr, iron_losses: Expr) -> Expr

Get the transformer conductance based on iron losses measurement.

Parameters:
  • rated_v (pl.Expr) – The rated voltage column, which indicates the side of the transformer the parameters are associated with.

  • iron_losses (pl.Expr) – The iron losses column [W].

Returns:

The transformer conductance column [Siemens].

Return type:

pl.Expr
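Iron losses measured at rated voltage give the shunt conductance through the standard relation G = P_fe / V². The scalar sketch below shows the arithmetic. It assumes the module follows this relation, and `transfo_conductance` is a hypothetical name.

```python
def transfo_conductance(rated_v, iron_losses):
    """Conductance [S] from rated voltage [V] and iron losses [W]."""
    return iron_losses / rated_v**2


# 400 V side, 160 W of iron losses.
print(transfo_conductance(400.0, 160.0))
```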

get_transfo_imaginary_component(module: Expr, real: Expr) -> Expr

Get the transformer imaginary component from the modulus and the real component.

Parameters:
  • module (pl.Expr) – The modulus (magnitude) column [Ohm or Siemens].

  • real (pl.Expr) – The real component column [Ohm or Siemens].

Returns:

The transformer imaginary component column [Ohm or Siemens].

Return type:

pl.Expr
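For a complex quantity with modulus |Z| and real part R, the imaginary part has magnitude sqrt(|Z|² − R²). The scalar sketch below shows that computation. How the module treats the sign, or a real part larger than the modulus, is not documented here; this hypothetical helper simply lets `math.sqrt` raise in that case.

```python
import math


def transfo_imaginary_component(module, real):
    """Magnitude of the imaginary part from the modulus and the real part."""
    # math.sqrt raises ValueError if real exceeds module in absolute value.
    return math.sqrt(module**2 - real**2)


print(transfo_imaginary_component(5.0, 3.0))
```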

get_transfo_impedance(rated_v: Expr, rated_s: Expr, voltage_ratio: Expr) -> Expr

Get the transformer impedance (or resistance if real part) based on the short-circuit tests.

Parameters:
  • rated_v (pl.Expr) – The rated voltage column, which indicates the side of the transformer the parameters are associated with.

  • rated_s (pl.Expr) – The rated power column [VA].

  • voltage_ratio (pl.Expr) – The ratio between the input voltage applied to obtain the rated current when the transformer secondary is short-circuited and the rated voltage [%].

Returns:

The transformer impedance column [Ohm].

Return type:

pl.Expr
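The usual short-circuit relation is Z = (u_k / 100) · V² / S, with u_k the short-circuit voltage in percent. The scalar sketch below applies it to one transformer. It assumes the module uses this relation, and `transfo_impedance` is a hypothetical name.

```python
def transfo_impedance(rated_v, rated_s, voltage_ratio):
    """Impedance [Ohm] from rated voltage [V], rated power [VA], short-circuit voltage [%]."""
    return (voltage_ratio / 100.0) * rated_v**2 / rated_s


# 400 V side, 100 kVA, 4 % short-circuit voltage.
print(transfo_impedance(400.0, 100_000.0, 4.0))
```

Passing the resistive part of the short-circuit voltage instead of the full value would give the resistance with the same formula.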

get_transfo_resistance(rated_v: Expr, rated_s: Expr, copper_losses: Expr) -> Expr

Get the transformer resistance based on copper losses measurement.

Parameters:
  • rated_v (pl.Expr) – The rated voltage column, which indicates the side of the transformer the parameters are associated with.

  • rated_s (pl.Expr) – The rated power column [VA].

  • copper_losses (pl.Expr) – The copper losses column [W].

Returns:

The transformer resistance column [Ohm].

Return type:

pl.Expr
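Copper losses at rated current satisfy P_cu = R · (S / V)², which gives R = P_cu · V² / S². The scalar sketch below shows the arithmetic. It assumes the module follows this standard relation, and `transfo_resistance` is a hypothetical name.

```python
def transfo_resistance(rated_v, rated_s, copper_losses):
    """Resistance [Ohm] from rated voltage [V], rated power [VA], copper losses [W]."""
    return copper_losses * rated_v**2 / rated_s**2


# 400 V side, 100 kVA, 1 kW of copper losses.
print(transfo_resistance(400.0, 100_000.0, 1_000.0))
```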

linear_interpolation_for_bound(x_col: Expr, y_col: Expr) -> Expr

Perform linear interpolation for boundary values in a column.

Parameters:
  • x_col (pl.Expr) – The x-axis column.

  • y_col (pl.Expr) – The y-axis column to interpolate.

Returns:

The interpolated y-axis column.

Return type:

pl.Expr

linear_interpolation_using_cols(df: DataFrame, x_col: str, y_col: list[str] | str) -> DataFrame

Perform linear interpolation on specified columns of a DataFrame.

Parameters:
  • df (pl.DataFrame) – The DataFrame containing the data.

  • x_col (str) – The name of the x-axis column.

  • y_col (Union[list[str], str]) – The name(s) of the y-axis column(s) to interpolate.

Returns:

The DataFrame with interpolated y-axis columns.

Return type:

pl.DataFrame
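Linear interpolation against an x-axis column can be illustrated on plain lists: each missing y value between two known points is placed on the straight line joining them. The sketch below is a stand-alone illustration, not the module's implementation. `interpolate_missing` is a hypothetical helper, and leaving leading and trailing gaps untouched is an assumption.

```python
def interpolate_missing(x, y):
    """Fill None entries of y that lie between two known points, linearly in x."""
    known = [i for i, value in enumerate(y) if value is not None]
    result = list(y)
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            ratio = (x[i] - x[left]) / (x[right] - x[left])
            result[i] = y[left] + ratio * (y[right] - y[left])
    return result


print(interpolate_missing([0, 1, 3, 4], [0.0, None, None, 8.0]))
```

Because the weights come from x rather than from the row position, unevenly spaced x values are handled correctly.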

modify_string_col(string_col: Expr, format_str: dict) -> Expr

Modify string columns based on a given format dictionary.

Parameters:
  • string_col (pl.Expr) – The string column to modify.

  • format_str (dict) – The format dictionary containing the string modifications.

Returns:

The modified string column.

Return type:

pl.Expr

parse_date(date_str: str | None, default_date: datetime) -> datetime

Parse a date string and return a datetime object.

Parameters:
  • date_str (str, optional) – The date string to parse.

  • default_date (datetime) – The default date to return if the date string is None.

Returns:

The parsed datetime object.

Return type:

datetime

Raises:

ValueError – If the date format is not recognized.
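The documented contract (return the default for None, raise ValueError for an unknown format) can be sketched with `datetime.strptime`. The list of accepted formats below is hypothetical, since this reference does not say which formats the module recognizes, and `parse_date_sketch` is an illustrative name.

```python
from datetime import datetime

# Hypothetical formats; the formats accepted by the real function are not documented here.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%Y-%m-%dT%H:%M:%S")


def parse_date_sketch(date_str, default_date):
    """Return default_date for None, otherwise try each known format in turn."""
    if date_str is None:
        return default_date
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(date_str, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {date_str!r}")


print(parse_date_sketch("31.12.2023", datetime(2000, 1, 1)))
```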

parse_timestamp(timestamp_str: Expr, item: str | None, keep_string_format: bool = False, convert_to_utc: bool = False, initial_time_zone: str = 'Europe/Zurich') -> Expr

Parse a timestamp column based on a given item.

Parameters:
  • timestamp_str (pl.Expr) – The timestamp column.

  • item (str, optional) – The item to parse.

  • keep_string_format (bool, optional) – Whether to keep the string format. Defaults to False.

  • convert_to_utc (bool, optional) – Whether to convert the timestamp to UTC. Defaults to False.

  • initial_time_zone (str, optional) – The initial time zone of the timestamps. Defaults to “Europe/Zurich”.

Returns:

The parsed timestamp column.

Return type:

pl.Expr

Raises:

ValueError – If the timestamp format is not recognized.