pyxla.util

A set of utility functions that are not pyXla xLA features.

pyxla.util.load_data(sample, sep=' ')

Loads data specified in a dictionary.

Loads data from a dictionary and converts it to a pyXla sample by modifying the dictionary in place. This function can be imported from either pyxla or pyxla.util.

Parameters:

sample (dict) –

A Python dictionary comprising any of the following keys: F, X, V, D, N, max, or representation. The F key is mandatory.

Keys

The keys F, X, V, D, and N can take as value any of the following:

a path or URL to a delimited data file (e.g. a CSV file),
a pandas DataFrame, or
a function (with the exception of X, which takes a pyxla.sampling.Sampler object instead).

The max key should have as its value either a boolean, a single element boolean array, or an array of booleans whose size must match the number of objectives.
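
The accepted shapes of max can be illustrated with a small sketch. The helper normalise_max_flags is hypothetical and not part of pyXla. Plain Python lists stand in for arrays, and the sketch assumes that a scalar or single-element value applies to every objective, which the text above does not state explicitly.

```python
from typing import List, Sequence, Union


def normalise_max_flags(
    max_flag: Union[bool, Sequence[bool]], n_objectives: int
) -> List[bool]:
    """Expand a `max` value into one boolean per objective (hypothetical helper)."""
    if isinstance(max_flag, bool):
        # Assumption: a plain boolean applies to every objective.
        return [max_flag] * n_objectives
    flags = list(max_flag)
    if not all(isinstance(flag, bool) for flag in flags):
        raise TypeError("max must contain only booleans")
    if len(flags) == 1:
        # Assumption: a single-element array is broadcast to all objectives.
        return flags * n_objectives
    if len(flags) != n_objectives:
        raise ValueError(
            f"max has {len(flags)} entries but there are {n_objectives} objectives"
        )
    return flags


print(normalise_max_flags(True, 2))
print(normalise_max_flags([False], 3))
print(normalise_max_flags([True, False], 2))
```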

Functions as Values

For F, V, the function must take a solution (pandas Series from X) as input and return a numeric value.

For D, the function must take two solutions (pandas Series from X) as inputs and return a numeric value.

For N, the function must take two solutions (pandas Series from X) as inputs and return a boolean value indicating whether the two solutions are neighbours.
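
The four function contracts above can be sketched as follows. The function names are illustrative only, and plain tuples stand in for the pandas Series rows of X. Because the bodies only iterate over the values, the same functions should also accept a pandas Series.

```python
import math
from typing import Sequence

Solution = Sequence[float]  # stand-in for a pandas Series row of X


def objective(x: Solution) -> float:
    """F-style function: one solution in, one numeric value out."""
    return sum(v ** 2 for v in x)


def violation(x: Solution) -> float:
    """V-style function: positive when the constraint sum(x_i^2) <= 3 is broken."""
    return sum(v ** 2 for v in x) - 3


def distance(x1: Solution, x2: Solution) -> float:
    """D-style function: Euclidean distance between two solutions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


def are_neighbours(x1: Solution, x2: Solution) -> bool:
    """N-style function: True when no coordinate differs by more than 1."""
    return max(abs(a - b) for a, b in zip(x1, x2)) <= 1.0


a, b = (1.0, 2.0), (1.5, 1.0)
print(objective(a), violation(a), distance(a, b), are_neighbours(a, b))
```

Unlike a random choice, a deterministic neighbourhood rule such as are_neighbours gives the same N input on every run.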

sep (str, optional) – Separator used in the delimited file; by default a single space (' ').

Raises: Exception – Raised if the F key is not specified. All other keys are optional.

Examples

>>> from pyxla import load_data
>>> import pandas as pd
>>>
>>> # Loading using file paths or URLs
>>> sample = {
...     "X": "../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_X.csv",
...     "F": "../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_F.csv",
...     "V": "../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_V.csv",
...     "D": "../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_D.csv"
... }
>>> load_data(sample)
>>>
>>> # Loading via domain and function specification
>>> from pyxla.sampling import RandomWalkSampler
>>> import random
>>> sample = {
...        "X": RandomWalkSampler(
...            sample_size=100, step_size=1, dim=2, return_neighbourhood=True
...        ),
...        "F": lambda x: (x**2).sum(),
...        "V": lambda x: ((x**2).sum() - 3),
...        "D": lambda x1, x2: ((x1 - x2)**2).sum(),
...        "N": lambda x1, x2: random.choice([True, False])
...    }
>>> load_data(sample)
>>>
>>> # Loading mixing dataframes, paths, function specification
>>> sample = {
...        "X": "../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_X.csv",
...        "F": [lambda x: (x**2).sum(), lambda x: x.sum()],
...        "V": lambda x: ((x**2).sum() - 3),
...        "D": "../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_D.csv",
...        "N": lambda x1, x2: random.choice([True, False])
...    }
>>> load_data(sample)
>>> isinstance(sample['F'], pd.DataFrame)
True

pyxla.util.load_sample(name: str, test=False, exclude=[])

Load an example dataset.

pyXla provides a few example datasets that can be used to try out the framework.

Parameters:

name (str) – Name of the dataset.
test (bool, optional) – If True, a lightweight test version of the sample is loaded. If False, the whole dataset is loaded.
exclude (list[str], optional) – Selectively load input files to reduce memory footprint. This is useful where some input files are not needed for the desired operation.

Returns:

sample – A dictionary with the input files of a problem sample, e.g. the F or V files.

Return type:

dict

Examples

>>> from pyxla.util import load_sample
>>> import pandas as pd
>>> sample = load_sample('cec2010_c18_2d_F1_V2')
>>> type(sample)
<class 'dict'>

pyxla.util.present(sample: dict, input: str) → bool

Check presence of a given input file in a sample.

An input is considered present if there is a pandas.DataFrame associated with the key corresponding to the input file.

Parameters:

sample (dict) – A loaded sample containing input files.
input (str) – A character representing an input file type, e.g. ‘X’ or ‘F’.

Returns:

True if the input file is present and False otherwise.

Return type:

bool

pyxla.util.sample_X(sampler: Sampler) → DataFrame | Tuple[DataFrame]

Samples from a search space.

Samples from a search space given a full definition of the domain and sampling strategy. These are specified using an object of the sampling.Sampler class.

Parameters: sampler (sampling.Sampler) – Object of the sampling.Sampler class.
Returns: Depending on the specification of the sampler, either the X file as a dataframe or a tuple of the X and N dataframes. The latter is the default behaviour, in which neighbourhood is tracked during sampling. See the return_neighbourhood argument of the sampling.Sampler constructor.
Return type: Union[pd.DataFrame, Tuple[pd.DataFrame]]

Examples

>>> import pandas as pd
>>> from pyxla.util import sample_X
>>> from pyxla.sampling import RandomWalkSampler
>>> X, N = sample_X(RandomWalkSampler(sample_size=100, step_size=1))
>>> isinstance(X, pd.DataFrame)
True
>>> isinstance(N, pd.DataFrame)
True
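
Because the return value is either a single X dataframe or an (X, N) tuple, calling code may want to normalise it before use. The sketch below uses a hypothetical helper, split_sampler_output, that is not part of pyXla; strings stand in for the dataframes so that it runs without pyXla installed.

```python
from typing import Any, Optional, Tuple


def split_sampler_output(result: Any) -> Tuple[Any, Optional[Any]]:
    """Return (X, N), with N set to None when only X was produced."""
    if isinstance(result, tuple):
        X, N = result  # neighbourhood was tracked during sampling
        return X, N
    return result, None  # only the X file was returned


# Stand-in values; in practice `result` would come from sample_X(...).
print(split_sampler_output(("X-table", "N-table")))
print(split_sampler_output("X-table"))
```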

pyxla.util.compute_F(objectives: List[Callable], X: DataFrame) → DataFrame

Generate objective values.

Generates objective values given the objective functions and the solutions.

Parameters:

objectives (List[Callable]) – List of objective function(s); these can be regular functions or lambda functions.
X (pd.DataFrame) – Dataframe with solutions sampled from the search space.

Returns:

Dataframe of objective values for each objective column-wise.

Return type:

pd.DataFrame

Examples

>>> from pyxla.util import load_sample, compute_F
>>> import numpy as np
>>> import pandas as pd
>>> s = load_sample('ackley8d_F1_V0', test=True)
>>> sphere_f = lambda x: (x.sum()) ** 2
>>> sine_f = lambda x: 8 * np.sin(20 * x.sum())
>>> objectives = [sphere_f, sine_f]
>>> F = compute_F(objectives, s['X'])
>>> F.columns
Index(['f0', 'f1'], dtype='object')
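
The column-wise layout of the result can be sketched without pyXla. This is an illustration of the idea only, not the library's implementation: a dictionary of lists stands in for the returned dataframe, tuples stand in for the rows of X, and the helper name evaluate_objectives is hypothetical. The column names f0 and f1 follow the output shown above.

```python
from typing import Callable, Dict, List, Sequence

Solution = Sequence[float]  # stand-in for a pandas Series row of X


def evaluate_objectives(
    objectives: List[Callable[[Solution], float]], X: List[Solution]
) -> Dict[str, List[float]]:
    """Evaluate every objective on every solution, one column per objective."""
    return {
        f"f{j}": [objective(x) for x in X]
        for j, objective in enumerate(objectives)
    }


X = [(0.0, 1.0), (2.0, -1.0), (3.0, 4.0)]
F = evaluate_objectives([lambda x: sum(v ** 2 for v in x), lambda x: sum(x)], X)
print(F)
```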

pyxla.util.compute_V(violations: List[Callable], X: DataFrame) → DataFrame

Generate violation values.

Generates violation values given the violation functions and the solutions.

Parameters:

violations (List[Callable]) – List of violation function(s); these can be regular functions or lambda functions. The violation functions passed are assumed to be in standard form. Thus, function evaluations that result in values \(\leq 0\) are feasible and are clipped to \(0\). Function evaluations that are greater than zero are infeasible.
X (pd.DataFrame) – Dataframe with solutions sampled from the search space.

Returns:

Dataframe of violation values for each violation function column-wise.

Return type:

pd.DataFrame

Examples

>>> from pyxla.util import load_sample, compute_V
>>> import numpy as np
>>> import pandas as pd
>>> s = load_sample('ackley8d_F1_V0', test=True)
>>> sphere_v = lambda x: (x.sum()) ** 2 - 3
>>> sine_v = lambda x: 8 * np.sin(20 * x.sum())
>>> violations = [sphere_v, sine_v]
>>> V = compute_V(violations, s['X'])
>>> V.columns
Index(['v0', 'v1'], dtype='object')
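
The clipping rule for standard-form violations can be sketched as follows. This is an illustration under stated assumptions, not pyXla's code: the helper evaluate_violations is hypothetical, a dictionary of lists stands in for the returned dataframe, and tuples stand in for the rows of X.

```python
from typing import Callable, Dict, List, Sequence

Solution = Sequence[float]  # stand-in for a pandas Series row of X


def evaluate_violations(
    violations: List[Callable[[Solution], float]], X: List[Solution]
) -> Dict[str, List[float]]:
    """Evaluate each violation function and clip feasible values (<= 0) to 0."""
    return {
        f"v{j}": [max(0.0, g(x)) for x in X]
        for j, g in enumerate(violations)
    }


X = [(0.0, 1.0), (2.0, 1.0)]
# Constraint in standard form: sum(x_i^2) - 3 <= 0
V = evaluate_violations([lambda x: sum(v ** 2 for v in x) - 3], X)

# A solution is feasible when every clipped violation is exactly zero.
feasible = [all(column[i] == 0.0 for column in V.values()) for i in range(len(X))]
print(V, feasible)
```

The first solution evaluates to -2 and is clipped to 0, so it counts as feasible; the second evaluates to 2 and remains infeasible.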

pyxla.util.compute_D(sample: dict, metric: Callable | str = None, representation: str = None, force: bool = False) → DataFrame

Compute a D file containing pairwise distance between solutions.

Parameters:

sample (dict) – A sample containing at least the X input.
metric (Callable or str, optional) – A metric function or the name of a distance metric as listed for SciPy’s pdist function, by default None. If a metric function is given, it must take two solutions and compute the distance between them, in the form dist(Xa, Xb) -> d, where Xa and Xb are pandas Series representing solutions. For example: lambda Xa, Xb: abs(Xa.sum() - Xb.sum()).
representation ({'continuous', 'binary'}, optional) – Representation of the data, i.e. continuous or binary, by default None.
force (bool, optional) – If set to True, the D file is recomputed even if there is an existing D file, by default False.

Returns:

Returns a dataframe of pairwise distances.

Return type:

pandas.DataFrame

Raises:

Exception – Raised if no X file is in the sample. It is also raised if there is an attempt to compute a D file when one is already present and force is set to False.
Exception – Raised if neither a metric function nor a representation category is specified.

Examples

>>> from pyxla.util import compute_D, load_sample
>>> sample = load_sample("ackley8d_F1_V0", test=True)
>>> # the X file of ackley8d_F1_V0 has 8 variables
>>> len(sample["X"].columns)
8
>>> # define a function to compute pairwise distance
>>> squared_euclidean = lambda X1, X2: ((X1 - X2)**2).sum()
>>> D = compute_D(sample, metric=squared_euclidean)
>>>
>>> # use one of scipy's distance metrics
>>> D = compute_D(sample, metric="canberra")
>>>
>>> # add the D input to the sample
>>> sample['D'] = D
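
The documented guards and the pairwise computation can be sketched in plain Python. This is not pyXla's implementation: the helper pairwise_distances is hypothetical, nested lists stand in for the returned dataframe, the square-matrix layout is an assumption about the D file, and the metric is assumed to be symmetric with zero self-distance. ValueError is used here as a concrete subclass of the Exception mentioned above.

```python
import math
from typing import Callable, Dict, List, Sequence

Solution = Sequence[float]  # stand-in for a pandas Series row of X


def pairwise_distances(
    sample: Dict[str, list],
    metric: Callable[[Solution, Solution], float],
    force: bool = False,
) -> List[List[float]]:
    """Build a square distance matrix from sample['X'] (hypothetical helper)."""
    if "X" not in sample:
        raise ValueError("sample has no X input")
    if "D" in sample and not force:
        raise ValueError("D already present; pass force=True to recompute")
    X = sample["X"]
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed symmetric metric: fill both triangles at once.
            D[i][j] = D[j][i] = metric(X[i], X[j])
    return D


euclidean = lambda a, b: math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
sample = {"X": [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]}
sample["D"] = pairwise_distances(sample, euclidean)
print(sample["D"])
```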

pyxla.util.compute_N(sample: dict, neighbourhood_func: Literal['hilbert-curve', 'X-index'] | Callable, force=False) → DataFrame

Computes a neighbourhood information file N.

Parameters:

sample (dict) – A sample containing at least the X input.
neighbourhood_func (Union[Literal['hilbert-curve', 'X-index'], Callable]) – Either one of the strings 'hilbert-curve' or 'X-index', or a function of the form f(X1, X2) -> bool, where X1 and X2 are one-dimensional array-likes.
force (bool, optional) – If set to True, the N file is recomputed even if there is an existing N file, by default False.

Returns:

The N file generated.

Return type:

pd.DataFrame

Raises:

Exception – Raised if no X file is in the sample. It is also raised if there is an attempt to compute an N file when one is already present and force is set to False.
Exception – Raised if no neighbourhood function is specified. The neighbourhood function should be of the form f(X1, X2) -> bool.

Examples

>>> from pyxla.util import compute_N, load_sample
>>> import random
>>> sample = load_sample("ackley8d_F1_V0", test=True)
>>>
>>> # Compute N file using a neighbourhood function
>>> neighbourhood_func = lambda x, y: random.choice([True, False])
>>> N = compute_N(sample, neighbourhood_func=neighbourhood_func)
>>>
>>> # Compute N file using the Hilbert curve
>>> N = compute_N(sample, neighbourhood_func="hilbert-curve")
>>>
>>> # add the N input to the sample
>>> sample['N'] = N
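
How a callable of the form f(X1, X2) -> bool can yield neighbourhood information is sketched below. The helper neighbour_pairs is hypothetical, and a list of index pairs is only an assumed stand-in for the N file, whose actual layout is not described here. Tuples stand in for the rows of X.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Solution = Sequence[float]  # stand-in for a row of X


def neighbour_pairs(
    X: List[Solution], neighbourhood_func: Callable[[Solution, Solution], bool]
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, that the function marks as neighbours."""
    return [
        (i, j)
        for i, j in combinations(range(len(X)), 2)
        if neighbourhood_func(X[i], X[j])
    ]


def within_unit_step(x1: Solution, x2: Solution) -> bool:
    """Neighbours when no coordinate differs by more than 1."""
    return max(abs(a - b) for a, b in zip(x1, x2)) <= 1.0


X = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
pairs = neighbour_pairs(X, within_unit_step)
print(pairs)
```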

pyxla.util.compute_R(sample)

Compute ranks: objective ranks, violation ranks, and Pareto ranks on the objectives, on the violations, and on their combination (treating violations as objectives).

Parameters: sample (dict) – A loaded sample containing input files.
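
The idea of a Pareto rank can be illustrated with standard non-dominated sorting. The sketch below is not taken from pyXla: the function names are hypothetical, every column is assumed to be minimised, and ranks are assumed to start at 1. How compute_R actually numbers ranks or handles the max key is not stated in this section.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(points: List[Sequence[float]]) -> List[int]:
    """Rank 1 is the non-dominated front; later fronts follow after removal."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        # Points not dominated by any other remaining point form the next front.
        front = {
            i
            for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


F = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0), (4.0, 4.0)]
print(pareto_ranks(F))
```

To rank on the combination of objectives and violations under the same assumptions, each row of F could be concatenated with the matching row of V before calling pareto_ranks.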