pyxla.util
A set of utility functions that are not pyXla xLA features.
- pyxla.util.load_data(sample, sep=' ')
Loads data specified in a dictionary.
Loads data from a dictionary and converts it to a pyXla sample by modifying the dictionary in-place. This function can be imported from either pyxla or pyxla.util.
- Parameters:
sample (dict) – A Python dictionary comprising any of the following keys: F, X, V, D, N, max, or representation. The F key is mandatory.
Keys
The keys F, X, V, D, and N can take as value either a path or URL to a delimited data file (e.g. a CSV file), a pandas DataFrame, or a function (with the exception of X, which takes a pyxla.sampling.Sampler object).
The max key should have as its value either a boolean, a single-element boolean array, or an array of booleans whose size matches the number of objectives.
Functions as Values
For F and V, the function must take a solution (a pandas Series from X) as input and return a numeric value.
For D, the function must take two solutions (pandas Series from X) as inputs and return a numeric value.
For N, the function must take two solutions (pandas Series from X) as inputs and return a boolean value indicating whether the two solutions are neighbours.
sep (str, optional) – Separator used in the delimited file; by default it is a single space (' ').
- Raises:
Exception – Raised if the F key is not specified. All other keys are optional.
Examples
>>> from pyxla import load_data
>>> import pandas as pd
>>>
>>> # Loading using file paths or URLs
>>> sample = {
...     "X": "../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_X.csv",
...     "F": "../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_F.csv",
...     "V": "../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_V.csv",
...     "D": "../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_D.csv"
... }
>>> load_data(sample)
>>>
>>> # Loading via domain and function specification
>>> from pyxla.sampling import RandomWalkSampler
>>> import random
>>> sample = {
...     "X": RandomWalkSampler(
...         sample_size=100, step_size=1, dim=2, return_neighbourhood=True
...     ),
...     "F": lambda x: (x**2).sum(),
...     "V": lambda x: ((x**2).sum() - 3),
...     "D": lambda x1, x2: ((x1 - x2)**2).sum(),
...     "N": lambda x1, x2: random.choice([True, False])
... }
>>> load_data(sample)
>>>
>>> # Loading mixing dataframes, paths, function specification
>>> sample = {
...     "X": "../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_X.csv",
...     "F": [lambda x: (x**2).sum(), lambda x: x.sum()],
...     "V": lambda x: ((x**2).sum() - 3),
...     "D": "../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_D.csv",
...     "N": lambda x1, x2: random.choice([True, False])
... }
>>> load_data(sample)
>>> isinstance(sample['F'], pd.DataFrame)
True
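The rules for the F and max keys described above can be illustrated with a small standalone sketch. It does not call pyXla; normalise_max and check_sample are hypothetical helpers written for this illustration, and broadcasting a single flag to every objective is an assumption about how such a value would be interpreted.

```python
from typing import List, Sequence, Union


def normalise_max(max_value: Union[bool, Sequence[bool]], n_objectives: int) -> List[bool]:
    """Expand a `max` setting into one flag per objective (hypothetical helper)."""
    # A bare boolean is assumed to apply to every objective.
    if isinstance(max_value, bool):
        return [max_value] * n_objectives
    flags = [bool(flag) for flag in max_value]
    # A single-element array is assumed to be broadcast to all objectives.
    if len(flags) == 1:
        return flags * n_objectives
    # Otherwise the array needs exactly one entry per objective.
    if len(flags) != n_objectives:
        raise ValueError(
            f"max has {len(flags)} entries but there are {n_objectives} objectives"
        )
    return flags


def check_sample(sample: dict) -> None:
    """Reject a sample dictionary that lacks the mandatory F key."""
    if "F" not in sample:
        raise KeyError("the 'F' key is mandatory")


check_sample({"F": "objectives.csv", "max": [True, False]})  # passes silently
per_objective = normalise_max([True, False], n_objectives=2)
```

The sketch raises ValueError and KeyError for readability; the documentation above only states that load_data raises a generic Exception when F is missing.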
- pyxla.util.load_sample(name: str, test=False, exclude=[])
Load an example dataset.
pyXla provides a few example datasets that can be used to try out the framework.
- Parameters:
name (str) – Name of the dataset. [@todo: include link to folder of datasets]
test (bool, optional) – If True, a lightweight test version of the sample is loaded. If False, the whole dataset is loaded.
exclude (list[str], optional) – Selectively load input files to reduce memory footprint. This is useful where some input files are not needed for the desired operation.
- Returns:
sample – A dictionary with the input files of a problem sample, e.g. the F or V files.
- Return type:
dict
Examples
>>> from pyxla.util import load_sample
>>> import pandas as pd
>>> sample = load_sample('cec2010_c18_2d_F1_V2')
>>> type(sample)
<class 'dict'>
- pyxla.util.present(sample: dict, input: str) → bool
Check presence of a given input file in a sample.
An input is considered present if there is a pandas.DataFrame associated with the key corresponding to the input file.
- Parameters:
sample (dict) – A loaded sample containing input files.
input (str) – A character representing an input file type, e.g. ‘X’ or ‘F’.
- Returns:
True if the input file is present and False otherwise.
- Return type:
bool
- pyxla.util.sample_X(sampler: Sampler) → DataFrame | Tuple[DataFrame]
Samples from a search space.
Samples from a search space given a full definition of the domain and sampling strategy. These are specified using an object of the sampling.Sampler class.
- Parameters:
sampler (sampling.Sampler) – Object of the sampling.Sampler class.
- Returns:
Depending on the specification of the sampler, it returns either an X file as a dataframe or a tuple of the X and N dataframes. The latter case is the default behaviour, where neighbourhood is tracked during sampling. See the return_neighbourhood argument in the sampling.Sampler constructor.
- Return type:
Union[pd.DataFrame, Tuple[pd.DataFrame]]
Examples
>>> import pandas as pd
>>> from pyxla.util import sample_X
>>> from pyxla.sampling import RandomWalkSampler
>>> X, N = sample_X(RandomWalkSampler(sample_size=100, step_size=1))
>>> isinstance(X, pd.DataFrame)
True
>>> isinstance(N, pd.DataFrame)
True
- pyxla.util.compute_F(objectives: List[Callable], X: DataFrame) → DataFrame
Generate objective values.
Generates objective values given the objective functions and the solutions.
- Parameters:
objectives (List[Callable]) – List of objective function(s); they can be regular functions or lambda functions.
X (pd.DataFrame) – Dataframe with solutions sampled from the search space.
- Returns:
Dataframe of objective values for each objective column-wise.
- Return type:
pd.DataFrame
Examples
>>> from pyxla.util import load_sample, compute_F
>>> import numpy as np
>>> import pandas as pd
>>> s = load_sample('ackley8d_F1_V0', test=True)
>>> sphere_f = lambda x: (x.sum()) ** 2
>>> sine_f = lambda x: 8 * np.sin(20 * x.sum())
>>> objectives = [sphere_f, sine_f]
>>> F = compute_F(objectives, s['X'])
>>> F.columns
Index(['f0', 'f1'], dtype='object')
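As a rough picture of what "objective values for each objective column-wise" means, the following standalone sketch evaluates each objective on every solution and stores the results under f0, f1, and so on. It does not use pyXla or pandas: plain lists stand in for the rows of X, and evaluate_objectives is a hypothetical name, not part of the library.

```python
from typing import Callable, Dict, List, Sequence


def evaluate_objectives(
    objectives: List[Callable[[Sequence[float]], float]],
    rows: List[Sequence[float]],
) -> Dict[str, List[float]]:
    """Return one column per objective, named f0, f1, ... (hypothetical helper)."""
    return {
        f"f{index}": [objective(row) for row in rows]
        for index, objective in enumerate(objectives)
    }


# Two solutions with two decision variables each.
rows = [[1.0, 2.0], [0.0, -3.0]]
sum_of_squares = lambda x: sum(value * value for value in x)
plain_sum = lambda x: sum(x)

F = evaluate_objectives([sum_of_squares, plain_sum], rows)
```

Each key of F holds as many values as there are solutions, mirroring the one-column-per-objective layout shown in the example output above.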
- pyxla.util.compute_V(violations: List[Callable], X: DataFrame) → DataFrame
Generate violation values.
Generates violation values given the violation functions and the solutions.
- Parameters:
violations (List[Callable]) – List of violation function(s); they can be regular functions or lambda functions. The violation functions passed are assumed to be in standard form. Thus, function evaluations that result in values \(\leq 0\) are feasible and are clipped to \(0\). Function evaluations that are greater than zero are infeasible.
X (pd.DataFrame) – Dataframe with solutions sampled from the search space.
- Returns:
Dataframe of violation values for each violation function column-wise.
- Return type:
pd.DataFrame
Examples
>>> from pyxla.util import load_sample, compute_V
>>> import numpy as np
>>> import pandas as pd
>>> s = load_sample('ackley8d_F1_V0', test=True)
>>> sphere_v = lambda x: (x.sum()) ** 2 - 3
>>> sine_v = lambda x: 8 * np.sin(20 * x.sum())
>>> violations = [sphere_v, sine_v]
>>> V = compute_V(violations, s['X'])
>>> V.columns
Index(['v0', 'v1'], dtype='object')
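The standard-form convention described in the parameters (values at or below zero are feasible and clipped to zero) can be sketched in plain Python. This is only an illustration of the rule, not pyXla's implementation; clip_violation is a hypothetical helper and the lists stand in for rows of X.

```python
def clip_violation(value: float) -> float:
    """Map a raw constraint value to a violation: feasible values become 0."""
    return max(0.0, value)


# Same shape as sphere_v in the example above, written for plain lists.
sphere_v = lambda x: sum(x) ** 2 - 3

rows = [[1.0, 0.0], [2.0, 1.0]]
raw = [sphere_v(row) for row in rows]             # raw constraint values
violations = [clip_violation(value) for value in raw]
feasible = [value == 0.0 for value in violations]  # True where the constraint holds
```

The first solution evaluates to a negative raw value and is therefore reported with zero violation, while the second keeps its positive value.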
- pyxla.util.compute_D(sample: dict, metric: Callable | str = None, representation: str = None, force: bool = False) → DataFrame
Compute a D file containing pairwise distances between solutions.
- Parameters:
sample (dict) – A sample containing at least the input X.
metric (Callable or str, optional) – A metric function or the name of a distance metric as listed for scipy’s pdist function. If a metric function is given, it must take two solutions and compute the distance between them, in the form dist(Xa, Xb) -> d, where Xa and Xb are pandas Series representing solutions. For example: lambda Xa, Xb: abs(Xa.sum() - Xb.sum()). By default None.
representation ({'continuous', 'binary'}, optional) – Representation of the data, i.e. continuous or discrete, by default None.
force (bool, optional) – If set to True, the D file is recomputed even if there is an existing D file, by default False.
- Returns:
Returns a dataframe of pairwise distances.
- Return type:
pandas.DataFrame
- Raises:
Exception – Raised if no X file is in the sample. It is also raised if there is an attempt to compute a D file when one is already present and force is set to False.
Exception – Raised if neither a metric function nor a representation category is specified.
Examples
>>> from pyxla.util import compute_D, load_sample
>>> sample = load_sample("ackley8d_F1_V0", test=True)
>>> # the X file of ackley8d_F1_V0 has 8 variables
>>> len(sample["X"].columns)
8
>>> # define a function to compute pairwise distance
>>> squared_euclidean = lambda X1, X2: ((X1 - X2)**2).sum()
>>> D = compute_D(sample, metric=squared_euclidean)
>>>
>>> # use one of scipy's distance metrics
>>> D = compute_D(sample, metric="canberra")
>>>
>>> # add the D input to the sample
>>> sample['D'] = D
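To make the dist(Xa, Xb) -> d contract concrete, here is a standalone sketch that applies a user-supplied metric to every pair of solutions. It does not call pyXla; pairwise_distances is a hypothetical helper, the square-matrix layout is an assumption (the layout of the D file is not specified here), and the metric is assumed to be symmetric with zero self-distance.

```python
from typing import Callable, List, Sequence


def pairwise_distances(
    rows: List[Sequence[float]],
    metric: Callable[[Sequence[float], Sequence[float]], float],
) -> List[List[float]]:
    """Build a square distance matrix from a two-argument metric (hypothetical helper)."""
    n = len(rows)
    distances = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Only the upper triangle is evaluated; symmetry fills the rest.
        for j in range(i + 1, n):
            d = metric(rows[i], rows[j])
            distances[i][j] = d
            distances[j][i] = d
    return distances


squared_euclidean = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))

rows = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
D = pairwise_distances(rows, squared_euclidean)
```

Evaluating only the upper triangle halves the number of metric calls, which matters because the work grows quadratically with the number of solutions.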
- pyxla.util.compute_N(sample: dict, neighbourhood_func: Literal['hilbert-curve'] | Callable, force=False) → DataFrame
Computes a neighbourhood information file N.
- Parameters:
sample (dict) – A sample containing at least the input X.
neighbourhood_func (Union[Literal['hilbert-curve'], Callable]) – Either the string 'hilbert-curve' or a function of the form f(X1, X2) -> bool, where X1 and X2 are one-dimensional array-likes.
force (bool, optional) – If set to True, the N file is recomputed even if there is an existing N file, by default False.
- Returns:
The N file generated.
- Return type:
pd.DataFrame
- Raises:
Exception – Raised if no X file is in the sample. It is also raised if there is an attempt to compute an N file when one is already present and force is set to False.
Exception – Raised if no neighbourhood function is specified. The neighbourhood function should be of the form f(X1, X2) -> bool.
Examples
>>> from pyxla.util import compute_N, load_sample
>>> import random
>>> sample = load_sample("ackley8d_F1_V0", test=True)
>>>
>>> # Compute N file using a neighbourhood function
>>> neighbourhood_func = lambda x, y: random.choice([True, False])
>>> N = compute_N(sample, neighbourhood_func=neighbourhood_func)
>>>
>>> # Compute N file using the Hilbert curve
>>> N = compute_N(sample, neighbourhood_func="hilbert-curve")
>>>
>>> # add the N input to the sample
>>> sample['N'] = N
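The example above uses a random neighbourhood function, so its result changes from run to run. A deterministic function of the same f(X1, X2) -> bool form might look like the sketch below, which treats two binary solutions as neighbours when they differ in exactly one position. The code is standalone and does not call pyXla; hamming_neighbours and the list of index pairs are illustrative choices, and the layout of the N file itself is not assumed.

```python
from typing import List, Sequence, Tuple


def hamming_neighbours(x1: Sequence[int], x2: Sequence[int]) -> bool:
    """Return True when the two solutions differ in exactly one position."""
    return sum(a != b for a, b in zip(x1, x2)) == 1


def neighbour_pairs(rows: List[Sequence[int]]) -> List[Tuple[int, int]]:
    """List the index pairs (i, j), i < j, that the function marks as neighbours."""
    return [
        (i, j)
        for i in range(len(rows))
        for j in range(i + 1, len(rows))
        if hamming_neighbours(rows[i], rows[j])
    ]


rows = [[0, 0, 0], [0, 0, 1], [0, 1, 1]]
pairs = neighbour_pairs(rows)
```

Because the function depends only on its two arguments, repeated calls give the same neighbourhood, which makes results reproducible.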