Loading and Sampling¶
This guide demonstrates how to load samples in pyXla.
The first step in any landscape analysis with pyXla is loading a data sample.
In pyXla, samples are specified using a regular Python dictionary.
The pyXla framework specifies 6 different types of input files outlined in the table below:
| Input | Description | Required |
|---|---|---|
| F | File or function specifying objective values | Yes |
| X | File or domain specifying solutions | No |
| V | File or function specifying violation values | No |
| D | File or function specifying distance information | No |
| N | File or function specifying neighbourhood information | No |
| I | File specifying additional information (work in progress) | No |
F, X, V, D and N are the keys in the Python dictionary used to define a sample. Additionally, the keys max and representation can be specified. Only the F key must be present.
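The key rules above can be checked before any data is loaded. The sketch below is a hypothetical helper (`check_sample_keys` is not part of pyXla) and assumes the accepted keys are exactly those named in this guide; I is left out because the table marks it as work in progress. The file names are placeholders.

```python
# Hypothetical helper, not part of pyXla: checks a sample dictionary
# against the key rules described in this guide.
INPUT_KEYS = {"F", "X", "V", "D", "N"}
OPTION_KEYS = {"max", "representation"}


def check_sample_keys(sample):
    """Return a list of problems found in the sample's keys (empty if none)."""
    problems = []
    if "F" not in sample:
        problems.append("missing required key 'F'")
    unknown = set(sample) - INPUT_KEYS - OPTION_KEYS
    for key in sorted(unknown):
        problems.append(f"unrecognised key {key!r}")
    return problems


print(check_sample_keys({"F": "objectives.csv", "max": True}))  # []
print(check_sample_keys({"X": "solutions.csv", "Y": None}))
# ["missing required key 'F'", "unrecognised key 'Y'"]
```

Whether pyXla itself rejects unknown keys is not stated here, so treat this only as a pre-flight check on your own dictionaries.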
pyXla allows users to load a sample in 3 ways, each corresponding to how values are specified for each input key (e.g. F, X):
Using file paths or URLs to a delimited data file, e.g. a CSV file,
By specifying pandas DataFrames,
By specifying a function (for all keys except X; for X a Sampler object is supplied).
All these ways can be mixed.
To load a sample, import the load_data function:
from pyxla import load_data
Defining Samples¶
Method 1: Load sample using file paths or URLs¶
This is the simplest method.
To define a sample using file paths or URLs, simply write a dictionary as below, supplying the path or URL for each key:
sample = {
"X": "../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_X.csv",
"F": "../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_F.csv",
"V": "../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_V.csv",
"D": "../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_D.csv"
}
Once the sample is defined, use the load_data function to load the data.
load_data(sample)
The load_data function loads the data in place, so the sample variable then contains the pyXla sample, ready for analysis.
We can check that the dataframe for the F input has indeed been loaded:
sample['F'].head()
| | f1 |
|---|---|
| 0 | -0.000046 |
| 1 | -0.041758 |
| 2 | -0.034587 |
| 3 | -0.004379 |
| 4 | -0.012440 |
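A mistyped relative path is an easy mistake with this method. As a minimal sketch, the hypothetical helper below (`missing_files` is not a pyXla function) lists the input keys whose local path does not point to an existing file. The temporary file and its name are only there to make the example runnable; URLs are skipped because they cannot be checked on the local file system.

```python
import tempfile
from pathlib import Path

INPUT_KEYS = ("F", "X", "V", "D", "N")


def missing_files(sample):
    """Return the input keys whose local file path does not exist."""
    missing = []
    for key in INPUT_KEYS:
        value = sample.get(key)
        if not isinstance(value, str):
            continue  # DataFrames, functions and samplers are not paths
        if value.startswith(("http://", "https://")):
            continue  # URLs are not checked here
        if not Path(value).is_file():
            missing.append(key)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny single-column objective file so that "F" exists.
    f_path = Path(tmp) / "demo_F.csv"
    f_path.write_text("f1\n-0.000046\n-0.041758\n")
    demo = {"F": str(f_path), "X": str(Path(tmp) / "demo_X.csv")}
    result = missing_files(demo)

print(result)  # ['X']
```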
Method 2: Load by supplying pandas DataFrames¶
This method is straightforward. Simply supply a dataframe for each key:
import pandas as pd
X_df = pd.read_csv("../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_X.csv", sep=" ")
F_df = pd.read_csv("../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_F.csv", sep=" ")
sample = {
"X": X_df,
"F": F_df,
}
# load the sample
load_data(sample)
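The dataframes do not have to come from files. The sketch below builds X and F in memory with NumPy and pandas; the column names `x1`, `x2` and `f1`, the domain [-5, 5] and the sphere objective are illustrative assumptions, and this guide does not state whether pyXla expects particular column names. The call to load_data is omitted so that the block runs without pyXla installed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# 50 two-dimensional solutions drawn uniformly from [-5, 5]^2 (arbitrary choice).
X_df = pd.DataFrame(rng.uniform(-5, 5, size=(50, 2)), columns=["x1", "x2"])

# One objective value per solution: the sphere function (sum of squares).
F_df = pd.DataFrame({"f1": (X_df**2).sum(axis=1)})

sample = {"X": X_df, "F": F_df}

# Rows of X and F should line up one-to-one.
assert len(sample["X"]) == len(sample["F"])
print(sample["F"].shape)  # (50, 1)
```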
Method 3: Loading via domain and function specification¶
This method allows you to specify a sample declaratively. For the X key, one specifies the domain of the search space along with the sampling strategy via a Sampler object.
Currently pyXla supports 3 continuous sampling techniques:
To use random walk sampling, import the RandomWalkSampler class:
from pyxla.sampling import RandomWalkSampler
Instantiate the sampler:
sampler = RandomWalkSampler(
sample_size=100, step_size=1, dim=2, return_neighbourhood=True
)
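To give a feel for what a random walk sample looks like, here is a generic bounded random walk in plain Python. It is not pyXla's RandomWalkSampler and makes no claim about how that class works internally; the bounds `lower` and `upper` and the `seed` argument are assumptions added for illustration, while `sample_size`, `step_size` and `dim` mirror the arguments used above.

```python
import random


def random_walk(sample_size, step_size, dim, lower=-5.0, upper=5.0, seed=None):
    """Generic bounded random walk; NOT pyXla's RandomWalkSampler."""
    rng = random.Random(seed)
    # Start from a uniformly random point inside the box.
    point = [rng.uniform(lower, upper) for _ in range(dim)]
    walk = [point]
    while len(walk) < sample_size:
        # Perturb every coordinate by at most step_size, then clip to the box.
        point = [
            min(upper, max(lower, c + rng.uniform(-step_size, step_size)))
            for c in point
        ]
        walk.append(point)
    return walk


walk = random_walk(sample_size=100, step_size=1, dim=2, seed=42)
print(len(walk), len(walk[0]))  # 100 2
```

Because consecutive points are at most one step apart, a walk of this kind carries a natural notion of neighbourhood between successive points.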
For the remaining input keys, Python functions are supplied. Thus, we can define and load a sample as:
import random
sample = {
    "X": sampler,
    "F": lambda x: (x**2).sum(),
    "V": lambda x: ((x**2).sum() - 3),
    "D": lambda x1, x2: ((x1 - x2)**2).sum(),  # must take a pair of arguments and return a real number
    "N": lambda x1, x2: random.choice([True, False])  # must take a pair of arguments and return a bool
}
# load the sample
load_data(sample)
We can confirm that, for instance, the N input has been loaded:
sample['N'].head()
| | id1 | id2 |
|---|---|---|
| 0 | 0 | 3 |
| 1 | 0 | 4 |
| 2 | 0 | 6 |
| 3 | 0 | 7 |
| 4 | 0 | 8 |
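The function signatures can be tried out on a handful of points before they are handed to pyXla. The sketch below uses NumPy rows and replaces the random neighbourhood rule with a deterministic one so that results are reproducible; the four points and the threshold of 1.0 are made up, and how load_data applies these functions internally is not covered by this guide.

```python
from itertools import combinations

import numpy as np

# Four illustrative 2-D solutions (made-up values).
X = np.array([[0.0, 0.0], [0.5, 0.0], [3.0, 4.0], [3.0, 4.5]])

F = lambda x: (x**2).sum()                 # one row -> objective value
D = lambda x1, x2: ((x1 - x2) ** 2).sum()  # pair of rows -> real number
N = lambda x1, x2: bool(D(x1, x2) <= 1.0)  # pair of rows -> bool

objective_values = [float(F(x)) for x in X]

# Keep the index pairs that the predicate marks as neighbours.
neighbours = [
    (i, j) for i, j in combinations(range(len(X)), 2) if N(X[i], X[j])
]

print(objective_values)  # [0.0, 0.25, 25.0, 29.25]
print(neighbours)        # [(0, 1), (2, 3)]
```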
All the methods can be mixed¶
We can supply values to the input keys using file paths, URLs, dataframes and functions:
sample = {
"X": X_df, # dataframe
"F": [lambda x: (x**2).sum(), lambda x: x.sum()], # multiple objectives
"max": [True, False],
"V": lambda x: ((x**2).sum() - 3),
"D": "../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_D.csv", # file path
"N": lambda x1, x2: random.choice([True, False])
}
# load the sample
load_data(sample)
The max and representation Keys¶
The max key¶
The max key is used to specify whether the objective(s) should be maximised or not.
If the max key is not supplied, pyXla assumes minimisation by default. To specify that the objective should be maximised, simply supply:
sample = {
#...
"max": True
#...
}
If there are multiple objectives, you can specify a boolean value for each, or supply a single boolean value if all the objectives are to be treated the same way:
sample = {
#...
"max": [True, False]
#...
}
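The two accepted forms of max can be pictured as one flag per objective. The helper below is hypothetical (`normalise_max` is not part of pyXla, and pyXla's internal handling may differ); it only illustrates expanding a single boolean and checking that a list has one entry per objective.

```python
def normalise_max(max_flag, n_objectives):
    """Expand a single bool to one flag per objective (hypothetical helper)."""
    if isinstance(max_flag, bool):
        return [max_flag] * n_objectives
    flags = list(max_flag)
    if len(flags) != n_objectives:
        raise ValueError("need exactly one max flag per objective")
    return flags


print(normalise_max(True, 2))           # [True, True]
print(normalise_max([True, False], 2))  # [True, False]
```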
The representation key¶
The representation key is (optionally) used to specify whether the sample is from a discrete or continuous domain.
It can take one of the following strings as its value: discrete, binary or continuous. If specified, this allows pyXla to use sensible defaults when computing information such as distance. For instance:
sample = {
#...sample has no D input
"representation": "continuous"
#...
}
Suppose that the sample above has no D input, i.e. it lacks distance information. If the representation key is specified as above, pyXla will use Euclidean distance by default to compute distance information when a feature that utilises distance is invoked.
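For reference, pairwise Euclidean distance over a small continuous sample can be computed with the standard library alone. This is a sketch of the metric, not of pyXla's code; the three points are made up, and the (id1, id2, distance) row layout is an assumption modelled on the N output shown earlier.

```python
import math
from itertools import combinations

# Three illustrative 2-D solutions (made-up values).
X = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

# One (id1, id2, distance) row per unordered pair of solutions.
distances = [
    (i, j, math.dist(X[i], X[j])) for i, j in combinations(range(len(X)), 2)
]

for row in distances:
    print(row)
# (0, 1, 5.0)
# (0, 2, 10.0)
# (1, 2, 5.0)
```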