pyxla.sampling

A set of functions and classes for sampling.

class pyxla.sampling.Sampler(*, sample_size: int, dim: int = 1, l_bound: float | List[float] = 1, u_bound: float | List[float] = 100, return_neighbourhood: bool = True, seed: int = None)

Bases: ABC

Sampler dataclass.

A simple abstract (extends ABC) base dataclass for capturing the requisite information for any sampling strategy.

Parameters:
  • sample_size (int) – Desired size of the sample.

  • dim (int, optional) – The dimensionality of the sample, by default 1.

  • l_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the lower bound of the sample, by default 1. If an array is supplied, each element corresponds to a dimension. If dim > 1 and l_bound is supplied as a single float, the same bound is used for all dimensions.

  • u_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the upper bound of the sample, by default 100. If an array is supplied, each element corresponds to a dimension. If dim > 1 and u_bound is supplied as a single float, the same bound is used for all dimensions.

  • return_neighbourhood (bool, optional) – Whether the neighbourhood information of the solutions should be captured during sampling, by default True.

  • seed (int, optional) – Seed for random number generator for reproducibility, by default None.

abstractmethod sample() → Tuple[DataFrame] | DataFrame

Abstract method that does the actual sampling.

Returns:

A tuple of dataframes, i.e. X and N, or just a dataframe X if return_neighbourhood = False.

Return type:

Union[Tuple[pandas.DataFrame], pandas.DataFrame]
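The pattern described above (an abstract dataclass that stores the sampling configuration and leaves sample() to subclasses) can be sketched roughly as follows. This is a minimal illustration and not pyxla's implementation: SketchSampler, UniformSketchSampler and broadcast_bound are hypothetical names, plain lists stand in for DataFrames, and the sketch assumes that a scalar bound is simply repeated for every dimension.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Union

Bound = Union[float, List[float]]


def broadcast_bound(bound: Bound, dim: int) -> List[float]:
    """Expand a scalar bound to one value per dimension (assumed behaviour)."""
    if isinstance(bound, (int, float)):
        return [float(bound)] * dim
    values = [float(b) for b in bound]
    if len(values) != dim:
        raise ValueError(f"expected {dim} bounds, got {len(values)}")
    return values


@dataclass
class SketchSampler(ABC):
    """Hypothetical stand-in for an abstract sampler configuration."""

    sample_size: int
    dim: int = 1
    l_bound: Bound = 1
    u_bound: Bound = 100
    seed: Optional[int] = None

    @abstractmethod
    def sample(self) -> List[List[float]]:
        """Subclasses implement the actual sampling strategy."""


@dataclass
class UniformSketchSampler(SketchSampler):
    """Concrete example: independent uniform draws within the bounds."""

    def sample(self) -> List[List[float]]:
        rng = random.Random(self.seed)
        lower = broadcast_bound(self.l_bound, self.dim)
        upper = broadcast_bound(self.u_bound, self.dim)
        return [
            [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
            for _ in range(self.sample_size)
        ]


# Scalar lower bound shared by both dimensions, per-dimension upper bounds.
points = UniformSketchSampler(
    sample_size=5, dim=2, l_bound=-5, u_bound=[5, 10], seed=0
).sample()
```

Because sample() is abstract, instantiating SketchSampler directly raises a TypeError; only subclasses that implement sample() can be created.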

class pyxla.sampling.WalkSampler(*, sample_size: int, dim: int = 1, l_bound: float | List[float] = 1, u_bound: float | List[float] = 100, return_neighbourhood: bool = True, seed: int = None, step_size: float | List[float], num_neighbours: int = 1)

Bases: Sampler

Abstract base class for sampling strategies that involve walks.

Parameters:
  • step_size (Union[float, List[float]]) – A float (or integer) or array of floats (integers) specifying the step size for the walk in each dimension. If dim > 1 and step_size is supplied as a single float, the same step size is used for all dimensions.

  • num_neighbours (int, optional) – Number of neighbours to sample, by default 1.

abstractmethod sample() → Tuple[DataFrame] | DataFrame

Abstract method that does the actual sampling.

Returns:

A tuple of dataframes, i.e. X and N, or just a dataframe X if return_neighbourhood = False.

Return type:

Union[Tuple[pandas.DataFrame], pandas.DataFrame]

class pyxla.sampling.RandomWalkSampler(*, sample_size: int, dim: int = 1, l_bound: float | List[float] = 1, u_bound: float | List[float] = 100, return_neighbourhood: bool = True, seed: int = None, step_size: float | List[float], num_neighbours: int = 1)

Bases: WalkSampler

Dataclass capturing configuration for random walk sampling.

Uses the function pyxla.sampling.random_walk_sampling() in the implementation of the abstract method Sampler.sample().

Examples

>>> from pyxla.sampling import RandomWalkSampler
>>> sampler = RandomWalkSampler(
...    sample_size=10,
...    step_size=1,
...    dim=2,
...    l_bound=-5,
...    u_bound=5,
...    return_neighbourhood=True
... )
>>> X, N = sampler.sample()
>>> X.shape
(10, 2)

sample() → Tuple[DataFrame] | DataFrame

Implements abstract method Sampler.sample().

Returns:

A tuple of dataframes, i.e. X and N, or just a dataframe X if return_neighbourhood = False.

Return type:

Union[Tuple[pandas.DataFrame], pandas.DataFrame]

class pyxla.sampling.AdaptiveWalkSampler(objective: Callable[[List[float]], float], maximise: bool = False, step_retries: int = 10, *, sample_size: int, dim: int = 1, l_bound: float | List[float] = 1, u_bound: float | List[float] = 100, return_neighbourhood: bool = True, seed: int = None, step_size: float | List[float], num_neighbours: int = 1)

Bases: WalkSampler

Dataclass capturing configuration for adaptive walk sampling.

Uses the function pyxla.sampling.adaptive_walk_sampling_continuous() in the implementation of the abstract method Sampler.sample().

Parameters:
  • objective (Callable[[List[float]], float]) – The objective function to be used in the adaptive walk.

  • maximise (bool, optional) – Whether the objective is maximised (True) or minimised (False), by default False.

  • step_retries (int, optional) – Number of times to try finding a fitter neighbour, by default 10. If this is exceeded, a neighbour is chosen at random and the search for fitter neighbours continues from it.

Examples

>>> from pyxla.sampling import AdaptiveWalkSampler
>>> obj = lambda x: x[0] ** 2
>>> sampler = AdaptiveWalkSampler(
...    objective=obj,
...    maximise=False,
...    sample_size=10,
...    step_size=1,
...    dim=2,
...    l_bound=-5,
...    u_bound=5,
...    return_neighbourhood=True
... )
>>> X, N = sampler.sample()
>>> X.shape
(10, 2)

sample() → Tuple[DataFrame] | DataFrame

Implements abstract method Sampler.sample().

Returns:

A tuple of dataframes, i.e. X and N, or just a dataframe X if return_neighbourhood = False.

Return type:

Union[Tuple[pandas.DataFrame], pandas.DataFrame]

class pyxla.sampling.HilbertCurveSampler(std_dev: float = 0.3, *, sample_size: int, dim: int = 1, l_bound: float | List[float] = 1, u_bound: float | List[float] = 100, return_neighbourhood: bool = True, seed: int = None)

Bases: Sampler

Dataclass capturing configuration for Hilbert curve sampling.

Uses the function pyxla.sampling.hilbert_curve_sampling() in the implementation of the abstract method Sampler.sample().

Parameters:

std_dev (float, optional) – Standard deviation used to sample points around the Hilbert curve vertices, by default 0.3.

Examples

>>> from pyxla.sampling import HilbertCurveSampler
>>> sampler = HilbertCurveSampler(
...    sample_size=10,
...    dim=2,
...    l_bound=-5,
...    u_bound=5,
...    return_neighbourhood=False
... )
>>> X = sampler.sample()
>>> X.shape
(10, 2)

sample() → Tuple[DataFrame] | DataFrame

Implements abstract method Sampler.sample().

Returns:

A tuple of dataframes, i.e. X and N, or just a dataframe X if return_neighbourhood = False.

Return type:

Union[Tuple[pandas.DataFrame], pandas.DataFrame]

pyxla.sampling.random_walk_sampling(sample_size: int, step_size: float | List[float], dim: int = 1, num_neighbours: int = 1, l_bound: float | List[float] = 0, u_bound: float | List[float] = 100, seed: int = None) → Tuple[DataFrame, DataFrame]

Generate a sample consisting of an X (solutions) file and an N (neighbourhood) file using a random walk.

Performs a random walk in the search space and captures neighbourhood in the process.

Parameters:
  • sample_size (int) – Desired size of the sample.

  • step_size (Union[float, List[float]]) – A float (or integer) or array of floats (integers) specifying the step size for the random walk in each dimension. If dim > 1 and step_size is supplied as a single float, the same step size is used for all dimensions.

  • dim (int, optional) – The dimensionality of the sample, by default 1.

  • num_neighbours (int, optional) – Number of neighbours to sample, by default 1.

  • l_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the lower bound of the sample, by default 0. If an array is supplied, each element corresponds to a dimension. If dim > 1 and l_bound is supplied as a single float, the same bound is used for all dimensions.

  • u_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the upper bound of the sample, by default 100. If an array is supplied, each element corresponds to a dimension. If dim > 1 and u_bound is supplied as a single float, the same bound is used for all dimensions.

  • seed (int, optional) – Seed for random number generator for reproducibility, by default None.

Returns:

  • pandas.DataFrame – A dataframe consisting of the solutions, i.e. an X file.

  • pandas.DataFrame – A dataframe defining the neighbourhood among the solutions, i.e. an N file.
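
The idea of walking through the search space while recording which point was reached from which can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: random_walk_sketch is a hypothetical name, steps are drawn uniformly within ±step_size per dimension and clipped to the bounds, only scalar bounds are handled, and num_neighbours is left out for brevity.

```python
import random
from typing import List, Optional, Tuple


def random_walk_sketch(
    sample_size: int,
    step_size: float,
    dim: int = 1,
    l_bound: float = 0.0,
    u_bound: float = 100.0,
    seed: Optional[int] = None,
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Return (X, N): visited points and (id1, id2) pairs of consecutive steps."""
    rng = random.Random(seed)
    # Start from a uniformly random point inside the bounds.
    current = [rng.uniform(l_bound, u_bound) for _ in range(dim)]
    X = [current]
    N = []
    while len(X) < sample_size:
        # Perturb every coordinate by at most step_size, then clip to the bounds.
        candidate = [
            min(max(x + rng.uniform(-step_size, step_size), l_bound), u_bound)
            for x in current
        ]
        X.append(candidate)
        # The new point is a neighbour of the point it was reached from.
        N.append((len(X) - 2, len(X) - 1))
        current = candidate
    return X, N


X, N = random_walk_sketch(
    sample_size=10, step_size=1.0, dim=2, l_bound=-5, u_bound=5, seed=42
)
```

Here X plays the role of the X file (one row per solution) and N the role of the N file (one row per neighbour pair, indexed by row position in X).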

Examples

Generating a 1-dimensional sample:

>>> from pyxla.sampling import random_walk_sampling
>>> X, N = random_walk_sampling(100, 5, dim=1, l_bound=0, u_bound=100)

Generating an n-dimensional sample:

>>> n, dim = 100, 2
>>> l_bound, u_bound, step = [0, 100], [100, 1000], [5, 100]
>>> X, N = random_walk_sampling(n, step, dim=dim, l_bound=l_bound, u_bound=u_bound)

pyxla.sampling.hilbert_curve_sampling(sample_size: int, dim: int = 2, l_bound: float | List[float] = 0, u_bound: float | List[float] = 10, std_dev: float = 0.3, seed: int = None) → Tuple[DataFrame, DataFrame]

Generate a sample using the Hilbert curve.

A Hilbert curve is a space-filling curve described by David Hilbert in 1891. It has been shown to be a good alternative to random sampling and Latin hypercube sampling [2]. It is applicable to the generation of multidimensional samples. To add stochasticity, points are sampled around the Hilbert curve vertices using the normal distribution.

Parameters:
  • sample_size (int) – Desired size of the sample.

  • dim (int, optional) – The dimensionality of the sample, by default 2.

  • l_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the lower bound of the sample, by default 0. If an array is supplied, each element corresponds to a dimension. If l_bound is supplied as a single float, the same bound is used for all dimensions.

  • u_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the upper bound of the sample, by default 10. If an array is supplied, each element corresponds to a dimension. If u_bound is supplied as a single float, the same bound is used for all dimensions.

  • std_dev (float, optional) – Standard deviation used to sample points around the Hilbert curve vertices, by default 0.3, a value chosen empirically (see [2]).

  • seed (int, optional) – Seed for random number generator for reproducibility, by default None.

Returns:

  • pandas.DataFrame – A dataframe consisting of the solutions, i.e. an X file.

  • pandas.DataFrame – A dataframe defining the neighbourhood among the solutions, i.e. an N file.

Raises:

Exception – Raised if the dimension dim is below 2, since a Hilbert curve with dimension 1 is just a number line.
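
To make the idea concrete, the sketch below walks a 2-D Hilbert curve and jitters each visited vertex with normally distributed noise. It is a simplified illustration, not pyxla's implementation: hilbert_d2xy and hilbert_sample_sketch are hypothetical names, only dim = 2 and scalar bounds are covered, vertices are taken evenly spaced along the curve, and the noise is applied in grid units before rescaling (an assumption about how std_dev is interpreted).

```python
import random
from typing import List, Optional, Tuple


def hilbert_d2xy(order: int, d: int) -> Tuple[int, int]:
    """Map index d along a 2-D Hilbert curve of the given order to grid (x, y)."""
    n = 1 << order  # grid side length
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_sample_sketch(
    sample_size: int,
    l_bound: float = 0.0,
    u_bound: float = 10.0,
    std_dev: float = 0.3,
    seed: Optional[int] = None,
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    rng = random.Random(seed)
    # Smallest curve order with at least sample_size vertices (4**order in 2-D).
    order = 1
    while 4 ** order < sample_size:
        order += 1
    side = (1 << order) - 1  # largest grid coordinate
    total = 4 ** order
    X = []
    for i in range(sample_size):
        # Visit vertices spread evenly along the curve.
        gx, gy = hilbert_d2xy(order, i * total // sample_size)
        point = []
        for g in (gx, gy):
            # Jitter in grid units (an assumption), clip, then rescale to the bounds.
            jittered = min(max(g + rng.gauss(0.0, std_dev), 0.0), float(side))
            point.append(l_bound + (u_bound - l_bound) * jittered / side)
        X.append(point)
    # Consecutive points along the curve are treated as neighbours.
    N = [(i, i + 1) for i in range(sample_size - 1)]
    return X, N


X, N = hilbert_sample_sketch(sample_size=20, l_bound=-5, u_bound=5, seed=1)
```

Consecutive indices along the curve always map to adjacent grid cells, which is what makes the curve order a reasonable source of neighbourhood information.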

Examples

>>> import numpy as np
>>> from pyxla import sampling
>>> n, dim = 100, 2
>>> l_bound, u_bound = np.array([0, 100]), np.array([100, 1000])
>>> X, N = sampling.hilbert_curve_sampling(n, dim, l_bound, u_bound)

pyxla.sampling.hilbert_curve_neighbour_sampling(X: DataFrame, binary: bool = False) → DataFrame

Generate an N (neighbourhood) file using the Hilbert curve.

Maps samples from an n-dimensional space onto a 1-d Hilbert curve and infers neighbourhood from the resulting order. The inputs are rescaled. Taking a 1-d Hilbert curve [5, 2, 1, 7], the following set of neighbourhood pairs is inferred: [[5, 2], [2, 1], [1, 7]].

Parameters:
  • X (pandas.DataFrame) – Dataframe containing the decision space variable i.e. the X file.

  • binary (bool, optional) – Specify whether the sample is binary or not, by default False.

Returns:

A sorted 2-d dataframe where, for each row, the solution in column id2 can be reached from the solution in column id1.

Return type:

pandas.DataFrame
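
The pairing step in the description above amounts to linking each solution id to its successor along the 1-d ordering. A minimal sketch follows, in which pairs_from_order is a hypothetical helper and the ordering is assumed to be already computed:

```python
from typing import List, Sequence


def pairs_from_order(order: Sequence[int]) -> List[List[int]]:
    """Pair each solution id with its successor along the 1-d curve."""
    return [[a, b] for a, b in zip(order, order[1:])]


pairs = pairs_from_order([5, 2, 1, 7])
print(pairs)  # [[5, 2], [2, 1], [1, 7]]
```

An ordering of k solutions therefore yields k - 1 neighbour pairs.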

Examples

>>> from pyxla.util import load_sample
>>> from pyxla.sampling import hilbert_curve_neighbour_sampling
>>> sample = load_sample('nk_n14_k2_id5_F3_V2', test=True)
>>> N = hilbert_curve_neighbour_sampling(sample)

pyxla.sampling.adaptive_walk_sampling_continuous(objective: Callable[[List[float]], float], sample_size: int, step_size: float | List[float], maximise: bool = False, dim: int = 1, num_neighbours=1, step_retries=10, l_bound: float | List[float] = 0, u_bound: float | List[float] = 100, seed: int = None) → Tuple[DataFrame, DataFrame]

Sample via an adaptive walk.

Adaptive walk sampling uses the idea of an adaptive walk [1], which has some similarity to a random walk. In an adaptive walk, beginning from a random solution in the search space, the walk steps only to a neighbour that is fitter than the current solution. The algorithm proceeds by considering the newly added solution in turn and looking for a fitter neighbour. The process continues until the required number of samples has been obtained.

Parameters:
  • objective (Callable[[List[float]], float]) – The objective function to be used in the adaptive walk.

  • sample_size (int) – Desired size of the sample.

  • step_size (Union[float, List[float]]) – A float (or integer) or array of floats (integers) specifying the step size for the adaptive walk in each dimension. If dim > 1 and step_size is supplied as a single float, the same step size is used for all dimensions.

  • maximise (bool, optional) – Whether the objective is maximised (True) or minimised (False), by default False.

  • dim (int, optional) – The dimensionality of the sample, by default 1.

  • num_neighbours (int, optional) – Number of neighbours to sample, by default 1.

  • step_retries (int, optional) – Number of times to try finding a fitter neighbour, by default 10. If this is exceeded, a neighbour is chosen at random and the search for fitter neighbours continues from it.

  • l_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the lower bound of the sample, by default 0. If an array is supplied, each element corresponds to a dimension. If dim > 1 and l_bound is supplied as a single float, the same bound is used for all dimensions.

  • u_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the upper bound of the sample, by default 100. If an array is supplied, each element corresponds to a dimension. If dim > 1 and u_bound is supplied as a single float, the same bound is used for all dimensions.

  • seed (int, optional) – Seed for random number generator for reproducibility, by default None.

Returns:

  • pandas.DataFrame – A dataframe consisting of the solutions, i.e. an X file.

  • pandas.DataFrame – A dataframe defining the neighbourhood among the solutions, i.e. an N file.
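
The walk described above, including the fallback after step_retries failed attempts, could look roughly like this. The sketch is illustrative only: adaptive_walk_sketch is a hypothetical name, neighbours are proposed uniformly within ±step_size and clipped to scalar bounds (both assumptions), num_neighbours is omitted, and plain lists replace DataFrames.

```python
import random
from typing import Callable, List, Optional, Tuple


def adaptive_walk_sketch(
    objective: Callable[[List[float]], float],
    sample_size: int,
    step_size: float,
    dim: int = 1,
    step_retries: int = 10,
    l_bound: float = 0.0,
    u_bound: float = 100.0,
    maximise: bool = False,
    seed: Optional[int] = None,
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    rng = random.Random(seed)

    def propose(point: List[float]) -> List[float]:
        # Random neighbour within step_size of the point, clipped to the bounds.
        return [
            min(max(x + rng.uniform(-step_size, step_size), l_bound), u_bound)
            for x in point
        ]

    def is_fitter(a: List[float], b: List[float]) -> bool:
        return objective(a) > objective(b) if maximise else objective(a) < objective(b)

    # Begin from a random solution in the search space.
    current = [rng.uniform(l_bound, u_bound) for _ in range(dim)]
    X, N = [current], []
    while len(X) < sample_size:
        chosen = None
        for _ in range(step_retries):
            trial = propose(current)
            if is_fitter(trial, current):
                chosen = trial
                break
        if chosen is None:
            # Retries exhausted: fall back to an arbitrary random neighbour.
            chosen = propose(current)
        X.append(chosen)
        N.append((len(X) - 2, len(X) - 1))
        current = chosen
    return X, N


X, N = adaptive_walk_sketch(
    lambda x: x[0] ** 2, 10, 0.1, dim=2, l_bound=-5, u_bound=5, seed=0
)
```

Without the fallback the walk could stall at a local optimum and never reach the requested sample size, which is the situation step_retries guards against.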

Examples

>>> from pyxla.sampling import adaptive_walk_sampling_continuous
>>> n, dim = 10, 2
>>> l_bound, u_bound, step = -5, 5, 0.1
>>> obj = lambda x: x[0] ** 2
>>> X, N = adaptive_walk_sampling_continuous(
...    obj, n, step, dim=dim, l_bound=l_bound, u_bound=u_bound, num_neighbours=3
... )
>>> X.shape
(10, 2)

References

[1] Stuart Kauffman and Simon Levin. Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology, 128(1):11–45, 1987.

[2] Johannes J Pienaar, Anna S Boman, and Katherine M Malan. Hilbert curves for efficient exploratory landscape analysis neighbourhood sampling. In International Conference on the Applications of Evolutionary Computation (Part of EvoStar), 293–309. Springer, 2024.