pyxla.sampling
A set of functions and classes for sampling.
- class pyxla.sampling.Sampler(*, sample_size: int, dim: int = 1, l_bound: float | List[float] = 1, u_bound: float | List[float] = 100, return_neighbourhood: bool = True, seed: int = None)
Bases: ABC
Sampler dataclass.
A simple abstract base dataclass (extends ABC) that captures the requisite information for any sampling strategy.
- Parameters:
sample_size (int) – Desired size of the sample.
dim (int, optional) – The dimensionality of the sample, by default 1.
l_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the lower bound of the sample, by default 1. If an array is supplied, each element corresponds to a dimension. If dim > 1 and l_bound is supplied as a single float, the same bound is assumed for all dimensions.
u_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the upper bound of the sample, by default 100. If an array is supplied, each element corresponds to a dimension. If dim > 1 and u_bound is supplied as a single float, the same bound is assumed for all dimensions.
return_neighbourhood (bool, optional) – Whether the neighbourhood information of the solutions should be captured during sampling, by default True.
seed (int, optional) – Seed for the random number generator, for reproducibility, by default None.
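The scalar-or-list convention for l_bound and u_bound described above can be illustrated with a small sketch. The helper name `expand_bound` is hypothetical and is not part of pyxla; it only shows one plausible way a single bound could be repeated across dimensions.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def expand_bound(bound: Union[Number, Sequence[Number]], dim: int) -> List[float]:
    """Return one bound per dimension (hypothetical helper, not part of pyxla)."""
    if isinstance(bound, (int, float)):
        # A scalar is repeated so that every dimension shares the same bound.
        return [float(bound)] * dim
    values = [float(b) for b in bound]
    if len(values) != dim:
        # A list must supply exactly one bound per dimension.
        raise ValueError(f"expected {dim} bounds, got {len(values)}")
    return values


lower = expand_bound(-5, dim=3)           # scalar shared by all dimensions
upper = expand_bound([5, 10, 20], dim=3)  # one bound per dimension
```

Whether pyxla validates the list length in this way is an assumption of the sketch.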
- abstractmethod sample() → Tuple[DataFrame] | DataFrame
Abstract method that does the actual sampling.
- Returns:
A tuple of dataframes, i.e. X and N, or just a dataframe X if return_neighbourhood=False.
- Return type:
Union[Tuple[pandas.DataFrame], pandas.DataFrame]
- class pyxla.sampling.WalkSampler(*, sample_size: int, dim: int = 1, l_bound: float | List[float] = 1, u_bound: float | List[float] = 100, return_neighbourhood: bool = True, seed: int = None, step_size: float | List[float], num_neighbours: int = 1)
Bases: Sampler
Abstract base class for sampling strategies that involve walks.
- Parameters:
step_size (Union[float, List[float]]) – A float (or integer) or array of floats (integers) specifying the step size of the walk in each dimension. If dim > 1 and step_size is supplied as a single float, the same step size is assumed for all dimensions.
num_neighbours (int, optional) – Number of neighbours to sample, by default 1.
- abstractmethod sample() → Tuple[DataFrame] | DataFrame
Abstract method that does the actual sampling.
- Returns:
A tuple of dataframes, i.e. X and N, or just a dataframe X if return_neighbourhood=False.
- Return type:
Union[Tuple[pandas.DataFrame], pandas.DataFrame]
- class pyxla.sampling.RandomWalkSampler(*, sample_size: int, dim: int = 1, l_bound: float | List[float] = 1, u_bound: float | List[float] = 100, return_neighbourhood: bool = True, seed: int = None, step_size: float | List[float], num_neighbours: int = 1)
Bases: WalkSampler
Dataclass capturing configuration for random walk sampling.
Uses the function pyxla.sampling.random_walk_sampling() in the implementation of the abstract method Sampler.sample().
Examples
>>> from pyxla.sampling import RandomWalkSampler
>>> sampler = RandomWalkSampler(
...     sample_size=10,
...     step_size=1,
...     dim=2,
...     l_bound=-5,
...     u_bound=5,
...     return_neighbourhood=True
... )
>>> X, N = sampler.sample()
>>> X.shape
(10, 2)
- sample() → Tuple[DataFrame] | DataFrame
Implements abstract method Sampler.sample().
- Returns:
A tuple of dataframes, i.e. X and N, or just a dataframe X if return_neighbourhood=False.
- Return type:
Union[Tuple[pandas.DataFrame], pandas.DataFrame]
- class pyxla.sampling.AdaptiveWalkSampler(objective: Callable[[List[float]], float], maximise: bool = False, step_retries: int = 10, *, sample_size: int, dim: int = 1, l_bound: float | List[float] = 1, u_bound: float | List[float] = 100, return_neighbourhood: bool = True, seed: int = None, step_size: float | List[float], num_neighbours: int = 1)
Bases: WalkSampler
Dataclass capturing configuration for adaptive walk sampling.
Uses the function pyxla.sampling.adaptive_walk_sampling_continuous() in the implementation of the abstract method Sampler.sample().
- Parameters:
objective (Callable[[List[float]], float]) – The objective function to be used in the adaptive walk.
maximise (bool, optional) – Whether the objective is maximised (True) or minimised (False), by default False.
step_retries (int, optional) – Number of times to try finding a fitter neighbour, by default 10. If this is exceeded, a neighbour is chosen at random and used to find fitter neighbours.
Examples
>>> from pyxla.sampling import AdaptiveWalkSampler
>>> obj = lambda x: x[0] ** 2
>>> sampler = AdaptiveWalkSampler(
...     objective=obj,
...     maximise=False,
...     sample_size=10,
...     step_size=1,
...     dim=2,
...     l_bound=-5,
...     u_bound=5,
...     return_neighbourhood=True
... )
>>> X, N = sampler.sample()
>>> X.shape
(10, 2)
- sample() → Tuple[DataFrame] | DataFrame
Implements abstract method Sampler.sample().
- Returns:
A tuple of dataframes, i.e. X and N, or just a dataframe X if return_neighbourhood=False.
- Return type:
Union[Tuple[pandas.DataFrame], pandas.DataFrame]
- class pyxla.sampling.HilbertCurveSampler(std_dev: float = 0.3, *, sample_size: int, dim: int = 1, l_bound: float | List[float] = 1, u_bound: float | List[float] = 100, return_neighbourhood: bool = True, seed: int = None)
Bases: Sampler
Dataclass capturing configuration for Hilbert curve sampling.
Uses the function pyxla.sampling.hilbert_curve_sampling() in the implementation of the abstract method Sampler.sample().
- Parameters:
std_dev (float, optional) – Standard deviation used when sampling points around the Hilbert curve vertices, by default 0.3.
Examples
>>> from pyxla.sampling import HilbertCurveSampler
>>> sampler = HilbertCurveSampler(
...     sample_size=10,
...     dim=2,
...     l_bound=-5,
...     u_bound=5,
...     return_neighbourhood=False
... )
>>> X = sampler.sample()
>>> X.shape
(10, 2)
- sample() → Tuple[DataFrame] | DataFrame
Implements abstract method Sampler.sample().
- Returns:
A tuple of dataframes, i.e. X and N, or just a dataframe X if return_neighbourhood=False.
- Return type:
Union[Tuple[pandas.DataFrame], pandas.DataFrame]
- pyxla.sampling.random_walk_sampling(sample_size: int, step_size: float | List[float], dim: int = 1, num_neighbours: int = 1, l_bound: float | List[float] = 0, u_bound: float | List[float] = 100, seed: int = None) → Tuple[DataFrame, DataFrame]
Generate a sample consisting of an X (solutions) file and an N (neighbourhood) file using a random walk.
Performs a random walk in the search space and captures the neighbourhood in the process.
- Parameters:
sample_size (int) – Desired size of the sample.
step_size (Union[float, List[float]]) – A float (or integer) or array of floats (integers) specifying the step size of the random walk in each dimension. If dim > 1 and step_size is supplied as a single float, the same step size is assumed for all dimensions.
dim (int, optional) – The dimensionality of the sample, by default 1.
num_neighbours (int, optional) – Number of neighbours to sample, by default 1.
l_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the lower bound of the sample, by default 0. If an array is supplied, each element corresponds to a dimension. If dim > 1 and l_bound is supplied as a single float, the same bound is assumed for all dimensions.
u_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the upper bound of the sample, by default 100. If an array is supplied, each element corresponds to a dimension. If dim > 1 and u_bound is supplied as a single float, the same bound is assumed for all dimensions.
seed (int, optional) – Seed for the random number generator, for reproducibility, by default None.
- Returns:
pandas.DataFrame – A dataframe containing the solutions, i.e. an X file.
pandas.DataFrame – A dataframe defining the neighbourhood among the solutions, i.e. an N file.
Examples
Generating a 1-dimensional sample:
>>> from pyxla.sampling import random_walk_sampling
>>> X, N = random_walk_sampling(100, 5, dim=1, l_bound=0, u_bound=6)
Generating an n-dimensional sample:
>>> n, dim = 100, 2
>>> l_bound, u_bound, step = [0, 100], [100, 1000], [5, 100]
>>> X, N = random_walk_sampling(n, step, dim=dim, l_bound=l_bound, u_bound=u_bound)
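To make the idea of a bounded random walk that records neighbourhood concrete, here is a minimal pure-Python sketch. It is not pyxla's implementation: the function name `random_walk_sketch` is hypothetical, only scalar bounds and step sizes are handled, num_neighbours is omitted, each step is assumed to be a uniform perturbation clipped to the bounds, and plain lists replace the X and N dataframes.

```python
import random
from typing import List, Optional, Tuple


def random_walk_sketch(
    sample_size: int,
    step_size: float,
    dim: int = 1,
    l_bound: float = 0.0,
    u_bound: float = 100.0,
    seed: Optional[int] = None,
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Walk through the box [l_bound, u_bound]^dim and record who follows whom."""
    rng = random.Random(seed)
    current = [rng.uniform(l_bound, u_bound) for _ in range(dim)]
    X = [current]
    N = []
    for i in range(1, sample_size):
        # Assumption: perturb each coordinate by at most step_size, then clip.
        current = [
            min(u_bound, max(l_bound, c + rng.uniform(-step_size, step_size)))
            for c in current
        ]
        X.append(current)
        # Consecutive solutions of the walk are treated as neighbours.
        N.append((i - 1, i))
    return X, N


X, N = random_walk_sketch(10, step_size=1.0, dim=2, l_bound=-5.0, u_bound=5.0, seed=0)
```

Here each pair in `N` holds the row indices of two consecutive solutions in `X`, which is one simple way to encode the neighbourhood captured during the walk.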
- pyxla.sampling.hilbert_curve_sampling(sample_size: int, dim: int = 2, l_bound: float | List[float] = 0, u_bound: float | List[float] = 10, std_dev: float = 0.3, seed: int = None) → Tuple[DataFrame, DataFrame]
Generate a sample using the Hilbert curve.
A Hilbert curve is a space-filling curve described by David Hilbert in 1891. It has been shown to be a good alternative to random sampling and Latin hypercube sampling [2]. It is applicable to the generation of multidimensional samples. To add stochasticity, points are sampled around the Hilbert curve vertices using a normal distribution.
- Parameters:
sample_size (int) – Desired size of the sample.
dim (int, optional) – The dimensionality of the sample, by default 2.
l_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the lower bound of the sample, by default 0. If an array is supplied, each element corresponds to a dimension. If l_bound is supplied as a single float, the same bound is assumed for all dimensions.
u_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the upper bound of the sample, by default 10. If an array is supplied, each element corresponds to a dimension. If u_bound is supplied as a single float, the same bound is assumed for all dimensions.
std_dev (float, optional) – Standard deviation used when sampling points around the Hilbert curve vertices, by default 0.3, chosen empirically; see [2].
seed (int, optional) – Seed for the random number generator, for reproducibility, by default None.
- Returns:
pandas.DataFrame – A dataframe containing the solutions, i.e. an X file.
pandas.DataFrame – A dataframe defining the neighbourhood among the solutions, i.e. an N file.
- Raises:
Exception – Raised if the dimension dim is below 2. A Hilbert curve of dimension 1 is just a number line.
Examples
>>> import numpy as np
>>> from pyxla import sampling
>>> n, dim = 100, 2
>>> l_bound, u_bound = np.array([0, 100]), np.array([100, 1000])
>>> X, N = sampling.hilbert_curve_sampling(n, dim, l_bound, u_bound)
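The idea of jittering points around Hilbert curve vertices can be sketched in two dimensions with the standard library only. This is not pyxla's implementation: the function names are hypothetical, the curve is restricted to 2-D, the bounds are scalars, and applying std_dev in grid units before rescaling and clipping is an assumption of the sketch. The index-to-coordinate conversion is the well-known iterative algorithm for a square grid whose side is a power of two.

```python
import random
from typing import List, Optional, Tuple


def hilbert_d2xy(side: int, d: int) -> Tuple[int, int]:
    """Map distance d along the curve to a cell (x, y) of a side x side grid."""
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Reflect the sub-square before swapping the axes.
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_sample_sketch(
    sample_size: int,
    l_bound: float = 0.0,
    u_bound: float = 10.0,
    std_dev: float = 0.3,
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Draw 2-D points near evenly spaced Hilbert curve vertices."""
    rng = random.Random(seed)
    side = 2
    while side * side < sample_size:
        side *= 2  # smallest power-of-two grid with enough vertices
    total = side * side
    points = []
    for i in range(sample_size):
        vertex = hilbert_d2xy(side, i * total // sample_size)
        point = []
        for v in vertex:
            noisy = v + rng.gauss(0.0, std_dev)  # jitter in grid units (assumption)
            unit = min(1.0, max(0.0, (noisy + 0.5) / side))
            point.append(l_bound + unit * (u_bound - l_bound))
        points.append(point)
    return points


points = hilbert_sample_sketch(20, l_bound=-5.0, u_bound=5.0, seed=0)
```

Because consecutive vertices of the curve are adjacent grid cells, points generated in this order tend to stay close to their predecessors, which is what makes a neighbourhood easy to infer afterwards.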
- pyxla.sampling.hilbert_curve_neighbour_sampling(X: DataFrame, binary: bool = False) → DataFrame
Generate an N (neighbourhood) file using the Hilbert curve.
Maps samples from an n-dimensional space onto a 1-d Hilbert curve and infers neighbourhood from the resulting order. The inputs are rescaled. Taking a 1-d Hilbert curve [5, 2, 1, 7], the following set of neighbourhood pairs is inferred: [[5, 2], [2, 1], [1, 7]].
- Parameters:
X (pandas.DataFrame) – Dataframe containing the decision space variables, i.e. the X file.
binary (bool, optional) – Specify whether the sample is binary or not, by default False.
- Returns:
A sorted 2-d dataframe where, for each row, the solution in column id2 can be reached from the solution in column id1.
- Return type:
pandas.DataFrame
Examples
>>> from pyxla.util import load_sample
>>> from pyxla.sampling import hilbert_curve_neighbour_sampling
>>> sample = load_sample('nk_n14_k2_id5_F3_V2', test=True)
>>> N = hilbert_curve_neighbour_sampling(sample)
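The inference of neighbour pairs from a 1-d Hilbert order can be sketched as follows. The function names are hypothetical and this is not pyxla's implementation: the sketch is 2-D only and assumes the solutions have already been rescaled to integer cells of a square grid whose side is a power of two.

```python
from typing import Dict, List, Tuple


def hilbert_xy2d(side: int, x: int, y: int) -> int:
    """Map a cell (x, y) of a side x side grid to its distance along the curve."""
    d = 0
    s = side // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                # Reflect before swapping the axes.
                x, y = side - 1 - x, side - 1 - y
            x, y = y, x
        s //= 2
    return d


def chain_neighbours(order: List[int]) -> List[List[int]]:
    """Pair every solution id with the one that follows it in the 1-d order."""
    return [[a, b] for a, b in zip(order, order[1:])]


def hilbert_neighbours_sketch(cells: Dict[int, Tuple[int, int]], side: int) -> List[List[int]]:
    """Sort solution ids by Hilbert distance, then chain consecutive ids."""
    order = sorted(cells, key=lambda i: hilbert_xy2d(side, *cells[i]))
    return chain_neighbours(order)


pairs = chain_neighbours([5, 2, 1, 7])
```

With the order `[5, 2, 1, 7]` from the description above, `chain_neighbours` yields the pairs `[[5, 2], [2, 1], [1, 7]]`.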
- pyxla.sampling.adaptive_walk_sampling_continuous(objective: Callable[[List[float]], float], sample_size: int, step_size: float | List[float], maximise: bool = False, dim: int = 1, num_neighbours=1, step_retries=10, l_bound: float | List[float] = 0, u_bound: float | List[float] = 100, seed: int = None) → Tuple[DataFrame, DataFrame]
Sample via an adaptive walk.
Adaptive walk sampling uses the idea of an adaptive walk [1], which has some similarity to a random walk. In an adaptive walk, beginning from a random solution in the search space, the walk steps only to a neighbour that is fitter than the current solution. The algorithm proceeds by considering the newly added solution in turn and looking for a fitter neighbour. The process continues until the required number of samples has been obtained.
- Parameters:
objective (Callable[[List[float]], float]) – The objective function to be used in the adaptive walk.
sample_size (int) – Desired size of the sample.
step_size (Union[float, List[float]]) – A float (or integer) or array of floats (integers) specifying the step size of the adaptive walk in each dimension. If dim > 1 and step_size is supplied as a single float, the same step size is assumed for all dimensions.
maximise (bool, optional) – Whether the objective is maximised (True) or minimised (False), by default False.
dim (int, optional) – The dimensionality of the sample, by default 1.
num_neighbours (int, optional) – Number of neighbours to sample, by default 1.
step_retries (int, optional) – Number of times to try finding a fitter neighbour, by default 10. If this is exceeded, a neighbour is chosen at random and used to find fitter neighbours.
l_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the lower bound of the sample, by default 0. If an array is supplied, each element corresponds to a dimension. If dim > 1 and l_bound is supplied as a single float, the same bound is assumed for all dimensions.
u_bound (Union[float, List[float]], optional) – A float (or integer) or array of floats (integers) specifying the upper bound of the sample, by default 100. If an array is supplied, each element corresponds to a dimension. If dim > 1 and u_bound is supplied as a single float, the same bound is assumed for all dimensions.
seed (int, optional) – Seed for the random number generator, for reproducibility, by default None.
- Returns:
pandas.DataFrame – A dataframe containing the solutions, i.e. an X file.
pandas.DataFrame – A dataframe defining the neighbourhood among the solutions, i.e. an N file.
Examples
>>> from pyxla.sampling import adaptive_walk_sampling_continuous
>>> n, dim = 10, 2
>>> l_bound, u_bound, step = -5, 5, 0.1
>>> obj = lambda x: x[0] ** 2
>>> X, N = adaptive_walk_sampling_continuous(
...     obj, n, step, dim=dim, l_bound=l_bound, u_bound=u_bound, num_neighbours=3
... )
>>> X.shape
(10, 2)
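The adaptive walk described above (step only to a fitter neighbour, and fall back to a random neighbour once step_retries is exhausted) can be sketched in pure Python. This is an illustration under stated assumptions, not pyxla's implementation: the function name `adaptive_walk_sketch` and the `start` parameter are hypothetical, bounds and step size are scalars, num_neighbours is omitted, neighbours are assumed to be uniform perturbations clipped to the bounds, and lists stand in for the X and N dataframes.

```python
import random
from typing import Callable, List, Optional, Tuple


def adaptive_walk_sketch(
    objective: Callable[[List[float]], float],
    sample_size: int,
    step_size: float,
    maximise: bool = False,
    dim: int = 1,
    step_retries: int = 10,
    l_bound: float = 0.0,
    u_bound: float = 100.0,
    seed: Optional[int] = None,
    start: Optional[List[float]] = None,  # hypothetical: fix the first solution
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    rng = random.Random(seed)

    def neighbour(point: List[float]) -> List[float]:
        # Assumption: a neighbour is a clipped uniform perturbation of the point.
        return [
            min(u_bound, max(l_bound, c + rng.uniform(-step_size, step_size)))
            for c in point
        ]

    def fitter(a: List[float], b: List[float]) -> bool:
        return objective(a) > objective(b) if maximise else objective(a) < objective(b)

    if start is not None:
        current = list(start)
    else:
        current = [rng.uniform(l_bound, u_bound) for _ in range(dim)]
    X, N = [current], []
    while len(X) < sample_size:
        for _ in range(step_retries):
            candidate = neighbour(current)
            if fitter(candidate, current):
                break  # step to the first fitter neighbour found
        else:
            # Retries exhausted: accept a random neighbour and continue from it.
            candidate = neighbour(current)
        current = candidate
        X.append(current)
        N.append((len(X) - 2, len(X) - 1))
    return X, N


X, N = adaptive_walk_sketch(lambda x: x[0] ** 2, 10, 0.1, dim=2, l_bound=-5.0, u_bound=5.0, seed=0)
```

The `for ... else` construct keeps the fallback explicit: the `else` branch runs only when no fitter neighbour was found within step_retries attempts.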
References
[1] Stuart Kauffman and Simon Levin. Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology, 128(1):11–45, 1987.