{ "cells": [ { "cell_type": "markdown", "id": "1bc7af07", "metadata": {}, "source": [ "# Loading and Sampling" ] }, { "cell_type": "markdown", "id": "ded5760f", "metadata": {}, "source": [ "{nb-download}`Download this as a Jupyter notebook `" ] }, { "cell_type": "markdown", "id": "5e697210", "metadata": {}, "source": [ "This guide demonstrates how to load samples in `pyXla`." ] }, { "cell_type": "markdown", "id": "67dbd626", "metadata": {}, "source": [ "The first step in any landscape analysis with `pyXla` is loading a data sample.\n", "In `pyXla`, samples are specified using a regular Python dictionary." ] }, { "cell_type": "markdown", "id": "918f3ce7", "metadata": {}, "source": [ "The `pyXla` framework specifies six types of input, outlined in the table below:\n", "\n", "|Input|Description|Required|\n", "|-----|-----------|--------|\n", "|F|File or function specifying objective values|Yes|\n", "|X|File or domain specifying solutions|No|\n", "|V|File or function specifying violation values|No|\n", "|D|File or function specifying distance information|No|\n", "|N|File or function specifying neighbourhood information|No|\n", "|I|File specifying additional information (work in progress)|No|\n", "\n", "`F`, `X`, `V`, `D` and `N` are the keys in the Python dictionary used to define a sample. Additionally, the keys `max` and `representation` can be specified. Only the `F` key must be present." ] }, { "cell_type": "markdown", "id": "4c4c8f6d", "metadata": {}, "source": [ "`pyXla` allows users to load a sample in three ways, each corresponding to how values are specified for each input key (e.g. `F`, `X`):\n", "1. Using file paths or URLs to delimited data files, e.g. CSV files,\n", "2. By specifying `pandas` `DataFrames`,\n", "3. By specifying a function (for all keys except `X`; for `X` a [Sampler](#pyxla.sampling.Sampler) object is supplied).\n", "\n", "These methods can be mixed."
] }, { "cell_type": "markdown", "id": "25eb15be", "metadata": {}, "source": [ "To load a sample, import the `load_data` function:" ] }, { "cell_type": "code", "execution_count": 1, "id": "29f1a1c4", "metadata": {}, "outputs": [], "source": [ "from pyxla import load_data" ] }, { "cell_type": "markdown", "id": "e6a8a9c1", "metadata": {}, "source": [ "## Defining Samples" ] }, { "cell_type": "markdown", "id": "8d895148", "metadata": {}, "source": [ "### Method 1: Load sample using file paths or URLs" ] }, { "cell_type": "markdown", "id": "7f1c73b3", "metadata": {}, "source": [ "This is the simplest method." ] }, { "cell_type": "markdown", "id": "3a9b94d8", "metadata": {}, "source": [ "To define a sample using file paths or URLs, simply write a dictionary as below, supplying the path or URL for each key:" ] }, { "cell_type": "code", "execution_count": 2, "id": "dac86bf1", "metadata": {}, "outputs": [], "source": [ "sample = {\n", "    \"X\": \"../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_X.csv\",\n", "    \"F\": \"../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_F.csv\",\n", "    \"V\": \"../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_V.csv\",\n", "    \"D\": \"../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_D.csv\"\n", "}" ] }, { "cell_type": "markdown", "id": "84b99d3e", "metadata": {}, "source": [ "Once the sample is defined, use the `load_data` function to load the data." ] }, { "cell_type": "code", "execution_count": 3, "id": "ee0f2059", "metadata": {}, "outputs": [], "source": [ "load_data(sample)" ] }, { "cell_type": "markdown", "id": "39c8686d", "metadata": {}, "source": [ "The `load_data` function loads the data in place, so the `sample` variable then contains the `pyXla` sample, ready for analysis."
] }, { "cell_type": "markdown", "id": "d65bbc0c", "metadata": {}, "source": [ "We can check that the dataframe for the `F` input has indeed been loaded:" ] }, { "cell_type": "code", "execution_count": 4, "id": "ba834680", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " f1\n", "0 -0.000046\n", "1 -0.041758\n", "2 -0.034587\n", "3 -0.004379\n", "4 -0.012440" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample['F'].head()" ] }, { "cell_type": "markdown", "id": "120d1c10", "metadata": {}, "source": [ "### Method 2: Load by supplying `pandas` `DataFrames`" ] }, { "cell_type": "markdown", "id": "13d5abb6", "metadata": {}, "source": [ "This method is straightforward. Simply supply a dataframe for each key:" ] }, { "cell_type": "code", "execution_count": 5, "id": "e8b01937", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "X_df = pd.read_csv(\"../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_X.csv\", sep=\" \")\n", "F_df = pd.read_csv(\"../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_F.csv\", sep=\" \")" ] }, { "cell_type": "code", "execution_count": 6, "id": "2c8e3762", "metadata": {}, "outputs": [], "source": [ "sample = {\n", "    \"X\": X_df,\n", "    \"F\": F_df,\n", "}\n", "\n", "# load the sample\n", "load_data(sample)" ] }, { "cell_type": "markdown", "id": "096b4c20", "metadata": {}, "source": [ "### Method 3: Loading via domain and function specification" ] }, { "cell_type": "markdown", "id": "49fe0666", "metadata": {}, "source": [ "This method allows you to specify a sample *declaratively*. For the `X` key, one specifies the domain of the search space along with the sampling strategy via a [Sampler](#pyxla.sampling.Sampler) object.\n", "\n", "Currently `pyXla` supports three continuous sampling techniques:\n", "1. [Random walk sampling](#pyxla.sampling.random_walk_sampling),\n", "1. [Adaptive walk sampling](#pyxla.sampling.adaptive_walk_sampling_continuous), and\n", "1. [Hilbert curve sampling](#pyxla.sampling.hilbert_curve_sampling)."
] }, { "cell_type": "markdown", "id": "994c4030", "metadata": {}, "source": [ "To use random walk sampling, import the `RandomWalkSampler` class:" ] }, { "cell_type": "code", "execution_count": 7, "id": "31d2d9ec", "metadata": {}, "outputs": [], "source": [ "from pyxla.sampling import RandomWalkSampler" ] }, { "cell_type": "markdown", "id": "82ca7c74", "metadata": {}, "source": [ "Instantiate the sampler:" ] }, { "cell_type": "code", "execution_count": 8, "id": "987d9866", "metadata": {}, "outputs": [], "source": [ "sampler = RandomWalkSampler(\n", "    sample_size=100, step_size=1, dim=2, return_neighbourhood=True\n", ")" ] }, { "cell_type": "markdown", "id": "4be9cb9d", "metadata": {}, "source": [ "For the remaining input keys, Python functions are supplied. Thus, we can define and load a sample as:" ] }, { "cell_type": "code", "execution_count": 9, "id": "d41c745d", "metadata": {}, "outputs": [], "source": [ "import random\n", "\n", "sample = {\n", "    \"X\": sampler,\n", "    \"F\": lambda x: (x**2).sum(),\n", "    \"V\": lambda x: ((x**2).sum() - 3),\n", "    \"D\": lambda x1, x2: ((x1 - x2)**2).sum(),  # must take a pair of arguments and return a real number\n", "    \"N\": lambda x1, x2: random.choice([True, False])  # must take a pair of arguments and return a bool\n", "}\n", "\n", "# load the sample\n", "load_data(sample)" ] }, { "cell_type": "markdown", "id": "67d0d029", "metadata": {}, "source": [ "We can confirm that, for instance, the `N` input has been loaded." ] }, { "cell_type": "code", "execution_count": 10, "id": "edf63fb9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " id1 id2\n", "0 0 3\n", "1 0 4\n", "2 0 6\n", "3 0 7\n", "4 0 8" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample['N'].head()" ] }, { "cell_type": "markdown", "id": "308310c8", "metadata": {}, "source": [ "### All the methods can be mixed" ] }, { "cell_type": "markdown", "id": "7466b5ee", "metadata": {}, "source": [ "We can supply values to the input keys using file paths, URLs, dataframes and functions:" ] }, { "cell_type": "code", "execution_count": 11, "id": "92cc807b", "metadata": {}, "outputs": [], "source": [ "sample = {\n", "    \"X\": X_df,  # dataframe\n", "    \"F\": [lambda x: (x**2).sum(), lambda x: x.sum()],  # multiple objectives\n", "    \"max\": [True, False],\n", "    \"V\": lambda x: ((x**2).sum() - 3),\n", "    \"D\": \"../../../data/test_samples/cec2010_c01_2d_F1_V2/cec2010_c01_2d_F1_V2_D.csv\",  # file path\n", "    \"N\": lambda x1, x2: random.choice([True, False])\n", "}\n", "\n", "# load the sample\n", "load_data(sample)" ] }, { "cell_type": "markdown", "id": "ebb223b5", "metadata": {}, "source": [ "## The `max` and `representation` Keys" ] }, { "cell_type": "markdown", "id": "aafe3ab7", "metadata": {}, "source": [ "### The `max` key" ] }, { "cell_type": "markdown", "id": "19a2988f", "metadata": {}, "source": [ "The `max` key is used to specify whether the objective(s) should be maximised or not." ] }, { "cell_type": "markdown", "id": "a67440b9", "metadata": {}, "source": [ "If the `max` key is not supplied, `pyXla` assumes minimisation by default. To specify that the objective should be maximised, simply supply:" ] }, { "cell_type": "code", "execution_count": 12, "id": "79284e01", "metadata": {}, "outputs": [], "source": [ "sample = {\n", "    #...\n", "    \"max\": True\n", "    #...\n", "}" ] }, { "cell_type": "markdown", "id": "db66f1f7", "metadata": {}, "source": [ "If there are multiple objectives, you can specify a boolean value for each, or supply a single boolean value if all objectives are to be treated the same way:" ] }, { "cell_type": "code", "execution_count": 13, "id": "85a3ab1e", "metadata": {}, "outputs": [], "source": [ "sample = {\n", "    #...\n", "    \"max\": [True, False]\n", "    #...\n", "}" ] }, { "cell_type": "markdown", "id": "f28bd067", "metadata": {}, "source": [ "### The `representation` key" ] }, { "cell_type": "markdown", "id": "00254613", "metadata": {}, "source": [ "The `representation` key is (optionally) used to specify whether the sample is from a discrete or continuous domain.\n", "\n", "It can take one of the following strings as its value: `discrete`, `binary` or `continuous`. If specified, this allows `pyXla` to use sensible defaults when computing information such as distance. For instance:" ] }, { "cell_type": "code", "execution_count": 14, "id": "1a8f93b7", "metadata": {}, "outputs": [], "source": [ "sample = {\n", "    #...sample has no D input\n", "    \"representation\": \"continuous\"\n", "    #...\n", "}" ] }, { "cell_type": "markdown", "id": "2751fcce", "metadata": {}, "source": [ "Suppose that the sample above has no `D` input, i.e. it lacks distance information. If the `representation` key is specified, `pyXla` will use Euclidean distance by default to compute distance information, should a feature that utilises distance be invoked."
] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.14.4" } }, "nbformat": 4, "nbformat_minor": 5 }