{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "95097ed3",
   "metadata": {},
   "source": [
    "# Auxiliary Functions"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d5148d1",
   "metadata": {},
   "source": [
    "{nb-download}`Download this as a Jupyter notebook <auxiliary_functions.ipynb>`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7dc47ae6",
   "metadata": {},
   "source": [
    "This notebook covers the usage of the auxiliary function provided in `pyXla`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c4b2b10",
   "metadata": {},
   "source": [
    "A key feature of the `pyXla` framework is the separation of sampling and analysis. A set of functions are provided to support sampling. They are:\n",
    "\n",
    "1. {func}`pyxla.util.sample_X`\n",
    "1. {func}`pyxla.util.compute_F`\n",
    "1. {func}`pyxla.util.compute_V`\n",
    "1. {func}`pyxla.util.compute_D`\n",
    "1. {func}`pyxla.util.compute_N`\n",
    "\n",
    "Each function corresponds to a given input file as indicated by it suffix.\n",
    "\n",
    "When a sample is loaded declaratively via domain and function specification (method 3 in {doc}`loading_and_sampling`), these function are used under the hood. These function are available to the user for finer control over sampling."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8410c616",
   "metadata": {},
   "source": [
    "The auxiliary functions are all imported from {mod}`pyxla.util`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "3d0b6a3f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyxla.util import sample_X, compute_F, compute_V, compute_D, compute_N"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c293d9f",
   "metadata": {},
   "source": [
    "One can generate an `X` file as below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "428cac3a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "x0",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "x1",
         "rawType": "float64",
         "type": "float"
        }
       ],
       "ref": "6f69164d-4090-40ef-beaa-550f15a9d479",
       "rows": [
        [
         "0",
         "2.206135162903853",
         "9.199149398146837"
        ],
        [
         "1",
         "4.500343908200286",
         "20.740091685361307"
        ],
        [
         "2",
         "17.21359626961158",
         "19.32797050174577"
        ],
        [
         "3",
         "23.857364297488484",
         "12.213632360669257"
        ],
        [
         "4",
         "26.52996954877949",
         "1.394386703189455"
        ]
       ],
       "shape": {
        "columns": 2,
        "rows": 5
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>x0</th>\n",
       "      <th>x1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2.206135</td>\n",
       "      <td>9.199149</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.500344</td>\n",
       "      <td>20.740092</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>17.213596</td>\n",
       "      <td>19.327971</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>23.857364</td>\n",
       "      <td>12.213632</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>26.529970</td>\n",
       "      <td>1.394387</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          x0         x1\n",
       "0   2.206135   9.199149\n",
       "1   4.500344  20.740092\n",
       "2  17.213596  19.327971\n",
       "3  23.857364  12.213632\n",
       "4  26.529970   1.394387"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pyxla.sampling import HilbertCurveSampler\n",
    "\n",
    "sampler = HilbertCurveSampler(\n",
    "    sample_size=100, dim=2, return_neighbourhood=True # will return an N file too\n",
    ")\n",
    "\n",
    "X, N = sample_X(sampler)\n",
    "X.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98dffd81",
   "metadata": {},
   "source": [
    "Specifying `return_neighbourhood=True` generates an `N` file as well:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "b7c28974",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "id1",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "id2",
         "rawType": "int64",
         "type": "integer"
        }
       ],
       "ref": "f78ff6c3-5133-490b-a415-3e3ca2f54bb1",
       "rows": [
        [
         "0",
         "0",
         "1"
        ],
        [
         "1",
         "1",
         "2"
        ],
        [
         "2",
         "2",
         "3"
        ],
        [
         "3",
         "3",
         "4"
        ],
        [
         "4",
         "4",
         "5"
        ]
       ],
       "shape": {
        "columns": 2,
        "rows": 5
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id1</th>\n",
       "      <th>id2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id1  id2\n",
       "0    0    1\n",
       "1    1    2\n",
       "2    2    3\n",
       "3    3    4\n",
       "4    4    5"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "N.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bae808cc",
   "metadata": {},
   "source": [
    "The `F` file can be generated from the `X` file by specifying an objective function or multiple objective functions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "1fb486af",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "f0",
         "rawType": "float64",
         "type": "float"
        },
        {
         "name": "f1",
         "rawType": "float64",
         "type": "float"
        }
       ],
       "ref": "d652a434-5724-4206-a641-96332e48f56d",
       "rows": [
        [
         "0",
         "89.49138200642612",
         "11.405284561050689"
        ],
        [
         "1",
         "450.4044984092687",
         "25.240435593561592"
        ],
        [
         "2",
         "669.8783402495403",
         "36.54156677135735"
        ],
        [
         "3",
         "718.3466466646655",
         "36.07099665815774"
        ],
        [
         "4",
         "705.7835985371986",
         "27.924356251968945"
        ]
       ],
       "shape": {
        "columns": 2,
        "rows": 5
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>f0</th>\n",
       "      <th>f1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>89.491382</td>\n",
       "      <td>11.405285</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>450.404498</td>\n",
       "      <td>25.240436</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>669.878340</td>\n",
       "      <td>36.541567</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>718.346647</td>\n",
       "      <td>36.070997</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>705.783599</td>\n",
       "      <td>27.924356</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           f0         f1\n",
       "0   89.491382  11.405285\n",
       "1  450.404498  25.240436\n",
       "2  669.878340  36.541567\n",
       "3  718.346647  36.070997\n",
       "4  705.783599  27.924356"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def sphere(X):\n",
    "    return X[0]**2 + X[1]**2\n",
    "\n",
    "def summation(X):\n",
    "    return X.sum()\n",
    "\n",
    "F = compute_F([sphere, summation], X)\n",
    "F.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5c9938b",
   "metadata": {},
   "source": [
    "The process is similar for the `V` file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "f70200c0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "v0",
         "rawType": "float64",
         "type": "float"
        }
       ],
       "ref": "39bff0e1-2215-40a1-a231-a70a552fbdd4",
       "rows": [
        [
         "0",
         "87.49138200642612"
        ],
        [
         "1",
         "448.4044984092687"
        ],
        [
         "2",
         "667.8783402495403"
        ],
        [
         "3",
         "716.3466466646655"
        ],
        [
         "4",
         "703.7835985371986"
        ]
       ],
       "shape": {
        "columns": 1,
        "rows": 5
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>v0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>87.491382</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>448.404498</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>667.878340</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>716.346647</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>703.783599</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           v0\n",
       "0   87.491382\n",
       "1  448.404498\n",
       "2  667.878340\n",
       "3  716.346647\n",
       "4  703.783599"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "V = compute_V([lambda X: sphere(X) - 2], X)\n",
    "V.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4eb5012b",
   "metadata": {},
   "source": [
    "Computing a `D` file is straightforward:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "077bfb78",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "('id1', 'id2')",
         "rawType": "object",
         "type": "unknown"
        },
        {
         "name": "d",
         "rawType": "float64",
         "type": "float"
        }
       ],
       "ref": "c190d694-7f72-4052-810c-8a25ed6496f0",
       "rows": [
        [
         "(np.int64(0), np.int64(1))",
         "0.7275671928410468"
        ],
        [
         "(np.int64(0), np.int64(2))",
         "1.1278538383433099"
        ],
        [
         "(np.int64(0), np.int64(3))",
         "0.9714903533031485"
        ],
        [
         "(np.int64(0), np.int64(4))",
         "1.5832031549975716"
        ],
        [
         "(np.int64(0), np.int64(5))",
         "1.3069942877601373"
        ]
       ],
       "shape": {
        "columns": 1,
        "rows": 5
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>id1</th>\n",
       "      <th>id2</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">0</th>\n",
       "      <th>1</th>\n",
       "      <td>0.727567</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.127854</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.971490</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.583203</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1.306994</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                d\n",
       "id1 id2          \n",
       "0   1    0.727567\n",
       "    2    1.127854\n",
       "    3    0.971490\n",
       "    4    1.583203\n",
       "    5    1.306994"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "D = compute_D({\"X\": X}, metric='canberra') # you can define a function or specify any of scipy's distance functions\n",
    "D.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "801a7b8b",
   "metadata": {},
   "source": [
    "The `N` file is equally straightforward; either specify a neighbourhood function of supply the literal `\"hilbert-curve\"` to use the Hilbert curve to efficiently generate neighbourhood information (see {func}`pyxla.sampling.hilbert_curve_neighbour_sampling`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "3b210d54",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.microsoft.datawrangler.viewer.v0+json": {
       "columns": [
        {
         "name": "index",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "id1",
         "rawType": "int64",
         "type": "integer"
        },
        {
         "name": "id2",
         "rawType": "int64",
         "type": "integer"
        }
       ],
       "ref": "e589e661-e382-4b3e-9014-7279f41a06c7",
       "rows": [
        [
         "0",
         "0",
         "1"
        ],
        [
         "1",
         "0",
         "2"
        ],
        [
         "2",
         "0",
         "8"
        ],
        [
         "3",
         "0",
         "9"
        ],
        [
         "4",
         "0",
         "11"
        ]
       ],
       "shape": {
        "columns": 2,
        "rows": 5
       }
      },
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id1</th>\n",
       "      <th>id2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id1  id2\n",
       "0    0    1\n",
       "1    0    2\n",
       "2    0    8\n",
       "3    0    9\n",
       "4    0   11"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def randomly_neighbours(a, b):\n",
    "    import random\n",
    "    return random.choice([True, False])\n",
    "\n",
    "N = compute_N({\"X\": X}, neighbourhood_func=randomly_neighbours)\n",
    "N.head()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}