{ "cells": [ { "cell_type": "markdown", "id": "95097ed3", "metadata": {}, "source": [ "# Auxiliary Functions" ] }, { "cell_type": "markdown", "id": "2d5148d1", "metadata": {}, "source": [ "{nb-download}`Download this as a Jupyter notebook `" ] }, { "cell_type": "markdown", "id": "7dc47ae6", "metadata": {}, "source": [ "This notebook covers the usage of the auxiliary function provided in `pyXla`." ] }, { "cell_type": "markdown", "id": "6c4b2b10", "metadata": {}, "source": [ "A key feature of the `pyXla` framework is the separation of sampling and analysis. A set of functions are provided to support sampling. They are:\n", "\n", "1. {func}`pyxla.util.sample_X`\n", "1. {func}`pyxla.util.compute_F`\n", "1. {func}`pyxla.util.compute_V`\n", "1. {func}`pyxla.util.compute_D`\n", "1. {func}`pyxla.util.compute_N`\n", "\n", "Each function corresponds to a given input file as indicated by it suffix.\n", "\n", "When a sample is loaded declaratively via domain and function specification (method 3 in {doc}`loading_and_sampling`), these function are used under the hood. These function are available to the user for finer control over sampling." ] }, { "cell_type": "markdown", "id": "8410c616", "metadata": {}, "source": [ "The auxiliary functions are all imported from {mod}`pyxla.util`:" ] }, { "cell_type": "code", "execution_count": 1, "id": "3d0b6a3f", "metadata": {}, "outputs": [], "source": [ "from pyxla.util import sample_X, compute_F, compute_V, compute_D, compute_N" ] }, { "cell_type": "markdown", "id": "9c293d9f", "metadata": {}, "source": [ "One can generate an `X` file as below:" ] }, { "cell_type": "code", "execution_count": 2, "id": "428cac3a", "metadata": {}, "outputs": [ { "data": { "application/vnd.microsoft.datawrangler.viewer.v0+json": { "columns": [ { "name": "index", "rawType": "int64", "type": "integer" }, { "name": "x0", "rawType": "float64", "type": "float" }, { "name": "x1", "rawType": "float64", "type": "float" } ], "ref": "6f69164d-4090-40ef-beaa-550f15a9d479", "rows": [ [ "0", "2.206135162903853", "9.199149398146837" ], [ "1", "4.500343908200286", "20.740091685361307" ], [ "2", "17.21359626961158", "19.32797050174577" ], [ "3", "23.857364297488484", "12.213632360669257" ], [ "4", "26.52996954877949", "1.394386703189455" ] ], "shape": { "columns": 2, "rows": 5 } }, "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
x0x1
02.2061359.199149
14.50034420.740092
217.21359619.327971
323.85736412.213632
426.5299701.394387
\n", "
" ], "text/plain": [ " x0 x1\n", "0 2.206135 9.199149\n", "1 4.500344 20.740092\n", "2 17.213596 19.327971\n", "3 23.857364 12.213632\n", "4 26.529970 1.394387" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pyxla.sampling import HilbertCurveSampler\n", "\n", "sampler = HilbertCurveSampler(\n", " sample_size=100, dim=2, return_neighbourhood=True # will return an N file too\n", ")\n", "\n", "X, N = sample_X(sampler)\n", "X.head()" ] }, { "cell_type": "markdown", "id": "98dffd81", "metadata": {}, "source": [ "Specifying `return_neighbourhood=True` generates an `N` file as well:" ] }, { "cell_type": "code", "execution_count": 3, "id": "b7c28974", "metadata": {}, "outputs": [ { "data": { "application/vnd.microsoft.datawrangler.viewer.v0+json": { "columns": [ { "name": "index", "rawType": "int64", "type": "integer" }, { "name": "id1", "rawType": "int64", "type": "integer" }, { "name": "id2", "rawType": "int64", "type": "integer" } ], "ref": "f78ff6c3-5133-490b-a415-3e3ca2f54bb1", "rows": [ [ "0", "0", "1" ], [ "1", "1", "2" ], [ "2", "2", "3" ], [ "3", "3", "4" ], [ "4", "4", "5" ] ], "shape": { "columns": 2, "rows": 5 } }, "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id1id2
001
112
223
334
445
\n", "
" ], "text/plain": [ " id1 id2\n", "0 0 1\n", "1 1 2\n", "2 2 3\n", "3 3 4\n", "4 4 5" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N.head()" ] }, { "cell_type": "markdown", "id": "bae808cc", "metadata": {}, "source": [ "The `F` file can be generated from the `X` file by specifying an objective function or multiple objective functions:" ] }, { "cell_type": "code", "execution_count": 4, "id": "1fb486af", "metadata": {}, "outputs": [ { "data": { "application/vnd.microsoft.datawrangler.viewer.v0+json": { "columns": [ { "name": "index", "rawType": "int64", "type": "integer" }, { "name": "f0", "rawType": "float64", "type": "float" }, { "name": "f1", "rawType": "float64", "type": "float" } ], "ref": "d652a434-5724-4206-a641-96332e48f56d", "rows": [ [ "0", "89.49138200642612", "11.405284561050689" ], [ "1", "450.4044984092687", "25.240435593561592" ], [ "2", "669.8783402495403", "36.54156677135735" ], [ "3", "718.3466466646655", "36.07099665815774" ], [ "4", "705.7835985371986", "27.924356251968945" ] ], "shape": { "columns": 2, "rows": 5 } }, "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
f0f1
089.49138211.405285
1450.40449825.240436
2669.87834036.541567
3718.34664736.070997
4705.78359927.924356
\n", "
" ], "text/plain": [ " f0 f1\n", "0 89.491382 11.405285\n", "1 450.404498 25.240436\n", "2 669.878340 36.541567\n", "3 718.346647 36.070997\n", "4 705.783599 27.924356" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def sphere(X):\n", " return X[0]**2 + X[1]**2\n", "\n", "def summation(X):\n", " return X.sum()\n", "\n", "F = compute_F([sphere, summation], X)\n", "F.head()" ] }, { "cell_type": "markdown", "id": "e5c9938b", "metadata": {}, "source": [ "The process is similar for the `V` file:" ] }, { "cell_type": "code", "execution_count": 5, "id": "f70200c0", "metadata": {}, "outputs": [ { "data": { "application/vnd.microsoft.datawrangler.viewer.v0+json": { "columns": [ { "name": "index", "rawType": "int64", "type": "integer" }, { "name": "v0", "rawType": "float64", "type": "float" } ], "ref": "39bff0e1-2215-40a1-a231-a70a552fbdd4", "rows": [ [ "0", "87.49138200642612" ], [ "1", "448.4044984092687" ], [ "2", "667.8783402495403" ], [ "3", "716.3466466646655" ], [ "4", "703.7835985371986" ] ], "shape": { "columns": 1, "rows": 5 } }, "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
v0
087.491382
1448.404498
2667.878340
3716.346647
4703.783599
\n", "
" ], "text/plain": [ " v0\n", "0 87.491382\n", "1 448.404498\n", "2 667.878340\n", "3 716.346647\n", "4 703.783599" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "V = compute_V([lambda X: sphere(X) - 2], X)\n", "V.head()" ] }, { "cell_type": "markdown", "id": "4eb5012b", "metadata": {}, "source": [ "Computing a `D` file is straightforward:" ] }, { "cell_type": "code", "execution_count": 6, "id": "077bfb78", "metadata": {}, "outputs": [ { "data": { "application/vnd.microsoft.datawrangler.viewer.v0+json": { "columns": [ { "name": "('id1', 'id2')", "rawType": "object", "type": "unknown" }, { "name": "d", "rawType": "float64", "type": "float" } ], "ref": "c190d694-7f72-4052-810c-8a25ed6496f0", "rows": [ [ "(np.int64(0), np.int64(1))", "0.7275671928410468" ], [ "(np.int64(0), np.int64(2))", "1.1278538383433099" ], [ "(np.int64(0), np.int64(3))", "0.9714903533031485" ], [ "(np.int64(0), np.int64(4))", "1.5832031549975716" ], [ "(np.int64(0), np.int64(5))", "1.3069942877601373" ] ], "shape": { "columns": 1, "rows": 5 } }, "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
d
id1id2
010.727567
21.127854
30.971490
41.583203
51.306994
\n", "
" ], "text/plain": [ " d\n", "id1 id2 \n", "0 1 0.727567\n", " 2 1.127854\n", " 3 0.971490\n", " 4 1.583203\n", " 5 1.306994" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "D = compute_D({\"X\": X}, metric='canberra') # you can define a function or specify any of scipy's distance functions\n", "D.head()" ] }, { "cell_type": "markdown", "id": "801a7b8b", "metadata": {}, "source": [ "The `N` file is equally straightforward; either specify a neighbourhood function of supply the literal `\"hilbert-curve\"` to use the Hilbert curve to efficiently generate neighbourhood information (see {func}`pyxla.sampling.hilbert_curve_neighbour_sampling`)." ] }, { "cell_type": "code", "execution_count": 7, "id": "3b210d54", "metadata": {}, "outputs": [ { "data": { "application/vnd.microsoft.datawrangler.viewer.v0+json": { "columns": [ { "name": "index", "rawType": "int64", "type": "integer" }, { "name": "id1", "rawType": "int64", "type": "integer" }, { "name": "id2", "rawType": "int64", "type": "integer" } ], "ref": "e589e661-e382-4b3e-9014-7279f41a06c7", "rows": [ [ "0", "0", "1" ], [ "1", "0", "2" ], [ "2", "0", "8" ], [ "3", "0", "9" ], [ "4", "0", "11" ] ], "shape": { "columns": 2, "rows": 5 } }, "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
id1id2
001
102
208
309
4011
\n", "
" ], "text/plain": [ " id1 id2\n", "0 0 1\n", "1 0 2\n", "2 0 8\n", "3 0 9\n", "4 0 11" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def randomly_neighbours(a, b):\n", " import random\n", " return random.choice([True, False])\n", "\n", "N = compute_N({\"X\": X}, neighbourhood_func=randomly_neighbours)\n", "N.head()" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 5 }