Merged
3 changes: 2 additions & 1 deletion cyclops/utils/file.py
@@ -6,6 +6,7 @@
 from typing import Any, Generator, List, Optional

 import numpy as np
+import numpy.typing as npt
 import pandas as pd

 from cyclops.utils.log import setup_logging
@@ -208,7 +209,7 @@ def load_dataframe(


 def save_array(
-    data: np.typing.ArrayLike,
+    data: npt.ArrayLike,
     save_path: str,
     file_format: str = "npy",
     log: bool = True,
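The change above switches the annotation to the conventional `numpy.typing` alias import. A minimal sketch of how such an annotation is used (a hypothetical helper, not the full `save_array`):

```python
import numpy as np
import numpy.typing as npt


def to_array(data: npt.ArrayLike) -> np.ndarray:
    """Accept any array-like input (list, tuple, scalar, ndarray).

    npt.ArrayLike annotates inputs that np.asarray can convert; the
    short `npt` alias avoids repeating `np.typing` in signatures.
    """
    return np.asarray(data)
```

For example, `to_array([1, 2, 3])` yields a 1-D ndarray of length 3, and a bare scalar yields a 0-D array.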
14 changes: 14 additions & 0 deletions docs/source/evaluation.rst
@@ -1,2 +1,16 @@
Evaluation
==========

The Evaluation API equips you with a rich toolbox to assess your models across key dimensions. Dive into detailed performance metrics, unveil potential fairness concerns, and gain granular insights through data slicing.

Key capabilities:

* Performance: Employ a robust selection of common metrics to evaluate your model's effectiveness and identify areas for improvement.
* Fairness: Uncover and analyze potential biases within your model to ensure responsible and equitable outcomes.
* Data slicing: Isolate the model's behavior on specific subsets of your data, revealing performance nuances across demographics, features, or other important characteristics.

Follow the example below for step-by-step instructions:

.. toctree::

examples/metrics.ipynb
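The data-slicing capability described above can be illustrated with a framework-free sketch (plain Python over toy records; the actual Cyclops `SliceSpec` API is shown in the example notebook):

```python
# Toy records: one numeric feature, a label, and a prediction each.
records = [
    {"radius": 12.0, "target": 1, "pred": 1},
    {"radius": 14.5, "target": 0, "pred": 1},
    {"radius": 20.0, "target": 0, "pred": 0},
    {"radius": 33.0, "target": 1, "pred": 1},
]


def accuracy(rows):
    # Fraction of rows where the prediction matches the label.
    return sum(r["target"] == r["pred"] for r in rows) / len(rows)


# Two half-open ranges on "radius", mirroring a min/max slice spec.
slices = {
    "radius:[10, 15)": [r for r in records if 10.0 <= r["radius"] < 15.0],
    "radius:[15, 37)": [r for r in records if 15.0 <= r["radius"] < 37.0],
}
per_slice = {name: accuracy(rows) for name, rows in slices.items()}
```

Evaluating per slice rather than overall can reveal that a model with good aggregate accuracy still underperforms on a particular subset.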
302 changes: 302 additions & 0 deletions docs/source/examples/metrics.ipynb
@@ -0,0 +1,302 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Breast Cancer Classification and Evaluation\n",
"\n",
"The Breast Cancer dataset is a well-suited example for demonstrating Cyclops features due to its two distinct classes (binary classification) and complete absence of missing values. This clean and organized structure makes it an ideal starting point for exploring Cyclops Evaluator."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from datasets.arrow_dataset import Dataset\n",
"from sklearn import datasets\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.svm import SVC\n",
"\n",
"from cyclops.data.slicer import SliceSpec\n",
"from cyclops.evaluate import evaluator\n",
"from cyclops.evaluate.fairness import evaluate_fairness\n",
"from cyclops.evaluate.metrics import BinaryAccuracy, create_metric\n",
"from cyclops.evaluate.metrics.experimental import BinaryAUROC, BinaryAveragePrecision\n",
"from cyclops.evaluate.metrics.experimental.metric_dict import MetricDict"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Loading the data\n",
"breast_cancer_data = datasets.load_breast_cancer(as_frame=True)\n",
"X, y = breast_cancer_data.data, breast_cancer_data.target"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Features\n",
"Let's take a quick look at the features and their summary statistics."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = breast_cancer_data.frame\n",
"df.describe().T"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Splitting into train and test\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
" X,\n",
" y,\n",
" test_size=0.1,\n",
" random_state=13,\n",
")\n",
"\n",
"# Use SVM classifier for binary classification\n",
"svc = SVC(C=10, gamma=0.01, probability=True)\n",
"svc.fit(X_train, y_train)\n",
"\n",
"# model predictions\n",
"y_pred = svc.predict(X_test)\n",
"y_pred_prob = svc.predict_proba(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use Cyclops evaluation metrics to assess our model's performance. You can either call each metric individually or define a ``MetricDict`` collection.\n",
"Here, we show both methods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Individual Metrics\n",
"If you need only a single metric, create an instance of the desired metric class and call it on your ground truth and predictions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bin_acc_metric = BinaryAccuracy()\n",
"bin_acc_metric(y_test.values, np.float64(y_pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using ``MetricDict``\n",
"If you need several metrics, you can define them as a collection, which also speeds up metric computation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"metric_names = [\n",
" \"binary_accuracy\",\n",
" \"binary_precision\",\n",
" \"binary_recall\",\n",
" \"binary_f1_score\",\n",
" \"binary_roc_curve\",\n",
"]\n",
"metrics = [\n",
" create_metric(metric_name, experimental=True) for metric_name in metric_names\n",
"]\n",
"metric_collection = MetricDict(metrics)\n",
"metric_collection(y_test.values, np.float64(y_pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can reset the metric collection and add other metrics:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"metric_collection.reset()\n",
"metric_collection.add_metrics(BinaryAveragePrecision(), BinaryAUROC())\n",
"metric_collection(y_test.values, np.float64(y_pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data Slicing\n",
"\n",
"In addition to overall metrics, it is often useful to see how the model performs on certain subpopulations or subsets of the data. We can define these subsets using ``SliceSpec`` objects."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"spec_list = [\n",
" {\n",
" \"worst radius\": {\n",
" \"min_value\": 10.0,\n",
" \"max_value\": 15.0,\n",
" \"min_inclusive\": True,\n",
" \"max_inclusive\": False,\n",
" },\n",
" },\n",
" {\n",
" \"worst radius\": {\n",
" \"min_value\": 15.0,\n",
" \"max_value\": 37.0,\n",
" \"min_inclusive\": True,\n",
" \"max_inclusive\": False,\n",
" },\n",
" },\n",
"]\n",
"slice_spec = SliceSpec(spec_list)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preparing Result\n",
"\n",
"The Cyclops evaluator takes data as a Hugging Face `Dataset` object, so we combine the predictions and features into a dataframe and create a `Dataset` from it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Combine result and features for test data\n",
"df = pd.concat([X_test, pd.DataFrame(y_test, columns=[\"target\"])], axis=1)\n",
"df[\"preds\"] = y_pred\n",
"df[\"preds_prob\"] = y_pred_prob[:, 1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create Dataset object\n",
"breast_cancer_data = Dataset.from_pandas(df)\n",
"\n",
"breast_cancer_sliced_result = evaluator.evaluate(\n",
" dataset=breast_cancer_data,\n",
" metrics=metric_collection, # type: ignore[list-item]\n",
" target_columns=\"target\",\n",
" prediction_columns=\"preds_prob\",\n",
" slice_spec=slice_spec,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And here's the evaluation result for the data slices we defined:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"breast_cancer_sliced_result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fairness Evaluator\n",
"\n",
"The Breast Cancer dataset is not an ideal candidate for fairness analysis, but to demonstrate how you can use our fairness evaluator, we apply it to the `mean texture` feature. It is recommended to use the evaluator on features with discrete values; for optimal results, the feature should have fewer than 50 unique values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fairness_result = evaluate_fairness(\n",
" dataset=breast_cancer_data,\n",
" metrics=\"binary_precision\", # type: ignore[list-item]\n",
" groups=\"mean texture\",\n",
" target_columns=\"target\",\n",
" prediction_columns=\"preds_prob\",\n",
")\n",
"fairness_result"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "cyclops",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
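The fairness evaluation in the notebook groups the dataset by a feature and compares a metric across the resulting groups. A framework-free sketch of that idea in plain Python (toy data; an illustration of the concept, not the `evaluate_fairness` implementation):

```python
from collections import defaultdict

# Toy records: a grouping feature, a label, and a prediction each.
records = [
    {"group": "a", "target": 1, "pred": 1},
    {"group": "a", "target": 0, "pred": 1},
    {"group": "b", "target": 1, "pred": 1},
    {"group": "b", "target": 0, "pred": 0},
]


def precision(rows):
    # Precision = true positives / (true positives + false positives).
    tp = sum(r["target"] == 1 and r["pred"] == 1 for r in rows)
    fp = sum(r["target"] == 0 and r["pred"] == 1 for r in rows)
    return tp / (tp + fp) if tp + fp else 0.0


# Bucket records by the grouping feature, then score each group.
by_group = defaultdict(list)
for r in records:
    by_group[r["group"]].append(r)

per_group = {g: precision(rows) for g, rows in by_group.items()}

# One simple disparity measure: the largest gap between group scores.
gap = max(per_group.values()) - min(per_group.values())
```

A large gap between groups flags a potential fairness concern worth investigating, which is why discrete, low-cardinality features make the most interpretable grouping variables.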
1 change: 1 addition & 0 deletions docs/source/tutorials.rst
@@ -2,5 +2,6 @@ Tutorials
=========

.. toctree::
:maxdepth: 3

tutorials_use_cases