Merged
3 changes: 2 additions & 1 deletion cyclops/utils/file.py
@@ -6,6 +6,7 @@
 from typing import Any, Generator, List, Optional

 import numpy as np
+import numpy.typing as npt
 import pandas as pd

 from cyclops.utils.log import setup_logging
@@ -208,7 +209,7 @@ def load_dataframe(


 def save_array(
-    data: np.typing.ArrayLike,
+    data: npt.ArrayLike,
     save_path: str,
     file_format: str = "npy",
     log: bool = True,
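The change above switches the annotation to the conventional `numpy.typing` alias import. A minimal sketch of how such an annotation is used (a hypothetical helper, not the full `save_array`):

```python
import numpy as np
import numpy.typing as npt


def to_array(data: npt.ArrayLike) -> np.ndarray:
    """Accept any array-like input (list, tuple, scalar, ndarray).

    npt.ArrayLike annotates inputs that np.asarray can convert; the
    short `npt` alias avoids repeating `np.typing` in signatures.
    """
    return np.asarray(data)
```

For example, `to_array([1, 2, 3])` yields a 1-D ndarray of length 3, and a bare scalar yields a 0-D array.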
14 changes: 14 additions & 0 deletions docs/source/evaluation.rst
@@ -1,2 +1,16 @@
Evaluation
==========

The Evaluation API equips you with a rich toolbox to assess your models across key dimensions. Dive into detailed performance metrics, unveil potential fairness concerns, and gain granular insights through data slicing.

Key capabilities:

* Performance: Employ a robust selection of common metrics to evaluate your model's effectiveness and identify areas for improvement.
* Fairness: Uncover and analyze potential biases within your model to ensure responsible and equitable outcomes.
* Data slicing: Isolate the model's behavior on specific subsets of your data, revealing performance nuances across demographics, features, or other important characteristics.

Follow the example below for step-by-step instructions:

.. toctree::

examples/metrics.ipynb
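The data-slicing capability described above can be illustrated with a framework-free sketch (plain Python over toy records; the actual Cyclops `SliceSpec` API is shown in the example notebook):

```python
# Toy records: one numeric feature, a label, and a prediction each.
records = [
    {"radius": 12.0, "target": 1, "pred": 1},
    {"radius": 14.5, "target": 0, "pred": 1},
    {"radius": 20.0, "target": 0, "pred": 0},
    {"radius": 33.0, "target": 1, "pred": 1},
]


def accuracy(rows):
    # Fraction of rows where the prediction matches the label.
    return sum(r["target"] == r["pred"] for r in rows) / len(rows)


# Two half-open ranges on "radius", mirroring a min/max slice spec.
slices = {
    "radius:[10, 15)": [r for r in records if 10.0 <= r["radius"] < 15.0],
    "radius:[15, 37)": [r for r in records if 15.0 <= r["radius"] < 37.0],
}
per_slice = {name: accuracy(rows) for name, rows in slices.items()}
```

Evaluating per slice rather than overall can reveal that a model with good aggregate accuracy still underperforms on a particular subset.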
302 changes: 302 additions & 0 deletions docs/source/examples/metrics.ipynb
@@ -0,0 +1,302 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Breast Cancer Classification and Evaluation\n",
"\n",
"The Breast Cancer dataset is a well-suited example for demonstrating Cyclops features due to its two distinct classes (binary classification) and complete absence of missing values. This clean and organized structure makes it an ideal starting point for exploring Cyclops Evaluator."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from datasets.arrow_dataset import Dataset\n",
"from sklearn import datasets\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.svm import SVC\n",
"\n",
"from cyclops.data.slicer import SliceSpec\n",
"from cyclops.evaluate import evaluator\n",
"from cyclops.evaluate.fairness import evaluate_fairness\n",
"from cyclops.evaluate.metrics import BinaryAccuracy, create_metric\n",
"from cyclops.evaluate.metrics.experimental import BinaryAUROC, BinaryAveragePrecision\n",
"from cyclops.evaluate.metrics.experimental.metric_dict import MetricDict"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Loading the data\n",
"breast_cancer_data = datasets.load_breast_cancer(as_frame=True)\n",
"X, y = breast_cancer_data.data, breast_cancer_data.target"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Features\n",
"Let's take a quick look at the features and their summary statistics."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df = breast_cancer_data.frame\n",
"df.describe().T"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Splitting into train and test\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
" X,\n",
" y,\n",
" test_size=0.1,\n",
" random_state=13,\n",
")\n",
"\n",
"# Use SVM classifier for binary classification\n",
"svc = SVC(C=10, gamma=0.01, probability=True)\n",
"svc.fit(X_train, y_train)\n",
"\n",
"# model predictions\n",
"y_pred = svc.predict(X_test)\n",
"y_pred_prob = svc.predict_proba(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use Cyclops evaluation metrics to assess our model's performance. You can either call each metric individually or define a ``MetricDict`` collection.\n",
"Here, we show both methods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Individual Metrics\n",
"If you need only a single metric, create an instance of the desired metric class and call it on your ground truth and predictions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bin_acc_metric = BinaryAccuracy()\n",
"bin_acc_metric(y_test.values, np.float64(y_pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using ``MetricDict``\n",
"If you need several metrics, you can define them as a collection, which also speeds up metric computation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"metric_names = [\n",
" \"binary_accuracy\",\n",
" \"binary_precision\",\n",
" \"binary_recall\",\n",
" \"binary_f1_score\",\n",
" \"binary_roc_curve\",\n",
"]\n",
"metrics = [\n",
" create_metric(metric_name, experimental=True) for metric_name in metric_names\n",
"]\n",
"metric_collection = MetricDict(metrics)\n",
"metric_collection(y_test.values, np.float64(y_pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can reset the metric collection and add other metrics:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"metric_collection.reset()\n",
"metric_collection.add_metrics(BinaryAveragePrecision(), BinaryAUROC())\n",
"metric_collection(y_test.values, np.float64(y_pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data Slicing\n",
"\n",
"In addition to overall metrics, it is often useful to see how the model performs on certain subpopulations or subsets of the data. We can define these subsets using ``SliceSpec`` objects."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"spec_list = [\n",
" {\n",
" \"worst radius\": {\n",
" \"min_value\": 10.0,\n",
" \"max_value\": 15.0,\n",
" \"min_inclusive\": True,\n",
" \"max_inclusive\": False,\n",
" },\n",
" },\n",
" {\n",
" \"worst radius\": {\n",
" \"min_value\": 15.0,\n",
" \"max_value\": 37.0,\n",
" \"min_inclusive\": True,\n",
" \"max_inclusive\": False,\n",
" },\n",
" },\n",
"]\n",
"slice_spec = SliceSpec(spec_list)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preparing Result\n",
"\n",
"The Cyclops evaluator takes data as a Hugging Face `Dataset` object, so we combine the predictions and features into a dataframe and create a `Dataset` from it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Combine result and features for test data\n",
"df = pd.concat([X_test, pd.DataFrame(y_test, columns=[\"target\"])], axis=1)\n",
"df[\"preds\"] = y_pred\n",
"df[\"preds_prob\"] = y_pred_prob[:, 1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create Dataset object\n",
"breast_cancer_data = Dataset.from_pandas(df)\n",
"\n",
"breast_cancer_sliced_result = evaluator.evaluate(\n",
" dataset=breast_cancer_data,\n",
" metrics=metric_collection, # type: ignore[list-item]\n",
" target_columns=\"target\",\n",
" prediction_columns=\"preds_prob\",\n",
" slice_spec=slice_spec,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And here's the evaluation result for the data slices we defined:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"breast_cancer_sliced_result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fairness Evaluator\n",
"\n",
"The Breast Cancer dataset is not an ideal candidate for fairness analysis, but to demonstrate how you can use our fairness evaluator, we apply it to the `mean texture` feature. It is recommended to use the evaluator on features with discrete values; for optimal results, the feature should have fewer than 50 unique values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fairness_result = evaluate_fairness(\n",
" dataset=breast_cancer_data,\n",
" metrics=\"binary_precision\", # type: ignore[list-item]\n",
" groups=\"mean texture\",\n",
" target_columns=\"target\",\n",
" prediction_columns=\"preds_prob\",\n",
")\n",
"fairness_result"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "cyclops",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
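The fairness evaluation in the notebook groups the dataset by a feature and compares a metric across the resulting groups. A framework-free sketch of that idea in plain Python (toy data; an illustration of the concept, not the `evaluate_fairness` implementation):

```python
from collections import defaultdict

# Toy records: a grouping feature, a label, and a prediction each.
records = [
    {"group": "a", "target": 1, "pred": 1},
    {"group": "a", "target": 0, "pred": 1},
    {"group": "b", "target": 1, "pred": 1},
    {"group": "b", "target": 0, "pred": 0},
]


def precision(rows):
    # Precision = true positives / (true positives + false positives).
    tp = sum(r["target"] == 1 and r["pred"] == 1 for r in rows)
    fp = sum(r["target"] == 0 and r["pred"] == 1 for r in rows)
    return tp / (tp + fp) if tp + fp else 0.0


# Bucket records by the grouping feature, then score each group.
by_group = defaultdict(list)
for r in records:
    by_group[r["group"]].append(r)

per_group = {g: precision(rows) for g, rows in by_group.items()}

# One simple disparity measure: the largest gap between group scores.
gap = max(per_group.values()) - min(per_group.values())
```

A large gap between groups flags a potential fairness concern worth investigating, which is why discrete, low-cardinality features make the most interpretable grouping variables.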
1 change: 1 addition & 0 deletions docs/source/tutorials.rst
@@ -2,5 +2,6 @@ Tutorials
=========

.. toctree::
:maxdepth: 3

tutorials_use_cases