@leakyH leakyH commented Apr 6, 2025

A simple demo:

import numpy as np
import pandas as pd
import seaborn as sns

rs = np.random.RandomState(0)  # seed chosen arbitrarily for the demo
df = pd.DataFrame(dict(x=rs.normal(size=60),
                       y=rs.randint(0, 4, size=60),
                       z=rs.gamma(3, size=60),
                       z2=rs.gamma(6, size=60)))
df_with_dupe = df.copy()
# Duplicate names can appear by mistake, or when z/z2 are not important
df_with_dupe.columns = ["x", "y", "z", "z"]
sns.pairplot(df_with_dupe, vars=["x", "y"])  # raises ValueError

The Traceback:

> Traceback (most recent call last):
>   File "/data1/home/----/plotReferenceMap.py", line 153, in <module>
>     sns.pairplot(df_merge,
>   File "/data1/home/--/lib/python3.9/site-packages/seaborn/axisgrid.py", line 2119, in pairplot
>     grid = PairGrid(data, vars=vars, x_vars=x_vars, y_vars=y_vars, hue=hue,
>   File "/data1/home/--/lib/python3.9/site-packages/seaborn/axisgrid.py", line 1251, in __init__
>     numeric_cols = self._find_numeric_cols(data)
>   File "/data1/home/--/lib/python3.9/site-packages/seaborn/axisgrid.py", line 1674, in _find_numeric_cols
>     if variable_type(data[col]) == "numeric":
>   File "/data1/home/--/lib/python3.9/site-packages/seaborn/_base.py", line 1498, in variable_type
>     vector = pd.Series(vector)
>   File "/data1/home/--/lib/python3.9/site-packages/pandas/core/series.py", line 367, in __init__
>     if is_empty_data(data) and dtype is None:
>   File "/data1/home/--/lib/python3.9/site-packages/pandas/core/construction.py", line 818, in is_empty_data
>     is_simple_empty = is_list_like_without_dtype and not data
>   File "/data1/home/--/lib/python3.9/site-packages/pandas/core/generic.py", line 1527, in __nonzero__
>     raise ValueError(
> ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
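
The failing frame in the traceback is `variable_type(data[col])`. As a minimal sketch of the root cause (using a small made-up frame, not the data from the demo), selecting a duplicated column label in pandas returns a DataFrame rather than a Series, and evaluating a DataFrame in a boolean context raises the same ValueError:

```python
import pandas as pd

# Small illustrative frame with a duplicated column label "z"
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["x", "z", "z"])

unique_sel = df["x"]  # unique label: a Series
dupe_sel = df["z"]    # duplicated label: a DataFrame holding both "z" columns

print(type(unique_sel).__name__)
print(type(dupe_sel).__name__, dupe_sel.shape)

# Truth-testing a DataFrame is ambiguous, which is the error in the traceback
try:
    bool(dupe_sel)
    truth_error = None
except ValueError as exc:
    truth_error = str(exc)
print(truth_error)
```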

The error is raised inside self._find_numeric_cols(data), a call that is unnecessary when vars is provided. This PR skips that call when vars is given and also handles a few related scenarios:

  1. No duplicates in df.columns, but duplicates in vars: emits a warning. This case does not crash; it only produces unexpected figures.
  2. Duplicates in df.columns, and one of the duplicated names is included in vars: PairGrid raises a ValueError that names the duplicated columns involved.
  3. Duplicates in df.columns, and vars is not provided: PairGrid raises a ValueError that names all duplicated columns.
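
The three scenarios can be sketched roughly as follows. This is an illustration only: `check_duplicates` is a hypothetical standalone helper, not the code added to PairGrid in this PR, and the message wording is assumed.

```python
import warnings

import pandas as pd


def check_duplicates(data, vars=None):
    """Hypothetical helper mirroring the three scenarios; not the PR's code."""
    cols = data.columns
    # Column labels that occur more than once, each listed once
    dupe_cols = list(cols[cols.duplicated()].unique())

    if vars is None:
        # Scenario 3: every column is a candidate, so any duplicate is an error
        if dupe_cols:
            raise ValueError(f"Duplicate column names in data: {dupe_cols}")
        return

    vars = list(vars)
    # Scenario 2: a requested variable maps to more than one column
    used_dupes = [c for c in dupe_cols if c in vars]
    if used_dupes:
        raise ValueError(f"Duplicate column names used in vars: {used_dupes}")

    # Scenario 1: vars repeats a name; plotting still works, so only warn
    dupe_vars = list(dict.fromkeys(v for v in vars if vars.count(v) > 1))
    if dupe_vars:
        warnings.warn(f"Duplicate entries in vars: {dupe_vars}", UserWarning)
```

Under this sketch, the demo's call with vars=["x", "y"] passes silently, because the duplicated "z" column is never requested.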

These scenarios are all covered by tests in test_axisgrid.py.

Please let me know if any other modifications are needed.
