Description
What happened?
The documentation for .to_dataframe states: "Other coordinates are included as columns in the DataFrame."
When the function is applied to a Dataset containing an index whose name differs from the name of its dimension, that coordinate is not included in the resulting pandas DataFrame. For example:
import xarray as xr
import pandas as pd
import numpy as np
ds_temp = xr.Dataset(
    data_vars=dict(temp=(["time", "pos"], np.array([[5, 10, 15, 20, 25]]))),
    coords=dict(
        pf=("pos", [1., 2., 4.2, 8., 10.]),
        time=("time", [pd.to_datetime("2025-01-01")]),
    ),
).set_xindex("pf")
The example Dataset looks like this:
<xarray.Dataset> Size: 88B
Dimensions: (time: 1, pos: 5)
Coordinates:
* time (time) datetime64[ns] 8B 2025-01-01
* pf (pos) float64 40B 1.0 2.0 4.2 8.0 10.0
Dimensions without coordinates: pos
Data variables:
temp (time, pos) int64 40B 5 10 15 20 25
Converting the Dataset to a Pandas DataFrame:
ds_temp.to_dataframe()
The resulting DataFrame is missing the pf coordinate:
temp
time pos
2025-01-01 0 5
1 10
2 15
3 20
4 25
Dropping the index before calling to_dataframe does include the coordinate in the DataFrame:
>>> ds_temp.drop_indexes("pf").to_dataframe()
temp pf
time pos
2025-01-01 0 5 1.0
1 10 2.0
2 15 4.2
3 20 8.0
4 25 10.0
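Until this is resolved, the drop_indexes call above can be generalized into a small workaround. The following is only a sketch: the helper name to_dataframe_with_indexed_coords is hypothetical and not part of xarray, and it assumes every affected index is built from a single coordinate, as with set_xindex("pf") here. Indexes spanning several coordinates (for example a pandas MultiIndex) would likely need different handling.

```python
import numpy as np
import pandas as pd
import xarray as xr


def to_dataframe_with_indexed_coords(ds):
    """Hypothetical helper (not part of xarray).

    Drops the index of every indexed coordinate whose name is not a
    dimension name, so the coordinate is treated as a plain coordinate
    when converting to a DataFrame.
    """
    # Names of indexed coordinates that do not share a name with a dimension.
    non_dim_indexed = [name for name in ds.xindexes if name not in ds.dims]
    return ds.drop_indexes(non_dim_indexed).to_dataframe()


# Same Dataset as in the report above.
ds_temp = xr.Dataset(
    data_vars=dict(temp=(["time", "pos"], np.array([[5, 10, 15, 20, 25]]))),
    coords=dict(
        pf=("pos", [1.0, 2.0, 4.2, 8.0, 10.0]),
        time=("time", [pd.to_datetime("2025-01-01")]),
    ),
).set_xindex("pf")

df = to_dataframe_with_indexed_coords(ds_temp)
print(df)
```

The original ds_temp is left unchanged because drop_indexes returns a new Dataset, so the pf index stays available for selection.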
This behavior changed between recent releases: in version 2025.1.2 the column was still included. I assume the change results from the added support for ExtensionArray.
What did you expect to happen?
An indexed coordinate whose name differs from its dimension name should also be included in the resulting DataFrame. In the example, pf should appear as a column in the final DataFrame.
Minimal Complete Verifiable Example
import xarray as xr
import pandas as pd
import numpy as np
xr.show_versions()
ds_temp = xr.Dataset(
    data_vars=dict(temp=(["time", "pos"], np.array([[5, 10, 15, 20, 25]]))),
    coords=dict(
        pf=("pos", [1., 2., 4.2, 8., 10.]),
        time=("time", [pd.to_datetime("2025-01-01")]),
    ),
).set_xindex("pf")
df = ds_temp.to_dataframe()
assert "pf" in df.columns
Steps to reproduce
Run the example above. The assertion fails because the resulting DataFrame lacks pf as a column.
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.11.2 (main, Nov 30 2024, 21:22:50) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 6.1.0-37-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.4-development
xarray: 2025.10.1
pandas: 2.3.3
numpy: 2.2.3
scipy: 1.15.2
netCDF4: 1.7.2
pydap: None
h5netcdf: None
h5py: None
zarr: 3.0.4
cftime: 1.6.4.post1
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.2.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.2.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.8.0
pip: 25.0.1
conda: None
pytest: 8.3.5
mypy: 1.15.0
IPython: 9.0.1
sphinx: 8.2.3