Commit ce3bbfa

fdosani and korvalds authored
Release v0.18.0 (#439)
* fix: add permissions for contents in publish-docs workflow (#428)
* fix: add permissions for contents in publish-docs workflow
* Update ruff badge in README
* chore: bumping pandas version (#429)
* bumping pandas version
* updating gha pandas matrix
* bump version to 0.17.1
* chore: version updates and cleanup (#437)
* chore: update dependencies and version for compatibility improvements
* fix: support for pyspark.sql.connect.dataframe in SparkSQLCompare
* fix: update numpy dependency for Python version compatibility
* fix: adjust numpy installation for Python 3.10 and 3.11/3.12 compatibility
* fix: standardize YAML syntax and fix numpy logic
* fix: correct conditional syntax for numpy installation in GitHub Actions workflow
* fix: cleaning up actions in core, spark, and fugue
* fix: update pytest command to use correct test directory for spark tests
* fix: remove Python 3.12 from test matrix and correct numpy installation condition
* Improve performance of SparkSQLCompare by using .withColumns() (#438)
* feat: improve performance of withColumn operations and refactor column comparison logic
* feat: improve performance of withColumn operations and refactor column comparison logic
* [WIP] bumping versions
* refactor: rename columns_equal_expr to columns_equal for consistency
* fix: reverting to original logic
* docs: update benchmark results and remove outdated data
* reverting mismatch_counts aggregation
* refactor: optimize mismatch counting by simplifying expression handling

---------

Co-authored-by: korvalds <[email protected]>

---------

Co-authored-by: korvalds <[email protected]>
2 parents db86acb + 44ef384 commit ce3bbfa

8 files changed, +366 -270 lines changed

.github/workflows/test-package.yml

Lines changed: 73 additions & 74 deletions
@@ -20,119 +20,118 @@ jobs:
  - name: Set up Python
  uses: actions/setup-python@v5
  with:
- python-version: "3.10"
+ python-version: "3.10"
  - name: Install dependencies
  run: python -m pip install .[qa]
  - name: Linting by ruff
  run: ruff check
  - name: Formatting by ruff
  run: ruff format --check
- test-dev-install:

+ test-spark-install:
  runs-on: ubuntu-latest
  strategy:
  fail-fast: false
  matrix:
- python-version: ['3.10', '3.11', '3.12']
- spark-version: [3.2.4, 3.3.4, 3.4.4, 3.5.6]
+ python-version: ["3.10", "3.11"]
+ spark-version: [3.4.4, 3.5.6]
  pandas-version: [2.3.1, 1.5.3]
- numpy-version: [2.2.6, 1.26.4]
+ numpy-version: [2.3.2, 1.26.4]
  exclude:
- - python-version: '3.11'
- spark-version: 3.2.4
- - python-version: '3.11'
- spark-version: 3.3.4
  - pandas-version: 1.5.3
- numpy-version: 2.2.6
+ numpy-version: 2.3.2
+
  env:
  PYTHON_VERSION: ${{ matrix.python-version }}
  SPARK_VERSION: ${{ matrix.spark-version }}
+ PANDAS_VERSION: ${{ matrix.pandas-version }}
+ NUMPY_VERSION: ${{ matrix.numpy-version }}

  steps:
- - uses: actions/checkout@v3
-
- - name: Set up Python ${{ matrix.python-version }}
- uses: actions/setup-python@v5
- with:
- python-version: ${{ matrix.python-version }}
-
- - name: Setup Java JDK
- uses: actions/setup-java@v3
- with:
- java-version: '8'
- distribution: 'adopt'
-
- - name: Install Spark, Pandas, and Numpy
- run: |
- python -m pip install --upgrade pip
- python -m pip install pytest pytest-spark pypandoc
- python -m pip install pyspark[connect]==${{ matrix.spark-version }}
- python -m pip install pandas==${{ matrix.pandas-version }}
- python -m pip install numpy==${{ matrix.numpy-version }}
-
- - name: Install Datacompy without Snowflake/Snowpark if Python 3.12
- if: ${{ matrix.python-version == '3.12' }}
- run: |
- python -m pip install .[dev_no_snowflake]
-
- - name: Install Datacompy with all dev dependencies if Python 3.10, or 3.11
- if: ${{ matrix.python-version != '3.12' }}
- run: |
- python -m pip install .[dev]
-
- - name: Test with pytest
- run: |
- python -m pytest tests/ --ignore=tests/test_snowflake.py
+ - uses: actions/checkout@v3

- test-bare-install:
+ - name: Set up Python ${{ matrix.python-version }}
+ uses: actions/setup-python@v5
+ with:
+ python-version: ${{ matrix.python-version }}
+
+ - name: Setup Java JDK
+ uses: actions/setup-java@v3
+ with:
+ java-version: "8"
+ distribution: "adopt"
+
+ - name: Install Spark, Pandas, and Numpy
+ run: |
+ python -m pip install pyspark[connect]==${{ matrix.spark-version }}
+ python -m pip install pandas==${{ matrix.pandas-version }}
+
+ - name: Install numpy<=2.2.6 if Python 3.10 and matrix.numpy-version == 2.3.2
+ if: ${{ matrix.python-version == '3.10' && matrix.numpy-version == '2.3.2' }}
+ run: |
+ python -m pip install numpy==2.2.6
+
+ - name: Install numpy if Python 3.11 or (3.10 and matrix.numpy-version == 1.26.4)
+ if: ${{ (matrix.python-version == '3.11') || ( matrix.python-version == '3.10' && matrix.numpy-version == '1.26.4' ) }}
+ run: |
+ python -m pip install numpy==${{ matrix.numpy-version }}
+
+ - name: Install datacompy
+ run: |
+ python -m pip install .[spark,tests-spark]

+ - name: Test with pytest
+ run: |
+ python -m pytest tests/test_spark
+
+
+ test-bare-install:
  runs-on: ubuntu-latest
  strategy:
  fail-fast: false
  matrix:
- python-version: ['3.10', '3.11', '3.12']
+ python-version: ["3.10", "3.11", "3.12"]

  env:
  PYTHON_VERSION: ${{ matrix.python-version }}

  steps:
- - uses: actions/checkout@v3
+ - uses: actions/checkout@v3

- - name: Set up Python ${{ matrix.python-version }}
- uses: actions/setup-python@v5
- with:
- python-version: ${{ matrix.python-version }}
+ - name: Set up Python ${{ matrix.python-version }}
+ uses: actions/setup-python@v5
+ with:
+ python-version: ${{ matrix.python-version }}

- - name: Install datacompy
- run: |
- python -m pip install --upgrade pip
- python -m pip install .[tests]
- - name: Test with pytest
- run: |
- python -m pytest tests/ --ignore=tests/test_snowflake.py
+ - name: Install datacompy
+ run: |
+ python -m pip install --upgrade pip
+ python -m pip install .[tests]
+ - name: Test with pytest
+ run: |
+ python -m pytest tests/ --ignore=tests/test_snowflake.py

  test-fugue-install-no-spark:
-
  runs-on: ubuntu-latest
  strategy:
  fail-fast: false
  matrix:
- python-version: ['3.10', '3.11']
+ python-version: ["3.10", "3.11"]
  env:
  PYTHON_VERSION: ${{ matrix.python-version }}

  steps:
- - uses: actions/checkout@v3
-
- - name: Set up Python ${{ matrix.python-version }}
- uses: actions/setup-python@v5
- with:
- python-version: ${{ matrix.python-version }}
-
- - name: Install datacompy
- run: |
- python -m pip install --upgrade pip
- python -m pip install .[tests,fugue]
- - name: Test with pytest
- run: |
- python -m pytest tests/ --ignore=tests/test_snowflake.py
+ - uses: actions/checkout@v3
+
+ - name: Set up Python ${{ matrix.python-version }}
+ uses: actions/setup-python@v5
+ with:
+ python-version: ${{ matrix.python-version }}
+
+ - name: Install datacompy
+ run: |
+ python -m pip install --upgrade pip
+ python -m pip install .[tests,fugue]
+ - name: Test with pytest
+ run: |
+ python -m pytest tests/ --ignore=tests/test_snowflake.py
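The new `test-spark-install` matrix and its two conditional numpy steps can be modeled in a few lines of Python. This is a minimal sketch, not part of datacompy or GitHub Actions: `MATRIX`, `EXCLUDE`, `expand_matrix`, and `numpy_to_install` are hypothetical names, and the sketch assumes the Actions rule that an `exclude` entry drops every combination matching all of its keys. It shows the matrix producing 12 jobs (16 combinations minus the 4 that pair pandas 1.5.3 with numpy 2.3.2), and Python 3.10 jobs slated for numpy 2.3.2 installing 2.2.6 instead, presumably because NumPy 2.3 no longer supports Python 3.10.

```python
from itertools import product

# Values copied from the new test-spark-install job. YAML reads the
# unquoted versions (e.g. 3.4.4) as strings, so strings are used here.
MATRIX = {
    "python-version": ["3.10", "3.11"],
    "spark-version": ["3.4.4", "3.5.6"],
    "pandas-version": ["2.3.1", "1.5.3"],
    "numpy-version": ["2.3.2", "1.26.4"],
}
EXCLUDE = [{"pandas-version": "1.5.3", "numpy-version": "2.3.2"}]


def expand_matrix(matrix, exclude):
    """Cartesian product of all matrix values, minus any combination that
    matches every key/value pair of at least one exclude entry."""
    keys = list(matrix)
    combos = [dict(zip(keys, values)) for values in product(*matrix.values())]
    return [
        combo
        for combo in combos
        if not any(all(combo[k] == v for k, v in rule.items()) for rule in exclude)
    ]


def numpy_to_install(python_version, numpy_version):
    """Mirror the two conditional numpy steps in the new workflow."""
    # Step 1: Python 3.10 with matrix numpy 2.3.2 installs 2.2.6 instead.
    if python_version == "3.10" and numpy_version == "2.3.2":
        return "2.2.6"
    # Step 2: Python 3.11, or Python 3.10 with 1.26.4, installs the matrix value.
    if python_version == "3.11" or (
        python_version == "3.10" and numpy_version == "1.26.4"
    ):
        return numpy_version
    # Neither `if:` matches: no explicit numpy step runs.
    return None


jobs = expand_matrix(MATRIX, EXCLUDE)
```

Because the `exclude` entry matches on the matrix value 2.3.2 rather than the 2.2.6 actually installed on Python 3.10, pandas 1.5.3 is only ever paired with numpy 1.26.4.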

README.md

Lines changed: 8 additions & 8 deletions
@@ -56,18 +56,18 @@ With the move to Pandas on Spark API and compatability issues with Pandas 2+ we
  with the Pandas on Spark implementation. Spark plans to support Pandas 2 in [Spark 4](https://issues.apache.org/jira/browse/SPARK-44101)


- | | Spark 3.2.4 | Spark 3.3.4 | Spark 3.4.2 | Spark 3.5.1 |
- |-------------|-------------|-------------|-------------|-------------|
- | Python 3.10 | || ||
- | Python 3.11 | || ||
- | Python 3.12 | || ||
+ | | Spark 3.4.4 | Spark 3.5.6 |
+ |-------------|--------------|-------------|
+ | Python 3.10 | ||
+ | Python 3.11 | ||
+ | Python 3.12 | ||


  | | Pandas < 1.5.3 | Pandas >=2.0.0 |
  |------------------------|----------------|----------------|
- | ``Compare`` || |
- | ``SparkSQLCompare`` || |
- | Fugue || |
+ | ``Compare`` |||
+ | ``SparkSQLCompare`` |||
+ | Fugue |||

datacompy/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@
  Then extended to carry that functionality over to Spark Dataframes.
  """

- __version__ = "0.17.1"
+ __version__ = "0.18.0"

  import platform
  from warnings import warn
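Code that depends on changes in this release (for example, the `pyspark.sql.connect.dataframe` support in `SparkSQLCompare` listed in the commit message) could gate on the version string bumped here. A minimal sketch using only the standard library; `parse_release` and `has_datacompy_at_least` are hypothetical helpers, not datacompy APIs, and the parser assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release suffix. Comparing integer tuples rather than raw strings avoids ordering `0.10.0` before `0.9.0`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text: str) -> tuple[int, ...]:
    # Assumes a plain "MAJOR.MINOR.PATCH" string such as "0.18.0";
    # pre-release or local suffixes (e.g. "0.18.0rc1") are not handled.
    return tuple(int(part) for part in text.split("."))


def has_datacompy_at_least(minimum: str = "0.18.0") -> bool:
    # Look up the installed distribution's version; False if not installed.
    try:
        installed = version("datacompy")
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= parse_release(minimum)
```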
