Implementation of the paper:
- Seiji Maekawa, Hayate Iso, Nikita Bhutani. The Rarity Blind Spot: A Framework for Evaluating Statistical Reasoning in LLMs
- Introduced Distinctive Feature Mining (DFM) and DiFBench: a new task and benchmark framework that systematically evaluates LLMs’ statistical reasoning by asking them to identify globally rare features across document collections (a minimal sketch of the rarity criterion follows this list).
- Conducted a large-scale empirical evaluation: the first comprehensive study of ten state-of-the-art LLMs, revealing that even advanced reasoning models degrade significantly with scale and often misclassify frequent features as distinctive, providing computational evidence of base rate neglect.
- Demonstrated mitigation via explicit verification prompting: a simple yet effective prompting strategy that improves F1 scores by 65% relative, highlighting both a practical mitigation and persistent limitations in multi-document comparative reasoning.
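To make the task concrete, here is a minimal, self-contained sketch of the rarity criterion (not the benchmark's actual scoring code): a feature is distinctive for a document if it occurs in at most k documents across the collection. The example output further below applies a ≤2-occurrence rule, so the sketch uses k = 2 with exact string matching; the benchmark itself works over free-text features produced by LLMs, so matching there is presumably less literal. The `rare_features` helper and the toy `docs` collection are illustrative only.

```python
from collections import Counter

def rare_features(doc_features, max_occurrences=2):
    """Keep, for each document, only the features whose document frequency
    across the whole collection is at most `max_occurrences`."""
    doc_freq = Counter()
    for feats in doc_features.values():
        doc_freq.update(set(feats))  # each document counts a feature at most once
    return {
        doc_id: [f for f in feats if doc_freq[f] <= max_occurrences]
        for doc_id, feats in doc_features.items()
    }

# Toy collection: the axle-ratio line is shared by two documents only, so it is
# kept; the LED-headlight line appears in three documents and is filtered out.
docs = {
    "1": ["3.09:1 axle ratio", "LED headlights", "VIN-matching perk"],
    "2": ["LED headlights", "head studs"],
    "3": ["LED headlights"],
    "7": ["3.09:1 axle ratio"],
}
print(rare_features(docs))
```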
conda create -n difbench python=3.11
conda activate difbench
pip install -r requirements.txt
The following scripts give a quick overview of how to use DiFBench to evaluate LLMs on the distinctive feature mining task.
bash scripts/resume_dfm.sh
bash scripts/news_dfm.sh
The results look like:
{
  "0": {
    "average_scores": {
      "micro_avg_precision": 1.0,
      "micro_avg_recall": 1.0,
      "micro_avg_f1_score": 1.0,
      "macro_avg_precision": 1.0,
      "macro_avg_recall": 1.0,
      "macro_avg_f1_score": 1.0,
      "input_tokens": 7008,
      "output_tokens": 15101,
      "reasoning_tokens": 14464
    },
    "outputs": [
      {
        "summary_id": 1,
        "reasoning": "I listed every bullet from summary 1, then checked how many other summaries contained the exact same wording. Only two lines met the \u22642-occurrence rule: the 3.09:1 axle ratio line (appears here and in summary 7 only) and the VIN-matching perk line (unique to this summary).",
        "output": [
          "3.09:1 axle ratio balances highway comfort with quarter-mile thrust.",
          "VIN-matching perk for 2018 owners requires pre-reserved serial blocks on assembly line."
        ]
      },
      {
        "summary_id": 2,
        "reasoning": "All but one of summary 2\u2019s lines were found in three or more other summaries. The head-bolt/stud line appears only here and in summary 5, so it qualifies.",
        "output": [
          "Head bolts replaced by studs to prevent gasket lift under roughly 2,500-psi peak combustion loads."
        ]
      },
      ...
    ]
  },
  "1": {
    ...
  }
}
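You can inspect a results file programmatically. Top-level keys ("0", "1", ...) index document collections; each entry holds averaged scores plus per-summary outputs and reasoning traces. In line with the usual convention, the micro averages appear to pool feature-level decisions across all summaries in a collection, while the macro averages are computed per summary and then averaged. The path `results.json` below is a placeholder; point it at wherever your run wrote its output.

```python
import json

# Hypothetical path: adjust to your run's actual results file.
with open("results.json") as f:
    results = json.load(f)

# Print the aggregate scores for each document collection.
for collection_id, entry in results.items():
    scores = entry["average_scores"]
    print(
        f"collection {collection_id}: "
        f"micro F1={scores['micro_avg_f1_score']:.3f}, "
        f"macro F1={scores['macro_avg_f1_score']:.3f}, "
        f"output tokens={scores['output_tokens']}"
    )
```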
| ID | OSS Component Name | Modified | Copyright Holder | Upstream Link | License |
|---|---|---|---|---|---|
| 1 | DiFBench | No | Megagon Labs | link | BSD-3-Clause license |
You can find our generated resume features and news insights under `./data/`.
Each dataset is stored as a separate `.jsonl` file in that folder. The filename corresponds to the category name (e.g., `Legal_occupations.jsonl`).
In each file:
- Each row represents a source document (e.g., a resume or a news article).
- Each column represents a set of features in a specific section of the document.
The 10 dataset categories are:
- Resume Domain:
  - Computer_and_mathematical_occupations
  - Life_physical_and_social_science_occupations
  - Legal_occupations
  - Architecture_and_engineering_occupations
  - Healthcare_occupations
- News Summary Domain:
  - topic1
  - topic2
  - topic3
  - topic4
  - topic5
You can easily load any of the datasets using Python with the `pandas` library. Make sure the `.jsonl` files are in the same directory as your script, or provide the correct path to the files.
Here is an example of how to load a single dataset:
import os
import pandas as pd
# List of all dataset categories
categories = [
"Computer_and_mathematical_occupations",
"Life_physical_and_social_science_occupations",
"Legal_occupations",
"Architecture_and_engineering_occupations",
"Healthcare_occupations",
"topic1",
"topic2",
"topic3",
"topic4",
"topic5",
]
# --- Load a single dataset ---
# Select a category to load
category_to_load = categories[0]
# Define the path to the data file
# Assumes the data files are in the current directory ("./")
data_path = f"./{category_to_load}.jsonl"
# Load the dataset into a pandas DataFrame
df = pd.read_json(data_path, lines=True)
# Display the first few rows of the dataframe
print(f"Successfully loaded {category_to_load}:")
print(df.head())
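Building on the snippet above, you can also load every category at once and inspect how a single document's features are organized by section. This continuation reuses the `categories` list and `pandas` import from the example; the path pattern and the assumption that each column holds one section's feature set follow the dataset description above.

```python
# Load every category into a dict of DataFrames (same path assumptions as above).
datasets = {c: pd.read_json(f"./{c}.jsonl", lines=True) for c in categories}

# Inspect the first document of one dataset: each column holds the feature set
# for one section of that document.
row = datasets["Legal_occupations"].iloc[0]
for section, features in row.items():
    print(f"{section}: {features}")
```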
@misc{maekawa2025distinctive,
      title={The Rarity Blind Spot: A Framework for Evaluating Statistical Reasoning in LLMs},
      author={Seiji Maekawa and Hayate Iso and Nikita Bhutani},
      url={https://arxiv.org/abs/2509.00245},
      year={2025}
}
Embedded in, or bundled with, this product are open source software (OSS) components, datasets and other third party components identified below. The license terms respectively governing the datasets and third-party components continue to govern those portions, and you agree to those license terms, which, when applicable, specifically limit any distribution. You may receive a copy of, distribute and/or modify any open source code for the OSS component under the terms of their respective licenses, which may be BSD 3 clause license and Apache 2.0 license. In the event of conflicts between Megagon Labs, Inc., license conditions and the Open Source Software license conditions, the Open Source Software conditions shall prevail with respect to the Open Source Software portions of the software.

You agree not to, and are not permitted to, distribute actual datasets used with the OSS components listed below. You agree and are limited to distribute only links to datasets from known sources by listing them in the datasets overview table below. You are permitted to distribute derived datasets of data sets from known sources by including links to original dataset source in the datasets overview table below. You agree that any right to modify datasets originating from parties other than Megagon Labs, Inc. are governed by the respective third party’s license conditions.

All OSS components and datasets are distributed WITHOUT ANY WARRANTY, without even implied warranty such as for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, and without any liability to or claim against any Megagon Labs, Inc. entity other than as explicitly documented in this README document. You agree to cease using any part of the provided materials if you do not agree with the terms or the lack of any warranty herein.

While Megagon Labs, Inc., makes commercially reasonable efforts to ensure that citations in this document are complete and accurate, errors may occur. If you see any error or omission, please help us improve this document by sending information to [email protected].