LongCodeZip

This repository is the official implementation of LongCodeZip, a two-stage method for compressing long code contexts. Our paper "LongCodeZip: Compress Long Context for Code Language Models" has been accepted to ASE 2025.

Method Overview

LongCodeZip introduces a two-stage code compression framework specifically designed for code LLMs:

Coarse-grained Compression: Function-based chunking and ranking using conditional perplexity with respect to the query to select the most relevant functions.
Fine-grained Compression: Entropy-based block detection combined with 0/1 knapsack optimization to maximize relevance within adaptive token budgets.
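
As a rough illustration of the coarse-grained stage, the sketch below ranks code chunks by the perplexity of the query conditioned on each chunk. It assumes that a lower conditional perplexity means a more relevant chunk; the repository's actual scoring may differ. The names `perplexity`, `rank_chunks`, and `toy_logprobs` are hypothetical and not part of the LongCodeZip API. `toy_logprobs` is a word-overlap stand-in for a real code LLM so the example runs without a model.

```python
import math
import re
from typing import Callable, List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from natural-log token probabilities: exp(-mean(log p))."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def rank_chunks(
    chunks: List[str],
    query: str,
    query_logprobs_given: Callable[[str, str], Sequence[float]],
) -> List[str]:
    """Order chunks from most to least relevant to the query.

    query_logprobs_given(chunk, query) is a hypothetical callback returning
    log p(query token | chunk, earlier query tokens) for each query token,
    for example computed with a causal code LLM.
    """
    scored = [(perplexity(query_logprobs_given(c, query)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0])  # lower perplexity first
    return [c for _, c in scored]


def toy_logprobs(chunk: str, query: str) -> List[float]:
    """Toy stand-in for a model: query words seen in the chunk are 'likely'."""
    chunk_words = set(re.findall(r"\w+", chunk))
    return [
        math.log(0.9) if word in chunk_words else math.log(0.1)
        for word in query.split()
    ]


chunks = [
    "def add(a, b): return a + b",
    "def read_file(path): return open(path).read()",
]
ranked = rank_chunks(chunks, "read file path", toy_logprobs)
print(ranked[0])
```

In practice the callback would run the language model once per chunk, which is why function-level chunking keeps the number of scoring passes manageable.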

The method is plug-and-play and can be integrated with existing code LLMs to achieve significant compression ratios while maintaining or improving task performance.
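
The 0/1 knapsack step of the fine-grained stage can be sketched as a standard dynamic program: each block has a token cost and a relevance score, and the goal is the highest total score that fits the token budget. This is a minimal sketch, not the repository's implementation. The function name `knapsack_select` and the example costs and scores are made up for illustration.

```python
from typing import List, Sequence, Tuple


def knapsack_select(blocks: Sequence[Tuple[int, float]], budget: int) -> List[int]:
    """Return indices of blocks maximizing total score within the budget.

    blocks: one (token_count, relevance_score) pair per block.
    budget: maximum total number of tokens to keep.
    """
    n = len(blocks)
    # best[i][b] = best score using only the first i blocks and at most b tokens
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, (tokens, score) in enumerate(blocks, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip this block
            if tokens <= b:
                take = best[i - 1][b - tokens] + score  # option 2: keep it
                if take > best[i][b]:
                    best[i][b] = take

    # Walk the table backwards to recover which blocks were kept.
    chosen = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= blocks[i - 1][0]
    return sorted(chosen)  # keep the original block order


# Hypothetical blocks: (token_count, relevance_score)
blocks = [(4, 3.0), (3, 2.0), (2, 2.5), (5, 4.0)]
selected = knapsack_select(blocks, budget=7)
print(selected)
```

Returning the indices in their original order means the kept blocks can be concatenated without reordering the source code.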

Installation

You can install directly from the GitHub repository:

pip install git+https://github.com/YerbaPage/LongCodeZip.git

Or clone and install in development mode:

git clone https://github.com/YerbaPage/LongCodeZip.git
cd LongCodeZip
pip install -e .

Quick Demo

We provide a simple demo (demo.py) to help you get started with LongCodeZip.

python demo.py

The demo showcases both compression modes: coarse-grained compression (function-level selection only) and the full two-stage compression (with fine-grained token optimization). It demonstrates how LongCodeZip compresses a code file based on a given query and achieves different compression ratios.

Basic Example

from longcodezip import LongCodeZip

# Example inputs; replace these with your own code, query, and instruction
code = "def add(a, b):\n    return a + b\n"
query = "How are two numbers added?"
instruction = "Answer the question based on the code."

# Initialize the compressor
compressor = LongCodeZip(model_name="Qwen/Qwen2.5-Coder-7B-Instruct")

# Compress code with a query
result = compressor.compress_code_file(
    code=code,
    query=query,
    instruction=instruction,
    rate=0.5,  # Keep 50% of tokens
    rank_only=False,  # Set to True to only rank and select contexts without fine-grained compression
)

# Access compressed results
compressed_code = result['compressed_code']
compressed_prompt = result['compressed_prompt']  # Full prompt with instruction
compression_ratio = result['compression_ratio']
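
To compare the two modes that the demo describes, a small wrapper can call `compress_code_file` twice, once with `rank_only=True` and once with `rank_only=False`. The sketch below uses only the keyword arguments and the `compression_ratio` result key shown above. The helper name `compare_modes` is hypothetical and not part of the package.

```python
from typing import Any, Dict


def compare_modes(
    compressor: Any,
    code: str,
    query: str,
    instruction: str,
    rate: float = 0.5,
) -> Dict[str, Any]:
    """Run coarse-only and full two-stage compression on the same input.

    `compressor` is expected to expose compress_code_file(...) with the
    keyword arguments used in the basic example.
    """
    ratios = {}
    for label, rank_only in (("coarse_only", True), ("two_stage", False)):
        result = compressor.compress_code_file(
            code=code,
            query=query,
            instruction=instruction,
            rate=rate,
            rank_only=rank_only,
        )
        ratios[label] = result["compression_ratio"]
    return ratios
```

Given a `LongCodeZip` instance, `compare_modes(compressor, code, query, instruction)` would return one compression ratio per mode.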

References

@article{shi2025longcodezip,
  title={LongCodeZip: Compress Long Context for Code Language Models},
  author={Shi, Yuling and Qian, Yichun and Zhang, Hongyu and Shen, Beijun and Gu, Xiaodong},
  journal={arXiv preprint arXiv:2510.00446},
  year={2025}
}
