Samba: Synchronized Set-of-Sequences Modeling for End-to-end Multiple Object Tracking
Mattia Segu, Luigi Piccinelli, Siyuan Li, Yung-Hsu Yang, Luc Van Gool, Bernt Schiele
ICLR 2025 Spotlight, Paper at arXiv 2410.01806
SambaMOTR is a novel tracking-by-propagation framework for multiple object tracking in complex scenarios like dance, sports, and animal groups. It leverages Samba, a linear-time set-of-sequences model that synchronizes state-spaces across tracklets to capture long-range dependencies, inter-tracklet interactions, and temporal occlusions. With an autoregressive memory mechanism and a simple uncertainty handling strategy (MaskObs), SambaMOTR tracks objects accurately through occlusions without hand-crafted heuristics. It achieves state-of-the-art results on DanceTrack, BFT, and SportsMOT.
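To illustrate the idea, here is a minimal, purely conceptual sketch of one tracking-by-propagation step with MaskObs-style masking of uncertain observations. All names (`track_step`, `detector`, `samba`, `conf_thresh`) are hypothetical placeholders, not the actual SambaMOTR API.

```python
# Conceptual sketch of one tracking-by-propagation step (hypothetical names,
# not the actual SambaMOTR implementation). Track queries are propagated frame
# to frame; MaskObs masks out low-confidence observations so the synchronized
# state-space memory keeps predicting through occlusions.
def track_step(frame, track_queries, memories, detector, samba, conf_thresh=0.5):
    # Detect current-frame observations conditioned on the propagated track queries.
    detections, scores = detector(frame, track_queries)

    # MaskObs: drop uncertain observations from the state update instead of
    # updating the memories with noisy evidence.
    observation_mask = scores < conf_thresh

    # Samba synchronizes the per-tracklet state spaces (set-of-sequences model)
    # and returns updated memories plus the track queries for the next frame.
    memories, next_queries = samba(detections, memories, mask=observation_mask)

    return detections, next_queries, memories
```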
Qualitative results on DanceTrack, BFT, SportsMOT (Volleyball), and SportsMOT (Football).
- 2025.03.31: We're releasing SambaMOTR's weights based on the stronger DAB-D-DETR detector.
- 2025.03.26: We're excited to release the main code and checkpoints!
- 2025.01.22: SambaMOTR has been accepted to ICLR 2025 as a spotlight paper! Looking forward to seeing you in Singapore.
conda create -n sambamotr -y python=3.11 # create a virtual env
conda activate sambamotr # activate the env
conda install -y pytorch==2.5.1 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y matplotlib pyyaml scipy tqdm tensorboard einops
pip install opencv-python
python -m venv venv/sambamotr  # create a virtual env
source venv/sambamotr/bin/activate  # activate the env
pip install torch==2.5.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install matplotlib pyyaml scipy tqdm tensorboard einops
pip install opencv-python
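Optionally, you can verify the environment before building the CUDA ops (a quick sanity check, not part of the official setup):

```python
# Optional sanity check: confirm the expected PyTorch / CUDA versions are active.
import torch

print(torch.__version__)          # expected: 2.5.1
print(torch.version.cuda)         # expected: 12.1
print(torch.cuda.is_available())  # expected: True on a machine with a CUDA GPU
```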
You also need to compile the Deformable Attention CUDA ops:
# From https://github.com/fundamentalvision/Deformable-DETR
cd ./models/ops/
# Build for different CUDA architectures (refer to https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/)
TORCH_CUDA_ARCH_LIST="7.5 8.0 8.6 8.7 8.9" sh make.sh
# You can test the ops if needed:
python test.py
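If the build succeeded, the compiled extension should be importable; the snippet below assumes the extension name used by the upstream Deformable-DETR ops (`MultiScaleDeformableAttention`):

```python
# Assumes the extension name from the upstream Deformable-DETR ops build.
import MultiScaleDeformableAttention as MSDA
print(MSDA.__file__)  # path to the compiled extension if the build succeeded
```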
You should put the unzipped DanceTrack, SportsMOT, and BFT datasets into DATADIR/DanceTrack/, DATADIR/SportsMOT/, and DATADIR/BFT/, respectively. If a dataset does not provide the ${SPLIT}_seqmap.txt file, you can generate it with:
python data/gen_seqmap.py --data-dir $DATA_DIR --split $SPLIT
For example:
# DanceTrack (SPLIT in [train, val])
python data/gen_seqmap.py --data-dir $ROOT_DIR/DanceTrack --split $SPLIT
# BFT (SPLIT in [train, val, test])
python data/gen_seqmap.py --data-dir $ROOT_DIR/BFT --split $SPLIT
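For reference, a seqmap is essentially the list of sequence folder names in a split. The sketch below is a simplified stand-in for data/gen_seqmap.py (the real script is authoritative and its output format may differ, e.g. it may include a header line):

```python
# Simplified stand-in for data/gen_seqmap.py (assumption: the seqmap is just the
# sorted list of sequence folder names in a split).
import os

def write_seqmap(data_dir: str, split: str) -> None:
    split_dir = os.path.join(data_dir, split)
    seqs = sorted(d for d in os.listdir(split_dir)
                  if os.path.isdir(os.path.join(split_dir, d)))
    with open(os.path.join(data_dir, f"{split}_seqmap.txt"), "w") as f:
        f.write("\n".join(seqs) + "\n")

write_seqmap("/path/to/DATADIR/DanceTrack", "val")
```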
Finally, you should get the following dataset structure:
DATADIR/
├── DanceTrack/
│   ├── train/
│   ├── val/
│   ├── test/
│   ├── train_seqmap.txt
│   ├── val_seqmap.txt
│   └── test_seqmap.txt
├── SportsMOT/
│   ├── train/
│   ├── val/
│   ├── test/
│   ├── train_seqmap.txt
│   ├── val_seqmap.txt
│   └── test_seqmap.txt
└── BFT/
    ├── train/
    ├── val/
    ├── test/
    ├── train_seqmap.txt
    ├── val_seqmap.txt
    └── test_seqmap.txt
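A quick optional check that the layout above is in place (a helper sketch, not part of the repo):

```python
# Optional: verify that each dataset has the expected splits and seqmap files.
import os

DATADIR = "/path/to/DATADIR"  # adjust to your data root
for name in ["DanceTrack", "SportsMOT", "BFT"]:
    for split in ["train", "val", "test"]:
        assert os.path.isdir(os.path.join(DATADIR, name, split)), f"missing {name}/{split}/"
        assert os.path.isfile(os.path.join(DATADIR, name, f"{split}_seqmap.txt")), \
            f"missing {name}/{split}_seqmap.txt"
print("Dataset layout looks good.")
```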
We initialize our model with the official Deformable-DETR (with R50 backbone) weights pretrained on the COCO dataset; you can also download the checkpoint we used here. Then put the checkpoint at pretrained/deformable_detr.pth.
We initialize our model with the official DAB-Deformable-DETR (with R50 backbone) weights pretrained on the COCO dataset; you can also download the checkpoint we used here. Then put the checkpoint at pretrained/dab_deformable_detr.pth.
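To confirm a downloaded checkpoint is readable before training, you can inspect its top-level keys (a quick check; the exact keys depend on the upstream release):

```python
# Optional: inspect the pretrained checkpoint. weights_only=False is needed because
# detector checkpoints typically store non-tensor objects (e.g. the training args).
import torch

ckpt = torch.load("pretrained/deformable_detr.pth", map_location="cpu", weights_only=False)
print(list(ckpt.keys()))        # e.g. ['model', 'optimizer', ...]
if "model" in ckpt:
    print(len(ckpt["model"]), "parameter tensors")
```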
Train SambaMOTR with 8 GPUs on ${DATASET} (one of [DanceTrack, SportsMOT, BFT]):
python -m torch.distributed.run --nproc_per_node=8 main.py --use-distributed --config-path ./configs/sambamotr/${DATASET}/def_detr/train_residual_masking_sync_longer.yaml --outputs-dir ./outputs/sambamotr/${DATASET}/ --batch-size 1 --data-root <your data dir path>
If the model does not fit in your GPU's memory, use the flag --use-checkpoint to activate gradient checkpointing and reduce the allocated GPU memory:
python -m torch.distributed.run --nproc_per_node=8 main.py --use-distributed --config-path ./configs/sambamotr/${DATASET}/def_detr/train_residual_masking_sync_longer.yaml --outputs-dir ./outputs/sambamotr/${DATASET}/ --batch-size 1 --data-root <your data dir path> --use-checkpoint
You can use this script to evaluate the trained model on the ${SPLIT} (one of [train, val, test]) set:
python main.py --data-root <your data dir path> --mode eval --eval-mode specific --eval-dir ./outputs/sambamotr/${DATASET}/ --eval-model <filename of the checkpoint> --eval-data-split ${SPLIT} --eval-threads <your gpus num>
For submitting (running inference on the test set), you can use the following script:
python -m torch.distributed.run --nproc_per_node=8 main.py --use-distributed --data-root <your data dir path> --mode submit --submit-dir ./outputs/sambamotr/${DATASET}/ --submit-model <filename of the checkpoint> --submit-data-split test
To reproduce our results, you can download our pre-trained checkpoints from here and move the corresponding one to ./outputs/sambamotr/${DATASET}/ before running the above scripts.
To run the demo, ${INPUT_PATH} can be either a folder with frames or an .mp4 video:
python demo/demo.py --in_video_path "$INPUT_PATH" --output_dir "$OUTPUT_DIR" --config_path "$CONFIG_PATH" --model_path "$MODEL_PATH" --fps "$FPS"
The pretrained checkpoints and output files for SambaMOTR are stored at the following Hugging Face link: HERE.
You can use this link to download the necessary files, such as model weights and outputs, to reproduce the results or use the tracker for your own tasks.
| Dataset | Method | HOTA | AssA | DetA | IDF1 | MOTA | Cfg | Weights | Output |
|---|---|---|---|---|---|---|---|---|---|
| DanceTrack | SambaMOTR (Deformable DETR) | 67.2 | 57.7 | 78.6 | 71.3 | 88.1 | Config | Hugging Face | Hugging Face |
| DanceTrack | SambaMOTR (DAB-D-DETR) | 69.0 | 60.0 | 79.5 | 74.2 | 89.1 | Config | Hugging Face | Hugging Face |
| Dataset | Method | HOTA | AssA | DetA | IDF1 | MOTA | Cfg | Weights | Output |
|---|---|---|---|---|---|---|---|---|---|
| SportsMOT | SambaMOTR (Deformable DETR) | 70.5 | 60.6 | 82.2 | 73.3 | 90.4 | Config | Hugging Face | Hugging Face |
| SportsMOT | SambaMOTR (DAB-D-DETR) | 72.1 | 62.5 | 83.4 | 74.8 | 91.9 | Config | Hugging Face | Hugging Face |
| Dataset | Method | HOTA | AssA | DetA | IDF1 | MOTA | Cfg | Weights | Output |
|---|---|---|---|---|---|---|---|---|---|
| BFT | SambaMOTR (Deformable DETR) | 69.6 | 74.2 | 65.4 | 81.2 | 70.2 | Config | Hugging Face | Hugging Face |
| BFT | SambaMOTR (DAB-D-DETR) | 72.1 | 75.4 | 69.2 | 84.6 | 76.4 | Config | Hugging Face | Hugging Face |
If you find any bug in the code, please report it to Mattia Segu ([email protected]).
If you find our work useful in your research, please consider citing our publication:
@article{segu2024samba,
title={Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking},
author={Segu, Mattia and Piccinelli, Luigi and Li, Siyuan and Yang, Yung-Hsu and Van Gool, Luc and Schiele, Bernt},
journal={arXiv preprint arXiv:2410.01806},
year={2024}
}