Samba: Synchronized Set-of-Sequences Modeling for End-to-end Multiple Object Tracking
Mattia Segu, Luigi Piccinelli, Siyuan Li, Yung-Hsu Yang, Luc Van Gool, Bernt Schiele
ICLR 2025 Spotlight, Paper at arXiv 2410.01806
SambaMOTR is a novel tracking-by-propagation framework for multiple object tracking in complex scenarios like dance, sports, and animal groups. It leverages Samba, a linear-time set-of-sequences model that synchronizes state-spaces across tracklets to capture long-range dependencies, inter-tracklet interactions, and temporal occlusions. With an autoregressive memory mechanism and a simple uncertainty handling strategy (MaskObs), SambaMOTR tracks objects accurately through occlusions without hand-crafted heuristics. It achieves state-of-the-art results on DanceTrack, BFT, and SportsMOT.
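To illustrate the idea, here is a minimal, purely conceptual sketch of one tracking-by-propagation step with MaskObs-style masking of uncertain observations. All names (`track_step`, `detector`, `samba`, `conf_thresh`) are hypothetical placeholders, not the actual SambaMOTR API.

```python
# Conceptual sketch of one tracking-by-propagation step (hypothetical names,
# not the actual SambaMOTR implementation). Track queries are propagated frame
# to frame; MaskObs masks out low-confidence observations so the synchronized
# state-space memory keeps predicting through occlusions.
def track_step(frame, track_queries, memories, detector, samba, conf_thresh=0.5):
    # Detect current-frame observations conditioned on the propagated track queries.
    detections, scores = detector(frame, track_queries)

    # MaskObs: drop uncertain observations from the state update instead of
    # updating the memories with noisy evidence.
    observation_mask = scores < conf_thresh

    # Samba synchronizes the per-tracklet state spaces (set-of-sequences model)
    # and returns updated memories plus the track queries for the next frame.
    memories, next_queries = samba(detections, memories, mask=observation_mask)

    return detections, next_queries, memories
```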
Qualitative results on DanceTrack, BFT, SportsMOT (Volleyball), and SportsMOT (Football).
- 2025.03.31: We're releasing SambaMOTR's weights based on the stronger DAB-D-DETR detector.
- 2025.03.26: We're excited to release the main code and checkpoints!
- 2025.01.22: SambaMOTR has been accepted to ICLR 2025 as a spotlight paper! Looking forward to seeing you in Singapore.
conda create -n sambamotr -y python=3.11 # create a virtual env
conda activate sambamotr # activate the env
conda install -y pytorch==2.5.1 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y matplotlib pyyaml scipy tqdm tensorboard einops
pip install opencv-python
python -m venv venv/sambamotr  # create a virtual env
source venv/sambamotr/bin/activate  # activate the env
pip install torch==2.5.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install matplotlib pyyaml scipy tqdm tensorboard einops
pip install opencv-python
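Optionally, you can verify the environment before building the CUDA ops (a quick sanity check, not part of the official setup):

```python
# Optional sanity check: confirm the expected PyTorch / CUDA versions are active.
import torch

print(torch.__version__)          # expected: 2.5.1
print(torch.version.cuda)         # expected: 12.1
print(torch.cuda.is_available())  # expected: True on a machine with a CUDA GPU
```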
You also need to compile the Deformable Attention CUDA ops:
# From https://github.com/fundamentalvision/Deformable-DETR
cd ./models/ops/
# Build for different CUDA architectures (refer to https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/)
TORCH_CUDA_ARCH_LIST="7.5 8.0 8.6 8.7 8.9" sh make.sh
# You can test the ops if needed:
python test.py
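If the build succeeded, the compiled extension should be importable; the snippet below assumes the extension name used by the upstream Deformable-DETR ops (`MultiScaleDeformableAttention`):

```python
# Assumes the extension name from the upstream Deformable-DETR ops build.
import MultiScaleDeformableAttention as MSDA
print(MSDA.__file__)  # path to the compiled extension if the build succeeded
```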
You should put the unzipped DanceTrack, SportsMOT, and BFT datasets into DATADIR/DanceTrack/, DATADIR/SportsMOT/, and DATADIR/BFT/, respectively. If a dataset does not provide the ${SPLIT}_seqmap.txt file, you can generate it with:
python data/gen_seqmap.py --data-dir $DATA_DIR --split $SPLIT
For example:
# DanceTrack (SPLIT in [train, val])
python data/gen_seqmap.py --data-dir $ROOT_DIR/DanceTrack --split $SPLIT
# BFT (SPLIT in [train, val, test])
python data/gen_seqmap.py --data-dir $ROOT_DIR/BFT --split $SPLIT
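For reference, a seqmap is essentially the list of sequence folder names in a split. The sketch below is a simplified stand-in for data/gen_seqmap.py (the real script is authoritative and its output format may differ, e.g. it may include a header line):

```python
# Simplified stand-in for data/gen_seqmap.py (assumption: the seqmap is just the
# sorted list of sequence folder names in a split).
import os

def write_seqmap(data_dir: str, split: str) -> None:
    split_dir = os.path.join(data_dir, split)
    seqs = sorted(d for d in os.listdir(split_dir)
                  if os.path.isdir(os.path.join(split_dir, d)))
    with open(os.path.join(data_dir, f"{split}_seqmap.txt"), "w") as f:
        f.write("\n".join(seqs) + "\n")

write_seqmap("/path/to/DATADIR/DanceTrack", "val")
```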
Finally, you should get the following dataset structure:
DATADIR/
├── DanceTrack/
│   ├── train/
│   ├── val/
│   ├── test/
│   ├── train_seqmap.txt
│   ├── val_seqmap.txt
│   └── test_seqmap.txt
├── SportsMOT/
│   ├── train/
│   ├── val/
│   ├── test/
│   ├── train_seqmap.txt
│   ├── val_seqmap.txt
│   └── test_seqmap.txt
└── BFT/
    ├── train/
    ├── val/
    ├── test/
    ├── train_seqmap.txt
    ├── val_seqmap.txt
    └── test_seqmap.txt
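A quick optional check that the layout above is in place (a helper sketch, not part of the repo):

```python
# Optional: verify that each dataset has the expected splits and seqmap files.
import os

DATADIR = "/path/to/DATADIR"  # adjust to your data root
for name in ["DanceTrack", "SportsMOT", "BFT"]:
    for split in ["train", "val", "test"]:
        assert os.path.isdir(os.path.join(DATADIR, name, split)), f"missing {name}/{split}/"
        assert os.path.isfile(os.path.join(DATADIR, name, f"{split}_seqmap.txt")), \
            f"missing {name}/{split}_seqmap.txt"
print("Dataset layout looks good.")
```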
We initialize our model with the official Deformable-DETR (with R50 backbone) weights pretrained on the COCO dataset; you can also download the checkpoint we used here. Then put the checkpoint at pretrained/deformable_detr.pth.
We initialize our model with the official DAB-Deformable-DETR (with R50 backbone) weights pretrained on the COCO dataset; you can also download the checkpoint we used here. Then put the checkpoint at pretrained/dab_deformable_detr.pth.
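To confirm a downloaded checkpoint is readable before training, you can inspect its top-level keys (a quick check; the exact keys depend on the upstream release):

```python
# Optional: inspect the pretrained checkpoint. weights_only=False is needed because
# detector checkpoints typically store non-tensor objects (e.g. the training args).
import torch

ckpt = torch.load("pretrained/deformable_detr.pth", map_location="cpu", weights_only=False)
print(list(ckpt.keys()))        # e.g. ['model', 'optimizer', ...]
if "model" in ckpt:
    print(len(ckpt["model"]), "parameter tensors")
```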
Train SambaMOTR with 8 GPUs on ${DATASET} (one of [DanceTrack, SportsMOT, BFT]):
python -m torch.distributed.run --nproc_per_node=8 main.py --use-distributed --config-path ./configs/sambamotr/${DATASET}/def_detr/train_residual_masking_sync_longer.yaml --outputs-dir ./outputs/sambamotr/${DATASET}/ --batch-size 1 --data-root <your data dir path>
If the model does not fit in your GPU's memory, use the flag --use-checkpoint to activate gradient checkpointing and reduce the allocated GPU memory:
python -m torch.distributed.run --nproc_per_node=8 main.py --use-distributed --config-path ./configs/sambamotr/${DATASET}/def_detr/train_residual_masking_sync_longer.yaml --outputs-dir ./outputs/sambamotr/${DATASET}/ --batch-size 1 --data-root <your data dir path> --use-checkpoint
You can use this script to evaluate the trained model on the ${SPLIT} (one of [train, val, test]) set:
python main.py --data-root <your data dir path> --mode eval --eval-mode specific --eval-dir ./outputs/sambamotr/${DATASET}/ --eval-model <filename of the checkpoint> --eval-data-split ${SPLIT} --eval-threads <your gpus num>
For submitting (running inference on the test set), you can use the following script:
python -m torch.distributed.run --nproc_per_node=8 main.py --use-distributed --data-root <your data dir path> --mode submit --submit-dir ./outputs/sambamotr/${DATASET}/ --submit-model <filename of the checkpoint> --submit-data-split test
To reproduce our results, you can download our pre-trained checkpoints from here and move the corresponding one to ./outputs/sambamotr/${DATASET}/ before running the above scripts.
To run the demo, ${INPUT_PATH} can be either a folder with frames or an .mp4 video:
python demo/demo.py --in_video_path "$INPUT_PATH" --output_dir "$OUTPUT_DIR" --config_path "$CONFIG_PATH" --model_path "$MODEL_PATH" --fps "$FPS"
The pretrained checkpoints and output files for SambaMOTR are stored at the following Hugging Face link: HERE.
You can use this link to download the necessary files, such as model weights and outputs, to reproduce the results or use the tracker for your own tasks.
| Dataset | Method | HOTA | AssA | DetA | IDF1 | MOTA | Cfg | Weights | Output |
|---|---|---|---|---|---|---|---|---|---|
| DanceTrack | SambaMOTR (Deformable DETR) | 67.2 | 57.7 | 78.6 | 71.3 | 88.1 | Config | Hugging Face | Hugging Face |
| DanceTrack | SambaMOTR (DAB-D-DETR) | 69.0 | 60.0 | 79.5 | 74.2 | 89.1 | Config | Hugging Face | Hugging Face |
| Dataset | Method | HOTA | AssA | DetA | IDF1 | MOTA | Cfg | Weights | Output |
|---|---|---|---|---|---|---|---|---|---|
| SportsMOT | SambaMOTR (Deformable DETR) | 70.5 | 60.6 | 82.2 | 73.3 | 90.4 | Config | Hugging Face | Hugging Face |
| SportsMOT | SambaMOTR (DAB-D-DETR) | 72.1 | 62.5 | 83.4 | 74.8 | 91.9 | Config | Hugging Face | Hugging Face |
| Dataset | Method | HOTA | AssA | DetA | IDF1 | MOTA | Cfg | Weights | Output |
|---|---|---|---|---|---|---|---|---|---|
| BFT | SambaMOTR (Deformable DETR) | 69.6 | 74.2 | 65.4 | 81.2 | 70.2 | Config | Hugging Face | Hugging Face |
| BFT | SambaMOTR (DAB-D-DETR) | 72.1 | 75.4 | 69.2 | 84.6 | 76.4 | Config | Hugging Face | Hugging Face |
If you find any bug in the code, please report it to Mattia Segu ([email protected]).
If you find our work useful in your research, please consider citing our publication:
@article{segu2024samba,
title={Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking},
author={Segu, Mattia and Piccinelli, Luigi and Li, Siyuan and Yang, Yung-Hsu and Van Gool, Luc and Schiele, Bernt},
journal={arXiv preprint arXiv:2410.01806},
year={2024}
}