
Samba: Synchronized Set-of-Sequences Modeling for End-to-end Multiple Object Tracking

arXiv Project Page


Samba: Synchronized Set-of-Sequences Modeling for End-to-end Multiple Object Tracking
Mattia Segu, Luigi Piccinelli, Siyuan Li, Yung-Hsu Yang, Luc Van Gool, Bernt Schiele
ICLR 2025 Spotlight, Paper at arXiv 2410.01806

SambaMOTR is a novel tracking-by-propagation framework for multiple object tracking in complex scenarios like dance, sports, and animal groups. It leverages Samba, a linear-time set-of-sequences model that synchronizes state-spaces across tracklets to capture long-range dependencies, inter-tracklet interactions, and temporal occlusions. With an autoregressive memory mechanism and a simple uncertainty handling strategy (MaskObs), SambaMOTR tracks objects accurately through occlusions without hand-crafted heuristics. It achieves state-of-the-art results on DanceTrack, BFT, and SportsMOT.


Demo videos: DanceTrack · BFT · SportsMOT (Volleyball) · SportsMOT (Football)

News πŸ”₯

  • 2025.03.31: We’re releasing SambaMOTR's weights based on the stronger DAB-D-DETR detector.
  • 2025.03.26: We’re excited to release the main code and checkpoints!
  • 2025.01.22: SambaMOTR has been accepted to ICLR 2025 as a spotlight paper! πŸ₯³ Looking forward to seeing you in Singapore.

Installation

Install with conda

conda create -n sambamotr -y python=3.11  # create a virtual env
conda activate sambamotr               # activate the env
conda install -y pytorch==2.5.1 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y matplotlib pyyaml scipy tqdm tensorboard einops
pip install opencv-python

Install with venv (alternative)

python -m venv venv/sambamotr
source venv/sambamotr/bin/activate

pip install torch==2.5.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install matplotlib pyyaml scipy tqdm tensorboard einops
pip install opencv-python
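
After either install, a quick sanity check (a minimal sketch, not part of the repo) can confirm that PyTorch sees your GPU and the expected CUDA toolkit:

# Environment sanity check (run inside the activated env)
import torch, torchvision, cv2, einops  # all should import without errors
print(torch.__version__, torch.version.cuda)  # expect 2.5.1 and 12.1
print("CUDA available:", torch.cuda.is_available())  # should be True on a GPU machine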

Build Deformable Attention CUDA ops

You also need to compile the Deformable Attention CUDA ops:

# From https://github.com/fundamentalvision/Deformable-DETR
cd ./models/ops/
# Build for different CUDA architectures (refer to https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/)
TORCH_CUDA_ARCH_LIST="7.5 8.0 8.6 8.7 8.9" sh make.sh
# You can test the ops if needed:
python test.py
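
If the build succeeds, the compiled extension should be importable from Python. A minimal check, assuming the extension keeps the upstream Deformable-DETR module name (MultiScaleDeformableAttention):

# Import check for the compiled deformable attention ops
import torch  # the extension links against torch, so import it first
import MultiScaleDeformableAttention as MSDA  # raises ImportError if the build failed
print("Deformable attention ops available at", MSDA.__file__)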

Data

Put the unzipped DanceTrack, SportsMOT, and BFT datasets into DATADIR/DanceTrack/, DATADIR/SportsMOT/, and DATADIR/BFT/, respectively. If a dataset does not provide the ${SPLIT}_seqmap.txt file, you can generate it with

python data/gen_seqmap.py --data-dir $DATA_DIR --split $SPLIT

For example:

# DanceTrack (SPLIT in [train, val])
python data/gen_seqmap.py --data-dir $ROOT_DIR/DanceTrack --split $SPLIT 

# BFT (SPLIT in [train, val, test])
python data/gen_seqmap.py --data-dir $ROOT_DIR/BFT --split $SPLIT
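
For reference, here is a minimal sketch of what generating a seqmap amounts to, assuming the common MOT-style format (a "name" header followed by one sequence folder name per line); data/gen_seqmap.py is the authoritative implementation:

# Sketch: write ${SPLIT}_seqmap.txt by listing the sequence folders of a split
import os, argparse

parser = argparse.ArgumentParser()
parser.add_argument("--data-dir", required=True)
parser.add_argument("--split", required=True)
args = parser.parse_args()

split_dir = os.path.join(args.data_dir, args.split)
seqs = sorted(d for d in os.listdir(split_dir) if os.path.isdir(os.path.join(split_dir, d)))
with open(os.path.join(args.data_dir, f"{args.split}_seqmap.txt"), "w") as f:
    f.write("name\n")
    f.writelines(s + "\n" for s in seqs)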

Finally, you should get the following dataset structure:

DATADIR/
  β”œβ”€β”€ DanceTrack/
  β”‚ β”œβ”€β”€ train/
  β”‚ β”œβ”€β”€ val/
  β”‚ β”œβ”€β”€ test/
  β”‚ β”œβ”€β”€ train_seqmap.txt
  β”‚ β”œβ”€β”€ val_seqmap.txt
  β”‚ └── test_seqmap.txt
  β”œβ”€β”€ SportsMOT/
  β”‚ β”œβ”€β”€ train/
  β”‚ β”œβ”€β”€ val/
  β”‚ β”œβ”€β”€ test/
  β”‚ β”œβ”€β”€ train_seqmap.txt
  β”‚ β”œβ”€β”€ val_seqmap.txt
  β”‚ └── test_seqmap.txt
  └── BFT/
    β”œβ”€β”€ train/
    β”œβ”€β”€ val/
    β”œβ”€β”€ test/
    β”œβ”€β”€ train_seqmap.txt
    β”œβ”€β”€ val_seqmap.txt
    └── test_seqmap.txt

Pretrain (Deformable DETR)

We initialize our model with the official Deformable-DETR (R50 backbone) weights pretrained on the COCO dataset; you can also download the checkpoint we used here. Then put the checkpoint at pretrained/deformable_detr.pth.

Pretrain (DAB-DETR)

We initialize our model with the official DAB-Deformable-DETR (R50 backbone) weights pretrained on the COCO dataset; you can also download the checkpoint we used here. Then put the checkpoint at pretrained/dab_deformable_detr.pth.
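
To verify a downloaded checkpoint is readable before training, a small sketch (the top-level key layout is an assumption; DETR-style checkpoints usually nest the weights under "model"):

# Checkpoint sanity check
import torch

ckpt = torch.load("pretrained/deformable_detr.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # fall back to the raw dict if there is no "model" key
print(f"loaded {len(state_dict)} parameter tensors")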

Scripts

Training

Train SambaMOTR with 8 GPUs on ${DATASET} (one of [DanceTrack, SportsMOT, BFT]):

python -m torch.distributed.run --nproc_per_node=8 main.py --use-distributed --config-path ./configs/sambamotr/${DATASET}/def_detr/train_residual_masking_sync_longer.yaml --outputs-dir ./outputs/sambamotr/${DATASET}/ --batch-size 1 --data-root <your data dir path>

If the model does not fit in your GPU memory, add the --use-checkpoint flag to enable gradient checkpointing and reduce the allocated GPU memory:

python -m torch.distributed.run --nproc_per_node=8 main.py --use-distributed --config-path ./configs/sambamotr/${DATASET}/def_detr/train_residual_masking_sync_longer.yaml --outputs-dir ./outputs/sambamotr/${DATASET}/ --batch-size 1 --data-root <your data dir path> --use-checkpoint

Submit and Evaluation

Use the following script to evaluate the trained model on the ${SPLIT} split (one of [train, val, test]):

python main.py --data-root <your data dir path> --mode eval --eval-mode specific --eval-dir ./outputs/sambamotr/${DATASET}/ --eval-model <filename of the checkpoint> --eval-data-split ${SPLIT} --eval-threads <your gpus num>

For submission (running inference on the test set), use the following script:

python -m torch.distributed.run --nproc_per_node=8 main.py --use-distributed --data-root <your data dir path> --mode submit --submit-dir ./outputs/sambamotr/${DATASET}/ --submit-model <filename of the checkpoint> --submit-data-split test 

To reproduce our results, you can download our pre-trained checkpoints from here and move the corresponding one to ./outputs/sambamotr/${DATASET}/ before running the above scripts.

Demo

${INPUT_PATH} can be either a folder of frames or an .mp4 video:

python demo/demo.py --in_video_path "$INPUT_PATH" --output_dir "$OUTPUT_DIR" --config_path "$CONFIG_PATH" --model_path "$MODEL_PATH" --fps "$FPS"
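
If you only have a video but want to pass a folder of frames instead, here is a minimal OpenCV sketch for extracting them (the frame-naming scheme below is an assumption, not a requirement of demo/demo.py):

# Sketch: dump the frames of an .mp4 into a folder with OpenCV
import os, cv2

def extract_frames(video_path: str, out_dir: str) -> int:
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:08d}.jpg"), frame)
        idx += 1
    cap.release()
    return idx

# extract_frames("clip.mp4", "clip_frames/")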

Pretrained SambaMOTR Weights

The pretrained checkpoints and output files for SambaMOTR are stored at the following Hugging Face link: HERE

You can use this link to download the necessary files, such as model weights and outputs, to reproduce the results or use the tracker for your own tasks.
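
If you prefer fetching the files programmatically, a sketch using huggingface_hub (not installed by the steps above; run pip install huggingface_hub first and substitute the repository id from the link above):

# Sketch: download the released checkpoints/outputs from Hugging Face
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<huggingface-repo-id>",   # placeholder: use the repository from the link above
    local_dir="./outputs/sambamotr/",  # checkpoints are expected under ./outputs/sambamotr/${DATASET}/
)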

Results (DanceTrack)

| Dataset | Method | HOTA | AssA | DetA | IDF1 | MOTA | Cfg | Weights | Output |
|---|---|---|---|---|---|---|---|---|---|
| DanceTrack | SambaMOTR (Deformable DETR) | 67.2 | 57.7 | 78.6 | 71.3 | 88.1 | Config | Hugging Face | Hugging Face |
| DanceTrack | SambaMOTR (DAB-D-DETR) | 69.0 | 60.0 | 79.5 | 74.2 | 89.1 | Config | Hugging Face | Hugging Face |

Results (SportsMOT)

| Dataset | Method | HOTA | AssA | DetA | IDF1 | MOTA | Cfg | Weights | Output |
|---|---|---|---|---|---|---|---|---|---|
| SportsMOT | SambaMOTR (Deformable DETR) | 70.5 | 60.6 | 82.2 | 73.3 | 90.4 | Config | Hugging Face | Hugging Face |
| SportsMOT | SambaMOTR (DAB-D-DETR) | 72.1 | 62.5 | 83.4 | 74.8 | 91.9 | Config | Hugging Face | Hugging Face |

Results (BFT)

| Dataset | Method | HOTA | AssA | DetA | IDF1 | MOTA | Cfg | Weights | Output |
|---|---|---|---|---|---|---|---|---|---|
| BFT | SambaMOTR (Deformable DETR) | 69.6 | 74.2 | 65.4 | 81.2 | 70.2 | Config | Hugging Face | Hugging Face |
| BFT | SambaMOTR (DAB-D-DETR) | 72.1 | 75.4 | 69.2 | 84.6 | 76.4 | Config | Hugging Face | Hugging Face |

Contributions

If you find any bugs in the code, please report them to Mattia Segu ([email protected]).

Citation

If you find our work useful in your research, please consider citing our publication:

@article{segu2024samba,
  title={Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking},
  author={Segu, Mattia and Piccinelli, Luigi and Li, Siyuan and Yang, Yung-Hsu and Van Gool, Luc and Schiele, Bernt},
  journal={arXiv preprint arXiv:2410.01806},
  year={2024}
}

Acknowledgements
