|
1 | | -# ScanNet for 3D Object Detection |
| 1 | +# ScanNet Dataset |
| 2 | + |
| 3 | +MMDetection3D supports 3D object detection and semantic segmentation on the ScanNet dataset. This page provides specific tutorials about its usage. |
2 | 4 |
|
3 | 5 | ## Dataset preparation |
4 | 6 |
|
@@ -38,7 +40,7 @@ Under folder `scans` there are overall 1201 train and 312 validation folders in |
38 | 40 | - `scene0001_01.txt`: Meta file including axis-aligned matrix, etc. |
39 | 41 | - `scene0001_01_vh_clean_2.labels.ply`: Annotation file containing the category of each vertex. |
40 | 42 |
|
41 | | -Export ScanNet data by running `python batch_load_scannet_data.py`. The main steps include: |
| 43 | +Exporting ScanNet data by running `python batch_load_scannet_data.py` mainly involves the following three steps: |
42 | 44 |
|
43 | 45 | - Export original files to point cloud, instance label, semantic label and bounding box file. |
44 | 46 | - Downsample raw point cloud and filter invalid classes. |
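
As an illustration of the downsampling and filtering step, here is a minimal NumPy sketch. It assumes a 50k-point budget and the 18 detection class ids commonly used for ScanNet; the actual defaults of `batch_load_scannet_data.py` may differ, so treat the names and values as illustrative.

```python
import numpy as np

def downsample_and_filter(points, sem_labels, ins_labels,
                          max_num_points=50000,
                          valid_class_ids=(3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
                                           14, 16, 24, 28, 33, 34, 36, 39)):
    """Randomly subsample a scan and mark classes outside the valid set as unannotated (0)."""
    num_points = points.shape[0]
    if num_points > max_num_points:
        choices = np.random.choice(num_points, max_num_points, replace=False)
        points = points[choices]
        sem_labels = sem_labels[choices]
        ins_labels = ins_labels[choices]
    # Points whose semantic class is not in the valid set keep their coordinates
    # but are treated as unannotated (label 0).
    keep = np.isin(sem_labels, np.asarray(valid_class_ids))
    sem_labels = np.where(keep, sem_labels, 0)
    return points, sem_labels, ins_labels
```
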
@@ -224,6 +226,9 @@ scannet |
224 | 226 | - `points/xxxxx.bin`: The `axis-unaligned` point cloud data after downsampling. Since the ScanNet 3D detection task takes axis-aligned point clouds as input, while the ScanNet 3D semantic segmentation task takes unaligned points, we choose to store the unaligned points together with their axis-aligned transform matrix. Note: the points are axis-aligned by the [`GlobalAlignment`](https://github.com/open-mmlab/mmdetection3d/blob/9f0b01caf6aefed861ef4c3eb197c09362d26b32/mmdet3d/datasets/pipelines/transforms_3d.py#L423) transform in the pre-processing pipeline of the 3D detection task (a short sketch of this alignment is included below). |
225 | 227 | - `instance_mask/xxxxx.bin`: The instance label for each point, value range: \[0, NUM_INSTANCES\], 0: unannotated. |
226 | 228 | - `semantic_mask/xxxxx.bin`: The semantic label for each point, value range: \[1, 40\], i.e. `nyu40id` standard. Note: the `nyu40id` ID will be mapped to train ID in train pipeline `PointSegClassMapping`. |
| 229 | +- `seg_info`: The generated infos to support semantic segmentation model training. |
| 230 | + - `train_label_weight.npy`: Weighting factor for each semantic class. Since the number of points in different classes varies greatly, it is common practice to re-weight the labels to achieve better performance. |
| 231 | + - `train_resampled_scene_idxs.npy`: Re-sampling index for each scene. Different rooms will be sampled multiple times according to their number of points to balance training data. |
227 | 232 | - `posed_images/scenexxxx_xx`: The set of `.jpg` images with `.txt` 4x4 poses and the single `.txt` file with camera intrinsic matrix. |
228 | 233 | - `scannet_infos_train.pkl`: The train data infos, the detailed info of each scan is as follows: |
229 | 234 | - info\['lidar_points'\]: A dict containing all information related to the lidar points. |
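
As an aside on the axis-align matrix mentioned above, here is a minimal sketch of what `GlobalAlignment` effectively does with the stored unaligned points. The `axisAlignment` key follows the ScanNet meta-file format; the file paths are illustrative.

```python
import numpy as np

def read_axis_align_matrix(meta_file):
    """Parse the axisAlignment entry from a ScanNet meta file (e.g. scene0001_01.txt)."""
    with open(meta_file) as f:
        for line in f:
            if line.startswith('axisAlignment'):
                values = [float(x) for x in line.split('=')[1].split()]
                return np.array(values, dtype=np.float32).reshape(4, 4)
    return np.eye(4, dtype=np.float32)  # a few scans come without an alignment

# Align the stored (unaligned) points, as the detection pre-processing does.
points = np.fromfile('points/scene0001_01.bin', dtype=np.float32).reshape(-1, 6)  # xyz + rgb
axis_align_matrix = read_axis_align_matrix('scans/scene0001_01/scene0001_01.txt')
xyz_homo = np.hstack([points[:, :3], np.ones((points.shape[0], 1), dtype=np.float32)])
aligned_xyz = (xyz_homo @ axis_align_matrix.T)[:, :3]
aligned_points = np.concatenate([aligned_xyz, points[:, 3:]], axis=1)
```
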
@@ -285,8 +290,61 @@ train_pipeline = [ |
285 | 290 | - `RandomFlip3D`: randomly flip the input point cloud horizontally or vertically. |
286 | 291 | - `GlobalRotScaleTrans`: rotate the input point cloud, usually in the range of \[-5, 5\] (degrees) for ScanNet; then scale the input point cloud, usually by 1.0 for ScanNet (which means no scaling); finally translate the input point cloud, usually by 0 for ScanNet (which means no translation). |
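
For reference, these two augmentations typically show up in the ScanNet detection pipeline roughly as in the hedged sketch below; the parameter values are the commonly used ones (±5 degrees expressed in radians), but the exact settings depend on the specific config.

```python
train_pipeline = [
    # ... loading and sampling transforms ...
    dict(
        type='RandomFlip3D',
        sync_2d=False,
        flip_ratio_bev_horizontal=0.5,
        flip_ratio_bev_vertical=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.087266, 0.087266],  # roughly [-5, 5] degrees
        scale_ratio_range=[1.0, 1.0],     # no scaling
        translation_std=[0, 0, 0],        # no translation
        shift_height=True),
    # ... formatting / packing transforms ...
]
```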
287 | 292 |
|
| 293 | +A typical training pipeline of ScanNet for 3D semantic segmentation is as below: |
| 294 | + |
| 295 | +```python |
| 296 | +train_pipeline = [ |
| 297 | + dict( |
| 298 | + type='LoadPointsFromFile', |
| 299 | + coord_type='DEPTH', |
| 300 | + shift_height=False, |
| 301 | + use_color=True, |
| 302 | + load_dim=6, |
| 303 | + use_dim=[0, 1, 2, 3, 4, 5]), |
| 304 | + dict( |
| 305 | + type='LoadAnnotations3D', |
| 306 | + with_bbox_3d=False, |
| 307 | + with_label_3d=False, |
| 308 | + with_mask_3d=False, |
| 309 | + with_seg_3d=True), |
| 310 | + dict( |
| 311 | + type='PointSegClassMapping'), |
| 312 | + dict( |
| 313 | + type='IndoorPatchPointSample', |
| 314 | + num_points=num_points, |
| 315 | + block_size=1.5, |
| 316 | + ignore_index=len(class_names), |
| 317 | + use_normalized_coord=False, |
| 318 | + enlarge_size=0.2, |
| 319 | + min_unique_num=None), |
| 320 | + dict(type='NormalizePointsColor', color_mean=None), |
| 321 | + dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask']) |
| 322 | +] |
| 323 | +``` |
| 324 | + |
| 325 | +- `PointSegClassMapping`: Only the valid category ids will be mapped to class label ids in the range \[0, 20) during training. Other class ids will be converted to `ignore_index`, which equals `20`. |
| 326 | +- `IndoorPatchPointSample`: Crop a patch containing a fixed number of points from input point cloud. `block_size` indicates the size of the cropped block, typically `1.5` for ScanNet. |
| 327 | +- `NormalizePointsColor`: Normalize the RGB color values of the input point cloud by dividing them by `255`. |
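
To make the label mapping concrete, here is a minimal sketch using a lookup table. The 20 `nyu40id` values below are the standard ScanNet benchmark classes; the lookup-table approach itself is illustrative rather than the exact implementation of `PointSegClassMapping`.

```python
import numpy as np

# The 20 valid nyu40 ids used for ScanNet semantic segmentation.
valid_cat_ids = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                 11, 12, 14, 16, 24, 28, 33, 34, 36, 39)
ignore_index = len(valid_cat_ids)  # 20

# Lookup table: nyu40id -> train id in [0, 20); everything else -> ignore_index.
lut = np.full(41, ignore_index, dtype=np.int64)
for train_id, cat_id in enumerate(valid_cat_ids):
    lut[cat_id] = train_id

sem_mask = np.array([1, 5, 39, 13, 40])  # example nyu40 labels; 13 and 40 are invalid here
print(lut[sem_mask])                     # -> [ 0  4 19 20 20]
```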
| 328 | + |
288 | 329 | ## Metrics |
289 | 330 |
|
290 | | -Typically mean Average Precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function to compute precision and recall for 3D object detection for multiple classes is called. Please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/evaluation/functional/indoor_eval.py) for more details. |
| 331 | +- **Object Detection**: Typically mean Average Precision (mAP) is used for evaluation on ScanNet, e.g. `mAP@0.25` and `mAP@0.5`. In detail, a generic function that computes precision and recall for multiple classes in 3D object detection is called. Please refer to [indoor_eval](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/evaluation/functional/indoor_eval.py) for more details. |
| 332 | + |
| 333 | + **Note**: As introduced in the section `Export ScanNet data`, all ground truth 3D bounding boxes are axis-aligned, i.e. the yaw is zero. Therefore the yaw target of the network-predicted 3D bounding boxes is also zero, and axis-aligned 3D Non-Maximum Suppression (NMS), which disregards rotation, is adopted during post-processing. |
| 334 | + |
| 335 | +- **Semantic Segmentation**: Typically mean Intersection over Union (mIoU) is used for evaluation on ScanNet. In detail, we first compute the IoU for each class and then average them to get the mIoU; please refer to [seg_eval](https://github.com/open-mmlab/mmdetection3d/blob/dev-1.x/mmdet3d/evaluation/functional/seg_eval.py) for more details. |
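
For intuition, the sketch below shows how per-class IoU and mIoU can be computed from predicted and ground-truth labels; it is a simplified stand-in for `seg_eval`, not its exact implementation.

```python
import numpy as np

def mean_iou(pred, gt, num_classes=20, ignore_index=20):
    """Compute per-class IoU and their mean, skipping unannotated points."""
    valid = gt != ignore_index
    pred, gt = pred[valid], gt[valid]
    ious = []
    for cls in range(num_classes):
        intersection = np.sum((pred == cls) & (gt == cls))
        union = np.sum((pred == cls) | (gt == cls))
        if union > 0:
            ious.append(intersection / union)
    return float(np.mean(ious)) if ious else float('nan')
```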
| 336 | + |
| 337 | +## Testing and Making a Submission |
| 338 | + |
| 339 | +By default, our codebase evaluates semantic segmentation results on the validation set. |
| 340 | +If you would like to test the model performance on the online benchmark, add the `--format-only` flag to the evaluation script and change `ann_file=data_root + 'scannet_infos_val.pkl'` to `ann_file=data_root + 'scannet_infos_test.pkl'` in the ScanNet dataset's [config](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/datasets/scannet_seg-3d-20class.py#L126). Remember to specify `txt_prefix` as the directory in which to save the testing results. |
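
A hedged sketch of the corresponding change in the old-style config is shown below; `data_root` is assumed to be defined earlier in the config file, and the surrounding fields are simplified, so check the linked config for the exact structure.

```python
# configs/_base_/datasets/scannet_seg-3d-20class.py (simplified excerpt)
data = dict(
    # train=dict(...) and val=dict(...) stay unchanged
    test=dict(
        type='ScanNetSegDataset',
        data_root=data_root,
        # point the test split at the test infos instead of the validation infos
        ann_file=data_root + 'scannet_infos_test.pkl',
        test_mode=True))
```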
| 341 | + |
| 342 | +Taking PointNet++ (SSG) on ScanNet as an example, the following command can be used to run inference on the test set: |
| 343 | + |
| 344 | +```shell |
| 345 | +./tools/dist_test.sh configs/pointnet2/pointnet2_ssg_16x2_cosine_200e_scannet_seg-3d-20class.py \ |
| 346 | + work_dirs/pointnet2_ssg/latest.pth --format-only \ |
| 347 | + --eval-options txt_prefix=work_dirs/pointnet2_ssg/test_submission |
| 348 | +``` |
291 | 349 |
|
292 | | -As introduced in section `Export ScanNet data`, all ground truth 3D bounding box are axis-aligned, i.e. the yaw is zero. So the yaw target of network predicted 3D bounding box is also zero and axis-aligned 3D Non-Maximum Suppression (NMS), which is regardless of rotation, is adopted during post-processing . |
| 350 | +After generating the results, you can compress the folder and upload it to the [ScanNet evaluation server](http://kaldir.vc.in.tum.de/scannet_benchmark/semantic_label_3d). |