(Work in progress) Implementation of adaptive Dataset class which adapts to different data structures

ofhkr requested to merge tr_val_te_splits into main

The original Dataset class only handled one specific directory structure:

|-- train
|   |-- images # ordered with numbers
|   |-- labels # same order as 'images'
|-- test
|   |-- images
|   |-- labels

Users cannot be assumed to have their data already split into training and testing folders. It is more likely to be in one of the following structures:

Case 1: There are no folders - all images and targets are stored in the same data directory. Each image and its corresponding target have similar names (e.g. img1.tif, img1_mask.tif).

        |-- data
            |-- img1.tif
            |-- img1_mask.tif
            |-- img2.tif
            |-- img2_mask.tif
            |-- ...
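
As a rough illustration of how Case 1 could be handled (this is a sketch, not the code on this branch), images can be paired with their targets by file name. The function name `pair_flat_directory` and the `mask_suffix` and `ext` parameters are hypothetical; the `_mask` suffix and `.tif` extension are taken from the tree above.

```python
from pathlib import Path


def pair_flat_directory(data_dir, mask_suffix="_mask", ext=".tif"):
    """Return sorted (image, mask) path pairs from one flat directory.

    Hypothetical helper: assumes a mask is named <image stem><mask_suffix><ext>.
    """
    data_dir = Path(data_dir)
    pairs = []
    for path in sorted(data_dir.glob(f"*{ext}")):
        if path.stem.endswith(mask_suffix):
            continue  # masks are reached through their image, not listed directly
        mask = path.with_name(f"{path.stem}{mask_suffix}{ext}")
        if mask.exists():
            pairs.append((path, mask))
        # Images without a matching mask are skipped silently in this sketch.
    return pairs
```

Whether unmatched images should be skipped or raise an error is a design choice; skipping is used here only to keep the example short.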

Case 2: There are two folders - one with all the images and one with all the targets.

        |-- data
            |-- images
                |-- img1.tif
                |-- img2.tif
                |-- ...
            |-- masks
                |-- img1_mask.tif
                |-- img2_mask.tif
                |-- ...
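
For Case 2, a similar sketch looks up each image's target in the sibling folder. Again, `pair_two_folders` and its parameters are hypothetical names; the `images` and `masks` folder names and the `_mask` suffix follow the tree above and may differ in a real dataset.

```python
from pathlib import Path


def pair_two_folders(data_dir, image_dir="images", mask_dir="masks",
                     mask_suffix="_mask"):
    """Return sorted (image, mask) path pairs from separate image/mask folders.

    Hypothetical helper: assumes masks/<stem><mask_suffix><ext> mirrors images/<stem><ext>.
    """
    root = Path(data_dir)
    pairs = []
    for image in sorted((root / image_dir).iterdir()):
        if not image.is_file():
            continue  # ignore nested folders
        mask = root / mask_dir / f"{image.stem}{mask_suffix}{image.suffix}"
        if not mask.exists():
            # Fail loudly here so a missing target is noticed before training.
            raise FileNotFoundError(f"No mask for {image.name}: expected {mask}")
        pairs.append((image, mask))
    return pairs
```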

Case 3: There are many folders - one per case (e.g. patient), each containing multiple images and their targets.

        |-- data
            |-- patient1
                |-- p1_img1.tif
                |-- p1_img1_mask.tif
                |-- p1_img2.tif
                |-- p1_img2_mask.tif
                |-- ...
            |-- patient2
                |-- p2_img1.tif
                |-- p2_img1_mask.tif
                |-- p2_img2.tif
                |-- p2_img2_mask.tif
                |-- ...
            |-- ...
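
To illustrate how the three cases could be told apart automatically, here is a minimal sketch that inspects the subfolders of the data directory. The function `detect_layout` and the returned labels are hypothetical and are not necessarily how this branch distinguishes the cases; the `images` and `masks` names are assumed from Case 2.

```python
from pathlib import Path


def detect_layout(data_dir, image_dir="images", mask_dir="masks"):
    """Guess which of the three layouts a data directory uses (hypothetical helper)."""
    root = Path(data_dir)
    subdirs = {p.name for p in root.iterdir() if p.is_dir()}
    if not subdirs:
        return "flat"         # Case 1: everything in one directory
    if subdirs == {image_dir, mask_dir}:
        return "two_folders"  # Case 2: one images folder, one masks folder
    return "per_case"         # Case 3: one folder per case (e.g. patient)
```

A heuristic like this can misclassify unusual layouts, so letting the user override the detected case explicitly would probably be safer.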

The dataloader has also been changed so that the user can specify dedicated train, validation and/or test folders if they have them.

For the last case (Case 3), I haven't yet implemented keeping images from the same patient in the same split (a #TODO marks the spot where Case 3 is implemented).
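
One possible way to address this, sketched under assumptions rather than taken from the branch, is to shuffle and split the patient folders instead of the individual images, so a patient can never end up in two splits. The name `split_by_patient`, the fraction parameters, and the `_mask` suffix are all hypothetical.

```python
import random
from pathlib import Path


def split_by_patient(data_dir, val_fraction=0.15, test_fraction=0.15,
                     seed=0, mask_suffix="_mask"):
    """Assign whole patient folders to train/val/test, then list their pairs.

    Hypothetical helper: fractions apply to the number of patient folders,
    not to the number of images.
    """
    patients = sorted(p for p in Path(data_dir).iterdir() if p.is_dir())
    random.Random(seed).shuffle(patients)  # reproducible shuffle of folders

    n_val = int(val_fraction * len(patients))
    n_test = int(test_fraction * len(patients))
    folders = {
        "val": patients[:n_val],
        "test": patients[n_val:n_val + n_test],
        "train": patients[n_val + n_test:],  # everything left over
    }

    splits = {}
    for name, dirs in folders.items():
        pairs = []
        for patient in dirs:
            for image in sorted(patient.iterdir()):
                if image.stem.endswith(mask_suffix):
                    continue  # masks are reached through their image
                mask = image.with_name(f"{image.stem}{mask_suffix}{image.suffix}")
                if mask.exists():
                    pairs.append((image, mask))
        splits[name] = pairs
    return splits
```

Because the split is made over folders, the resulting image counts per split only approximate the requested fractions when patients have different numbers of images.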

This branch lets the user specify the path to their data with more flexibility in how the dataset is structured.

Edited by ofhkr
