(Work in progress) Implementation of an adaptive Dataset class that handles different data directory structures
The original Dataset class only supported one specific directory structure:
|-- train
| |-- images # ordered with numbers
| |-- labels # same order as 'images'
|-- test
| |-- images
| |-- labels
We cannot assume that users already have their data split into training and testing folders. Their data is more likely to follow one of these structures:
Case 1: There are no subfolders. All images and targets are stored in the same data directory, and each image and its target have similar names (e.g. img1.tif and img1_mask.tif).
|-- data
| |-- img1.tif
| |-- img1_mask.tif
| |-- img2.tif
| |-- img2_mask.tif
| |-- ...
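A minimal sketch of how Case 1 pairs could be discovered, assuming masks share the image's stem plus a `_mask` suffix and that all files are `.tif`. The function name `pair_flat_directory` and its parameters are hypothetical and not taken from this branch.

```python
from pathlib import Path


def pair_flat_directory(data_dir, mask_suffix="_mask", ext=".tif"):
    """Return sorted (image, mask) path pairs from one flat directory.

    Assumption: the mask for `name.tif` is called `name_mask.tif`.
    """
    data_dir = Path(data_dir)
    pairs = []
    for path in sorted(data_dir.glob(f"*{ext}")):
        if path.stem.endswith(mask_suffix):
            continue  # this file is itself a mask; it is found via its image
        mask = path.with_name(f"{path.stem}{mask_suffix}{ext}")
        if mask.exists():
            pairs.append((path, mask))
    return pairs
```

Images without a matching mask are skipped silently here; raising an error instead may be preferable if every image is expected to be labelled.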
Case 2: There are two folders - one with all the images and one with all the targets.
|-- data
| |-- images
| | |-- img1.tif
| | |-- img2.tif
| | |-- ...
| |-- masks
| | |-- img1_mask.tif
| | |-- img2_mask.tif
| | |-- ...
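For Case 2, one possible approach is to index the masks by file stem and look up each image's counterpart. This is only a sketch: the folder names `images` and `masks`, the `_mask` suffix, and the function name `pair_two_folders` are assumptions based on the tree above.

```python
from pathlib import Path


def pair_two_folders(data_dir, image_dir="images", mask_dir="masks", mask_suffix="_mask"):
    """Return sorted (image, mask) path pairs from separate image and mask folders.

    Assumption: the mask for `images/name.tif` is `masks/name_mask.tif`.
    """
    data_dir = Path(data_dir)
    # Index masks by stem so each image lookup is a dictionary access.
    masks = {p.stem: p for p in (data_dir / mask_dir).iterdir() if p.is_file()}
    pairs = []
    for image in sorted(p for p in (data_dir / image_dir).iterdir() if p.is_file()):
        mask = masks.get(image.stem + mask_suffix)
        if mask is not None:
            pairs.append((image, mask))
    return pairs
```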
Case 3: There are many folders - each folder holds one case (e.g. a patient) with multiple images and their targets.
|-- data
| |-- patient1
| | |-- p1_img1.tif
| | |-- p1_img1_mask.tif
| | |-- p1_img2.tif
| | |-- p1_img2_mask.tif
| | |-- ...
| |-- patient2
| | |-- p2_img1.tif
| | |-- p2_img1_mask.tif
| | |-- p2_img2.tif
| | |-- p2_img2_mask.tif
| | |-- ...
| |-- ...
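For Case 3, it can help to record which folder each pair came from, so that the case identity is available later when splitting. The sketch below assumes one subfolder per case and the same `_mask` naming as above; `collect_case_folders` is a hypothetical name, not a function from this branch.

```python
from pathlib import Path


def collect_case_folders(data_dir, mask_suffix="_mask", ext=".tif"):
    """Return (case_id, image, mask) triples, treating each subfolder as one case.

    Assumption: inside each case folder, the mask for `name.tif` is `name_mask.tif`.
    """
    samples = []
    case_dirs = sorted(p for p in Path(data_dir).iterdir() if p.is_dir())
    for case_dir in case_dirs:
        for image in sorted(case_dir.glob(f"*{ext}")):
            if image.stem.endswith(mask_suffix):
                continue  # skip masks; they are attached to their image below
            mask = image.with_name(f"{image.stem}{mask_suffix}{ext}")
            if mask.exists():
                samples.append((case_dir.name, image, mask))
    return samples
```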
The dataloader has also been changed so that the user can specify whether they have dedicated train, validation and/or test folders.
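As an illustration only, resolving optional split folders could look something like the sketch below. The function `resolve_split_dirs` and its keyword arguments are hypothetical; the actual interface in this branch may differ.

```python
from pathlib import Path


def resolve_split_dirs(data_dir, train=None, validation=None, test=None):
    """Map each split the user named to its folder under `data_dir`.

    Splits left as None are omitted, so the caller knows which ones
    still have to be created by splitting the remaining data.
    """
    data_dir = Path(data_dir)
    requested = {"train": train, "validation": validation, "test": test}
    resolved = {}
    for split, name in requested.items():
        if name is None:
            continue  # the user has no dedicated folder for this split
        folder = data_dir / name
        if not folder.is_dir():
            raise FileNotFoundError(f"{split} folder not found: {folder}")
        resolved[split] = folder
    return resolved
```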
For the last case (Case 3), I haven't yet implemented keeping all images from the same patient in the same split (a #TODO marks the place where Case 3 is implemented).
This branch lets the user specify the path to their data while allowing much more flexibility in how the dataset is structured.
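One way the missing patient-level split could be approached is to shuffle and split the patient IDs rather than the individual images. The sketch below is not the implementation in this branch; `split_by_group`, `test_fraction` and `seed` are hypothetical names, and samples are assumed to arrive as `(group_id, item)` tuples.

```python
import random
from collections import defaultdict


def split_by_group(samples, test_fraction=0.2, seed=0):
    """Split (group_id, item) samples so that no group appears in both splits."""
    by_group = defaultdict(list)
    for group, item in samples:
        by_group[group].append(item)

    # Shuffle the group IDs, not the items, so a whole patient moves together.
    groups = sorted(by_group)
    random.Random(seed).shuffle(groups)

    # Hold out at least one group when there is more than one to choose from.
    n_test = max(1, round(len(groups) * test_fraction)) if len(groups) > 1 else 0
    test_groups = set(groups[:n_test])

    train = [item for g in groups if g not in test_groups for item in by_group[g]]
    test = [item for g in groups if g in test_groups for item in by_group[g]]
    return train, test
```

Because the split is made on groups, the resulting item counts only approximate `test_fraction` when patients have different numbers of images.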