Define Dataset and DataLoaders
Custom Dataset Class
- Custom Dataset: Create a class (BloodCellDataset) to handle image data.
- Functions: Implement the __len__() and __getitem__() methods so PyTorch can query the dataset size and fetch individual samples.
- Read CSV: The constructor loads the CSV file of annotations; __getitem__() reads each image from disk and applies the transformations.
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset

class BloodCellDataset(Dataset):
    def __init__(self, csv_file, transform=None):
        # Each CSV row holds an image path plus its binary and subtype labels.
        self.dataframe = pd.read_csv(csv_file)
        self.transform = transform

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, idx):
        # Load the image and force three channels (some files may be grayscale).
        img_path = self.dataframe.iloc[idx]['filepath']
        image = Image.open(img_path).convert('RGB')
        binary_label = self.dataframe.iloc[idx]['binary_label']
        subtype_label = self.dataframe.iloc[idx]['subtype_label']
        if self.transform:
            image = self.transform(image)
        sample = {
            'image': image,
            'binary_label': torch.tensor(binary_label, dtype=torch.long),
            'subtype_label': torch.tensor(subtype_label, dtype=torch.long)
        }
        return sample
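As a standalone sanity check, the class can be exercised before any transforms are defined; the CSV path here is a placeholder, and the file is assumed to contain filepath, binary_label, and subtype_label columns:

# 'train.csv' is a hypothetical annotation file; point this at your own data.
dataset = BloodCellDataset('train.csv')
sample = dataset[0]  # dict with a PIL image (no transform supplied) and two label tensors
print(len(dataset), sample['binary_label'].item(), sample['subtype_label'].item())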
Data Transformations
- Training Transforms:
  - Resizing: Resize images to a fixed 224x224 input size.
  - Data Augmentation: Apply random flips, rotations, affine shear, resized crops, and color jitter to introduce variation.
  - Purpose: Helps the model generalize by exposing it to the same features under different conditions.
- Validation and Test Transforms:
  - No Augmentation: Only resize, convert to tensor, and normalize.
  - Purpose: Keeps evaluation deterministic and consistent across runs.
from torchvision import transforms

# Training pipeline: geometric and color augmentation, then ImageNet normalization.
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(20),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1),
    transforms.RandomAffine(degrees=20, shear=0.2),  # shear range is in degrees
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],   # ImageNet channel means
                         [0.229, 0.224, 0.225])   # ImageNet channel std devs
])

# Validation/test pipeline: deterministic resize and the same normalization, no augmentation.
val_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])
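The loaders below refer to train_dataset, val_dataset, and test_dataset, which this section never constructs. A minimal sketch, assuming each split has its own annotation CSV (the file names train.csv, val.csv, and test.csv are placeholders):

# Placeholder CSV names; substitute your actual split files.
train_dataset = BloodCellDataset('train.csv', transform=train_transforms)
val_dataset = BloodCellDataset('val.csv', transform=val_transforms)
test_dataset = BloodCellDataset('test.csv', transform=val_transforms)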
Data Loaders
- Create Data Loaders:
  - Training Loader: Loads training data in batches with shuffle=True so each epoch sees the samples in a fresh order, which improves generalization.
  - Validation and Test Loaders: Load validation and test data without shuffling.
  - Batch Size: Set to 16 for all data loaders.
  - Purpose: Stream data in batches, keeping memory usage low and feeding the model efficiently during training.
from torch.utils.data import DataLoader

batch_size = 16  # shared batch size for all three loaders

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4, drop_last=True)
# Evaluation loaders keep the final partial batch so no samples are skipped when computing metrics.
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=4)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=4)
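As a quick check that everything is wired together, one batch can be pulled from the training loader; the expected shapes follow from the 224x224 resize and the batch size of 16:

batch = next(iter(train_loader))
print(batch['image'].shape)           # torch.Size([16, 3, 224, 224])
print(batch['binary_label'].shape)    # torch.Size([16])
print(batch['subtype_label'].shape)   # torch.Size([16])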