Maro Michailidou

BME Meniscus Project: MRI Image Classification for Meniscus Tears

This project focuses on developing and comparing deep learning models to classify knee MRI images for detecting meniscus tears. The implementation compares a custom Convolutional Neural Network (CNN) with a ResNet architecture to distinguish between normal menisci (“n” label) and torn menisci (“p” label) in sagittal MRI images from the Stanford University MRNet database.

Research Context

Problem Statement

Data Source

MRNet database from Stanford University Medical Center

Task Strategy Flowchart

  1. Exploratory Data Analysis
  2. Image Processing and preprocessing
  3. Parameter definition for model training
  4. Train CNN & ResNet on each of 5 folds
  5. Aggregation methods and decision threshold optimization
  6. Performance metrics assessment
  7. Data augmentation for best performing models

1. Data Preparation and Setup

Google Drive Integration

The code begins by mounting Google Drive to access the dataset stored in the cloud, creating a working directory, and copying/unzipping the dataset for processing.
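A minimal sketch of this setup step, assuming a hypothetical archive location My Drive/AI-meniscus.zip (adjust to the actual path used in the notebook):

from google.colab import drive
import os, shutil, zipfile

# Mount Google Drive into the Colab filesystem
drive.mount('/content/gdrive')

# Copy the dataset archive into the working area (hypothetical archive path)
shutil.copy('/content/gdrive/My Drive/AI-meniscus.zip', '/content/')

# Unzip the dataset for processing
with zipfile.ZipFile('/content/AI-meniscus.zip', 'r') as zf:
    zf.extractall('/content/')

os.chdir('/content/AI-meniscus')  # work from the extracted dataset directory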

Data Organization

The dataset is organized into three main splits: training, validation, and test.

Each split contains multiple folds for cross-validation, stored as CSV files with image paths and corresponding labels.

2. Data Loading and Exploration

Reading Metadata

The code identifies all CSV files for training, validation, and testing, then creates dictionaries storing fold information for organized access.

import os
import pandas as pd

DATAFRAME_DIR = '/content/AI-meniscus/dataframe/'

csv_train_files = [filename for filename in os.listdir(DATAFRAME_DIR) if '_train_' in filename]
csv_val_files = [filename for filename in os.listdir(DATAFRAME_DIR) if '_val_' in filename]
csv_test_files = [filename for filename in os.listdir(DATAFRAME_DIR) if '_test_' in filename]

def read_all_images(file_list):
    # Concatenate the per-fold CSVs into one dataframe of image paths and labels
    image_label_df = pd.DataFrame()
    for filename in file_list:
        temp_df = pd.read_csv(DATAFRAME_DIR + filename)
        image_label_df = pd.concat([image_label_df, temp_df])
    image_label_df.drop_duplicates(inplace=True, ignore_index=True)
    return image_label_df

images_labels_df = read_all_images(csv_train_files)
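The fold dictionaries used throughout the notebook (train_folds_dict, val_folds_dict, test_folds_dict) are built from these file lists; a minimal sketch, assuming each filename encodes its fold as ..._train_<fold>.csv:

def build_folds_dict(file_list):
    # Map each fold identifier to a dataframe of image paths and labels
    folds_dict = {}
    for filename in file_list:
        fold = filename.rsplit('_', 1)[-1].replace('.csv', '')  # assumed naming convention
        folds_dict[fold] = pd.read_csv(DATAFRAME_DIR + filename)
    return folds_dict

train_folds_dict = build_folds_dict(csv_train_files)
val_folds_dict = build_folds_dict(csv_val_files)
test_folds_dict = build_folds_dict(csv_test_files)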

Data Distribution Analysis

The implementation analyzes the image counts and the distribution of 'n' and 'p' labels in every fold of each split:

# Iterate over each set (train, val, test)
for set_name, folds_dict in [('train', train_folds_dict), ('val', val_folds_dict), ('test', test_folds_dict)]:
    print(f"\n{set_name.upper()} SET:")

    # Iterate over each fold
    for fold, temp_df in folds_dict.items():
        # Get the number of 'n' and 'p' labels in this fold
        num_n_labels = temp_df['label'].value_counts().get('n', 0)
        num_p_labels = temp_df['label'].value_counts().get('p', 0)
        num_images = len(temp_df)

        # Print the counts
        print(f"Fold {fold}:")
        print(f"Number of images: {num_images}")
        print(f"Number of 'n' labels: {num_n_labels}")
        print(f"Number of 'p' labels: {num_p_labels}")

Cropping Dataset Images

Cropping the dataset images was essential because it allowed the CNN to focus only on the knee region when extracting information. By isolating that part of the image and training the neural networks on it alone, much of the noise in the dataset was removed and the efficiency of the CNN increased significantly.

3. Image Preprocessing

import cv2
import os

def create_traintestval_folder(folder_path='./train', folds_dict=train_folds_dict):
    os.makedirs(folder_path, exist_ok=True)

    for fold in folds_dict.keys():
        fold_folder_path = os.path.join(folder_path, fold)
        os.makedirs(fold_folder_path, exist_ok=True)

        temp_df = folds_dict[fold]

        for img_filename in temp_df['img_dir']:
            img = cv2.imread(img_filename)

            # Skip files that could not be read
            if img is None:
                print(f'Could not open or find the image at {img_filename}')
                continue

            height, width = img.shape[:2]

            # Calculate the center of the image
            center_x, center_y = width // 2, height // 2

            # Define the dimensions of the new square
            new_width, new_height = 128, 128

            # Calculate the top-left coordinates for the cropping
            top_left_x = max(center_x - new_width // 2, 0)
            top_left_y = max(center_y - new_height // 2, 0)

            # Ensure the cropping area does not exceed the image dimensions
            if top_left_x + new_width > width:
                top_left_x = width - new_width
            if top_left_y + new_height > height:
                top_left_y = height - new_height

            # Crop the image
            cropped_img = img[top_left_y:top_left_y + new_height, top_left_x:top_left_x + new_width]

            # Save the cropped image
            cv2.imwrite(os.path.join(fold_folder_path, img_filename.replace('images/', '')), cropped_img)

# Takes at least 2.5 minutes to run
create_traintestval_folder(folder_path = './train', folds_dict = train_folds_dict)
create_traintestval_folder(folder_path = './val', folds_dict = val_folds_dict)
create_traintestval_folder(folder_path = './test', folds_dict = test_folds_dict)

Center Cropping

The preprocessing function extracts 128×128 pixel center crops from original 256×256 images, maintaining anatomical focus on the meniscus region and creating an organized folder structure for processed images.

Preprocessing Benefits

  1. Removes background noise irrelevant to the meniscus region
  2. Keeps the anatomical focus on the knee
  3. Reduces the input from 256×256 to 128×128 pixels, lowering computation per image

4. Data Visualization

import matplotlib.pyplot as plt
import pandas as pd

# Iterate over each set (train, val, test)
for set_name, folds_dict in [('train', train_folds_dict), ('val', val_folds_dict), ('test', test_folds_dict)]:
    # Create a dictionary to store the number of 'n' and 'p' labels for each fold
    label_counts = {'n': [], 'p': []}

    # Iterate over each fold
    for fold, temp_df in folds_dict.items():
        # Get the number of 'n' and 'p' labels in this fold
        num_n_labels = temp_df['label'].value_counts().get('n', 0)
        num_p_labels = temp_df['label'].value_counts().get('p', 0)

        # Append the counts to the dictionary
        label_counts['n'].append(num_n_labels)
        label_counts['p'].append(num_p_labels)

    # Convert the dictionary to a DataFrame
    label_counts_df = pd.DataFrame(label_counts)

    # Create a bar chart
    label_counts_df.plot(kind='bar', figsize=(10, 5))
    plt.xlabel('Fold')
    plt.ylabel('Count')
    plt.title(f'Label distribution in {set_name} set')
    plt.legend()
    plt.show()

Label Distribution Analysis

The code generates bar charts showing ‘n’ (normal) vs ‘p’ (torn) distribution across different folds, helping identify class imbalance issues and providing insights into dataset composition.

Sample Image Display

Visualization functions display processed images to verify cropping quality and ensure preprocessing maintains relevant anatomical features.

import matplotlib.pyplot as plt
import cv2
import os

def display_cropped_images(folder_path, num_images=20):
    # List the image files in the folder of already-cropped images
    image_files = [img for img in os.listdir(folder_path) if img.lower().endswith(('.png', '.jpg', '.jpeg'))]
    fig, axes = plt.subplots(nrows=5, ncols=4, figsize=(20, 20))
    for ax, img_file in zip(axes.flat, image_files[:num_images]):
        img_path = os.path.join(folder_path, img_file)
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Convert from BGR to RGB for matplotlib
        ax.imshow(img)  # the images were already center-cropped in preprocessing
        ax.axis('off')
    plt.tight_layout()
    plt.show()

# Point this at a folder containing cropped images, e.g. one of the fold subfolders of './train'
display_cropped_images('./train')

5. Data Augmentation Techniques

The project explores several augmentation methods to improve model robustness:

Histogram Equalization

Enhances image contrast by redistributing intensity values, improving visibility of subtle tissue differences.

import matplotlib.pyplot as plt
import cv2

# Select the first 5 images from the first fold of the training set
images = train_folds_dict[list(train_folds_dict.keys())[0]]['img_dir'][:5]

# Create a figure to display the images
fig, axs = plt.subplots(len(images), 2, figsize=(10, 5 * len(images)))

for i, img_path in enumerate(images):
    # Load the image in grayscale mode
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)

    # Check if the image was loaded successfully
    if img is None:
        print(f'Could not open or find the image at {img_path}')
        continue

    # Apply histogram equalization
    img_eq = cv2.equalizeHist(img)

    # Display the original image
    axs[i, 0].imshow(img, cmap='gray')
    axs[i, 0].set_title('Original Image')
    axs[i, 0].axis('off')

    # Display the equalized image
    axs[i, 1].imshow(img_eq, cmap='gray')
    axs[i, 1].set_title('Equalized Image')
    axs[i, 1].axis('off')

plt.tight_layout()
plt.show()

CLAHE (Contrast Limited Adaptive Histogram Equalization)

Advanced contrast enhancement that limits amplification in homogeneous regions, preventing noise amplification in background regions.

import matplotlib.pyplot as plt
import cv2

# Select the first 5 images from the first fold of the training set
images = train_folds_dict[list(train_folds_dict.keys())[0]]['img_dir'][:5]

# Create a figure to display the images
fig, axs = plt.subplots(len(images), 2, figsize=(10, 5 * len(images)))

# Create a CLAHE object (Arguments are optional)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))

for i, img_path in enumerate(images):
    # Load the image in grayscale mode
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)

    # Check if the image was loaded successfully
    if img is None:
        print(f'Could not open or find the image at {img_path}')
        continue

    # Apply CLAHE
    img_clahe = clahe.apply(img)

    # Display the original image
    axs[i, 0].imshow(img, cmap='gray')
    axs[i, 0].set_title('Original Image')
    axs[i, 0].axis('off')

    # Display the CLAHE-enhanced image
    axs[i, 1].imshow(img_clahe, cmap='gray')
    axs[i, 1].set_title('CLAHE Image')
    axs[i, 1].axis('off')

plt.tight_layout()
plt.show()

Edge Detection

Highlights meniscus boundaries and structural features using kernel-based filtering, potentially helping the model focus on edge-related features indicative of tears.

import matplotlib.pyplot as plt
import numpy as np
import cv2

# Select the first 5 images from the first fold of the training set
images = train_folds_dict[list(train_folds_dict.keys())[0]]['img_dir'][:5]

# Create a figure to display the images
fig, axs = plt.subplots(len(images), 2, figsize=(10, 5 * len(images)))

# Edge-detection (Laplacian) kernel
kernel = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]])

for i, img_path in enumerate(images):
    # Load the image in grayscale mode
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)

    # Check if the image was loaded successfully
    if img is None:
        print(f'Could not open or find the image at {img_path}')
        continue

    # Apply the edge-detection kernel
    edge_img = cv2.filter2D(img, -1, kernel)

    # Display the original image
    axs[i, 0].imshow(img, cmap='gray')
    axs[i, 0].set_title('Original Image')
    axs[i, 0].axis('off')

    # Display the edge-filtered image
    axs[i, 1].imshow(edge_img, cmap='gray')
    axs[i, 1].set_title('Edge Detection Kernel')
    axs[i, 1].axis('off')

plt.tight_layout()
plt.show()

6. Dataset Balancing

Random Oversampling

The code addresses class imbalance using RandomOverSampler from imbalanced-learn library, which duplicates minority class samples to ensure equal representation of ‘n’ and ‘p’ classes during training and prevent model bias toward majority class.

#!pip install imbalanced-learn

# Let's use a simple minority oversampler

import numpy as np
import pandas as pd
from imblearn.over_sampling import RandomOverSampler

def balanced_dataset(method: str, inputData):
    if method == "oversampling":
        # Duplicate minority-class samples until both classes are equally represented
        ros = RandomOverSampler(random_state=0)
        X_resampled, y_resampled = ros.fit_resample(np.array(inputData["img_dir"]).reshape(-1, 1), inputData["label"])
        return X_resampled, y_resampled

for set_name, folds_dict in [('train', train_folds_dict)]:
    for fold_index, df in folds_dict.items():
        resampled_X, resampled_Y = balanced_dataset("oversampling", df)
        finalDf = pd.DataFrame({'img_dir': resampled_X[:, 0], 'label': resampled_Y})

# Note: finalDf holds the last fold processed; the counts below refer to that fold
print(len(finalDf.loc[finalDf.label == "p"]), "Positive cases after balancing")
print(len(finalDf.loc[finalDf.label == "n"]), "Negative cases after balancing")

7. CNN Architecture

The core model implements a sequential CNN with the following structure:

| Layer (type) | Output Shape | Param # |
|---|---|---|
| conv2d_3 (Conv2D) | (None, 126, 126, 32) | 320 |
| batch_normalization_2 (BatchNormalization) | (None, 126, 126, 32) | 128 |
| activation_7 (Activation) | (None, 126, 126, 32) | 0 |
| max_pooling2d_1 (MaxPooling2D) | (None, 63, 63, 32) | 0 |
| conv2d_4 (Conv2D) | (None, 61, 61, 64) | 18,496 |
| activation_8 (Activation) | (None, 61, 61, 64) | 0 |
| conv2d_5 (Conv2D) | (None, 59, 59, 128) | 73,856 |
| batch_normalization_3 (BatchNormalization) | (None, 59, 59, 128) | 512 |
| activation_9 (Activation) | (None, 59, 59, 128) | 0 |
| flatten_1 (Flatten) | (None, 445568) | 0 |
| dense_4 (Dense) | (None, 1000) | 445,569,000 |
| activation_10 (Activation) | (None, 1000) | 0 |
| dropout_3 (Dropout) | (None, 1000) | 0 |
| dense_5 (Dense) | (None, 1000) | 1,001,000 |
| activation_11 (Activation) | (None, 1000) | 0 |
| dropout_4 (Dropout) | (None, 1000) | 0 |
| dense_6 (Dense) | (None, 1000) | 1,001,000 |
| activation_12 (Activation) | (None, 1000) | 0 |
| dropout_5 (Dropout) | (None, 1000) | 0 |
| dense_7 (Dense) | (None, 1) | 1,001 |
| activation_13 (Activation) | (None, 1) | 0 |

import cv2
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Activation, BatchNormalization, Dropout, LeakyReLU
from keras.optimizers import Adam
from keras.losses import BinaryCrossentropy

IMDIMENSION = 128  # Set this to your desired value

# Get the first training fold
first_train_fold = next(iter(train_folds_dict.values()))

# Select 50 'p' and 50 'n' labeled images at random
df_p = first_train_fold[first_train_fold['label'] == 'p'].sample(50)
df_n = first_train_fold[first_train_fold['label'] == 'n'].sample(50)

# Concatenate the two dataframes
df = pd.concat([df_p, df_n])

# Preprocess the data
X = np.array([cv2.imread(img_path, cv2.IMREAD_GRAYSCALE) for img_path in df['img_dir']])
X = np.expand_dims(X, axis=-1)  # Add an extra dimension for the grayscale channel
y = np.array(df['label'].map({'n': 0, 'p': 1}))

inputshape = X.shape[1:]  # Now X is a 4D array

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=inputshape))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))

model.add(Conv2D(IMDIMENSION, (3, 3)))  # filter count tied to IMDIMENSION (128 here), matching the 128-filter conv in the summary
model.add(BatchNormalization())
model.add(LeakyReLU())  # use LeakyReLU directly rather than wrapping it in Activation

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(optimizer=Adam(), loss=BinaryCrossentropy(), metrics=['accuracy'])

model.fit(X, y, epochs=10)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print(f"Loss: {loss}, Accuracy: {accuracy}")

Input Layer

Grayscale 128×128 images with a single channel, giving an input shape of (128, 128, 1).

Feature Extraction Blocks

model.add(Conv2D(32, (3, 3), input_shape=inputshape))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

Classification Head

model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

8. Model Training and Evaluation

Models and Parameters

Model Architectures

Training Parameters

Training Configuration for CNN

model.compile(optimizer=Adam(), loss=BinaryCrossentropy(), metrics=['accuracy'])

Experimental Approach

The implementation uses a scaled-down experiment with 50 samples from each class for rapid prototyping, testing different image dimensions (32, 64, 128 pixels) and measuring training time vs. performance trade-offs.
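A minimal sketch of such a timing experiment, assuming the sampled dataframe df and labels y from the CNN code above, and a hypothetical build_model(input_shape) helper that wraps the Sequential definition:

import time
import cv2
import numpy as np

for dim in (32, 64, 128):
    # Resize the sampled images to dim x dim for this run
    X_dim = np.array([cv2.resize(cv2.imread(p, cv2.IMREAD_GRAYSCALE), (dim, dim)) for p in df['img_dir']])
    X_dim = np.expand_dims(X_dim, axis=-1)

    model_dim = build_model(X_dim.shape[1:])  # hypothetical helper, not shown above
    start = time.time()
    model_dim.fit(X_dim, y, epochs=10, verbose=0)
    loss, acc = model_dim.evaluate(X_dim, y, verbose=0)
    print(f"{dim}x{dim}: trained in {time.time() - start:.1f}s, accuracy {acc:.3f}")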

Performance Evaluation

import cv2
import numpy as np
from sklearn.metrics import classification_report, f1_score, confusion_matrix

# Get the first validation fold
first_val_fold = next(iter(val_folds_dict.values()))

# Preprocess the validation data
X_val = np.array([cv2.imread(img_path, cv2.IMREAD_GRAYSCALE) for img_path in first_val_fold['img_dir']])
X_val = np.expand_dims(X_val, axis=-1)  # Add an extra dimension for the grayscale channel
y_val = np.array(first_val_fold['label'].map({'n': 0, 'p': 1}))

# Predict the labels of the validation set
y_pred = model.predict(X_val)
y_pred = np.round(y_pred).flatten()  # Round the probabilities to get class labels

# Print the classification report
print(classification_report(y_val, y_pred))

# Print the F1 score
print("F1 Score:", f1_score(y_val, y_pred))

# Print the confusion matrix
print("Confusion Matrix:")
print(confusion_matrix(y_val, y_pred))

Comprehensive evaluation uses a per-class classification report (precision, recall, F1-score), the overall F1 score, and the confusion matrix.

9. Model Persistence

from google.colab import drive
drive.mount('/content/gdrive')
model.save('/content/gdrive/My Drive/128dim50samples.keras')

Trained models are saved to Google Drive for future use, enabling model comparison and deployment.
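A saved model can later be reloaded for comparison or inference; a minimal sketch, assuming the same Drive path as above:

from keras.models import load_model

# Reload the trained model from Google Drive
restored_model = load_model('/content/gdrive/My Drive/128dim50samples.keras')
predictions = restored_model.predict(X_val)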

Key Findings

Performance Metrics Analysis

Area Under Curve (AUC) Results

| Epoch | Batch Size | Model | AUC | |—|—|—|—| | 20 | 30 | ResNet | 0.837 | | 15 | 30 | ResNet | 0.8347 | | 15 | 10 | ResNet | 0.8282 | | 20 | 30 | CNN | 0.8116 | | 20 | 10 | ResNet | 0.8 | | 20 | 30 | Augmented CNN | 0.7866 |

Key Insights:

Performance Goal Optimization

Goal 1: Overall Diagnostic Capability (Youden Index)

Goal 2: Balance Precision & Recall (F1 Score)

Goal 3: Maximize Positive Identification (Recall + AUC)
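The goal-specific threshold searches are not shown above; a minimal sketch of scanning for the Youden-optimal and F1-optimal decision thresholds, assuming y_val and predicted positive-class probabilities y_prob from a trained model:

import numpy as np
from sklearn.metrics import roc_curve, f1_score

# ROC curve gives TPR (sensitivity) and FPR (1 - specificity) per threshold
fpr, tpr, thresholds = roc_curve(y_val, y_prob)

# Goal 1: Youden index J = sensitivity + specificity - 1 = TPR - FPR
youden_threshold = thresholds[np.argmax(tpr - fpr)]

# Goal 2: scan candidate thresholds for the best F1 score
candidates = np.linspace(0.05, 0.95, 19)
f1_threshold = max(candidates, key=lambda t: f1_score(y_val, (y_prob >= t).astype(int)))

print(f"Youden-optimal threshold: {youden_threshold:.2f}")
print(f"F1-optimal threshold: {f1_threshold:.2f}")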

Critical Conclusions

Key Findings

  1. Increased batch size and epochs consistently improved AUC performance across all models
  2. Soft Voting aggregation proved superior by reducing prediction noise and variance (see the sketch after this list)
  3. Threshold of 0.1 was optimal due to severe class imbalance in the dataset
  4. Transfer learning with ResNet significantly enhanced predictive capability with minimal computational overhead
  5. Data augmentation showed no improvement for the selected parameters
  6. Goal-specific optimization is crucial before model selection
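A minimal sketch of soft voting with the 0.1 decision threshold, assuming a hypothetical list fold_models of the five trained fold models and validation images X_val:

import numpy as np

# Soft voting: average the predicted probabilities from the five fold models
fold_probs = [m.predict(X_val).flatten() for m in fold_models]  # fold_models is a hypothetical list
avg_probs = np.mean(fold_probs, axis=0)

# Apply the optimized decision threshold of 0.1
y_pred = (avg_probs >= 0.1).astype(int)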

Limitations and Challenges

Future Directions

Technical Improvements

Clinical Applications

Project Structure Summary

BME Meniscus Project/