Fuzzy Knowledge Distillation Model for Incomplete Multi-modality Data and Its Application to Brain Network Analysis
Xiaoyu Qi, Jiashuang Huang*, Xueyun Cheng, Mingliang Wang*, Weiping Ding*
MOTIVATION
In brain network analysis, both structural and functional brain networks help elucidate neurological disorders. However, clinical applications often encounter subjects with incomplete multi-modality data due to factors such as patient non-compliance, limited resources, and inadequate data quality.
Knowledge distillation (KD) transfers knowledge from complex models to simpler ones and can be used to address the missing-modality problem. However, existing methods transmit all information in a point-to-point fashion without considering the mismatch between teacher and student model capacities.
Existing approaches to the missing-modality problem in multimodal learning can be categorized into traditional methods and deep learning methods. Traditional methods handle scattered missing values but struggle with entirely absent modalities. Deep learning methods, such as generative adversarial networks (GANs), can impute a missing modality but face issues such as poor preservation of discriminative features and unstable training on small datasets.
INNOVATION
This paper proposes a novel fuzzy knowledge distillation (fuzzy-KD) model for brain network analysis of incomplete multi-modality data. Specifically, we design a fuzzy-KD module to reduce the information gap between the teacher and student models.
Our proposed fuzzy-KD module addresses the mismatch between the capacities of the teacher and student models in knowledge distillation: it applies fuzzy processing and feature selection to the teacher's information so as to enhance the learning capacity of the student model.
The proposed approach can be trained end-to-end by minimizing three loss functions, under the constraint that the knowledge acquired by the student model closely aligns with the processed teacher knowledge.
METHOD
In this paper, we address the problem of mismatched model capacities between the teacher and student models during the KD process. Our approach introduces a novel fuzzy knowledge distillation (fuzzy-KD) module that leverages fuzzy logic and information deletion to enhance the learning capability of the student model. The fuzzy-KD module transmits generalized and selected information through two main operations: fuzzy processing and feature selection.
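To make the data flow concrete, the sketch below shows where such a module could sit in a standard teacher-student distillation step. This is a minimal illustration, not the authors' code: the names (distillation_step, fuzzy_kd, x_teacher, x_student, losses) are assumptions, and it assumes the teacher and student networks each return a feature map and class logits, with the teacher receiving the complete multi-modality input and the student only the available modality.

```python
# Illustrative sketch of one distillation step with a fuzzy-KD module in between.
import torch

def distillation_step(teacher, student, fuzzy_kd, x_teacher, x_student, y, losses):
    with torch.no_grad():                        # the teacher is kept fixed
        t_feat, t_logits = teacher(x_teacher)    # features/logits from complete multi-modality data
    s_feat, s_logits = student(x_student)        # the student sees only the available modality
    t_feat_processed = fuzzy_kd(t_feat)          # fuzzy processing + feature selection of teacher knowledge
    return losses(s_feat, t_feat_processed, s_logits, t_logits, y)
```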
Fuzzy knowledge distillation module: The fuzzy-KD module can be divided into four steps: fuzzy processing, modelling feature importance, reconstructing feature maps, and feature selection. This module reduces the impact of inconsistent learning capabilities between teacher and student models.
Fuzzy processing and feature selection: Fuzzy processing primarily involves applying an affiliation (membership) function to each channel of the feature maps, using Gaussian fuzzification within the Fuzzy Processing Module (FPM), to obtain generalized information. Feature selection is performed by a feature selection layer that reduces the number of channels by retaining the important feature maps and removing the useless ones.
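A minimal PyTorch-style sketch of the four steps is given below. It assumes the teacher features are 4-D maps of shape (batch, channels, H, W); the per-channel Gaussian membership parameters, the small channel-attention head used to model feature importance, and the keep_ratio used for selection are illustrative assumptions rather than the paper's exact design. Selection is implemented here by masking channels instead of physically removing them, so the output keeps the same shape as the student features.

```python
import torch
import torch.nn as nn

class FuzzyKDModule(nn.Module):
    """Sketch of the four steps: fuzzy processing, modelling feature importance,
    reconstructing feature maps, and feature selection."""
    def __init__(self, channels: int, keep_ratio: float = 0.5):
        super().__init__()
        self.keep = max(1, int(channels * keep_ratio))   # number of channels to retain
        # simple channel-attention head used here to model feature importance
        self.importance = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )

    @staticmethod
    def gaussian_fuzzify(feat: torch.Tensor) -> torch.Tensor:
        # Step 1 -- fuzzy processing: apply a Gaussian membership function to each
        # channel, using the per-channel mean and std as its centre and width.
        mu = feat.mean(dim=(2, 3), keepdim=True)
        sigma = feat.std(dim=(2, 3), keepdim=True) + 1e-6
        return torch.exp(-((feat - mu) ** 2) / (2 * sigma ** 2))

    def forward(self, teacher_feat: torch.Tensor) -> torch.Tensor:
        fuzzy = self.gaussian_fuzzify(teacher_feat)      # generalized teacher information
        weights = self.importance(fuzzy)                 # Step 2: channel importance in [0, 1]
        recon = fuzzy * weights[:, :, None, None]        # Step 3: reconstruct weighted feature maps
        # Step 4 -- feature selection: keep the most important channels, mask out the rest
        topk = weights.topk(self.keep, dim=1).indices
        mask = torch.zeros_like(weights).scatter_(1, topk, 1.0)
        return recon * mask[:, :, None, None]
```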
Loss functions: The training of the proposed model relies on minimizing three types of losses: the representation loss, the predictive distribution loss, and the cross-entropy loss.
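The sketch below combines the three terms in a conventional way: a mean-squared representation loss between student features and processed teacher features, a temperature-scaled KL divergence between the predictive distributions (the standard KD form), and a cross-entropy loss on the ground-truth labels. The specific functional forms, the weights alpha and beta, and the temperature tau are assumptions made for illustration, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def fuzzy_kd_losses(s_feat, t_feat_processed, s_logits, t_logits, y,
                    alpha=1.0, beta=1.0, tau=4.0):
    """Combine the three training terms; alpha, beta, and tau are illustrative."""
    # (1) representation loss: align student features with processed teacher features
    rep_loss = F.mse_loss(s_feat, t_feat_processed)
    # (2) predictive distribution loss: match softened class distributions
    dist_loss = F.kl_div(
        F.log_softmax(s_logits / tau, dim=1),
        F.softmax(t_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau * tau
    # (3) cross-entropy loss on the ground-truth labels
    ce_loss = F.cross_entropy(s_logits, y)
    return ce_loss + alpha * rep_loss + beta * dist_loss
```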
Fig. 1. Framework of this work.
TABLE I PERFORMANCE COMPARISON OF FIVE DIFFERENT METHODS
Fig. 2. Performance comparison of three knowledge distillation methods.
Fig. 3. Discriminative connections and brain regions selected by the teacher model, KD student model, and fuzzy-KD student model.