Refereed Conference Proceedings
Probing Representation Forgetting in Supervised and Unsupervised Continual Learning
MohammadReza Davari*, Nader Asadi*, Sudhir Mudur, Rahaf Aljundi and Eugene Belilovsky, In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), June 2022.
* Equal contribution
Continual Learning (CL) research typically focuses on tackling the phenomenon of catastrophic forgetting in neural networks. Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model when the task, or more broadly the data distribution, being trained on changes. In supervised learning problems this forgetting, resulting from a change in the model's representation, is typically measured or observed by evaluating the decrease in old task performance. However, a model's representation can change without losing knowledge about prior tasks. In this work we consider the concept of representation forgetting, observed by using the difference in performance of an optimal linear classifier before and after a new task is introduced. Using this tool we revisit a number of standard continual learning benchmarks and observe that, through this lens, model representations trained without any explicit control for forgetting often experience small representation forgetting and can sometimes be comparable to methods which explicitly control for forgetting, especially in longer task sequences. We also show that representation forgetting can lead to new insights on the effect of model capacity and loss function used in continual learning. Based on our results, we show that a simple yet competitive approach is to learn representations continually with standard supervised contrastive learning while constructing prototypes of class samples when queried on old samples.
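The probing protocol described above can be sketched in a few lines: fit a linear classifier on frozen features before and after a new task, and report the accuracy drop. The ridge-regression formulation and function names below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def linear_probe_acc(train_X, train_y, test_X, test_y, n_classes, reg=1e-3):
    """Fit a linear classifier (ridge regression on one-hot targets)
    over frozen features and return its accuracy on held-out data."""
    Y = np.eye(n_classes)[train_y]                    # one-hot targets
    d = train_X.shape[1]
    # Closed-form ridge solution: (X^T X + reg*I) W = X^T Y
    W = np.linalg.solve(train_X.T @ train_X + reg * np.eye(d), train_X.T @ Y)
    return float(((test_X @ W).argmax(axis=1) == test_y).mean())

# Representation forgetting of an old task would then be estimated as the
# drop in probe accuracy between model checkpoints before and after the
# new task is trained:
#   forgetting = probe_acc_before_new_task - probe_acc_after_new_task
```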
On the Inadequacy of CKA as a Measure of Similarity in Deep Learning
MohammadReza Davari*, Stefan Horoi*, Amine Natik, Guillaume Lajoie, Guy Wolf and Eugene Belilovsky, In Proceedings of the ICLR 2022 Workshop on Geometrical and Topological Representation Learning (ICLR GTRL Workshop 2022), April 2022.
* Equal contribution
Comparing learned representations is a challenging problem which has been approached in different ways. The CKA similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of similar networks trained differently, or of models with different architectures trained on the same data. CKA results have been used to make a wide variety of claims about similarity and dissimilarity of these various representations. In this work we investigate several weaknesses of the CKA similarity metric, demonstrating situations in which it gives unexpected or counterintuitive results. We then study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Indeed, we illustrate that in some cases the CKA value can be heavily manipulated without substantial changes to the functional behaviour.
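For reference, the linear CKA variant discussed above can be computed in a few lines from two representation matrices; this is a minimal sketch of the standard formulation, not code from the paper.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representations X (n, d1) and Y (n, d2)
    of the same n examples."""
    X = X - X.mean(axis=0)                      # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2  # unnormalized linear HSIC
    return hsic / (np.linalg.norm(X.T @ X, "fro")
                   * np.linalg.norm(Y.T @ Y, "fro"))
```

Note that linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, which is one reason the same CKA value can correspond to quite different representations.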
Probing Representation Forgetting in Continual Learning
MohammadReza Davari and Eugene Belilovsky, In Proceedings of the NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods and Applications (NeurIPS DistShift Workshop 2021), December 2021.
Continual Learning methods typically focus on tackling the phenomenon of catastrophic forgetting in the context of neural networks. Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model. In supervised learning problems this forgetting is typically measured or observed by evaluating the decrease in task performance. However, a model's representations can change without losing knowledge. In this work we consider the concept of representation forgetting, which relies on using the difference in performance of an optimal linear classifier before and after a new task is introduced. Using this tool we revisit a number of standard continual learning benchmarks and observe that, through this lens, model representations trained without any special control for forgetting often experience minimal representation forgetting. Furthermore, we find that many approaches to continual learning that aim to resolve catastrophic forgetting do not improve upon this baseline in terms of representation forgetting.
Semantic Similarity Matching Using Contextualized Representations
Farhood Farahnak, Elham Mohammadi*, MohammadReza Davari*, and Leila Kosseim, In Proceedings of the 34th Canadian Conference on Artificial Intelligence (CAIAC 2021), June 2021.
* Equal contribution
Approaches to semantic similarity matching generally fall into one of two categories: interaction-based and representation-based models. While each approach offers its own benefits and suits certain scenarios, using a transformer-based model with a completely interaction-based approach may not be practical in many real-life use cases. In this work, we compare the performance and inference time of interaction-based and representation-based models using contextualized representations. We also propose a novel approach based on the late interaction of textual representations, thus benefiting from the advantages of both model types.
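A late-interaction scorer of the kind mentioned above lets representations be precomputed offline while still comparing texts at the token level. The MaxSim-style scoring below (in the spirit of ColBERT) is an illustrative sketch; the exact interaction used in the paper may differ.

```python
import numpy as np

def late_interaction_score(query_vecs, doc_vecs):
    """Score a query against a document from precomputed token embeddings:
    for each query token take its best-matching document token similarity,
    then sum these maxima over the query tokens."""
    sim = query_vecs @ doc_vecs.T        # (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())  # MaxSim per query token, then sum
```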
TIMBERT: Toponym Identifier For The Medical Domain Based on BERT
MohammadReza Davari, Leila Kosseim and Tien Bui, In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), December 2020.
In this paper, we propose an approach to automate the process of place name detection in the medical domain to enable epidemiologists to better study and model the spread of viruses. We created a family of Toponym Identification Models based on BERT (TIMBERT), in order to learn in an end-to-end fashion the mapping from an input sentence to the associated sentence labeled with toponyms. When evaluated with the SemEval 2019 task 12 test set (Weissenbacher et al., 2019), our best TIMBERT model achieves an F1 score of 90.85%, a significant improvement compared to the state-of-the-art of 89.10% (Wang et al., 2019).
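Sentence labeling of this kind is commonly realized as token-level BIO tagging. The small decoding helper below, which recovers toponym spans from predicted tags, is purely illustrative (the tag names and scheme are assumptions, not details from the paper).

```python
def extract_toponyms(tokens, tags):
    """Collect contiguous spans labeled with BIO tags (B-TOP / I-TOP / O)."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B-TOP":               # start of a new toponym span
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I-TOP" and current:  # continuation of the current span
            current.append(tok)
        else:                             # O tag (or stray I-TOP) ends the span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans
```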
Toponym Identification in Epidemiology Articles - A Deep Learning Approach
MohammadReza Davari, Leila Kosseim and Tien Bui, In Proceedings of the 20th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2019), April 2019, La Rochelle, France.
🏆 Best Poster Award
When analyzing the spread of viruses, epidemiologists often need to identify the location of infected hosts. This information can be found in public databases, such as GenBank; however, the information provided in these databases is usually limited to the country or state level. More fine-grained localization information requires phylogeographers to manually read relevant scientific articles. In this work we propose an approach to automate the process of place name identification from medical (epidemiology) articles. The focus of this paper is to propose a deep learning-based model for toponym detection and to experiment with the use of external linguistic features and domain-specific information. The model was evaluated using a collection of 105 epidemiology articles from PubMed Central provided by the recent SemEval task 12. Our best detection model achieves an F1 score of 80.13%, a significant improvement compared to the state of the art of 69.84%. These results underline the importance of domain-specific embeddings as well as specific linguistic features in toponym detection in medical journals.