Syntactic Dependencies in Transformers and Their Relation to the Brain
The internal activations of Transformer-based language models have been found to correlate with human brain activity during language processing. But which linguistic features drive this correlation? As part of his internship in the CLC lab, Bram Fresen studied how this model-brain similarity depends on language models’ ability to represent syntactic dependencies. This study formed Bram’s first research project within the MSc Brain & Cognitive Sciences programme and was supervised by Marianne de Heer Kloots, Rochelle Choenni and Jelle Zuidema. Based on this work, Bram won a Master Student Award from the Dutch Society for Brain & Cognition (NVP), allowing him to present the results of his project at the annual NVP Brain & Cognition Winter Conference in 2023. This page contains a short abstract describing the main preliminary findings from Bram’s ongoing research project, as well as a pdf of his poster and some supplementary references.
Poster
Click the thumbnail below to download a pdf of Bram’s poster.
Abstract
Since their emergence, transformers have dominated the field of computational linguistics. Recent studies have shown a resemblance between the activations of transformers and those of the brain, but it remains unclear which properties underlie this resemblance. Here, we investigated the role that syntactic dependencies play in producing it. Specifically, we used functional magnetic resonance imaging (fMRI) data from the Mother of Unification Studies (MOUS), in which participants read sentences of varying syntactic complexity. Note that, due to time constraints, only four MOUS participants were analysed, so the results reported here are preliminary. The same sentences used in MOUS were fed through monolingual and multilingual transformer models, and their contextualised embeddings were extracted. This allowed us to perform representational similarity analysis (RSA) between the model embeddings and the MOUS fMRI data, specifically from the left posterior middle temporal gyrus (LpMTG), yielding a representational similarity (RS) score for each layer of each transformer model. The accuracy with which these transformers represented the syntactic dependencies present in the sentences was determined with a labelled structural probe, DepProbe. Finally, we correlated the RS scores with the accuracy of the dependency representations. This analysis revealed three main findings: 1) models that more accurately represent dependency information are more similar to the brain; 2) monolingual models outperform multilingual models both in similarity to the brain and in representing dependencies accurately; 3) the relationship between brain similarity and the accuracy of dependency representations is mediated by syntactic complexity. Altogether, the present study suggests that syntactic dependencies are a critical ingredient of brain-like language models.
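For readers unfamiliar with RSA, the sketch below illustrates the core computation in Python. It is not Bram’s actual analysis code: the array shapes, variable names, and the specific choices of correlation distance for the dissimilarity matrices and Spearman correlation for comparing them are illustrative assumptions.

# Minimal RSA sketch (illustrative assumptions, not the project's actual pipeline).
# Assumes one embedding matrix per model layer and one fMRI pattern matrix for the
# LpMTG region of interest, each with one row per sentence.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(patterns):
    """Condensed representational dissimilarity matrix (correlation distance)
    over sentences; `patterns` has shape (n_sentences, n_features)."""
    return pdist(patterns, metric="correlation")

def rs_score(layer_embeddings, roi_patterns):
    """Representational similarity: Spearman correlation between the model RDM
    and the brain RDM."""
    rho, _ = spearmanr(rdm(layer_embeddings), rdm(roi_patterns))
    return rho

# Toy example: 20 sentences, a 768-dimensional model layer, a 500-voxel ROI.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 768))  # sentence embeddings from one layer
roi_data = rng.normal(size=(20, 500))    # LpMTG voxel patterns, one row per sentence
print(rs_score(embeddings, roi_data))

Computed per layer and per model, these RS scores can then themselves be correlated with the layer-wise accuracy of a dependency probe such as DepProbe, which is the final step described in the abstract.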
References
Liang, D., Gonen, H., Mao, Y., Hou, R., Goyal, N., Ghazvininejad, M., … Khabsa, M. (2023). XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models. https://doi.org/10.48550/arXiv.2301.10472
Caucheteux, C., & King, J.-R. (2022). Brains and algorithms partially converge in natural language processing. Communications Biology, 5, 134.
Müller-Eberstein, M., van der Goot, R., & Plank, B. (2022). Probing for Labeled Dependency Trees. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 7711–7726. Dublin, Ireland: Association for Computational Linguistics.
Uddén, J., Hultén, A., Schoffelen, J.-M., Lam, N., Harbusch, K., van den Bosch, A., … Hagoort, P. (2022). Supramodal Sentence Processing in the Human Brain: fMRI Evidence for the Influence of Syntactic Complexity in More Than 200 Participants. Neurobiology of Language, 3, 575–598.
Delobelle, P., Winters, T., & Berendt, B. (2020). RobBERT: a Dutch RoBERTa-based Language Model. Findings of the Association for Computational Linguistics: EMNLP 2020, 3255–3265. Association for Computational Linguistics.
Schoffelen, J.-M., Oostenveld, R., Lam, N., Uddén, J., Hultén, A., & Hagoort, P. (2019). Mother of Unification Studies, a 204-subject multimodal neuroimaging dataset to study language processing. https://doi.org/10.34973/37n0-yc51
Jawahar, G., Sagot, B., & Seddah, D. (2019). What Does BERT Learn about the Structure of Language? Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3651–3657. Florence, Italy: Association for Computational Linguistics.
de Vries, W., van Cranenburgh, A., Bisazza, A., Caselli, T., van Noord, G., & Nissim, M. (2019). BERTje: A Dutch BERT Model. https://doi.org/10.48550/arXiv.1912.09582
Abnar, S., Ahmed, R., Mijnheer, M., & Zuidema, W. H. (2018). Experiential, Distributional and Dependency-based Word Embeddings have Complementary Roles in Decoding Brain Activity. Workshop on Cognitive Modeling and Computational Linguistics. https://doi.org/10.18653/v1/W18-0107
Dozat, T., & Manning, C. D. (2016). Deep Biaffine Attention for Neural Dependency Parsing. CoRR, abs/1611.01734. https://doi.org/10.48550/arXiv.1611.01734
van der Beek, L., Bouma, G., Malouf, R., & van Noord, G. (2001). The Alpino Dependency Treebank. Computational Linguistics in the Netherlands 2001. https://doi.org/10.1163/9789004334038_003