Our research
Most of the research in the lab is focused on interpretability methods in Artificial Intelligence. We apply them to LLMs, neural speech models, vision-language models, and time series models, and we develop new interpretability techniques. Much of our work has been inspired by findings in linguistics, cognitive science and neuroscience, and a long-term goal is to ‘give back’ to those fields, for instance by using AI models as scaffolds to build predictive models of neuroimaging data, or by providing ‘existence proofs’ of possible ways in which networks of neurons might implement specific cognitive functions.
Some things you might (want to) know us for:
- In 2023, 2024 and 2025 we published a series of papers on speech-AI interpretability. Check out our papers analyzing speech encoders (such as Wav2Vec2 and HuBERT), speech recognition systems (fine-tuned Wav2Vec2, Whisper), and text-to-speech models (such as Parler-TTS and Tacotron-TTS). We adapted state-of-the-art interpretability techniques from the text domain (including the Logit Lens and causal interventions) to the peculiarities of speech, and developed new techniques (including Value Zeroing) to open up the black box of this increasingly impactful class of deep learning models. See: (Pouw, de Heer Kloots, Alishahi, & Zuidema, 2024) (de Heer Kloots & Zuidema, 2024)
- In 2023, we proposed Value Zeroing as a novel technique to understand the information flow in Transformer models: zero out one token's value vector and measure how much every other token's representation changes (a minimal sketch follows this list). See: (Mohebbi, Zuidema, Chrupała, & Alishahi, 2023)
- In 2023, we published two commentaries in Nature, calling for the urgent establishment of guidelines for the use of generative AI in science and scholarship. Our ‘Living guidelines’, in adapted form, were adopted by the European Union. See: (Van Dis, Bollen, Zuidema, Van Rooij, & Bockting, 2023)
- In 2021, we proposed Cosine Contours, a multipurpose representation for melodies (a short sketch follows this list). This is part of a series of papers in which we explore advanced computational techniques to contribute to research on music technology and music cognition. See: (Cornelissen, Zuidema, & Burgoyne, 2021) (Cornelissen, Zuidema, & Burgoyne, 2020)
- In 2020, we proposed Attention Rollout, a novel technique for turning the attention patterns in a Transformer into an input attribution method (see the sketch after this list). In our 2020 paper, the atomic attention patterns were very simple (the attention weights plus the identity matrix, to account for the residual connections), but the technique has had many follow-ups in which people have used more sophisticated choices for the atomic attention patterns. See: (Abnar & Zuidema, 2020)
- In 2018, we proposed causal interventions as a way to assess whether probing results truly reflect the underlying mechanisms in language models. See: (Giulianelli, Harding, Mohnert, Hupkes, & Zuidema, 2018)
- In 2016, we proposed diagnostic classification, a simple way to open the black box of neural language models: train a lightweight classifier to predict a linguistic property from a model's hidden states (a toy probing-plus-intervention sketch appears after this list). Our paper appeared around the same time as two other papers on arXiv proposing very similar ideas (Adi et al., 2017; Alain & Bengio, 2017). The technique is now mostly known as ‘probing’, following the terminology of the latter paper. See: (Veldhoen, Hupkes, & Zuidema, 2016), (Hupkes, Veldhoen, & Zuidema, 2018)
- In 2015, we presented the RNN-LSTM, now better known as the TreeLSTM, following the terminology of the concurrently published paper by Tai et al. See: (Le & Zuidema, 2015)
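To make Value Zeroing concrete, here is a minimal numpy sketch for a single self-attention operation. The paper applies the idea inside full Transformer layers (including layer norm and feed-forward sublayers); the toy dimensions, random weights, and helper names below are our illustration, not the published code.

```python
import numpy as np

def attention_layer(X, Wq, Wk, Wv, zero_value_of=None):
    """Single-head self-attention; optionally zero one token's value vector."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    if zero_value_of is not None:
        V = V.copy()
        V[zero_value_of] = 0.0                    # the Value Zeroing intervention
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    A = np.exp(scores - scores.max(-1, keepdims=True))
    A = A / A.sum(-1, keepdims=True)              # softmax over keys
    return A @ V

def cosine_distance(a, b):
    return 1 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

rng = np.random.default_rng(0)
n_tokens, d = 5, 8
X = rng.normal(size=(n_tokens, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
base = attention_layer(X, Wq, Wk, Wv)

# Context-mixing score: how much token i's output changes when token j's
# value vector is zeroed out.
mixing = np.zeros((n_tokens, n_tokens))
for j in range(n_tokens):
    ablated = attention_layer(X, Wq, Wk, Wv, zero_value_of=j)
    for i in range(n_tokens):
        mixing[i, j] = cosine_distance(base[i], ablated[i])
print(np.round(mixing, 3))   # rows: affected token i; columns: zeroed token j
```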
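For Cosine Contours, the core idea is to summarize a melody's pitch contour by its leading discrete cosine transform (DCT) coefficients. A minimal scipy sketch; the toy contour, the centering step, and the choice of three coefficients are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from scipy.fft import dct, idct

# Toy pitch contour: MIDI pitches sampled at equally spaced time points.
contour = np.array([60, 62, 64, 65, 67, 65, 64, 62, 60], dtype=float)

# The cosine contour is the vector of leading DCT coefficients; centering
# first keeps overall pitch height out of the shape description.
coeffs = dct(contour - contour.mean(), norm='ortho')
cosine_contour = coeffs[:3]               # compact descriptor of the melody
print(np.round(cosine_contour, 2))

# Inverting the truncated coefficients gives a smoothed version of the melody.
smooth = idct(np.pad(cosine_contour, (0, len(contour) - 3)), norm='ortho')
print(np.round(smooth + contour.mean(), 1))
```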
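And here is a minimal numpy sketch of the Attention Rollout recipe described above: augment each layer's (head-averaged) attention matrix with the identity, renormalize, and multiply the matrices through the layers. The toy attention maps are random; only the recursion is the point.

```python
import numpy as np

def attention_rollout(attentions):
    """attentions: per-layer (n_tokens, n_tokens) head-averaged attention maps,
    ordered from first layer to last, with rows summing to 1."""
    n = attentions[0].shape[0]
    rollout = np.eye(n)
    for A in attentions:
        A_res = A + np.eye(n)                         # account for the residual connection
        A_res = A_res / A_res.sum(-1, keepdims=True)  # renormalize rows
        rollout = A_res @ rollout                     # compose flow across layers
    return rollout  # rollout[i, j]: how much output token i draws on input token j

rng = np.random.default_rng(0)
n_layers, n_tokens = 4, 6
toy_maps = [rng.dirichlet(np.ones(n_tokens), size=n_tokens) for _ in range(n_layers)]
print(np.round(attention_rollout(toy_maps), 3))
```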
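Finally, a toy sklearn sketch of diagnostic classification (‘probing’) and the causal follow-up. The hidden states and labels below are synthetic, and the intervention (reflecting a state across the probe's decision boundary) is a simplification of the gradient-based interventions in Giulianelli et al. (2018); in a real experiment the modified state would be re-injected into the model to test whether its behavior changes accordingly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for hidden states of a frozen language model, with a binary
# linguistic label (say, subject number) linearly encoded plus noise.
d = 16
direction = rng.normal(size=d)
labels = rng.integers(0, 2, size=500)
hidden = rng.normal(size=(500, d)) + np.outer(labels, direction)

# Diagnostic classifier: can the property be read off the states linearly?
probe = LogisticRegression(max_iter=1000).fit(hidden[:400], labels[:400])
print('probe accuracy:', probe.score(hidden[400:], labels[400:]))

# Causal intervention: push one state across the probe's decision boundary.
w, b = probe.coef_[0], probe.intercept_[0]
h = hidden[0]
h_flipped = h - 2 * (h @ w + b) / (w @ w) * w   # reflect across w·h + b = 0
print('probe before:', probe.predict(h[None])[0],
      '| after:', probe.predict(h_flipped[None])[0])
```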
References
Pouw, C., de Heer Kloots, M., Alishahi, A., & Zuidema, W. (2024). Perception of phonological assimilation by neural speech recognition models. Computational Linguistics, 50, 1557–1585.
de Heer Kloots, M., & Zuidema, W. (2024). Human-like Linguistic Biases in Neural Speech Models: Phonetic Categorization and Phonotactic Constraints in Wav2Vec2.0. Proceedings of Interspeech 2024, 4593–4597.
Mohebbi, H., Zuidema, W., Chrupała, G., & Alishahi, A. (2023). Quantifying Context Mixing in Transformers. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 3378–3400.
Van Dis, E. A. M., Bollen, J., Zuidema, W., Van Rooij, R., & Bockting, C. L. (2023). ChatGPT: five priorities for research. Nature, 614, 224–226.
Cornelissen, B., Zuidema, W., & Burgoyne, J. A. (2021). Cosine Contours: a Multipurpose Representation for Melodies. Proceedings of the 22nd International Conference on Music Information Retrieval. Online.
Abnar, S., & Zuidema, W. (2020). Quantifying Attention Flow in Transformers. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 4190–4197. Online: Association for Computational Linguistics.
Cornelissen, B., Zuidema, W., & Burgoyne, J. A. (2020). Mode Classification and Natural Units in Plainchant. Proceedings of the 21st International Conference on Music Information Retrieval, 869–875. Montreal, Canada.
Giulianelli, M., Harding, J., Mohnert, F., Hupkes, D., & Zuidema, W. (2018). Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP.
Veldhoen, S., Hupkes, D., & Zuidema, W. (2016). Diagnostic classifiers: revealing how neural networks process hierarchical structure. Workshop on Cognitive Computation: Integrating Neural and Symbolic Approaches (at NIPS).
Hupkes, D., Veldhoen, S., & Zuidema, W. (2018). Visualisation and ‘diagnostic classifiers’ reveal how recurrent and recursive neural networks process hierarchical structure. Journal of Artificial Intelligence Research, 61, 907–926.
Recent publications
Transformer-specific interpretability
Mohebbi, H., Jumelet, J., Hanna, M., Alishahi, A., & Zuidema, W. (2024). Transformer-specific interpretability. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts, 21–26.
Perception of phonological assimilation by neural speech recognition models
Pouw, C., de Heer Kloots, M., Alishahi, A., & Zuidema, W. (2024). Perception of phonological assimilation by neural speech recognition models. Computational Linguistics, 50, 1557–1585.
fl-IRT-ing with Psychometrics to Improve NLP Bias Measurement
Bachmann, D., van der Wal, O., Chvojka, E., Zuidema, W. H., van Maanen, L., & Schulz, K. (2024). fl-IRT-ing with Psychometrics to Improve NLP Bias Measurement. Minds and Machines, 34, 37.
Do Language Models Exhibit Human-like Structural Priming Effects?
Jumelet, J., Zuidema, W., & Sinclair, A. (2024). Do Language Models Exhibit Human-like Structural Priming Effects? Findings of the Association for Computational Linguistics: ACL 2024, 14727–14742.
Language Models That Accurately Represent Syntactic Structure Exhibit Higher Representational Similarity To Brain Activity
Fresen, A. J., Choenni, R., Heilbron, M., Zuidema, W., & de Heer Kloots, M. (2024). Language Models That Accurately Represent Syntactic Structure Exhibit Higher Representational Similarity To Brain Activity. Proceedings of the Annual Meeting of the Cognitive Science Society, 46.
Undesirable Biases in NLP: Addressing Challenges of Measurement
van der Wal, O., Bachmann, D., Leidinger, A., van Maanen, L., Zuidema, W., & Schulz, K. (2024). Undesirable Biases in NLP: Addressing Challenges of Measurement. Journal of Artificial Intelligence Research, 1–40.