Students
Daniela Gottesman
Ohav Barbi
Amit Elhelo
Yoav Gur-Arieh
Asaf Avrahamy
Or Shafran
Sohee Yang Visitor Student
Alumni
Daniela Gottesman M.Sc. 2024 → to start Ph.D.
Amit Arnold Levy Guest Student 2024
Dana Ramati Project Student 2024
Publications
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
Shiqi Chen, Tongyao Zhu, Ruochen Zhou, Jinghan Zhang, Siyang Gao, Juan Carlos Niebles, Mor Geva, Junxian He, Jiajun Wu, Manling Li. 2025.
Open Problems in Mechanistic Interpretability
Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeff Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill, Stefan Heimersheim, Alejandro Ortega, Joseph Bloom, Stella Biderman, Adria Garriga-Alonso, Arthur Conmy, Neel Nanda, Jessica Rumbelow, Martin Wattenberg, Nandi Schoots, Joseph Miller, Eric J. Michaud, Stephen Casper, Max Tegmark, William Saunders, David Bau, Eric Todd, Atticus Geiger, Mor Geva, Jesse Hoogland, Daniel Murfet, Tom McGrath. 2025.
Enhancing Automated Interpretability with Output-Centric Feature Descriptions
Yoav Gur-Arieh, Roy Mayan, Chen Agassy, Atticus Geiger, Mor Geva. 2025.
Open Problems in Machine Unlearning for AI Safety
Fazl Barez, Tingchen Fu, Ameya Prabhu, Stephen Casper, Amartya Sanyal, Adel Bibi, Aidan O'Gara, Robert Kirk, Ben Bucknall, Tim Fist, Luke Ong, Philip Torr, Kwok-Yan Lam, Robert Trager, David Krueger, Sören Mindermann, José Hernandez-Orallo, Mor Geva, Yarin Gal. 2025.
Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
Ido Cohen, Daniela Gottesman, Mor Geva, Raja Giryes. 2024.
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
Sohee Yang, Nora Kassner, Elena Gribovskaya, Sebastian Riedel*, Mor Geva*. 2024.
Language Models Encode Numbers Using Digit Representations in Base 10
Amit Arnold Levy, Mor Geva. NAACL 2025.
Eliciting Textual Descriptions from Representations of Continuous Prompts
Dana Ramati, Daniela Gottesman, Mor Geva. 5th Workshop on Trustworthy Natural Language Processing (TrustNLP), NAACL 2025.
Towards Interpreting Visual Information Processing in Vision-Language Models
Clement Neo, Luke Ong, Philip Torr, Mor Geva, David Krueger, Fazl Barez. ICLR 2025.
CoverBench: A Challenging Benchmark for Complex Claim Verification
Alon Jacovi, Moran Ambar, Eyal Ben-David, Uri Shaham, Amir Feder, Mor Geva, Dror Marcus, Avi Caciularu. 2024.
When Can Transformers Count to n?
Gilad Yehudai, Haim Kaplan, Asma Ghandeharioun, Mor Geva, Amir Globerson. 2024.
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
Maor Ivgi, Ori Yoran, Jonathan Berant, Mor Geva. 2nd Workshop on Attributing Model Behavior at Scale, NeurIPS 2024.
Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries
Eden Biran, Daniela Gottesman, Sohee Yang, Mor Geva, Amir Globerson. EMNLP 2024.
Estimating Knowledge in Large Language Models Without Generating a Single Token
Daniela Gottesman, Mor Geva. EMNLP 2024.
From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP
Marius Mosbach, Vagrant Gautam, Tomás Vergara-Browne, Dietrich Klakow, Mor Geva. EMNLP 2024.
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces
Yihuai Hong, Lei Yu, Haiqin Yang, Shauli Ravfogel, Mor Geva. Workshop on Weight Space Learning, ICLR 2025.
Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?
Gal Yona, Roee Aharoni, Mor Geva. EMNLP 2024.
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
Shahar Katz, Yonatan Belinkov, Mor Geva, Lior Wolf. EMNLP 2024. Best Paper Award.
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Jing Huang, Zhengxuan Wu, Christopher Potts, Mor Geva, Atticus Geiger. ACL 2024.
Do Large Language Models Latently Perform Multi-Hop Reasoning?
Sohee Yang, Elena Gribovskaya, Nora Kassner, Mor Geva*, Sebastian Riedel*. ACL 2024.
The Hidden Space of Transformer Language Adapters
Jesujoba O. Alabi, Marius Mosbach, Matan Eyal, Dietrich Klakow, Mor Geva. ACL 2024.
A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains
Alon Jacovi, Yonatan Bitton, Bernd Bohnet, Jonathan Herzig, Or Honovich, Michael Tseng, Michael Collins, Roee Aharoni, Mor Geva. ACL 2024.
Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers
Gal Yona, Roee Aharoni, Mor Geva. ACL 2024.
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun*, Avi Caciularu*, Adam Pearce, Lucas Dixon, Mor Geva. ICML 2024.
Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva. LREC-COLING 2024.
The Hidden Language of Diffusion Models
Hila Chefer, Oran Lang, Mor Geva, Volodymyr Polosukhin, Assaf Shocher, Michal Irani, Inbar Mosseri, Lior Wolf. ICLR 2024.
Evaluating the Ripple Effects of Knowledge Editing in Language Models
Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, Mor Geva. TACL 2024.
In-Context Learning Creates Task Vectors
Roee Hendel, Mor Geva, Amir Globerson. Findings of EMNLP 2023.
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks
Mete Ismayilzada, Debjit Paul, Syrielle Montariol, Mor Geva, Antoine Bosselut. EMNLP 2023.
A Comprehensive Evaluation of Tool-Assisted Generation Strategies
Alon Jacovi, Avi Caciularu, Jonathan Herzig, Roee Aharoni, Bernd Bohnet, Mor Geva. Findings of EMNLP 2023.
LM vs LM: Detecting Factual Errors via Cross Examination
Roi Cohen, May Hamri, Mor Geva, Amir Globerson. EMNLP 2023.
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson. EMNLP 2023.
Crawling the Internal Knowledge-Base of Language Models
Roi Cohen, Mor Geva, Jonathan Berant, Amir Globerson. Findings of EACL 2023.
Understanding Transformer Memorization Recall Through Idioms
Adi Haviv, Ido Cohen, Jacob Gidron, Roei Schuster, Yoav Goldberg, Mor Geva. EACL 2023.
Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models
I was fortunate to be part of this work by 442 contributors across 132 institutions. TMLR 2023. TMLR Finalist for Outstanding Certification.
Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions
Mihir Parmar*, Swaroop Mishra*, Mor Geva, Chitta Baral. EACL 2023. Outstanding Paper Award.
Inferring Implicit Relations with Language Models
Uri Katz, Mor Geva, Jonathan Berant. Findings of EMNLP 2022.
LM-Debugger: An Interactive Tool for Inspection and Intervention in Transformer-Based Language Models
Mor Geva, Avi Caciularu, Guy Dar, Paul Roit, Shoval Sadde, Micah Shlain, Bar Tamir, Yoav Goldberg. System Demonstrations Track, EMNLP 2022.
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Mor Geva*, Avi Caciularu*, Kevin Ro Wang, Yoav Goldberg. EMNLP 2022.
SCROLLS: Standardized CompaRison Over Long Language Sequences
Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy. EMNLP 2022.
Break, Perturb, Build: Automatic Perturbation of Reasoning Paths through Question Decomposition
Mor Geva, Tomer Wolfson, Jonathan Berant. TACL 2022.
What's in your Head? Emergent Behaviour in Multi-Task Transformer Models
Mor Geva, Uri Katz, Aviv Ben-Arie, Jonathan Berant. EMNLP 2021.
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, Jonathan Berant. TACL 2021.
Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy. EMNLP 2021.
Injecting Numerical Reasoning Skills into Language Models
Mor Geva*, Ankit Gupta*, Jonathan Berant. ACL 2020.
Break It Down: A Question Understanding Benchmark
Tomer Wolfson, Mor Geva, Ankit Gupta, Matt Gardner, Yoav Goldberg, Daniel Deutch, Jonathan Berant. TACL 2020.
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva, Yoav Goldberg, Jonathan Berant. EMNLP-IJCNLP 2019.
DiscoFuse: A Large-Scale Dataset for Discourse-Based Sentence Fusion
Mor Geva, Eric Malmi, Idan Szpektor, Jonathan Berant. NAACL 2019.
Emergence of Communication in an Interactive World with Consistent Speakers
Ben Bogin, Mor Geva, Jonathan Berant. Emergent Communication Workshop, NIPS 2018.
Learning to Search in Long Documents Using Document Structure
Mor Geva, Jonathan Berant. COLING 2018.
Evaluating Semantic Parsing against a Simple Web-based Question Answering Model
Alon Talmor, Mor Geva, Jonathan Berant. *SEM 2017.
Patents
Training Set Sufficiency for Custom Face Recognition, joint with Oron Nir. Issued May 11, 2018, US 404234-US-NP.
Media Management System for Video Data Processing and Adaptation Data Generation. June 2018, US 403684-US-NP.
Methods for Consolidating OCR Detection in Video, joint with Oron Nir. June 2018, US 403687-US-PSP.
Teaching
Natural Language Processing
Spring 2024/25
Seminar on Interpretability of Large Language Models
Fall 2024/25
Seminar on Interpretability of Large Language Models
Spring 2023/24
Introduction to Machine Learning
Teaching Assistant, Fall 2018/19
NVIDIA DLI Workshop
University Ambassador, 2018/19