Guest talk: Shiwei Liu

The Curse of Depth in LLMs and How Sparsity Mitigates It on May 13, 2026

On May 13, 2026, Shiwei Liu (ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems) will give a guest talk, "The Curse of Depth in LLMs and How Sparsity Mitigates It". The talk is hosted by Jiancheng Yang (ELLIS Institute Finland, Aalto University).

Date and time

May 13, 2026, 10:00-11:00 EEST

Location

Lecture hall TU6, Maarintie 8, 02150 Espoo (Aalto University campus) and Zoom

Abstract

Recent work has demonstrated the curse of depth in large language models (LLMs), where later layers contribute less to learning and representation than earlier layers. This under-utilization is linked to the accumulated growth of variance under Pre-Layer Normalization, which can push deep blocks toward near-identity behavior. In this work, we demonstrate that sparsity, beyond enabling efficiency, acts as a regulator of variance propagation and thereby improves depth utilization. Our investigation covers two sources of sparsity: (i) implicit sparsity, which emerges from training and data conditions, including weight sparsity induced by weight decay and attention sparsity induced by long-context inputs; and (ii) explicit sparsity, which is enforced by architectural design, including key/value-sharing sparsity in Grouped-Query Attention and expert-activation sparsity in Mixture-of-Experts. Our claim is supported by controlled depth-scaling experiments and targeted layer-effectiveness interventions. Across settings, we observe a consistent relationship: sparsity improves layer utilization by reducing output variance and promoting functional differentiation. We distill our findings into a practical rule-of-thumb recipe for training depth-effective LLMs, yielding a notable 4.6% accuracy improvement on downstream tasks. Our results reveal sparsity, arising naturally from standard design choices, as a key yet previously overlooked mechanism for effective depth scaling in LLMs.

Bio

Shiwei Liu is a group leader at the Max Planck Institute for Intelligent Systems and a PI at the ELLIS Institute Tübingen. He was a Royal Society Newton International Fellow at the University of Oxford and previously a Postdoctoral Fellow at the University of Texas at Austin. He obtained his Ph.D. cum laude from Eindhoven University of Technology in 2022. Dr. Liu has received two Rising Star Awards, from KAUST and the Conference on Parsimony and Learning (CPAL), and his Ph.D. thesis received the 2023 Best Dissertation Runner-up award from Informatics Europe. His research interests span the life-cycle of large foundation models.
