Alex: Welcome to another episode of ResearchPod.
Sam: Today, we're discussing a paper called "First-Exit Time Analysis for Truncated Heavy-Tailed Dynamical Systems" by Xingyu Wang and Chang-Han Rhee.
Sam: It examines how long it takes for a system driven by random steps to leave a stable zone around a resting point. Picture a particle nudged by small predictable drifts plus unpredictable jolts. The jolts follow heavy-tailed noise, where huge outliers happen far more often than in everyday randomness like coin flips. The authors add truncation, a cap on step sizes, much like gradient clipping in training neural networks.
Alex: So this is basically about when these random processes finally break out of their stable zones... but the old math tools for normal randomness don't work here?
Sam: That's right. Classical theory handles systems with tame, bell-shaped noise, predicting exponentially long waits and smooth escape paths. But heavy-tailed setups scale differently, often polynomially, driven by rare big jumps. Truncation creates a hierarchy of exit times based on how many capped jumps are needed to escape.
Alex: Without clipping, it's usually one massive leap out, but clipping forces multiple smaller ones?
Sam: Exactly. For truncated heavy-tailed dynamics, exits from a basin require a precise minimum number of jumps. This number comes from the basin's width relative to the clip level: think of the basin as a safe zone with boundaries a certain distance away. With each jump capped at length b, you need the smallest number of those jumps, laid end to end along the drift path, to reach outside. They call this minimum J.
Alex: Like measuring how many fixed-length stones you need to cross a river gap. So if J goes from 1 to 2 as you tighten the clip, what happens to the escape time?
Sam: That stone-stepping picture fits. With no clip, J is 1, and escape needs one huge jump. With clipping, escape demands exactly J capped jumps, and the exit time grows longer as J increases. Tiny changes in b that bump J up by 1 push the exit time onto a much longer timescale.
Alex: Wider basins or tighter clips mean bigger J, longer waits, like the system gets stuck longer in those flat spots during training.
Sam: The paper confirms this with numerics in one dimension. As the clip shrinks, simulations show clear steps in exit times, matching predictions. In machine learning terms, it clarifies why models linger in flat minima during clipped gradient descent.
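The stepped behavior Sam describes can be seen in a toy Monte Carlo experiment. This is not the paper's code; it is a minimal one-dimensional sketch with made-up parameters: a drift pulling toward 0, symmetric Pareto-tailed jolts, increments clipped at b, and exit from the basin (-1, 1).

```python
import math
import random

def exit_time(b, alpha=1.5, eta=0.01, max_steps=200_000, rng=None):
    """Run x -> x - eta*x + clip(eta*Z, b) with heavy-tailed Z
    until |x| >= 1; return the number of steps taken."""
    rng = rng or random.Random()
    x = 0.0
    for step in range(1, max_steps + 1):
        z = rng.random() ** (-1.0 / alpha)           # Pareto tail: P(Z > z) = z**-alpha
        jump = math.copysign(min(eta * z, b), rng.random() - 0.5)
        x = x - eta * x + jump                       # inward drift plus clipped jolt
        if abs(x) >= 1.0:
            return step
    return max_steps

rng = random.Random(7)
loose = [exit_time(b=1.5, rng=rng) for _ in range(100)]  # one jump can clear the basin
tight = [exit_time(b=0.6, rng=rng) for _ in range(100)]  # needs two near-cap jumps
print(sum(loose) / 100, sum(tight) / 100)
```

Tightening the clip from 1.5 to 0.6 forces two near-cap jumps in quick succession before the drift pulls the particle back, so the average exit time lands on a much longer scale, the step Sam mentions.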
Alex: How do they figure out exactly how many jumps are needed for a given basin?
Sam: In simple one-dimensional cases where the drift pulls everything inward, J is the ceiling of the distance from the resting point to the nearer boundary, divided by b.
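That stone-counting formula is easy to state in code. A minimal sketch (the function name and numbers are illustrative, not from the paper):

```python
import math

def min_jumps(dist, b):
    """Smallest number of jumps, each of length at most b,
    that can span a gap of width dist: J = ceil(dist / b)."""
    if b <= 0:
        raise ValueError("clip level b must be positive")
    return math.ceil(dist / b)

print(min_jumps(1.0, 1.5))  # -> 1: one jump clears the gap
print(min_jumps(1.0, 0.5))  # -> 2: two capped jumps laid end to end
print(min_jumps(1.0, 0.4))  # -> 3: tightening the clip bumps J up
```

Note the discontinuity: shrinking b from 0.5 to 0.4 changes J from 2 to 3, which is exactly what pushes the exit time onto a new, longer scale.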
Alex: And where does the process land after escape?
Sam: The exit spot follows a specific measure that integrates over J-fold heavy-tail distributions times path lengths. It maps the drift path plus those J truncated jumps to points just outside the basin. They develop a general framework for Markov chains using asymptotic atoms, reliable reset zones the process hits quickly, like respawning at checkpoints in a game level. These atoms cover the basin densely, and exits behave uniformly from them.
Alex: So the atoms let them prove the joint law sharply, by breaking escapes into regeneration cycles. For the truncated dynamics, they verify atoms exist with the right scale.
Sam: Yes. To make it rigorous, they shrink the basin slightly, creating a smaller safe zone whose every point sits at least a small distance ε inside the boundary. They prove lemmas showing that paths with exactly J minus one truncated jumps land deep inside, so at least one more jump is required to escape.
Alex: Those lemmas glue the multi-jump paths to the general theory, ensuring the scaled exit time converges to an exponential law and the exit location converges to that limiting measure.
Sam: In machine learning, this ties to SGD where gradients get clipped: if the update exceeds norm b, project it back to length b, like capping a vector pull to safe strength. The numerics trace those predicted phase jumps as the clip tightens.
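The projection Sam describes is plain norm clipping; this minimal sketch (the function name is ours) shows the same idea behind library utilities like PyTorch's clip_grad_norm_:

```python
import math

def clip_update(grad, b):
    """If the update's Euclidean norm exceeds b, scale it back
    to length b; otherwise return it unchanged."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= b:
        return list(grad)
    return [g * (b / norm) for g in grad]

print(clip_update([3.0, 4.0], b=10.0))  # norm 5 <= 10: unchanged
print(clip_update([3.0, 4.0], b=1.0))   # rescaled to norm 1
```

The direction of the update is preserved; only its length is capped, which is exactly the truncation the paper's theory models.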
Alex: What's the main takeaway for someone following machine learning training?
Sam: The paper delivers a heavy-tailed version of classical theory tailored to truncated dynamics like clipped SGD. It shows escapes from basins happen via a fixed number of jumps J, creating discrete shifts in time scales that match simulations. This links rare-event math to why models dwell longer in wide flat regions.
Alex: Instead of smooth escapes, it's these stepped phases. Any caveats in the setup?
Sam: The analysis assumes smooth drift and diffusion functions, and multivariate regular variation for the noise tails. Numerics are only in one dimension so far, with global behavior covered in a companion paper.
Alex: That's a meaningful synthesis from multi-jump math to why training behaves this way in clipped setups. Thanks, Sam, this has been a clear dive into the mechanics.
Sam: My pleasure, Alex. The work stands as a notable advance in rare-event analysis for optimization.
Alex: And that's our look at first-exit times in truncated heavy-tailed systems. Thanks for joining ResearchPod.