ACL 2026

Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem

Zeguan Xiao, Siqing Li, Yong Wang, Xuetao Wei, Jian Yang, Yun Chen, Guanhua Chen

A project page for our ACL 2026 paper on making LLM unlearning more faithful to the real objective: preserve general capability first, and only forget where doing so does not damage retention.


Retention

Treat general capability as the primary objective instead of a side constraint that can be traded away too aggressively.

Forgetting

Inject forgetting only in directions that do not actively conflict with the retain signal.

Guarantee

SAGO builds updates that stay aligned with retention coordinate by coordinate, not just on average.

Comparison of PCGrad and SAGO gradient synthesis

Pareto frontier improvements on WMDP with SAGO

Figure overview. The page starts with the central story of the paper: SAGO changes how forget and retain gradients are combined, and that change pushes the forgetting-retention Pareto frontier upward on WMDP.

Motivation

LLM unlearning is attractive because it aims to remove dangerous or private knowledge from a model itself, rather than only teaching the model to refuse when asked. In principle that is a stronger defense: if a model genuinely no longer retains a harmful capability, jailbreaks should become much less effective.

The hard part is that unlearning is never just a forgetting problem. It is also a retention problem. Push too hard and the model loses useful knowledge and general performance. Push too softly and the target knowledge remains recoverable. This is why many unlearning methods end up living on an uncomfortable frontier between forgetting strength and capability preservation.

Our paper argues that the usual framing is slightly off. The problem is not simply “balance two losses better.” The real question is how to combine updates so that forgetting is allowed only where it does not sabotage retention.

Asymmetric Two-Task Learning

We recast LLM unlearning as an asymmetric two-task learning problem. There are two tasks in play, but they do not have equal status:

Primary task

Retention should be preserved whenever possible because it represents the model's general capability and non-target knowledge.

Auxiliary task

Forgetting is still necessary, but it should operate under a do-no-harm principle relative to retention.

Design shift

This changes the engineering target from loss balancing to retention-prioritized gradient synthesis.

That asymmetric framing is important because it matches how practitioners already evaluate unlearning methods. A method is not considered successful if it forgets well but destroys the model. We should therefore encode that hierarchy directly into the update rule.

How SAGO Synthesizes Gradients

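At each update step, we compute two signals: a forget gradient and a retain gradient. The question is how to merge them. A naive difference-style update, which combines the two signals without checking for conflict, can work against retention in exactly the coordinates where general capability is stored. PCGrad improves on this by projecting away the conflicting component whenever the two gradients have a negative inner product. However, that test is applied to the gradient vector as a whole, so individual coordinates can still point against retention.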
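To make the coordinate-level point concrete, here is a minimal Python sketch of a one-sided, PCGrad-style merge on toy two-dimensional gradients. It is an illustration under stated assumptions, not the paper's baseline code. `pcgrad_merge` is a hypothetical helper, and only the forget gradient is projected (the original PCGrad projects each task's gradient against the others). The combined update is assumed to be the plain sum of the projected forget gradient and the retain gradient.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcgrad_merge(g_forget, g_retain):
    """Hypothetical one-sided PCGrad-style merge (illustrative only).

    If the forget and retain gradients conflict as whole vectors
    (negative inner product), remove from the forget gradient its
    component along the retain gradient, then add the retain gradient.
    """
    d = dot(g_forget, g_retain)
    if d < 0:
        scale = d / dot(g_retain, g_retain)
        g_forget = [f - scale * r for f, r in zip(g_forget, g_retain)]
    return [f + r for f, r in zip(g_forget, g_retain)]


g_retain = [1.0, 1.0]
g_forget = [-3.0, 1.0]  # inner product with g_retain is -2: a vector-level conflict

merged = pcgrad_merge(g_forget, g_retain)
print(merged)  # [-1.0, 3.0]

# The merged update agrees with retention on average (positive inner product),
# yet coordinate 0 still has the opposite sign of the retain gradient.
print(dot(merged, g_retain))                         # 2.0
print([m * r < 0 for m, r in zip(merged, g_retain)])  # [True, False]
```

After projection, the forget component is orthogonal to the retain gradient. That removes the conflict in aggregate, but it does nothing to stop a single coordinate from moving against retention.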

SAGO, short for Sign-Align Gradient Optimization, works at a finer granularity. It compares the forget and retain gradients element by element:

If a coordinate is conflicting (the forget and retain gradients have opposite signs there), SAGO keeps the retain direction and blocks the forget direction.
If a coordinate is aligned (the two gradients share a sign), SAGO lets the forget update through because it does not fight retention.
The final update therefore never points against the retain signal in any coordinate.

The result is a simple rule with a strong intuition: keep the parts of forgetting that are safe, and suppress the parts that would damage what we want to preserve.
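The three rules above can be sketched in a few lines of Python. This is a hedged illustration of the element-wise rule as described on this page, not the authors' released implementation. The helper name `sign_align_merge` is hypothetical, and gradients are represented as flat lists. An aligned coordinate is assumed to receive the plain sum of the retain and forget components, so any scaling the paper may apply is not modeled. A coordinate where either gradient is exactly zero is treated as non-conflicting.

```python
def sign_align_merge(g_forget, g_retain):
    """Hypothetical sketch of a sign-aligned merge of forget and retain gradients.

    Conflicting coordinate (opposite signs): keep only the retain component.
    Otherwise (same sign, or a zero): add the forget component to the retain one.
    """
    if len(g_forget) != len(g_retain):
        raise ValueError("gradients must have the same length")
    merged = []
    for f, r in zip(g_forget, g_retain):
        if f * r < 0:
            merged.append(r)      # conflict: block the forget direction here
        else:
            merged.append(r + f)  # aligned: forgetting does not fight retention here
    return merged


g_retain = [1.0, 1.0]
g_forget = [-3.0, 1.0]
merged = sign_align_merge(g_forget, g_retain)
print(merged)  # [1.0, 2.0]

# No coordinate of the merged update opposes the retain gradient.
print(all(m * r >= 0 for m, r in zip(merged, g_retain)))  # True
```

The property checked on the last line holds for any inputs, not only this example. A conflicting coordinate contributes r·r ≥ 0 to the product with the retain gradient. An aligned coordinate contributes r·r + f·r, which is also non-negative because f and r share a sign.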

What the Results Show

Across WMDP and RWKU, SAGO consistently improves the trade-off curve instead of merely moving to a different operating point. That is the main empirical message of the paper.

Method	MMLU (↑)	Forget Acc (↓)	Target Recovery
Naive	26.7	26.1	44.6%
+ PCGrad	56.4	28.9	94.0%
+ SAGO	57.4	28.2	96.0%

Representative result on WMDP Bio with SimNPO+GD. SAGO recovers more of the original model's general performance while maintaining comparable forgetting strength.

The broader pattern is just as encouraging. On RWKU, SAGO preserves neighboring knowledge much better than stronger but less selective baselines. That matters because practical unlearning is rarely about erasing one isolated fact. It is about removing a target while keeping the surrounding knowledge graph intact.

RWKU Pareto frontier showing better retention with SAGO

More evidence: optimization dynamics and geometry

Loss dynamics during unlearning on WMDP Biosecurity

Method	Forget vs. Retain	Combined vs. Retain	Combined vs. Forget
GradDiff	-0.22	0.55	0.82
PCGrad	-0.22	0.76	0.52
SAGO	-0.15	0.87	0.48

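Geometry analysis. The columns compare pairs drawn from the forget gradient, the retain gradient, and the combined update. SAGO's combined update aligns most closely with the retain direction (0.87, versus 0.76 for PCGrad and 0.55 for GradDiff). It also keeps a positive alignment with the forget direction (0.48), which is enough forgetting pressure to remain effective.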
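If the three columns are cosine similarities (an assumption; the table only names the pairs being compared), they could be computed as in the following sketch. The toy vectors are made up and are not the measured gradients behind the table.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy gradients, not measurements from any model.
g_forget = [-3.0, 2.0]
g_retain = [1.0, 1.0]
# Element-wise sign-aligned merge of the two: coordinate 0 conflicts, so it keeps
# only the retain component; coordinate 1 is aligned, so the components are summed.
combined = [1.0, 3.0]

print(round(cosine(g_forget, g_retain), 2))  # Forget vs. Retain:   -0.2
print(round(cosine(combined, g_retain), 2))  # Combined vs. Retain:  0.89
print(round(cosine(combined, g_forget), 2))  # Combined vs. Forget:  0.26
```

The toy numbers reproduce only the qualitative pattern in the table: negative forget-retain alignment, strong combined-retain alignment, and positive combined-forget alignment. They do not reproduce its values.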

Why the Geometry Matters

One reason we like SAGO is that it is not just an empirical trick. Its behavior lines up with the core failure mode of unlearning. When forget and retain gradients disagree, the model is being asked to modify parameters that likely support general capability. Those are exactly the places where a careless update causes collateral damage.

SAGO addresses this directly. Instead of averaging away the conflict, it respects the hierarchy between tasks. That gives it a more faithful objective: forgetting should happen, but only where it is compatible with retention. In that sense, the method is both practical and conceptually clean.
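The coordinate-wise condition also gives a simple first-order argument for why this protects retention. The derivation below assumes plain gradient descent, θ ← θ − ηu, where u is the combined update, L_r is the retain loss, and g_r = ∇L_r is the retain gradient. It is a sketch of the intuition, not the paper's formal statement.

```latex
% Assumption: the merged update u satisfies g_{r,i} u_i \ge 0 for every coordinate i.
\begin{aligned}
\mathcal{L}_r(\theta - \eta u)
  &= \mathcal{L}_r(\theta) - \eta\, g_r^\top u + O(\eta^2) \\
  &= \mathcal{L}_r(\theta) - \eta \sum_i g_{r,i}\, u_i + O(\eta^2) \\
  &\le \mathcal{L}_r(\theta) + O(\eta^2),
  \qquad \text{since } g_{r,i}\, u_i \ge 0 \text{ for all } i.
\end{aligned}
```

Strictly, this first-order bound only needs the whole sum to be non-negative. The coordinate-wise condition is stronger because every term is non-negative. One consequence, under the same assumptions: the condition survives any positive per-coordinate rescaling of u, such as a diagonal preconditioner like an adaptive optimizer's (ignoring momentum). Each term g_{r,i} d_i u_i with d_i > 0 stays non-negative, whereas a condition on the sum alone need not.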

Takeaway

The paper's main message is simple: LLM unlearning should be treated as an asymmetric optimization problem. Once we encode that asymmetry into gradient synthesis, the forgetting-retention trade-off becomes much easier to navigate.

SAGO is appealing because it stays lightweight. It can be plugged into existing forget-plus-retain pipelines, introduces a clear safety-first inductive bias, and consistently pushes the Pareto frontier in the right direction. For practitioners, that means better retention at comparable forgetting strength. For researchers, it suggests that update geometry is at least as important as loss design in unlearning.
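As a toy illustration of the plug-in claim, the sketch below runs the same gradient-descent loop twice on two hand-made quadratic losses. The first run uses a naive sum of the forget and retain gradients; the second uses a sign-aligned merge. Everything here is hypothetical, including the toy losses, the helper names, the step size, and the merge rule. The merge rule follows the element-wise description on this page and sums the components on non-conflicting coordinates. None of this is the paper's setup or code. In this toy, the retain loss depends only on the first parameter, so the second parameter is free for forgetting.

```python
def retain_grad(p):
    # Gradient of the toy retain loss 0.5 * (x - 1)^2 (only x matters for retention).
    return [p[0] - 1.0, 0.0]


def forget_grad(p):
    # Gradient of the toy forget loss 0.5 * ((x + 2)^2 + (y - 3)^2), minimized to "forget".
    return [p[0] + 2.0, p[1] - 3.0]


def retain_loss(p):
    return 0.5 * (p[0] - 1.0) ** 2


def forget_loss(p):
    return 0.5 * ((p[0] + 2.0) ** 2 + (p[1] - 3.0) ** 2)


def naive_merge(gf, gr):
    # Plain sum of the two gradients, with no conflict handling.
    return [f + r for f, r in zip(gf, gr)]


def sign_align_merge(gf, gr):
    # Drop the forget component wherever its sign opposes the retain component.
    return [r if f * r < 0 else f + r for f, r in zip(gf, gr)]


def train(merge, steps=200, lr=0.1):
    # Generic forget-plus-retain loop: only the merge rule changes between runs.
    p = [0.0, 0.0]
    for _ in range(steps):
        update = merge(forget_grad(p), retain_grad(p))
        p = [x - lr * u for x, u in zip(p, update)]
    return p


for name, merge in [("naive", naive_merge), ("sign-aligned", sign_align_merge)]:
    p = train(merge)
    print(f"{name:>12}: retain={retain_loss(p):.3f} forget={forget_loss(p):.3f}")
```

Starting from a retain loss of 0.5 and a forget loss of 6.5, the naive sum reaches a forget loss of about 1.125 but raises the retain loss to about 1.125. The sign-aligned merge drives the retain loss to about 0 and still lowers the forget loss to about 4.5. It does this by moving only the parameter that retention does not use.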

Citation

@inproceedings{xiao2026sago,
  title={Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem},
  author={Xiao, Zeguan and Li, Siqing and Wang, Yong and Wei, Xuetao and Yang, Jian and Chen, Yun and Chen, Guanhua},
  booktitle={Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2026}
}