TorchJD: Solving Gradient Conflict in Multi-Task Learning with Jacobian Descent

TorchJD is a PyTorch extension library that implements the Jacobian Descent algorithm, specifically designed to resolve gradient conflicts between multiple loss functions in multi-task learning.

Tags: PyTorch, Multi-Task Learning, Jacobian Descent, Gradient Aggregation, Machine Learning, Neural Network Optimization

Published 2026-05-21 03:45 | Recent activity 2026-05-21 03:49 | Estimated read 5 min

Section 01

[Introduction] TorchJD: A PyTorch Extension Library for Resolving Gradient Conflicts in Multi-Task Learning

TorchJD builds on PyTorch and implements the Jacobian Descent algorithm to resolve gradient conflicts between multiple loss functions in multi-task learning. This article covers its background, core method, usage, and application scenarios, so that readers can judge where the tool is useful.

Section 02

[Background] The Dilemma of Gradient Conflicts in Multi-Task Learning

In deep learning, multi-task learning lets a single neural network handle several related tasks at once. However, the loss functions of different tasks often produce conflicting gradient directions. When the inner product of two task gradients is negative, the tasks disagree about how to move the shared parameters, and a step along their simple average can increase the loss of one task. For example, in vision models, classification tends to rely on global features while localization emphasizes local detail. When their gradients point in opposing directions, plain averaging cannot serve both, and the model can settle into a suboptimal solution.
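A small numeric check makes the problem concrete. The sketch below uses two made-up gradient vectors (the values and the learning rate are illustrative assumptions, not taken from any real model) and estimates, to first order, how each loss changes after a step along the averaged gradient.

```python
# Toy gradients of two task losses with respect to two shared parameters.
# The numbers are invented for illustration only.
g1 = [1.0, 0.0]
g2 = [-3.0, 1.0]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# A negative inner product means the two tasks pull in conflicting directions.
conflict = dot(g1, g2) < 0

# Simple averaging, as used when the losses are summed or averaged.
mean_grad = [(a + b) / 2 for a, b in zip(g1, g2)]

# First-order change of loss i after theta <- theta - lr * mean_grad
# is approximately -lr * <g_i, mean_grad>.
lr = 0.1
change1 = -lr * dot(g1, mean_grad)
change2 = -lr * dot(g2, mean_grad)

print("conflict:", conflict)
print("estimated change of loss 1:", change1)  # positive: loss 1 gets worse
print("estimated change of loss 2:", change2)  # negative: loss 2 improves
```

Here the larger gradient dominates the average, so the step improves the second task at the expense of the first.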

Section 03

[Method] Jacobian Descent: A New Paradigm for Multi-Task Optimization

The Jacobian Descent algorithm implemented by TorchJD takes a different approach to multi-task optimization. Whereas traditional gradient descent works on a single scalar loss, Jacobian Descent operates directly on the Jacobian matrix of the loss vector: each row of the matrix is the gradient of one loss function with respect to the model parameters. Keeping the rows separate makes conflicts between gradients visible, so the aggregation strategy can account for them instead of blindly summing or averaging. The aim is an update that is better founded mathematically and more effective in practice.
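In symbols, the idea can be written as follows. The notation is an assumption introduced here for illustration: m losses L_1, ..., L_m, a parameter vector θ of dimension n, a learning rate η, and an aggregator A that maps a Jacobian to a single update direction.

```latex
J(\theta) =
\begin{bmatrix}
\nabla L_1(\theta)^\top \\
\vdots \\
\nabla L_m(\theta)^\top
\end{bmatrix}
\in \mathbb{R}^{m \times n},
\qquad
\theta_{t+1} = \theta_t - \eta \, \mathcal{A}\bigl(J(\theta_t)\bigr),
\qquad
\mathcal{A} : \mathbb{R}^{m \times n} \to \mathbb{R}^{n}
```

With the mean aggregator, A(J) = (1/m) Jᵀ1, this update reduces to ordinary gradient descent on the average loss; other aggregators differ only in how they combine the rows.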

Section 04

[Core Mechanism] Conflict-Free Gradient Projection Ensures Task Performance

TorchJD provides more than 10 gradient aggregators. A representative one is UPGrad, a conflict-free gradient projection. Its core idea is to project each task's gradient onto the dual cone of the gradients (the set of directions that have a non-negative inner product with every task's gradient) and then average the projected gradients. Because the resulting update has a non-negative inner product with every task gradient, a sufficiently small learning rate gives a step that, to first order, does not increase the loss of any task.
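For exactly two tasks the projection has a closed form, which is enough to show the idea. The sketch below is a hand-written illustration of that two-gradient special case, not TorchJD's implementation: the general case with more tasks needs a small quadratic program, and the helper name `project_onto_dual_cone` and the gradient values are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_onto_dual_cone(g, other):
    """Project g onto {x : <x, g> >= 0 and <x, other> >= 0}.

    Valid for the two-gradient case only. If g already agrees with
    `other`, it lies in the cone and is returned unchanged; otherwise
    the component of g that points against `other` is removed.
    """
    d = dot(g, other)
    if d >= 0:
        return list(g)
    scale = d / dot(other, other)
    return [gi - scale * oi for gi, oi in zip(g, other)]


# Invented conflicting gradients (their inner product is negative).
g1 = [1.0, 0.0]
g2 = [-3.0, 1.0]

p1 = project_onto_dual_cone(g1, g2)
p2 = project_onto_dual_cone(g2, g1)

# Average the projected gradients to obtain the update direction.
update = [(a + b) / 2 for a, b in zip(p1, p2)]

print("update direction:", update)
print("inner product with g1:", dot(update, g1))
print("inner product with g2:", dot(update, g2))
```

Both inner products come out non-negative, so a small step against `update` does not increase either loss to first order, in contrast to the plain average of `g1` and `g2`.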

Section 05

[Practical Usage] Seamless Integration of TorchJD with PyTorch

TorchJD follows familiar PyTorch conventions. In place of the usual loss.backward(), you call torchjd.autojac.backward(losses) and then use jac_to_grad() to convert the resulting Jacobian into a gradient. For multi-task models, the mtl_backward() function computes ordinary loss gradients for the task-specific parameters and a Jacobian matrix for the shared parameters, so each task head is still optimized independently while conflicts are resolved only where parameters are shared.
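The split between shared and task-specific parameters can be illustrated with plain PyTorch. The sketch below deliberately does not call TorchJD, and it is not the library's implementation: the toy model, the tensor shapes, and the helper `flat_grad` are assumptions made for illustration. It only shows which quantities end up as a Jacobian and which remain ordinary gradients.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy model: one shared layer feeding two task-specific heads.
shared = torch.nn.Linear(4, 3)
head1 = torch.nn.Linear(3, 1)
head2 = torch.nn.Linear(3, 1)

# Random stand-in data for the two tasks.
x = torch.randn(8, 4)
y1 = torch.randn(8, 1)
y2 = torch.randn(8, 1)

features = shared(x)
loss1 = torch.nn.functional.mse_loss(head1(features), y1)
loss2 = torch.nn.functional.mse_loss(head2(features), y2)


def flat_grad(loss, params):
    """Gradient of a scalar loss w.r.t. params, flattened into one vector."""
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])


shared_params = list(shared.parameters())

# Shared parameters: one Jacobian row per task loss.
jacobian = torch.stack(
    [flat_grad(loss1, shared_params), flat_grad(loss2, shared_params)]
)

# Task-specific parameters: each head only sees its own loss,
# so an ordinary gradient is enough and no aggregation is needed.
head1_grad = flat_grad(loss1, list(head1.parameters()))
head2_grad = flat_grad(loss2, list(head2.parameters()))

# The rows of the Jacobian are what an aggregator would combine.
conflict = bool(torch.dot(jacobian[0], jacobian[1]) < 0)

print("Jacobian shape for shared parameters:", tuple(jacobian.shape))
print("gradient size for each head:", head1_grad.numel(), head2_grad.numel())
print("shared gradients conflict:", conflict)
```

Only the two-row Jacobian of the shared layer needs a conflict-aware aggregator; the head gradients can be passed to the optimizer as they are.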

Section 06

[Application Scenarios] Wide Applicability and Extensibility of TorchJD

TorchJD is not limited to conventional multi-task learning. It also applies to instance-level risk minimization, where the loss of each training example is treated as a separate objective (for example, in personalized recommendation and federated learning). The torchjd.autojac.jac() function in the library computes the Jacobian directly, without storing it in the .jac field, which provides a foundation for more complex custom algorithms.

Section 07

[Conclusion] TorchJD Opens New Possibilities for Neural Network Optimization

TorchJD adds a useful tool to the PyTorch ecosystem. It addresses the gradient conflict problem in multi-task learning and opens further options for neural network optimization. For deep learning practitioners who need to balance multiple objectives, it is a library worth exploring in depth.
