Using Temporal Fusion Transformer to Predict Multi-Category Sales at Gas Stations: A Practical Analysis of the Tatneft Project

This article provides an in-depth analysis of a gas station sales prediction project based on the Temporal Fusion Transformer (TFT), covering multi-target time series modeling, feature engineering, data preprocessing workflows, and a complete model training and evaluation system.

Tags: Temporal Fusion Transformer, time series forecasting, deep learning, gas station sales prediction, PyTorch, multi-target prediction, feature engineering
Published 2026-05-17 23:45 | Recent activity 2026-05-17 23:52 | Estimated read 12 min

Section 01

[Introduction] Practical Application of Temporal Fusion Transformer in Multi-Category Sales Prediction at Gas Stations (Analysis of the Tatneft Project)

This article provides an in-depth analysis of the gas station sales prediction project by Russian energy giant Tatneft, which is based on the Temporal Fusion Transformer (TFT). The project aims to solve the complex multi-target time series problem of simultaneously predicting the sales volume of 7 fuel types and 5 categories of convenience store products over the next 24 hours. It covers the entire workflow including feature engineering, data preprocessing, model training, and evaluation, demonstrating the advantages of TFT in handling interactions between static features, known future information, and historical observation data, and providing support for inventory management and supply chain optimization in the retail energy industry.

Section 02

Project Background and Business Challenges

In the retail energy industry, accurately predicting gas station sales volume is crucial for inventory management, supply chain optimization, and profit maximization. Russian energy giant Tatneft faces a complex prediction challenge: it needs to simultaneously predict the sales volume of seven fuel types and five categories of convenience store products over the next 24 hours. This multi-target time series prediction problem involves many dynamic factors, including weather conditions, traffic flow, promotional activities, competitor pricing, and holiday effects. Traditional statistical prediction methods struggle to capture the complex interactions between so many variables, while single deep learning models often fail to handle static features, known future information, and historical observation data at the same time. This is exactly the scenario where the Temporal Fusion Transformer (TFT) demonstrates its unique advantages.

Section 03

Analysis of the Temporal Fusion Transformer Model Architecture

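The TFT was proposed by Bryan Lim and colleagues from the University of Oxford and Google Cloud AI (the preprint appeared in 2019, and the paper was published in the International Journal of Forecasting in 2021). It is designed for interpretable multi-horizon time series forecasting. Unlike traditional time series models, TFT can integrate three types of input features simultaneously: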

Static features describe the invariant attributes of entities. In this project, there are 30 variables including the road type of the gas station, the number of service facilities, the number of oil tanks, and the area of the convenience store. These features help the model understand the unique operating environment of each gas station.

Known future features are variables whose values are known in advance along the time axis. The project uses 32 of them, such as the hour of the day, day of the week, whether it is a holiday, fuel prices, promotional activities, and advertising channels. This type of feature allows the model to use planning information that is already fixed for the forecast window.

Historical observation features are variables whose values are available only up to the moment of prediction. The project uses 31 of them, such as weather conditions, traffic flow, competitor prices, and historical sales revenue. The model uses an attention mechanism to learn the impact of these historical patterns on future predictions.
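The split between the three feature groups can be sketched in a few lines of plain Python. The column names below are hypothetical stand-ins, not the project's actual schema, and `visible_inputs` is an illustrative helper rather than part of any library. The point it shows is that observed variables are readable only for past (encoder) steps, while static and known future variables are readable at every step.

```python
from dataclasses import dataclass


@dataclass
class FeatureGroups:
    static: list          # fixed per station
    known_future: list    # known for past and future steps
    observed_past: list   # known only up to the prediction time


# Hypothetical column names, chosen only for illustration.
groups = FeatureGroups(
    static=["road_type", "n_tanks", "store_area_m2"],
    known_future=["hour", "day_of_week", "is_holiday", "fuel_price"],
    observed_past=["temperature", "traffic_flow", "competitor_price"],
)


def visible_inputs(groups, step_is_future):
    """Return the columns a model may read at a single time step."""
    cols = groups.static + groups.known_future
    if not step_is_future:
        # Observed variables exist only for steps that already happened.
        cols = cols + groups.observed_past
    return cols


print(visible_inputs(groups, step_is_future=True))
```

In PyTorch Forecasting, this grouping is usually declared when the dataset object is built, so the sketch above is only a conceptual aid.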

A core component of TFT is its interpretable multi-head attention mechanism, which learns dependencies between time steps and produces attention weights that can be inspected for each prediction. Variable selection networks complement it by weighting the contribution of each input variable.

Section 04

Dataset and Feature Engineering

The project used hourly operational data from 5 gas stations under Tatneft for the entire year of 2023, totaling 43,800 records. The original data is divided into two files: site metadata contains the static attributes of each gas station, and operational data includes time series features and sales records.

Several key techniques were used in the feature engineering phase:

Missing value handling uses domain-related filling strategies. For example, when the holiday name is missing, it is filled with "No Holiday"; when the advertising channel is missing, it is marked as "No Advertising". This not only preserves information integrity but also avoids time series breaks caused by data deletion.
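A minimal sketch of this filling strategy is shown below, assuming records are plain dictionaries. The column names `holiday_name` and `ad_channel` are hypothetical; only the two fill labels come from the text. Rows are kept rather than dropped, so the hourly sequence stays unbroken.

```python
# Hypothetical column names; the fill labels follow the article.
FILL_VALUES = {"holiday_name": "No Holiday", "ad_channel": "No Advertising"}


def fill_missing(rows, fill_values=FILL_VALUES):
    """Return copies of rows with missing categorical fields replaced."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input is left untouched
        for col, default in fill_values.items():
            if new_row.get(col) in (None, ""):
                new_row[col] = default
        filled.append(new_row)
    return filled


rows = [
    {"hour": 0, "holiday_name": None, "ad_channel": "radio"},
    {"hour": 1, "holiday_name": "New Year", "ad_channel": None},
]
clean = fill_missing(rows)
print(clean)
```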

Outlier handling uses the Winsorization method based on interquartile range. Considering the continuity of the time series, the project chose to truncate rather than delete outliers to ensure the integrity of time steps.
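The truncation step might look like the sketch below. The fence multiplier `k=1.5` and the "inclusive" quantile method are assumptions; the article does not state which values the project used. Values outside the fences are clipped to the fence, so the number of time steps is unchanged.

```python
import statistics


def winsorize_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] without dropping any."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Toy hourly sales with one extreme spike.
sales = [10, 12, 11, 13, 12, 11, 500]
clipped = winsorize_iqr(sales)
print(clipped)
```

In a multi-station setting, the fences would probably be computed per station and per target, since sales levels differ between sites.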

Cyclical time encoding converts periodic variables such as hour of day, day of week, and month into sine and cosine pairs, enabling the model to understand the cyclical nature of time. For example, 23:00 and 00:00 are adjacent in time, and the distance between their encoded vectors is correspondingly small.
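The encoding can be written as $(\sin(2\pi v / P), \cos(2\pi v / P))$ for a value $v$ with period $P$. The sketch below applies it to the hour of day and checks the 23:00 versus 00:00 example; the function names are illustrative.

```python
import math


def cyclical_encode(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a, b, period):
    """Euclidean distance between two encoded values."""
    return math.dist(cyclical_encode(a, period), cyclical_encode(b, period))


# 23:00 and 00:00 are one hour apart, so they end up as close as 00:00 and 01:00.
print(encoded_distance(23, 0, period=24))
print(encoded_distance(0, 1, period=24))
# Midnight and noon sit on opposite sides of the circle.
print(encoded_distance(0, 12, period=24))
```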

Target variable transformation applies log1p transformation to all 12 prediction targets (7 fuel types and 5 product categories), converting the right-skewed sales distribution into a distribution closer to normal, which helps stabilize the training of the neural network.
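A short sketch of the transform and its inverse follows. The function names are illustrative. Because the model is trained in log space, predictions have to be mapped back with `expm1` before they are read as sales volumes.

```python
import math


def to_model_space(sales):
    """log1p compresses large values and is defined at zero sales."""
    return math.log1p(sales)


def to_sales_space(prediction):
    """Inverse transform applied to model outputs."""
    return math.expm1(prediction)


for sales in [0.0, 5.0, 50.0, 5000.0]:
    transformed = to_model_space(sales)
    print(sales, round(transformed, 3), round(to_sales_space(transformed), 3))
```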

Standardization applies Z-score scaling to each gas station separately, which removes scale differences between stations while preserving the relative patterns of change within each station.
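Per-station scaling can be sketched as follows. The keys `station_id` and `sales` are hypothetical, and the use of the population standard deviation is an assumption. The example shows two stations with very different volumes ending up with identical scaled patterns.

```python
import statistics
from collections import defaultdict


def zscore_per_station(records, value_key="sales", station_key="station_id"):
    """Scale each record using the mean and std of its own station."""
    by_station = defaultdict(list)
    for rec in records:
        by_station[rec[station_key]].append(rec[value_key])

    stats = {
        station: (statistics.fmean(vals), statistics.pstdev(vals))
        for station, vals in by_station.items()
    }

    scaled = []
    for rec in records:
        mean, std = stats[rec[station_key]]
        z = (rec[value_key] - mean) / std if std > 0 else 0.0
        scaled.append({**rec, value_key + "_z": z})
    return scaled, stats


records = [
    {"station_id": "A", "sales": 100.0},
    {"station_id": "A", "sales": 200.0},
    {"station_id": "A", "sales": 300.0},
    {"station_id": "B", "sales": 10.0},
    {"station_id": "B", "sales": 20.0},
    {"station_id": "B", "sales": 30.0},
]
scaled, stats = zscore_per_station(records)
print([round(r["sales_z"], 3) for r in scaled])
```

To avoid leaking information from the evaluation period, the per-station statistics would normally be computed on the training split only and then reused for validation and test data.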

Section 05

Model Training and Evaluation

The project uses the PyTorch Forecasting library to implement the TFT model and the Lightning framework for distributed training. In the model configuration, the lookback window is set to 168 hours (7 days), and the prediction horizon is 24 hours.
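To make the window sizes concrete, the sketch below enumerates the encoder and decoder index ranges for one station's year of hourly data (365 × 24 = 8,760 steps). A stride of one hour is an assumption, and the count ignores any train, validation, and test split, which the article does not describe.

```python
LOOKBACK = 168  # encoder length: 7 days of hourly history
HORIZON = 24    # decoder length: the next 24 hours


def window_bounds(n_steps, lookback=LOOKBACK, horizon=HORIZON):
    """Yield (encoder_start, decoder_start, decoder_end) index triples."""
    for start in range(n_steps - lookback - horizon + 1):
        yield start, start + lookback, start + lookback + horizon


hours_per_station = 365 * 24
windows = list(window_bounds(hours_per_station))
n_windows = len(windows)
print(n_windows, windows[0], windows[-1])
```

Each window needs 192 consecutive hours, which is one reason the preprocessing steps fill and clip values rather than delete rows.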

During training, TensorBoard is used to monitor the loss curve and validation metrics in real time. The model is trained by minimizing quantile loss, which is also tracked on the validation set, so the predictions come with uncertainty intervals rather than point estimates alone.
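Quantile (pinball) loss for a quantile level $q$ is $\max(q\,(y - \hat{y}),\ (q - 1)\,(y - \hat{y}))$. The plain-Python sketch below illustrates the formula. It is not the library implementation, the quantile levels are assumptions, and averaging across quantiles is one possible aggregation.

```python
def pinball_loss(y_true, y_pred, q):
    """Loss for a single quantile level q in (0, 1)."""
    diff = y_true - y_pred
    return max(q * diff, (q - 1) * diff)


def quantile_loss(y_true, preds_by_quantile):
    """Average pinball loss over a dict {quantile: prediction}."""
    losses = [pinball_loss(y_true, p, q) for q, p in preds_by_quantile.items()]
    return sum(losses) / len(losses)


# For the 0.9 quantile, under-prediction costs more than over-prediction.
print(pinball_loss(10.0, 8.0, q=0.9))   # prediction too low
print(pinball_loss(10.0, 12.0, q=0.9))  # prediction too high
print(quantile_loss(10.0, {0.1: 7.0, 0.5: 10.0, 0.9: 13.0}))
```

The asymmetry is what pushes each output toward its quantile, and the spread between the low and high quantile outputs forms the uncertainty interval.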

In the evaluation phase, the project looks beyond traditional point prediction errors to prediction reliability in business scenarios. After comparing results under different configurations, the final model is reported to generalize well on the test set and to capture the impact of external shocks such as holiday promotions and weather changes on sales.

Section 06

Business Value and Application Prospects

This prediction system has brought significant business value to Tatneft. Accurate sales predictions allow gas stations to optimize inventory levels and reduce the risk of fuel shortages and excess inventory. At the same time, the precise grasp of the demand for convenience store products helps to formulate procurement plans and promotional strategies.

From a broader perspective, this project demonstrates the application potential of deep learning in traditional industries. The interpretability of the TFT model allows business personnel to understand the driving factors behind the predictions, enhancing trust in AI decisions. With data accumulation and technological iteration, such prediction systems will become core capabilities for the digital transformation of the energy retail industry.

Section 07

Project Summary and Insights

Conclusion

The Tatneft TFT analysis project is a typical case of industrial-grade time series prediction. It demonstrates the advantages of the Temporal Fusion Transformer in handling complex multi-variable prediction problems and provides a practical reference that runs from data engineering to model training and evaluation. For teams that want to apply deep learning to business prediction, this open-source project is a valuable learning resource.