Zing Forum

Indian IPO Machine Learning Decision System: A Complete Practice from Probability Prediction to Portfolio Allocation

A machine learning-based investment decision system for Indian mainboard IPOs. It uses a logistic regression model to predict the probability that a new listing gains on its first day and pairs that prediction with a backtest-validated allocation strategy, giving an end-to-end automated investment decision process.

Tags: machine learning, IPO, investment, logistic regression, backtesting, portfolio allocation, India stock market, FastAPI, Streamlit
Published 2026-06-05 19:46 | Recent activity 2026-06-05 19:49 | Estimated read 9 min

Section 01

Indian IPO Machine Learning Decision System: Core Overview and Project Source

Abstract: A machine learning-based investment decision system for Indian mainboard IPOs. It uses a logistic regression model to predict the probability that a new listing gains on its first day and pairs that prediction with a backtest-validated allocation strategy, giving an end-to-end automated investment decision process.

Project Source Information:

Section 02

Project Background: Key Pain Points of Indian IPO Investment

The initial public offering (IPO) market in India is volatile and potentially high-return. Some new stocks double on their listing day, some plummet by 30%, and most fall somewhere in between. Investors have very little information before the subscription deadline: mainly public signals such as subscription multiples, issue size, price range, and gray market premium (GMP).

When facing each new stock, investors need to answer two key questions:

  1. Is this new stock worth subscribing to?
  2. If there are multiple new stocks on the same day, how should funds be allocated?

Traditional practice relies on experience and intuition, which makes risk hard to quantify and cannot systematically process large volumes of IPO data. This project addresses that gap: it builds an end-to-end machine learning decision system that not only predicts IPO listing performance but also recommends how to allocate funds.

Section 03

Core Design and Data Engineering: From Problem Reframing to Multi-source Integration

The project initially used a regression model to predict the exact listing gain of each IPO, but performance was poor (cross-validation R² close to 0) because returns are heavily right-skewed. The team then reframed the problem as a classification task: predicting whether the gain exceeds 5%, a cutoff chosen to account for practical factors such as transaction costs. On the held-out test set, the classifier reached an AUC of 0.84.
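
The reframing can be illustrated with a minimal sketch. Only the 5% cutoff comes from the text; the function names, sample gains, and scores are hypothetical, and the AUC is computed with a plain pairwise comparison rather than with whatever library the project used.

```python
from typing import Sequence

GAIN_THRESHOLD_PCT = 5.0  # cutoff from the text: a "win" is a listing gain above 5%


def to_label(listing_gain_pct: float) -> int:
    """Turn a continuous listing gain (in percent) into a binary target."""
    return 1 if listing_gain_pct > GAIN_THRESHOLD_PCT else 0


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical listing gains (percent) and model scores, for illustration only.
gains = [102.0, -30.0, 4.0, 12.5, 0.0, 48.0]
labels = [to_label(g) for g in gains]
scores = [0.90, 0.20, 0.45, 0.40, 0.10, 0.70]
auc = roc_auc(labels, scores)
```

Because the label depends only on which side of the cutoff a gain falls, a stock that doubles and one that gains 12.5% count equally, which is one way the classification target sidesteps the skewed tail that hurt the regression.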

Data sources include:

  • Chittorgarh: Subscription data, issue information, and listing data
  • Investorgain: Gray Market Premium (GMP) snapshots
  • NSE: Opening price on the listing day
  • BSE: IPO start/end dates and price range

The data covers 720 Indian mainboard IPOs from 2006 to 2025, and the rapidfuzz fuzzy string matching library was used to supplement missing fields for 407 of them. Features are restricted to information that is public at the end of the subscription window: subscription multiples by investor category, issue size, gray market premium, and company fundamental indicators (where available).
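
Joining records from several sites usually hinges on reconciling company names that are spelled differently. The text credits rapidfuzz with filling the missing fields but does not describe the matching logic, so the sketch below is only an assumption about how such a join could work. It uses the standard library's difflib as a stand-in for rapidfuzz to stay dependency-free; the company names, normalization rules, and 0.85 cutoff are all hypothetical.

```python
import difflib
import re
from typing import Optional

_SUFFIXES = re.compile(r"\b(limited|ltd|ipo)\b\.?")


def normalize(name: str) -> str:
    """Lowercase, drop common suffixes and punctuation, collapse whitespace."""
    name = _SUFFIXES.sub(" ", name.lower())
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def best_match(name: str, candidates: list[str], cutoff: float = 0.85) -> Optional[str]:
    """Return the candidate whose normalized name is closest, or None below the cutoff."""
    by_norm = {normalize(c): c for c in candidates}
    hits = difflib.get_close_matches(normalize(name), list(by_norm), n=1, cutoff=cutoff)
    return by_norm[hits[0]] if hits else None


# Hypothetical names as they might appear on two different sources.
source_b = ["Acme Industries Ltd.", "Zenith Foods Limited", "Orbit Tech Ltd"]
match = best_match("ACME Industries Limited IPO", source_b)
no_match = best_match("Unrelated Holdings", source_b)
```

Returning None below the cutoff leaves a field missing instead of filling it from the wrong company, which is usually the safer failure mode for training data.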

Section 04

Model Selection and Validation: Why Logistic Regression Won and How It Performed

After comparing logistic regression, random forest, XGBoost, and LightGBM, the team chose logistic regression for the following reasons:

  1. Interpretability: Coefficients reflect the direction and strength of feature impact on probability
  2. Calibration: Output probabilities are well-calibrated
  3. Time series stability: Robust validation via TimeSeriesSplit(gap=30)
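
The validation scheme named in the last point can be sketched by hand. The function below mirrors the expanding-window layout of scikit-learn's TimeSeriesSplit with a gap, written in plain Python so that the index arithmetic is visible. The gap of 30 comes from the text; n_splits=5 and the function name are assumptions. In scikit-learn the gap counts rows, not calendar days.

```python
from typing import Iterator


def time_series_splits(
    n_samples: int, n_splits: int = 5, gap: int = 30
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges: expanding train window, then a gap, then test."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0 or n_samples - n_splits * test_size - gap <= 0:
        raise ValueError("not enough samples for this n_splits and gap")
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train_end = test_start - gap  # drop the `gap` rows just before the test fold
        yield range(0, train_end), range(test_start, test_start + test_size)


# 720 rows sorted by date (the dataset size quoted earlier); n_splits=5 is an assumption.
folds = list(time_series_splits(n_samples=720, n_splits=5, gap=30))
for train_idx, test_idx in folds:
    print(f"train 0..{train_idx.stop - 1}  test {test_idx.start}..{test_idx.stop - 1}")
```

Every test fold lies strictly after its training rows, and the gap keeps the most recent training IPOs from sitting right next to the test period.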

Performance on the 2025 test set (108 IPOs), measured as backtest returns of the model-driven strategy:

  • Cumulative return: +16.4%
  • Daily average return: +0.20%
  • Win rate: 61.3%
  • Sharpe ratio: 0.43
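
The four metrics above can be computed from a series of per-period strategy returns. The text does not say whether the cumulative return is compounded or whether the Sharpe ratio is annualized, so this sketch assumes compounding, a zero risk-free rate, and no annualization; the sample returns are hypothetical.

```python
import math
import statistics
from typing import Sequence


def summarize(returns: Sequence[float]) -> dict[str, float]:
    """Summarize per-period returns given as fractions (0.01 means +1%)."""
    cumulative = math.prod(1.0 + r for r in returns) - 1.0  # assumes compounding
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation
    return {
        "cumulative_return": cumulative,
        "average_return": mean,
        "win_rate": sum(r > 0 for r in returns) / len(returns),
        "sharpe": mean / stdev,  # assumes zero risk-free rate, no annualization
    }


# Hypothetical per-day returns, for illustration only.
stats = summarize([0.02, -0.01, 0.03, 0.00, -0.02])
```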

Section 05

Decision Engine: From Probability to Fund Allocation Strategy

The model outputs the probability that an IPO's gain exceeds 5%. The decision rules are:

  1. Filtering: Only select IPOs with predicted probability exceeding the threshold t_min ≈ 0.41
  2. Allocation: Equal weight allocation for selected IPOs

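The share of an application expected to be filled for a single IPO is estimated as 1/max(1, NII subscription multiple), where NII is the non-institutional investor category and its multiple reflects the popularity of the new stock. For example, an issue oversubscribed 4 times in that category would be expected to fill about a quarter of the amount applied for, while an undersubscribed issue would fill in full.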
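
The filter, the equal-weight split, and the fill estimate fit together as in the sketch below. The threshold and the 1/max(1, NII multiple) formula come from the text; the class and function names, the sample IPOs, and the capital figure are hypothetical.

```python
from dataclasses import dataclass

T_MIN = 0.41  # probability threshold quoted in the text (approximate)


@dataclass
class Candidate:
    name: str
    probability: float   # model output: P(listing gain > 5%)
    nii_multiple: float  # NII subscription multiple at close


def expected_fill(nii_multiple: float) -> float:
    """Estimated share of an application that gets filled: 1 / max(1, NII multiple)."""
    return 1.0 / max(1.0, nii_multiple)


def allocate(candidates: list[Candidate], capital: float) -> dict[str, dict[str, float]]:
    """Filter by T_MIN, split capital equally, and estimate the amount actually filled."""
    selected = [c for c in candidates if c.probability > T_MIN]
    if not selected:
        return {}
    per_ipo = capital / len(selected)
    return {
        c.name: {"applied": per_ipo, "expected_filled": per_ipo * expected_fill(c.nii_multiple)}
        for c in selected
    }


# Hypothetical IPOs closing on the same day.
plan = allocate(
    [
        Candidate("Alpha", probability=0.72, nii_multiple=4.0),
        Candidate("Beta", probability=0.35, nii_multiple=0.8),
        Candidate("Gamma", probability=0.55, nii_multiple=0.5),
    ],
    capital=100_000.0,
)
```

Here Beta is filtered out, and Alpha and Gamma each receive half of the capital. Alpha's expected fill is a quarter of its application because of the 4x oversubscription, while Gamma is expected to fill in full.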

Full backtest results (2017-2025, 444 IPOs):

  • Cumulative return: +242.5%
  • Daily average return: +0.35%
  • Win rate: 62%
  • Sharpe ratio: 0.33

Section 06

Engineering Implementation: From Offline Model to Online Service

The project provides a complete engineering implementation:

FastAPI Inference Service

A deployment-ready REST API that accepts IPO feature data and returns the predicted probability together with a decision recommendation. The service includes input validation, error handling, and performance optimization.

Streamlit Interactive Dashboard

A visualization interface based on precomputed backtest results:

  • Historical backtest performance display
  • Single IPO prediction tool
  • Portfolio simulator
  • Feature importance analysis

The dashboard has been deployed to Streamlit Cloud and is publicly accessible to try out.

Section 07

Project Insights and Limitations

Insights:

  1. Power of problem reframing: Switching from regression to classification improved model performance
  2. Business constraints drive design: The finding that 85% of days have only one IPO simplified the allocation strategy
  3. End-to-end thinking: Covers data acquisition, model training, and service deployment
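
A figure like the 85% in the second insight can be obtained by counting IPOs per day. The sketch below assumes the relevant date is the subscription close date, which the text does not specify; the dates and function name are hypothetical.

```python
from collections import Counter
from datetime import date


def single_ipo_day_share(close_dates: list[date]) -> float:
    """Share of distinct days on which exactly one IPO appears."""
    per_day = Counter(close_dates)
    return sum(1 for n in per_day.values() if n == 1) / len(per_day)


# Hypothetical dates: four distinct days, one of which has two IPOs.
dates = [date(2024, 1, 8), date(2024, 1, 8), date(2024, 1, 15), date(2024, 2, 1), date(2024, 2, 20)]
share = single_ipo_day_share(dates)
```

When most days carry a single IPO, the allocation question reduces to subscribing or not, which is why an equal-weight rule is enough for the remaining multi-IPO days.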

Limitations:

  • The backtest assumes ideal execution and does not account for funds being frozen during subscription or for fluctuations in subscription rates
  • The model relies on historical data, so future changes in market conditions may degrade its performance
  • The model reflects characteristics specific to the Indian market, and its applicability to other markets has not been verified

Section 08

Summary: Project Value and Reference for Quantitative Investment

Nityunj Goel's IPO-ML decision system is a strong example of applied machine learning. It shows how to decompose a complex financial decision problem, test intuition against data, and turn research results into a usable product.

For practitioners applying machine learning to quantitative investment, the project is a useful reference. It offers not only a code implementation but also a full account of the problem analysis, design decisions, and trade-offs behind it.