Running Large Language Models Inside SAP S/4HANA: A Pure ABAP-Implemented LLM Inference Engine

abap-llm-engine is a project that runs large language models (LLMs) directly inside SAP S/4HANA systems. Written entirely in ABAP, it implements a complete Transformer inference engine with no external dependencies such as Python, llama.cpp, or ONNX, bringing native AI capabilities to traditional enterprise ERP systems.

Tags: SAP, ABAP, LLM, Transformer, Enterprise AI, S/4HANA, Local Inference, HANA Acceleration

Published 2026-04-04 06:09 | Recent activity 2026-04-04 06:18 | Estimated read 6 min

Section 01

Introduction: A Pure ABAP-Implemented LLM Inference Engine Running Large Language Models Inside SAP S/4HANA

abap-llm-engine is presented as the world's first LLM inference engine implemented entirely in ABAP. It runs large language models (e.g., SmolLM2-135M) directly inside SAP S/4HANA systems. It has no external dependencies such as Python, llama.cpp, or ONNX, and it uses HANA database acceleration to bring native AI to traditional enterprise ERP systems.

Section 02

Project Background: Pain Points of AI Integration in Traditional SAP Systems

Traditional SAP systems lack native AI capabilities. External LLM deployment solutions often rely on Python environments, third-party inference libraries, or external API calls, leading to issues like data security risks, high integration complexity, and incompatibility with air-gapped environments. The abap-llm-engine project aims to address these pain points by natively embedding AI capabilities inside SAP systems.

Section 03

Technical Architecture and Core Components

The project uses a modular class design. Its core components are:

ZCL_LLM_ENGINE: Inference process coordinator
ZCL_LLM_BPE_TOKENIZER: BPE tokenizer
ZCL_LLM_TENSOR: Tensor operation class
ZCL_LLM_TRANSFORMER_BLOCK: Transformer layer implementation (including RMS normalization, RoPE rotary positional encoding, grouped query attention, etc.)
ZCL_LLM_HANA_ACCEL: HANA acceleration module
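
To make the operations named for the Transformer layer more concrete, here is a minimal Python sketch of RMS normalization, rotary positional encoding (RoPE), and the head mapping behind grouped query attention. It is an illustration under stated assumptions, not the project's ABAP code: the function names are hypothetical, the RoPE variant rotates adjacent element pairs (implementations differ on pairing), the base of 10000 is the commonly used default, and the count of 3 key/value heads in the usage note is an assumed value that the article does not state.

```python
import math

def rms_norm(x, weight, eps=1e-5):
    """Scale x by the reciprocal of its root mean square, then by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def apply_rope(vec, position, base=10000.0):
    """Rotate each adjacent pair (vec[i], vec[i+1]) by a position-dependent angle.

    Assumption: interleaved pairing; some implementations pair element i
    with element i + dim/2 instead.
    """
    dim = len(vec)
    out = list(vec)
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out

def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """Grouped query attention: consecutive query heads share one key/value head."""
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size
```

With 9 query heads and an assumed 3 key/value heads, `kv_head_for(4, 9, 3)` maps query head 4 to key/value head 1, so each key/value head serves three query heads and the key/value cache shrinks accordingly.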

Model specifications: 135 million parameters, Llama architecture (30 layers, hidden size 576, 9 attention heads), approximately 250 MB of memory after INT8 quantization, and an 8192-token context window.
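
The figures above can be sanity-checked with a little arithmetic. The sketch below is a rough estimate, assuming one byte per parameter for INT8 and four bytes for FP32. The helper name `weight_mib` is hypothetical. It counts raw weights only, so it does not account for the full ~250 MB footprint quoted in the article, and the article does not break down the remainder.

```python
PARAMS = 135_000_000        # parameter count quoted in the article
HIDDEN_SIZE = 576           # hidden size quoted in the article
N_HEADS = 9                 # attention heads quoted in the article
BYTES_PER_MIB = 1024 * 1024

def weight_mib(n_params, bytes_per_param):
    """Raw weight storage in MiB, ignoring scales, activations, and caches."""
    return n_params * bytes_per_param / BYTES_PER_MIB

head_dim = HIDDEN_SIZE // N_HEADS      # per-head dimension: 576 / 9 = 64
int8_mib = weight_mib(PARAMS, 1)       # one byte per weight
fp32_mib = weight_mib(PARAMS, 4)       # four bytes per weight

print(f"head_dim={head_dim}, INT8={int8_mib:.0f} MiB, FP32={fp32_mib:.0f} MiB")
```

Under these assumptions the raw INT8 weights come to roughly 129 MiB, about a quarter of the FP32 size, which is why quantization matters for fitting the model into application server memory.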

Section 04

Operation Modes and Performance

The project offers three operation modes:

Pure ABAP Mode: Speed of 5-30 seconds per token, no special configuration needed, suitable for development testing or scenarios without HANA acceleration.
HANA AMDP Acceleration Mode: Speed of 0.5-3 seconds per token, offloads matrix operations to HANA parallel engine for significant speed improvement.
Shared Memory Mode: 30% faster than the basic mode; weight sharing reduces memory usage and enhances concurrency.
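
To put the per-token figures in perspective, the sketch below converts them into a rough time range for generating a response of a given length. It is simple arithmetic on the numbers quoted above. The dictionary keys and the function name are hypothetical, the estimate ignores prompt processing time, and Shared Memory Mode is left out because the article does not say which mode its 30% figure is measured against.

```python
# Seconds per token (fastest, slowest) as quoted in the article.
MODES = {
    "pure_abap": (5.0, 30.0),
    "hana_amdp": (0.5, 3.0),
}

def generation_time_range(mode, n_tokens):
    """Return (best_case_seconds, worst_case_seconds) for n_tokens output tokens."""
    fastest, slowest = MODES[mode]
    return fastest * n_tokens, slowest * n_tokens

for mode in MODES:
    best, worst = generation_time_range(mode, 50)
    print(f"{mode}: {best:.0f}-{worst:.0f} s for 50 tokens")
```

For a 50-token answer this gives 250 to 1500 seconds in Pure ABAP Mode and 25 to 150 seconds in HANA AMDP Acceleration Mode, which suggests that interactive use depends on the accelerated mode and short outputs.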

Section 05

Deployment and Integration Advantages

This project features zero external dependencies and native SAP integration:

No need for Python, third-party libraries, or external APIs
Direct access to SAP data dictionary (e.g., DD03L) to ensure accurate generated content
Deployable via SAP transport requests for convenient version management
Supports air-gapped environments with no need for external networks
Sub-second per-token latency achievable at the fast end of the HANA-accelerated range (0.5-3 seconds per token)

Section 06

Technical Challenges and Solutions

The project overcame several technical challenges:

ABAP Matrix Operations: Designed tensor operation classes to implement efficient matrix multiplication, activation functions, etc.
Memory Management: Kept the model's memory footprint to around 250 MB through INT8 quantization and shared memory mechanisms.
HANA Acceleration: Used AMDP to offload compute-intensive operations to HANA, which executes them in parallel, an effect the project likens to GPU acceleration.
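
As an illustration of how INT8 quantization and matrix operations fit together, here is a minimal Python sketch of a quantized matrix-vector product. The article only says that INT8 quantization is used, so the symmetric per-row scheme shown here is an assumption, and the function names are hypothetical rather than taken from the project.

```python
def quantize_row(row):
    """Symmetric per-row INT8 quantization: returns (integer values, scale)."""
    max_abs = max(abs(v) for v in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp to [-127, 127] so every value fits in a signed byte.
    q = [max(-127, min(127, round(v / scale))) for v in row]
    return q, scale

def quantized_matvec(q_rows, scales, x):
    """Compute y[i] = scale[i] * sum_j q[i][j] * x[j], rescaling once per row."""
    return [
        scale * sum(qv * xv for qv, xv in zip(q_row, x))
        for q_row, scale in zip(q_rows, scales)
    ]

# Usage: quantize a tiny weight matrix, then multiply it by an input vector.
weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
quantized = [quantize_row(row) for row in weights]
q_rows = [q for q, _ in quantized]
scales = [s for _, s in quantized]
y = quantized_matvec(q_rows, scales, [1.0, 2.0, 3.0])
```

The result approximates the full-precision product (-0.75 and -4.0 for this input) with a small rounding error. The inner sum of products is the compute-intensive part, which is the kind of work the article describes offloading to HANA.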

Section 07

Application Scenarios and Summary

Application Scenarios: Intelligent report generation, ABAP code assistant development, business process optimization, natural language data querying, automatic training document generation, etc.

Summary: abap-llm-engine represents an important direction for enterprise AI integration: embedding AI natively into existing business systems without refactoring their architecture. For enterprises running SAP, it gives access to current AI capabilities within the familiar ABAP environment and offers a feasible path toward enterprise AI transformation.

Project URL: https://github.com/cadiraca/abap-llm-engine
License: Apache 2.0
