Section 01
[Main Floor/Introduction] PALUTE: In-Memory Lookup Table-Based Edge LLM Inference Accelerator
PALUTE is an in-memory computing accelerator for edge large language model (LLM) inference. Its core innovation is the use of monolithic 3D DRAM (M3D DRAM) to perform lookup table (LUT) queries directly in memory. It reports a throughput of 1264 TPS at 0.16 W and 12.8x higher energy efficiency than existing solutions, which makes it a candidate for deploying LLMs on edge devices.
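To put the headline figures in per-token terms, the short sketch below derives energy per token from the reported throughput and power. It assumes that TPS means tokens per second and that both numbers were measured at the same operating point; neither assumption is confirmed by the summary above. The function names are illustrative only.

```python
def energy_per_token_joules(power_watts: float, tokens_per_second: float) -> float:
    """Energy spent per generated token, in joules (power / throughput)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return power_watts / tokens_per_second


def tokens_per_joule(tokens_per_second: float, power_watts: float) -> float:
    """Tokens generated per joule of energy (throughput / power)."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return tokens_per_second / power_watts


# Figures quoted in the summary; TPS is assumed to mean tokens per second.
POWER_W = 0.16
THROUGHPUT_TPS = 1264.0

if __name__ == "__main__":
    energy_uj = energy_per_token_joules(POWER_W, THROUGHPUT_TPS) * 1e6
    print(f"{energy_uj:.1f} microjoules per token")
    print(f"{tokens_per_joule(THROUGHPUT_TPS, POWER_W):.0f} tokens per joule")
```

Under these assumptions the result is roughly 127 microjoules per token, or about 7900 tokens per joule. The 12.8x efficiency claim cannot be checked this way, because the summary does not give the baseline's figures.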
Original authors: arXiv authors | Source: arXiv (2026-06-08) | Paper link: http://arxiv.org/abs/2606.08891v1