GAMBIT uses a six-layer architecture. The core functions of each layer are as follows:
1. Ingestion and Preprocessing
Monitor APK uploads, compute cryptographic hashes (SHA-256, MD5, SHA-1) and fuzzy hashes (ssdeep, TLSH), extract metadata, run a pre-check against VirusTotal, and build a case object. TLSH fuzzy hashing can identify near-identical samples (a similarity of ≥85% suggests a possible association with the same threat actor).
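The cryptographic hashing step above can be sketched with Python's standard library. This is a minimal illustration only: the function name `compute_hashes` is hypothetical, and the fuzzy hashes (ssdeep, TLSH) are left out because they need third-party bindings.

```python
import hashlib


def compute_hashes(data: bytes) -> dict:
    """Return hex digests for the three cryptographic hashes named in the text.

    `data` is the raw content of an uploaded APK. The function name and the
    dictionary keys are illustrative assumptions, not part of GAMBIT.
    """
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }
```

For large uploads, the same digests could be built incrementally by calling `update()` on each hash object per chunk instead of reading the whole file into memory.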
2. Structural Classification
Convert each APK into a 512×512 grayscale image and classify known families with a fine-tuned ResNet-50 CNN. After unpacking the DEX files, compute a SimHash over the Smali code and store it in a Neo4j graph database to link variants (similarity ≥85%).
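The SimHash comparison described above might look like the following sketch. It assumes the Smali code has already been tokenized into strings, uses a 64-bit fingerprint, and defines similarity as one minus the normalized Hamming distance. All three choices are assumptions made for illustration; the source states only the 85% threshold.

```python
import hashlib

BITS = 64  # fingerprint width (an assumption; the source does not specify it)


def simhash(tokens) -> int:
    """Compute a 64-bit SimHash fingerprint over an iterable of string tokens."""
    weights = [0] * BITS
    for token in tokens:
        # Derive a stable 64-bit hash for the token from the first 8 bytes of SHA-256.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def similarity(a: int, b: int) -> float:
    """Fraction of fingerprint bits on which the two fingerprints agree."""
    return 1.0 - bin(a ^ b).count("1") / BITS


def is_variant(a: int, b: int, threshold: float = 0.85) -> bool:
    """Apply the 85% threshold from the text to two fingerprints."""
    return similarity(a, b) >= threshold
```

Two samples whose token streams are identical get the same fingerprint and therefore a similarity of 1.0. Small edits to the Smali code flip only a few bits, which is what makes a threshold-based variant link workable.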
3. GenAI Reverse Engineering
Unpack and decompile with apktool, jadx, and Androguard, then analyze the control-flow graph (CFG) with a three-stage LLM prompt chain (code summarization → intent classification → narrative generation). This is combined with a banking-focused classification of permission combinations (e.g., READ_SMS + INTERNET flags OTP theft).
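The permission-combination check above can be expressed as a small rule table. The sketch below encodes only the one rule the text gives (READ_SMS + INTERNET); the names `PERMISSION_RULES` and `classify_permissions` are hypothetical, and a real rule set would be larger.

```python
# Each rule pairs a set of permissions with the behavior it flags.
# Only the READ_SMS + INTERNET rule comes from the text above.
PERMISSION_RULES = [
    (
        frozenset({"android.permission.READ_SMS", "android.permission.INTERNET"}),
        "OTP theft",
    ),
]


def classify_permissions(requested):
    """Return the labels of every rule whose permissions are all requested."""
    requested = set(requested)
    return [label for combo, label in PERMISSION_RULES if combo <= requested]


flags = classify_permissions([
    "android.permission.READ_SMS",
    "android.permission.INTERNET",
    "android.permission.CAMERA",
])
print(flags)  # ['OTP theft']
```

A subset test is used so that extra, unrelated permissions in the manifest do not prevent a rule from firing.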
4. Deep Analysis
Static analysis (manifest, API calls, string extraction) and dynamic analysis (emulator monitoring of network, file, and permission activity) run in parallel, and their feature vectors are merged.
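One way to picture the parallel run and merge is the sketch below. It assumes each analyzer is a callable that takes a sample and returns a dictionary of named features. The function `analyze_in_parallel`, the key prefixes, and the feature names in the usage example are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_in_parallel(sample, static_fn, dynamic_fn):
    """Run both analyzers concurrently and merge their features into one dict.

    Keys are prefixed with their origin so that a static and a dynamic
    feature with the same name cannot overwrite each other.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        static_future = pool.submit(static_fn, sample)
        dynamic_future = pool.submit(dynamic_fn, sample)
        static = static_future.result()
        dynamic = dynamic_future.result()
    merged = {f"static.{name}": value for name, value in static.items()}
    merged.update({f"dynamic.{name}": value for name, value in dynamic.items()})
    return merged


# Usage with stand-in analyzers (placeholders, not real analysis code).
features = analyze_in_parallel(
    "sample.apk",
    lambda s: {"api_calls": 3.0},
    lambda s: {"dns_queries": 2.0},
)
print(features)  # {'static.api_calls': 3.0, 'dynamic.dns_queries': 2.0}
```

Threads are enough here if the analyzers mostly wait on external tools or an emulator; CPU-bound analyzers would be better served by a process pool.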
5. Behavioral Attribution and RAG Enhancement
Attribute samples to threat actors by semantically matching them against historical data in a vector database, and map the results to the MITRE ATT&CK framework.
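The semantic matching step can be illustrated with a plain cosine-similarity lookup. This sketch stands in for a real vector database: the in-memory `history` dictionary, the actor labels, and the two-dimensional embeddings are made-up placeholders, and cosine similarity is an assumed metric that the source does not name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_actors(query, history, k=3):
    """Return up to k (score, actor) pairs, most similar first."""
    scored = [(cosine(query, vector), actor) for actor, vector in history.items()]
    return sorted(scored, reverse=True)[:k]


# Placeholder embeddings for two hypothetical historical actor profiles.
history = {"actor_a": [1.0, 0.0], "actor_b": [0.0, 1.0]}
best_score, best_actor = nearest_actors([1.0, 0.1], history, k=1)[0]
print(best_actor)  # actor_a
```

In a retrieval-augmented setup, the top matches would then be passed to the LLM as context for the attribution narrative.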
6. Risk Assessment and Report Generation
Compute risk scores using XGBoost/LightGBM, generate reports containing executive summaries, MITRE mappings, IoCs, etc.