Section 01
ConvexTok: A Convex Optimization Method for Tokenizer Construction
This article introduces ConvexTok, a method that constructs tokenizers with convex optimization rather than greedy algorithms. Whereas algorithms such as BPE and Unigram reach only locally optimal solutions, ConvexTok formulates tokenizer construction as a linear program. The resulting solution is provably close to the global optimum, and the method improves on multiple metrics.