Model Architecture
Byte Pair Encoding (BPE)
A common tokenisation method that splits text into frequently occurring character sequences.
Definition
Byte pair encoding (BPE) is one of the most widely used methods for tokenising text before feeding it to a language model. Training starts from a base vocabulary of individual characters (or raw bytes, in byte-level variants) and repeatedly merges the most frequent pair of adjacent symbols into a single new token, until the vocabulary reaches a chosen size. The resulting vocabulary represents common words as single tokens while breaking rare or unusual words into smaller subword pieces. Most GPT-style models use BPE or a variation of it.
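The merge procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production tokeniser: the function names (`merge_pair`, `train_bpe`, `tokenize`) are hypothetical, it works on characters rather than bytes, it treats each word independently, and it omits the pre-tokenisation rules and special tokens that real implementations add.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (illustrative only)."""
    # Each distinct word becomes a tuple of single-character symbols with a frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single token
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge rule to every word in the working vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def tokenize(word, merges):
    """Split `word` into tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

In this sketch, training on a toy corpus where "ab" is frequent yields a merge rule for that pair first, so a later call such as `tokenize("abx", merges)` keeps "ab" as one token and leaves the unseen character "x" on its own. The number of merges is the main design choice: more merges give a larger vocabulary and shorter token sequences.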
Disclaimer
This definition is provided for educational and informational purposes only. It represents a general explanation of a technical concept and does not constitute professional, technical, or investment advice. Artificial intelligence is a rapidly evolving field; terminology, techniques, and capabilities change frequently. Coaley Peak Ltd makes no warranty as to the accuracy, completeness, or currency of the information provided. Nothing on this page should be relied upon as the sole basis for commercial, technical, legal, or investment decisions without independent professional advice.
Document reference: ISO_webpage_knowledge-base_glossary_v1
Last modified: 29 March 2026