Corpus
The full collection of text used to train an AI model.
Definition
A corpus (plural: corpora) is the dataset an AI model is trained on. For a large language model, the corpus might include large portions of the public web, digitised books, academic papers, code repositories, and curated high-quality text. The corpus largely determines what the model knows, which languages it can handle, and which biases it may carry. Curating a good corpus is one of the most important and least visible parts of building a capable AI.
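To make the idea concrete, here is a minimal Python sketch of a toy corpus. It is an illustration only: the three sample documents, the `curate` and `tokenize` helpers, and the whitespace tokenizer are all assumptions made for this example and do not describe how any particular model's corpus is built.

```python
from collections import Counter

# Hypothetical toy corpus: three short documents standing in for web pages,
# books, or code files. Real corpora are many orders of magnitude larger.
documents = [
    "The cat sat on the mat.",
    "The cat sat on the mat.",    # exact duplicate, dropped during curation
    "Le chat dort sur le tapis.",  # a second language in the same corpus
]

def curate(docs):
    """Drop exact duplicates while keeping the original order."""
    seen = set()
    kept = []
    for doc in docs:
        if doc not in seen:
            seen.add(doc)
            kept.append(doc)
    return kept

def tokenize(doc):
    """Naive whitespace tokenizer (an assumption); real systems use subword tokenizers."""
    return [word.strip(".,").lower() for word in doc.split()]

corpus = curate(documents)
token_counts = Counter(tok for doc in corpus for tok in tokenize(doc))

num_documents = len(corpus)               # documents left after curation
num_tokens = sum(token_counts.values())   # total tokens in the corpus
vocabulary_size = len(token_counts)       # distinct tokens in the corpus

print(num_documents, num_tokens, vocabulary_size)
```

Even at this scale, the curation step changes what a model would see: the duplicate is removed, and the French sentence is the only reason French words appear in the vocabulary at all.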
Disclaimer
This definition is provided for educational and informational purposes only. It represents a general explanation of a technical concept and does not constitute professional, technical, or investment advice. Artificial intelligence is a rapidly evolving field; terminology, techniques, and capabilities change frequently. Coaley Peak Ltd makes no warranty as to the accuracy, completeness, or currency of the information provided. Nothing on this page should be relied upon as the sole basis for commercial, technical, legal, or investment decisions without independent professional advice.
Document reference: ISO_webpage_knowledge-base_glossary_v1
Last modified: 29 March 2026