Review:

Lancaster-Oslo/Bergen (LOB) Corpus

Overall review score: 4.2 (on a scale from 0 to 5)
The Lancaster-Oslo/Bergen (LOB) Corpus is a balanced collection of written British English, compiled for corpus-linguistic research as the British counterpart to the Brown Corpus of American English. It contains approximately 1 million words in 500 samples of roughly 2,000 words each, drawn from 15 text categories spanning press writing, general and academic prose, and fiction. All texts were published in 1961, so the corpus is a representative sample of British English from that year rather than of present-day usage.

Key Features

  • Approximate size of 1 million words
  • Balanced across different text genres and registers
  • Provides rich context for lexical and grammatical analysis
  • Samples are identified by text category, and a part-of-speech-tagged version is available
  • Designed primarily for linguistic research and language learning applications

Pros

  • Comprehensive and well-balanced corpus suitable for diverse linguistic studies
  • Includes detailed metadata facilitating nuanced analysis
  • Widely used and cited in academic research, so its composition and limitations are well documented
  • Accessible for both researchers and students interested in British English

Cons

  • Limited to written language; lacks spoken or multimedia content
  • Size may be insufficient for training large-scale machine learning models compared to newer corpora
  • Contains texts from a single year (1961), so it does not reflect present-day usage
  • Requires some linguistic expertise or tools for effective analysis

Last updated: Thu, May 7, 2026, 10:35:40 AM UTC