Controlled Vocabulary

A controlled vocabulary is a curated, managed set of terms used consistently across a system. Libraries and information systems have used them for decades because the alternative, letting each author invent terms, produces archives that are inconsistent and hard to search. Without one, PKM vaults tend to drift into an ungoverned vocabulary as they grow. A controlled vocabulary is one of the simplest and most neglected fixes.

What It Is

The core idea is simple: a controlled vocabulary defines the allowed terms for a given slot, documents what each term means, specifies preferred terms when alternatives exist, and manages changes over time. A library catalog uses controlled vocabularies for subjects, authors, and locations. A website's CMS uses them for content types, categories, and tags. A PKM system that wants reliable queries and consistent navigation needs them for tags and types.

A controlled vocabulary typically includes:

  • Preferred terms — the canonical form that should be used
  • Variants and synonyms — terms that mean the same thing but should point to the preferred form
  • Scope notes — explicit definitions of what the term covers and what it excludes
  • Relationships — broader, narrower, and related terms in a light hierarchy
  • Change history — how the vocabulary has evolved
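The components above map naturally onto a small record type. The sketch below is a minimal Python illustration. It assumes tags are plain strings and that variants are matched case-insensitively. The names Term, Vocabulary, and resolve are hypothetical and are not taken from Z39.19 or any PKM tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One vocabulary entry; field names are illustrative, not from a standard."""
    preferred: str                                    # canonical form to use
    variants: set[str] = field(default_factory=set)   # synonyms pointing to `preferred`
    scope_note: str = ""                              # what the term covers and excludes
    broader: set[str] = field(default_factory=set)    # parent terms
    narrower: set[str] = field(default_factory=set)   # child terms
    related: set[str] = field(default_factory=set)    # associative links
    history: list[str] = field(default_factory=list)  # dated change-log entries


class Vocabulary:
    def __init__(self, terms: list[Term]) -> None:
        self.terms = {t.preferred: t for t in terms}
        # Index preferred forms and variants (case-folded) back to the preferred form.
        self._lookup: dict[str, str] = {}
        for t in terms:
            self._lookup[t.preferred.casefold()] = t.preferred
            for v in t.variants:
                self._lookup[v.casefold()] = t.preferred

    def resolve(self, tag: str) -> str | None:
        """Return the preferred form of a tag or variant, or None if unknown."""
        return self._lookup.get(tag.casefold())


vocab = Vocabulary([
    Term(
        "domain/pkm",
        variants={"pkm", "personal-knowledge-management"},
        scope_note="Personal knowledge management methods and tools.",
    ),
])
```

Here vocab.resolve("PKM") and vocab.resolve("personal-knowledge-management") both return "domain/pkm". An unknown tag returns None and can be flagged for review.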

The PKM Tag Problem

Without a controlled vocabulary, tags proliferate and drift. A vault accumulates #pkm, #PKM, #personal-knowledge-management, #knowledge-management, #km, #second-brain, and #note-taking, all partially overlapping, all used inconsistently. Searches miss content that is semantically present because it was tagged with a variant. Queries return noise because tags cover different scopes than the query assumed. Over time, the tag set becomes a liability rather than an asset.

The fix is to treat tags as a controlled vocabulary: curated, documented, and governed. This is tedious work, but for most mature vaults it is one of the highest-leverage investments available.
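The following is a minimal sketch of the payoff. It assumes a hypothetical set of notes and a hand-written variant table. An exact-match query for pkm finds only the note tagged exactly that way. A query that first maps every tag to its preferred form also finds the notes tagged PKM and personal-knowledge-management.

```python
from __future__ import annotations

# Hypothetical notes: title -> tags exactly as they were typed.
NOTES: dict[str, set[str]] = {
    "Zettelkasten basics": {"pkm"},
    "Weekly review": {"PKM", "habits"},
    "Building a second brain": {"personal-knowledge-management"},
}

# Hypothetical variant table: each known variant maps to its preferred form.
PREFERRED = {
    "pkm": "pkm",
    "personal-knowledge-management": "pkm",
}


def canonical(tag: str) -> str:
    """Case-fold a tag and map it to its preferred form; unknown tags pass through."""
    folded = tag.casefold()
    return PREFERRED.get(folded, folded)


def naive_query(tag: str) -> set[str]:
    """Exact string match, as in an ungoverned tag set."""
    return {title for title, tags in NOTES.items() if tag in tags}


def governed_query(tag: str) -> set[str]:
    """Compare preferred forms, so notes tagged with any variant are found."""
    target = canonical(tag)
    return {title for title, tags in NOTES.items()
            if target in {canonical(t) for t in tags}}
```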

Controlled Vocabulary vs Folksonomy

The free-tagging approach — everyone invents tags as needed, the system tolerates any tag — is called a folksonomy. It is fast, low-friction, and captures emergent terminology. It is also what produces the mess described above. The trade-off is:

  • Folksonomy: low friction, high recall for novel concepts, high noise, unreliable query behavior
  • Controlled vocabulary: higher friction, lower novelty capture, low noise, reliable query behavior

Most serious PKM practices end up somewhere in the middle: a controlled core vocabulary for frequently-used tags and types, with an explicit "capture zone" (like a WIP folder or an #untriaged tag) where free-form capture is allowed and periodically migrated into the controlled vocabulary.
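One way to implement the capture zone is sketched below. It assumes that "untriaged" is the marker tag and that the controlled core is a simple set; both are hypothetical. Unknown tags are set aside rather than rejected, and the note is marked so that a later migration pass can find it.

```python
from __future__ import annotations

# Hypothetical controlled core and capture-zone marker.
CONTROLLED = {"domain/pkm", "domain/ai", "status/wip", "status/ready"}
CAPTURE_TAG = "untriaged"


def triage(tags: set[str], controlled: set[str] = CONTROLLED) -> tuple[set[str], set[str]]:
    """Split a note's tags into (tags to keep, free-form captures to review later).

    Free-form tags are not rejected; the note is marked with CAPTURE_TAG so a
    periodic migration pass can find it and fold the captures into the vocabulary.
    """
    allowed = controlled | {CAPTURE_TAG}
    captured = tags - allowed
    kept = tags & allowed
    if captured:
        kept.add(CAPTURE_TAG)
    return kept, captured
```

For example, triage({"domain/pkm", "second-brain"}) keeps domain/pkm, adds untriaged, and returns second-brain as a capture to review.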

Structure of a PKM Vocabulary

A practical PKM vocabulary is typically split into several namespaces:

  • Domain tags — what the note is about topically (domain/ai, domain/pkm, domain/writing)
  • Type tags — what kind of note it is, if not handled by note type (type/idea, type/literature)
  • Status tags — where in a workflow the note sits (status/wip, status/ready, status/archived)
  • Zone tags — the top-level organizational zone (zone/meta, zone/actions)
  • Attribute tags — intrinsic properties (rating/5, public, private)

Namespacing (the / separator) makes the vocabulary both human-navigable and machine-queryable. Each namespace should have documented scope and an enumerated value list.
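The sketch below validates namespaced tags. It assumes each namespace has an enumerated value list, as described above. The value lists reuse this section's examples. Two details are assumptions: the 1 to 5 rating scale, and treating public and private as attribute tags without a namespace.

```python
from __future__ import annotations

# Hypothetical enumerated values per namespace, taken from the examples above.
NAMESPACES: dict[str, set[str]] = {
    "domain": {"ai", "pkm", "writing"},
    "type": {"idea", "literature"},
    "status": {"wip", "ready", "archived"},
    "zone": {"meta", "actions"},
    "rating": {"1", "2", "3", "4", "5"},  # assumed 1-5 scale
}
# Attribute tags that are used without a namespace prefix.
FLAT_TAGS = {"public", "private"}


def check_tag(tag: str) -> str | None:
    """Return None if the tag is in the vocabulary, else a description of the problem."""
    if "/" not in tag:
        return None if tag in FLAT_TAGS else f"unknown flat tag: {tag}"
    namespace, _, value = tag.partition("/")
    if namespace not in NAMESPACES:
        return f"unknown namespace: {namespace}"
    if value not in NAMESPACES[namespace]:
        return f"value {value!r} not enumerated for {namespace}/"
    return None
```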

Managing Change

Vocabularies evolve. New concepts enter; old ones fall out of use. Managing change is where most PKM vocabularies fail. Three practices help.

Additions go through a check: before adding a tag, search for near-synonyms. Often the "new" concept is already covered by an existing term. This is the single most effective discipline.
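The pre-addition check can be sketched with difflib.get_close_matches from Python's standard library. String similarity has a limitation: it catches spelling and formatting variants (domain/writng vs domain/writing), but not true synonyms such as second-brain vs pkm. Those still need a human or a synonym list. The tag list and the 0.75 cutoff are illustrative assumptions.

```python
import difflib

# Hypothetical existing vocabulary.
EXISTING = ["domain/pkm", "domain/ai", "domain/writing", "status/wip", "status/ready"]


def similar_existing(candidate: str, existing: list = EXISTING,
                     cutoff: float = 0.75) -> list:
    """Return existing tags that look like `candidate`, best match first.

    An empty list means nothing similar was found; anything else should be
    reviewed before the candidate is added as a new term.
    """
    folded = candidate.casefold()
    exact = [t for t in existing if t.casefold() == folded]
    if exact:
        return exact
    return difflib.get_close_matches(folded, existing, n=3, cutoff=cutoff)
```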

Deprecations are recorded: when a tag is retired or renamed, the vocabulary lists the old form and its replacement. Old notes can be migrated in batches; the deprecation record prevents the old tag from being reinvented later.
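A deprecation record can be as simple as a mapping from each old form to its replacement. The sketch below assumes a hypothetical deprecation table in which None marks a tag retired with no replacement. The same table drives both batch migration and the check that stops an old tag from being reinvented.

```python
from __future__ import annotations

# Hypothetical deprecation log: old tag -> replacement (None = retired outright).
DEPRECATED: dict[str, str | None] = {
    "km": "domain/pkm",
    "second-brain": "domain/pkm",
    "todo": "status/wip",
    "misc": None,
}


def migrate_tags(tags: list[str], deprecated: dict[str, str | None] = DEPRECATED) -> list[str]:
    """Rewrite deprecated tags to their replacements and drop retired ones.

    Order is preserved, and duplicates created by merging are removed.
    """
    out: list[str] = []
    for tag in tags:
        new = deprecated.get(tag, tag)  # tags not in the log pass through unchanged
        if new is not None and new not in out:
            out.append(new)
    return out


def is_reinvention(tag: str, deprecated: dict[str, str | None] = DEPRECATED) -> bool:
    """True if a proposed tag was already retired or renamed."""
    return tag in deprecated
```

For example, migrate_tags(["km", "second-brain", "misc", "domain/ai"]) returns ["domain/pkm", "domain/ai"]. Both old forms merge into one preferred term, and the retired tag is dropped.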

Audits are periodic: quarterly or annually, review the tag set. Merge near-duplicates, retire unused tags, and update scope notes to reflect actual usage. This is the vocabulary equivalent of refactoring code.
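An audit pass can be partly automated. This sketch assumes notes are available as a mapping from note title to tag set. It reports:

  • Usage counts for each tag
  • Vocabulary terms that nobody uses
  • Tags in use that are not in the vocabulary
  • Pairs of in-use tags similar enough to be merge candidates

The 0.8 similarity threshold is an arbitrary assumption. The pairwise step compares every pair of distinct tags, so its cost grows with the square of the tag count.

```python
from __future__ import annotations

from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def audit(notes: dict[str, set[str]], vocabulary: set[str],
          threshold: float = 0.8) -> dict[str, object]:
    """Summarize tag usage for a periodic vocabulary review."""
    usage = Counter(tag for tags in notes.values() for tag in tags)
    in_use = set(usage)
    # Compare every pair of distinct in-use tags; similar pairs are merge candidates.
    near_dupes = [
        (a, b) for a, b in combinations(sorted(in_use), 2)
        if SequenceMatcher(None, a.casefold(), b.casefold()).ratio() >= threshold
    ]
    return {
        "usage": usage,                            # how often each tag appears
        "unused": sorted(vocabulary - in_use),     # candidates for retirement
        "off_vocab": sorted(in_use - vocabulary),  # tags that bypassed the vocabulary
        "near_dupes": near_dupes,                  # candidates for merging
    }
```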

See "Tagging and Metadata" and "PKM Anti-Patterns" for related practices.

Vocabulary as Schema

A controlled vocabulary plays the role of a schema for tags and their values: it makes implicit conventions explicit. Once written down, it becomes a reference for:

  • New note authors (including future-you) deciding how to tag
  • Automated tools validating compliance
  • LLM agents understanding the vault's structure
  • External observers understanding what the vault covers

Without a written vocabulary, this knowledge lives only in the owner's head and drifts as quickly as memory does.
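One concrete way to write the vocabulary down so that tools and LLM agents can read it is to export it as a JSON Schema for note frontmatter. The sketch assumes tags live in a tags list in frontmatter, and the vocabulary contents are hypothetical. It uses the JSON Schema keywords oneOf, const, and description, so each allowed value carries its scope note.

```python
from __future__ import annotations

import json

# Hypothetical vocabulary: preferred term -> scope note.
VOCABULARY = {
    "domain/pkm": "Personal knowledge management methods and tools.",
    "domain/ai": "Artificial intelligence and machine learning.",
    "status/wip": "Draft; not yet reviewed.",
    "status/ready": "Reviewed and ready to link.",
}


def tags_schema(vocabulary: dict[str, str]) -> dict:
    """Build a JSON Schema that limits a frontmatter `tags` list to vocabulary terms.

    Each allowed value carries its scope note as a `description`, so the same
    document serves validators as well as human or LLM readers.
    """
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {
            "tags": {
                "type": "array",
                "uniqueItems": True,
                "items": {
                    "oneOf": [
                        {"const": term, "description": note}
                        for term, note in sorted(vocabulary.items())
                    ]
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(tags_schema(VOCABULARY), indent=2))
```

The same file can then be handed to a JSON Schema validator in a pre-commit hook, or pasted into an agent's context as a description of the vault's tags.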

Tooling Support

Several PKM-tool features support controlled vocabularies.

  • Tag autocomplete surfaces existing tags, discouraging invention
  • Tag indexes or pages document scope and relationships
  • Linter rules catch non-vocabulary tags and flag them for review
  • Renaming tools migrate old forms to new ones safely
  • Query validation catches tags that no note uses (possibly misspelled or invented)

The tooling does not enforce vocabulary by itself, but it makes vocabulary discipline much easier when the practitioner commits to it.

The Payoff

A vault with a controlled vocabulary supports reliable querying, cleaner navigation, better LLM reasoning, and easier migration when the system changes. The cost is a small amount of upfront design and periodic maintenance. The payoff compounds over years. An ungoverned tag set gets disproportionately harder to fix as it grows: roughly quadratically, since every new tag is another possible near-duplicate of every existing one. A controlled vocabulary stays manageable as the vault grows.
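One way to read the claim that cleanup gets quadratically harder is to look at near-duplicate detection. Finding near-duplicates means comparing tags pairwise, and the number of pairs among n distinct tags is:

```latex
P(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad P(50) = 1225, \qquad P(200) = 19\,900
```

Quadrupling the tag set from 50 to 200 multiplies the candidate pairs by roughly 16. This counts comparisons only. It illustrates the growth rate and is not a measured cost.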

Key Points

  • Controlled vocabulary: curated, documented, governed set of terms used consistently across a system
  • Without it, tags proliferate and drift into noise — #pkm / #PKM / #personal-knowledge-management all meaning the same thing
  • Components: preferred terms, synonyms/variants, scope notes, relationships, change history
  • Folksonomy (free tags) is faster but noisier; most mature vaults compromise with a controlled core plus a capture zone for new terms
  • Practical PKM vocabularies use namespaces: domain, type, status, zone, attribute
  • Change management matters: check before adding, record deprecations, audit periodically
  • Tooling (autocomplete, indexes, linters) makes vocabulary discipline easier but does not replace commitment
  • Payoff compounds: controlled vocabularies stay manageable as the vault grows; ungoverned ones become disproportionately (roughly quadratically) harder to fix

Open Questions

  • What is the right cadence and depth for vocabulary audits?
  • Can LLM agents help maintain vocabularies — detecting near-synonyms, proposing consolidations, migrating old tags?
  • How do you balance vocabulary stability (useful queries) with vocabulary evolution (emergent concepts)?

References

  • ANSI/NISO Z39.19 — "Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies"
  • Library of Congress Subject Headings (LCSH), one of the largest and most widely used controlled vocabularies
  • Hedden, H. (2016). The Accidental Taxonomist
  • Schema.org — structured vocabulary for web entities