Vaults & knowledge sources

A vault is a content source the runtime can read and search. Hive distinguishes two flavors:

  • Local vaults — a folder on this machine, possibly an Obsidian library. Each peer configures their own; not shared automatically.
  • Workspace-shared vaults — a pointer to an upstream source (GitHub today; more later). Every peer in the workspace sees the same source list and fetches independently with their own credentials.

Local sources

[[vaults]]
id = "alice-notes"
name = "My research notes"
kind = "folder"             # or "obsidian"
location = "/Users/alice/Notes"
index = true

Or via Settings → Vaults → Add → "Folder (local)" / "Obsidian (local)". The path is local to this peer; Bob doesn't see Alice's notes by joining the workspace.

Workspace-shared sources

Configure once; every peer in the workspace gets the source automatically via the event log.

In Settings → Vaults → Add → "GitHub (shared)":

  • Repo: owner/name slug.
  • Ref: branch, tag, or commit SHA. Defaults to main.
  • Paths: comma-separated globs to narrow the fetch (empty = whole repo).
  • PAT env var: optional. The name of the environment variable that holds a GitHub personal access token (PAT). Public repos work without it; private repos require it.
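As a rough illustration of how the Paths field could narrow a fetch, here is a minimal Python sketch. The helper names (`parse_paths`, `select_files`) are hypothetical, and the glob semantics are an assumption: this uses Python's `fnmatch`, where `*` also matches `/`, which may differ from what Hive actually implements.

```python
from fnmatch import fnmatchcase
from typing import List


def parse_paths(field: str) -> List[str]:
    """Split the comma-separated Paths field into trimmed glob patterns."""
    return [part.strip() for part in field.split(",") if part.strip()]


def select_files(tree: List[str], paths_field: str) -> List[str]:
    """Keep the files matching any glob; an empty field keeps the whole repo."""
    patterns = parse_paths(paths_field)
    if not patterns:
        return list(tree)
    return [f for f in tree if any(fnmatchcase(f, p) for p in patterns)]


# Hypothetical repo tree, for illustration only.
tree = ["README.md", "docs/intro.md", "docs/api/auth.md", "src/main.py"]
print(select_files(tree, "docs/*, README.md"))
print(select_files(tree, ""))
```

With these assumed semantics, `docs/*` also picks up nested files such as `docs/api/auth.md`; a stricter matcher would need an explicit `**` convention.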

Each peer's Hive fetches independently using its own env-var PAT. Alice sets HIVE_GH_TOKEN to her PAT scoped to private repos; Bob has his own PAT scoped to whatever he can access. Neither token leaves the peer that owns it.

The fetched content lands at ~/Library/Caches/Hive/vaults/<vaultID>/ on each peer.

For comparison, the two local kinds:

  • Folder: local-only. Each peer configures their own path; not shared.
  • Obsidian: same as Folder, scoped to an Obsidian library directory.

How shared sources sync

sequenceDiagram
    autonumber
    participant A as Alice (owner)
    participant LogA as Workspace event log
    participant B as Bob
    participant GH as GitHub
    A->>LogA: vaultSourceAdded { id, source: github(...) }
    LogA->>B: envelope arrives via P2P
    Note over B: Bob's session.vaults now contains the new entry
    B->>GH: fetch with Bob's PAT
    GH-->>B: tree + files
    Note over B: cache at ~/Library/Caches/Hive/vaults/<id>/

Only owners and admins emit vaultSourceAdded / vaultSourceRemoved events — same authz threshold as MCP server catalog changes. Contributors and viewers see the resulting source list but can't mutate it.
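The role gate can be pictured with a small Python sketch. It assumes a peer checks the author's role before applying a vault-source event. The `Role` enum, the `apply_event` helper, and the event payload fields are hypothetical stand-ins, not Hive's actual types.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"
    ADMIN = "admin"
    CONTRIBUTOR = "contributor"
    VIEWER = "viewer"


# Roles allowed to emit vaultSourceAdded / vaultSourceRemoved.
SOURCE_EDITORS = {Role.OWNER, Role.ADMIN}


def apply_event(sources: dict, event: dict, author_role: Role) -> bool:
    """Apply a vault-source event to `sources`; return False if it is rejected."""
    if author_role not in SOURCE_EDITORS:
        return False
    if event["type"] == "vaultSourceAdded":
        sources[event["id"]] = event["source"]
    elif event["type"] == "vaultSourceRemoved":
        sources.pop(event["id"], None)
    else:
        return False  # not a vault-source event
    return True


sources = {}
added = {
    "type": "vaultSourceAdded",
    "id": "team-docs",  # hypothetical vault ID
    "source": {"tag": "github", "repo": "owner/name"},
}
print(apply_event(sources, added, Role.CONTRIBUTOR), sources)
print(apply_event(sources, added, Role.ADMIN), sources)
```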

Authentication

Auth is per-peer. Hive never transmits tokens between peers. The workspace event records only the env-var name (not the value); each peer's runtime reads its own env at fetch time.

If a peer's env doesn't have the named token set:

  • Public content fetches succeed anonymously.
  • Private content fetches return 403; the file is missing from that peer's cache.
  • The Files pane surfaces "auth needed for these N files" rather than appearing silently empty.
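A minimal Python sketch of the per-peer lookup described above: the shared record carries only the variable name, and each peer resolves it against its own environment at fetch time. The function names are hypothetical, and the `Authorization: Bearer` header follows GitHub's REST API convention rather than anything confirmed about Hive's fetcher.

```python
import os
from typing import Dict, Mapping, Optional


def auth_headers(pat_env_var: Optional[str],
                 env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build request headers from this peer's own environment.

    Returns an empty dict (anonymous fetch) when no variable is named
    or the named variable is unset or empty.
    """
    token = env.get(pat_env_var, "") if pat_env_var else ""
    if not token:
        return {}
    return {"Authorization": f"Bearer {token}"}


def files_pane_notice(statuses: Mapping[str, int]) -> str:
    """Summarize fetch results, given a mapping of file path to HTTP status."""
    denied = [path for path, status in statuses.items() if status == 403]
    if denied:
        return f"auth needed for these {len(denied)} files"
    return "all files fetched"


# Bob has no token set, so his request goes out anonymously.
print(auth_headers("HIVE_GH_TOKEN", env={}))
print(files_pane_notice({"docs/a.md": 200, "private/b.md": 403}))
```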

Caching & refresh

  • Cache location: ~/Library/Caches/Hive/vaults/<vaultID>/.
  • Refresh: on-demand. The cache warms automatically the first time a workspace is opened in a session, and you can re-fetch any vault from its card in Settings → Vaults. Periodic polling and webhook-driven refresh are on the roadmap.
  • Eviction: never automatic; clear the cache manually if you want to force a re-fetch:
rm -rf ~/Library/Caches/Hive/vaults/<vaultID>/

Indexing

Each peer indexes their own cached content for retrieval (semantic search, keyword match). The index lives at ~/Library/Caches/Hive/vault-indices/<vaultID>/ and is rebuilt when the cache changes.
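One way to decide that "the cache changed" is to fingerprint the cached files and compare against the fingerprint recorded at the last index build. The Python sketch below shows that idea only; how Hive actually detects changes is not stated here, and `cache_fingerprint` and `needs_reindex` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def cache_fingerprint(cache_dir: Path) -> str:
    """Hash every file's relative path and contents, in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in cache_dir.rglob("*") if p.is_file()):
        digest.update(path.relative_to(cache_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_reindex(cache_dir: Path, last_fingerprint: Optional[str]) -> bool:
    """True when the cache differs from what the index was built from."""
    return cache_fingerprint(cache_dir) != last_fingerprint


# Demo against a throwaway directory standing in for a vault cache.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    (cache / "notes.md").write_text("first draft")
    built_from = cache_fingerprint(cache)
    print(needs_reindex(cache, built_from))   # unchanged cache
    (cache / "notes.md").write_text("second draft")
    print(needs_reindex(cache, built_from))   # content changed
```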

Why per-peer indexing instead of shared:

  • Embeddings are tied to a specific model; if Alice uses text-embedding-3-large and Bob uses bge-large, their indices aren't interoperable.
  • The index is typically larger than the content it covers, so syncing it would move more bytes than it saves.
  • Computing the index uses your own runtime quota.

The trade-off: indexing duplicates compute. We think that's the right call given the heterogeneity of LLM stacks.

Future source kinds

The VaultSource enum is designed to grow. Likely additions:

  • GitLab — same shape as GitHub, different API.
  • Notion — pages + databases.
  • Google Drive — folders with OAuth.
  • HTTPS — a single URL with optional bearer token.
  • S3 / R2 — bucket + prefix.

If you need a source that isn't supported yet:

  1. Write a small fetcher conforming to VaultFetcher in hive-runtime.
  2. Add the case to VaultSource with its own associated config struct.
  3. The Settings UI gains a new source-tag option automatically when you extend VaultSourceTag.

Or: route the source through an MCP server that exposes vault-style tools (read_vault_file, search_vault). MCP works today without adding a new source kind.
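To make the extension steps more concrete, here is a language-neutral sketch in Python of the general pattern: a fetcher interface, a per-source config struct, and a tag-to-fetcher lookup. The real `VaultFetcher` signature in hive-runtime is not shown in this document, so everything below (the `fetch` method, `InMemorySource`, the `FETCHERS` registry, the `memory` tag) is a hypothetical stand-in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class VaultFetcher(Protocol):
    """Assumed shape: a fetcher returns file contents keyed by relative path."""

    def fetch(self) -> Dict[str, bytes]: ...


@dataclass
class InMemorySource:
    """Hypothetical config struct for a toy source kind."""
    files: Dict[str, bytes]


class InMemoryFetcher:
    def __init__(self, config: InMemorySource) -> None:
        self.config = config

    def fetch(self) -> Dict[str, bytes]:
        return dict(self.config.files)


# Tag -> fetcher factory; adding a source kind means adding one entry here.
FETCHERS: Dict[str, Callable[..., VaultFetcher]] = {"memory": InMemoryFetcher}


def fetcher_for(tag: str, config: object) -> VaultFetcher:
    """Look up the fetcher for a source tag, failing loudly on unknown tags."""
    try:
        factory = FETCHERS[tag]
    except KeyError:
        raise ValueError(f"unsupported source tag: {tag}") from None
    return factory(config)


fetcher = fetcher_for("memory", InMemorySource({"README.md": b"hello"}))
print(fetcher.fetch())
```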

On-disk format

The persisted vault entry uses a discriminated source field:

{
  "id": "alice-notes",
  "name": "My research notes",
  "source": { "tag": "folder", "path": "/Users/alice/Notes" },
  "indexed": true
}

A hand-written kind + path shape from an older config also decodes transparently — useful when migrating an existing hive.config.toml by hand. The encoder always writes the discriminated form, so any save normalizes the entry.
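The normalization described above can be sketched in a few lines of Python. This is an illustration of the idea, not Hive's decoder: `decode_vault` is a hypothetical name, and the handling of keys other than `kind` and `path` (passed through unchanged) is an assumption.

```python
import json


def decode_vault(entry: dict) -> dict:
    """Return a vault entry in the discriminated `source` form."""
    if "source" in entry:
        return dict(entry)  # already in the current shape
    if "kind" in entry and "path" in entry:
        rest = {k: v for k, v in entry.items() if k not in ("kind", "path")}
        return {**rest, "source": {"tag": entry["kind"], "path": entry["path"]}}
    raise ValueError("vault entry has neither 'source' nor 'kind' + 'path'")


# Older hand-written shape, as it might appear after parsing a config file.
legacy = {
    "id": "alice-notes",
    "name": "My research notes",
    "kind": "folder",
    "path": "/Users/alice/Notes",
    "indexed": True,
}
print(json.dumps(decode_vault(legacy), indent=2))
```

Decoding an entry that is already in the discriminated form returns it unchanged, so running the function over a whole config is safe to repeat.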