
Overview

crate API (v2 — cluster-first)

crate is the aggregation gateway for a fleet of music-data producers. You ask it about an artist, a label, or a festival; it joins every signal the fleet can see about that entity and hands you back one composed picture — collector behavior, editorial coverage, live-circuit demand, breakout momentum, web presence — each piece labelled with where it came from and how fresh it is.

Two things make crate different from a typical catalogue API, and the rest of this doc exists to teach them:

  1. Identity is a cluster_id, not a Discogs/MusicBrainz/Bandcamp id. The same artist scattered across those platforms collapses to one canonical key. You resolve to it once, then key everything off it.
  2. An empty answer is a first-class answer. crate returns HTTP 200 and tells you plainly what it can’t see (present:false, a null field, state:"honest_gap") instead of 404-ing or fabricating data. Only 4xx/5xx are errors.

v2 is cluster-first: the agent (artist / label / festival) is the prime resource, and sources (Bandcamp, Discogs) and releases attach to it as dimensions rather than being top-level nouns. If you do nothing else, internalize the cold-start recipe: resolve a name or pasted link → cluster_id → dossier. GET /api/v2 returns this recipe live, and GET /api/v2/resolve?q=<name> is the front door.
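The cold-start recipe can be sketched as below. This is a minimal illustration, not the official client: the host `crate.example`, the helper names, and the assumption that the /resolve response exposes `cluster_id` as a top-level field are all hypothetical. The HTTP call is injected as `fetch_json` so the flow stays independent of any particular HTTP library.

```python
from typing import Any, Callable, Dict, Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://crate.example"  # hypothetical host, not taken from the docs


def resolve_url(query: str) -> str:
    """Step 1: the front door. `query` may be a name or a pasted link."""
    return f"{BASE_URL}/api/v2/resolve?{urlencode({'q': query})}"


def dossier_url(cluster_id: str) -> str:
    """Step 2: address the artist dossier by cluster_id, passed through verbatim."""
    return f"{BASE_URL}/api/v2/artist/{quote(cluster_id, safe='')}"


def cold_start(
    query: str, fetch_json: Callable[[str], Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """resolve -> cluster_id -> dossier; returns None on an honest gap."""
    resolved = fetch_json(resolve_url(query))
    cluster_id = resolved.get("cluster_id")  # assumed top-level field
    if cluster_id is None:
        return None  # crate could not resolve a canonical identity
    return fetch_json(dossier_url(cluster_id))
```

A caller would pass its own `fetch_json` (for example a thin wrapper that adds the X-API-Key header and decodes the JSON body).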

Authentication

The API is keyed: every operation requires an X-API-Key header (ck_(live|test)_<…> format) except the two public front-door endpoints that opt out — GET /api/v2 (the root index) and GET /api/v2/openapi.json (this spec). One family of endpoints uses a different credential entirely — see beacons below.
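As a rough sketch of the keyed/public split, the helper below returns the headers to send for a given path. The function name is hypothetical, and the pattern assumes the `ck_(live|test)_<32-base62>` key format listed in the security scheme; a local format check is only a convenience and proves nothing about whether the key is valid.

```python
import re
from typing import Dict, Optional

# ck_(live|test)_ followed by 32 base62 characters
KEY_PATTERN = re.compile(r"ck_(live|test)_[0-9A-Za-z]{32}")

# The two public front-door endpoints that opt out of the key requirement.
PUBLIC_PATHS = {"/api/v2", "/api/v2/openapi.json"}


def auth_headers(path: str, api_key: Optional[str]) -> Dict[str, str]:
    """Return the auth headers for `path`, or raise if a key is needed but malformed."""
    if path in PUBLIC_PATHS:
        return {}
    if api_key is None or not KEY_PATTERN.fullmatch(api_key):
        raise ValueError("a well-formed X-API-Key is required for " + path)
    return {"X-API-Key": api_key}
```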

Concepts

cluster_id — the canonical artist identity

cluster_id is crate’s prime key for an artist: a pe-norm-v1 hex string derived from the artist’s name, designed so that the same artist’s Discogs page, MusicBrainz entry, and Bandcamp profile all collapse to one cluster_id. Key all artist data off it.

  • It is an opaque string — pass it through verbatim. Never numericize it, parse it, or assume structure.
  • cluster_id: null is an honest gap, not an error: crate couldn’t resolve a canonical identity for that lookup. Roughly half of the long-tail booking artists have neither a Discogs id nor an MBID, so null is normal.
  • You get one by calling GET /api/v2/resolve (from a name, a pasted link, or a foreign id). You then address the artist dossier directly as GET /api/v2/artist/{key}, where {key} is the 64-hex cluster_id or a human slug.
  • A 64-hex key resolves identity directly from the cluster — it deliberately skips the Discogs lookup so a hex address never silently re-anchors onto a same-name Discogs row. Foreign locators (discogs:<id>, mbid:<uuid>) are not canonical addresses: convert them via /resolve first, or /artist/{key} returns 400 (the response’s next field is a ready-to-call /resolve URL).
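The addressing rules above suggest a small client-side pre-check before calling /artist/{key}. The sketch below is an assumption-laden illustration: the function name and return labels are invented, and accepting upper-case hex is a guess. It inspects only the shape of the key to pick a route and never interprets the id itself.

```python
import re

HEX64 = re.compile(r"[0-9a-fA-F]{64}")       # shape of a cluster_id address
FOREIGN = re.compile(r"(discogs|mbid):.+")   # foreign locators need /resolve first


def classify_key(key: str) -> str:
    """Label a key as 'cluster_id', 'foreign_locator', or 'slug'."""
    if HEX64.fullmatch(key):
        return "cluster_id"
    if FOREIGN.fullmatch(key):
        return "foreign_locator"  # /artist/{key} would answer 400 for this
    return "slug"
```

A caller would send `foreign_locator` keys to /resolve and everything else straight to /artist/{key}.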

dossier · grains — the composed per-entity picture

A dossier is the full picture crate composes for one entity by joining every fleet signal. v2 has three agent grains:

| grain | addressed by | what it is |
| --- | --- | --- |
| artist | GET /api/v2/artist/{key} or GET /api/v2/dossier/artist/{slug} | identity + collector behavior + editorial + emergence + live presence + web + compositions + discography + bandcamp dimensions… |
| label | GET /api/v2/label/{key} or GET /api/v2/dossier/label/{slug} | label identity + sublabel→parent lineage + collector behavior |
| festival | GET /api/v2/dossier/festival/{slug} | de-fragmented festival identity + consolidated editions ⋈ lineup |

Releases are not a top-level grain in v2. In the cluster-first model a release (a Discogs release-group, what a consumer calls a “release”) attaches to the artist as the dossier’s discography dimension, keyed by the artist’s cluster_id — there is no standalone release resource. Likewise Bandcamp is a dimension of the artist dossier (bandcamp_emergence / bandcamp_tastemaker), not a top-level surface.

Every dossier facet carries a classified state (e.g. present, empty/absent, honest_gap) and the dossier ships a provenance manifest — an array where each entry names the producer, sourceTable, refreshCadence, tier, and honestGapState for a field. Read GET /api/v2/dossier/manifest (the data dictionary) to discover the entire field surface across grains — including grains that are deliberately unavailable (e.g. song, because the fleet has no track key; and the demoted master grain, whose detail now lives in the artist’s discography) — without hitting every entity endpoint.
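One way to read the provenance manifest is to group its entries by producer. The sketch below assumes each entry is a JSON object carrying at least the `producer` and `sourceTable` keys named above; the function name is hypothetical and any other entry shape is unknown.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping


def tables_by_producer(manifest: Iterable[Mapping[str, Any]]) -> Dict[str, List[str]]:
    """Map each producer to the source tables it contributes, in manifest order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in manifest:
        grouped[entry["producer"]].append(entry["sourceTable"])
    return dict(grouped)
```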

honest gap — empty is 200, not 404

This is crate’s defining principle: crate shows what it can see and is explicit about what it can’t, rather than 404-ing or faking data. An unresolved or empty lookup returns HTTP 200 with one of:

  • present: false (e.g. a dossier dimension with no match),
  • a null field (cluster_id: null, identity: null),
  • state: "honest_gap" on a dossier facet.

Branch on the body, not the status, for “did I get data?”. An unresolved artist slug returns identity:null at 200 — never 404. Reserve your error handling for genuine 4xx/5xx.
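A minimal sketch of "branch on the body, not the status": the helpers below treat 4xx/5xx as errors and answer the "did I get data?" question from the three gap markers listed above. The function names and the idea that a facet is a top-level key of the response body are assumptions made for illustration.

```python
from typing import Any, Mapping


def facet_has_data(facet: Any) -> bool:
    """False for any of the three honest-gap markers, True otherwise."""
    if facet is None:
        return False  # a null field
    if isinstance(facet, Mapping):
        if facet.get("present") is False:
            return False
        if facet.get("state") == "honest_gap":
            return False
    return True


def got_data(status: int, body: Mapping[str, Any], facet: str) -> bool:
    """Raise only on genuine 4xx/5xx; otherwise inspect the body."""
    if status >= 400:
        raise RuntimeError(f"crate returned an error status: {status}")
    return facet_has_data(body.get(facet))
```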

resolved_via · resolved_from — binding tier and match method

When crate resolves an identity it tells you two orthogonal things:

  • resolved_via = the binding tier, i.e. how trustworthy the identity is:
    • 'discogs' — canonical, Discogs-bound (verified).
    • 'cluster' (observed/unverified): the identity came from the booking graph with no Discogs bind. Surface it flagged as unverified, never as canonical.
    • null — did not resolve.
  • resolved_from = how you addressed it on /resolve: 'url' (a pasted link), 'name', or 'locator' (a foreign id). matched_on names the surface that matched; note explains a recognized-but-unresolved link (e.g. a Twitter URL crate recognizes but does not yet cross-reference).

A 64-hex cluster_id address always yields resolved_via: 'cluster' (observed tier) by design.
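A client could map the binding tier to a display label along these lines. The label strings and function name are illustrative choices, not part of the API; only the three `resolved_via` values come from the description above.

```python
from typing import Optional


def trust_label(resolved_via: Optional[str]) -> str:
    """Translate resolved_via into a label suitable for showing to users."""
    if resolved_via == "discogs":
        return "verified"      # canonical, Discogs-bound
    if resolved_via == "cluster":
        return "unverified"    # observed tier: flag it, never show as canonical
    if resolved_via is None:
        return "unresolved"    # did not resolve
    raise ValueError(f"unexpected resolved_via: {resolved_via!r}")
```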

the cube · cube_quadrant — the behavioral-signal model

The cube is crate’s behavioral model of a release: a 3-bit code (string, e.g. "101") placing it on three independent behavioral axes — who OWNS it (collector), who PLAYS it (DJ), who WRITES about it (critic). Each bit is 0/1, so the eight quadrants run "000" (“No signal”) through "111" (“Full intersection”): "100" = collector-only, "010" = DJ-only, "101" = collector + critic, and so on. The collector-vs-DJ split — who owns it vs who plays it — is the heart of the model. cube_quadrant: null means it isn’t yet classified (an honest gap). You meet cube_quadrant (with companion owner_count / dj_count / critic_count magnitudes + a link_to_cube explorer deep-link) on the master-grain result rows of GET /api/v2/search.
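The 3-bit code can be unpacked as in the sketch below. The bit order (collector, DJ, critic) follows the examples above, where "100" is collector-only and "010" is DJ-only; the function and axis key names are hypothetical.

```python
from typing import Dict, Optional

AXES = ("collector", "dj", "critic")  # bit order: who owns, who plays, who writes


def decode_quadrant(code: Optional[str]) -> Optional[Dict[str, bool]]:
    """Turn a cube_quadrant string into per-axis flags; None stays None (unclassified)."""
    if code is None:
        return None
    if len(code) != 3 or set(code) - {"0", "1"}:
        raise ValueError(f"not a 3-bit quadrant code: {code!r}")
    return {axis: bit == "1" for axis, bit in zip(AXES, code)}
```

For example, `decode_quadrant("101")` reports collector and critic signal with no DJ signal.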

tastemakers · breakouts — discovery surfaces

Two read-only discovery surfaces, served from offline-published snapshots (no DB checkout) and fail-soft: each returns a state of present / empty / degraded, where degraded is a 200 honest-gap (a read failure never 500s), and stale:true flags a snapshot older than 7 days.

  • tastemakers (GET /api/v2/tastemakers, …/ones-to-watch) — influential curators and the richest artist-grain analytics crate has: rank, own-tier, brokerage score, corroborating axes, lead-times, Bandcamp demand. ?limit= bounds each array (1..200).
  • breakouts (GET /api/v2/breakouts) — emerging artists on the rise (“ones to watch”): booking-momentum signal cross-validated against press. ?tier=breakout|rising and ?corroboration=corroborated|booking_ahead filter it; ?limit= is clamped to 200.
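The breakouts filters can be assembled as in the sketch below. The helper name is hypothetical, and clamping on the client simply mirrors the documented upper bound of 200; the lower bound of 1 is borrowed from the tastemakers range and is an assumption for breakouts.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode

TIERS = {"breakout", "rising"}
CORROBORATIONS = {"corroborated", "booking_ahead"}


def breakouts_path(
    tier: Optional[str] = None,
    corroboration: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Build the path and query string for GET /api/v2/breakouts."""
    params: Dict[str, Union[str, int]] = {}
    if tier is not None:
        if tier not in TIERS:
            raise ValueError(f"unknown tier: {tier!r}")
        params["tier"] = tier
    if corroboration is not None:
        if corroboration not in CORROBORATIONS:
            raise ValueError(f"unknown corroboration: {corroboration!r}")
        params["corroboration"] = corroboration
    if limit is not None:
        params["limit"] = max(1, min(limit, 200))  # assumed 1..200 window
    query = urlencode(params)
    return "/api/v2/breakouts" + ("?" + query if query else "")
```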

beacons — search-event telemetry (different credential)

Beacons are client-side telemetry about search behavior: POST /api/v2/search-events/observed (a result was served from cache) and …/refined (the user changed facets). They are not authenticated with your X-API-Key. Each search response issues a short-lived per-search JWT bound to one search_event_id; send it as Authorization: Bearer <token>, and the token must match the body’s search_event_id. Bodies are capped at 512 bytes and beacons are idempotent (a duplicate is a 204 no-op). Beacon 400s carry a Zod flattened details object (not the array shape of normal validation errors), so they use a distinct error schema.
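A beacon request could be prepared roughly as follows. This sketch only builds the request description and enforces the 512-byte cap locally; the function name, the returned dictionary shape, and any body field other than `search_event_id` are assumptions. Note that no X-API-Key header is attached.

```python
import json
from typing import Any, Dict, Optional

MAX_BEACON_BYTES = 512


def build_beacon(
    kind: str, search_event_id: str, token: str, extra: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Describe a POST to /api/v2/search-events/{observed|refined}."""
    if kind not in ("observed", "refined"):
        raise ValueError(f"unknown beacon kind: {kind!r}")
    body = dict(extra or {})
    body["search_event_id"] = search_event_id  # must match the id the token is bound to
    payload = json.dumps(body, separators=(",", ":")).encode("utf-8")
    if len(payload) > MAX_BEACON_BYTES:
        raise ValueError("beacon body exceeds 512 bytes")
    return {
        "method": "POST",
        "path": f"/api/v2/search-events/{kind}",
        "headers": {
            "Authorization": f"Bearer {token}",  # per-search JWT, not the API key
            "Content-Type": "application/json",
        },
        "body": payload,
    }
```

Because beacons are idempotent, re-sending the same request after a timeout should be safe: a duplicate is a 204 no-op.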

sparse fieldsets — ?fields= (opt-out trim)

The artist dossier is default-rich: GET /api/v2/artist/{key} returns the whole dossier in one round-trip. If you want less, ?fields=identity,discography trims to the named top-level facets (the envelope is always present). An unknown field returns 400 invalid_fields with the exact valid set and a copy-pasteable example — so you never guess.
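Building a trimmed request is a one-liner; the sketch below shows one possible shape. The helper name is hypothetical, and the facet names are passed through unvalidated on the assumption that the 400 invalid_fields response is the authority on what is valid.

```python
from typing import Optional, Sequence
from urllib.parse import quote


def artist_path(key: str, fields: Optional[Sequence[str]] = None) -> str:
    """Path for the artist dossier, optionally trimmed to the named top-level facets."""
    path = "/api/v2/artist/" + quote(key, safe="")
    if fields:
        path += "?fields=" + ",".join(fields)
    return path
```

Omitting `fields` keeps the default-rich behavior: the whole dossier in one round-trip.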

In v2 Bandcamp is an analytical dimension of the artist dossier, not a release surface. Two facets carry it, both keyed by the artist’s cluster_id and both signals/metrics (not per-release listings):

  • bandcamp_emergence — purchase-backed demand signals (emergence class, demand lead/ratio, owner reach, wishlist demand, distinct releases, earliest-wished date).
  • bandcamp_tastemaker — early-supporter quality scores (supporter-cohort size, aesthetic-quality scores, mean first-buyer earliness).

Per-release Bandcamp listings — and the bandcamp_item_id / track_url per-release fields — are not a v2 surface; they belong to the v1 Bandcamp endpoints. (The discography dimension is the Discogs release-group attachment, keyed by discogsMasterId — distinct from Bandcamp.)

crate is link-only everywhere it does carry links: every artwork item is a hotlink url with rehost:false — crate never fetches or re-hosts bytes (a Cover Art Archive URL is best-effort and may 404 if no cover exists), and crate never stores Bandcamp audio streams (tokenized, expiring, out of ToS bounds).

Opaque ids — the universal rule

cluster_id is always a string, always opaque. Round-trip it verbatim; do not parse, increment, numericize, or infer structure from it. (The same discipline applies to any id crate hands you.) This keeps your client correct across id-scheme changes.

Versioning

The API major version lives in the URL path (/api/v2). The spec’s info.version (currently 2.0.0) bumps on every spec change and is drift-guarded: the document is generated from code, so docs cannot drift from runtime behavior. Operations carry stable operationIds for codegen, and keyed 2xx responses declare X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset headers (back off on 429 using retry_after_seconds / Retry-After). v2 is the cluster-first stable major; the frozen v1 predecessor remains available during a time-boxed, announced deprecation — the route-by-route migration map is at /docs/migration/v1-to-v2.
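The 429 back-off can be sketched as below. It assumes `retry_after_seconds` arrives in the JSON error body and that `Retry-After` carries a number of seconds; an HTTP-date value is not parsed here and falls back to a default delay. The function name and the default are hypothetical.

```python
from typing import Any, Mapping, Optional


def retry_delay_seconds(
    status: int,
    body: Optional[Mapping[str, Any]],
    headers: Mapping[str, str],
    default: float = 1.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None when the response is not a 429."""
    if status != 429:
        return None
    value = (body or {}).get("retry_after_seconds")  # assumed body field
    if value is None:
        lowered = {name.lower(): val for name, val in headers.items()}
        value = lowered.get("retry-after")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default  # missing or non-numeric hint
```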

Information

  • License: Commercial — see Terms of Service
  • OpenAPI version: 3.1.0

Customer API key in ck_(live|test)_<32-base62> format

Security scheme type: apiKey

Header parameter name: X-API-Key

Short-lived per-search beacon JWT (issued with the search response), sent as Authorization: Bearer <token>. Distinct from the X-API-Key customer key; the token is bound to a single search_event_id.

Security scheme type: http

Bearer format: JWT