¡truÍSMO! the idiom truthiness machine
In honor of my father, Dr. Glen R. Elliott, PhD MD. A man who has quietly, over 5 decades, helped author at least 105 manuscripts in his whisper quiet, but prolific, disruptiveinnovation creating large and enduring change in an industry/field, and prestigious career. all while being a loving, nurturing, and hilarious husband and father, intermittent pastor, casual software developer (RIP Farkle), OG gamer, president of a non-profit summer camp for over 25 years, amongst countless other things I can't think of right now.
He also quietly and [by all reports] adroitly, managed to collaborate to create the Multimodal Treatment of ADHD study, which remains, to my knowledge, the most rigorous and longitudinal interventional study in mental health to date.
and, he's always loved idioms, and you know.. apples and trees and stuff.
so, I made this weird thing. in tribute to him. as a token of my genuine love and profound gratitude for the father you've been to me.
I think I'm only just beginning to understand precisely how lucky I've truly been.
thank you.
mark
how it works
When a saying arises from multiple cultures, there are (so far) three possible reasons: the cultures inherited it from one ancestor, they copied it from each other, or they each arrived at it alone. Only the third is wonder; the first two are .. no wonder. humans are wonderful meme spreaders, and it's easy to mistake copying for ingenuity. Echo is the instrument that hopes to tell them apart. The machine gathers truisms, superstitions, and proverbs [idioms of any kind, really], across languages, verifies every attestation against real sources with an adversarial skeptic on every claim, dates the parts, and labels each lineage: inherited, borrowed, contested, or independent. Then it collapses everything that is apparently descendent, gates everything unproven, and counts what survives. What survives is the point: Confucius and Hillel arriving at the same rule before contact, four unrelated cultures hammering down the nail that sticks out. The recurrence that outlasts the audit is evidence of a kind of truth, a truthy, kind of truth. Each phrase/idiom gets a composite 'truthiness' score. our beautiful machine's vague estimate of just how much truth there is in an idea.
the machine, stage by stage
THE truÍSMO MACHINE: stage by stage
STAGE 0 — Intake
You drop raw text anywhere (chat, dropbox/idioms.txt — markdown bold, numbering, typos, all fine). A deterministic script parses it: strips formatting, normalizes (Unicode NFKD, stopword-stripped fuzzy matching), and dedupes against everything already in the pipeline so re-drops cost nothing (your 175-line drop → 42 known, 119 new). Every item keeps its source line number forever — provenance starts at the moment of donation. Output: intake/drop_*_chunks.json.
STAGE 1 — Triage (classification fleet)
A parallel fleet (your drop took 9 agents: 7 truism batches + 1 superstition specialist + 1 synthesizer) classifies each item from knowledge — not research; that's deliberate, triage is cheap and gates are named for later. Each item gets:
- track: image (concrete picture) / proposition (imageless claim) / superstition (ritual-luck complex) / form-class (interesting as a form, like tautologies)
- tier: 1 crown candidate / 2 stock / 3 descent-trap (known-author, teaching shelf) / 4 park
- provenance smell: lineage / convergence-candidate / modern-coinage / known-author (named)
- relations: counter_saying_of, antithetical_pair_with, inversion_of, family
- confidence, 0-100, on everything
Then the synthesis agent reads all batches at once: merges family-name drift, builds the relation graph, ranks the top candidates, and — critically — catches the triage's own inconsistencies (it found a duplicate item triaged twice, a tier-policy drift, a track misassignment). I rule on each flagged inconsistency by hand. Output: intake/triage_*_batchD.md + machine-readable items.
STAGE 2 — Curation (research fleets, the anti-confabulation core)
Tier-1 items get researcher agents under the HONESTY header, the non-negotiables:
- Never invent a quotation, foreign phrase, date, or source. "Could not verify" is a valuable answer.
- Verified negatives are findings (Pliny's missing salt omen is a result).
- Every claim carries confidence; every source is tier-ranked (critical edition > major reference > Wikipedia-marked-tertiary > forums-marked-low).
- WebSearch/WebFetch mandatory; where memory and source disagree, the source wins, and say so.
- Group by the IMAGE each tongue reaches for, never flat by language.
Every researcher is paired with an adversarial checker who re-fetches the 4-6 most load-bearing attestations in their native script. Unfindable foreign phrases = fabrication = fail. This layer has caught five fabrications so far (spliced Chinese couplets, an invented "American" attribution, a provenance-chain misattribution) — each repaired with the checker's source-confirmed correction, flagged in-file, never silently. Output: truismo/seeds/*.json + spec/motif_source_*.md.
STAGE 3 — Gates (the lineage-vs-arrival deciders)
Separate research fleets attack the dated questions that decide everything: Is the Arabic form pre-1656? Is isseki-nicho a Meiji loan? Does the Mahabharata's written stratum postdate contact? Each gate gets a researcher + skeptic pair and returns one of: resolves-lineage / resolves-borrowed / resolves-independent / remains-gated (an honorable verdict — never force what evidence won't support). Gate results amend live payloads: Arabic moved gated→borrow on a 6.68-million-page verified negative; Greece moved gated→counted on Herodotus; Sanskrit moved counted→gated by our own dating rule. The corpus gets more honest in both directions.
STAGE 4 — Annotation (vocabulary-claude's lane)
Vocab keys each motif into the formal vocabularies: image schemas S1-S28 + pragmatic functions F01-F14, the ratified 3-field record (schema / filler_domain / filler — the level something echoes at IS the finding), proposition keying (claim-shape / polarity / tradition), and the pilot superstition keying (mechanism / vehicle / valence+quantum). Friction reports are mandatory — where the keying fights the data, that's evidence, and it has already forced two new functions, a third polarity question, and the function-only matcher tier.
STAGE 5 — Payload authoring (my lane)
Each motif becomes one JSON conforming to truismo_motif_payload_schema_v0.1.md. The load-bearing parts:
- Attestations carry: language, family block, cluster, provenance class (anchor / cog / borrow / conv / gated), native phrase, transliteration, gloss, the honest lit paragraph, attested_in_corpus: null (until a real corpus confirms — no exceptions), confidence.
- Edges encode transmission: cog (inheritance), borrow (documented copy), contested (scholarly dispute — never merges components; independence is never collapsed on a disputed claim).
- THE COUNTS ARE NEVER HAND-TYPED. The cardinal rule. Tongues = distinct languages; tradition blocks = distinct families; arrivals = connected components of the non-gated global graph. Gated entries are excluded and surfaced separately ("+N candidates awaiting gates"). So the lineage map IS the measurement — data cannot disagree with display, because display is computed from data.
- The honesty block is schema-REQUIRED: who curated, what was repaired, every owed gate. A payload without it fails validation. The discipline travels with the data, not with anyone's goodwill.
- A validation script then checks everything mechanical: duplicate keys, dangling edges, illegal prov values, em-dashes, the brand string byte-exact (Í = U+00CD).
STAGE 6 — Render (design-claude's lane)
One generator (build-workup.mjs) eats any payload and emits a five-act page: the turning prism of attestations (a true chord-tiled n-gon), the decomposition act (per-track variant), the lineage constellation (stars colored by provenance, gold descent lines, dashed-rose borrows, dotted-gold contested), the measurement panel (the naive count animating down to the collapsed count, with "awaiting-O3" tiles where no computed number exists — never a fake number), and the honesty rail. Adding truism #500 = one JSON + one manifest line, zero new code.
Then design runs its own 8-agent skeptic pass against my payload (independent of my checkers — it caught the Icelandic single-character miscopy my layer missed), plus the acceptance check: the generator must re-derive my predicted numbers exactly (9/6/3+3, 13/9/5+3...). A mismatch means one of us is wrong, loudly.
STAGE 7 — The gates before humans see it
- a11y-claude's formal sweep — separate from design's automated axe runs, with manual keyboard/screen-reader judgment. It properly failed us once (focus indicators, arrow-key capture: things axe structurally can't see); the fix was template-level, one render healed all pages. Echo's hard rule: design AND a11y both pass, or it isn't user-facing.
- Legal: the truÍSMO mark, and license rulings on any corpus we'd ingest (CC-BY-SA, Opie & Tatem). No ingestion before clearance.
- You: greenlight per surface. Always last, always yours.
STAGE 8 — The store and the scorer (data-claude's lane, O3)
The echo_findings store (research Postgres, 33 columns, 6 tables) holds what payloads hold, relationally: the 3-field record, match methods with rho weights (how much a match type counts: exact 1.0 → gloss 0.45 → image_schema 0.40 → function_only 0.35 → legacy 0.25), and promotion_ok=false flags that enforce never-auto-promote in the schema itself — a function-only match physically cannot advance without a human.
The scorer computes what the pages currently promise: Tier B (every tongue counted, Σ=I — "an honest lie"), Tier C (family blocks), Tier C-plus (full relatedness topology: N_eff = 1ᵀΣ⁻¹1, then surprisal = −log P(recurrence | N_eff), in bits). The first run is judged against four acceptance tests pre-registered before any number existed: gift-horse must collapse to ~zero, isseki-nicho must collapse, the Persian-vs-Chinese arrow must hold, golden-rule's residual must survive. If the corrector fails those, the corrector is broken — not the corpus.
STAGE 9 — Governance (how it stays honest at speed)
- Fork pattern: new architecture never opens corpus-wide sight-unseen. One specimen → rendered card → you ratify from the finished thing (proposition track: golden-rule; superstition track: salt, in flight). Pilots are designed to surface friction, and both have.
- Every ruling is a file (spec/ruling_*.md), every decision reaches you as an MCQ, every amendment is versioned. Frozen artifacts never silently change.
- The amendment loop never closes: gates resolve → payloads amend → one-command re-render → a11y re-checks. The golden-rule card has been through three versions in 24 hours and got stronger each time, because the audit only ever removes overclaim.
The epistemic spine (what all of it serves)
Single lineage vs. independent arrival — count lineages, not languages. Filler-constancy is the lineage fingerprint; filler-variance under a shared schema is what convergence looks like. Unknowns are gated, never guessed. Honest nulls get rendered too (piece-of-cake: "not everything echoes"). For superstitions, two-level provenance (surfaces descend, mechanisms converge) plus the firewall: belief-fact, never world-fact. And the back-projections people make about their own sayings (the salt ritual's fake Roman pedigree) are themselves data, dated and displayed.
The full gathering, in order
- The early bird, in many tongues“The early bird catches the worm.”18 tongues · 11 arrivals
- The gift horse, in many tongues“Don't look a gift horse in the mouth.”8 tongues · 1 arrival
- Two birds, one stone, in many tongues“Kill two birds with one stone.”13 tongues · 5 arrivals · 3 gated
- An arm and a leg, in many tongues“It cost an arm and a leg.”8 tongues · 5 arrivals
- The golden rule, in many traditions“Do unto others as you would have them do unto you.”9 tongues · 3 arrivals · 3 gated
- A piece of cake, in many tongues“It's a piece of cake.”8 tongues · 12 arrivals · 1 gated
- Breaking the ice, in many tongues“Break the ice.”6 tongues · 1 arrival
- The broken clock, in many tongues“Even a broken clock is right twice a day.”3 tongues · 3 arrivals
- The ball in your court, in many tongues“The ball is in your court.”4 tongues · 1 arrival · 1 gated
- The apple and the tree, in many tongues“The apple does not fall far from the tree.”13 tongues · 4 arrivals · 2 gated
- The drowning man's straw, in many tongues“A drowning man will clutch at a straw.”14 tongues · 5 arrivals · 3 gated
- Still waters, in many tongues“Still waters run deep.”10 tongues · 5 arrivals · 1 gated
- Try, try again, in many tongues“If at first you don't succeed, try, try again.”12 tongues · 9 arrivals · 2 gated
- The only constant, in many traditions“Change is the only constant.”6 tongues · 3 arrivals
- An eye for an eye, across the law codes“An eye for an eye, a tooth for a tooth.”5 tongues · 2 arrivals · 1 gated
- Turn the other cheek, and the saying built to break a saying“If anyone slaps you on the right cheek, turn to them the other cheek also.”5 tongues · 2 arrivals · 2 gated
- The bird that lays golden eggs, in many tongues“To kill the goose that lays the golden eggs.”8 tongues · 2 arrivals
Dr. Elliott's contributions to the literature
deeper summary available
Placeholder rows for layout; the verified bibliography arrives by injection and replaces them automatically.
In honor of the Good Dr. Glen R. Elliott, PhD MD for searing most of these into my brain since before I can remember.
The grammar of the verdicts: descent is a single lineage, many witnesses giving one testimony; convergence is many lineages arriving at one shape; the count that matters is lineages, not languages. Curated gatherings in the spirit of Echo: lineages named honestly, borrowings marked never hidden, unknowns counted as unknowns. Order is curation-batch order; once the instrument computes corrected surprisal, the wheel will say so and order by it. Misses are signal: ask for a saying we have not gathered and it joins the queue.
gathered and instrumented by ECHO · the instrument for measuring synchronicities