Yandori News Flow - Source Citation Graph Update

Based on feedback from the HN community, I reworked the core system to explicitly model source attribution and article-to-article references, not just textual similarity.

What changed:

Citation-based clustering

Clusters still start from full-text embeddings, but are now refined using explicit citations and source mentions inside articles.
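
As a rough illustration of the refinement step (not the actual Yandori code), here is a minimal sketch that assumes refinement means merging embedding clusters whenever an article in one explicitly cites an article in another. The function name `refine_clusters`, the article IDs, and the integer cluster labels are all hypothetical.

```python
def refine_clusters(labels, citations):
    """Merge embedding clusters that are linked by an explicit citation.

    labels:    dict of article_id -> initial cluster id (from embeddings)
    citations: iterable of (citing_id, cited_id) pairs
    Assumption: a single citation is enough evidence to merge two clusters.
    """
    # Union-find over cluster ids; every cluster starts as its own parent.
    parent = {c: c for c in set(labels.values())}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for citing, cited in citations:
        if citing in labels and cited in labels:
            a, b = find(labels[citing]), find(labels[cited])
            if a != b:
                # Keep the smaller id as the representative of the merged cluster.
                parent[max(a, b)] = min(a, b)

    return {article: find(c) for article, c in labels.items()}


# "a3" landed in its own embedding cluster but cites "a1", so it is pulled in.
labels = {"a1": 0, "a2": 0, "a3": 1, "a4": 2}
refined = refine_clusters(labels, [("a3", "a1")])
```

A real system would probably also weigh citation strength or split clusters that have no citation links at all; this sketch only covers the merge case.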

External source nodes

Sites we don't actively crawl are added as nodes when referenced, so lineage isn't limited to the monitored set. This captures many large outlets that don't publish usable RSS feeds.
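
To make the idea concrete, here is a small sketch of how a referenced but uncrawled site could be added as a node. The graph layout (a dict with `nodes` and `edges`), the `external` flag, the helper name `add_citation`, and the example domains are assumptions for illustration only.

```python
from urllib.parse import urlparse


def add_citation(graph, monitored, citing_url, cited_url):
    """Record a site-level citation edge, creating nodes as needed.

    graph:     {"nodes": {host: {"external": bool}}, "edges": set of (src, dst)}
    monitored: set of hostnames that are actively crawled
    """
    src = urlparse(citing_url).hostname
    dst = urlparse(cited_url).hostname
    if not src or not dst:
        return  # skip URLs without a usable hostname
    for host in (src, dst):
        # A host outside the monitored set is kept, but flagged as external.
        graph["nodes"].setdefault(host, {"external": host not in monitored})
    graph["edges"].add((src, dst))


graph = {"nodes": {}, "edges": set()}
monitored = {"monitored.example"}
add_citation(
    graph,
    monitored,
    "https://monitored.example/story/123",
    "https://bigoutlet.example/world/original-report",
)
```

After this call, `bigoutlet.example` exists in the graph with `external` set to `True`, even though it was never crawled.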

Validated "first to publish"

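Attribution no longer relies purely on RSS timestamps. Publish times are now validated against citation order and reference structure: an article that cites another one cannot have been published first, whatever its feed timestamp says.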
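
A minimal sketch of that kind of check, assuming the rule is simply "a citing article must be newer than the article it cites" and that the origin is the earliest article that cites nothing else in the cluster. The function names, article IDs, and timestamps are made up for the example.

```python
from datetime import datetime


def check_publish_order(published, citations):
    """Return citations whose timestamps contradict the citation direction."""
    conflicts = []
    for citing, cited in citations:
        if citing in published and cited in published:
            # The citing article claims to be older than what it cites.
            if published[citing] < published[cited]:
                conflicts.append((citing, cited))
    return conflicts


def first_to_publish(published, citations):
    """Pick the earliest article that cites nothing else in the cluster."""
    citing_articles = {citing for citing, _ in citations}
    roots = [a for a in published if a not in citing_articles]
    if not roots:
        return None  # e.g. a citation cycle; no clear origin
    return min(roots, key=lambda a: published[a])


published = {
    "wire": datetime(2024, 1, 5, 9, 0),
    "blog": datetime(2024, 1, 5, 8, 30),  # suspicious feed timestamp
    "daily": datetime(2024, 1, 5, 10, 0),
}
citations = [("blog", "wire"), ("daily", "wire")]

conflicts = check_publish_order(published, citations)
origin = first_to_publish(published, citations)
```

Sorting by timestamp alone would credit "blog", but "blog" cites "wire", so the pair is flagged as a conflict and "wire" is picked as the origin.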

Derivation graph view

The flow view now represents an inferred derivation graph, not just a timeline of similar headlines.
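
One way to picture the difference from a plain timeline: in a derivation graph each article has a depth, meaning the number of citation hops back to an origin. The sketch below computes that with a breadth-first search. It is an illustration under assumed names (`derivation_depths` and the article IDs are hypothetical), not a description of how the flow view is built.

```python
from collections import defaultdict, deque


def derivation_depths(citations):
    """Map each article to its shortest hop count from an origin article.

    citations: iterable of (citing_id, cited_id) pairs
    An origin is an article that is cited but cites nothing itself.
    """
    derived_from = defaultdict(set)  # cited article -> articles citing it
    nodes, citing_nodes = set(), set()
    for citing, cited in citations:
        derived_from[cited].add(citing)
        nodes.update((citing, cited))
        citing_nodes.add(citing)

    roots = nodes - citing_nodes
    depth = {root: 0 for root in roots}
    queue = deque(sorted(roots))
    while queue:
        node = queue.popleft()
        for child in sorted(derived_from[node]):
            if child not in depth:  # first visit is the shortest path in BFS
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


depths = derivation_depths(
    [("blog", "wire"), ("aggregator", "blog"), ("daily", "wire")]
)
```

Here "wire" is the origin at depth 0, "blog" and "daily" sit one hop away, and "aggregator" is two hops away because it only cites "blog".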

Search & historical archives

Browse and search across past stories to inspect earlier propagation patterns.

Context

The main criticism from the previous HN post was that similarity + RSS timestamps aren't sufficient to identify who actually broke a story, and that large sources were missing. Both were fair. This update addresses those issues by modeling explicit citation relationships and including referenced external sources as graph nodes.

Still English-only for now. I want to get attribution right before expanding to other languages.
