YandoriBot is a respectful RSS/Atom feed discovery crawler supporting the Yandori temporal prediction research system. Our crawler discovers and monitors syndication feeds to provide real-time content streams for multi-scale temporal prediction experiments.
The crawler treats each site as a graph of candidate endpoints and enumerates them in priority order: feed autodiscovery links embedded in HTML come first, and heuristic fallback paths are tried only afterward.
YandoriBot/1.0 (+https://yandori.io/bot.html)
The crawler implements a respectful discovery pattern designed to minimize server load:
Exponential backoff with jitter based on HTTP status codes:
YandoriBot follows a systematic discovery process optimized for efficiency:
To block YandoriBot entirely, add this to your /robots.txt file:
User-agent: YandoriBot
Disallow: /
To block only feed discovery:
User-agent: YandoriBot
Disallow: /feed
Disallow: /rss
Disallow: /*.xml
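To show which URLs the feed-only rules would cover, here is a small Go sketch of Disallow pattern matching in the style of RFC 9309: rules are prefix matches, `*` matches any run of characters, and a trailing `$` anchors the end of the path. The functions `patternMatches` and `isDisallowed` are hypothetical helpers written for this example; the sketch ignores Allow rules and longest-match precedence, and it is not a statement of how YandoriBot parses robots.txt.

```go
package main

import (
	"fmt"
	"strings"
)

// patternMatches reports whether a robots.txt path pattern matches path.
// Patterns are prefix matches; '*' is a wildcard and a trailing '$' anchors the end.
func patternMatches(pattern, path string) bool {
	anchored := strings.HasSuffix(pattern, "$")
	pattern = strings.TrimSuffix(pattern, "$")
	parts := strings.Split(pattern, "*")

	// The text before the first wildcard must be a literal prefix.
	if !strings.HasPrefix(path, parts[0]) {
		return false
	}
	pos := len(parts[0])
	rest := parts[1:]
	for i, part := range rest {
		if anchored && i == len(rest)-1 {
			// The final segment must sit at the very end without overlapping
			// what earlier segments already consumed.
			return len(path)-len(part) >= pos && strings.HasSuffix(path, part)
		}
		idx := strings.Index(path[pos:], part)
		if idx < 0 {
			return false
		}
		pos += idx + len(part)
	}
	if anchored {
		return pos == len(path) // no wildcard: exact match required
	}
	return true
}

// isDisallowed reports whether any non-empty Disallow pattern matches path.
func isDisallowed(disallow []string, path string) bool {
	for _, p := range disallow {
		if p != "" && patternMatches(p, path) {
			return true
		}
	}
	return false
}

func main() {
	rules := []string{"/feed", "/rss", "/*.xml"}
	for _, p := range []string{"/feed/atom", "/blog/index.xml", "/about.html"} {
		fmt.Printf("%-16s disallowed=%v\n", p, isDisallowed(rules, p))
	}
}
```

Because matching is prefix-based, `/feed` also covers paths such as `/feed/atom`, and `/*.xml` covers any path containing `.xml`, including `/sitemap.xml.gz`. Appending `$` to the pattern would restrict it to paths that end in `.xml`.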
Block by user agent in Apache (.htaccess):
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Yandori [NC]
RewriteRule .* - [F,L]
Or in nginx:
if ($http_user_agent ~* "Yandori") {
    return 403;
}
Collected feed data is used exclusively for:
All data collection is limited to publicly available RSS/Atom feeds. We respect copyright and DMCA takedown requests. Data is not sold to third parties.
Language: Go 1.24 (compiled with PGO + LTO)
Database: MySQL 8.0 (InnoDB with Adaptive Hash Index)
Protocols: HTTP/1.1, HTTP/2 (with ALPN negotiation)
Feed Formats: RSS 2.0, Atom 1.0, JSON Feed 1.1
If you have questions, concerns, or need to report abuse:
Please include: