marco@altran ~ blog

posts

2026-03-09

Crawling and indexing 100M+ pages at Kagi

How I built the distributed crawling and indexing infrastructure behind Kagi's search engine — Frontera, Kafka, Redis Bloom deduplication, Vespa HA, and the trade-offs of running it in production.

search crawling distributed-systems kafka vespa