Cutting public-procurement search from 13 seconds to 150 milliseconds

The first system I owned end to end, as an intern, was a search engine for Brazilian public procurement. Buyers inside government agencies needed to find tenders, contracts, and court rulings scattered across Comprasnet, the TCU, and the AGU: portals that didn’t talk to each other and had no real search. A CGU study found that 85% of public agencies were under-resourced for procurement planning, largely because finding precedent was so painful. People searched by hand.

Our first version queried Postgres directly, with ILIKE and a pile of joins. Over a corpus that kept growing (we were ingesting around 600 new tenders a day, plus rulings and reference terms), a single search took about 13 seconds. For a tool meant to replace manual lookup, 13 seconds is the same as broken.

Where the time went

The slowness wasn’t one bug; it was the wrong tool. A leading-wildcard pattern like ILIKE '%term%' can’t use an ordinary B-tree index, so every query scanned the whole table: hundreds of thousands of long documents, read from start to finish. Every extra OR clause we added for relevance made it worse. We were asking a relational database to be a search engine.
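
To make the cost difference concrete, here is a minimal Python sketch, not our production code, that contrasts a substring scan (roughly what ILIKE '%term%' forces) with an inverted-index lookup, the core structure behind engines like Solr. The sample documents and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample corpus: document id -> text.
DOCS = {
    1: "aquisição de computadores para o ministério",
    2: "contratação de serviços de limpeza",
    3: "aquisição de veículos oficiais",
}


def scan_search(docs, term):
    """Substring scan: reads every document on every query."""
    term = term.lower()
    return sorted(doc_id for doc_id, text in docs.items() if term in text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids containing it (built once)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def index_search(index, term):
    """A single dictionary lookup; the corpus itself is never re-read."""
    return sorted(index.get(term.lower(), set()))


INDEX = build_inverted_index(DOCS)
```

Both `scan_search(DOCS, "aquisição")` and `index_search(INDEX, "aquisição")` find documents 1 and 3, but only the scan's work grows with every document added. The two are not strictly equivalent: the scan matches substrings, while this toy index matches whole tokens, which is closer to how a search engine behaves.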

What I changed

I moved search to Apache Solr and treated indexing as a first-class part of the pipeline:

  • Modeled documents around how people actually searched (by object, agency, value range, and date) instead of mirroring the database schema.
  • Tuned analyzers for Portuguese (stemming, accent folding, stopwords) so “aquisição” and “aquisicao” matched.
  • Built a daily indexing job that normalized and enriched each new document before it hit the index, keeping queries fast as the corpus grew.
  • Kept Postgres as the system of record and let Solr own retrieval.
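
The document modeling and accent folding described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the real pipeline: the field names, the row layout, and both helper functions are hypothetical, and in Solr itself folding and stemming run inside the field type's analyzer chain rather than in application code.

```python
import unicodedata
from datetime import date


def fold_accents(text):
    """Lowercase and strip diacritics, so 'aquisição' becomes 'aquisicao'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_search_document(row):
    """Flatten a (hypothetical) database row into a search-shaped document.

    The fields mirror how buyers search (object, agency, value range, date)
    rather than how the relational tables are laid out.
    """
    obj = row["object"].strip()
    return {
        "id": str(row["id"]),
        "object": obj,
        "object_folded": fold_accents(obj),
        "agency": row["agency"]["name"],
        "value": float(row["value"]),
        # ISO 8601 UTC timestamp, the form Solr date fields accept.
        "published": row["published"].isoformat() + "T00:00:00Z",
    }


# Made-up sample row, for illustration only.
row = {
    "id": 42,
    "object": " Aquisição de computadores ",
    "agency": {"name": "Órgão Exemplo"},
    "value": "150000.00",
    "published": date(2023, 5, 10),
}
doc = to_search_document(row)
```

Here `fold_accents("aquisição")` and `fold_accents("aquisicao")` produce the same string, which is the property that lets both spellings match. Flattening the nested agency into a plain field is the small-scale version of shaping the index around the query instead of the schema.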

The result

Query time dropped from ~13 seconds to ~150 milliseconds, a reduction of more than 98%. The same search that used to make people wait now returned before they finished reading the page. The engine went on to back our procurement-planning product (DTR40) for three government clients, including the Ministry of Agriculture and the Federal District’s IT secretariat.

What it taught me

The lesson wasn’t “Solr is fast.” It was that performance problems are usually disguised modeling problems. Before reaching for caches or bigger machines, I now ask whether the data is even shaped for the question being asked. Most of the 98% came from that one decision: moving retrieval to a tool built for it, and designing the index around the user, not the table.