Stacklane

RAG pipelines: retrieval that earns the answer, citations users can verify.

RAG fails quietly. The model returns a confident answer; the answer cites the wrong section; the user doesn't notice until they're talking to their boss. We build retrieval pipelines where the citation is the proof, evaluation is continuous, and the answer falls back to 'I don't know' instead of hallucinating.

What we build

  • Chunking that respects document structure

    Markdown gets chunked on heading boundaries, PDFs on page breaks, code on function boundaries, transcripts on speaker turns. No one-size-fits-all 512-token sliding window. Every chunk carries its parent document, its position in that document, and its structural breadcrumb (such as the heading path above it in Markdown).

  • Hybrid retrieval, not just embeddings

    Vector similarity for semantic matches plus BM25 for exact terms (model names, API methods, error codes). Candidates are reranked with a small cross-encoder before the LLM call. Pure embedding retrieval misses too many keyword-shaped queries to ship to production.

  • Eval harness for retrieval, not just generation

    Recall@k and mean reciprocal rank (MRR) are measured against a labeled question/answer set that ships with the product. Every retrieval-layer change runs the eval suite. We don't ship 'the embeddings feel better' without numbers backing it.

  • Citations that link to the source span

    Every claim in the LLM output is bound to a retrieved chunk via citation markers that the UI renders as inline footnotes. Click a footnote and the source span is highlighted in the original document. Hallucinations become reportable, not invisible.

  • Refusal over hallucination

    If retrieval returns nothing above the relevance floor, the model is instructed to refuse instead of synthesizing. 'I don't have a source for that' beats a confident wrong answer when the user is making a decision based on the response.

  • Re-indexing without downtime

    Indexes change (new embedding model version, new chunker). The re-index runs into a shadow table, is validated against the eval suite, and is swapped in atomically. The product never serves a half-indexed corpus to a customer.
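To make the chunking bullet above concrete, here is a minimal sketch of heading-boundary chunking for Markdown only. The `Chunk` shape, its field names, and `chunkMarkdown` are hypothetical illustrations, not Stacklane's code. PDFs, source code, and transcripts would each need their own splitter.

```typescript
// Hypothetical chunk shape: parent document, position, and structural breadcrumb.
interface Chunk {
  docId: string;
  position: number; // 0-based order of the chunk within its document
  breadcrumb: string[]; // heading path above the chunk, outermost first
  text: string;
}

// Split Markdown on ATX headings (# to ######). Each chunk is a heading plus the
// body up to the next heading; text before the first heading becomes its own chunk.
// Simplification: "#" lines inside fenced code blocks are not special-cased.
function chunkMarkdown(docId: string, markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  const headingStack: { level: number; title: string }[] = [];
  let currentPath: string[] = [];
  let buffer: string[] = [];

  const flush = (): void => {
    const text = buffer.join("\n").trim();
    if (text.length > 0) {
      chunks.push({ docId, position: chunks.length, breadcrumb: currentPath, text });
    }
    buffer = [];
  };

  for (const line of markdown.split("\n")) {
    const heading = /^(#{1,6})\s+(.+)$/.exec(line);
    if (heading) {
      flush(); // close the previous section under its own breadcrumb
      const level = heading[1].length;
      // Pop siblings and deeper headings so the stack holds only ancestors.
      while (headingStack.length > 0 && headingStack[headingStack.length - 1].level >= level) {
        headingStack.pop();
      }
      headingStack.push({ level, title: heading[2].trim() });
      currentPath = headingStack.map((h) => h.title);
    }
    buffer.push(line);
  }
  flush();
  return chunks;
}
```

A heading stack, rather than a flat list indexed by level, keeps the breadcrumb correct when a document skips heading levels or returns to a shallower one.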
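The hybrid retrieval bullet doesn't say how the lexical and vector result lists are merged. This sketch assumes reciprocal rank fusion (RRF), one common choice, on top of an in-memory BM25 and cosine similarity. A production system would query persistent indexes (for example pgvector for the vectors) rather than score in memory. `hybridSearch` and its helpers are hypothetical names, and the cross-encoder rerank that would run on the fused list is not shown.

```typescript
// Lexical tokenizer: lowercase, split on anything that is not a letter, digit,
// underscore, or hyphen, so identifiers like ERR_TIMEOUT survive intact.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9_-]+/).filter((t) => t.length > 0);
}

// In-memory Okapi BM25 with the common defaults k1 = 1.2, b = 0.75.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const docTokens = docs.map(tokenize);
  const avgLen = docTokens.reduce((sum, t) => sum + t.length, 0) / Math.max(docs.length, 1);
  const terms = tokenize(query);
  return docTokens.map((tokens) => {
    let score = 0;
    for (const term of terms) {
      const tf = tokens.filter((t) => t === term).length;
      if (tf === 0) continue;
      const df = docTokens.filter((d) => d.includes(term)).length;
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * tokens.length) / avgLen));
    }
    return score;
  });
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Document indices sorted by score, best first; scores <= minScore are dropped.
function rankByScore(scores: number[], minScore = -Infinity): number[] {
  return scores
    .map((score, index) => ({ score, index }))
    .filter((s) => s.score > minScore)
    .sort((x, y) => y.score - x.score)
    .map((s) => s.index);
}

// Reciprocal rank fusion: each ranking adds 1 / (k + rank) for every document it lists.
function reciprocalRankFusion(rankings: number[][], k = 60): number[] {
  const fused = new Map<number, number>();
  for (const ranking of rankings) {
    ranking.forEach((docIndex, r) => {
      fused.set(docIndex, (fused.get(docIndex) ?? 0) + 1 / (k + r + 1));
    });
  }
  return Array.from(fused.entries())
    .sort((x, y) => y[1] - x[1])
    .map(([docIndex]) => docIndex);
}

function hybridSearch(query: string, queryVec: number[], docs: string[], docVecs: number[][], topN: number): number[] {
  const lexical = rankByScore(bm25Scores(query, docs), 0); // only docs sharing a query term
  const semantic = rankByScore(docVecs.map((v) => cosine(queryVec, v)));
  return reciprocalRankFusion([lexical, semantic]).slice(0, topN);
}

// Usage: toy 2-d vectors stand in for real embeddings. The query embedding is
// closest to document 1, but the exact error code pulls document 0 to the top.
const exampleDocs = ["Fix ERR_TIMEOUT by raising the limit", "General notes about network timeouts", "Unrelated cooking recipe"];
console.log(hybridSearch("ERR_TIMEOUT", [1, 0], exampleDocs, [[0, 1], [1, 0], [-1, 0]], 2)); // → [ 0, 1 ]
```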
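The two metrics named in the eval bullet can be defined precisely in a few lines. This sketch assumes each labeled question lists the IDs of the chunks that should be retrieved for it. `LabeledQuery`, `evaluate`, and the `retrieve` callback are hypothetical names for whatever the real harness uses.

```typescript
// Hypothetical labeled-set entry: a question and the chunk IDs a good retriever returns for it.
interface LabeledQuery {
  question: string;
  relevantChunkIds: string[];
}

// Recall@k: fraction of the relevant chunks that appear in the top k results.
function recallAtK(retrieved: string[], relevant: string[], k: number): number {
  if (relevant.length === 0) return 0;
  const topK = new Set(retrieved.slice(0, k));
  return relevant.filter((id) => topK.has(id)).length / relevant.length;
}

// Reciprocal rank: 1 / (1-based position of the first relevant result), or 0 if none.
function reciprocalRank(retrieved: string[], relevant: string[]): number {
  const relevantSet = new Set(relevant);
  const index = retrieved.findIndex((id) => relevantSet.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}

// Average both metrics over the labeled set. `retrieve` stands in for the
// retrieval layer under test (hypothetical signature).
function evaluate(
  queries: LabeledQuery[],
  retrieve: (question: string) => string[],
  k: number,
): { recallAtK: number; mrr: number } {
  if (queries.length === 0) return { recallAtK: 0, mrr: 0 };
  let recallSum = 0;
  let rrSum = 0;
  for (const q of queries) {
    const results = retrieve(q.question);
    recallSum += recallAtK(results, q.relevantChunkIds, k);
    rrSum += reciprocalRank(results, q.relevantChunkIds);
  }
  return { recallAtK: recallSum / queries.length, mrr: rrSum / queries.length };
}
```

Recall@k rewards getting every relevant chunk into the context window, while MRR rewards putting the first relevant one near the top. Tracking both catches a change that improves one at the expense of the other.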
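For the citations bullet, a minimal sketch of the binding step, assuming the prompt numbers the retrieved chunks `[1]` to `[n]` and the model cites with those markers. Each marker resolves to a document ID and character span the UI could highlight. Markers that point at no retrieved chunk, and sentences with no marker at all, are returned separately so they can be flagged. The marker format, the naive sentence splitting, and every name here are illustrative assumptions.

```typescript
// Hypothetical retrieved-chunk shape: source document plus the character span
// it covers there, which is what the UI would highlight.
interface RetrievedChunk {
  docId: string;
  start: number;
  end: number;
}

interface Footnote extends RetrievedChunk {
  marker: number;
}

interface CitationReport {
  footnotes: Footnote[]; // markers that resolve to a retrieved chunk
  invalid: number[]; // markers with no matching chunk: cited, but never retrieved
  uncited: string[]; // sentences that carry no marker at all
}

// Assumes chunk i was shown to the model as [i + 1] and it cites with [n] markers.
function bindCitations(answer: string, chunks: RetrievedChunk[]): CitationReport {
  const footnotes: Footnote[] = [];
  const invalid: number[] = [];
  const seen = new Set<number>();
  const markerPattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = markerPattern.exec(answer)) !== null) {
    const marker = Number(match[1]);
    if (seen.has(marker)) continue; // one footnote per distinct marker
    seen.add(marker);
    const chunk = chunks[marker - 1];
    if (chunk) footnotes.push({ ...chunk, marker });
    else invalid.push(marker);
  }
  // Naive sentence split: runs of text ending in ., ! or ?.
  const pieces: string[] = answer.match(/[^.!?]+[.!?]*/g) ?? [];
  const uncited = pieces
    .map((s) => s.trim())
    .filter((s) => s.length > 0 && !/\[\d+\]/.test(s));
  return { footnotes, invalid, uncited };
}
```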
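The refusal bullet comes down to a filter and a prompt. This sketch assumes retrieval scores live on a scale where a single numeric floor is meaningful (reranker scores, for example). The floor value, the prompt wording, and `buildMessages` are hypothetical. The `{ role, content }` shape is a generic chat-message format, not tied to a specific SDK.

```typescript
interface ScoredChunk {
  id: string;
  text: string;
  score: number;
}

interface Message {
  role: "system" | "user";
  content: string;
}

const REFUSAL = "I don't have a source for that.";

// Drop chunks below the relevance floor. If nothing survives, instruct the model
// to refuse; otherwise number the survivors so the answer can cite them as [n].
function buildMessages(question: string, retrieved: ScoredChunk[], floor: number): Message[] {
  const usable = retrieved.filter((c) => c.score >= floor);
  if (usable.length === 0) {
    return [
      { role: "system", content: `No sources are available. Reply with exactly: "${REFUSAL}"` },
      { role: "user", content: question },
    ];
  }
  const context = usable.map((c, i) => `[${i + 1}] ${c.text}`).join("\n\n");
  return [
    {
      role: "system",
      content:
        `Answer only from the numbered sources below and cite them as [n]. ` +
        `If they do not contain the answer, reply with exactly: "${REFUSAL}"\n\n${context}`,
    },
    { role: "user", content: question },
  ];
}
```

A variant is to skip the LLM call entirely when nothing clears the floor and return the refusal string directly, which saves a request and removes any chance of the model ignoring the instruction.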
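For the re-indexing bullet, one way to get an atomic swap in Postgres is to rename tables inside a single transaction, since Postgres DDL is transactional. The gate in this sketch permits the cutover only if the shadow index does not regress on recall@k or MRR. That criterion, the `chunks` / `chunks_shadow` / `chunks_retired` table names, and `planReindexCutover` are assumptions for illustration.

```typescript
// Metrics produced by the retrieval eval suite for one index.
interface EvalResult {
  recallAtK: number;
  mrr: number;
}

// Statements for an atomic cutover: both renames commit together or not at all.
// Table names are trusted constants here; never interpolate user input into DDL.
function swapStatements(live: string, shadow: string): string[] {
  const retired = `${live}_retired`;
  return [
    "BEGIN",
    `DROP TABLE IF EXISTS ${retired}`,
    `ALTER TABLE ${live} RENAME TO ${retired}`, // keep the old index around for rollback
    `ALTER TABLE ${shadow} RENAME TO ${live}`,
    "COMMIT",
  ];
}

// Hypothetical gate: cut over only if the shadow index does not regress on either
// metric by more than `tolerance`. Returns null to keep serving the live table.
function planReindexCutover(live: EvalResult, shadow: EvalResult, tolerance = 0): string[] | null {
  const regressed =
    shadow.recallAtK < live.recallAtK - tolerance || shadow.mrr < live.mrr - tolerance;
  return regressed ? null : swapStatements("chunks", "chunks_shadow");
}
```

The statements must run on one connection, because a transaction belongs to a single session. `ALTER TABLE ... RENAME` takes an exclusive lock, so queries issued during the swap wait briefly rather than see a partial table. Views and foreign keys reference tables by OID rather than by name, so they would follow the old table to `chunks_retired`. This sketch assumes application queries name `chunks` directly.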

Where this fits

  1. You shipped a RAG demo that works on the seed corpus and fails the moment the customer's real documents arrive.

  2. Your AI feature is generating answers, but the support team can't verify them because there are no citations.

  3. You're embedding documents with one model version, querying with another, and the relevance has been drifting for months.

Tech stack

  • TypeScript
  • pgvector
  • OpenAI Embeddings
  • Postgres
  • BullMQ

Want this for your team?

30 minutes with a founder or senior engineer. We'll scope what you need and tell you straight whether Stacklane fits.

