lemmas

Reliability primitives for any LLM API. No framework, no SDK lock-in -- just small modules that wrap whatever provider you use.

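```shell
pip install lemmas
```

```python
from lemmas import cove
from lemmas.adapters import openai_complete
from openai import OpenAI

# Wrap an OpenAI client (it reads OPENAI_API_KEY from the environment)
# as the completion function that lemmas primitives take.
complete = openai_complete(OpenAI(), model="gpt-4o-mini")

# Run Chain-of-Verification and print the revised answer.
r = cove(complete, query="Who invented the laser?")
print(r.final)
```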
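Conceptually, `cove` chains four model calls. The sketch below reproduces that control flow in plain Python so it can run offline against a stub; it is not lemmas' implementation. It assumes a completion function is any `complete(prompt) -> str` callable, and the prompt wording, the one-question-per-line parsing, and the `cove_sketch` / `CoveSketch` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a completion function maps a prompt string to a response string.
Complete = Callable[[str], str]


@dataclass
class CoveSketch:
    draft: str
    questions: list[str]
    answers: list[str]
    final: str


def cove_sketch(complete: Complete, query: str) -> CoveSketch:
    # 1. Draft a baseline answer.
    draft = complete(f"Answer the question.\nQ: {query}\nA:")
    # 2. Plan verification questions (hypothetical format: one per line).
    plan = complete(
        "List short fact-checking questions for this answer, one per line.\n"
        f"Q: {query}\nA: {draft}"
    )
    questions = [line.strip() for line in plan.splitlines() if line.strip()]
    # 3. Answer each question in a fresh prompt that omits the draft,
    #    so mistakes in the draft cannot bias the checks.
    answers = [complete(f"Q: {q}\nA:") for q in questions]
    # 4. Revise the draft against the verification Q/A pairs.
    evidence = "\n".join(f"- {q} -> {a}" for q, a in zip(questions, answers))
    final = complete(
        f"Question: {query}\nDraft: {draft}\nChecks:\n{evidence}\n"
        "Revise the draft so it agrees with the checks."
    )
    return CoveSketch(draft, questions, answers, final)


# Offline stub standing in for a real provider.
def stub_complete(prompt: str) -> str:
    if prompt.startswith("List short"):
        return "check one?\ncheck two?"
    if prompt.startswith("Question:"):
        return "revised answer"
    if prompt.startswith("Answer the question"):
        return "draft answer"
    return "verified fact"


r = cove_sketch(stub_complete, query="Who invented the laser?")
print(r.final)  # revised answer
```

Keeping the draft out of the step-3 prompts mirrors the paper's "factored" variant; answering all questions in one prompt next to the draft is cheaper but more likely to repeat the draft's mistakes.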

What's in the box

| Primitive | Paper / basis | What it does |
|---|---|---|
| `cove` | Dhuliawala et al. 2023 | Generate, plan verification questions, answer each independently, revise. |
| `self_consistency` | Wang et al. 2022 | Sample N, vote for the plurality answer. |
| `best_of_n` | classic | Sample N, score each, pick the best. |
| `reflexion` | Shinn et al. 2023 | Try -> critique -> retry loop with feedback. |
| `debate` | Du et al. 2023 | Multi-agent debate; agents revise after seeing each other's answers. |
| `DriftDetector` | rolling centroid | Detect prompt drift per bucket via z-score. |
| `race` | Dean & Barroso 2013 | Hedged execution: race N callables, first wins. |
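A minimal sketch of the `self_consistency` idea, independent of lemmas' actual signature: draw N completions (normally at a nonzero temperature), reduce each to a final answer, and return the plurality. The `extract_answer` convention (text after the last `Answer:`) and all names here are assumptions for illustration.

```python
import itertools
from collections import Counter
from typing import Callable


def extract_answer(completion: str) -> str:
    # Hypothetical convention: the completion ends with "Answer: <value>".
    return completion.rsplit("Answer:", 1)[-1].strip()


def self_consistency_sketch(
    sample: Callable[[str], str], prompt: str, n: int = 5
) -> tuple[str, float]:
    """Return (plurality answer, share of the n votes it received)."""
    answers = [extract_answer(sample(prompt)) for _ in range(n)]
    # most_common breaks ties by first occurrence.
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n


# Stub sampler: reasoning paths differ, but most land on the same answer.
paths = itertools.cycle([
    "6 * 7 = 42. Answer: 42",
    "6 * 7 = 6 * 6 + 6 = 41. Answer: 41",
    "7 + 7 + 7 + 7 + 7 + 7 = 42. Answer: 42",
])
answer, share = self_consistency_sketch(lambda _: next(paths), "What is 6 * 7?")
print(answer, share)  # 42 0.6
```

Voting only works when answers compare exactly, which is why the extraction step matters; free-form answers usually need normalizing (case, units, whitespace) before they are counted.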
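`best_of_n` swaps the vote for a scorer. In this sketch the scorer is any `score(candidate) -> float` callable, for example a reward model, a unit-test pass rate, or an LLM judge; the function name and parameters are illustrative, not lemmas' API.

```python
import itertools
from typing import Callable


def best_of_n_sketch(
    sample: Callable[[str], str],
    score: Callable[[str], float],
    prompt: str,
    n: int = 4,
) -> tuple[str, float]:
    """Draw n candidates, score each once, return the top (candidate, score)."""
    candidates = [sample(prompt) for _ in range(n)]
    scored = [(c, score(c)) for c in candidates]
    # max() returns the first of several equally scored candidates.
    return max(scored, key=lambda pair: pair[1])


# Stub sampler and a toy scorer that prefers answers stating a unit.
drafts = itertools.cycle(["about 300,000", "299,792 km/s", "very fast"])
best, best_score = best_of_n_sketch(
    lambda _: next(drafts),
    score=lambda c: 1.0 if "km/s" in c else 0.0,
    prompt="Speed of light?",
    n=3,
)
print(best, best_score)  # 299,792 km/s 1.0
```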
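The `reflexion` loop can be sketched as: attempt, evaluate, and on failure ask the model to turn the feedback into a short lesson that is added to the next attempt's prompt. In Shinn et al. these lessons live in an episodic memory; here it is just a list. The `evaluate(attempt) -> (ok, feedback)` hook (for instance, running unit tests) and every name below are assumptions, not lemmas' interface.

```python
from typing import Callable


def reflexion_sketch(
    complete: Callable[[str], str],
    evaluate: Callable[[str], tuple[bool, str]],
    task: str,
    max_tries: int = 3,
) -> tuple[str, bool, list[str]]:
    """Return (last attempt, whether it passed, lessons collected)."""
    lessons: list[str] = []
    attempt = ""
    for _ in range(max_tries):
        prompt = task
        if lessons:
            memory = "\n".join(f"- {lesson}" for lesson in lessons)
            prompt = f"{task}\nLessons from earlier attempts:\n{memory}"
        attempt = complete(prompt)
        ok, feedback = evaluate(attempt)
        if ok:
            return attempt, True, lessons
        # Self-reflection: turn raw feedback into a reusable verbal lesson.
        lessons.append(complete(
            f"Attempt:\n{attempt}\nFeedback: {feedback}\n"
            "In one sentence, what should the next attempt change?"
        ))
    return attempt, False, lessons


# Stubs: the "model" only fixes its bug once a lesson is in the prompt.
def stub_complete(prompt: str) -> str:
    if prompt.startswith("Attempt:"):
        return "Use + instead of -."
    if "Lessons" in prompt:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def run_tests(code: str) -> tuple[bool, str]:
    ns: dict = {}
    exec(code, ns)  # fine for a stub; never exec untrusted model output unsandboxed
    result = ns["add"](2, 3)
    return result == 5, f"add(2, 3) returned {result}"


final, passed, lessons = reflexion_sketch(stub_complete, run_tests, "Write add(a, b).")
print(passed, lessons)  # True ['Use + instead of -.']
```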
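For `debate`, a sketch under stated assumptions: each agent is a `prompt -> str` callable, agents answer independently in round 0, and in each later round every agent sees the other agents' previous answers and may revise its own. Aggregating the last round by plurality vote is an assumption made here; lemmas may aggregate differently, and all names are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

Agent = Callable[[str], str]


def debate_sketch(
    agents: Sequence[Agent], question: str, rounds: int = 2
) -> tuple[str, list[list[str]]]:
    """Return (plurality answer of the final round, answers from every round)."""
    history = [[agent(question) for agent in agents]]  # round 0: independent
    for _ in range(rounds):
        previous = history[-1]
        current = []
        for i, agent in enumerate(agents):
            others = "\n".join(f"- {a}" for j, a in enumerate(previous) if j != i)
            current.append(agent(
                f"{question}\nOther agents answered:\n{others}\n"
                f"Your previous answer: {previous[i]}\nGive your updated answer."
            ))
        history.append(current)
    final, _ = Counter(history[-1]).most_common(1)[0]
    return final, history


# Stub agents: one is confident; the other two defer once they see "Paris".
def confident(prompt: str) -> str:
    return "Paris"


def persuadable(prompt: str) -> str:
    return "Paris" if "Paris" in prompt else "Lyon"


answer, history = debate_sketch([confident, persuadable, persuadable], "Capital of France?")
print(history[0], answer)  # ['Paris', 'Lyon', 'Lyon'] Paris
```

With these stubs a plain vote over the independent round-0 answers would pick Lyon; one round of seeing each other's answers is enough for the agents to converge on Paris.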
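The table only summarizes `DriftDetector`, so the following is one plausible reading rather than its implementation: keep a rolling window of recent prompt embeddings per bucket, measure how far each new embedding lies from the window's centroid, and z-score that distance against the distances seen so far. The class name, window size, threshold, warm-up length, and the choice of Euclidean distance are all assumptions.

```python
import math
import statistics
from collections import defaultdict, deque


class DriftDetectorSketch:
    """Flags embeddings unusually far from their bucket's rolling centroid."""

    def __init__(self, window: int = 50, threshold: float = 3.0, min_history: int = 5):
        self.threshold = threshold
        self.min_history = min_history  # distances needed before z-scores are trusted
        self.vectors = defaultdict(lambda: deque(maxlen=window))
        self.distances = defaultdict(lambda: deque(maxlen=window))

    def observe(self, bucket: str, vec: list[float]) -> bool:
        """Record one embedding; return True if it looks like drift."""
        recent = self.vectors[bucket]
        drifted = False
        if recent:
            centroid = [sum(col) / len(recent) for col in zip(*recent)]
            dist = math.dist(vec, centroid)
            past = self.distances[bucket]
            if len(past) >= self.min_history:
                std = statistics.pstdev(past)
                z = (dist - statistics.fmean(past)) / std if std > 0 else 0.0
                drifted = z > self.threshold
            past.append(dist)
        recent.append(vec)  # simplification: outliers also enter the window
        return drifted


detector = DriftDetectorSketch()
normal = [[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.0, -0.1],
          [0.05, 0.05], [-0.05, 0.05], [0.05, -0.05], [-0.05, -0.05]]
print([detector.observe("support", v) for v in normal])  # all False
print(detector.observe("support", [5.0, 5.0]))  # True
print(detector.observe("billing", [5.0, 5.0]))  # False: new bucket, no history
```

Letting outliers into the window keeps the sketch short, but it also means a sustained shift is gradually absorbed into the centroid and stops being flagged.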
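`race` builds on the hedged-request idea from Dean & Barroso's "The Tail at Scale": send the same request more than once and keep whichever reply arrives first. In the paper the backup request is only sent after a short delay, which limits the extra load; the table's description launches all N at once. The asyncio sketch below supports both through a hypothetical `hedge_delay` parameter (0 means all at once). It assumes each racer is a zero-argument async callable and does not reflect lemmas' signature.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def race_sketch(
    racers: Sequence[Callable[[], Awaitable[T]]], hedge_delay: float = 0.0
) -> T:
    """Return the first successful result and cancel everything still running."""

    async def launch(i: int, racer: Callable[[], Awaitable[T]]) -> T:
        # Racer i starts i * hedge_delay seconds late; if an earlier racer
        # wins first, this task is cancelled while still sleeping.
        await asyncio.sleep(i * hedge_delay)
        return await racer()

    tasks = [asyncio.create_task(launch(i, r)) for i, r in enumerate(racers)]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except Exception:
                continue  # one failing racer should not lose the race
        raise RuntimeError("every racer failed")
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished


async def fast() -> str:
    await asyncio.sleep(0.01)
    return "fast"


async def slow() -> str:
    await asyncio.sleep(1.0)
    return "slow"


async def broken() -> str:
    raise ValueError("provider error")


print(asyncio.run(race_sketch([slow, fast])))  # fast
print(asyncio.run(race_sketch([broken, fast])))  # fast
```

Every extra racer is potentially an extra paid request, so a nonzero `hedge_delay` is the cheaper choice when the first racer usually answers quickly.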

Every primitive has an async sibling and accepts an optional tracer for observability.
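Both conventions in that sentence can be pictured with a generic sketch: the async variant issues the N independent samples concurrently with `asyncio.gather`, and the tracer is any object with an event method, defaulting to a no-op. The `Tracer` protocol, its `event(name, **attrs)` method, and the function name are hypothetical stand-ins; lemmas' own tracer protocol and async names may differ.

```python
import asyncio
import itertools
from collections import Counter
from typing import Any, Awaitable, Callable, Optional, Protocol


class Tracer(Protocol):
    # Hypothetical protocol: one structured event per call.
    def event(self, name: str, **attrs: Any) -> None: ...


class NullTracer:
    """Default used when no tracer is passed: ignores every event."""

    def event(self, name: str, **attrs: Any) -> None:
        pass


async def self_consistency_async_sketch(
    asample: Callable[[str], Awaitable[str]],
    prompt: str,
    n: int = 5,
    tracer: Optional[Tracer] = None,
) -> str:
    if tracer is None:
        tracer = NullTracer()
    tracer.event("sample.start", n=n)
    # The N samples are independent, so they can be in flight at the same time.
    answers = await asyncio.gather(*(asample(prompt) for _ in range(n)))
    winner, votes = Counter(answers).most_common(1)[0]
    tracer.event("vote", winner=winner, votes=votes, n=n)
    return winner


class ListTracer:
    """Collects events in memory, e.g. for tests."""

    def __init__(self) -> None:
        self.events: list[tuple[str, dict]] = []

    def event(self, name: str, **attrs: Any) -> None:
        self.events.append((name, attrs))


replies = itertools.cycle(["A", "B", "A"])


async def stub_sample(prompt: str) -> str:
    await asyncio.sleep(0)  # stand-in for network latency
    return next(replies)


tracer = ListTracer()
print(asyncio.run(self_consistency_async_sketch(stub_sample, "q", tracer=tracer)))  # A
print(tracer.events[-1])  # ('vote', {'winner': 'A', 'votes': 3, 'n': 5})
```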

Why lemmas exists

Modern LLM platforms (LangChain, LlamaIndex, LiteLLM) give you routing and abstractions. They don't give you the inference-time reliability methods from the research literature, so you end up reimplementing CoVe and self-consistency by hand in every project. Lemmas aims to be the lowest-friction implementation of those methods, designed to drop into any stack.

What lemmas does NOT do

  • No router (use LiteLLM, Portkey, or your own).
  • No observability backend (use Langfuse, Phoenix, or Datadog; lemmas only emits events through a tracer protocol).
  • No retrieval / RAG (use LlamaIndex, LangChain, or a dedicated vector DB).
  • No agent loop framework.

These are deliberate. Lemmas is a small library, not a framework.
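Since lemmas leaves the observability backend to you, wiring one up amounts to an adapter that turns events into whatever that backend ingests. As an illustration only, here is a tracer that writes one JSON log line per event through Python's standard `logging` module; the `event(name, **attrs)` shape and the `llm.trace` logger name are assumptions, not lemmas' documented tracer protocol.

```python
import json
import logging
from typing import Any, Optional


class LoggingTracer:
    """Hypothetical tracer adapter: one JSON log record per event."""

    def __init__(self, logger: Optional[logging.Logger] = None, level: int = logging.INFO):
        if logger is None:
            logger = logging.getLogger("llm.trace")  # hypothetical logger name
        self.logger = logger
        self.level = level

    def event(self, name: str, **attrs: Any) -> None:
        # default=str stringifies values json can't encode (datetimes, objects).
        payload = json.dumps({"event": name, **attrs}, default=str, sort_keys=True)
        self.logger.log(self.level, payload)


logging.basicConfig(level=logging.INFO, format="%(name)s %(message)s")
LoggingTracer().event("vote", winner="42", votes=3)
# logs: llm.trace {"event": "vote", "votes": 3, "winner": "42"}
```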