lemmas
Reliability primitives for any LLM API. No framework, no SDK lock-in -- just small modules that wrap whatever provider you use.
from lemmas import cove
from lemmas.adapters import openai_complete
from openai import OpenAI
complete = openai_complete(OpenAI(), model="gpt-4o-mini")
r = cove(complete, query="Who invented the laser?")
print(r.final)
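To make the four CoVe steps concrete, here is a minimal standalone sketch of the loop. It is not the lemmas implementation: the function name `cove_sketch`, the prompt wording, and the assumption that `complete` is a plain "prompt in, text out" callable are all illustrative.

```python
from typing import Callable

# Assumed shape of a completion callable: prompt string in, text out.
Complete = Callable[[str], str]


def cove_sketch(complete: Complete, query: str) -> str:
    """Illustrative Chain-of-Verification loop (hypothetical helper)."""
    # 1. Draft an initial answer.
    draft = complete(f"Answer the question.\nQuestion: {query}")

    # 2. Plan verification questions, one per line.
    plan = complete(
        "List short fact-checking questions for this answer, one per line.\n"
        f"Question: {query}\nAnswer: {draft}"
    )
    questions = [line.strip() for line in plan.splitlines() if line.strip()]

    # 3. Answer each verification question independently. The draft is left
    #    out of these prompts so its mistakes are not simply repeated.
    checks = [(q, complete(f"Answer concisely.\nQuestion: {q}")) for q in questions]

    # 4. Revise the draft in light of the verification answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in checks)
    return complete(
        "Revise the draft so it is consistent with the verified facts.\n"
        f"Question: {query}\nDraft: {draft}\nVerified facts:\n{evidence}"
    )
```

One draft, one plan, one call per verification question, and one revision: the cost is roughly `3 + len(questions)` completions per query, which is worth keeping in mind when choosing a model.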
What's in the box
| Primitive | Paper | What it does |
|---|---|---|
| `cove` | Dhuliawala 2023 | Generate, plan verification questions, answer each independently, revise. |
| `self_consistency` | Wang 2022 | Sample N, vote on plurality answer. |
| `best_of_n` | classic | Sample N, score each, pick best. |
| `reflexion` | Shinn 2023 | Try -> critique -> retry loop with feedback. |
| `debate` | Du, Li, Mordatch 2023 | Multi-agent debate; agents revise after seeing others. |
| `DriftDetector` | rolling centroid | Detect prompt drift per bucket via z-score. |
| `race` | Dean & Barroso 2013 | Hedged execution: race N callables, first wins. |
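Two of these rows are small enough to sketch in a few lines each. The block below shows a plurality vote in the style of self-consistency and a thread-based hedged race. The names `plurality_vote` and `race_first` are hypothetical, the answer normalization (strip and lowercase) is an assumption, and none of this is taken from the lemmas source. It needs Python 3.9+ for `cancel_futures`.

```python
from collections import Counter
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def plurality_vote(sample: Callable[[], str], n: int = 5) -> str:
    """Draw n samples and return the most common normalized answer."""
    answers = [sample().strip().lower() for _ in range(n)]
    # most_common(1) returns [(answer, count)]; on a tie, the answer that
    # was seen first wins.
    return Counter(answers).most_common(1)[0][0]


def race_first(callables: Sequence[Callable[[], T]]) -> T:
    """Start every callable in its own thread and return the first result."""
    pool = ThreadPoolExecutor(max_workers=len(callables))
    try:
        futures = [pool.submit(fn) for fn in callables]
        done, _pending = wait(futures, return_when=FIRST_COMPLETED)
        # result() re-raises if the first finisher failed; a fuller version
        # might keep waiting for the next success instead.
        return next(iter(done)).result()
    finally:
        # Do not block on the losers. Threads that are already running
        # still finish in the background; only queued work is cancelled.
        pool.shutdown(wait=False, cancel_futures=True)
```

In practice `sample` would be a closure over a completion call with non-zero temperature, and the callables passed to `race_first` would be the same request sent to different providers or replicas.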
Every primitive has an async sibling and accepts an optional tracer for observability.
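As a rough picture of what a sync/async pair with an optional tracer can look like, here is a sketch built around a best-of-N selection. Everything named here is an assumption for illustration: the `Tracer` protocol with a single `event` method, the `ListTracer` collector, the event name `best_of_n.done`, and the `best_of_n_sketch` / `abest_of_n_sketch` functions are not the lemmas API.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional, Protocol, Tuple


class Tracer(Protocol):
    """Hypothetical tracer shape: any object with an `event` method fits."""

    def event(self, name: str, **fields: Any) -> None: ...


class ListTracer:
    """Collects events in memory; a real tracer would forward them on."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, dict]] = []

    def event(self, name: str, **fields: Any) -> None:
        self.events.append((name, fields))


def best_of_n_sketch(
    sample: Callable[[], str],
    score: Callable[[str], float],
    n: int = 3,
    tracer: Optional[Tracer] = None,
) -> str:
    """Sample n candidates one after another and keep the highest scoring."""
    candidates = [sample() for _ in range(n)]
    best = max(candidates, key=score)
    if tracer is not None:
        tracer.event("best_of_n.done", n=n, best=best)
    return best


async def abest_of_n_sketch(
    sample: Callable[[], Awaitable[str]],
    score: Callable[[str], float],
    n: int = 3,
    tracer: Optional[Tracer] = None,
) -> str:
    """Async sibling: same selection logic, samples awaited concurrently."""
    candidates = await asyncio.gather(*(sample() for _ in range(n)))
    best = max(candidates, key=score)
    if tracer is not None:
        tracer.event("best_of_n.done", n=n, best=best)
    return best
```

Keeping the tracer as a structural protocol means an adapter for any observability backend only has to implement one method, and passing `tracer=None` costs nothing.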
Why lemmas exists
Modern LLM platforms (LangChain, LlamaIndex, LiteLLM) give you routing and abstractions. They don't give you the inference-time reliability methods from the research literature, so you end up reimplementing CoVe and self-consistency by hand in every project. Lemmas implements those methods as small modules designed to drop into any stack.
What lemmas does NOT do
- No router (use LiteLLM, Portkey, or your own).
- No observability backend (use Langfuse, Phoenix, Datadog -- lemmas just emits events via a tracer protocol).
- No retrieval / RAG (use LlamaIndex, LangChain, a real vector DB).
- No agent loop framework.
These omissions are deliberate. Lemmas is a small library, not a framework.