lemmas

Reliability primitives for any LLM API. No framework, no SDK lock-in -- just small modules that wrap whatever provider you use.

pip install lemmas

from lemmas import cove
from lemmas.adapters import openai_complete
from openai import OpenAI

complete = openai_complete(OpenAI(), model="gpt-4o-mini")
r = cove(complete, query="Who invented the laser?")
print(r.final)

What's in the box

Primitive	Paper / basis	What it does
`cove`	Dhuliawala 2023	Generate, plan verification questions, answer each independently, revise.
`self_consistency`	Wang 2022	Sample N, vote on plurality answer.
`best_of_n`	classic	Sample N, score each, pick best.
`reflexion`	Shinn 2023	Try -> critique -> retry loop with feedback.
`debate`	Du, Li, Mordatch 2023	Multi-agent debate; agents revise after seeing others.
`DriftDetector`	rolling centroid	Detect prompt drift per bucket via z-score.
`race`	Dean & Barroso 2013	Hedged execution: race N callables, first wins.

Every primitive has an async sibling and accepts an optional tracer for observability.
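
As an illustration of the hedged-execution idea behind `race` in its async form, the sketch below starts several coroutines and returns whichever finishes first, cancelling the rest. It is a rough sketch of the pattern using only `asyncio`, not the library's code; `race_first`, `slow`, and `fast` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def race_first(*calls: Callable[[], Awaitable[T]]) -> T:
    """Start every call concurrently and return the first result to arrive."""
    tasks = [asyncio.ensure_future(call()) for call in calls]
    try:
        done, _pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        # If the first finisher raised, .result() re-raises here;
        # this sketch does not fall back to the slower calls.
        return next(iter(done)).result()
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
        # Wait for the cancellations to settle so no task is left dangling.
        await asyncio.gather(*tasks, return_exceptions=True)


async def slow() -> str:
    await asyncio.sleep(0.5)
    return "slow"


async def fast() -> str:
    await asyncio.sleep(0.01)
    return "fast"


result = asyncio.run(race_first(slow, fast))
print(result)  # prints "fast"
```

Cancelling the losers matters with paid APIs: a request left running after the winner returns may still be billed.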

Why lemmas exists

LLM frameworks and gateways such as LangChain, LlamaIndex, and LiteLLM give you routing and abstractions. They don't give you the inference-time reliability methods from the research literature, so you end up reimplementing CoVe (Chain-of-Verification) and self-consistency by hand in every project. Lemmas aims to be a low-friction implementation of those methods, designed to drop into any stack.

What lemmas does NOT do

No router (use LiteLLM, Portkey, or your own).
No observability backend (use Langfuse, Phoenix, Datadog -- lemmas just emits events via a tracer protocol).
No retrieval / RAG (use LlamaIndex, LangChain, a real vector DB).
No agent loop framework.

These are deliberate. Lemmas is a small library, not a framework.
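
The "emit events via a tracer protocol" split mentioned in the list above could look roughly like the following. Every name here (`Tracer`, `emit`, `ListTracer`, `traced`, the event names) is a hypothetical stand-in chosen for illustration; the actual lemmas protocol may differ, so check the library before relying on any of it.

```python
from typing import Any, Callable, Dict, List, Optional, Protocol, Tuple


class Tracer(Protocol):
    """Hypothetical tracer interface: anything with a matching emit() qualifies."""

    def emit(self, event: str, **fields: Any) -> None: ...


class ListTracer:
    """Collects events in memory; a real tracer would forward them to a backend."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Dict[str, Any]]] = []

    def emit(self, event: str, **fields: Any) -> None:
        self.events.append((event, fields))


def traced(fn: Callable[[str], str], prompt: str, tracer: Optional[Tracer] = None) -> str:
    """Call fn(prompt), emitting start and end events when a tracer is supplied."""
    if tracer is not None:
        tracer.emit("call.start", prompt=prompt)
    output = fn(prompt)
    if tracer is not None:
        tracer.emit("call.end", output=output)
    return output


tracer = ListTracer()
answer = traced(str.upper, "hi", tracer)  # str.upper stands in for a model call
print(answer)
print(tracer.events)
```

Because the protocol is structural, an adapter for Langfuse, Phoenix, or Datadog only needs a matching `emit` method; the library never has to import the backend.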