Agentic Hub
Evals & Observability · TypeScript

langfuse

langfuse/langfuse

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

Stars: 25k
Forks: 2.5k
Last push: today
License: NOASSERTION
Good first issues: 0
Help wanted: 0

Topics

analytics, autogen, evaluation, langchain, large-language-models, llama-index, llm, llm-evaluation
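
For a sense of how the integrations listed above are used, here is a minimal tracing sketch with the langfuse TypeScript SDK. The keys, model name, and input/output values are placeholders, not real project data:

```ts
import { Langfuse } from "langfuse";

async function main() {
  // Keys come from the project settings in Langfuse Cloud or a self-hosted instance.
  const langfuse = new Langfuse({
    publicKey: process.env.LANGFUSE_PUBLIC_KEY,
    secretKey: process.env.LANGFUSE_SECRET_KEY,
    baseUrl: "https://cloud.langfuse.com", // or your self-hosted URL
  });

  // One trace per user request; generations and spans nest under it.
  const trace = langfuse.trace({ name: "support-chat", userId: "user-123" });

  const generation = trace.generation({
    name: "answer",
    model: "gpt-4o-mini", // placeholder model name
    input: [{ role: "user", content: "How do I reset my password?" }],
  });

  // ... call your LLM here (OpenAI SDK, LiteLLM, Langchain, etc.) ...

  generation.end({ output: "You can reset it under Settings -> Security." });

  // Flush buffered events before the process exits.
  await langfuse.shutdownAsync();
}

main();
```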

More Evals & Observability


promptfoo

promptfoo/promptfoo
Evals & Observability

Test your prompts, agents, and RAG pipelines. Red-team, pentest, and vulnerability-scan AI applications. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration (see the config sketch below). Used by OpenAI and Anthropic.

20k stars · 1.7k forks · TypeScript · last push today
No open beginner issues
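
Those declarative configs are plain YAML. A minimal sketch, with placeholder prompt, providers, and assertion values (illustrative, not copied from the promptfoo docs):

```yaml
# promptfooconfig.yaml -- all values below are illustrative
prompts:
  - "Answer concisely: {{question}}"

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-haiku-20241022

tests:
  - vars:
      question: "What port does HTTPS use by default?"
    assert:
      - type: contains
        value: "443"
```

Running `npx promptfoo@latest eval` in the same directory executes the test matrix against both providers and prints a side-by-side comparison.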

opik

comet-ml/opik
Evals & Observability

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

19k stars · 1.4k forks · Python · last push today
No open beginner issues

evals

openai/evals
Evals & Observability

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

18k stars · 2.9k forks · Python · last push 2d ago
No open beginner issues
Agentic Hub

Your launchpad into the agentic AI open source ecosystem. AI-curated projects, beginner scores, and live status. Updated weekly.


Resources

  • Good First Issue
  • Up For Grabs
  • GitHub: ai-agents

MIT License. Built with Next.js on Cloudflare Pages. Data refreshed weekly via GitHub Actions + AI enrichment.
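
As a rough illustration of that weekly refresh, a scheduled GitHub Actions workflow could look like the sketch below; the workflow name and the `refresh-data` script are hypothetical, not the site's actual pipeline:

```yaml
# .github/workflows/refresh.yml -- hypothetical sketch, not the real pipeline
name: weekly-data-refresh
on:
  schedule:
    - cron: "0 6 * * 1" # Mondays, 06:00 UTC
  workflow_dispatch: {}

jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      # Hypothetical script: fetch GitHub stats, run AI enrichment, commit results.
      - run: npm run refresh-data
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```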
