www.braintrust.dev
76.76.21.93  Public Scan

Submitted URL: http://braintrust.dev/
Effective URL: https://www.braintrust.dev/
Submission: On November 18 via manual from IN — Scanned from DE

SHIP LLM PRODUCTS THAT WORK.

Braintrust is the end-to-end platform for building world-class AI apps.



EVALUATE YOUR PROMPTS AND MODELS

Non-deterministic models and unpredictable natural language inputs make building
robust LLM applications difficult. Adapt your development lifecycle for the AI
era with Braintrust's iterative LLM workflows.

Easily answer questions like “which examples regressed when we changed the
prompt?” or “what happens if I try this new model?”
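To make the first question concrete, here is a minimal sketch of what "which examples regressed?" means when you compare two eval runs. It assumes each run produces a per-example score keyed by a stable example id; the function `find_regressions` and the data layout are hypothetical illustrations, not Braintrust's SDK or data model.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return ids of examples whose score dropped by more than `tolerance`.

    `baseline` and `candidate` map a (hypothetical) example id to the score
    that example received in the old and new run, respectively.
    """
    regressed = []
    for example_id, old_score in baseline.items():
        new_score = candidate.get(example_id)
        if new_score is None:
            continue  # example missing from the new run; nothing to compare
        if old_score - new_score > tolerance:
            regressed.append(example_id)
    return sorted(regressed)


old_prompt = {"ex-1": 0.9, "ex-2": 0.7, "ex-3": 0.4}
new_prompt = {"ex-1": 0.95, "ex-2": 0.5, "ex-3": 0.4}
print(find_regressions(old_prompt, new_prompt))  # ['ex-2']
```

A nonzero `tolerance` is one way to ignore small score jitter from non-deterministic models.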

Name                  Score  Model               Tokens  Duration  Est. cost
Translation           51     o1 mini             1,539   8.07s     ~$0.018
Levenshtein distance  67     Mistral Nemo        1,021   4.83s     ~$0.002
Similarity            89     GPT-4o              2,992   12.44s    ~$0.010
Moderation            60     Claude 3.5 Sonnet   1,958   11.24s    ~$0.008
Security              54     Gemini Pro          2,610   9.23s     ~$0.008
Hallucination         33     Llama 3.5           1,620   10.2s     ~$0.014
Summary               29     Sonar large online  1,004   12.2s     ~$0.004


ANATOMY OF AN EVAL

Braintrust evals are composed of three components—a prompt, scorers, and a
dataset of examples.
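A minimal sketch of how the three components fit together, assuming a prompt template, a model call, a list of scorer functions, and a small dataset of input/expected pairs. `fake_model`, `exact_match`, and `run_eval` are hypothetical names standing in for a real LLM call and a real eval harness; this is not Braintrust's `Eval` API.

```python
from statistics import mean
from typing import Callable, Dict, List

# A scorer takes (input, output, expected) and returns a score in [0, 1].
Scorer = Callable[[str, str, str], float]

# Hypothetical dataset of examples.
DATASET: List[Dict[str, str]] = [
    {"input": "Hello", "expected": "Hola"},
    {"input": "Goodbye", "expected": "Hasta luego"},
]

# Hypothetical prompt template; {input} is filled from each example.
PROMPT = "Translate the following English text to Spanish: {input}"


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call; returns canned answers for the demo."""
    canned = {"Hello": "Hola", "Goodbye": "Adios"}
    text = prompt.rsplit(": ", 1)[-1]
    return canned.get(text, "")


def exact_match(input_text: str, output: str, expected: str) -> float:
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_eval(dataset, prompt: str, model, scorers: List[Scorer]) -> Dict[str, float]:
    """Run every example through the prompt and model, then average each scorer."""
    results: Dict[str, List[float]] = {s.__name__: [] for s in scorers}
    for example in dataset:
        output = model(prompt.format(input=example["input"]))
        for scorer in scorers:
            results[scorer.__name__].append(
                scorer(example["input"], output, example["expected"])
            )
    return {name: mean(scores) for name, scores in results.items()}


print(run_eval(DATASET, PROMPT, fake_model, [exact_match]))  # {'exact_match': 0.5}
```

Swapping the prompt, the model, or the dataset while keeping the scorers fixed is what makes two runs comparable.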


PROMPT

Tweak LLM prompts from any AI provider, run them, and track their performance
over time. Seamlessly and securely sync your prompts with your code.



SCORERS

Use industry-standard autoevals or write your own scorers in code or natural
language. A scorer takes an input, the LLM's output, and an expected value, and
returns a score.
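As an illustration of a code-based scorer with that (input, output, expected) shape, here is a hand-written sketch that turns Levenshtein edit distance into a similarity score between 0 and 1. The names `levenshtein` and `levenshtein_score` are hypothetical; this is not the autoevals implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_score(input_text: str, output: str, expected: str) -> float:
    """Normalize edit distance into a similarity score in [0, 1].

    `input_text` is unused here but kept so the signature matches other scorers.
    """
    longest = max(len(output), len(expected))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(output, expected) / longest


print(levenshtein("kitten", "sitting"))           # 3
print(levenshtein_score("color?", "grey", "gray"))  # 0.75
```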



DATASET

Capture rated examples from staging and production and incorporate them into
“golden” datasets. Datasets are integrated, versioned, scalable, and secure.
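One way to picture "capture rated examples and incorporate them into a versioned dataset" is the sketch below. It assumes each logged example carries an optional human rating and that the dataset is append-only, so any earlier version can be reconstructed. `LoggedExample`, `GoldenDataset`, `promote`, and the 1 to 5 rating scale are hypothetical, not Braintrust's dataset API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LoggedExample:
    input: str
    output: str
    rating: Optional[int]  # hypothetical human rating (1-5); None if unrated
    env: str               # "staging" or "production"


@dataclass
class GoldenDataset:
    """Append-only dataset: each insert bumps the version so old snapshots stay reproducible."""
    records: List[Dict] = field(default_factory=list)
    version: int = 0

    def add(self, example: LoggedExample) -> None:
        self.version += 1
        self.records.append(
            {"input": example.input, "expected": example.output, "_version": self.version}
        )

    def snapshot(self, version: int) -> List[Dict]:
        """Return the records as they existed at `version`."""
        return [r for r in self.records if r["_version"] <= version]


def promote(logs: List[LoggedExample], dataset: GoldenDataset, min_rating: int = 4) -> None:
    """Copy well-rated logged examples into the golden dataset."""
    for ex in logs:
        if ex.rating is not None and ex.rating >= min_rating:
            dataset.add(ex)


logs = [
    LoggedExample("a", "A", 5, "production"),
    LoggedExample("b", "B", 2, "staging"),
    LoggedExample("c", "C", None, "production"),
    LoggedExample("d", "D", 4, "staging"),
]
ds = GoldenDataset()
promote(logs, ds)
print([r["input"] for r in ds.snapshot(ds.version)])  # ['a', 'd']
```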



FEATURES FOR EVERYONE

Intuitively designed for both technical and non-technical team members, and
synced between code and UI.


TRACES

Visualize and analyze LLM execution traces in real time to debug and optimize
your AI apps.
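A trace is a tree of timed spans, for example one request that contains a retrieval step and an LLM call. The single-threaded sketch below records such a tree with a context manager and renders it with durations. `Tracer` and `Span` are hypothetical classes for illustration, not Braintrust's tracing API.

```python
import time
from contextlib import contextmanager
from typing import List, Optional


class Span:
    def __init__(self, name: str, parent: Optional["Span"] = None):
        self.name = name
        self.parent = parent
        self.children: List["Span"] = []
        self.start = time.perf_counter()
        self.end: Optional[float] = None

    @property
    def duration(self) -> float:
        # An unfinished span reports its duration so far.
        end = self.end if self.end is not None else time.perf_counter()
        return end - self.start


class Tracer:
    """Single-threaded tracer: nesting follows the order of `with` blocks."""

    def __init__(self):
        self.roots: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str):
        parent = self._stack[-1] if self._stack else None
        s = Span(name, parent)
        (parent.children if parent else self.roots).append(s)
        self._stack.append(s)
        try:
            yield s
        finally:
            s.end = time.perf_counter()
            self._stack.pop()

    def render(self) -> str:
        lines: List[str] = []

        def walk(span: Span, depth: int) -> None:
            lines.append(f"{'  ' * depth}{span.name} ({span.duration * 1000:.1f} ms)")
            for child in span.children:
                walk(child, depth + 1)

        for root in self.roots:
            walk(root, 0)
        return "\n".join(lines)


tracer = Tracer()
with tracer.span("answer_question"):
    with tracer.span("retrieve_docs"):
        time.sleep(0.01)
    with tracer.span("llm_call"):
        time.sleep(0.02)
print(tracer.render())
```

The printed timings vary from run to run, so no expected output is shown.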



MONITORING

Monitor real-world AI interactions with insights to ensure your models perform
optimally in production.



ONLINE EVALS

Continuously evaluate with automatic, asynchronous server-side scoring as you
upload logs.
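The idea behind online evals is that uploading a log returns immediately while scoring happens in the background. The sketch below models this with an `asyncio.Queue` and one background worker, plus an optional sampling rate. `upload_logs`, `score_worker`, and `sample_rate` are hypothetical names. This is a local, in-process illustration, not how Braintrust's server-side scoring is implemented.

```python
import asyncio
import contextlib
import random
from typing import Callable, Dict, List

Scorer = Callable[[str, str], float]  # (input, output) -> score


async def score_worker(queue: asyncio.Queue, scorers: Dict[str, Scorer],
                       results: List[Dict], sample_rate: float) -> None:
    """Background task: pull logs off the queue and score a sampled subset."""
    while True:
        log = await queue.get()
        try:
            if random.random() < sample_rate:
                scores = {name: fn(log["input"], log["output"]) for name, fn in scorers.items()}
                results.append({**log, "scores": scores})
        finally:
            queue.task_done()


async def upload_logs(logs: List[Dict], scorers: Dict[str, Scorer],
                      sample_rate: float = 1.0) -> List[Dict]:
    queue: asyncio.Queue = asyncio.Queue()
    results: List[Dict] = []
    worker = asyncio.create_task(score_worker(queue, scorers, results, sample_rate))
    for log in logs:
        await queue.put(log)  # "upload" returns right away; scoring is asynchronous
    await queue.join()        # for the demo, wait until every log has been processed
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return results


def non_empty(input_text: str, output: str) -> float:
    return 1.0 if output.strip() else 0.0


logs = [{"input": "q1", "output": "answer"}, {"input": "q2", "output": "  "}]
scored = asyncio.run(upload_logs(logs, {"non_empty": non_empty}))
print([r["scores"]["non_empty"] for r in scored])  # [1.0, 0.0]
```

With `sample_rate` below 1.0, only a random fraction of logs is scored, which is one common way to keep scoring costs bounded.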



FUNCTIONS

Define functions in TypeScript and Python, and use them as custom scorers or
callable tools.
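To show how one function definition can serve either role, here is a sketch of a small registry that records each function's kind, parameter names, and docstring so it can be called by name. The `register` decorator, `REGISTRY`, and `call` are hypothetical and only illustrate the idea; they are not Braintrust's functions API.

```python
import inspect
from typing import Any, Callable, Dict

# Hypothetical in-memory registry: name -> metadata and callable.
REGISTRY: Dict[str, Dict[str, Any]] = {}


def register(kind: str):
    """Register a function as a 'scorer' or a 'tool'."""
    if kind not in ("scorer", "tool"):
        raise ValueError(f"unknown kind: {kind}")

    def decorator(fn: Callable) -> Callable:
        REGISTRY[fn.__name__] = {
            "kind": kind,
            "fn": fn,
            "params": list(inspect.signature(fn).parameters),
            "description": inspect.getdoc(fn) or "",
        }
        return fn

    return decorator


@register("scorer")
def contains_expected(input_text: str, output: str, expected: str) -> float:
    """1.0 if the expected answer appears in the output (case-insensitive)."""
    return 1.0 if expected.lower() in output.lower() else 0.0


@register("tool")
def add(a: float, b: float) -> float:
    """Add two numbers; an example of a tool an LLM could call."""
    return a + b


def call(name: str, **kwargs: Any) -> Any:
    """Invoke a registered function by name with keyword arguments."""
    return REGISTRY[name]["fn"](**kwargs)


print(call("add", a=2, b=3))  # 5
```

Storing parameter names and a description alongside the callable is what would let the same entry be offered to an LLM as a tool or run as a scorer.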



SELF-HOSTING

Deploy and run Braintrust on your own infrastructure for full control over your
data and compliance requirements.




JOIN INDUSTRY LEADERS

“Braintrust fills the missing (and critical!) gap of evaluating
non-deterministic AI systems.”
Mike Knoop
Cofounder/Head of AI
“I’ve never seen a workflow transformation like the one that incorporates evals
into ‘mainstream engineering’ processes before. It’s astonishing.”
Malte Ubl
CTO
“Braintrust finally brings end-to-end testing to AI products, helping companies
produce meaningful quality metrics.”
Michele Catasta
President
“We log everything to Braintrust. They make it very easy to find and fix
issues.”
Simon Last
Cofounder
“Every new AI project starts with evals in Braintrust—it’s a game changer.”
Lee Weisberger
Eng. Manager, AI