www.braintrust.dev
Public Scan (IP: 76.76.21.93)
Submitted URL: http://braintrust.dev/
Effective URL: https://www.braintrust.dev/
Submission: On November 18 via manual from IN — Scanned from DE
Form analysis: 0 forms found in the DOM

TEXT CONTENT
Docs · Pricing · Blog · Careers · Chat with us · Sign in · Sign up for free

SHIP LLM PRODUCTS THAT WORK.

Braintrust is the end-to-end platform for building world-class AI apps.

Get started · Chat with us

Trusted by AI teams at [company logos]

Eval · Log · Play · Review

EVALUATE YOUR PROMPTS AND MODELS

Non-deterministic models and unpredictable natural-language inputs make building robust LLM applications difficult. Adapt your development lifecycle for the AI era with Braintrust's iterative LLM workflows. Easily answer questions like “which examples regressed when we changed the prompt?” or “what happens if I try this new model?”

Learn more in docs

Playground · Eval via UI · Eval via SDK

Scorer                 Score   Model                Tokens   Duration   Est. cost
Translation            51      o1 mini              1,539    8.07s      ~$0.018
Levenshtein distance   67      Mistral Nemo         1,021    4.83s      ~$0.002
Similarity             89      GPT-4o               2,992    12.44s     ~$0.010
Moderation             60      Claude 3.5 Sonnet    1,958    11.24s     ~$0.008
Security               54      Gemini Pro           2,610    9.23s      ~$0.008
Hallucination          33      Llama 3.5            1,620    10.2s      ~$0.014
Summary                29      Sonar large online   1,004    12.2s      ~$0.004

ANATOMY OF AN EVAL

Braintrust evals are composed of three components: a prompt, scorers, and a dataset of examples. (A minimal SDK sketch of all three follows the testimonials below.)

PROMPT

Tweak LLM prompts from any AI provider, run them, and track their performance over time. Seamlessly and securely sync your prompts with your code. Prompts guide

SCORERS

Use industry-standard autoevals or write your own in code or natural language. Scorers take an input, the LLM output, and an expected value, and generate a score (see the custom scorer sketch below). Scorers guide

DATASET

Capture rated examples from staging and production and incorporate them into “golden” datasets. Datasets are integrated, versioned, scalable, and secure. Datasets guide

FEATURES FOR EVERYONE

Intuitively designed for both technical and non-technical team members, and synced between code and UI.

TRACES

Visualize and analyze LLM execution traces in real time to debug and optimize your AI apps. Tracing guide

MONITORING

Monitor real-world AI interactions with insights that ensure your models perform optimally in production (see the logging sketch below). Logging and monitoring

ONLINE EVALS

Continuously evaluate with automatic, asynchronous server-side scoring as you upload logs. Online evaluation docs

FUNCTIONS

Define functions in TypeScript and Python, and use them as custom scorers or callable tools. Functions reference

SELF-HOSTING

Deploy and run Braintrust on your own infrastructure for full control over your data and compliance requirements. Self-hosting guide

JOIN INDUSTRY LEADERS

“Braintrust fills the missing (and critical!) gap of evaluating non-deterministic AI systems.”
Mike Knoop, Cofounder/Head of AI

“I’ve never seen a workflow transformation like the one that incorporates evals into ‘mainstream engineering’ processes before. It’s astonishing.”
Malte Ubl, CTO

“Braintrust finally brings end-to-end testing to AI products, helping companies produce meaningful quality metrics.”
Michele Catasta, President

“We log everything to Braintrust. They make it very easy to find and fix issues.”
Simon Last, Cofounder

“Every new AI project starts with evals in Braintrust—it’s a game changer.”
Lee Weisberger, Eng. Manager, AI
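The anatomy described above maps directly onto the SDK's Eval entry point. Below is a minimal sketch, assuming the documented Eval() function from the braintrust package and the Levenshtein scorer from autoevals; the project name, examples, and task stub are illustrative placeholders, not taken from this page.

```typescript
// Minimal sketch of an eval's three components, assuming the documented
// Eval() entry point from the braintrust SDK and the Levenshtein scorer
// from autoevals. Project name and examples are placeholders.
import { Eval } from "braintrust";
import { Levenshtein } from "autoevals";

Eval("Translation demo", {
  // DATASET: "golden" input/expected pairs (these could also be pulled
  // from a versioned Braintrust dataset).
  data: () => [
    { input: "Hello", expected: "Bonjour" },
    { input: "Good night", expected: "Bonne nuit" },
  ],
  // PROMPT/TASK: the code under test; replace this stub with a real LLM call.
  task: async (input: string) => {
    return "Bonjour";
  },
  // SCORERS: each one receives the input, the output, and the expected value.
  scores: [Levenshtein],
});
```

Per the docs' quickstart pattern, an eval file like this is executed with the braintrust CLI, which records an experiment with per-example scores.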
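The scorer contract described under SCORERS and FUNCTIONS is just a function over the input, output, and expected value. A sketch of a custom code scorer, with hypothetical names:

```typescript
// Sketch of a custom code scorer following the contract described above:
// it receives the input, the LLM output, and the expected value, and
// returns a score between 0 and 1. The name "exactMatch" is hypothetical.
function exactMatch({
  output,
  expected,
}: {
  input: string;
  output: string;
  expected?: string;
}) {
  return {
    name: "exact_match",
    score: output === expected ? 1 : 0,
  };
}

// It can then be passed alongside (or instead of) built-in autoevals:
//   scores: [Levenshtein, exactMatch]
```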
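Likewise, the TRACES, MONITORING, and ONLINE EVALS features hang off logged LLM calls. A minimal logging sketch, assuming the SDK's documented initLogger and wrapOpenAI helpers; the project name, model, and message are placeholders.

```typescript
// Sketch of production logging: calls through the wrapped client are
// traced to a Braintrust project, where online evals can score incoming
// logs asynchronously. Project name and model are placeholders.
import { initLogger, wrapOpenAI } from "braintrust";
import OpenAI from "openai";

initLogger({ projectName: "My AI app" });

// Every call through the wrapped client is logged as a trace span.
const client = wrapOpenAI(new OpenAI());

const completion = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize our launch notes." }],
});
console.log(completion.choices[0].message.content);
```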
Get started

* Resources: Docs, Eval via UI, Eval via SDK, Guides, Cookbook, Changelog
* Company: Pricing, Blog, Careers, Contact us, Terms of Service, Privacy Policy
* Community: GitHub, Discord, LinkedIn, X