https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
INTRODUCING DBRX: A NEW STATE-OF-THE-ART OPEN LLM
by The Mosaic Research Team
March 27, 2024 in Mosaic AI Research

Today, we are excited to introduce DBRX, an open, general-purpose LLM created by Databricks. Across a range of standard benchmarks, DBRX sets a new state-of-the-art for established open LLMs.
Moreover, it provides the open community and enterprises building their own LLMs with capabilities that were previously limited to closed model APIs; according to our measurements, it surpasses GPT-3.5, and it is competitive with Gemini 1.0 Pro. It is an especially capable code model, surpassing specialized models like CodeLLaMA-70B on programming, in addition to its strength as a general-purpose LLM.

This state-of-the-art quality comes with marked improvements in training and inference performance. DBRX advances the state-of-the-art in efficiency among open models thanks to its fine-grained mixture-of-experts (MoE) architecture. Inference is up to 2x faster than LLaMA2-70B, and DBRX is about 40% of the size of Grok-1 in terms of both total and active parameter counts. When hosted on Mosaic AI Model Serving, DBRX can generate text at up to 150 tok/s/user. Our customers will find that training MoEs is also about 2x more FLOP-efficient than training dense models for the same final model quality. End-to-end, our overall recipe for DBRX (including the pretraining data, model architecture, and optimization strategy) can match the quality of our previous-generation MPT models with nearly 4x less compute.

Figure 1: DBRX outperforms established open source models on language understanding (MMLU), programming (HumanEval), and math (GSM8K).

The weights of the base model (DBRX Base) and the finetuned model (DBRX Instruct) are available on Hugging Face under an open license. Starting today, DBRX is available for Databricks customers to use via APIs, and Databricks customers can pretrain their own DBRX-class models from scratch or continue training on top of one of our checkpoints using the same tools and science we used to build it. DBRX is already being integrated into our GenAI-powered products, where - in applications like SQL - early rollouts have surpassed GPT-3.5 Turbo and are challenging GPT-4 Turbo.
It is also a leading model on RAG tasks among open models and GPT-3.5 Turbo.

Training mixture-of-experts models is hard. We had to overcome a variety of scientific and performance challenges to build a pipeline robust enough to repeatably train DBRX-class models in an efficient manner. Now that we have done so, we have a one-of-a-kind training stack that allows any enterprise to train world-class MoE foundation models from scratch. We look forward to sharing that capability with our customers and sharing our lessons learned with the community.

Download DBRX today from Hugging Face (DBRX Base, DBRX Instruct), or try out DBRX Instruct in our HF Space, or see our model repository on GitHub: databricks/dbrx.

WHAT IS DBRX?

DBRX is a transformer-based decoder-only large language model (LLM) that was trained using next-token prediction. It uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters, of which 36B are active on any input. It was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. DBRX has 16 experts and chooses 4, while Mixtral and Grok-1 have 8 experts and choose 2. This provides 65x more possible combinations of experts, and we found that this improves model quality. DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA). It uses the GPT-4 tokenizer as provided in the tiktoken repository. We made these choices based on exhaustive evaluation and scaling experiments.

DBRX was pretrained on 12T tokens of carefully curated data with a maximum context length of 32k tokens. We estimate that this data is at least 2x better token-for-token than the data we used to pretrain the MPT family of models.
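As a quick check on the expert-combination figure quoted in this section, the sketch below counts the unordered expert subsets available to a 16-choose-4 router versus an 8-choose-2 router. It also includes a toy top-k router to illustrate what "chooses 4" means in practice. The routing function (softmax over router scores, then renormalizing the selected weights) is a generic illustration and an assumption on our part; the post does not describe DBRX's actual router, and all function names and the random scores are hypothetical.

```python
import math
import random


def expert_combinations(num_experts: int, top_k: int) -> int:
    """Number of distinct unordered expert subsets a router can select."""
    return math.comb(num_experts, top_k)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, top_k):
    """Toy top-k routing (illustrative, not DBRX's confirmed router):
    keep the k highest-probability experts and renormalize their weights."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}


dbrx = expert_combinations(16, 4)      # 16 experts, 4 active per input
mixtral = expert_combinations(8, 2)    # 8 experts, 2 active per input
print(dbrx, mixtral, dbrx // mixtral)  # 1820 28 65

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(16)]  # hypothetical router scores
weights = route_top_k(logits, top_k=4)
print(sorted(weights))  # indices of the 4 selected experts
```

The ratio 1820 / 28 reproduces the 65x figure; the router portion only shows that exactly four experts receive nonzero weight for a given input.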
This new dataset was developed using the full suite of Databricks tools, including Apache Spark™ and Databricks notebooks for data processing, Unity Catalog for data management and governance, and MLflow for experiment tracking. We used curriculum learning for pretraining, changing the data mix during training in ways we found to substantially improve model quality.

QUALITY ON BENCHMARKS VS. LEADING OPEN MODELS

Table 1 shows the quality of DBRX Instruct and leading established, open models. DBRX Instruct is the leading model on composite benchmarks, programming and mathematics benchmarks, and MMLU. It surpasses all chat or instruction finetuned models on standard benchmarks.

Composite benchmarks. We evaluated DBRX Instruct and peers on two composite benchmarks: the Hugging Face Open LLM Leaderboard (the average of ARC-Challenge, HellaSwag, MMLU, TruthfulQA, WinoGrande, and GSM8k) and the Databricks Model Gauntlet (a suite of over 30 tasks spanning six categories: world knowledge, commonsense reasoning, language understanding, reading comprehension, symbolic problem solving, and programming). Among the models we evaluated, DBRX Instruct scores the highest on both composite benchmarks: the Hugging Face Open LLM Leaderboard (74.5% vs. 72.7% for the next highest model, Mixtral Instruct) and the Databricks Gauntlet (66.8% vs. 60.7% for the next highest model, Mixtral Instruct).

Programming and mathematics. DBRX Instruct is especially strong at programming and mathematics. It scores higher than the other open models we evaluated on HumanEval (70.1% vs. 63.2% for Grok-1, 54.8% for Mixtral Instruct, and 32.2% for the best-performing LLaMA2-70B variant) and GSM8k (66.9% vs. 62.9% for Grok-1, 61.1% for Mixtral Instruct, and 54.1% for the best-performing LLaMA2-70B variant). DBRX outperforms Grok-1, the next best model on these benchmarks, despite the fact that Grok-1 has 2.4x as many parameters.
On HumanEval, DBRX Instruct even surpasses CodeLLaMA-70B Instruct, a model built explicitly for programming, despite the fact that DBRX Instruct is designed for general-purpose use (70.1% vs. 67.8% on HumanEval as reported by Meta in the CodeLLaMA blog).

MMLU. DBRX Instruct scores higher than all other models we consider on MMLU, reaching 73.7%.

| Model | DBRX Instruct | Mixtral Instruct | Mixtral Base | LLaMA2-70B Chat | LLaMA2-70B Base | Grok-1 [1] |
|---|---|---|---|---|---|---|
| Open LLM Leaderboard [2] (Avg of next 6 rows) | 74.5% | 72.7% | 68.4% | 62.4% | 67.9% | - |
| ARC-challenge 25-shot | 68.9% | 70.1% | 66.4% | 64.6% | 67.3% | - |
| HellaSwag 10-shot | 89.0% | 87.6% | 86.5% | 85.9% | 87.3% | - |
| MMLU 5-shot | 73.7% | 71.4% | 71.9% | 63.9% | 69.8% | 73.0% |
| Truthful QA 0-shot | 66.9% | 65.0% | 46.8% | 52.8% | 44.9% | - |
| WinoGrande 5-shot | 81.8% | 81.1% | 81.7% | 80.5% | 83.7% | - |
| GSM8k CoT 5-shot maj@1 [3] | 66.9% | 61.1% | 57.6% | 26.7% | 54.1% | 62.9% (8-shot) |
| Gauntlet v0.3 [4] (Avg of 30+ diverse tasks) | 66.8% | 60.7% | 56.8% | 52.8% | 56.4% | - |
| HumanEval [5] 0-Shot, pass@1 (Programming) | 70.1% | 54.8% | 40.2% | 32.2% | 31.0% | 63.2% |

Table 1. Quality of DBRX Instruct and leading open models. See footnotes for details on how numbers were collected. Bolded and underlined is the highest score.

QUALITY ON BENCHMARKS VS. LEADING CLOSED MODELS

Table 2 shows the quality of DBRX Instruct and leading closed models. According to the scores reported by each model creator, DBRX Instruct surpasses GPT-3.5 (as described in the GPT-4 paper), and it is competitive with Gemini 1.0 Pro and Mistral Medium.

Across nearly all benchmarks we considered, DBRX Instruct surpasses or - at worst - matches GPT-3.5. DBRX Instruct outperforms GPT-3.5 on general knowledge as measured by MMLU (73.7% vs. 70.0%) and commonsense reasoning as measured by HellaSwag (89.0% vs. 85.5%) and WinoGrande (81.8% vs. 81.6%). DBRX Instruct especially shines on programming and mathematical reasoning as measured by HumanEval (70.1% vs. 48.1%) and GSM8k (72.8% vs. 57.1%).

DBRX Instruct is competitive with Gemini 1.0 Pro and Mistral Medium.
Scores for DBRX Instruct are higher than Gemini 1.0 Pro on Inflection Corrected MTBench, MMLU, HellaSwag, and HumanEval, while Gemini 1.0 Pro is stronger on GSM8k. Scores for DBRX Instruct and Mistral Medium are similar for HellaSwag, while Mistral Medium is stronger on WinoGrande and MMLU and DBRX Instruct is stronger on HumanEval, GSM8k, and Inflection Corrected MTBench.

| Model | DBRX Instruct | GPT-3.5 [7] | GPT-4 [8] | Claude 3 Haiku | Claude 3 Sonnet | Claude 3 Opus | Gemini 1.0 Pro | Gemini 1.5 Pro | Mistral Medium | Mistral Large |
|---|---|---|---|---|---|---|---|---|---|---|
| MT Bench (Inflection corrected, n=5) | 8.39 ± 0.08 | - | - | 8.41 ± 0.04 | 8.54 ± 0.09 | 9.03 ± 0.06 | 8.23 ± 0.08 | - | 8.05 ± 0.12 | 8.90 ± 0.06 |
| MMLU 5-shot | 73.7% | 70.0% | 86.4% | 75.2% | 79.0% | 86.8% | 71.8% | 81.9% | 75.3% | 81.2% |
| HellaSwag 10-shot | 89.0% | 85.5% | 95.3% | 85.9% | 89.0% | 95.4% | 84.7% | 92.5% | 88.0% | 89.2% |
| HumanEval 0-Shot pass@1 (Programming) | 70.1% (temp=0, N=1) | 48.1% | 67.0% | 75.9% | 73.0% | 84.9% | 67.7% | 71.9% | 38.4% | 45.1% |
| GSM8k CoT maj@1 | 72.8% (5-shot) | 57.1% (5-shot) | 92.0% (5-shot) | 88.9% | 92.3% | 95.0% | 86.5% (maj1@32) | 91.7% (11-shot) | 66.7% (5-shot) | 81.0% (5-shot) |
| WinoGrande 5-shot | 81.8% | 81.6% | 87.5% | - | - | - | - | - | 88.0% | 86.7% |

Table 2. Quality of DBRX Instruct and leading closed models. Other than Inflection Corrected MTBench (which we measured ourselves on model endpoints), numbers were as reported by the creators of these models in their respective whitepapers. See footnotes for additional details.

QUALITY ON LONG-CONTEXT TASKS AND RAG

DBRX Instruct was trained with up to a 32K token context window. Table 3 compares its performance to that of Mixtral Instruct and the latest versions of the GPT-3.5 Turbo and GPT-4 Turbo APIs on a suite of long-context benchmarks (KV-Pairs from the Lost in the Middle paper and HotpotQAXL, a modified version of HotPotQA that extends the task to longer sequence lengths). GPT-4 Turbo is generally the best model at these tasks. However, with one exception, DBRX Instruct performs better than GPT-3.5 Turbo at all context lengths and all parts of the sequence.
Overall performance for DBRX Instruct and Mixtral Instruct is similar.

| Model | DBRX Instruct | Mixtral Instruct | GPT-3.5 Turbo (API) | GPT-4 Turbo (API) |
|---|---|---|---|---|
| Answer in Beginning Third of Context | 45.1% | 41.3% | 37.3%* | 49.3% |
| Answer in Middle Third of Context | 45.3% | 42.7% | 37.3%* | 49.0% |
| Answer in Last Third of Context | 48.0% | 44.4% | 37.0%* | 50.9% |
| 2K Context | 59.1% | 64.6% | 36.3% | 69.3% |
| 4K Context | 65.1% | 59.9% | 35.9% | 63.5% |
| 8K Context | 59.5% | 55.3% | 45.0% | 61.5% |
| 16K Context | 27.0% | 20.1% | 31.7% | 26.0% |
| 32K Context | 19.9% | 14.0% | - | 28.5% |

Table 3. The average performance of models on the KV-Pairs and HotpotQAXL benchmarks. Bold is the highest score. Underlined is the highest score other than GPT-4 Turbo. GPT-3.5 Turbo supports a maximum context length of 16K, so we could not evaluate it at 32K. *Averages for the beginning, middle, and end of the sequence for GPT-3.5 Turbo include only contexts up to 16K.

One of the most popular ways to leverage a model’s context is retrieval augmented generation (RAG). In RAG, content relevant to a prompt is retrieved from a database and presented alongside the prompt to give the model more information than it would otherwise have. Table 4 shows the quality of DBRX on two RAG benchmarks - Natural Questions and HotPotQA - when the model is also provided with the top 10 passages retrieved from a corpus of Wikipedia articles using the embedding model bge-large-en-v1.5. DBRX Instruct is competitive with open models like Mixtral Instruct and LLaMA2-70B Chat and the current version of GPT-3.5 Turbo.

| Model | DBRX Instruct | Mixtral Instruct | LLaMA2-70B Chat | GPT-3.5 Turbo (API) | GPT-4 Turbo (API) |
|---|---|---|---|---|---|
| Natural Questions | 60.0% | 59.1% | 56.5% | 57.7% | 63.9% |
| HotPotQA | 55.0% | 54.2% | 54.7% | 53.0% | 62.9% |

Table 4. The performance of the models measured when each model is given the top 10 passages retrieved from a Wikipedia corpus using bge-large-en-v1.5. Accuracy is measured by matching within the model’s answer. Bold is the highest score. Underlined is the highest score other than GPT-4 Turbo.
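The RAG setup described above (top 10 retrieved passages shown alongside the prompt, accuracy scored by matching within the model's answer) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the evaluation code behind Table 4: the prompt template, the normalization step, and the function names are hypothetical, the passages are assumed to have been retrieved already, and the example predictions are made up.

```python
def build_rag_prompt(question, passages, top_k=10):
    """Place the top_k retrieved passages ahead of the question (hypothetical template)."""
    context = "\n\n".join(
        f"Passage {i + 1}: {text}" for i, text in enumerate(passages[:top_k])
    )
    return f"{context}\n\nQuestion: {question}\nAnswer:"


def normalize(text):
    """Lowercase and collapse whitespace so matching ignores trivial formatting."""
    return " ".join(text.lower().split())


def match_accuracy(predictions, references):
    """Fraction of predictions that contain at least one reference answer."""
    hits = 0
    for prediction, answers in zip(predictions, references):
        if any(normalize(a) in normalize(prediction) for a in answers):
            hits += 1
    return hits / len(predictions)


# Hypothetical data, for illustration only.
passages = [f"Text of retrieved passage {n}." for n in range(1, 13)]
prompt = build_rag_prompt("What is the capital of France?", passages)

predictions = ["The capital of France is Paris.", "I am not sure."]
references = [["Paris"], ["Canberra"]]
print(match_accuracy(predictions, references))  # 0.5
```

Substring matching is lenient: a long answer that happens to mention the reference string counts as correct, which is one reason scores from this kind of metric are best compared only across models evaluated the same way.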

TRAINING EFFICIENCY

Model quality must be placed in the context of how efficient the model is to train and use. This is especially so at Databricks, where we build models like DBRX to establish a process for our customers to train their own foundation models.

We found training mixture-of-experts models to provide substantial improvements in compute-efficiency for training (Table 5). For example, training a smaller member of the DBRX family called DBRX MoE-B (23.5B total parameters, 6.6B active parameters) required 1.7x fewer FLOPs to reach a score of 45.5% on the Databricks LLM Gauntlet than LLaMA2-13B required to reach 43.8%. DBRX MoE-B also contains half as many active parameters as LLaMA2-13B.

Looking holistically, our end-to-end LLM pretraining pipeline has become nearly 4x more compute-efficient in the past ten months. On May 5, 2023, we released MPT-7B, a 7B parameter model trained on 1T tokens that reached a Databricks LLM Gauntlet score of 30.9%. A member of the DBRX family called DBRX MoE-A (7.7B total parameters, 2.2B active parameters) reached a Databricks Gauntlet score of 30.5% with 3.7x fewer FLOPs. This efficiency is the result of a number of improvements, including using an MoE architecture, other architecture changes to the network, better optimization strategies, better tokenization, and - very importantly - better pretraining data.

In isolation, better pretraining data made a substantial impact on model quality. We trained a 7B model on 1T tokens (called DBRX Dense-A) using the DBRX pretraining data. It reached 39.0% on the Databricks Gauntlet compared to 30.9% for MPT-7B. We estimate that our new pretraining data is at least 2x better token-for-token than the data used to train MPT-7B. In other words, we estimate that half as many tokens are necessary to reach the same model quality. We determined this by training DBRX Dense-A on 500B tokens; it outperformed MPT-7B on the Databricks Gauntlet, reaching 32.1%.
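The "at least 2x better token-for-token" estimate can be restated as simple arithmetic. The sketch below assumes the common rule of thumb that training a dense transformer costs roughly 6 FLOPs per parameter per token; that approximation is our assumption and is not stated in the post, and the function name is hypothetical. The parameter count, token counts, and Gauntlet scores are the ones quoted in this section and in Table 5.

```python
def approx_train_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training cost for a dense model: ~6 FLOPs per parameter per token.
    This approximation is an assumption, not a figure from the post."""
    return 6.0 * params * tokens


PARAMS_7B = 6.7e9  # active parameters listed for MPT-7B and DBRX Dense-A in Table 5

mpt_7b = {"tokens": 1e12, "gauntlet": 30.9}          # MPT-7B, 1T tokens
dense_a_half = {"tokens": 500e9, "gauntlet": 32.1}   # DBRX Dense-A, 500B tokens

# With equal parameter counts, the compute ratio reduces to the token ratio.
flops_ratio = (
    approx_train_flops(PARAMS_7B, mpt_7b["tokens"])
    / approx_train_flops(PARAMS_7B, dense_a_half["tokens"])
)
meets_quality = dense_a_half["gauntlet"] >= mpt_7b["gauntlet"]
print(flops_ratio, meets_quality)  # 2.0 True
```

Because the 500B-token run still scores above MPT-7B, the break-even token count is below 500B, which is why the estimate is phrased as "at least" 2x. The same factor of two separates the 3.7x and 1.85x relative-FLOPs entries for the two Dense-A runs in Table 5.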
In addition to better data quality, another important contributor to this token-efficiency may be the GPT-4 tokenizer, which has a large vocabulary and is believed to be especially token-efficient. These lessons about improving data quality translate directly into practices and tools that our customers use to train foundation models on their own data.

| Model | Total Params | Active Params | Gauntlet Score | Relative FLOPs |
|---|---|---|---|---|
| DBRX MoE-A | 7.7B | 2.2B | 30.5% | 1x |
| MPT-7B (1T tokens) | - | 6.7B | 30.9% | 3.7x |
| DBRX Dense-A (1T tokens) | - | 6.7B | 39.0% | 3.7x |
| DBRX Dense-A (500B tokens) | - | 6.7B | 32.1% | 1.85x |
| DBRX MoE-B | 23.5B | 6.6B | 45.5% | 1x |
| LLaMA2-13B | - | 13.0B | 43.8% | 1.7x |

Table 5. Details of several test articles that we used to validate the training efficiency of the DBRX MoE architecture and end-to-end training pipeline.

INFERENCE EFFICIENCY

Figure 2 shows the end-to-end inference efficiency of serving DBRX and similar models using NVIDIA TensorRT-LLM with our optimized serving infrastructure and 16-bit precision. We aim for this benchmark to reflect real-world usage as closely as possible, including multiple users simultaneously hitting the same inference server. We spawn one new user per second, each user request contains an approximately 2000 token prompt, and each response comprises 256 tokens.

In general, MoE models are faster at inference than their total parameter counts would suggest. This is due to the fact that they use relatively few parameters for each input. We find that DBRX is no exception in this respect. DBRX inference throughput is 2-3x higher than a 132B non-MoE model.

Inference efficiency and model quality are typically in tension: bigger models typically reach higher quality, but smaller models are more efficient for inference. Using an MoE architecture makes it possible to attain better tradeoffs between model quality and inference efficiency than dense models typically achieve.
For example, DBRX is both higher quality than LLaMA2-70B and - thanks to having about half as many active parameters - up to 2x higher in inference throughput (Figure 2). Mixtral is another point on the improved Pareto frontier attained by MoE models: it is smaller than DBRX, and it is correspondingly lower in terms of quality but reaches higher inference throughput. Users of the Databricks Foundation Model APIs can expect to see up to 150 tokens per second for DBRX on our optimized model serving platform with 8-bit quantization.

Figure 2. Inference throughput for various model configurations on our optimized serving infrastructure using NVIDIA TensorRT-LLM at 16-bit precision with the best optimization flags we could find. Models are run in tensor-parallel across the entire node. The input prompt contains approximately 2000 prompt tokens and we generate 256 output tokens. One new user spawns every second.

HOW WE BUILT DBRX

DBRX was trained on 3072 NVIDIA H100s connected by 3.2Tbps Infiniband. The main process of building DBRX - including pretraining, post-training, evaluation, red-teaming, and refining - took place over the course of three months. It was the continuation of months of science, dataset research, and scaling experiments, not to mention years of LLM development at Databricks that includes the MPT and Dolly projects and the thousands of models we have built and brought to production with our customers.

To build DBRX, we leveraged the same suite of Databricks tools that are available to our customers. We managed and governed our training data using Unity Catalog. We explored this data using newly acquired Lilac AI. We processed and cleaned this data using Apache Spark™ and Databricks notebooks. We trained DBRX using optimized versions of our open-source training libraries: MegaBlocks, LLM Foundry, Composer, and Streaming. We managed large scale model training and finetuning across thousands of GPUs using our Mosaic AI Training service.
We logged our results using MLflow. We collected human feedback for quality and safety improvements through Mosaic AI Model Serving and Inference Tables. We manually experimented with the model using the Databricks Playground. We found the Databricks tools to be best-in-class for each of their purposes, and we benefited from the fact that they were all part of a unified product experience.

GET STARTED WITH DBRX ON DATABRICKS

If you’re looking to start working with DBRX right away, it’s easy to do so with the Databricks Mosaic AI Foundation Model APIs. You can quickly get started with our pay-as-you-go pricing and query the model from our AI Playground chat interface. For production applications, we offer a provisioned throughput option to provide performance guarantees, support for finetuned models, and additional security and compliance. To privately host DBRX, you can download the model from the Databricks Marketplace and deploy the model on Model Serving.

CONCLUSIONS

At Databricks, we believe that every enterprise should have the ability to control its data and its destiny in the emerging world of GenAI. DBRX is a central pillar of our next generation of GenAI products, and we look forward to the exciting journey that awaits our customers as they leverage the capabilities of DBRX and the tools we used to build it. In the past year, we have trained thousands of LLMs with our customers. DBRX is only one example of the powerful and efficient models being built at Databricks for a wide range of applications, from internal features to ambitious use-cases for our customers.

As with any new model, the journey with DBRX is just the beginning, and the best work will be done by those who build on it: enterprises and the open community. This is also just the beginning of our work on DBRX, and you should expect much more to come.

CONTRIBUTIONS

The development of DBRX was led by the Mosaic team that previously built the MPT model family, in collaboration with dozens of engineers, lawyers, procurement and finance specialists, program managers, marketers, designers, and other contributors from across Databricks. We are grateful to our colleagues, friends, family, and the community for their patience and support over the past months.

In creating DBRX, we stand on the shoulders of giants in the open and academic community. By making DBRX available openly, we intend to invest back in the community in hopes that we will build even greater technology together in the future. With that in mind, we gratefully acknowledge the work and collaboration of Trevor Gale and his MegaBlocks project (Trevor’s PhD adviser is Databricks CTO Matei Zaharia), the PyTorch team and the FSDP project, NVIDIA and the TensorRT-LLM project, the vLLM team and project, EleutherAI and their LLM evaluation project, Daniel Smilkov and Nikhil Thorat at Lilac AI, and our friends at the Allen Institute for Artificial Intelligence (AI2).

ABOUT DATABRICKS

Databricks is the Data and AI company. More than 10,000 organizations worldwide, including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500, rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake and MLflow. To learn more, follow Databricks on LinkedIn, X, and Facebook.

--------------------------------------------------------------------------------

[1] Numbers as reported by xAI. Due to a lack of Hugging Face-compatible checkpoint at release time, we could not evaluate Grok-1 ourselves on our full suite of benchmarks.

[2] DBRX was measured by us using the EleutherAI Harness. All other numbers were as reported on the Hugging Face Open LLM Leaderboard.

[3] DBRX was measured by us using the EleutherAI Harness with the same older commit that is used by the Hugging Face Open LLM Leaderboard. All other numbers were as reported on the Hugging Face Open LLM Leaderboard. Note that when using the latest commit of the EleutherAI Harness, which includes several parsing fixes, DBRX’s 5-shot score on GSM8k goes up to 72.8% as reported in Table 2. LLaMA2-70B Chat also goes up to 48.4%.

[4] Measured by Databricks using Gauntlet v0.3.0 in LLM Foundry.

[5] Unless otherwise noted, measured by Databricks.

[6] This number is from the Mixtral Arxiv paper. We report this number because it is higher than what we measured when evaluating the model ourselves (36.7%).

[7] All scores as reported in the GPT-4 paper. We could not collect Inflection Corrected MTBench because this version of GPT-3.5 is not available. We found the current version of GPT-3.5 Turbo to score 8.58 ± 0.04 on Inflection Corrected MTBench compared to 8.39 ± 0.08 for DBRX Instruct.

[8] All scores as reported in the GPT-4 paper. We could not collect Inflection Corrected MTBench because this version of GPT-4 is not available. We found the current version of GPT-4 Turbo to score 9.27 ± 0.10 on Inflection Corrected MTBench compared to 8.39 ± 0.08 for DBRX Instruct.