Top 5 Gen AI Takeaways from the Databricks State of Data + AI Report

Gen AI Insights from the latest Databricks Report on Customer Adoption of Data & AI


There are only a handful of companies with the kind of 30,000-foot perspective on “what’s happening” in Data and AI that Databricks has, and most of them aren’t producing comprehensive research reports on adoption trends across their customers. Thankfully, Databricks has chosen to share what’s happening with its customers in its latest ‘2024 State of Data + AI’ Report. The report highlights adoption trends and usage across a cross-section of 10,000 Databricks customers. I have to say it’s one of the more informative and valuable white papers I’ve read this year (this is not a sponsored post and I’m not affiliated with Databricks in any way!). I’ve summarized five key highlights relevant to what’s happening in Gen AI here, but there’s much more related to Data and AI in the report… so head over to Databricks and download your copy today.

1. Gen AI is Dominating the Data + AI Product Adoption Charts

Of the top five Data and AI products tracked by Databricks, Hugging Face Transformers was #2, moving up two spots in the past year. Hugging Face Transformers is a Python library that lets developers work with a wide range of pretrained, open-source AI models. LangChain, the open-source toolchain designed to simplify the creation, deployment, and management of applications that use large language models (LLMs), has moved into the #4 spot. The presence of both in the top five signals that enterprises are adopting the core model frameworks and toolkits needed both to fine-tune LLMs and to build applications on top of them. They also appear to be getting smarter about using them…

2. Enterprises are getting better at moving modeling experiments into production

Databricks looked at the ratio of logged-to-registered models across its customers to assess progress in fine-tuning models and developing related applications. In February 2023, the ratio of logged-to-registered models for Databricks customers was 16-to-1; by the end of March 2024, it had dropped to 5-to-1 for the same customers, a roughly 3x improvement. Moreover, Databricks saw customers put 11x more AI models into production this year. Financial Services led the way, becoming nearly 3x more efficient at getting models into production. Getting a handle on model output is likely one of the key drivers behind this increasing efficiency, and that control appears to be coming from the growing adoption of Retrieval Augmented Generation…
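The efficiency gain here is simple ratio arithmetic: how many experimental models get logged for every one that is registered for production. A quick sanity check of the figures quoted above (the ratios are from the report; the script itself is just illustrative):

```python
# Logged-to-registered model ratios for Databricks customers, per the report.
logged_per_registered_feb_2023 = 16   # Feb 2023: 16 models logged per 1 registered
logged_per_registered_mar_2024 = 5    # Mar 2024: 5 models logged per 1 registered

# Fewer logged experiments per production model means higher efficiency,
# so the improvement factor is the ratio of the two ratios.
improvement = logged_per_registered_feb_2023 / logged_per_registered_mar_2024
print(f"{improvement:.1f}x more efficient")  # 16 / 5 = 3.2, i.e. roughly 3x
```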

3. The RAG Train has left the station

Based on Databricks’ research, adoption of Retrieval Augmented Generation (RAG) caught fire last year, with the entire vector database category growing 377% year over year. RAG enhances traditional language model responses by combining information retrieval, using vector databases, with LLM text generation. Databricks saw 186% growth in the vector database category just since the launch of the Public Preview of Databricks Vector Search. Its research showed that 70% of companies leveraging Gen AI are using tools, retrieval, and vector databases to customize models. The desire to customize models has also led to an explosion in adoption of open-source models like Meta Llama 3…
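The retrieval half of RAG can be sketched in a few lines: embed the user’s question, find the stored chunks most similar to it (the vector database’s job), and prepend those chunks to the LLM prompt. A minimal self-contained illustration, using hand-made placeholder vectors in place of a real embedding model and vector database (none of the names or data below come from the report):

```python
import math

# Toy "vector database": document chunks with pre-computed embeddings.
# In practice these vectors come from an embedding model; here they are
# hand-made 3-d placeholders purely for illustration.
DOCS = [
    ("Q3 revenue grew 12% year over year.",     [0.9, 0.1, 0.0]),
    ("The office cafeteria reopens on Monday.", [0.0, 0.2, 0.9]),
    ("Gross margin improved to 41% in Q3.",     [0.8, 0.3, 0.1]),
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, k=2):
    """Return the k chunks most similar to the query embedding."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_vec):
    """Augment the LLM prompt with retrieved context -- the 'RAG' step."""
    context = "\n".join(retrieve(query_vec))
    return f"Context:\n{context}\n\nQuestion: {question}"

# A finance-flavored query vector should pull the two finance chunks,
# leaving the cafeteria announcement out of the prompt.
prompt = build_prompt("How did Q3 go?", [1.0, 0.2, 0.0])
print(prompt)
```

In a production system the sorted scan is replaced by an approximate nearest-neighbor index, and the assembled prompt is sent to the LLM, but the shape of the pipeline is the same.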

4. Open-source LLMs dominate adoption, with Meta Llama 3 leading the way

Databricks found that 76% of companies that use LLMs are choosing open-source models, often alongside proprietary models. Meta Llama 3 launched on April 18, 2024, and, according to the Databricks report, within four weeks of launch it accounted for 39% of all open-source LLM usage. All versions of Llama (Llama 2 and Llama 3) combined dominate open-source adoption, with a supermajority of open-source usage attributable to these models, followed by Mistral and Mixtral in a distant second and third place, respectively.

5. Financial Services is leading the way in GPU Usage for LLM Model Serving

According to Databricks data on model serving, Financial Services leads all industries in GPU consumption for LLMs, followed by Communications, Media & Entertainment; Healthcare & Life Sciences; Manufacturing & Automotive; and Retail & Consumer Goods. Not only did Financial Services lead the way, but GPU consumption in the industry grew an astonishing 88% in the last six months of the research period. Financial Services has always been one of the smartest industries when it comes to utilizing data, so it’s no surprise it is out in front in putting LLMs and Gen AI to work across business-critical use cases like fraud, risk, and customer service.

To read the full 2024 State of Data + AI report by Databricks, get your copy here.
