Founder Conversations: A Week of LLM Insights in the Bay Area

Valtteri Karesto, Joni Juup, Arttu Laitinen
October 3, 2023
At gatherings like Ray Summit, a shift was evident: while RAG is gaining traction in LLM deployment, it's not without challenges. Fine-tuning smaller models, like Llama 7B, offers cost-effective competition to giants like GPT-4. As LLM applications grow in intricacy, the industry is pivoting from 'chaining' to 'orchestration', grappling with performance evaluation and the nuances of open-source vs. proprietary models.

Introduction

We spent a week talking to founders and builders at Ray Summit, TechCrunch Disrupt, and various Bay Area GenAI meetups to understand the challenges they face when building value-providing LLM-based apps.


Overview

  • RAG is omnipresent. Retrieval Augmented Generation is the current trend everyone is working on. While it addresses many challenges of deploying LLMs in production, it introduces several others. We'll delve deeper into these later in this post.
  • Fine-tuned smaller models are making an impact. In certain use cases, a fine-tuned Llama 7B model can outperform GPT-4 at a fraction of the cost.
  • The term "orchestration" seems to be largely replacing "chaining." As LLM apps become more intricate, orchestration is a challenge that development teams are grappling with.
  • Another challenge is evaluating LLM performance, or "evals", which involves determining if your chain/agent is producing accurate answers, or whether the changes you implement improve or degrade the system.
  • The essence of RAG-based system performance lies in the chunking strategy and the subsequent vectorization of the data chunks (see the sketch after this list). If the contextual data you retrieve to help your LLM answer a question is inaccurate or of poor quality, the LLM will not respond correctly.
  • If the retrieved context is correct, answer generation works: open-source LLMs (like the Llama 2 models) and proprietary models (like GPT-3 and GPT-4) seem to perform almost equally well at forming answers.
  • Anyscale released reasonably cheap endpoints for Llama models and will add fine-tuning endpoints later this year. Switching from OpenAI models is easy since the API is almost a 1-to-1 match with the OpenAI API.
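
To make the chunking-and-retrieval point concrete, here is a minimal sketch of the retrieval side of a RAG pipeline. The chunk size, overlap, and embedding model are illustrative assumptions on our part, not recommendations from any of the talks.

```python
# Minimal RAG retrieval sketch: chunk documents, embed the chunks, and fetch
# the chunks most relevant to a query. Chunk size, overlap, and the embedding
# model are assumptions for illustration only.
import numpy as np
from sentence_transformers import SentenceTransformer


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split a document into overlapping character windows."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


model = SentenceTransformer("all-MiniLM-L6-v2")

documents = ["<your knowledge-base text goes here>"]
chunks = [c for doc in documents for c in chunk_text(doc)]
chunk_vectors = model.encode(chunks, normalize_embeddings=True)


def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query (cosine similarity)."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vectors @ q
    return [chunks[i] for i in np.argsort(-scores)[:k]]

# The retrieved chunks are then pasted into the LLM prompt as context;
# if they are wrong or low quality, no model will answer correctly.
```

In production you would keep the vectors in a vector database rather than a NumPy array, but the main lever for quality stays the same: how you split and embed your data.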

Ray Summit 2023 — Highlights

John Schulman, Co-founder, OpenAI

Ray Summit Day 1 Keynote Watch the keynote

Paraphrasing John regarding whether they were surprised by ChatGPT's success: "We were very surprised. We had friends and family using it for a few months beforehand. There were some enthusiastic users, particularly those using it for coding, but overall, the excitement was muted. Not all users kept returning to it. I believe that once it became widely accessible, users taught each other how to use it effectively. The social element was crucial."

→ Testing with real users doesn't always tell the whole story, especially with new technology whose use cases may reach beyond what was initially imagined.

Goku Mohandas, ML & Product, Anyscale & Philipp Moritz, Co-founder and CTO, Anyscale

Ray Summit Day 1 Keynote Watch the keynote

Discussing the development of a chat co-pilot for Ray documentation:

  • The assistant employs a hybrid routing solution based on a fine-tuned Llama 2 70B. However, it redirects some of the more intricate queries to OpenAI's GPT-4, merging cost efficiency with superior quality when required.
  • To control or minimize hallucinations, they've implemented RAG and query evaluations, which in turn help route queries between Llama 2 and GPT-4 (a rough sketch of this routing pattern follows below).
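
For illustration, here is a hypothetical sketch of the hybrid-routing idea, written against OpenAI-compatible chat APIs (which is also how Anyscale's Llama endpoints are exposed). The base URL, model names, and the naive router below are our own assumptions; the actual assistant uses a fine-tuned Llama 2 70B as the router, which is not shown here.

```python
# Hypothetical sketch of hybrid routing: cheap open model by default,
# escalate to GPT-4 for queries the router deems complex.
# Base URL, model names, and the naive router are assumptions for illustration.
from openai import OpenAI

openai_client = OpenAI()  # expects OPENAI_API_KEY in the environment
llama_client = OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",  # assumed endpoint URL
    api_key="YOUR_ANYSCALE_API_KEY",
)


def is_complex(query: str) -> bool:
    """Naive stand-in for the fine-tuned router model used in the talk."""
    return len(query.split()) > 40 or "compare" in query.lower()


def answer(query: str, context: str) -> str:
    if is_complex(query):
        client, model = openai_client, "gpt-4"
    else:
        client, model = llama_client, "meta-llama/Llama-2-70b-chat-hf"
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content
```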

Albert Greenberg, VP of Engineering, Uber & Wei Kong, Engineering Management, Uber

Ray Summit Day 2 Keynote Watch the keynote

  • Uber introduced an AI-powered coding assistant into their development tools, trained on their unique codebase, to enhance development speed and the user experience.
  • Their AI-driven app testing tool, DragonCrawl, leverages generative AI to replace manual tests and improve app quality.
  • Uber integrates both broad and task-specific LLMs in their AI toolkit.
  • On Generative AI's impact, they noted: "Generative AI democratizes and benefits almost everyone in the company."
  • They underscored Generative AI's primary roles in Creation, Summarization, Discovery, and Automation.

M Waleed Kadous, Chief Scientist, Anyscale

Open Source LLMs: Viable for Production or a Low-Quality Toy? Read slides

  • M Waleed Kadous of Anyscale discussed the distinctions between proprietary and open LLMs, their applications, and current gaps.
  • Anyscale's RayAssistant uses both Llama 2 models and GPT-4, with fine-tuned Llama models directing requests to the most suitable model.
  • Specialized fine-tuned models can outperform proprietary ones like GPT-3.5 or GPT-4.
  • Relying solely on GPT-4 would cost roughly $35,000 per year; smart use of open models reduces this to around $900 annually (see the rough calculation after this list).
  • Anyscale offers endpoints for Llama 2 models at $1/M tokens and plans to release fine-tuning endpoints later this year.
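
As a back-of-the-envelope illustration of where a gap like that comes from: the traffic volume, blended GPT-4 price, and escalation rate below are our own assumptions, not figures from the talk, and only the $1/M-token endpoint price and the ~$35,000 vs ~$900 totals come from the slides.

```python
# Rough cost comparison: all-GPT-4 vs. hybrid routing to a $1/M-token open model.
# Traffic volume, blended GPT-4 price, and escalation rate are assumptions;
# the talk's actual figures were ~$35,000/yr vs ~$900/yr for their workload.
MONTHLY_TOKENS_M = 50       # assumed traffic, in millions of tokens per month
GPT4_PRICE_PER_M = 45.0     # assumed blended input/output price, $ per M tokens
LLAMA_PRICE_PER_M = 1.0     # Anyscale Llama 2 endpoint price, $ per M tokens
ESCALATION_RATE = 0.05      # assumed share of traffic escalated to GPT-4

all_gpt4 = MONTHLY_TOKENS_M * GPT4_PRICE_PER_M * 12
hybrid = MONTHLY_TOKENS_M * 12 * (
    ESCALATION_RATE * GPT4_PRICE_PER_M + (1 - ESCALATION_RATE) * LLAMA_PRICE_PER_M
)
print(f"All GPT-4: ${all_gpt4:,.0f}/yr, hybrid: ${hybrid:,.0f}/yr")
# -> with these assumptions the hybrid setup is roughly an order of magnitude cheaper
```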

GenAI Collective Meetup — Highlights

We participated in the GenAI Collective meetup on September 18th in San Francisco. The Werqwise co-working space was packed and the event was sold out. The meetup also featured a few interesting product demos.

Matt Huang, Knowbl & Ryan Reede, MovieBot

Product demos

  • Knowbl, a platform that processes your knowledge base and builds on-site search and an agent assistant on top of it. Their copilot keeps the brand message in check by providing only pre-approved answers.
  • MovieBot, an application that lets you prompt a conversation between characters you have created in its editor, then uses a game engine to render a video of the scene.

Sophie McNaught, Vouch Insurance

Insuring GenAI Products

  • To drive the adoption of their GenAI services, Microsoft and AWS already offer legal defense to users of their generative AI tools if they're sued for copyright infringement.
  • However, there are many use cases that they don't cover.
  • Insuring the inherent risks of using generative AI in products is emerging as an area of interest, and it's something founders should monitor closely.
