Langfuse raises $4M
We're thrilled to announce that Langfuse has raised a $4M seed round from Lightspeed Venture Partners, La Famiglia and Y Combinator.
With Langfuse, we are building open source observability and product analytics for LLM-based applications. Our goal is to enable product and engineering teams to harness the power of Generative AI with the most useful and open suite of devtools focused on visibility and insights.
When we first joined Y Combinator for the winter batch in early 2023, we iterated quickly on a number of LLM-native applications ranging from LLM-powered web-scraping — for research and go-to-market use cases — to generating engaging educational content on virtually any topic. All these applications shared a common trait: the excitement of early users (including us, of course) surrounding the initial demo and the technology's potential to solve problems that historically demanded extensive manual work or specialized model training.
While the potential of LLMs to solve these problems was clear, we experienced the frustrating challenges of going from an impressive prototype to a reliable production-grade product. We quickly realized we weren’t alone in facing those challenges! Many teams from startups to enterprises have done the work of identifying high-impact applications of GenAI for their internal workflows or consumer-facing products. Many also built POCs for the most promising ones. However, consumers often do not see these products as most teams are reluctant to push them into production due to a lack of visibility and trust in these stochastic/complex systems.
We took these first-hand learnings and began working on solutions. As engineers, we initially worked on a test framework to increase confidence in application outputs while overcoming the problems of applying traditional CI frameworks: managing test sets, the latency and cost of running model-based evaluations in CI, and finding good evaluations in the first place. While this was a good contribution, we quickly realized that tests give a false sense of security, as most teams have insufficient test data and evaluations that deviate a lot from the user experience in production. Thus, we built a first version of Langfuse focused on logging, debugging and analyzing LLM apps and tested it with other founders. After very positive initial feedback, we decided to open-source the project (repo) and became Product of the Day when launching on Product Hunt in August.
“We were one of the first customers of Langfuse and it has really made our lives easier by meeting all our LLM analytics needs, allowing us to focus on building our product with great visibility on how it's used in production and how to best improve it. The team is super nice, responsive and dedicated to building the best observability and analytics solution. We're truly happy customers!”
– Yan Fu, Co-founder & CEO, Berry, AI-powered CSMs to automate SaaS onboarding
“We've used Langfuse since inception and it's been of great value while we scaled Alphawatch from POCs to production deployments at larger enterprises. It helped us to ship quickly, act on issues as they arise and saved us many hours of prompt and latency optimization.”
– Jackson Chen, Co-founder & CTO, Alphawatch, AI business automation for knowledge work
Challenges of building LLM applications and how Langfuse helps
In implementing popular LLM use cases – such as retrieval augmented generation, agents using internal tools & APIs, or background extraction/classification jobs – developers face a unique set of challenges that is different from traditional software engineering:
Tracing & Control Flow: Many valuable LLM apps rely on complex, repeated, chained or agentic calls to a foundation model. This makes debugging these applications hard as it is difficult to pinpoint the root cause of an issue in an extended control flow.
With Langfuse, it is simple to capture the full context of an LLM application. Our client SDKs and integrations are model and framework agnostic. Users commonly track LLM inference, embedding retrieval, API usage and any other interaction with internal systems that helps pinpoint problems. Users of frameworks such as LangChain benefit from automated instrumentation; otherwise, the SDKs offer an ergonomic way to define the steps to be tracked by Langfuse.
Output quality: In traditional software engineering, developers are used to testing for the absence of exceptions and compliance with test cases. LLM-based applications are non-deterministic and there is rarely a hard-and-fast standard to assess quality. Understanding the quality of an application, especially at scale, and what ‘good’ evaluation looks like is a main challenge. This problem is exacerbated by changes to hosted models that are outside of the user’s control.
With Langfuse, users can attach scores to production traces (or even sub-steps of them) to move closer to measuring quality. Depending on the use case, these can be based on model-based evaluations, user feedback, manual labeling or other signals such as implicit user data. These metrics can then be used to monitor quality over time, by specific users, and by version or release of the application, which helps to understand the impact of changes deployed to production.
Mixed intent: Many LLM apps do not tightly constrain user input. Conversational and agentic applications often contend with wildly varying inputs and user intent. This poses a challenge: teams build and test their app with their own mental model, but real-world users often have different goals, which leads to many surprising and unexpected results.
With Langfuse, users can classify inputs as part of their application and ingest this additional context to later analyze their users' behavior in depth.
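To make the three ideas above concrete (a trace made of tracked steps, scores attached to a trace or to one of its steps, and an intent label ingested as extra context), here is a minimal sketch in plain Python. It is not the Langfuse SDK: `Trace`, `Step`, `Score` and `mean_score_by` are hypothetical names chosen for illustration, and the fields are assumptions about what such a data model might hold.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class Score:
    name: str                        # e.g. "user_feedback" or "model_eval"
    value: float
    step_name: Optional[str] = None  # None means the score applies to the whole trace


@dataclass
class Step:
    name: str          # e.g. "retrieval" or "answer"
    kind: str          # e.g. "retrieval", "generation", "api"
    input: str
    output: str
    latency_ms: float


@dataclass
class Trace:
    user_id: str
    release: str
    intent: str = "unknown"  # label assigned by the application's own classifier
    steps: list = field(default_factory=list)
    scores: list = field(default_factory=list)

    def add_step(self, step: Step) -> None:
        self.steps.append(step)

    def add_score(self, name: str, value: float, step_name: Optional[str] = None) -> None:
        self.scores.append(Score(name, value, step_name))

    def slowest_step(self) -> Step:
        # A starting point when looking for the root cause of high latency.
        return max(self.steps, key=lambda s: s.latency_ms)


def mean_score_by(traces, score_name: str, key: str) -> dict:
    """Average a trace-level score, grouped by a trace attribute such as 'release' or 'intent'."""
    groups = {}
    for trace in traces:
        for score in trace.scores:
            if score.name == score_name and score.step_name is None:
                groups.setdefault(getattr(trace, key), []).append(score.value)
    return {group: mean(values) for group, values in groups.items()}


# Two hypothetical production traces from different releases.
t1 = Trace(user_id="u1", release="v1", intent="billing_question")
t1.add_step(Step("retrieval", "retrieval", "refund policy", "3 documents", 120.0))
t1.add_step(Step("answer", "generation", "prompt with context", "You can request a refund.", 900.0))
t1.add_score("user_feedback", 1.0)

t2 = Trace(user_id="u2", release="v2", intent="billing_question")
t2.add_step(Step("answer", "generation", "prompt without context", "I am not sure.", 700.0))
t2.add_score("user_feedback", 0.0)

feedback_by_release = mean_score_by([t1, t2], "user_feedback", "release")
```

Grouping the same score by `intent` instead of `release` gives a rough view of which kinds of user requests the application handles well.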
Why now
Observability and monitoring may seem like very serious terms in an industry that is so nascent, but given the stakes for production applications, companies shouldn't just "hope for the best and plan for the worst". Langfuse is here to help them "plan for the best and mitigate the worst" by providing the data architecture for the whole lifecycle of an LLM application.
The open data layer for the whole lifecycle
Langfuse establishes an open-source data model to track all invocations throughout the lifecycle of an application. Teams thereby have the full history to evaluate how usage patterns changed and which new use cases emerged, and to understand the impact of new releases on production metrics. This data should not be in a closed ecosystem, but needs to be open. That’s why all our SDKs, integrations and application are open source under the MIT license, simple to self-host, and come with a GET API to reuse the data downstream (e.g. for usage-based pricing based on token usage by customer).
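As an illustration of the downstream use case mentioned above, the following sketch aggregates token usage per customer and turns it into a billable amount. It assumes the records have already been fetched; the record fields (`customer_id`, `prompt_tokens`, `completion_tokens`) and the flat price are assumptions made for this example, not the shape of the actual API response.

```python
from collections import defaultdict

# Hypothetical usage records, as they might look after being fetched and flattened.
records = [
    {"customer_id": "acme", "prompt_tokens": 1200, "completion_tokens": 300},
    {"customer_id": "acme", "prompt_tokens": 800, "completion_tokens": 200},
    {"customer_id": "globex", "prompt_tokens": 500, "completion_tokens": 500},
]

PRICE_PER_1K_TOKENS = 0.02  # assumed flat rate, for illustration only


def tokens_by_customer(records):
    """Sum prompt and completion tokens for each customer."""
    totals = defaultdict(int)
    for record in records:
        totals[record["customer_id"]] += record["prompt_tokens"] + record["completion_tokens"]
    return dict(totals)


def invoice_amounts(totals, price_per_1k):
    """Convert token totals into billable amounts, rounded to four decimals."""
    return {customer: round(tokens / 1000 * price_per_1k, 4) for customer, tokens in totals.items()}


totals = tokens_by_customer(records)
amounts = invoice_amounts(totals, PRICE_PER_1K_TOKENS)
```

A real pricing model would likely price prompt and completion tokens separately and per model; the flat rate keeps the sketch short.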
We also deeply believe that there is a unique opportunity to build an integrated, yet independent open source data layer that helps teams switch models and application or evaluation frameworks for specific use cases. We are excited about all of our partnerships around integrations (LiteLLM, Flowise, Langflow) and evaluations (Ragas).
“With RAGAS, we're running a popular open-source project focused on evals of RAG-applications. The Langfuse team has built a very generalized solution to effectively log LLM-based applications in production irrespective of the model or framework that's used. We are excited about our collaboration as it helps users to apply our RAGAS evals on production data to gain valuable insights over time.”
– Shahul Es, Founder, Ragas, evaluation framework for RAG applications
Where we are at
Since the launch, we’ve onboarded numerous teams onto our managed cloud and self-hosting offerings. We are especially excited that we hit a nerve with individual contributors in enterprises who host Langfuse to move fast with their applications while retaining ownership of sensitive production data. At the core, most users find a lot of value in debugging their complex application by inspecting a subset of problematic traces while having a high-level understanding of important metrics (cost, latency and overall quality).
“When it comes to productionising LLMs in the enterprise and especially in highly-regulated industries, Langfuse as one of the LLM monitoring solutions has been incredibly helpful to monitor user activity, solution performance and cost while being able to self-host strictly confidential data. Also, it is easy to evaluate our RAG application based on model-based evals and user feedback to quickly improve features in our product. More importantly, Langfuse is supported by a strong and highly responsive team of technical experts. I look forward to following the project and using it for more of my client projects!”
– Kai, Sr Software Engineer, BCG X
What’s next
Having built the data and observability layer, we are now closing the loop and returning to the problem of testing applications before they move to production. We aim to solve the problems we initially identified around selecting the best evaluations and maintaining a test set that is representative of production inputs. With Langfuse, users can quickly append new items to these test sets when identifying new edge cases in production.
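The workflow of growing a test set from production edge cases could look roughly like the sketch below. It is an illustration in plain Python, not a Langfuse feature description: `Dataset`, `DatasetItem`, `collect_edge_cases`, the trace dictionary fields and the score threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DatasetItem:
    input: str
    expected_output: Optional[str] = None  # can be filled in later by a reviewer
    source_trace_id: Optional[str] = None  # link back to the production trace


@dataclass
class Dataset:
    name: str
    items: list = field(default_factory=list)

    def add_from_trace(self, trace_id: str, trace_input: str,
                       expected_output: Optional[str] = None) -> bool:
        """Append a production input unless an item with the same input already exists."""
        if any(item.input == trace_input for item in self.items):
            return False
        self.items.append(DatasetItem(trace_input, expected_output, trace_id))
        return True


def collect_edge_cases(dataset: Dataset, traces, threshold: float = 0.5) -> int:
    """Add the inputs of low-scoring traces to the dataset and return how many were added."""
    added = 0
    for trace in traces:
        if trace["score"] < threshold and dataset.add_from_trace(trace["id"], trace["input"]):
            added += 1
    return added


# Hypothetical production traces with a quality score between 0 and 1.
production_traces = [
    {"id": "t1", "input": "Cancel my plan and refund last month", "score": 0.2},
    {"id": "t2", "input": "What is your pricing?", "score": 0.9},
    {"id": "t3", "input": "Cancel my plan and refund last month", "score": 0.1},
]

regression_set = Dataset(name="support-bot-regressions")
added_count = collect_edge_cases(regression_set, production_traces)
```

Deduplicating on the exact input string is the simplest possible choice; near-duplicate detection would need a similarity measure instead.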
Based on the foundation of our existing observability stack, we are building out the leading open source LLM analytics platform. The job of our users is not easy as there is lots of short-term noise about frameworks, models, and emerging techniques for working with LLMs. These are exciting yet hard to navigate times. We aspire to guide developers with the right insights to make informed decisions grounded in real-world data, user feedback and evaluations.
We are open, the project is super easy to self-host, and you can try Langfuse Cloud without a credit card. Join us on Discord, give Langfuse a spin (sign-up), or try the live demo.
Thank you all!
Clemens, Max and Marc
PS: If you're intrigued by solving some of the hard problems we've laid out above, don't hesitate to reach out. We'll need smart and driven people on the journey alongside us!