Guide · Sep 27, 2024

Build Your First LangChain Python Application

We’ll show you how to get started quickly building Python-based AI apps using LangChain and Langflow to prototype and iterate, reducing the time and effort required to bring AI solutions to market.


LangChain is an open-source framework that developers use to chain components from multiple modules and data sources to build AI applications. One of LangChain’s standout features is how seamlessly it integrates with other tools and libraries to provide a unified developer platform, supporting techniques such as retrieval-augmented generation (RAG). LangChain’s tools and APIs make it easy to code chatbots, virtual agents, and multi-agent AI applications with large language models (LLMs) such as those from OpenAI.

LangChain’s Python implementation remains by far the most popular on GitHub, with more than 90K stars

LangChain’s Python library of pre-built components and off-the-shelf chains is the most popular way to use LangChain, reducing code complexity and empowering devs to experiment efficiently. It’s also a versatile choice for developers who deploy LangChain runnables and chains as REST APIs that users can access.

Additionally, developers can leverage Langflow, a low-code visual tool that makes it easier to build LangChain-based AI applications.

In this article, we’ll look at the benefits of working with LangChain in Python. Then, we’ll show you how to get started quickly building Python-based AI apps using LangChain and Langflow to prototype and iterate, reducing the time and effort required to bring AI solutions to market.


LangChain in Python vs LangChain.js: a comparison

No matter which language you use, you can leverage LangChain to build complex generative AI applications with multiple components. These can include multiple LLMs as well as other data sources, such as context retrieved via RAG.

LangChain was originally built for Python, but it now also boasts a powerful JavaScript library. There are a few similarities, as well as differences, between the two implementations.

Speed and performance

Performance will vary between implementations. In general, Python, already a popular tool in the data engineering community, handles CPU-intensive data workloads well, thanks largely to its C-backed numeric libraries. JavaScript is more suited to dynamic and real-time interactions, particularly in customer-facing web-based apps.

Popularity and integrations

Based on their GitHub repositories, the Python implementation remains most popular by far, with 92K stars and 14.7K forks, as opposed to the JavaScript project’s 12K stars and 2.1K forks. In terms of integration with other services, the JavaScript implementation is approaching parity with its Python counterpart. DataStax has integrations with both the Python and JavaScript LangChain frameworks.

Learning curve

LangChain is fairly easy to learn and isn’t any more difficult to use in Python versus JavaScript. What matters here is what language you already know. Typically, data engineers will already be well-versed in Python (and related data processing libraries, such as Pandas and NumPy). Meanwhile, front-end devs and Node.js developers will take more quickly to the JavaScript version.

DataStax makes it easy to build a full GenAI stack that feeds context-specific data via RAG to LLMs

Other data processing modules & LangChain libraries

Here’s why Python really shines:

Since Python has been the lingua franca for data workloads for years, there is a plethora of Python libraries for data processing, AI, and machine learning that pair naturally with LangChain. By contrast, the number of data processing and math libraries in JavaScript is significantly more limited.

Inheritance

Inheritance enables code reuse by defining behavior in a class and then subclassing it to add additional, context-specific behavior. Both Python and JavaScript support subclassing, though in different ways: Python through class-based inheritance, and JavaScript via a prototype chain-based approach.
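Here’s a minimal Python sketch of class-based inheritance; the retriever classes are made up for illustration:

```python
class Retriever:
    """Base class: fetch documents relevant to a query."""

    def retrieve(self, query: str) -> list[str]:
        return []


class KeywordRetriever(Retriever):
    """Subclass adding context-specific behavior: naive keyword matching."""

    def __init__(self, documents: list[str]):
        self.documents = documents

    def retrieve(self, query: str) -> list[str]:
        # Same interface as the base class, new behavior.
        return [d for d in self.documents if query.lower() in d.lower()]


hits = KeywordRetriever(["RAG overview", "LLM basics"]).retrieve("rag")
print(hits)  # ['RAG overview']
```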

Numeric types

Python has a little more flexibility, as it supports integers (of arbitrary precision), floating-point numbers, and complex numbers. JavaScript’s primary numeric type is a single 64-bit floating-point Number (with BigInt available for arbitrary-size integers).
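A quick illustration of Python’s three built-in numeric types:

```python
n = 2 ** 100      # int: arbitrary precision, never overflows
x = 3.14          # float: 64-bit IEEE 754, the same as JavaScript's Number
z = 1 + 2j        # complex: no built-in JavaScript equivalent
print(type(n), type(x), type(z))
```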

Web and mobile development

JavaScript is built for high performance; runtimes such as Node.js have made it an extremely popular tool for server-side web development. Additionally, JavaScript is the common programming language of client-side scripting. (Note: you can build a server-side LangChain application in either Python or JavaScript, expose it via a REST API, and call it from client-side JavaScript.)
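As a minimal sketch of that note, here’s how a Python chain might be exposed as a REST API with LangServe. The /summarize path, the trivial chain, and the server module name are illustrative assumptions; the snippet presumes the fastapi, langserve, and langchain-openai packages:

```python
from fastapi import FastAPI
from langserve import add_routes
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

app = FastAPI()

# Any LangChain runnable can be served; this one is a trivial summarizer.
chain = ChatPromptTemplate.from_template("Summarize: {text}") | ChatOpenAI()

# Mounts POST /summarize/invoke, /summarize/stream, and related endpoints.
add_routes(app, chain, path="/summarize")

# Run with: uvicorn server:app --reload
```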

Python also sports a number of frameworks for server-side development, including Django and Flask. This, plus Python’s popularity with data engineers, helps explain why it’s become the third most popular language in developer surveys, behind JavaScript and HTML/CSS.

Chart: most popular programming languages in developer surveys

Why DataStax values LangChain

The good news is that, whether you’re a Python or a JavaScript developer, you can incorporate LangChain into your GenAI-driven apps.

For Python developers, the language brings a lot of benefits when it comes to building AI applications:

  • Python’s simplicity and readability make it easy to code complex AI tasks in ways that other devs on a team quickly understand. This improves the long-term maintainability of your code base.
  • Python has a mature ecosystem and an extensive set of libraries. In particular, as noted above, it supports a number of libraries related to math, data representation, and data processing. That’s made it popular among many AI application developers.
  • Many teams are already using Python for data processing. That makes using LangChain from Python a natural choice.

At DataStax, we’ve worked hard to build support for LangChain—both Python and JavaScript—into our platform. DataStax simplifies building a full GenAI stack that incorporates context-specific data via RAG, giving LLMs the data they need to draw correct and up-to-date conclusions. The result is 20% higher relevance in LLM results with 80% lower total cost of ownership (TCO).

DataStax integrates with LangChain to build a processing pipeline that assists LLMs in producing more accurate and relevant responses to inputs. For example, using LangChain and DataStax, you can integrate RAG support by calling DataStax’s Astra DB vector database to retrieve context, which LangChain then incorporates into the prompt it sends to LLMs. Check out our LangChain Python integration!
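Here’s a hedged sketch of what that pipeline can look like in code, assuming the langchain-astradb and langchain-openai packages; the endpoint, token, model name, and sample question are placeholders to replace with your own values:

```python
from langchain_astradb import AstraDBVectorStore
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Vector store backed by Astra DB; endpoint and token are placeholders.
vstore = AstraDBVectorStore(
    embedding=OpenAIEmbeddings(),
    collection_name="documents",
    api_endpoint="https://<db-id>-us-east-2.apps.astra.datastax.com",
    token="AstraCS:<application-token>",
)

def format_docs(docs):
    # Concatenate retrieved documents into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

# RAG chain: retrieve context from Astra DB, fold it into the prompt, call the LLM.
chain = (
    {"context": vstore.as_retriever() | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

print(chain.invoke("What does the filing say about quarterly revenue?"))
```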

Both DataStax and LangChain reduce the complexity involved in building GenAI apps, so you can move your ideas quickly from prototype to production.

Python's simplicity and readability make it easy to code complex AI tasks

Quickstart: How to use LangChain in Python with DataStax

Enough talking!

Let’s see how this works in practice. We’ll walk through the basics of how to build a RAG-powered LangChain LLM workflow using Langflow, a low-code tool for creating AI apps. We’ll use Langflow to help us generate the necessary Python code quickly.

To get started, create a new Astra account.

From the dashboard, select Databases > Create Database.

Astra dashboard databases screen

Make sure Serverless (Vector) is selected for the database type and give it a database name of MyFirstVectorDB.

For Region, select us-east-2.

Then, select Create Database.

Astra create database dialog

Wait a few minutes until the message Your database is initializing… disappears and you see the details for your new vector database.

Astra vector database screen

Congratulations—you’ve created a vector database!

It’s not particularly useful, though—it doesn’t have any data in it.

In the next step, you’ll load data into the database, so you’ll need credentials to connect back to Astra DB.

You can connect back to your Astra DB database using an API token. To generate a token, on your Astra DB page, in Application Tokens, select Generate Token.

Astra generate application API token screen

Copy the token to a safe location on your local machine. You won’t be able to access this value again, so if you lose the token, delete the old one and generate a new one.
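If you’d like to verify the token from Python before moving on (the tutorial itself stays in Langflow, so this step is optional), here’s a minimal sketch using the astrapy client; the endpoint and token strings are placeholders:

```python
from astrapy import DataAPIClient

# Placeholders: copy the real values from your database's overview page.
client = DataAPIClient("AstraCS:<application-token>")
db = client.get_database_by_api_endpoint(
    "https://<db-id>-us-east-2.apps.astra.datastax.com"
)

print(db.list_collection_names())  # empty until you create a collection
```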

Creating a flow in Langflow

Next, let’s design a LangChain application with two flows:

  • A document ingestion flow, which imports new documents and embeds them into the vector database; and
  • A RAG application flow, which responds to queries and generates new responses based on the documentation embedded in your database.

One of the easiest ways to work with LangChain is to use Langflow. Langflow is a visual IDE and framework for RAG (retrieval-augmented generation) and multi-agent AI applications that makes it easy to build, test, and deploy LangChain-based applications. It’s available as an open-source project that you can download and run locally. DataStax also provides Langflow as a platform as a service (PaaS) so that you can design and launch complex AI application stacks easily. Get started with DataStax Langflow for free!

We’ll use OpenAI as our LLM for this walkthrough.

If you don’t already have an OpenAI API key, create an account and generate one now.

From your Astra Dashboard, select Langflow from the application dropdown:

Astra dashboard DataStax Langflow application

This will put you into the Langflow UI, where you can start a new project.

DataStax Langflow UI where you can start a new project

Select New Project > Vector Store RAG.

You’ll see a visual editor for the new LangChain flow which contains a number of components for two workflows: the document ingestion flow (at the bottom) and the RAG application flow (up top).

We’ll explain each of these components in more detail below.

visual editor for the new LangChain flow which contains a number of components for two workflows

Note that this diagram is a visual representation of Python code components.

You can switch between designing your application visually and editing the underlying Python at will.

For example, select the title bar of the document ingestion flow’s Astra DB component, select the ellipsis in the popup menu, and then select Code.

underlying Python code components snippet

Edit the Python code snippet here to your liking and then return to the visual diagram. Design your application visually while dropping down to code to implement more complex logic.
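For reference, the code behind a component is an ordinary Python class. The sketch below shows the general shape of a custom Langflow component; the EchoComponent name is made up for illustration, and the exact imports and base classes vary by Langflow version, so treat this as an assumption rather than the exact code you’ll see:

```python
from langflow.custom import Component
from langflow.io import MessageTextInput, Output
from langflow.schema import Data


class EchoComponent(Component):
    display_name = "Echo"
    description = "Passes its input text straight through."

    inputs = [MessageTextInput(name="text", display_name="Text")]
    outputs = [Output(display_name="Output", name="out", method="build_output")]

    def build_output(self) -> Data:
        # self.text is populated from the component's input field in the UI.
        return Data(value=self.text)
```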

To make this work for your use case, let’s configure several of these components:

First, let’s add the OpenAI key to the OpenAI components in each flow by creating a new global variable. Zoom in on the first OpenAI component, named OpenAI, and, in OpenAI API Key, select the globe icon.

OpenAI component in DataStax Langflow flow

Select Add New Variable.

In the Create Variable window, give it a Variable Name of OpenAIAPIKey and a Type of Credential.

For Value, paste in your OpenAI API key from earlier.

Then, in Apply To Fields, select OpenAI API Key.

Finally, select Save Variable.

Create Variable window to give a Variable Name of OpenAIAPIKey and a Type of Credential

Go to the second OpenAI component, OpenAI Embeddings, and select the variable you just created for the value of OpenAI API Key.

You’ll also need to specify credentials for the Astra DB database. In the Astra DB component for the document ingestion flow, use the same process as above to create a Credential-type global variable named AstraDBApplicationToken that holds the token for your Astra DB database.

Then, set the following fields:

  • Database: Select the database you created above
  • Collection: Select Create new collection and create a collection named documents, with a Dimensions setting that matches your embedding model’s output size (for example, 1536 for OpenAI’s text-embedding-ada-002).

Make sure to apply these same settings to the second Astra DB component for the RAG application flow.

Now, we’ll need to ingest our data. For this example, we’ll use a set of question/answer data generated from United States Securities and Exchange Commission 10-Q financial report filings, in CSV format. In the File component at the beginning of the document ingestion chain, select Path and then select the CSV file.

File component at the beginning of the document ingestion chain
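For comparison, the equivalent ingestion step in plain LangChain Python looks roughly like this. It’s a sketch that assumes the langchain-community package, the vstore object from the earlier snippet, and a hypothetical 10q_qa.csv filename:

```python
from langchain_community.document_loaders import CSVLoader

# Each CSV row becomes one LangChain Document.
docs = CSVLoader(file_path="10q_qa.csv").load()

# Embed the rows and store them in the Astra DB 'documents' collection.
vstore.add_documents(docs)
print(f"Ingested {len(docs)} rows")
```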

Components

LangChain is built on modular components known as LangChain primitives. These components include models, chains, memory, prompt templates, indexes, and output parsers. Together, they form a flexible and powerful framework for developing AI solutions.

The LangChain chains above contain a fair amount of complexity. Here are the major components involved in these flows:

  • LLM. The large language model you specify (in this case, OpenAI) is a foundational model that is trained with basic generative AI capabilities.
  • RAG (retrieval-augmented generation). RAG builds on the LLM by querying an auxiliary data store—in this case, our vector database, Astra DB—to provide more context-specific data relevant to our user’s expected questions. For a customer support chatbot, for example, this might be a database of Knowledge Base articles and successfully resolved customer support cases.
  • LangChain Python agents. Agents use an LLM to decide which actions or tools to invoke next. As noted above, each component in this flow is a Python code component; when you run a flow on DataStax Langflow, that code runs on our cloud infrastructure.
  • Chains. Chains are workflows that link together multiple components that perform complex tasks (like data ingestion or user interaction). In the above example, we had two chains: one for data ingestion and one that acts as a user-facing chatbot application. You can change the workflow by adding and removing components in the chain. (A minimal chain appears in the code sketch after this list.)
  • Memory. Memory components store and manage the state of an application, so it maintains context across interactions. This is particularly useful for applications like chatbots, where maintaining a coherent conversation requires tracking previous exchanges.
  • Prompt templates. Prompt templates specify a parameterized prompt in an LLM query. They standardize the way queries are formulated, ensuring consistent and relevant responses from the models. For example, you can use prompt templates to adjust variables such as the record count requested or to supply additional information retrieved via RAG.
  • Indexes. Indexes efficiently preprocess and store data to aid in faster retrieval for specific query types. For RAG, this means converting data to a vector format to easily search the content of a document.
  • Output parsers. The output parser defines how the responses from the LLMs are processed and formatted so output is structured and ready for use in the application.
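To ground a few of these primitives, here’s a minimal chain that composes a prompt template, a model, and an output parser using the LangChain Expression Language; it assumes the langchain-openai package and an OPENAI_API_KEY environment variable:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Prompt template: parameterizes the query with {count} and {topic}.
prompt = ChatPromptTemplate.from_template("List {count} facts about {topic}.")

# Chain: prompt -> LLM -> output parser, composed with LCEL's pipe operator.
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

print(chain.invoke({"count": 3, "topic": "vector databases"}))
```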

How to augment your data with RAG

Now you know how to use components, chains, and credential-handling best practices to build a LangChain application in Python using Langflow. We also showed a basic use case for adding your own supplementary data using retrieval-augmented generation (RAG), based on queries to a vector database.

We encourage you to explore further resources and start building your own LangChain projects, quickly moving from prototype to production.

This is just the tip of the iceberg of what you can accomplish with RAG and LangChain in Python.

Happy coding!

FAQs

What is LangChain in Python?

LangChain is a Python framework for building applications powered by large language models. It chains together LLMs, prompts, and external data sources, making it easier to integrate real-time and proprietary data into AI applications.

Can LangChain run Python code?

LangChain itself doesn't directly execute Python code. LangChain is a framework for developing applications powered by language models, and it provides tools and utilities for working with various AI models and data sources.

However, LangChain does offer integration with tools that can execute Python code. For example:

  • Python REPL tool: LangChain has a PythonREPL tool that can execute Python code within a LangChain application (see the sketch after this list).
  • Jupyter Notebook integration: LangChain can be used within Jupyter Notebooks, where Python code can be executed.
  • Custom tools: You can create custom tools in LangChain that execute Python code as part of your application logic.
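Here’s a small illustration of the first option, assuming the langchain-experimental package (where the REPL utility lives):

```python
from langchain_experimental.utilities import PythonREPL

repl = PythonREPL()

# Runs the code in a Python REPL and returns the captured stdout.
result = repl.run("print(2 + 2)")
print(result)  # "4\n"
```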

These integrations allow you to execute Python code within a LangChain application using a Python interpreter or environment in which the LangChain application is running. So if you're looking to execute Python code as part of a larger language model-powered application, LangChain provides the framework to integrate such functionality.

Why do we use LangChain?

LangChain helps AI developers use language models with other sources of data.

LangChain is used because it simplifies the process of working with large language models (LLMs) and provides a unified developer platform for building advanced AI applications. Here’s why developers use LangChain:

  • Integration with Language Models: LangChain provides a standard interface to interact with various chat models and LLMs, making it easier to experiment with and deploy different models.
  • Structured Output: It offers tools for generating structured output from language models, enhancing the usability of AI-generated content.
  • Chains and Primitives: LangChain's components include chains and primitives, which are building blocks for creating complex AI workflows. These include off-the-shelf chains for common tasks and the ability to create custom chains.
  • LangChain Expression Language (LCEL): This domain-specific language allows for more flexible and powerful ways to compose LangChain components.
  • Runnables: LangChain runnables provide a way to package and deploy LangChain applications efficiently.
  • Vector Databases: It integrates seamlessly with vector databases, enabling efficient storage and retrieval of vector embeddings for semantic search and similarity comparisons.
  • Retrieval-Augmented Generation (RAG): LangChain simplifies the implementation of RAG techniques, allowing models to access and use external knowledge.
  • Prompt Engineering: With prompt templates, LangChain makes it easier to design and manage prompts for language models.
  • End-to-End Agents: It provides tools for creating intelligent agents that can perform complex tasks by breaking them down into subtasks.
  • Python Integration: LangChain is built with Python, allowing developers to leverage Python's ecosystem. You can easily import optional dependencies and use existing Python libraries alongside LangChain.
  • Machine Learning Pipeline: It serves as a crucial component in machine learning pipelines, especially for natural language processing tasks.
  • Extensibility: LangChain's architecture allows for easy extension of existing chains and creation of custom components.
  • Shipping AI Apps: It provides the necessary tools and abstractions to quickly develop and ship LangChain apps in production environments.

By offering these features, LangChain significantly reduces the complexity of working with LLMs and provides a comprehensive toolkit for developers to create sophisticated AI applications.

Does ChatGPT use LangChain?

LangChain currently supports the ChatGPT plugin, but responses aren’t always robust; errors commonly occur when a request exceeds the model’s token limit.

One-stop Data API for Production GenAI

Astra DB gives developers a complete data API and out-of-the-box integrations that make it easier to build production RAG apps with high relevancy and low latency.