What is LangChain in Python?
LangChain is a Python framework for building applications powered by large language models (LLMs), connecting models to external data sources and chaining components together into complete AI workflows.
LangChain is an open-source framework that developers use to chain together components from multiple modules and data sources to build AI applications. One of its standout features is how seamlessly it integrates with other tools and libraries, supporting techniques such as retrieval-augmented generation (RAG) within a unified developer platform. LangChain’s tools and APIs make it easy to code chatbots, virtual agents, and multi-agent AI applications with large language models such as OpenAI’s GPT models.
LangChain’s Python library of pre-built components and off-the-shelf chains is the most popular way to use LangChain. It reduces code complexity and empowers devs to experiment efficiently. It’s also a versatile choice for developers who want to deploy LangChain runnables and chains as accessible REST APIs.
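As a hedged sketch of that last point, here’s roughly how a chain could be exposed as a REST API using the LangServe package (the chain, model name, and route path are illustrative assumptions, not from this article):

```python
# Minimal sketch: exposing a LangChain runnable as a REST API with LangServe.
# Assumes the langserve, fastapi, uvicorn, and langchain-openai packages are
# installed and OPENAI_API_KEY is set; the chain and path are illustrative.
from fastapi import FastAPI
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langserve import add_routes

prompt = ChatPromptTemplate.from_template("Summarize this text: {text}")
chain = prompt | ChatOpenAI()

app = FastAPI(title="LangChain REST API")
add_routes(app, chain, path="/summarize")  # serves POST /summarize/invoke

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```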
Additionally, developers can leverage Langflow, a low-code visual tool that makes it easier to build LangChain-based AI applications.
In this article, we’ll look at the benefits of working with LangChain in Python. Then, we’ll show you how to get started quickly building Python-based AI apps using LangChain and Langflow to prototype and iterate, reducing the time and effort required to bring AI solutions to market.
No matter which language you use, you can leverage LangChain to build complex generative AI applications with multiple components. These can include multiple large language models (LLMs) as well as other data sources, such as context retrieved via RAG from a vector database.
LangChain was originally built for Python, but it now also boasts a powerful JavaScript library. There are a few similarities, as well as differences, between the two implementations.
Performance will vary between implementations. In general, Python, which is already a popular tool in the data engineering community, handles compute-heavy data workloads well thanks to its native-code scientific libraries. JavaScript is more suited to dynamic and real-time interactions, particularly in customer-facing, web-based apps.
Based on their GitHub repositories, the Python implementation remains the most popular by far, with 92K stars and 14.7K forks, as opposed to the JavaScript project’s 12K stars and 2.1K forks. In terms of integration with other services, the JavaScript implementation is approaching parity with its Python counterpart. DataStax provides integrations for both the Python and JavaScript versions of LangChain.
LangChain is fairly easy to learn and isn’t any more difficult to use in Python than in JavaScript. What matters here is which language you already know. Typically, data engineers will already be well-versed in Python (and related data processing libraries, such as Pandas and NumPy), while front-end and Node.js developers will take more quickly to the JavaScript version.
Here’s why Python really shines:
Since Python has been the lingua franca for data workloads for years, there is a wealth of Python libraries for data processing, AI, and machine learning that pair naturally with LangChain. By contrast, the number of data processing and math libraries in JavaScript is significantly more limited.
Inheritance enables code reuse by defining code in a class and then subclassing it to add additional, context-specific behavior. Both Python and JavaScript support subclassing, though in different ways: Python through class-based inheritance, and JavaScript via a prototype chain-based approach.
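As a toy illustration (not from the article), Python’s class-based inheritance looks like this:

```python
# Toy example of class-based inheritance: the subclass reuses the parent's
# behavior and layers context-specific behavior (caching) on top of it.
class Retriever:
    def fetch(self, query: str) -> str:
        return f"documents matching '{query}'"

class CachingRetriever(Retriever):
    def __init__(self):
        self.cache: dict[str, str] = {}

    def fetch(self, query: str) -> str:
        if query not in self.cache:
            self.cache[query] = super().fetch(query)  # reuse parent logic
        return self.cache[query]

print(CachingRetriever().fetch("LangChain"))
```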
Python has a little more flexibility here, as it supports integers, floating-point numbers, and complex numbers. JavaScript has a single numeric type: a 64-bit floating-point number.
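For example, a quick look at Python’s built-in numeric types:

```python
# Python's distinct built-in numeric types; JavaScript represents all of
# these with a single 64-bit floating-point type.
i = 2 ** 100   # int: arbitrary precision
f = 3.14159    # float: 64-bit floating point
c = 2 + 3j     # complex: real and imaginary parts
print(type(i).__name__, type(f).__name__, type(c).__name__)  # int float complex
```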
JavaScript is built for high performance; frameworks such as Node.js have made it an extremely popular tool for server-side web development. Additionally, JavaScript is the common programming language of client-side scripting. (Note: you can build a server-side LangChain application in either Python or JavaScript, expose it via a REST API, and call it from client-side JavaScript, as sketched below.)
Python also sports a number of frameworks for server-side development, including Django and Flask. This, plus Python’s popularity with data engineers, helps explain why it’s become the third most popular language in developer surveys, behind JavaScript and HTML/CSS.
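To make the note above concrete, here’s a minimal sketch of a server-side Python endpoint, built with Flask, that client-side JavaScript could call over REST (the route, prompt, and model are illustrative assumptions):

```python
# Minimal sketch: a Flask endpoint wrapping a LangChain chain so client-side
# JavaScript can call it over REST. Assumes the flask and langchain-openai
# packages are installed and OPENAI_API_KEY is set.
from flask import Flask, jsonify, request
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

app = Flask(__name__)
chain = ChatPromptTemplate.from_template("Answer briefly: {question}") | ChatOpenAI()

@app.post("/ask")
def ask():
    question = request.get_json()["question"]
    answer = chain.invoke({"question": question})
    return jsonify({"answer": answer.content})

if __name__ == "__main__":
    app.run(port=5000)
```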
The good news is that, whether you’re a Python or a JavaScript developer, you can incorporate LangChain into your GenAI-driven apps.
For Python developers, the language brings a lot of benefits when it comes to building AI applications: a mature ecosystem of data processing, AI, and machine learning libraries; solid server-side frameworks; and a large community already fluent in the language.
At DataStax, we’ve worked hard to build support for LangChain—both Python and JavaScript—into our platform. DataStax simplifies building a full GenAI stack that incorporates context-specific data via RAG, giving LLMs the data they need to draw correct and up-to-date conclusions. The result is 20% higher relevance in LLM results with 80% lower total cost of ownership (TCO).
DataStax integrates with LangChain to build a processing pipeline that assists LLMs in producing more accurate and relevant responses to inputs. For example, using LangChain and DataStax, you can integrate RAG support by calling DataStax’s Astra DB vector database to retrieve context, which LangChain then incorporates into the prompt it sends to LLMs. Check out our LangChain Python integration!
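As a hedged sketch of that pipeline (the collection name, environment variable names, and prompt wording are illustrative assumptions; see the langchain-astradb package for the exact API):

```python
# Sketch of a RAG chain: retrieve context from Astra DB, then feed it into
# the prompt sent to the LLM. Assumes the langchain-astradb and
# langchain-openai packages; the endpoint, token, and collection name are
# placeholders read from environment variables.
import os
from langchain_astradb import AstraDBVectorStore
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

vector_store = AstraDBVectorStore(
    embedding=OpenAIEmbeddings(),
    collection_name="my_collection",                   # placeholder
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],  # from your Astra dashboard
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
)
retriever = vector_store.as_retriever()

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI()
    | StrOutputParser()
)

print(chain.invoke("What was the quarterly revenue?"))
```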
Both DataStax and LangChain simplify the complexity involved in building GenAI apps, so you can move your ideas quickly from prototype to production.
Enough talking!
Let’s see how this works in practice. We’ll walk through the basics of how to build a RAG-powered LangChain LLM workflow using Langflow, a low-code tool for creating AI apps. We’ll use Langflow to help us generate the necessary Python code quickly.
To get started, create a new Astra account.
From the dashboard, select Databases > Create Database.
Make sure Serverless (Vector) is selected for the database type and give it a database name of MyFirstVectorDB.
For Region, select us-east-2.
Then, select Create Database.
Wait a few minutes until the message Your database is initializing… disappears and you see the details for your new vector database.
Congratulations—you’ve created a vector database!
It’s not particularly useful, though—it doesn’t have any data in it.
In the next step, you’ll load data into the database, so you’ll need credentials to connect back to Astra DB.
You can connect back to your Astra DB database using an API token. To generate one, on your Astra DB page, under Application Tokens, select Generate Token.
Copy the token to a safe location on your local machine. You won’t be able to access this value again, so if you lose the token, delete the old one and generate a new one.
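As a quick, hedged way to sanity-check the token, you can connect with the astrapy client (the environment variable names here are illustrative assumptions):

```python
# Quick connectivity check with the astrapy Data API client. Assumes the
# astrapy package is installed; the token and API endpoint come from the
# Astra dashboard and are read from environment variables here.
import os
from astrapy import DataAPIClient

client = DataAPIClient(os.environ["ASTRA_DB_APPLICATION_TOKEN"])
db = client.get_database_by_api_endpoint(os.environ["ASTRA_DB_API_ENDPOINT"])
print(db.list_collection_names())  # empty list for a brand-new database
```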
Next, let’s design a LangChain application with two flows: one to ingest documents into the vector database, and one to answer questions against that data using RAG.
One of the easiest ways to work with LangChain is to use Langflow. Langflow is a visual IDE and framework for RAG (retrieval-augmented generation) and multi-agent AI applications that makes it easy to build, test, and deploy LangChain-based applications. It’s available as an open-source project that you can download and run locally. DataStax also provides Langflow as a platform as a service (PaaS) so that you can design and launch complex AI application stacks easily. Get started with DataStax Langflow for free!
We’ll use OpenAI as our LLM for this walkthrough.
If you don’t already have an OpenAI API key, create an account and generate one now.
From your Astra Dashboard, select Langflow from the application dropdown:
This will put you into the Langflow UI, where you can start a new project.
Select New Project > Vector Store RAG.
You’ll see a visual editor for the new LangChain flow which contains a number of components for two workflows: the document ingestion flow (at the bottom) and the RAG application flow (up top).
We’ll explain each of these components in more detail below.
Note that this diagram is a visual representation of Python code components.
You can switch between designing your application visually and editing the underlying Python at will.
For example, select the title bar of the document ingestion flow’s Astra DB component, select the ellipses in the popup menu, and then select Code.
Edit the Python code snippet here to your liking and then return to the visual diagram. Design your application visually while dropping down to code to implement more complex logic.
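For reference, Langflow component code follows a pattern roughly like the following simplified sketch, modeled on Langflow’s custom component template; the class and its behavior are illustrative, not the actual Astra DB component source:

```python
# Simplified sketch of the shape of a Langflow component's Python code,
# modeled on Langflow's custom component template. The class name and
# behavior are illustrative.
from langflow.custom import Component
from langflow.io import MessageTextInput, Output
from langflow.schema import Data

class UppercaseComponent(Component):
    display_name = "Uppercase"
    description = "Uppercases the incoming text."
    inputs = [MessageTextInput(name="input_value", display_name="Text")]
    outputs = [Output(display_name="Output", name="output", method="build_output")]

    def build_output(self) -> Data:
        data = Data(value=self.input_value.upper())
        self.status = data  # shown in the Langflow UI for debugging
        return data
```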
To make this work for your use case, let’s configure several of these components:
First, let’s add the OpenAI key to the OpenAI components in each flow by creating a new global variable. Zoom in on the first OpenAI component, named OpenAI, and, in OpenAI API Key, select the globe icon.
Select Add New Variable.
In the Create Variable window, give it a Variable Name of OpenAIAPIKey and a Type of Credential.
For Value, paste in your OpenAI API key from earlier.
Then, in Apply To Fields, select OpenAI API Key.
Finally, select Save Variable.
Go to the second OpenAI component, OpenAI Embeddings, and select the variable you just created for the value of OpenAI API Key.
You’ll also need to specify credentials for the Astra DB database. In the Astra DB component for the document ingestion flow, use the same process as above to create a global variable called AstraDBApplicationToken that holds the token you generated earlier.
Then, set the component’s remaining fields, such as the database and collection, to point at the MyFirstVectorDB database you created earlier.
Make sure to apply these same settings to the second Astra DB component for the RAG application flow.
Now, we’ll need to ingest our data. For this example, we’ll use a set of question/answer data generated from United States Securities and Exchange Commission Form 10-Q financial filings, in CSV format. In the File component at the beginning of the document ingestion chain, select Path and then select the CSV file.
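In plain LangChain code, this ingestion flow corresponds roughly to the following hedged sketch (the file path and collection name are placeholder assumptions):

```python
# Sketch of the document ingestion flow in plain LangChain code: load the
# CSV, split it, embed it, and store it in Astra DB. Assumes the
# langchain-community, langchain-text-splitters, langchain-astradb, and
# langchain-openai packages; the file path and collection name are placeholders.
import os
from langchain_astradb import AstraDBVectorStore
from langchain_community.document_loaders import CSVLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = CSVLoader(file_path="10q_question_answers.csv").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

vector_store = AstraDBVectorStore(
    embedding=OpenAIEmbeddings(),
    collection_name="my_collection",
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
)
vector_store.add_documents(chunks)
```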
LangChain is built on modular components known as LangChain primitives. These components include models, chains, memory, prompt templates, indexes, and output parsers. Together, they form a flexible and powerful framework for developing AI solutions.
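For instance, three of these primitives, a prompt template, a model, and an output parser, compose into a working chain in just a few lines (a minimal sketch assuming the langchain-openai package and an OPENAI_API_KEY environment variable):

```python
# Minimal sketch composing three LangChain primitives, a prompt template,
# a chat model, and an output parser, into a single chain.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Explain {topic} in one sentence.")
chain = prompt | ChatOpenAI() | StrOutputParser()
print(chain.invoke({"topic": "retrieval-augmented generation"}))
```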
The LangChain chains above contain a fair amount of complexity. The major components involved in these flows include the File component, which loads the source data; the OpenAI Embeddings component, which converts text into vectors; the Astra DB component, which stores and retrieves those vectors; and the OpenAI chat model component, which generates the final response.
Now you know how to use components, chains, and security best practices to build a LangChain application in Python using Langflow. We also showed a basic use case for adding your own supplementary data using retrieval-augmented generation (RAG), based on queries to a vector database.
We encourage you to explore further resources and start building your own LangChain projects, quickly moving from prototype to production.
This is just the tip of the iceberg of what you can accomplish with RAG and LangChain in Python.
Happy coding!
LangChain is a Python framework for developing applications powered by large language models (LLMs); it connects models to external data sources and chains components together into complete workflows.
LangChain itself doesn't directly execute Python code. LangChain is a framework for developing applications powered by language models, and it provides tools and utilities for working with various AI models and data sources.
However, LangChain does offer integrations with tools that can execute Python code. For example, the Python REPL tool in the langchain-experimental package lets a chain or agent run Python code directly, as sketched below.
These integrations execute the code in the Python interpreter or environment in which the LangChain application is running. So if you’re looking to execute Python code as part of a larger language-model-powered application, LangChain provides the framework to integrate that functionality.
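Here’s a minimal sketch using that tool (assumes the langchain-experimental package; treat it with care, since it executes arbitrary code in your application’s interpreter):

```python
# Sketch: running Python code from a LangChain tool. PythonREPLTool lives in
# the langchain-experimental package and executes code in the same interpreter
# the application runs in, so treat its input as untrusted.
from langchain_experimental.tools import PythonREPLTool

repl = PythonREPLTool()
print(repl.run("total = sum(range(10)); print(total)"))  # prints 45
```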
LangChain helps AI developers use language models with other sources of data.
LangChain is used because it simplifies the process of working with large language models (LLMs) and provides a unified developer platform for building advanced AI applications. It supplies pre-built primitives, such as models, chains, memory, prompt templates, indexes, and output parsers, along with integrations for external data sources and vector databases.
By offering these features, LangChain significantly reduces the complexity of working with LLMs and gives developers a comprehensive toolkit for creating sophisticated AI applications.
The current LangChain supports using the ChatGPT plugin, but the responses are not always robust: errors commonly occur when the input exceeds the model’s token limit.