More companies are interested in deploying generative AI (GenAI) solutions than ever before. However, there remains a stubborn gap between planning and implementation.
In one recent survey, Deloitte found that 67% of the organizations it polled were investing in GenAI after discovering early value in initial experiments. Yet 68% of them also admitted they had moved a third or fewer of these experiments into production.
In our experience, companies that succeed in shipping GenAI solutions to production have taken the time to lay a solid foundation. A well-designed AI stack is one of the key factors that lets teams move apps quickly from idea to reality. In this post, we'll look at what an AI stack is and the essential components required to drive implementation.
What is an AI stack?
An AI stack is a collection of technologies and infrastructure components designed for creating, deploying, and monitoring AI applications. It includes infrastructure for building two types of AI apps: traditional AI apps that operate within a defined workflow, and GenAI apps that use patterns and probability to create new content.
The AI stack encompasses components required to train, fine-tune, and provide context to ML pipelines or AI models. But it also includes tools for quickly prototyping and productizing AI apps so that teams can assess and monitor their quality, performance, and security both before and after release.
Benefits of an AI stack
An AI stack identifies and commoditizes the components common to most AI app solutions. This saves dev teams from having to reinvent the wheel each time and enables:
- Decreased app development time - Provides reusable components common across solutions that developers can quickly assemble into new AI applications. No-code and low-code development environments can shorten development time even further by letting teams compose new apps as visual workflows.
- Improved results accuracy - Provides tools for testing the accuracy of responses and feeding data back into the system to enable self-improvement.
- Productization of apps - Adds out-of-the-box support for monitoring aspects such as data quality and latency of responses, as well as monitoring and auditing data access to ensure security and compliance.
- Low-latency performance - Manages the latency that can creep into complex AI applications that rely on multiple interconnected components. An AI stack can identify and reduce request and response times between components, helping ensure fast responses to customer queries.
Components of an AI stack
While AI applications will differ in complexity, most will make use of the following components:
- Data
- Models
- Memory
- Agents and orchestrators
- Tools
- Monitoring
- Development
Let’s look at each one of these.
Data
Data is by far the most important element of an AI application. Whether you're creating your own models or supplementing a commercial model with your own domain-specific context, your AI apps need large volumes of high-quality data to produce accurate results.
GenAI data includes both structured and unstructured data: customer chat logs, manuals, online wikis, and multimodal data such as audio, images, and video. Using data for GenAI requires:
- Data collection - Data is gathered from multiple sources and put into a format suitable for either training or supplementing an AI model. This process can include batch-oriented data pipelines—extract, transform, load (ETL) and extract, load, transform (ELT) processes—as well as real-time data streams that capture up-to-date information (weather data, financial transactions, IoT sensor readings, and so on).
- Data preprocessing and embedding - The preprocessing phase ensures data quality, identifying missing data and fixing errors. It also extracts features, pulling out the most relevant attributes for your use cases and reducing noise. Unstructured data is then typically converted into embeddings and stored in a specialized format for later retrieval, as sketched below; the exact format depends on the data's intended purpose (we'll get into that more later in this post).
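To make this concrete, here's a minimal sketch of the preprocessing-and-embedding step using LangChain's text splitter, OpenAI embeddings, and a local FAISS index as a stand-in vector store. The sample documents, chunk sizes, and index name are illustrative assumptions; the same pattern applies to whichever embedding model and vector database you choose.

```python
# Minimal sketch: chunk unstructured docs and store embeddings for later retrieval.
# Assumes langchain-text-splitters, langchain-openai, langchain-community, and faiss-cpu
# are installed and OPENAI_API_KEY is set; raw_docs is a placeholder for your own data.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

raw_docs = [
    "Chat log: customer asked how to reset the device to factory settings...",
    "Manual excerpt: hold the power button for ten seconds to reset...",
]

# Split long documents into overlapping chunks sized for embedding.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.create_documents(raw_docs)

# Embed each chunk and store the vectors in a local index (swap in your own vector DB).
index = FAISS.from_documents(chunks, OpenAIEmbeddings(model="text-embedding-3-small"))
index.save_local("support_docs_index")
```

The chunk overlap is a design choice: it preserves context that would otherwise be cut in half at a chunk boundary, at the cost of a slightly larger index.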
Models
The next most important component is your model. For GenAI applications, this will generally be a large language model (LLM): a massive neural network trained on enormous amounts of data that uses probability to generate human-like language. LLMs accept natural language queries as requests and respond with generated text (or, in multimodal variants, audio, images, or video).
Some teams may also elect to use so-called small language models, which are purpose-built models meant to solve specific problems.
How well an LLM fits a specific use case will be a significant driver of the quality and accuracy of your app's responses. Because LLMs are expensive to build—and take years to get right—most GenAI use cases will rely on a commercial or open-source LLM, such as GPT, Llama, or Mistral.
Different LLMs excel at different tasks. For example, GPT tends to be better at code generation and assistance, while a model like Cohere’s Command is better at document text extraction. You’ll need to give teams some flexibility to identify and choose from a range of LLMs that fit a particular use case’s needs.
You may also choose to fine-tune an LLM. Fine-tuning re-trains a subset of the model’s billions of parameters to improve response accuracy or to change the tone or style of its responses.
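As a rough illustration, calling a hosted commercial LLM is typically a single API request. The sketch below uses the OpenAI Python SDK; the model name, prompt, and temperature are illustrative, and other providers expose similar chat-style APIs, so switching models is largely a matter of changing this one call.

```python
# Sketch of a single LLM call via the OpenAI Python SDK.
# The model name and prompt are illustrative; other providers expose similar APIs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "How do I reset my router to factory settings?"},
    ],
    temperature=0.2,  # lower temperature for more deterministic answers
)
print(response.choices[0].message.content)
```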
Memory
AI applications use two kinds of memory: parametric and non-parametric. Parametric memory, represented as a set of weights in a neural network, is the knowledge baked into a trained LLM built on a neural architecture such as BERT or GPT. Non-parametric memory, by contrast, is retrieved from an information retrieval system, such as a collection of product user manuals or wiki documents.
Why do we care about this? Because non-parametric memory is critical for providing relevant context.
While LLMs excel at language generation and answering general questions, they don’t possess context specific to your use cases - knowledge of your products or past support calls with customers, for example. And building this knowledge into an LLM through fine-tuning is expensive and time-consuming.
As a result, retrieval-augmented generation (RAG) has emerged as a technique for combining an LLM’s parametric memory with non-parametric memory from another data store. This is typically a vector database or graph database that enables searching for related information via a similarity search or graph traversal. Your GenAI app can then include this information as context in its prompt to an LLM.
RAG is cheaper and faster to implement than training or fine-tuning an LLM, and its underlying data can be updated easily. The result is answers that are not only more up to date but often more accurate than those produced by other LLM refinement methods, such as fine-tuning.
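Putting the pieces together, a minimal RAG round trip might look like the sketch below. It reuses the FAISS index from the data example above; the index name, prompt wording, and model choice are assumptions, and a production app would add error handling and guardrails.

```python
# Sketch of a RAG request: retrieve related chunks, then pass them as context to the LLM.
# Reuses the FAISS index built in the data example; names and prompt wording are illustrative.
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
index = FAISS.load_local("support_docs_index", embeddings,
                         allow_dangerous_deserialization=True)

question = "How do I reset the device?"
# Non-parametric memory: nearest-neighbor search over the embedded documents.
docs = index.similarity_search(question, k=3)
context = "\n\n".join(d.page_content for d in docs)

# Parametric memory: the LLM answers, grounded in the retrieved context.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
answer = llm.invoke(
    f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```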
Agents and orchestrators
Agents are autonomous units of code that use LLMs to complete complex business tasks. Agents can be as simple as a single call to an LLM. More complex agents, however, can use techniques such as agentic RAG to create dynamic workflows that break complex requests into subtasks.
This is a major evolution from the simple RAG approach described above. It opens the door to creating applications that are complex networks of intelligent agents, each specializing in specific tasks. For example, a virtual travel assistant may rely on numerous other agents to perform subtasks such as booking flights, finding hotels, arranging transport, and planning a tour.
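To give a feel for the pattern, here’s a toy sketch of an orchestrator that asks an LLM to plan subtasks and then dispatches each one to a stubbed-out specialist agent. The planning prompt, function names, and travel example are illustrative, not the API of any particular agent framework.

```python
# Toy sketch of an orchestrator that splits a request into subtasks for specialist agents.
# The planning prompt and agent stub are illustrative, not a specific framework's API.
from openai import OpenAI

client = OpenAI()

def plan_subtasks(request: str) -> list[str]:
    """Ask the LLM to break a complex request into ordered subtasks."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"List the subtasks, one per line, needed to: {request}"}],
    )
    return [line.strip("- ").strip()
            for line in resp.choices[0].message.content.splitlines()
            if line.strip()]

def run_agent(subtask: str) -> str:
    """Placeholder for routing a subtask to a specialist agent (flights, hotels, transport...)."""
    return f"[handled] {subtask}"

for task in plan_subtasks("Plan a three-day trip to Lisbon"):
    print(run_agent(task))
```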
Tools
A tool is any resource—an API, for example—that an agent uses to connect to external environments to retrieve information, help with tasks, or even perform actions on a user's behalf. Intelligent agents can incorporate tools into their dynamic workflows—for example, retrieving information on the user's current 401(k) investments or booking a flight on a specific airline.
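As a hedged example, the sketch below exposes a hypothetical flight-search API as a tool using OpenAI-style function calling: the model returns a structured tool call that the agent can then execute against the real API. The tool name and schema are placeholders.

```python
# Sketch of exposing an external API as a tool via OpenAI-style function calling.
# The search_flights function and its schema are hypothetical placeholders.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "search_flights",  # hypothetical tool the agent can invoke
        "description": "Search for flights between two cities on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
                "date": {"type": "string", "description": "YYYY-MM-DD"},
            },
            "required": ["origin", "destination", "date"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Find me a flight from Boston to Austin on 2025-03-01."}],
    tools=tools,
)

# If the model chooses the tool, it returns a structured call the agent can execute.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```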
Monitoring
A critical component of any production application is monitoring and observability. System engineers and site reliability engineers (SREs) need to monitor GenAI apps like any other app to track reliability and performance. Prior to launch, this should also include extensive testing and monitoring of key metrics related to response accuracy, relevance, and safety.
Using packages such as Grafana’s AI observability solution, you can integrate these capabilities into your GenAI apps without a lot of heavy lifting, including:
- Performance monitoring
- Cost optimization
- End-to-end tracing so your team can analyze model predictions in response to user issues
- Prompt and response tracking so you can evaluate prompt effectiveness and the overall quality of user interactions
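Even before wiring in a full observability stack, you can start capturing the basics yourself. The sketch below logs latency, token usage, and prompt/response pairs for each LLM call; the logger is a stand-in for whatever metrics backend you forward this data to, Grafana or otherwise.

```python
# Sketch of basic GenAI observability: record latency, token usage, and prompt/response
# pairs per LLM call. The Python logger stands in for your real metrics/tracing backend.
import logging
import time
from openai import OpenAI

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("genai.monitoring")
client = OpenAI()

def monitored_completion(prompt: str) -> str:
    """Call the LLM and emit latency, token, and prompt/response telemetry."""
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    latency_ms = (time.perf_counter() - start) * 1000
    log.info("latency_ms=%.0f prompt_tokens=%d completion_tokens=%d",
             latency_ms, resp.usage.prompt_tokens, resp.usage.completion_tokens)
    log.info("prompt=%r response=%r", prompt, resp.choices[0].message.content)
    return resp.choices[0].message.content
```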
Development
You can build AI applications in the language of your choosing. However, they're much easier to build with a development platform that supports constructing new apps quickly from reusable components.
A good example is the LangChain framework, which provides support for calling LLMs, integrating memory from vector and graph stores, and building agents using a simple syntax. LangFlow, which is built on top of LangChain, provides a visual builder for creating, testing, and deploying new GenAI apps at scale using both no-code and low-code approaches.
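For a sense of what this looks like in practice, here’s a small LangChain Expression Language (LCEL) chain that wires a prompt template to an LLM and a string output parser. The prompt and model name are illustrative.

```python
# Sketch of a small LangChain (LCEL) chain: prompt template -> LLM -> string output.
# Assumes langchain-core and langchain-openai are installed; the prompt is illustrative.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Summarize the following support ticket in one sentence:\n\n{ticket}"
)
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

print(chain.invoke({"ticket": "Customer reports the app crashes when exporting a report..."}))
```

In LangFlow, a chain like this can be assembled visually from prompt, model, and output components rather than written by hand.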
Developing your AI stack with DataStax
Building your own AI stack from scratch can take time and a lot of trial and error. DataStax provides a set of integrated development tools for building production-ready GenAI apps, including the Astra DB vector store, LangFlow, and tools to support ingesting data and deploying applications on the top AI cloud providers.
See how DataStax can help you move your GenAI apps quickly from prototype to production - learn more about the platform, or try it out for free.