Artificial Intelligence

Today’s GenAI Stack

In the digital age, where innovation moves at lightspeed, generative AI (GenAI) applications stand at the forefront of technological advancement. These applications are transforming industries by boosting creative processes, augmenting office work, and solving complex problems with an almost human-like touch. But what does it take to build a GenAI application that's both powerful and user-friendly? In this blog post, we'll break down the essential components of today's GenAI applications, from the AI models that drive content generation to the user interfaces that connect users with these applications. Our aim is to demystify what it takes to build solutions that are not only powerful but also user-friendly, equipping you with the core concepts to drive GenAI innovation.

A generative AI (GenAI) application is a complex interplay of components, each essential for crafting a user experience that's both smooth and intelligent. At the core sits the AI model, the driving force capable of generating human-like text and images. To function, the AI model relies on powerful servers – an inference infrastructure designed to process data and deliver swift responses from the AI. Beyond the AI model and inference infrastructure, there's a need for backend logic that orchestrates the AI's content generation and provides AI access to supplementary utilities such as knowledge bases, web search, and analytics engines. The user interface (UI) acts as the gateway to the application, providing a user-friendly and compelling platform for interaction.

These components work together to create a modern generative AI application. In the following sections, we'll take a closer look at each part of this system, unpacking how they work and fit together.

Key components of a generative AI application

AI Model: The Brains of the Operation

An AI model is at the center of any generative AI application, with foundational models like GPT-4 being a prominent example for text-based tasks. These foundational models, particularly Large Language Models (LLMs), are pre-trained on extensive datasets, priming them for generating text that is both coherent and contextually relevant to user inputs. The foundational models' ability to adapt to various tasks without additional training makes them a cornerstone for many applications.

The key to directing these models is the art of prompt engineering. Developers craft specific prompts – instructions fed into the AI – that guide the AI to operate and respond in a specific way. See below for an example of prompt engineering. This technique is crucial for eliciting desired behavior from the AI and for providing the AI with information on what's relevant to the task at hand. Sometimes, however, the results from prompt engineering aren't quite hitting the mark, and the developers can turn to fine-tuning. This process involves additional training on small curated datasets, allowing for more precise control over the AI's responses. Fine-tuning tailors the model's output to better suit specific needs, sharpening its effectiveness for the task at hand.

Example of prompt engineering: In this example, we present two UI sketches to the AI model and request an explanation of their differences. Version 1 showcases the AI's default response: a comprehensive and detailed analysis. Version 2 illustrates how modifying the prompt leads to a concise and summarized comparison by the AI. The AI model used in this example was OpenAI’s gpt-4-vision-preview.

Choosing the right AI model is crucial for the success of generative AI applications. OpenAI's GPT-4 stands out as the best performing, state-of-the-art model that is ready for a wide array of use cases and soon to support multimodal interactions. It boasts extensive multi languagesupport but comes with a higher cost and is accessible only through API. In contrast, Anthropic's Claude series offers a model with quadruple the 'context' memory of GPT-4's production models (Note: OpenAI has recently released a new model – currently in preview – that narrows the gap with Anthropic's top model in context window length), making it ideal for applications requiring deeper context understanding. It's not only more affordable but also engineered for safer outputs, though like GPT-4, it's available solely via API.

Meanwhile, e.g., Meta's LLaMa series caters to those seeking flexibility in deployment and cost, with various model sizes that allow for a tailored speed-performance balance. Its permissive licensing is a game-changer, granting the freedom to host the models on any cloud provider or in-house. 

Developers must consider these strengths and limitations, balancing cost, performance, and operational needs to find the most suitable AI model for their projects.

Inference Infrastructure: The Muscle Behind GenAI

For a generative AI application to respond quickly and handle heavy workloads, it needs strong inference infrastructure – the platform where the AI model actively runs, processing user inputs in real-time. This is like the engine room where all the heavy lifting happens. Powerful GPUs are usually the go-to for this job because they can process AI models quickly and efficiently.

Several cloud platforms provide inference as a service, allowing developers to dedicate their efforts to crafting generative AI applications rather than managing complex GPU clusters. Among these tools, Azure Machine Learning also offers a comprehensive built-in catalog of the latest foundational models, AWS SageMaker boasts a notable integration with HuggingFace – a vast repository for AI models – for streamlined model deployment, and Databricks provides serverless model inference on GPUs, which simplifies scaling and reduces operational overhead by abstracting away the infrastructure management even further.

In addition to inference-as-a-service, Cloud services help developers by offering specific AI models on a pay-per-use basis, which is ideal for quickly spinning up a GenAI application. However, developers may face constraints when choosing AI models, as some – like OpenAI's offerings – are available exclusively through model provider's own API service or selected cloud provider.

When picking where to run their AI, developers have to think about more than just power and price. They also need to consider how well the new tech will fit with their existing cloud setup. It's all about finding the right mix of performance, cost, and compatibility with the cloud setup they have in place to get their generative AI application running smoothly. 

Backend logic: Orchestrating GenAI Functionality

The backend logic of a generative AI application is the conductor behind the scenes, ensuring that the entire system operates in harmony. Its fundamental role is to manage the flow of information: taking in user input, relaying it to the AI model for processing, and then delivering the AI's response back to the user. This cycle is the backbone of the user experience! Needless to say, the backend's responsibilities stretch well beyond this simple exchange.

To extend the AI model’s capabilities to handle a large variety of tasks, the backend in most GenAI applications provides the AI access to a curated set of tools, such as databases for contextual knowledge and web services for additional information. We need to build logic to gracefully handle potential hiccups, like handling overwhelmed AI APIs responding with error messages or adhering to strict rate limits set by most inference platforms. In most cases, we need to set up load balancing to utilize multiple AI inferences from multiple regions – allowing us to scale our GenAI application in high load situations.

Additionally, the backend supports a range of secondary functions that are vital for a good user experience. It oversees user and session management, maintaining the continuity of conversations and user interactions. For voice-enabled applications, it integrates with speech-to-text and text-to-speech services. File uploads are another feature the backend can manage, allowing users to interact with the AI in more diverse ways.

As we expand the AI application’s capabilities, and especially when deploying  consumer-facing AI applications, we also encounter the need for AI safety features. This is where the backend's role as a gatekeeper becomes crucial. It must set and enforce guardrails to prevent misuse. First, the backend must analyze user inputs to prevent the processing of undesired content or the manipulation of the AI into generating such content. Second, the backend must also analyze the AI’s outputs to ensure that they are, e.g., free from biases and don’t contain harmful content. 

As an example, consider an educational chatbot. There the analysis of students’ inputs is essential to filter out off-topic or inappropriate queries. Similarly, scrutinizing AI outputs ensures the provided guidance is accurate, encourages genuine understanding, and does not inadvertently facilitate cheating. Balancing these safety measures with the need for quick and easy access to the AI's capabilities is a nuanced challenge – one that requires careful calibration to preserve both the application's integrity and its user-friendliness.

Contextual Knowledge Bases: Sharpening AI Responses 

The contextual knowledge base is a crucial component in the generative AI stack, serving as a dedicated repository to provide the AI with accurate and relevant information to the task at hand. It is especially valuable in Retrieval Augmented Generation (RAG) setups, where the knowledge base acts as a library of specialized knowledge. Take, for example, a deaGenAI system for an auto repair shop: the knowledge base would be stocked with comprehensive car service manuals.

In an RAG application, upon receiving a user query, the backend logic immediately consults the knowledge base, finding information potentially relevant to the user's input and the ongoing session. In our auto repair shop scenario, the system would extract relevant manual sections that address the mechanic's problem at hand. These extracts, combined with the user's original input, are then forwarded to the AI model for processing. This method achieves four key objectives: (1) it deepens the AI's domain-specific knowledge, (2) it guarantees the availability of up-to-date and precise information, (3) it mitigates the risk of the AI delivering confident but incorrect responses—a known challenge with modern AI systems, and (4) it allows AI to operate transparently by citing the exact references which it uses in its responses.

The contextual knowledge base dynamically refines the AI's comprehension and customizes its output to each unique user interaction. Integrating such a database into a GenAI application significantly boosts its utility, particularly in fields where accuracy and the latest information are critical. A variety of databases are equipped to function as a contextual knowledge base, including, e.g., PostgreSql with pgvector, the latest versions of Elasticsearch, and specialized offerings from cloud providers like Azure AI Search, AWS Kendra, and GCP's Vertex AI Search.

Sketch of a Retrieval Augmented Generation System: (1) Preprocessor AI refines user input for better search results, (2) Backend conducts searches within the knowledge base, (3) Core AI generates responses using user input and relevant data from the knowledge base, and (4) Safety AI ensures the provided information is reliable and secure.

User Experience: Tailoring the GenAI Experience

The user experience (UX) is paramount for GenAI applications – as for all services for that matter – shaping how users interact with the underlying AI technology. A well-designed UX goes beyond aesthetics; it must be functional, guiding users to effectively communicate with the AI in an iterative manner, rather than using it as a simple search tool. This requires a UX design that educates users on the potential of GenAI, encouraging them to engage in a dialogue, ask follow-up questions, and delve deeper into topics, much like in a natural conversation.

The interface design must allow for various input methods tailored to the use case. For an auto repair mechanic using a GenAI application, a voice interface could be invaluable, allowing for hands-free operation. Image input can also be a powerful feature, enabling users to show the AI instead of describing it with language. These multimodal inputs can significantly enhance the utility and user-friendliness of the application.

Moving beyond the conventional chat box, GenAI can be subtly integrated into the user's existing digital environment. This integration acts as an 'invisible AI layer' that quietly operates to refine workflows and enhance productivity without adding to the user's cognitive load. Consider a customer relationship management (CRM) system that utilizes GenAI to analyze communication patterns, recommend personalized follow-ups, and identify upselling opportunities, all without the user needing to directly engage with a distinct AI interface. Such an AI layer can significantly boost the performance of customer success managers and streamline the sales process, ultimately driving increased satisfaction and loyalty.

The essence of crafting a seamless UX lies in embedding GenAI so that it becomes a natural extension of the user's work habits. The aim is for the technology to be so ingrained in the daily workflow that its advanced features are leveraged effortlessly, with the AI's complexity remaining out of sight. Users experience the benefits of GenAI through its contributions to efficiency and decision-making, rather than through overt interactions with the AI system. This approach ensures that the power of GenAI is felt in its outcomes and usability, subtly enhancing the user's tasks without introducing new layers of complexity.

Future of GenAI Application Development

We are entering an era where AI assistants are becoming an essential part of our digital toolkit and a competitive necessity for businesses to thrive. Building a generative AI application requires careful orchestration of various components, such as a modern user interface, powerful AI models, specialized backend logic, and supporting systems such as knowledge bases and integrations to other software.

In the rapidly evolving landscape of GenAI, new development environments are emerging that promise to streamline the development of GenAI applications. Tools like Azure AI Studio, Amazon Q, and AWS PartyRock, along with platforms like OpenAI's GPTs, offer ready-to-go solutions for quick proof-of-concept (PoC) development. These environments can be incredibly useful for demonstrating the potential of GenAI applications in a short timeframe.

However, we've observed that while these tools are excellent for initial PoCs, they often fall short when it comes to scaling and customizing applications for complex, real-world scenarios. That's where the expertise of a seasoned development team comes into play. Building a custom GenAI application from the ground up allows for greater flexibility, integration with existing systems, and the ability to tailor the experience to the unique needs of users and businesses.

As we look to the future, the role of GenAI in software systems will only grow more prominent. With the right approach, the possibilities are limitless. Whether you're looking to create a simple prototype or a sophisticated, enterprise-grade solution, the key is to understand the capabilities and limitations of the available tools and to choose a development path that aligns with your long-term vision. At Brightly, we're committed to navigating this complex terrain with you, leveraging our expertise to unlock the full potential of generative AI for your applications.

Authors

Janne Solanpää

Armed with a tech PhD and a strong background in AI, data science, data engineering, and software engineering, Janne Solanpää is a seasoned specialist in the field. His expertise lies in designing and implementing scalable data infrastructure on leading cloud platforms and in developing innovative software solutions. Leveraging advanced analytics and cutting-edge AI solutions, Janne has been instrumental in helping businesses unlock the potential of their data, driving growth and success.