RAG Implementation

Retrieval-Augmented Generation (RAG) enhances generative AI by integrating external information retrieval, enabling models to produce accurate, contextually informed responses. It combines indexing, retrieval, augmentation, and generation, with semantic search playing a key role in retrieving relevant data based on meaning rather than exact keywords. By leveraging vector embeddings and similarity measures, RAG reduces hallucinations and allows a system's knowledge to be updated without retraining the model. This approach is widely applied across industries, such as customer service, education, legal research, and content creation, to address knowledge-intensive tasks. Its implementation relies on vector databases and advancements in embedding models to efficiently connect AI systems with external, authoritative information sources.

Retrieval-Augmented Generation (RAG) is a method that enhances the capabilities of generative AI models by integrating external information retrieval into their workflows. Unlike traditional generative models that rely solely on their pre-trained knowledge, RAG retrieves relevant, up-to-date, or domain-specific information from external data sources and incorporates it into the model’s response. This process involves four key steps: indexing, retrieval, augmentation, and generation. During indexing, data is transformed into vector embeddings and stored in a vector database. When a query arrives, the retrieval step embeds it in the same way and uses similarity measures to identify the most relevant documents. During augmentation, these documents are added to the model’s input, and in the final generation step the model uses this enriched input to produce a response that is both contextually accurate and informed by external knowledge.
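The four steps can be sketched end to end in plain Python. This is a toy under loudly stated assumptions: `embed` is a hypothetical stand-in for a real embedding model (it simply counts words), the "vector database" is an in-memory list, and `generate` is a placeholder where a call to an actual LLM would go.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a bag-of-words count vector.
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# 1. Indexing: embed each document once and store (vector, text) pairs.
documents = [
    "Canberra is the capital of Australia.",
    "Sydney is the largest city in Australia.",
    "Paris is the capital of France.",
]
index = [(embed(doc), doc) for doc in documents]

# 2. Retrieval: embed the query the same way and rank documents by similarity.
def retrieve(query: str, k: int = 2) -> list:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]

# 3. Augmentation: put the retrieved text into the prompt alongside the question.
def augment(query: str, context: list) -> str:
    lines = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{lines}\nQuestion: {query}"

# 4. Generation: placeholder for a call to an LLM of your choice (hypothetical).
def generate(prompt: str) -> str:
    return "<LLM output would go here>"

question = "What is the capital of France?"
prompt = augment(question, retrieve(question, k=1))
print(prompt)
print(generate(prompt))
```

Because the query is embedded with the same function as the documents, the Paris sentence scores highest and is the only context passed on. Swapping in a real embedding model and LLM changes the quality of each step, not the overall shape of the pipeline.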

Semantic search plays a crucial role in the retrieval phase of RAG by enabling the system to find contextually relevant information rather than relying on exact keyword matches. It uses vector search, where both user queries and documents are encoded into numerical representations called embeddings. By comparing these embeddings, semantic search identifies the most conceptually similar documents in a dataset. This approach is particularly effective for understanding user intent and retrieving results that align with the meaning behind a query rather than its literal wording. Semantic search is widely used in applications such as customer support systems, enterprise knowledge management, and e-commerce platforms to improve search accuracy and relevance.
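The difference between keyword matching and vector search shows up even with tiny vectors. In the sketch below the three-dimensional embeddings are invented by hand for illustration (a real model would compute them, with far more dimensions), and `keyword_overlap` and `semantic_search` are hypothetical helper names.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
# Real embedding models output vectors with hundreds or thousands of dimensions.
EMBEDDINGS = {
    "how do i fix my car": [0.90, 0.10, 0.00],
    "automobile repair guide": [0.85, 0.15, 0.05],
    "broccoli soup recipe": [0.00, 0.20, 0.95],
}

def cosine_similarity(a, b):
    # Cosine of the angle between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def keyword_overlap(a: str, b: str) -> int:
    # Number of words the two strings share exactly.
    return len(set(a.split()) & set(b.split()))

def semantic_search(query: str, docs: list) -> list:
    # Rank documents by cosine similarity of their embeddings to the query embedding.
    q = EMBEDDINGS[query]
    return sorted(docs, key=lambda d: cosine_similarity(q, EMBEDDINGS[d]), reverse=True)

query = "how do i fix my car"
docs = ["broccoli soup recipe", "automobile repair guide"]
for doc in docs:
    score = cosine_similarity(EMBEDDINGS[query], EMBEDDINGS[doc])
    print(doc, keyword_overlap(query, doc), round(score, 3))
print(semantic_search(query, docs))
```

Neither document shares a single word with the query, so keyword overlap scores both as 0, while cosine similarity still ranks the repair guide far above the recipe.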

The combination of RAG and semantic search creates a powerful framework for solving knowledge-intensive tasks. For instance, RAG can retrieve specific information from large databases or proprietary knowledge bases, such as customer support logs or legal documents, while semantic search ensures that the retrieved data aligns with the user’s intent. This synergy reduces hallucinations—instances where AI generates incorrect or fabricated information—and allows for dynamic updates without retraining the model. As a result, RAG systems can provide more reliable answers in domains where accuracy and recency are critical.

One of the key advantages of RAG is its versatility across industries and applications. In customer service, it enables chatbots to deliver personalized responses by retrieving contextual information about users or products. In education, RAG can generate tailored study materials or explanations based on specific queries. Legal professionals use it for efficient case law retrieval and drafting assistance, while content creators leverage it for summarization and fact-checking. These diverse applications highlight how RAG bridges the gap between static AI training data and dynamic real-world information needs.

The technical implementation of RAG often relies on vector databases to store embeddings generated from text or other data types. These embeddings capture semantic meaning, allowing for efficient similarity searches at scale. Tools like Pinecone or Elasticsearch facilitate this process by enabling real-time updates to the database as new information becomes available. Additionally, advancements in embedding models and prompt engineering have further improved RAG’s ability to handle complex queries and generate coherent outputs. This makes RAG an essential tool for modern AI systems that require both generative capabilities and access to authoritative external knowledge.
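Vector databases typically avoid comparing a query with every stored vector by using approximate nearest-neighbor indexes. One widely used family, often called an inverted-file (IVF) index, clusters the vectors in advance and then searches only the clusters closest to the query, combining the clustering and k-nearest-neighbor ideas. The sketch below is a toy, assuming 2-D points and a hand-written k-means; it does not describe how Pinecone or Elasticsearch work internally, and all function names are hypothetical.

```python
import math

def kmeans(points, n_clusters, iters=10):
    # Simplistic deterministic initialisation: evenly spaced input points.
    # Real libraries use better schemes such as k-means++.
    step = max(1, len(points) // n_clusters)
    centroids = [tuple(p) for p in points[::step][:n_clusters]]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def build_ivf(points, n_clusters):
    # Assign every stored vector to the list of its nearest centroid.
    centroids = kmeans(points, n_clusters)
    lists = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        lists[nearest].append(p)
    return centroids, lists

def ivf_search(query, centroids, lists, k=3, n_probe=1):
    # Scan only the n_probe clusters whose centroids are closest to the query.
    probe = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))[:n_probe]
    candidates = [p for c in probe for p in lists[c]]
    return sorted(candidates, key=lambda p: math.dist(query, p))[:k]

def exact_knn(query, points, k=3):
    # Brute-force k-nearest neighbours: compare the query with every vector.
    return sorted(points, key=lambda p: math.dist(query, p))[:k]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, lists = build_ivf(points, n_clusters=2)
query = (0.2, 0.1)
print(ivf_search(query, centroids, lists))  # scans only the 4 vectors in one cluster
print(exact_knn(query, points))             # scans all 8 vectors
```

Here probing one cluster means scanning 4 of the 8 vectors and still returning the exact answer. On real data a true neighbor can sit just across a cluster boundary and be missed, which is why the hypothetical `n_probe` parameter exists: probing more clusters trades speed for recall.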

Required Reading and Listening

Listen to the podcast:

RAG and Beyond: Augmenting LLMs with External Data

Transcript

Speaker	Text
Alex	Welcome to the Deep Dive. We take stacks of research papers and try to extract the most fascinating and important insights, perfect for you, a busy data scientist. Today we’re going to dive into the world of retrieval-augmented generation. As data scientists, you already know about machine learning, deep learning, and the power of LLMs. But RAG adds a whole new dimension to LLMs.
Sam	It really does, and we have two really great sources to guide our deep dive today. The first one is a research survey paper titled Retrieval Augmented Generation (RAG) and Beyond. Essentially, this paper breaks down how to make LLMs smarter using external data.
Alex	I like that.
Sam	And then we have a more practical paper, a software engineering case study titled Seven Failure Points When Engineering a Retrieval Augmented Generation System.
Alex	Oh, interesting.
Sam	Yeah, it explores the practical pitfalls and lessons learned from real-world RAG implementations.
Alex	I like it. So you’ve got theory and practice.
Sam	Exactly, two sides of the same coin.
Alex	Love it. So our mission today is to really understand RAG from both a theoretical and a practical perspective, and hopefully equip you with the knowledge to start experimenting with RAG on your own projects.
Sam	Absolutely.
Alex	So what is RAG, and why should you care?
Sam	RAG is essentially a way to overcome some of the limitations of large language models. As you know, large language models like ChatGPT are trained on massive amounts of data. But this data is static, right? Once the model is trained, its knowledge is frozen in time. RAG comes in and allows these large language models to access and retrieve information from external sources in real time.
Alex	I see. So it’s like giving them a direct line to the internet or any relevant database.
Sam	Exactly. And this has a lot of benefits. First of all, it helps to address the issue of hallucination, which is a big problem with large language models.
Alex	Hallucination, that sounds trippy.
Sam	It kind of is. It’s basically when the LLM generates outputs that are factually incorrect or just don’t make sense.
Alex	Ah, so it’s like the model is making things up.
Sam	Yeah, exactly. And because RAG grounds the LLM in real-time information from external sources, it helps to reduce these hallucinations.
Alex	Makes sense. So RAG is like a fact checker for LLMs.
Sam	You could say that, but it’s more than just fact checking. It also allows LLMs to stay up to date with the latest information. Think about a financial analysis chatbot. If it’s only trained on data up to a certain point, it won’t be able to provide accurate insights on current market trends or news. But with RAG, this chatbot can access real-time market data, financial news, and even historical financial reports.
Alex	So it’s always learning, always evolving.
Sam	Exactly. It makes the LLM much more dynamic and powerful. And for data scientists, this means we can build more robust and accurate LLM applications.
Alex	Now we’re talking. So how does RAG actually work?
Sam	At its core, a RAG system has two main components: retrieval and generation. Retrieval is all about finding those needles in the haystack, the relevant pieces of information in a potentially massive pool of data. Generation is where we use the retrieved information to generate a response, and this is usually done with an LLM.
Alex	Got it. So find the right info, then use it to answer the question.
Sam	Exactly. Now, the research paper dives into the different levels of queries that a RAG system might encounter, and it’s really interesting how the nature of the query can significantly impact the complexity of the system.
Alex	OK, this sounds intriguing.
Sam	It is. It basically highlights that not all RAG queries are created equal.
Alex	So what are these different levels of queries, and how do they affect the RAG system?
Sam	Well, the paper breaks it down into four levels. The first level, the simplest one, is called explicit facts. Imagine asking something like, “What is the capital of France?”
Alex	OK, pretty straightforward.
Sam	Right. The answer is explicitly stated somewhere in your data.
Alex	Easy peasy.
Sam	Exactly. And for this level, datasets like Natural Questions are great examples.
Alex	OK, Natural Questions. Got it.
Sam	Now, moving up in complexity, we have the second level, implicit facts.
Alex	OK, implicit facts.
Sam	Yeah, and this requires a bit more reasoning. Imagine asking, “What is the largest city in the country where Canberra is located?”
Alex	Oh, that’s trickier.
Sam	Right. To answer this, the system needs to first know that Canberra is in Australia and then find the largest city in Australia. It’s combining information from multiple sources. The paper mentions HotpotQA as a dataset with these kinds of questions.
Alex	HotpotQA. Got it. All right, what about the third level?
Sam	Right, so the third level is called interpretable rationales, and this gets even more complex.
Alex	OK, interpretable rationales.
Sam	Yeah. This is where the system needs to follow predefined rules or logic to arrive at an answer. Think of it like a medical diagnosis system that uses established clinical guidelines.
Alex	It’s like giving the RAG system a set of instructions.
Sam	Exactly. And a dataset that aligns well with this level is the FEVER dataset.
Alex	OK, so FEVER is all about following rules.
Sam	Pretty much. Now the fourth and most challenging level is hidden rationales.
Alex	Hidden rationales. Sounds mysterious.
Sam	Yeah. In this scenario, there are no explicit rules or guidelines. The system has to find patterns and expertise hidden within the data itself. Think about predicting stock market trends or identifying potentially fraudulent transactions.
Alex	Oh, that’s a tough one.
Sam	Right. The reasoning behind the answer isn’t readily apparent, and this is where datasets like StrategyQA come in.
Alex	OK, StrategyQA for those really hard-to-explain insights.
Sam	Exactly. So you see, depending on the type of question, the RAG system has to work a lot harder to come up with the right answer.
Alex	Makes sense. Now let’s zoom in on that first crucial step, retrieval. The research paper talks a lot about semantic search, which sounds way more sophisticated than your average keyword matching.
Sam	It is.
Alex	So tell me more.
Sam	Semantic search is about understanding the meaning of the query, not just looking for specific words.
Alex	Right. So it’s not just about matching words; it’s about understanding the intent behind them.
Sam	Precisely. It’s the foundation of how RAG systems can retrieve truly relevant information.
Alex	So how do we teach a machine to understand meaning? This is where text embedding comes in, right?
Sam	Exactly.
Alex	So can you break that down for us?
Sam	Sure. Text embedding is essentially transforming text, whether it’s words or entire sentences, into numerical representations. We call these representations vectors.
Alex	Vectors. OK. So it’s like translating language into a format that a machine can understand.
Sam	That’s a great way to put it. And the cool thing about these vectors is that they capture semantic relationships.
Alex	So words with similar meanings would have vectors that are closer together in this vector space.
Sam	Exactly. Think of it like this: king and queen would be pretty close together because they share a similar semantic meaning.
Alex	But king and, say, broccoli would be way further apart.
Sam	Exactly, because their meanings are completely different.
Alex	Fascinating. And so once we have all these words and sentences represented as vectors, how do we actually use them for retrieval?
Sam	Well, that’s where vector search comes in.
Alex	OK, I’m intrigued. Tell me more.
Sam	So when you have a query, that query also gets transformed into a vector. This query vector is compared to all the other vectors in your database, and the vectors that are closest to the query vector, meaning they are the most semantically similar, get retrieved.
Alex	I see. So we’re not just looking for exact word matches. We’re looking for concepts, relationships, meaning.
Sam	Exactly, and that’s a game changer for building truly intelligent systems.
Alex	Absolutely. It sounds incredibly powerful, but I imagine storing and searching through all these vectors efficiently requires a special kind of database.
Sam	It does, and that’s where vector databases come into play.
Alex	So how do vector databases differ from our good old-fashioned SQL databases?
Sam	Well, think of it this way. SQL databases are great for structured data. They’re designed to handle tables with rows and columns, and you can search for exact matches based on specific values.
Alex	Right, your standard database stuff.
Sam	Exactly. But with RAG, we’re dealing with semantic similarity, those nuanced relationships captured by vectors. Traditional SQL databases weren’t really built for this kind of search.
Alex	So vector databases are specifically designed to handle these high dimensional vectors.
Sam	Exactly. They can efficiently store and search through millions, even billions of vectors, allowing you to find the most semantically similar items to your query.
Alex	OK, so we’ve got these powerful vector databases storing all this embedded text, but how do they actually go about finding the most relevant vectors for a given query? What’s happening under the hood?
Sam	There are some pretty clever algorithms at play here. Two of the most popular ones are clustering and k-nearest neighbors.
Alex	I’ve heard of those.
Sam	Yeah. Clustering is basically about grouping similar data points together.
Alex	So imagine all documents about machine learning are grouped together.
Sam	Exactly. So when you have a query about machine learning, the database doesn’t have to search through every single document. It can just focus on the cluster related to machine learning.
Alex	That makes sense for efficiency. What about k-nearest neighbors?
Sam	K-nearest neighbors, or KNN, is a bit different. It focuses on finding the k most similar items to your query vector.
Alex	So if my k is 5, it’ll return the five documents most similar to my query.
Sam	Exactly. It’s like finding the k closest stars to a specific star in the night sky.
Alex	So it’s ranking the retrieved information based on its semantic proximity to the query.
Sam	You got it. It’s a very elegant way to find the most relevant information.
Alex	That’s really neat. So now we have all of this relevant information, but how do we actually use it with the LLM? How does it all come together?
Sam	So this is where we move from retrieval to generation, and it’s not just about dumping all the retrieved information into the LLM.
Alex	Right. It’s got to be more nuanced than that.
Sam	It is. It’s about enriching the context for the LLM, giving it a targeted boost of knowledge.
Alex	So we’re not just throwing a bunch of data at the LLM and hoping for the best.
Sam	Exactly. We want to give it the most relevant information possible, information that will help it generate the most accurate and informative response. Think of it like this: you wouldn’t ask a chef to cook a gourmet meal without giving them the necessary ingredients, right?
Alex	Makes sense.
Sam	So instead of just giving the LLM a broad prompt, we’re saying, “Hey, here’s some really relevant information. Use this to craft your response.”
Alex	So it’s like giving the LLM a cheat sheet tailored to the task at hand.
Sam	Exactly. The retrieved text is used to augment the prompt, making it much more specific and informative. Imagine you’re building a chatbot to answer customer questions about a specific product. Instead of relying solely on the LLM’s general knowledge, you could use RAG to retrieve relevant excerpts from the product documentation, technical specifications, even customer reviews.
Alex	Ah, so it’s like giving the LLM a specialized knowledge base for that specific product.
Sam	Precisely. And this makes the LLM’s responses much more accurate and relevant. But as with any real-world implementation, there are challenges.
Alex	Right. And that’s where our second source, the case study paper, comes in. It really dives into some of the real-world challenges of actually implementing these RAG systems.
Sam	Yeah, it’s not always as smooth as the theory might suggest.
Alex	So what are some of the hurdles that data scientists might encounter when building these RAG systems?
Sam	One major one is data processing. You see, in the real world, we often deal with data that’s messy, unstructured, and it needs to be cleaned and structured for efficient retrieval.
Alex	Oh yeah, data wrangling, the bane of every data scientist’s existence.
Sam	Exactly. Imagine trying to feed a RAG system a collection of PDFs, Word documents, handwritten notes, and who knows what else. It’s a nightmare.
Alex	It can be a real headache.
Sam	It is. It’s not just about cleaning the data. It’s about structuring it in a way that makes retrieval efficient and accurate. You might need to extract key information, categorize documents, or even create summaries.
Alex	So even before we get to the fancy algorithms and embeddings, there’s a mountain of data prep to do.
Sam	Absolutely. And even once you have your data prepped and ready to go, there’s another challenge: retrieval accuracy.
Alex	Right, because if the retrieved information is garbage, then the output’s going to be garbage too. Garbage in, garbage out.
Sam	Exactly. Even with the best retrieval techniques, there’s always a chance of retrieving information that’s irrelevant or even worse, misleading. And this is especially important in fields like finance or healthcare, where accuracy is absolutely paramount. Imagine a financial model that retrieves outdated market data or a medical diagnosis system that pulls up information about the wrong condition.
Alex	The consequences could be disastrous.
Sam	They could be. So validating the retrieved information is absolutely critical. But the good news is the case study paper also offers some valuable lessons learned.
Alex	OK, so it’s not all doom and gloom.
Sam	No, not at all. There are definitely ways to overcome these challenges and build robust RAG systems. One key takeaway is that validation needs to be an ongoing process. It’s not enough to just test your system once and call it a day.
Alex	So we can’t just set it and forget it.
Sam	Nope. As you add new data, update your LLM, or even modify your retrieval techniques, you need to continuously monitor and validate the system’s performance.
Alex	So it’s all about being vigilant, making sure that the system is still performing as expected.
Sam	Exactly. Another important lesson is that robustness is an iterative process. You build, you test, you learn from your mistakes, and then you adapt. It’s an ongoing cycle of improvement.
Alex	So embrace the agile approach: fail fast, learn, and improve.
Sam	Exactly. The case study paper illustrates this beautifully through a real-world example. They were building a RAG system for legal research, and initially their retrieval accuracy was pretty low, but through rigorous testing and continuous improvement they were able to dramatically increase the system’s accuracy and reliability.
Alex	That’s a great example of how persistence and a data-driven approach can really pay off. OK, so we’ve explored the nuts and bolts of RAG. We’ve talked about the technical details of retrieval and generation, and we’ve even touched on some of the real-world challenges. But let’s take a step back for a moment and talk about the bigger picture. What does all of this RAG stuff mean for you, the data scientists on the front lines? What’s the big takeaway here?
Sam	I think RAG represents a really fundamental shift in how we think about data and intelligence. For a long time, the focus has been on building bigger, more complex models, trying to cram as much knowledge as possible into the training data. But RAG offers a different paradigm.
Alex	So instead of just making these massive models even bigger, we’re finding ways to make them smarter by giving them access to more information.
Sam	Exactly. It’s about leveraging external knowledge in a much more dynamic, more targeted way. It’s almost like moving from a closed system to an open one, where the LLM can constantly learn and adapt as new information becomes available.
Alex	It’s like the model is always plugged in, always updating its knowledge.
Sam	Precisely. Instead of being limited by what it learned during its initial training, it now has access to a vast and constantly evolving knowledge base. It’s like, I don’t know, giving it a key to the Library of Alexandria.
Alex	Oh, I like that analogy.
Sam	But this library is constantly expanding and updating.
Alex	That’s powerful.
Sam	It really is.
Alex	And it makes me think about all the potential applications, all the things we could build if we could give these LLMs access to all of this knowledge.
Sam	Yeah, the possibilities are incredible. Imagine personalized medicine tailored to an individual’s genetic makeup and lifestyle, or scientific breakthroughs driven by LLMs that can actually sift through mountains of research data.
Alex	It’s like we’re moving beyond just artificial intelligence to something closer to artificial wisdom.
Sam	That’s a great way to put it, because these LLMs, when they’re augmented with RAG, have the potential to tap into the collective knowledge of humanity.
Alex	That’s amazing. So what does all of this mean for data scientists? Where do we fit into this picture?
Sam	I think it opens up a whole new world of opportunities. We can start building applications that solve problems we couldn’t have even imagined before.
Alex	It’s like a whole new frontier of data science.
Sam	Exactly. It’s about pushing the boundaries of what’s possible. So as we wrap up our deep dive, I want to leave you with a question. What problems are you trying to solve? What data do you have at your fingertips? Could RAG be the key to unlocking new solutions?
Alex	That’s a great question. I think for any data scientist working with LLMs, RAG is definitely worth exploring. It’s a really powerful tool that can help you build more accurate, more reliable, and more intelligent systems.
Sam	It has the potential to revolutionize the way we approach data analysis.
Alex	That’s it for this deep dive. We hope you’ve enjoyed this exploration of retrieval-augmented generation. Stay curious, keep learning, and we’ll catch you next time on the Deep Dive.

Read the following:

Perplexity blog: RAG and Semantic Search
Textbook: Chapter 8: Hands-On Large Language Models
Paper: Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely

Additional Resources

Some Questions to Consider

What is Retrieval Augmented Generation (RAG)?
RAG is a technique that enhances large language models (LLMs) by allowing them to access and incorporate external data sources during the response generation process. This addresses limitations of LLMs such as outdated knowledge, lack of domain-specific expertise, and the tendency to hallucinate. RAG systems work by retrieving relevant documents based on a user query and then feeding these documents to the LLM to generate an answer. The process typically involves indexing data, retrieving relevant segments, and generating a response.
What are the key components of a RAG system, and how do they work together?
A RAG system primarily consists of two main processes:
- Indexing: This involves processing and structuring the external data source. Documents are split into smaller, manageable chunks, which are then converted into embeddings (numerical representations) using an embedding model. These embeddings, along with the original text chunks, are stored in a database for efficient retrieval. Key considerations include chunking strategies (fixed size, semantic-based, etc.) and the choice of embedding model (sparse, dense, hybrid).
- Querying: When a user poses a question, the system first converts the query into an embedding using the same embedding model used during indexing. This query embedding is then used to perform a semantic search in the indexed database to find the most relevant document chunks. The retrieved chunks are then passed to an LLM, along with the original query, to generate a final answer.
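Fixed-size chunking, the simplest of the chunking strategies mentioned above, fits in a few lines. This sketch assumes whitespace-separated words as the unit (production systems usually count model tokens), and `chunk_words` with its default sizes is a hypothetical illustration rather than a recommended setting.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words, so text cut at a chunk boundary
    also appears, with some surrounding context, in the neighbouring chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

text = " ".join(f"w{i}" for i in range(12))
for i, chunk in enumerate(chunk_words(text, chunk_size=5, overlap=2)):
    print(i, chunk)
```

Each chunk would then be embedded and stored next to its text; at query time the question must go through the same embedding model so that queries and chunks live in the same vector space.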
What are the different types of queries that RAG systems can handle, and what are the best techniques for addressing each type?
The effectiveness of a RAG system depends on the type of query it’s designed to handle. The RAG and Beyond survey categorizes queries into four levels:
- Explicit Fact Queries: Answers can be directly retrieved from specific data segments. Best addressed by ensuring accurate retrieval and minimizing noise.
- Implicit Fact Queries: Require synthesizing information from multiple references, often involving multi-hop reasoning. Graph-based approaches and iterative retrieval methods are beneficial.
- Interpretable Rationale Queries: Demand understanding and application of domain-specific rationales, often explicitly provided in external resources (e.g., FDA guidelines). Prompt engineering and instruction tuning are crucial.
- Hidden Rationale Queries: Rely on dispersed knowledge or in-domain data, where the rationale isn’t explicitly stated. Offline learning techniques, in-context learning, and fine-tuning can be effective.
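For implicit fact queries, a graph-based approach can answer the podcast’s Canberra question by following one relation per hop. In this sketch the triples are a hypothetical toy knowledge graph and the relation path is written by hand; in a real system, turning the question into that path (or into a sequence of retrieval queries) is itself the hard part.

```python
from typing import List, Optional

# Hypothetical toy knowledge graph stored as (subject, relation, object) triples.
TRIPLES = [
    ("Canberra", "located_in", "Australia"),
    ("Australia", "largest_city", "Sydney"),
    ("Paris", "located_in", "France"),
    ("France", "largest_city", "Paris"),
]

def follow_path(start: str, relations: List[str]) -> Optional[str]:
    # Each hop looks up one (node, relation) edge; its object becomes the next node.
    node = start
    for relation in relations:
        objects = [o for s, r, o in TRIPLES if s == node and r == relation]
        if not objects:
            return None  # a missing hop means the graph cannot answer the query
        node = objects[0]
    return node

# "What is the largest city in the country where Canberra is located?"
print(follow_path("Canberra", ["located_in", "largest_city"]))
```

An explicit fact query would need only one hop; the second hop is what makes the Canberra question an implicit fact query.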
What are the main strategies for improving data retrieval in RAG systems?
Several techniques enhance data retrieval:
- Chunking Optimization: Experimenting with different chunk sizes and strategies (fixed size, semantic, recursive) to balance semantic coherence and noise reduction.
- Indexing Methods: Employing sparse, dense, or hybrid retrieval methods to create effective mappings from search terms to text segments. Dense retrieval uses vector embeddings, while sparse retrieval uses keyword-based indexes.
- Query-Document Alignment: Aligning queries with document segments through traditional alignment (mapping both to the same encoding space), document domain alignment (generating synthetic answers), or query domain alignment (generating synthetic questions).
- Re-ranking: Re-evaluating the retrieved documents to improve the relevance of the top results.
- Recursive Retrieval: Performing multiple retrieval attempts to progressively refine the search and address any omissions.
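One common way to build a hybrid retriever is to run a sparse (keyword) search and a dense (vector) search separately and then fuse their rankings. The sketch uses reciprocal rank fusion (RRF), a standard fusion method that the sources above do not specifically name; the two input rankings are hard-coded stand-ins for real search results.

```python
from collections import defaultdict
from typing import List

def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs (best first) into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in, using
    1-based ranks; k = 60 is the constant proposed in the original RRF paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

sparse_ranking = ["d1", "d2", "d3"]  # stand-in for keyword search results
dense_ranking = ["d3", "d1", "d4"]   # stand-in for vector search results
print(reciprocal_rank_fusion([sparse_ranking, dense_ranking]))
```

`d1` and `d3` each appear in both lists, so they outrank documents that only one retriever found. Because RRF uses ranks rather than raw scores, it sidesteps the problem that keyword scores and cosine similarities live on different scales.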
How can the response generation phase in RAG systems be improved?
Enhancing the response generation involves:
- Determining Sufficiency: Evaluating whether the retrieved information is adequate or whether additional data is needed.
- Conflict Resolution: Handling discrepancies between retrieved knowledge and the LLM’s internal knowledge.
- Supervised Fine-Tuning: Fine-tuning the LLM on carefully designed training data to mitigate the effects of irrelevant or erroneous information.
- Joint Training: Training the retriever and generator components of the RAG system together to ensure consistent performance.
- Prompt Engineering: Optimizing prompts so that the LLM accurately follows and acts on the rationales provided in the external data.
- Self-reflection: Using techniques such as “Chain of Thought” or “Tree of Thoughts” to enable the LLM to evaluate and refine its own reasoning process.
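Two of these ideas, determining sufficiency and prompt engineering, can be combined when building the augmented prompt. The sketch assumes retrieval returns (similarity score, text) pairs; `build_prompt`, the threshold, the character budget, and the prompt wording are hypothetical placeholders that would need tuning for a real model.

```python
from typing import List, Optional, Tuple

def build_prompt(question: str, retrieved: List[Tuple[float, str]],
                 min_score: float = 0.5, max_chars: int = 2000) -> Optional[str]:
    """Build an augmented prompt, or return None if retrieval looks insufficient.

    `retrieved` holds (similarity score, chunk text) pairs, best first. The
    threshold and character budget are illustrative placeholders, not tuned values.
    """
    # Sufficiency heuristic: keep only chunks that clear the similarity threshold.
    relevant = [text for score, text in retrieved if score >= min_score]
    # Respect a context budget (a crude stand-in for the model's context window).
    context, used = [], 0
    for text in relevant:
        if used + len(text) > max_chars:
            break
        context.append(text)
        used += len(text)
    if not context:
        return None  # caller can retrieve again, or answer "I don't know"
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(context, start=1))
    return (
        "Answer the question using only the numbered context passages.\n"
        "Cite the passage numbers you used. If the passages do not contain the answer, "
        "reply \"I don't know\".\n\n"
        f"{numbered}\n\nQuestion: {question}"
    )

print(build_prompt("Who wrote the report?",
                   [(0.82, "The report was written by the audit team."),
                    (0.31, "Lunch is served at noon.")]))
```

Returning `None` instead of a prompt with empty context gives the caller a chance to retrieve again or decline to answer, rather than letting the LLM fall back on its internal knowledge.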
What are the main challenges and failure points when engineering a RAG system?
The “Seven Failure Points” paper highlights several key challenges:
- Missing Documents: Relevant documents are not included in the indexed data source.
- Missed Top-Ranked Documents: The answer is in a document that is not ranked highly enough to be returned.
- Not in Context: Documents with the answer were retrieved but did not make it into the context used to generate the answer.
- Reader Cannot Extract the Answer: The document is passed to the LLM, but it fails to generate the correct answer.
- Hallucination: The LLM ignores the provided context and generates an answer based on its pre-existing knowledge, which may be incorrect.
- Jailbreak: Users bypass the RAG system through adversarial prompts.
- Security/Privacy Violations: Unauthorized access to sensitive information.
Other challenges include:
- Data Processing Pipeline Robustness: Ensuring the data pipeline can handle uploaded documents and media effectively.
- Lack of Pre-existing Data for Testing: Testing is difficult because suitable test data often does not exist in advance and must be discovered experimentally.
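When a labelled test question fails, the first few failure points can be told apart by checking where the relevant document dropped out of the pipeline. This sketch assumes each test question has exactly one known gold document and a separate correctness judgment; `diagnose` and its arguments are hypothetical, and it does not separate an extraction failure from a hallucination.

```python
from typing import List, Set

def diagnose(gold_id: str, indexed_ids: Set[str], ranked_ids: List[str],
             top_k: int, context_ids: Set[str], answer_correct: bool) -> str:
    """Locate the pipeline stage where a test question failed."""
    if gold_id not in indexed_ids:
        return "missing documents"            # never indexed
    if gold_id not in ranked_ids[:top_k]:
        return "missed top-ranked documents"  # indexed but ranked too low
    if gold_id not in context_ids:
        return "not in context"               # retrieved but dropped before the LLM
    if not answer_correct:
        return "reader failed (could not extract the answer, or hallucinated)"
    return "ok"

indexed = {"doc1", "doc2", "doc3"}
print(diagnose("doc9", indexed, ["doc1", "doc2"], 2, {"doc1"}, False))
print(diagnose("doc3", indexed, ["doc1", "doc2", "doc3"], 2, {"doc1"}, False))
print(diagnose("doc2", indexed, ["doc1", "doc2"], 2, {"doc1"}, False))
```

Counting these labels over a test set shows which stage (indexing, ranking, context consolidation, or the LLM itself) deserves attention first.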
What are the different ways to integrate external data into LLMs, and what are their respective trade-offs?
There are three main approaches:
- Context Injection (RAG): Extracts relevant data based on the query and provides it as context to the LLM. Offers good interpretability and stability but is limited by the context window size and potential information loss. Best suited for scenarios where data can be explained succinctly.
- Small Model Approach: Trains a smaller model on specific domain data to guide the integration of external information. Reduces training time and can assimilate considerable amounts of data, but its efficacy depends on the model’s capabilities and may limit performance for complex tasks.
- Fine-Tuning: Directly fine-tunes a general LLM with external domain knowledge to create a domain-expert model. Enables utilization of large model capacities but requires careful data design to avoid generating erroneous outputs or losing previously known domain knowledge. This approach also requires more data, a longer training duration, and more computational resources.
What is the role of validation and continuous monitoring in RAG system deployment?
The sources emphasize that validation of a RAG system is primarily feasible during operation, and its robustness evolves over time. Continuous monitoring is crucial because RAG systems receive unknown input at runtime. This monitoring helps in identifying and addressing issues such as:
- Calibration: Tuning chunk size, embedding strategy, retrieval strategy, consolidation strategy, context size, and prompts.
- Performance Drift: Detecting and mitigating performance degradation due to changes in data or user behavior.
- Security Threats: Identifying and preventing jailbreak attempts and other security vulnerabilities.
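For the calibration and drift items above, a simple retrieval metric can be tracked over time. The sketch assumes a labelled set of (gold document, ranked results) pairs, which the sources note is often hard to obtain before deployment; `recall_at_k`, `drifted`, the baseline value, and the 0.05 tolerance are hypothetical choices.

```python
from typing import List, Tuple

def recall_at_k(results: List[Tuple[str, List[str]]], k: int) -> float:
    """Fraction of evaluation questions whose gold document is in the top k results."""
    if not results:
        raise ValueError("need at least one labelled result")
    hits = sum(1 for gold, ranked in results if gold in ranked[:k])
    return hits / len(results)

def drifted(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """Flag a drop in the metric larger than `tolerance` relative to the baseline."""
    return baseline - current > tolerance

baseline = 0.8  # hypothetical value recorded when the system was last calibrated
this_week = recall_at_k([("a", ["a", "b", "c"]), ("d", ["b", "c", "e"])], k=3)
print(this_week, drifted(baseline, this_week))
```

Re-running the same evaluation after changing chunk size, embedding model, or prompts gives a like-for-like comparison; a drop beyond the tolerance flags a change worth investigating.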

Glossary

Term	Description
Retrieval Augmented Generation (RAG)	A framework that enhances language models by retrieving information from external sources to improve the accuracy and relevance of generated content.
Large Language Model (LLM)	A deep learning model with a large number of parameters, trained on vast amounts of text data, capable of understanding and generating human-like text.
Hallucination	The tendency of language models to generate false or nonsensical information that is not grounded in reality.
Domain-Specific Knowledge	Information and expertise relevant to a particular field or subject area.
Explicit Fact Query	A query that can be answered directly by retrieving a specific piece of information from a data source.
Implicit Fact Query	A query that requires synthesizing information from multiple sources, often involving reasoning or inference, to arrive at an answer.
Interpretable Rationale Query	A query that requires understanding and applying domain-specific rationales from external data to provide an answer.
Chunking	The process of dividing documents into smaller, more manageable segments for indexing and retrieval.
Embedding	A vector representation of text or other data that captures its semantic meaning.
Sparse Retrieval	A retrieval method that indexes text segments by the specific words (terms) they contain, as in keyword-based methods such as BM25.
Dense Retrieval	A retrieval method that maps text segments into a dense vector space of features.
Query Rewriting	Modifying a user’s query to improve search accuracy and relevance.
Fine-tuning	The process of further training a pre-trained language model on a smaller, domain-specific dataset.
In-context Learning	The ability of a language model to learn from examples provided in the prompt, without requiring explicit fine-tuning.
Prompt Tuning	Optimizing the input prompts to elicit the desired behavior from a language model.
Adapter Tuning	Integrating small adapter models with LLMs while freezing the parameters of the LLM during fine-tuning and only optimizing the weights of the adapter.
Low-Rank Adaptation	Reducing the number of trainable parameters needed for adapting to downstream tasks by imposing low-rank constraints on each dense layer to approximate the update matrices.
Knowledge Graph	A graph-structured database that represents entities and their relationships.
Data Dependency	The subset of data segments indispensable for addressing a query.
Power Set	The set of all subsets of a given set, including the empty set and the set itself.
IR (Information Retrieval)	The process of obtaining information system resources that are relevant to an information need from a collection of those resources.
BFS (Breadth-First Search)	An algorithm for traversing or searching tree or graph data structures that visits all neighbors of a node before moving on to nodes further away.
Zero-Shot Learning	A type of machine learning where a model can perform a task without having seen any specific examples of that task during training.
Few-Shot Learning	A type of machine learning where a model can learn a new task from only a few examples.
In-Domain Data	Data from the same domain as the task at hand.
Offline Learning	Training a model using a fixed dataset before deployment.
Prompt Engineering	The process of designing effective prompts to elicit desired responses from language models.
Chain-of-Thought Prompting	A prompting technique that encourages language models to generate intermediate reasoning steps before producing a final answer.
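Buffer-of-Thought	A reasoning approach that uses a problem distiller to extract high-level thought templates from many reasoning tasks, stores them in a meta-buffer, and retrieves a relevant template to guide reasoning on a new problem.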
Instruction Tuning	Supervised fine-tuning using paired (instruction, output) data to infuse new capabilities into LLMs.
G-Eval	A framework for evaluating natural language generation (NLG) output with GPT-4 as the evaluator, designed to align better with human judgments; usable as an offline evaluation technique.
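The Low-Rank Adaptation entry can be made concrete with the standard LoRA formulation: the pretrained weight matrix W0 (of size d × k) stays frozen, and the update is approximated by the product of two small trainable matrices B and A of rank r. The layer sizes in the last line are an arbitrary example chosen for the arithmetic, not values from the sources.

```latex
\begin{aligned}
h &= W_0 x + \Delta W x = W_0 x + B A x, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k) \\
\#\text{trainable} &= r(d + k) \quad \text{instead of} \quad dk \\
d = k = 4096,\ r = 8: &\quad 8 \cdot 8192 = 65{,}536 \quad \text{vs.} \quad 4096^2 = 16{,}777{,}216 \ (\approx 0.4\%)
\end{aligned}
```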