Daisy-Chaining RAGs from Different Data Sources

Retrieval-Augmented Generation (RAG) models, with their ability to pull information from vast databases, represent a milestone in natural language processing. A powerful enhancement is chaining multiple RAGs together to answer multi-step questions or make context-sensitive decisions.

In this article, we'll explore how to chain RAGs over different data sources. We'll touch upon four use cases and then dive into a sample JSON configuration that outlines how to set up such a system.


Use Cases

Market Research and Competitive Analysis

By chaining RAGs that have access to company filings and news sources, analysts can gather insights on competitors' strategies and performance metrics within seconds.

Personalized Healthcare

Imagine a RAG that pulls a patient's medical history and current vitals to suggest treatments. It could be chained with another RAG that searches the latest medical research to ensure those recommendations are up to date.

Employee Onboarding

Chaining RAGs that access both the internal employee database and online platforms such as LinkedIn could offer a comprehensive view of an employee’s background and skills, aiding in customized onboarding processes.

Adaptive Learning Platform

In an educational setting, an adaptive learning platform could use chained RAGs to offer a more personalized learning experience. The first RAG could pull from a database of educational content to recommend learning modules based on a student’s current performance and preferences. A second RAG could analyze forums, articles, or research to suggest supplemental materials for deeper understanding. A third could tap into a Q&A database to provide instant homework help based on the student's current study material.


Sample Configuration

Here is an illustrative JSON configuration that shows how to chain two different RAGs—one pulling from an S3 bucket with ACME Inc.’s company filings and the other from a URL containing LinkedIn profiles of its employees.

{
  "data_sources": [
    {
      "id": "1",
      "type": "S3",
      "config": {"path": "s3://s3-of-acme-filings"},
      "chunking_strategy": [
        
          "chunker_type": "language.python",
          "chunk_overlap": 0,
          "chunk_size": 100
        },
        {
          "chunker_type": "markdown",
          "split_at": "heading"
        }
      ]
    },
    {
      "id": "2",
      "type": "URL",
      "config": {"path": "http://linkedins-of-employees"},
      "chunking_strategy": [
        {
          "chunker_type": "language.html"
        }
      ]      
    }
  ],
  "model_chains": [
    {
      "id": "first",
      "model": "text_to_text",
      "source_id": "1",
      "config": {
        "k":5,
        "prompt": "who is the CEO of Acme Inc?",
        "system_prompt": "dont give me bullshit and format the answer as JSON",
      },
      "log": true
    },
    {
      "id": "second",
      "model": "text_to_text",
      "source_id": "2",
      "config": {
        "k":5,
        "prompt": "what year did {{model_chains.id.first.output}} start their career?"
      },
      "log": true
    }
  ],
  "output": {
    "type": "API",
    "config": {"format": "json"}
  }
}
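
The placeholder {{model_chains.id.first.output}} in the second chain's prompt is what links the two chains: the orchestrator runs the first chain, stores its answer, and substitutes it into the second prompt before that chain runs. Here is a minimal Python sketch of that resolution step; the resolve_placeholders, run_chain, and run_chains names are hypothetical and stand in for whatever orchestration layer executes the configuration.

import re

# Matches placeholders of the form {{model_chains.id.<chain_id>.output}}.
PLACEHOLDER = re.compile(r"\{\{model_chains\.id\.(\w+)\.output\}\}")

def resolve_placeholders(prompt: str, outputs: dict[str, str]) -> str:
    """Substitute each placeholder with the output of the chain it references."""
    return PLACEHOLDER.sub(lambda match: outputs[match.group(1)], prompt)

def run_chain(chain: dict, prompt: str) -> str:
    # Stand-in for the real retrieval + generation call; it simply echoes
    # which chain ran so the data flow can be demonstrated end to end.
    return f"<answer from chain '{chain['id']}' to: {prompt}>"

def run_chains(model_chains: list[dict]) -> dict[str, str]:
    """Run chains in order, feeding earlier outputs into later prompts."""
    outputs: dict[str, str] = {}
    for chain in model_chains:
        prompt = resolve_placeholders(chain["config"]["prompt"], outputs)
        outputs[chain["id"]] = run_chain(chain, prompt)
    return outputs

# Example: the second prompt sees the first chain's answer substituted in.
chains = [
    {"id": "first", "config": {"prompt": "Who is the CEO of ACME Inc.?"}},
    {"id": "second", "config": {"prompt": "What year did {{model_chains.id.first.output}} start their career?"}},
]
print(run_chains(chains))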

Sample Response

Suppose you've run the chains defined in your JSON. Here's how a sample response might look, including logs, references, and the final output.

{
  "chain_id":"123",
  "response": {
    "question": "What year did John Doe start their career?",
    "answer": "1995",
  },  
  "meta": [
    {
      "model_id": "first",
      "source_id": "1",
      "document_ids": ["s3_doc_001", ...],
      "action_log": "Retrieved {5} documents from s3://s3-of-acme-filings",
    },
    {
      "model_id": "second",
      "source_id": "2",
      "document_ids": ["url_doc_001", ...],
      "action_log": "Retrieved {5} documents from http://linkedins-of-employees",
    }
  ]
}
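
On the consuming side, an API client might pull out the final answer while retaining the meta block for auditing. A small sketch, assuming a response shaped like the JSON above:

import json

def summarize_response(raw: str) -> None:
    """Print the final answer and the documents each chain relied on."""
    payload = json.loads(raw)
    print(f"Q: {payload['response']['question']}")
    print(f"A: {payload['response']['answer']}")
    for step in payload["meta"]:
        print(f"  chain {step['model_id']} used {step['document_ids']} from source {step['source_id']}")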

Challenges

Data Consistency

When chaining RAGs from different sources, ensuring that the data is consistent across all platforms is challenging. If one source updates more frequently than another, or if they use different formats or terminologies, discrepancies may arise.

Query Coordination

Coordinating queries between multiple RAGs can be complex. You have to manage how the output from one model serves as an input to the next. Errors or inconsistencies in one model's output can propagate and amplify, leading to inaccurate final results.
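
One way to contain that propagation is to validate each intermediate output before it is templated into the next prompt, for example by insisting that an upstream model which was told to answer in JSON actually did so. A sketch, with a hypothetical validate_intermediate helper:

import json

def validate_intermediate(raw_output: str, required_key: str) -> str:
    """Reject malformed or empty upstream answers before they reach the next chain."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Upstream output is not valid JSON: {raw_output!r}") from exc
    value = str(parsed.get(required_key, "")).strip()
    if not value:
        raise ValueError(f"Upstream output is missing '{required_key}'")
    return value

# e.g. validate_intermediate('{"ceo": "John Doe"}', "ceo") returns "John Doe"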

Computational Overhead

Running multiple RAGs increases computational demands. As each RAG retrieves and generates text, system latency can be a concern, especially for real-time applications.

Error Handling

Managing errors becomes complex as more RAGs are involved. If one RAG fails to produce an output or encounters an error, robust mechanisms are needed to handle such scenarios gracefully without disrupting the entire chain.
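
A common pattern is to wrap each chain step in retry and fallback logic so that one failed retrieval or generation does not take down the whole chain. A minimal sketch, assuming each step is exposed as a callable; the names are illustrative rather than part of any specific library:

import logging
import time

logger = logging.getLogger("rag_chain")

def run_with_retries(step, attempts: int = 3, backoff_s: float = 1.0, fallback: str | None = None) -> str:
    """Run one chain step, retrying on failure and falling back if all attempts fail."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception:
            logger.exception("Chain step failed (attempt %d/%d)", attempt, attempts)
            time.sleep(backoff_s * attempt)
    if fallback is not None:
        return fallback
    raise RuntimeError("Chain step failed after all retries")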

Data Visibility

When you're chaining RAGs, it can be difficult to trace back where specific pieces of information originated. As data flows through the chain, losing sight of its source can create issues for debugging, auditing, or compliance. Ensuring end-to-end visibility in the data lineage is crucial but challenging.
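
A practical mitigation is to carry provenance alongside every retrieved chunk so the final answer can always be traced back to its source. A sketch of such a record as a dataclass; the field names are illustrative and simply mirror the identifiers used in the configuration above:

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class RetrievedChunk:
    """A retrieved passage plus the provenance needed to trace it later."""
    text: str
    source_id: str    # e.g. "1" for the S3 source, "2" for the URL source
    document_id: str  # e.g. "s3_doc_001"
    chain_id: str     # which model chain retrieved it
    retrieved_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))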

Logging

With multiple RAGs, you'll want detailed logs for performance monitoring, troubleshooting, and auditing. Implementing comprehensive logging that captures the workflow at each stage of the chain can be complex but is essential for transparency and compliance.
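
Structured, per-stage log records make it possible to reconstruct what each chain retrieved and produced. A sketch using the standard logging module; the field names echo the meta block in the sample response but are otherwise illustrative:

import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("rag_chain")

def log_stage(chain_id: str, source_id: str, document_ids: list[str], latency_ms: float) -> None:
    """Emit one structured log record per chain stage."""
    logger.info(json.dumps({
        "model_id": chain_id,
        "source_id": source_id,
        "document_ids": document_ids,
        "latency_ms": round(latency_ms, 1),
    }))

# e.g. log_stage("first", "1", ["s3_doc_001"], 182.4)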

Tuning

Tuning individual models in the chain allows for optimized performance. However, this can lead to challenges in ensuring that the tuned models still work well in the context of the entire chain. Adjusting one model can affect downstream models, potentially requiring a re-tuning of the entire system.

Versioning

As RAG models are updated or refined, version control becomes essential. Knowing which version of a model is operating at each point in the chain is crucial for debugging, reproducibility, and auditing. However, managing versions in a multi-model environment is complex and requires careful orchestration.

Pinning

Pinning refers to the practice of specifying which version of a model should be used in the chain. This ensures that even if newer models are released, the chain can continue to operate with known, stable versions. While this provides stability, it also poses a challenge for incorporating improvements or updates to individual models in a seamless manner.
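
In a configuration like the one above, pinning could be expressed by recording an explicit model version on each chain entry. The model_version field below is hypothetical rather than part of the sample configuration, shown here as a Python dict for brevity:

# Hypothetical extension of a model_chains entry: the chain pins an explicit
# model version so upgrades happen deliberately, after the chain is re-tested.
pinned_chain = {
    "id": "first",
    "model": "text_to_text",
    "model_version": "2023-09-01",  # hypothetical pin
    "source_id": "1",
    "config": {"k": 5, "prompt": "Who is the CEO of ACME Inc.?"},
    "log": True,
}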
