LlamaIndex

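LlamaIndex is a framework for connecting custom data sources to LLMs. Put simply, it lets you augment an LLM with your own data so that it can answer questions about that data more specifically and accurately. It is designed for building agents that use RAG (Retrieval-Augmented Generation).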

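LlamaIndex core features

1. Data connection:

LlamaIndex provides a range of data connectors that load data from different sources, for example documents (PDF, Word, plain text, Markdown, and so on), websites, databases, knowledge graphs, and APIs. Each connector converts its source into the Document format that LlamaIndex works with.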

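2. Data indexing:

LlamaIndex builds the loaded data into an index so that relevant information can be retrieved efficiently for the LLM. Several index types are available, for example:

List Index: simply stores the documents as a list (called SummaryIndex in recent releases).
Vector Store Index: embeds the documents in a vector space to support semantic search.
Tree Index: organizes the documents into a tree to support hierarchical search.
Keyword Table Index: indexes the documents by keyword.

Choose the index type that fits your data and the kinds of queries you expect.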
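
To make the difference between index types more concrete, here is a minimal sketch of the idea behind a keyword table index: map each keyword to the documents that contain it, then look up the keywords of the query. This is a toy in plain Python, not LlamaIndex's implementation; `keywords`, `build_keyword_table`, `lookup`, and the stop-word list are names invented for this illustration.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list for the sketch.
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and", "in", "for", "what", "how"}


def keywords(text: str) -> set[str]:
    """Lowercase the text and keep alphanumeric words that are not stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def build_keyword_table(docs: list[str]) -> dict[str, set[int]]:
    """Map each keyword to the ids (list positions) of the documents containing it."""
    table = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in keywords(text):
            table[word].add(doc_id)
    return dict(table)


def lookup(table: dict[str, set[int]], query: str) -> list[int]:
    """Return document ids ranked by how many query keywords they match."""
    hits = defaultdict(int)
    for word in keywords(query):
        for doc_id in table.get(word, ()):
            hits[doc_id] += 1
    # Most matches first; ties are broken by document id.
    return sorted(hits, key=lambda d: (-hits[d], d))


docs = [
    "Chroma is a vector store for embeddings.",
    "A tree index organizes documents in a hierarchy.",
    "Embeddings place documents in a vector space.",
]
table = build_keyword_table(docs)
print(lookup(table, "vector embeddings"))
```

A vector index, by contrast, can also match paraphrases that share no exact words with the query, which is why it is the usual choice for semantic search.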

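3. Query engine:

LlamaIndex provides a query engine that takes a user's query and retrieves relevant information from the index. The query engine uses an LLM to interpret the query and turn it into a suitable retrieval request. It can also rank, filter, and aggregate the retrieved information to give a more accurate answer.

4. Data agents:

LlamaIndex lets you create data agents that carry out tasks automatically, such as answering questions, generating text, or summarizing documents. Agents can be customized to suit a specific task.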

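Three key parts

Components: the basic building blocks, such as prompts, models, and databases. Components often serve to connect LlamaIndex to other tools and libraries, for example by specifying which model to use.
Agents and Tools: tools are the components that let an agent take actions, such as searching, calculating, or calling an external service; agents are autonomous components that use tools and make decisions.
Workflows: step-by-step processes that chain logic together.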

LlamaHub

LlamaHub is a registry of hundreds of integrations, agents, and tools, and is part of the LlamaIndex ecosystem.

Let's try the Hugging Face Inference API integration together with an embedding component:

pip install llama-index-llms-huggingface-api llama-index-embeddings-huggingface

Here is an example of an LLM component that uses the Hugging Face Inference API:

# LLM component that calls a chosen model through the Hugging Face Inference API
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
import os
from dotenv import load_dotenv

# Load the .env file
load_dotenv()

# Retrieve HF_TOKEN from the environment variables
hf_token = os.getenv("HF_TOKEN")
llm = HuggingFaceInferenceAPI(
    model_name="Qwen/Qwen2.5-Coder-32B-Instruct",
    temperature=0.7,
    max_tokens=100,
    token=hf_token,
)
response = llm.complete("Hello, how are you?")
print(response)
# Example reply (actual output will vary): I am good, how can I help you today?

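This is the general pattern for finding, installing, and using the integration for any component we need.

Components

An agent needs to understand our request and then prepare, find, and use relevant information to complete the task. This is where LlamaIndex components come in.

The framework has many components; the focus here is on QueryEngine, because it can serve as a RAG tool for an agent.

The user's query is looked up through the Index (which LlamaIndex builds from the data in your database), the retrieved information is appended to the prompt, the prompt is passed to the LLM, and the LLM's response is returned to the user.

So the role of the QueryEngine is to take the user's query and retrieve relevant information from the index. Any agent needs a way to find and understand relevant data, and QueryEngine provides exactly that.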
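
The flow described above (retrieve relevant text, append it to the prompt, send the prompt to the LLM) can be sketched without any framework. Everything here is a stand-in chosen for illustration: `retrieve` ranks chunks by simple word overlap rather than by embeddings, `fake_llm` is a placeholder for a real model call, and the sample chunks are made up.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric words of the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by the number of words they share with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(c)), i, c) for i, c in enumerate(chunks)]
    scored = [s for s in scored if s[0] > 0]  # drop chunks with no overlap
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [c for _, _, c in scored[:top_k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Append the retrieved context to the user's question."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}\n"
        "Answer:"
    )


def fake_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; it only reports the prompt size."""
    return f"(model reply to a {len(prompt)}-character prompt)"


def answer(query: str, chunks: list[str]) -> str:
    """Retrieve, augment the prompt, then generate."""
    return fake_llm(build_prompt(query, retrieve(query, chunks)))


chunks = [
    "Alfred is the butler of Wayne Manor.",
    "The manor hosts a gala every spring.",
    "Chroma stores embedding vectors on disk.",
]
question = "Who is the butler of the manor?"
print(build_prompt(question, retrieve(question, chunks)))
```

A query engine bundles these three steps behind a single `query` call, which is what makes it convenient to hand to an agent as a tool.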

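Creating a RAG pipeline in 5 steps

Recap of the five steps of RAG: loading, indexing, storing, querying, and evaluation.

The Devv.ai tool I use is not a RAG system; it simply uses an LLM to answer questions.

Now let's build a RAG pipeline with LlamaIndex.

More about RAG with LlamaIndex is available in the LlamaIndex documentation.

Step 1. Loading

For more complex data sources, get familiar with the LlamaHub loaders and the LlamaParse parser.

The simplest way to load data is SimpleDirectoryReader. This versatile component loads various file types from a folder and converts them into Document objects that LlamaIndex can work with.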

from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="path/to/directory")
documents = reader.load_data()

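After loading the documents, we need to break them into smaller pieces called Node objects. IngestionPipeline creates these nodes through two key transformations: SentenceSplitter, which splits the text into chunks, and HuggingFaceEmbedding, which computes an embedding for each chunk.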

from llama_index.core import Document
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline

# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_overlap=0),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ]
)

# Top-level await works in a notebook; in a script, wrap the call in asyncio.run(...)
nodes = await pipeline.arun(documents=[Document.example()])
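
To show what sentence-aware chunking with overlap means, here is a rough plain-Python sketch. It is not how SentenceSplitter is implemented: this toy counts words rather than tokens, uses a naive regex to find sentence boundaries, and expresses overlap in whole sentences. `split_sentences`, `chunk_sentences`, and their parameters are hypothetical names for this illustration.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?' (a deliberately naive rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_sentences(text: str, chunk_size: int = 8, overlap_sentences: int = 0) -> list[str]:
    """Greedily pack whole sentences into chunks of roughly `chunk_size` words.

    A chunk is closed when adding the next sentence would exceed `chunk_size`.
    The last `overlap_sentences` sentences of a closed chunk are repeated at the
    start of the next one, so a chunk with overlap may exceed the limit slightly.
    """
    chunks, current = [], []
    for sentence in split_sentences(text):
        words_so_far = sum(len(s.split()) for s in current)
        if current and words_so_far + len(sentence.split()) > chunk_size:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = (
    "LlamaIndex loads documents. It splits them into nodes. "
    "Each node is embedded. Nodes are stored in a vector store."
)
for chunk in chunk_sentences(text, chunk_size=8, overlap_sentences=1):
    print(chunk)
```

Overlap repeats a little text across neighbouring chunks so that a fact straddling a chunk boundary is still retrievable from at least one of them; the cost is some duplicated storage and embedding work.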

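Steps 2 and 3. Storing & indexing

Once the Node objects exist, we need to index them to make them searchable, but first we need somewhere to store the data. Here we use Chroma (chromadb) as the vector store.

Install the integration with pip install llama-index-vector-stores-chroma (prefix the command with ! in a notebook).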

import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection("alfred")
# Vector store that keeps our document embeddings in the Chroma collection
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    ],
    vector_store=vector_store,
)

Embedding the query and the nodes in the same vector space lets us find relevant matches. VectorStoreIndex handles this for us. We create the index from our vector_store and embedding model:

from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)

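Be sure to use the same embedding model that was used during ingestion. Because the Chroma client persists its data under ./alfred_chroma_db, the index can later be reloaded from the same vector store without re-embedding the documents.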
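
The matching step behind a vector index can be sketched as "embed the query with the same function as the nodes, then rank by cosine similarity". In this sketch `embed` is a toy word-count vector standing in for a real embedding model such as the one used above, and `cosine` and `top_k` are hypothetical helper names; a real vector store performs the same ranking over dense float vectors.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a sparse word-count vector (a real model returns dense floats)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, nodes: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Embed the query like the nodes and return the k most similar nodes with scores."""
    q = embed(query)
    scored = [(cosine(q, embed(n)), n) for n in nodes]
    return sorted(scored, key=lambda s: -s[0])[:k]


nodes = [
    "the butler serves tea",
    "the gala starts at noon",
    "tea is served by the butler at noon",
]
for score, node in top_k("who serves tea", nodes):
    print(f"{score:.2f}  {node}")
```

If the query were embedded with a different function than the nodes, the two vectors would live in unrelated spaces and the similarity scores would be meaningless, which is why the same model must be used on both sides.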

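Step 4. Querying a VectorStoreIndex with prompts and LLMs

Before we can query the index, we need to convert it into a query interface. The options are:

as_retriever: for basic retrieval, returning the matching nodes with their similarity scores.
as_query_engine: for single question-and-answer interactions, returning a written response.
as_chat_engine: for conversational interactions that keep history across messages.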

from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
)
query_engine.query("What is the meaning of life?")
# Example reply (actual output depends on the indexed data and the model): The meaning of life is 42

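Step 5. Evaluation

LlamaIndex provides built-in evaluation tools for assessing response quality. These evaluators use LLMs to analyze different dimensions of a response.

Even without a formal evaluation, you can assess an agent's behavior by inspecting the output of each component.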
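
The general idea of an LLM-based evaluator can be sketched as: build a judging prompt from the retrieved context and the answer, ask a model for a verdict, and parse the reply. The code below is only that idea in plain Python, not LlamaIndex's built-in evaluators. `build_judge_prompt`, `is_faithful`, and `stub_judge` are hypothetical names, and `stub_judge` replaces the LLM with a crude word-containment check so the example runs offline.

```python
import re
from typing import Callable


def build_judge_prompt(answer: str, contexts: list[str]) -> str:
    """Ask the judge whether the answer is supported by the retrieved context."""
    context_block = "\n".join(contexts)
    return (
        "Is the ANSWER fully supported by the CONTEXT? Reply YES or NO.\n"
        f"CONTEXT:\n{context_block}\n"
        f"ANSWER:\n{answer}\n"
    )


def is_faithful(answer: str, contexts: list[str], judge: Callable[[str], str]) -> bool:
    """Return True when the judge's reply starts with YES."""
    reply = judge(build_judge_prompt(answer, contexts))
    return reply.strip().upper().startswith("YES")


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def stub_judge(prompt: str) -> str:
    """Stand-in for an LLM: YES if every word of the answer occurs in the context."""
    head, answer = prompt.rsplit("ANSWER:\n", 1)
    context = head.split("CONTEXT:\n", 1)[1]
    return "YES" if _words(answer) <= _words(context) else "NO"


contexts = ["Alfred is the butler of Wayne Manor."]
print(is_faithful("Alfred is the butler", contexts, stub_judge))
print(is_faithful("Alfred is a chef", contexts, stub_judge))
```

With a real model as the judge, the same structure lets you check other dimensions (relevance to the question, correctness against a reference answer) by changing only the judging prompt.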

ResponseSynthesizer

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.response_synthesizers import ResponseMode

# Load the documents (the optional load_data arguments were elided in the original)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# as_query_engine turns an Index into a query engine
# response_mode selects how the ResponseSynthesizer combines the retrieved nodes
query_engine = index.as_query_engine(
    response_mode="refine",  # use refine mode
    similarity_top_k=3,  # retrieve the top 3 nodes
)

# Run the query
response = query_engine.query("What is LlamaIndex?")
print(response)  # answer synthesized by the ResponseSynthesizer, e.g. "LlamaIndex includes..."

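This retrieves the top 3 nodes, extracts the information about LlamaIndex from them, refines the answer iteratively (refine mode), and produces the final answer.

response_mode supports the following modes, among others:

compact: packs the retrieved nodes into as few prompts as the context window allows, so the LLM is often called only once.
refine: processes the nodes one at a time, refining the answer at each step until the final answer is reached.
tree_summarize: summarizes the nodes hierarchically; well suited to complex queries.
accumulate: generates a separate answer for each node and returns them together.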
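
The four modes differ mainly in how many LLM calls they make and what each call sees. The sketch below imitates each strategy in plain Python with a counting stub in place of a model. It is a simplification, not LlamaIndex's code: the real compact and tree_summarize modes pack text by context-window size, whereas this toy assumes everything fits in one prompt and uses a fixed, hypothetical `fanout` for the tree.

```python
from typing import Callable

LLM = Callable[[str], str]


class CountingLLM:
    """Stub model that counts its calls and returns a numbered placeholder answer."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"answer {self.calls}"


def compact(query: str, nodes: list[str], llm: LLM) -> str:
    """One call: all node texts are packed into a single prompt."""
    context = "\n".join(nodes)
    return llm(f"Context:\n{context}\nQuestion: {query}")


def refine(query: str, nodes: list[str], llm: LLM) -> str:
    """One call per node: each call sees the previous answer plus one more node."""
    answer = ""
    for node in nodes:
        answer = llm(f"Existing answer: {answer}\nNew context: {node}\nQuestion: {query}")
    return answer


def accumulate(query: str, nodes: list[str], llm: LLM) -> str:
    """One independent call per node; the answers are joined at the end."""
    return "\n---\n".join(llm(f"Context: {n}\nQuestion: {query}") for n in nodes)


def tree_summarize(query: str, nodes: list[str], llm: LLM, fanout: int = 2) -> str:
    """Answer over groups of `fanout` texts, then over those answers, up to one root."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not nodes:
        return ""
    level = list(nodes)
    while True:
        level = [
            llm("Context:\n" + "\n".join(level[i:i + fanout]) + f"\nQuestion: {query}")
            for i in range(0, len(level), fanout)
        ]
        if len(level) == 1:
            return level[0]


nodes = ["node 1", "node 2", "node 3", "node 4"]
call_counts = {}
for mode in (compact, refine, accumulate, tree_summarize):
    llm = CountingLLM()
    mode("What is LlamaIndex?", nodes, llm)
    call_counts[mode.__name__] = llm.calls
print(call_counts)
```

The trade-off is cost against detail: fewer calls are cheaper and faster, while per-node calls give the model a closer look at each piece of context.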
