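🚀 MariaDB Cloud Hybrid RAG-Ready Search
Vector search where your data lives + real-time MCP augmentation
Deploy a RAG-ready hybrid search engine on MariaDB Cloud right away.
This is a plug-and-play MariaDB Cloud demo project that uses MariaDB Vector search, a FastMCP server with Brave Search augmentation, and the 20 Newsgroups dataset from scikit-learn. The solution combines cloud-based semantic search with real-time web search augmentation and suits a wide range of applications and use cases. It removes infrastructure redundancy and cross-solution latency, providing a streamlined, RAG-ready search architecture.
The approach adapts easily to any search source and can be integrated into any RAG workflow.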
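✨ Key Features
- Search where the data lives: run vector search directly on MariaDB Cloud, where the data is stored.
- Real-time web search augmentation: combine internal results with live web search results through MCP.
- Plug and play: a quick-to-deploy demo that gets you started fast.
- Highly scalable: the architecture is designed to scale and fits many application scenarios.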
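🎯 Use Cases
This architecture is designed for scalability and fits the following scenarios:
- Startup search engines: combine a proprietary/own website database (indexed with MariaDB Vector and deployed on MariaDB Cloud) with results from large search engines via MCP augmentation, for comprehensive search coverage.
- Product search engines: show results from in-house inventory first, using MariaDB Cloud with vector search, then supplement them through remote MCP tools with results from the catalogs of large sellers such as Amazon, so users never land on a "no results" page.
- Enterprise federated search: resolve high-fidelity internal queries locally in MariaDB to spread load, while offloading trend-based or ad hoc queries to multiple remote MCP instances.
- More: the possibilities for mixing static and dynamic knowledge are endless.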
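The "internal results first, external results as a supplement" pattern described in the product search scenario can be sketched roughly as follows. This is a minimal illustration only: the `Hit` class, the `merge_hits` function, and the `max_distance` threshold are hypothetical names and values, not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    source: str                      # "internal" (MariaDB) or "external" (MCP web search)
    title: str
    snippet: str
    distance: Optional[float] = None  # vector distance; only set for internal hits


def merge_hits(internal, external, max_distance=0.7, limit=8):
    """Return sufficiently close internal hits first, then external hits as filler."""
    # Keep internal hits under the (assumed) distance threshold, nearest first.
    relevant = sorted(
        (h for h in internal if h.distance is not None and h.distance <= max_distance),
        key=lambda h: h.distance,
    )
    merged = relevant[:limit]
    # Top up with external results so the result page is not empty
    # whenever the external source returned anything.
    merged.extend(external[: limit - len(merged)])
    return merged
```

A sensible threshold depends on the embedding model and the data, so it would need tuning against real queries.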
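📂 Repository Structure
| File | Type | Function |
| --- | --- | --- |
| mariadb_cloud_hybrid_search.py | 🧠 Core logic | Main application. Connects to MariaDB for semantic vector search and starts an MCP instance to fetch external web search results. |
| server.py | 🔌 MCP server | A lightweight, pure-Python FastMCP + Brave Search MCP server. Its main advantage is that it needs no complex Node.js/NPX dependencies. |
| load_data.py | 🚜 Data ingestion | Robust data loader. Reads newsgroups_with_embeddings.csv, parses the JSON vectors, and safely bulk-inserts them into the MariaDB Cloud instance. |
| schema.sql | 🏗️ Database | DDL script. Defines the documents table and configures an HNSW (Hierarchical Navigable Small World) vector index using cosine distance. |
| embeddings.py | ⚗️ Data transformation | Uses sentence-transformers (all-MiniLM-L6-v2) to generate 384-dimensional dense vectors from raw text. |
| import_newsgroups.py | 📦 Data source | ETL script. Downloads the "20 Newsgroups" dataset from scikit-learn and prepares it for later processing. |
| brave_api_config.py | ⚙️ Configuration | Centralized credentials file for the Brave API key. |
| db_config.py | ⚙️ Configuration | Centralized credentials file for the MariaDB Cloud connection details. |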
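The schema.sql entry above mentions cosine distance. As a rough illustration of what that metric measures, here is a plain-Python sketch using the common definition (1 minus cosine similarity). The function name is ours, and the database computes this internally; the snippet is only meant to build intuition for distance values.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors: 1 - cos(angle)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    # 0 means same direction, 1 means orthogonal, 2 means opposite.
    return 1.0 - dot / (norm_a * norm_b)
```

Smaller values mean the two texts point in a more similar direction in embedding space, regardless of vector length.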
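🚀 Quick Start
1. Prerequisites
- MariaDB Cloud (or MariaDB 11.8+ with MariaDB Vector) (get a free trial here, more information here, and access the dashboard here)
- Python 3.10+
- Brave Search API key (get a free key here)
2. Installation
Install the required Python packages (no Node.js needed!):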
sudo apt update
sudo apt install -y python3
pip3 install --break-system-packages scikit-learn pandas sentence-transformers mysql-connector-python mcp fastmcp httpx
3. Database Setup
python3 init_schema.py
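The contents of the schema script are not shown here, so the following is only a sketch of what a table with a cosine-distance vector index might look like on MariaDB, wrapped in Python so it can be run through any DB-API cursor. The table name `documents` matches the description in the repository table, but the column names and the `apply_schema` helper are assumptions.

```python
# Hypothetical DDL for illustration; the project's actual schema.sql may differ.
EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2

SCHEMA_DDL = f"""
CREATE TABLE IF NOT EXISTS documents (
    id INT AUTO_INCREMENT PRIMARY KEY,
    category VARCHAR(100) NOT NULL,
    content TEXT NOT NULL,
    embedding VECTOR({EMBEDDING_DIM}) NOT NULL,
    VECTOR INDEX (embedding) DISTANCE=cosine
)
"""


def apply_schema(cursor, ddl=SCHEMA_DDL):
    """Execute the DDL through a DB-API cursor (e.g. from mysql-connector-python)."""
    cursor.execute(ddl)
```

The vector column is declared `NOT NULL` because MariaDB requires that for indexed vector columns.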
4. MCP/Brave API Setup
Set your API key in brave_api_config.py (free key, see the link above).
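To show where the key ends up, here is a hedged sketch of building a Brave web search request and extracting the fields seen in the demo output (title, link, snippet). It uses only the standard library for self-containment, whereas the project installs httpx; the helper names are hypothetical and the request is constructed but not sent.

```python
import urllib.parse
import urllib.request

BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"


def build_request(query, api_key, count=3):
    """Build (but do not send) a Brave web search request."""
    url = BRAVE_ENDPOINT + "?" + urllib.parse.urlencode({"q": query, "count": count})
    headers = {"Accept": "application/json", "X-Subscription-Token": api_key}
    return urllib.request.Request(url, headers=headers)


def extract_results(payload, limit=3):
    """Pull title, link, and description out of a decoded JSON response."""
    results = payload.get("web", {}).get("results", [])
    return [
        {
            "title": r.get("title", ""),
            "link": r.get("url", ""),
            "snippet": r.get("description", ""),
        }
        for r in results[:limit]
    ]
```

In a real server, `extract_results` would be called on the decoded JSON body of the response, and the function would be exposed as an MCP tool.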
5. Data Pipeline
Ingest the sample data (20 Newsgroups):
- Download the raw data
python3 import_newsgroups.py
- Generate the vectors
python3 embeddings.py
- Load into MariaDB
python3 load_data.py
You should see output similar to the following:
$ python3 load_data.py
Reading load_data.sql...
Connecting to MariaDB (serverless-eu-west-2.sysp0000.db1.skysql.com)...
Parsing and executing statements (This may take some time)...
Note that if you have a mariadb CLI client connection to your MariaDB Cloud instance,
you can use the SHOW PROCESSLIST; there to see the LOAD DATA LOCAL INFILE command executing!
-> LOAD DATA executed. Rows affected: 17872
-> Verification Result: [(17872, '[0.00207798,0.0234504,0.0248088,...,0.0151075,0.0528758]')]
SUCCESS: Data load completed.
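The sample output above suggests the loader uses LOAD DATA LOCAL INFILE. As an alternative illustration of the "parse JSON vectors and insert in batches" idea, here is a sketch that validates CSV rows and groups them for `executemany`. The column names (`category`, `text`, `embedding`), the table name, and the batch size are assumptions, not the project's actual values.

```python
import csv
import io
import json

# Hypothetical statement; VEC_FromText() converts a JSON-style array into a vector.
INSERT_SQL = (
    "INSERT INTO documents (category, content, embedding) "
    "VALUES (%s, %s, VEC_FromText(%s))"
)


def iter_rows(csv_text, dim=384):
    """Yield (category, text, vector_literal) tuples, skipping malformed rows."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            category, text = row["category"], row["text"]
            vector = json.loads(row["embedding"])
        except (KeyError, TypeError, json.JSONDecodeError):
            continue  # missing column or unparsable vector
        if not isinstance(vector, list) or len(vector) != dim:
            continue  # wrong shape for the VECTOR(dim) column
        yield category, text, json.dumps(vector)


def batched(rows, size=500):
    """Group an iterable into lists of at most `size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each batch could then be passed to `cursor.executemany(INSERT_SQL, batch)` followed by a commit.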
6. Run the MariaDB Hybrid Search
Start the engine:
python3 mariadb_cloud_hybrid_search.py
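For orientation, the internal retrieval phase could issue a nearest-neighbor query along these lines. This is a sketch under assumptions: the table and column names are hypothetical, and the query vector is expected to come from the same embedding model used at ingestion time.

```python
import json

# Hypothetical query; VEC_DISTANCE_COSINE and VEC_FromText are MariaDB Vector functions.
SEARCH_SQL = (
    "SELECT category, content, "
    "VEC_DISTANCE_COSINE(embedding, VEC_FromText(%s)) AS dist "
    "FROM documents ORDER BY dist LIMIT %s"
)


def search_params(query_vector, k=5):
    """Serialize the query vector as a JSON array and pair it with the row limit."""
    return (json.dumps([float(x) for x in query_vector]), int(k))


def internal_search(cursor, query_vector, k=5):
    """Run the nearest-neighbor query through a DB-API cursor and return all rows."""
    cursor.execute(SEARCH_SQL, search_params(query_vector, k))
    return cursor.fetchall()
```

Passing the vector as a bound parameter rather than formatting it into the SQL string keeps the statement safe from injection.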
7. 🔧 Troubleshooting
Problem: "Database or MCP/Brave API connection error"
Make sure db_config.py and brave_api_config.py contain the correct credentials.
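A quick pre-flight check can catch empty or placeholder credentials before any connection is attempted. The sketch below is illustrative only: the setting names and placeholder strings are hypothetical, since the layout of the two config files is not shown here.

```python
def missing_settings(config, required):
    """Return the names of required settings that are absent, empty, or placeholders."""
    placeholders = {"", "changeme", "your_api_key_here"}  # assumed placeholder values
    missing = []
    for name in required:
        value = config.get(name)
        if value is None or str(value).strip().lower() in placeholders:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Example with made-up values; replace with the real settings dictionary.
    example = {"host": "db.example.com", "user": "app", "password": ""}
    print(missing_settings(example, ["host", "user", "password"]))
```

An empty list means every required setting has a non-placeholder value; it does not prove the credentials are valid.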
🚀 Runtime Demo!
$ python3 mariadb_cloud_hybrid_search.py
Initializing AI Model (SentenceTransformer)...
======================================================================
Starting Hybrid RAG Search for: 'AI technology'
======================================================================
--- PHASE 1: INTERNAL KNOWLEDGE RETRIEVAL VIA MARIADB VECTOR SEARCH ---
Connecting to MariaDB (serverless-eu-west-2.sysp0000.db1.skysql.com)...
Internal Search executed in 0.7903 seconds.
Found 5 internal documents:
[sci.med] (Dist: 0.5179)
> If you have any information on artificial intelligence in medicine, then I would appreciate it if you could mail me with whatever it is. The informati...
----------------------------------------
[comp.graphics] (Dist: 0.5711)
> From article <1993May1.092058.1@aurora.alaska.edu>, by pstlb@aurora.alaska.edu: Since this was posted on comp.ai, I assume there is an AI angle to thi...
----------------------------------------
[sci.med] (Dist: 0.6187)
> [For those attending the AAAI conf this summer, note that this conference is immediately preceding it.] PRELIMINARY PROGRAM AND REGISTRATION MATERIALS...
----------------------------------------
[comp.graphics] (Dist: 0.6231)
> The Harvard Computer Society is pleased to announce its third lecture of the spring. Ivan Sutherland, the father of computer graphics and an innovator...
----------------------------------------
[comp.graphics] (Dist: 0.6522)
> Technion - Israel Institute of Technology Department of Computer Science GRADUATE STUDIES IN COMPUTER GRAPHICS Applications are invited for graduate s...
----------------------------------------
--- PHASE 2: EXTERNAL WEB RETRIEVAL (MCP) FOR SEARCH RESULTS AUGMENTATION ---
Spinning up Local Python MCP Server...
╭──────────────────────────────────────────────────────────────────────────────╮
│ │
│ │
│ ▄▀▀ ▄▀█ █▀▀ ▀█▀ █▀▄▀█ █▀▀ █▀█ │
│ █▀ █▀█ ▄▄█ █ █ ▀ █ █▄▄ █▀▀ │
│ │
│ │
│ FastMCP 2.14.2 │
│ https://gofastmcp.com │
│ │
│ 🖥 Server: brave-search │
│ 🚀 Deploy free: https://fastmcp.cloud │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────────────────────────────╮
│ ✨ FastMCP 3.0 is coming! │
│ Pin fastmcp<3 in production, then upgrade when you're ready. │
╰──────────────────────────────────────────────────────────────────────────────╯
[01/08/26 19:02:59] INFO Starting MCP server 'brave-search' with transport 'stdio' server.py:2504
Calling MCP Tool 'brave_web_search' with query: 'AI technology'
Found 3 external results:
----------------------------------------
Artificial intelligence - Wikipedia
Link: https://en.wikipedia.org/wiki/Artificial_intelligence
> The prevalence of generative AI tools has increased significantly since the AI boom in the 2020s. This boom was made possible by improvements in deep neural networks, particularly large language models (LLMs), which are based on the transformer architecture. Major tools include LLM-based chatbots such as ChatGPT, Claude, Copilot, DeepSeek, Google Gemini and Grok; text-to-image models such as Stable Diffusion, Midjourney, and DALL-E; and text-to-video models such as Veo, LTX and Sora. Technology companies developing generative AI include Alibaba, Anthropic, Baidu, DeepSeek, Google, Lightricks, Meta AI, Microsoft, Mistral AI, OpenAI, Perplexity AI, xAI, and Yandex.
----------------------------------------
What Is Artificial Intelligence (AI)? | IBM
Link: https://www.ibm.com/think/topics/artificial-intelligence
> Artificial intelligence (AI) is technology that <strong>enables computers and machines to simulate human learning, comprehension, problem solving, decision making, creativity and autonomy</strong>.
----------------------------------------
Home - AI Technology, Inc.
Link: https://www.aitechnology.com/
> Since pioneering the use of flexible epoxy technology for microelectronic packaging in 1985, AI Technology has been one of the leading forces in development and patented applications of advanced materials and adhesive solutions for electronic interconnection and packaging.
----------------------------------------