2025 Guide to Microsoft GraphRAG 2.0.0: Local Deployment with Ollama for Fast Knowledge-Graph Construction

Published 2025-04-24 · blog.csdn.net

I. Introduction

Microsoft recently released GraphRAG 2.0.0, a knowledge-graph toolkit that can build graphs quickly on top of local LLMs served by Ollama, noticeably improving the quality of RAG (retrieval-augmented generation). This post walks through a from-scratch deployment, including pitfalls I hit and rough performance numbers from my own tests.

II. Environment Setup

1. Create a virtual environment

Python 3.12.4 is recommended (tested; compatibility is good):

conda create -n graphrag200 python=3.12.4
conda activate graphrag200

2. Get the source code

Cloning the latest code with Git is recommended (Windows users need to install Git first):

git clone https://github.com/microsoft/graphrag.git
cd graphrag

(Note: if you instead download and extract a ZIP archive, you must initialize a Git repository in the extracted folder, or later steps will fail.)

To initialize the repository:

git init
git add .
git commit -m "Initial commit"

3. Install dependencies

Install all required packages with one command (run from the graphrag source directory):

pip install -e .

4. Create the input folder

This folder holds the documents to be processed (on Windows you can also create it by hand):

mkdir -p ./graphrag_ollama/input

Place your dataset (plain-text files) in the input directory.
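As a sketch, the input folder and a sample document can also be prepared from Python; the filename sample.txt and its contents are placeholders, and GraphRAG will index every UTF-8 .txt file it finds here:

```python
from pathlib import Path

# Create the input folder and drop in a sample UTF-8 document.
# The filename sample.txt and its contents are placeholders.
input_dir = Path("./graphrag_ollama/input")
input_dir.mkdir(parents=True, exist_ok=True)

sample = "A knowledge graph organizes entities and their relationships as nodes and edges."
(input_dir / "sample.txt").write_text(sample, encoding="utf-8")

# List the .txt files GraphRAG will pick up.
print(sorted(p.name for p in input_dir.glob("*.txt")))
```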

III. Key Configuration Changes

1. Initialize the project

Run the init command (note that the arguments differ from older versions):

python -m graphrag init --root ./graphrag_ollama
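Initialization generates settings.yaml, a .env file, and a prompts/ folder under the root. The ${GRAPHRAG_API_KEY} placeholder in settings.yaml is resolved from .env; to my understanding Ollama does not validate the key, but GraphRAG still requires a non-empty value, so any placeholder string works. A minimal sketch (the directory creation stands in for the init step):

```python
from pathlib import Path

# Stand-in for `graphrag init`, which creates this folder along with
# settings.yaml and prompts/.
root = Path("./graphrag_ollama")
root.mkdir(parents=True, exist_ok=True)

# Ollama ignores the API key, but GraphRAG insists on a non-empty value,
# so a placeholder such as "ollama" is enough.
(root / ".env").write_text("GRAPHRAG_API_KEY=ollama\n", encoding="utf-8")

print((root / ".env").read_text(encoding="utf-8"))
```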

2. Edit settings.yaml

Core configuration items (adjust to your setup):

  • Model settings: point both the chat model and the embedding model at your local Ollama server (api_base and model are the fields to change; the original post circles them in screenshots).

When testing with small files, it helps to reduce the chunk size (see the chunks section).

The resulting configuration:

### This config file contains required core defaults that must be set, along with a handful of common optional settings.
### For a full list of available settings, see https://microsoft.github.io/graphrag/config/yaml/

### LLM settings ###
## There are a number of settings to tune the threading and token limits for LLM calls - check the docs.

models:
  default_chat_model:
    type: openai_chat # or azure_openai_chat
    api_base: http://192.168.0.167:11434/v1
    # api_version: 2024-05-01-preview
    auth_type: api_key # or azure_managed_identity
    api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    model: deepseek-r1:32b
    # deployment_name: <azure_model_deployment_name>
    encoding_model: cl100k_base # automatically set by tiktoken if left undefined
    model_supports_json: true # recommended if this is available for your model.
    concurrent_requests: 25 # max number of simultaneous LLM requests allowed
    async_mode: threaded # or asyncio
    retry_strategy: native
    max_retries: -1 # set to -1 for dynamic retry logic (most optimal setting based on server response)
    tokens_per_minute: 0 # set to 0 to disable rate limiting
    requests_per_minute: 0 # set to 0 to disable rate limiting
  default_embedding_model:
    type: openai_embedding # or azure_openai_embedding
    api_base: http://192.168.0.167:11434/v1
    # api_version: 2024-05-01-preview
    auth_type: api_key # or azure_managed_identity
    api_key: ${GRAPHRAG_API_KEY}
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    model: bge-m3:latest
    # deployment_name: <azure_model_deployment_name>
    encoding_model: cl100k_base # automatically set by tiktoken if left undefined
    model_supports_json: true # recommended if this is available for your model.
    concurrent_requests: 25 # max number of simultaneous LLM requests allowed
    async_mode: threaded # or asyncio
    retry_strategy: native
    max_retries: -1 # set to -1 for dynamic retry logic (most optimal setting based on server response)
    tokens_per_minute: 0 # set to 0 to disable rate limiting
    requests_per_minute: 0 # set to 0 to disable rate limiting

vector_store:
  default_vector_store:
    type: lancedb
    db_uri: output\lancedb
    container_name: default
    overwrite: True

embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store

### Input settings ###

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$$"

chunks:
  size: 200
  overlap: 50
  group_by_columns: [id]

### Output settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # [file, blob, cosmosdb]
  base_dir: "cache"

reporting:
  type: file # [file, blob, cosmosdb]
  base_dir: "logs"

output:
  type: file # [file, blob, cosmosdb]
  base_dir: "output"

### Workflow settings ###

extract_graph:
  model_id: default_chat_model
  prompt: "prompts/extract_graph.txt"
  entity_types: [organization, person, geo, event]
  max_gleanings: 1

summarize_descriptions:
  model_id: default_chat_model
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english # [regex_english, syntactic_parser, cfg]

extract_claims:
  enabled: false
  model_id: default_chat_model
  prompt: "prompts/extract_claims.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  model_id: default_chat_model
  graph_prompt: "prompts/community_report_graph.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: false
  embeddings: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  chat_model_id: default_chat_model
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"

basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/basic_search_system_prompt.txt"
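To get a feel for what `size: 200` and `overlap: 50` under `chunks` mean, here is a minimal sliding-window sketch. GraphRAG actually counts tokens (via tiktoken); plain characters stand in here to keep the sketch dependency-free:

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Sliding-window chunking: each chunk shares `overlap` units with
    the previous one. GraphRAG counts tokens via tiktoken; this sketch
    counts characters instead."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# A 500-character text yields three 200-character chunks,
# each overlapping its neighbor by 50 characters.
pieces = chunk("x" * 500, size=200, overlap=50)
print([len(p) for p in pieces])
```

Smaller chunks mean more (and cheaper) LLM calls per document, which is why shrinking them helps when testing small files.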

IV. Build the Knowledge Graph

Run the indexing command (compute warning: in my test, a 20,000-character corpus took about 3 hours on a single RTX 4090 24 GB):

python -m graphrag index --root ./graphrag_ollama

V. Querying the Knowledge Graph

Several query methods are supported; pick whichever fits your need (note that --root must point at the project folder, just as with indexing):

  • Global query (cross-document synthesis):
    python -m graphrag query --root ./graphrag_ollama --method global --query "知识图谱定义"
  • Local query (precise retrieval within a document):
    python -m graphrag query --root ./graphrag_ollama --method local --query "知识图谱定义"
  • DRIFT query (dynamic drift analysis):
    python -m graphrag query --root ./graphrag_ollama --method drift --query "知识图谱定义"
  • Basic query (traditional RAG retrieval):
    python -m graphrag query --root ./graphrag_ollama --method basic --query "知识图谱定义"
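The four invocations differ only in the --method flag, so a small wrapper can be convenient for batch experiments. run_query is a hypothetical helper; the actual subprocess call is left commented out so the sketch runs without a deployed index:

```python
import subprocess

def run_query(method: str, query: str, root: str = "./graphrag_ollama") -> list[str]:
    """Build the GraphRAG query command for one of the four methods."""
    assert method in {"global", "local", "drift", "basic"}
    cmd = [
        "python", "-m", "graphrag", "query",
        "--root", root, "--method", method, "--query", query,
    ]
    # subprocess.run(cmd, check=True)  # uncomment to actually execute
    return cmd

print(run_query("global", "知识图谱定义"))
```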

VI. Caveats

  1. Model availability: make sure the Ollama service is running and the model names match the configuration (e.g. deepseek-r1:32b must be pulled in advance).

  2. Compute: even for small datasets, GPU acceleration is recommended; CPU-only runs can take several times longer.

  3. File encoding: input documents must be UTF-8, otherwise indexing may fail.

  4. Config backup: back up the original settings.yaml before editing it.
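For the encoding caveat above: Chinese source files often ship as GBK, while GraphRAG expects UTF-8 (file_encoding: utf-8). A quick stdlib sketch to re-encode a file in place (to_utf8 is a hypothetical helper, demo.txt a placeholder name):

```python
from pathlib import Path

def to_utf8(path: Path, source_encoding: str = "gbk") -> None:
    """Re-encode a text file in place as UTF-8."""
    text = path.read_bytes().decode(source_encoding)
    path.write_text(text, encoding="utf-8")

# Usage sketch: write a GBK-encoded file, then convert it.
p = Path("demo.txt")
p.write_bytes("知识图谱".encode("gbk"))
to_utf8(p)
print(p.read_text(encoding="utf-8"))
```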

VII. Summary

GraphRAG 2.0.0 substantially streamlines knowledge-graph construction, and paired with a local model it enables privacy-preserving, industry-grade applications. If you hit deployment problems, feel free to leave a comment!

Related resources:

GraphRAG GitHub repository

Ollama model library

Originality notice: this article is the author's original work; reproduction without permission is prohibited. Contact the author for citation permission.



Note: this article is reposted from Sindy_he's post on blog.csdn.net: "https://blog.csdn.net/m0_54356251/article/details/146074188". Copyright belongs to the original author; this blog does not hold the copyright and assumes no legal liability. If there is any infringement, please contact us for removal.