Quick-Start Guide with Llama 3.2: Deploying a Local Web-Scraping AI Agent

AI Tutorials | Updated 7 months ago | AI怪打工人


A Locally Deployed Smart Web-Scraping Assistant: A Build Guide Based on Llama 3.2

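This guide walks you through building a smart web-scraping assistant on your own machine. The Llama 3.2 model runs locally, so extraction needs no cloud LLM API; network access is still required to fetch the pages you want to scrape.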

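🧰 Required Tools and Environment

Python 3.7+: make sure it is installed (recent Streamlit and ScrapeGraphAI releases may require a newer version; check their documentation).
Ollama: runs the Llama 3.2 model locally.
ScrapeGraphAI: a web-scraping library that combines LLMs with graph-structured logic.
Streamlit: a framework for quickly building interactive web apps.
Code editor: VS Code or PyCharm is recommended.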

🛠️ Build Steps at a Glance

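Clone the project repository and change into the agent's folder (its location inside the repository may differ between revisions, so locate web_scrapping_ai_agent if the path below does not exist):

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
cd awesome-llm-apps/web_scrapping_ai_agent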

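Install the dependencies (ScrapeGraphAI uses Playwright to fetch pages, so you may also need to run playwright install afterwards):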

pip install -r requirements.txt
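
After installation, a quick check can confirm that the key packages landed in the active environment. The sketch below is only illustrative: it assumes Python 3.8+ (for importlib.metadata) and assumes that requirements.txt pulls in the PyPI distributions streamlit and scrapegraphai; adjust the list to match the actual file.

```python
from importlib import metadata

# Assumed distribution names; edit to match the project's requirements.txt.
REQUIRED = ["streamlit", "scrapegraphai"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED suggests that pip ran against a different interpreter or virtual environment than the one you will use to launch the app.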

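Make sure Ollama is running:
Ollama should be listening on local port 11434 (its default), with the llama3.2 and nomic-embed-text models used by the app already pulled.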
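
One way to verify this step before starting the app is to query Ollama's REST API, which lists locally available models at GET /api/tags. The following is a minimal sketch using only the standard library. The model names are assumptions taken from the graph configuration in this guide (with the ollama/ provider prefix removed), and the helper names are made up for illustration.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # port stated in this guide
# Assumed model names, derived from the app's graph_config.
REQUIRED_MODELS = ["llama3.2", "nomic-embed-text"]


def normalize(name):
    """Ollama treats an untagged model name as '<name>:latest'."""
    return name if ":" in name else f"{name}:latest"


def missing_models(installed, required):
    """Return the required models that are absent from the installed list."""
    have = {normalize(name) for name in installed}
    return [name for name in required if normalize(name) not in have]


def list_installed_models(base_url=OLLAMA_URL, timeout=3):
    """Fetch the names of locally available models via GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


if __name__ == "__main__":
    try:
        installed = list_installed_models()
    except (urllib.error.URLError, OSError, ValueError) as exc:
        print(f"Ollama is not reachable at {OLLAMA_URL}: {exc}")
    else:
        missing = missing_models(installed, REQUIRED_MODELS)
        print("All models available." if not missing else f"Missing models: {missing}")
```

If a model is reported missing, ollama pull llama3.2 or ollama pull nomic-embed-text downloads it.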
Create the Streamlit app:
Create a new file named local_ai_scrapper.py and add the following code:

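import streamlit as st
from scrapegraphai.graphs import SmartScraperGraph

# Page header
st.title("Web Scraping AI Agent")
st.caption("Scrape web page data with the Llama 3.2 model")

# User inputs: the page to scrape and what to extract from it
url = st.text_input("Enter the URL of the page to scrape")
user_prompt = st.text_input("Enter a scraping instruction (for example: extract all headings)")

# ScrapeGraphAI configuration: both models are served by the local Ollama instance
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,  # keep the output as repeatable as possible
        "format": "json",  # ask Ollama for JSON-formatted responses
        "base_url": "http://localhost:11434",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",
    },
    "verbose": True,  # log progress to the terminal
}

# Build and run the scraping graph when the button is clicked
if st.button("Start scraping"):
    smart_scraper_graph = SmartScraperGraph(
        prompt=user_prompt,
        source=url,
        config=graph_config
    )
    result = smart_scraper_graph.run()
    st.write(result)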
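
The URL field accepts arbitrary text, so a malformed address only fails once the scraper is already running. A small pre-check can catch this earlier. The helper below is a hypothetical addition, not part of the original project; it uses only the standard library and checks just the shape of the inputs, not whether the page is reachable.

```python
from urllib.parse import urlparse


def validate_inputs(url, prompt):
    """Return a list of readable problems; an empty list means the inputs look usable."""
    problems = []
    parsed = urlparse(url.strip())
    # Require an explicit http(s) scheme and a host part.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("Enter a full URL starting with http:// or https://.")
    if not prompt.strip():
        problems.append("Enter a scraping instruction.")
    return problems


if __name__ == "__main__":
    print(validate_inputs("https://example.com", "Extract all headings"))  # → []
    print(validate_inputs("example.com", ""))  # both problems are reported
```

Inside the button handler, one option is to call validate_inputs(url, user_prompt), show each message with st.error, and build the SmartScraperGraph only when the list is empty. Wrapping the run() call in a try/except block would likewise let connection failures appear as messages in the page instead of a traceback. The app itself is launched with streamlit run local_ai_scrapper.py.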