diff --git a/backend/docs/AI-Chat-Service/status_text标签系统设计方案.md b/backend/docs/AI-Chat-Service/status_text标签系统设计方案.md
new file mode 100644
index 0000000..eabcb98
--- /dev/null
+++ b/backend/docs/AI-Chat-Service/status_text标签系统设计方案.md
@@ -0,0 +1,299 @@
+# status_text 标签系统设计方案
+
+## 1. 概述
+
+### 1.1 需求背景
+
+用户在我的点赞作品页面需要看到每个作品的动态状态标签（如"屠榜顶流"、"火速破圈"等），用于直观展示作品的热度和表现。
+
+### 1.2 实现方案
+
+采用 **后端计算返回** 方案：由后端计算每个作品的 `status_text` 状态标签，前端直接展示。
+
+### 1.3 优势
+
+- 后端拥有完整的排行榜数据和用户行为数据，可准确计算
+- 前端无需关心业务逻辑复杂度，保持轻量
+- 状态计算逻辑集中，便于维护和修改
+- 减少前后端字段依赖，减少数据冗余
+
+---
+
+## 2. 标签体系定义
+
+### 2.1 标签分类
+
+| 优先级 | 类型 | 说明 |
+|--------|------|------|
+| T0 | 收益型 | 最高优先级，用户点赞后作品表现极佳 |
+| T1 | 排名型 | 排行榜相关，作品在榜上表现优秀 |
+| T3 | 状态型 | 涨粉速度，体现作品热度变化 |
+| T4 | 状态型 | 涨粉速度，体现作品热度变化 |
+
+### 2.2 标签详情
+
+| 优先级 | 标签名 | 显示条件 | 背后逻辑 |
+|--------|--------|----------|----------|
+| **T0** | 眼光拉满 | 用户点赞后，新增点赞≥50 且作品仍在展出 | 用户点赞后作品持续火热，用户收益已达峰值 |
+| **T1** | 屠榜顶流TopX | 排行榜排名为 1、2、3、4、5 | 作品稳居排行榜前五，顶级流量 |
+| **T1** | 第Y爆款 | 排行榜排名为 Y（10≥Y>5） | 作品进入排行榜前10但未进前5 |
+| **T1** | 排名破Z | 排行榜排名达到 Z（Z∈{20,50,100,200}） | 里程碑式突破，达到特定门槛 |
+| **T3** | 火速破圈 | 过去1小时新增点赞≥20 | 作品热度急剧上升中 |
+| **T4** | 小爆出圈 | 过去1小时新增点赞：20>新增点赞≥10 | 作品热度稳步上升 |
+| **T4** | 热度积累 | 过去1小时新增点赞：10>新增点赞≥5 | 作品热度温和增长 |
+| **T4** | 缓慢涨粉 | 过去1小时新增点赞：5>新增点赞≥0 | 作品热度缓慢增长 |
+| - | 潜力待挖 | 无任何标签满足 | 默认状态，等待挖掘 |
+
+### 2.3 优先级规则
+
+当多个标签条件同时满足时，按优先级取最高级别（T0>T1>T3>T4）。
+
+**计算流程：**
+```
+1. 检查 T0「眼光拉满」→ 满足则返回
+2. 检查 T1 排名型（屠榜顶流/第Y爆款/排名破Z）→ 满足则返回
+3. 检查 T3/T4 状态型（火速破圈/小爆出圈/热度积累/缓慢涨粉）→ 满足则返回
+4. 默认返回「潜力待挖」
+```
+
+---
+
+## 3. 后端接口设计
+
+### 3.1 修改接口
+
+| 接口 | 方法 | 说明 |
+|------|------|------|
+| `/api/v1/me/liked-assets` | GET | 获取我点赞的作品列表 |
+
+### 3.2 响应新增字段
+
+在 `GetMyLikedAssets` 接口的响应 `items` 数组元素中新增：
+
+```json
+{
+  "asset_id": 12345,
+  "name": "作品名称",
+  "cover_url": "https://...",
+  "like_count": 1000,
+  "liked_at": "2026-05-26T10:00:00Z",
+  "earnings": 500.00,
+  "hourly_earnings": 10.00,
+  "is_lenticular": false,
+  "expire_at": "2026-05-27T10:00:00Z",
+  "status_text": "屠榜顶流Top3"
+}
+```
+
+| 字段 | 类型 | 说明 |
+|------|------|------|
+| status_text | string | 动态状态标签，默认「潜力待挖」 |
+
+---
+
+## 4. 后端实现设计
+
+### 4.1 数据依赖
+
+| 字段 | 来源 | 说明 |
+|------|------|------|
+| 用户点赞时间 | liked_assets 表 | 用于计算用户点赞后新增点赞数 |
+| 用户点赞后新增点赞数 | likes 表聚合 | 用户点赞时刻起到当前时刻，作品累计新增点赞数 |
+| 排行榜排名 | ranking 或 likes 表 | 当前作品排名 |
+| 过去1小时新增点赞 | likes 表聚合 | 需要按时间窗口聚合 |
+
+### 4.2 Service 层计算逻辑
+
+```go
+// pkg/service/social_service.go
+
+func (s *SocialService) GetMyLikedAssets(ctx context.Context, req *pbSocial.GetMyLikedAssetsRequest) (*pbSocial.GetMyLikedAssetsResponse, error) {
+    // 1. 获取用户点赞作品列表
+    items, total, hasMore := s.getLikedAssetsList(ctx, userID, page, pageSize)
+
+    // 2. 批量获取用户点赞时间
+    likedAtMap := s.batchGetUserLikedAtMap(ctx, userID, assetIDs)
+
+    // 3. 批量获取作品排名
+    rankMap := s.batchGetAssetRanks(ctx, assetIDs)
+
+    // 4. 批量获取过去1小时新增点赞
+    hourlyNewLikesMap := s.batchGetHourlyNewLikes(ctx, assetIDs)
+
+    // 5. 为每个作品计算 status_text
+    for _, item := range items {
+        item.UserLikedAt = likedAtMap[item.AssetId]
+        item.Rank = rankMap[item.AssetId]
+        item.HourlyNewLikes = hourlyNewLikesMap[item.AssetId]
+        item.StatusText = computeStatusText(item)
+    }
+
+    return resp, nil
+}
+```
+
+### 4.3 status_text 计算函数
+
+```go
+func computeStatusText(item *LikedAssetItem) string {
+    // T0: 眼光拉满 - 用户点赞后新增点赞≥50 且仍在展出
+    if item.UserLikedCountAfter >= 50 && !item.IsExpired {
+        return "眼光拉满"
+    }
+
+    // T1: 排名型
+    if item.Rank >= 1 && item.Rank <= 5 {
+        return fmt.Sprintf("屠榜顶流Top%d", item.Rank)
+    }
+    if item.Rank > 5 && item.Rank <= 10 {
+        return fmt.Sprintf("第%d爆款", item.Rank)
+    }
+    if item.Rank == 20 || item.Rank == 50 || item.Rank == 100 || item.Rank == 200 {
+        return fmt.Sprintf("排名破%d", item.Rank)
+    }
+
+    // T3/T4: 状态型
+    if item.HourlyNewLikes >= 20 {
+        return "火速破圈"
+    }
+    if item.HourlyNewLikes >= 10 {
+        return "小爆出圈"
+    }
+    if item.HourlyNewLikes >= 5 {
+        return "热度积累"
+    }
+    if item.HourlyNewLikes >= 0 {
+        return "缓慢涨粉"
+    }
+
+    // 默认
+    return "潜力待挖"
+}
+```
+
+### 4.4 新增字段结构
+
+```protobuf
+// proto/social.proto
+
+message LikedAssetItem {
+    int64 asset_id = 1;
+    string name = 2;
+    string cover_url = 3;
+    int32 like_count = 4;
+    int64 liked_at = 5;
+    double earnings = 6;
+    double hourly_earnings = 7;
+    bool is_lenticular = 8;
+    int64 expire_at = 9;
+    // 新增字段
+    string status_text = 10;
+}
+```
+
+---
+
+## 5. 前端修改设计
+
+### 5.1 修改文件
+
+`frontend/pages/profile/myWorks.vue`
+
+### 5.2 修改点
+
+**修改前 (line 919):**
+```javascript
+status_text: index < 3 ? '排名进榜' : '潜力待挖',
+```
+
+**修改后:**
+```javascript
+status_text: item.status_text || '潜力待挖',
+```
+
+### 5.3 完整修改的代码块
+
+```javascript
+if (res.data && res.data.items) {
+    likedWorks.value = res.data.items.map((item, index) => ({
+        id: item.asset_id,
+        cover_url: item.cover_url,
+        like_count: item.like_count,
+        earnings: item.earnings,
+        liked_at: item.liked_at,
+        expire_at: item.expire_at,
+        name: item.name,
+        is_lenticular: item.is_lenticular ?? false,
+        status_text: item.status_text || '潜力待挖',  // 直接使用后端返回
+        score: item.like_count,
+        reward: Math.floor(item.earnings || 0),
+    }));
+    // ...
+}
+```
+
+---
+
+## 6. 测试用例
+
+### 6.1 标签测试用例
+
+| 用例编号 | 前提条件 | 输入 | 预期输出 |
+|----------|----------|------|----------|
+| TC-01 | 用户点赞后作品新增≥50点赞 | status_text | 眼光拉满 |
+| TC-02 | 作品排名第1 | status_text | 屠榜顶流Top1 |
+| TC-03 | 作品排名第3 | status_text | 屠榜顶流Top3 |
+| TC-04 | 作品排名第7 | status_text | 第7爆款 |
+| TC-05 | 作品排名第20 | status_text | 排名破20 |
+| TC-06 | 过去1小时新增点赞=25 | status_text | 火速破圈 |
+| TC-07 | 过去1小时新增点赞=15 | status_text | 小爆出圈 |
+| TC-08 | 过去1小时新增点赞=7 | status_text | 热度积累 |
+| TC-09 | 过去1小时新增点赞=2 | status_text | 缓慢涨粉 |
+| TC-10 | 无任何标签满足 | status_text | 潜力待挖 |
+
+### 6.2 优先级测试用例
+
+| 用例编号 | 前提条件 | 预期输出 | 说明 |
+|----------|----------|----------|------|
+| TC-11 | 排名第2 且 过去1小时新增点赞=25 | 眼光拉满 | T0 > T1 |
+| TC-12 | 排名第5 且 过去1小时新增点赞=30 | 屠榜顶流Top5 | T1 > T3 |
+| TC-13 | 排名第15 且 过去1小时新增点赞=22 | 火速破圈 | 无T0/T1满足 |
+
+---
+
+## 7. 里程碑
+
+| 阶段 | 任务 | 负责人 | 状态 |
+|------|------|--------|------|
+| 1 | 后端 proto/social.proto 新增 status_text 字段 | 后端 | - |
+| 2 | 后端 Service 层实现 computeStatusText 逻辑 | 后端 | - |
+| 3 | 后端修改 GetMyLikedAssets 接口返回 status_text | 后端 | - |
+| 4 | 前端 myWorks.vue 修改 status_text 取值逻辑 | 前端 | - |
+| 5 | 联调测试 + 回归测试 | 前端+后端 | - |
+
+---
+
+## 8. 附录
+
+### 8.1 标签文案汇总
+
+| 标签名 | 字数 |
+|--------|------|
+| 眼光拉满 | 4 |
+| 屠榜顶流Top1~5 | 6~7 |
+| 第6~10爆款 | 4~5 |
+| 排名破20/50/100/200 | 5~6 |
+| 火速破圈 | 4 |
+| 小爆出圈 | 4 |
+| 热度积累 | 4 |
+| 缓慢涨粉 | 4 |
+| 潜力待挖 | 4 |
+
+### 8.2 相关文件
+
+| 文件路径 | 说明 |
+|----------|------|
+| backend/proto/social.proto | Protobuf 定义 |
+| backend/pkg/service/social_service.go | Service 层实现 |
+| backend/gateway/controller/social_controller.go | Controller 层（返回格式） |
+| frontend/pages/profile/myWorks.vue | 前端页面 |
diff --git a/backend/docs/AI-Chat-Service设计方案.md b/backend/docs/AI-Chat-Service设计方案.md
new file mode 100644
index 0000000..9c5bda8
--- /dev/null
+++ b/backend/docs/AI-Chat-Service设计方案.md
@@ -0,0 +1,1969 @@
+# AI Chat Service 设计方案
+
+> **目标：** 在 TopFans Backend 微服务体系中，新增 AI 伴侣对话服务，实现「用户输入→人设注入→大模型调用→记忆召回→合规审核→流式输出」完整链路。
+
+---
+
+## 一、架构设计
+
+### 1.1 整体架构
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                              Mobile App                                      │
+└─────────────────────────────────┬───────────────────────────────────────────┘
+                                  │ HTTP + JWT
+                                  ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                         Gateway (:8080)                                       │
+│  ┌─────────────────────────────────────────────────────────────────────┐    │
+│  │                    AIChatController                                  │    │
+│  │  POST /api/v1/ai-chat/send        # 发送消息（流式）                  │    │
+│  │  GET  /api/v1/ai-chat/personas     # 获取人设列表                     │    │
+│  │  GET  /api/v1/ai-chat/history/{sessionId}  # 获取对话历史             │    │
+│  └─────────────────────────────────────────────────────────────────────┘    │
+│                                    │                                         │
+│                                    │ Dubbo Triple (gRPC)                     │
+└────────────────────────────────────┼─────────────────────────────────────────┘
+                                      │
+                                      ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                    AIChatService (:20008)                                    │
+│                                                                              │
+│  ┌──────────────────────────────────────────────────────────────────────┐   │
+│  │                        Provider 层                                     │   │
+│  │            AIChatProvider (Dubbo RPC 入口)                              │   │
+│  └──────────────────────────────────────────────────────────────────────┘   │
+│                                    │                                         │
+│                                    ▼                                         │
+│  ┌──────────────────────────────────────────────────────────────────────┐   │
+│  │                        Service 层                                      │   │
+│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌────────────┐ │   │
+│  │  │LLMService   │  │PersonaService│ │MemoryService │  │AuditService│ │   │
+│  │  │大模型调用    │  │人设管理      │  │记忆管理      │  │合规审核    │ │   │
+│  │  └─────────────┘  └─────────────┘  └─────────────┘  └────────────┘ │   │
+│  └──────────────────────────────────────────────────────────────────────┘   │
+│                                                                              │
+│  ┌──────────────────────────────────────────────────────────────────────┐   │
+│  │                       Repository 层                                   │   │
+│  │  ┌───────────────────────────┐  ┌───────────────────────────────────┐ │   │
+│  │  │ PostgreSQL                │  │ Redis                              │ │   │
+│  │  │ user_memories (长期记忆)   │  │ context:{sessionId} - 短期上下文  │ │   │
+│  │  │ user_custom_personas (人设)│  │ persona_cache:{userId}:{id}      │ │   │
+│  │  └───────────────────────────┘  └───────────────────────────────────┘ │   │
+│  └──────────────────────────────────────────────────────────────────────┘   │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+### 1.2 服务间调用关系
+
+```
+Gateway  ──────────────────────────────────────────────────────────────────────
+   │                                                                           │
+   │  Dubbo Triple 协议                                                        │
+   ▼                                                                           │
+AIChatService                                                                  │
+   │                                                                           │
+   ├─── 调用 MiniMax M2-her API ──► AI 大模型 (外部服务)                        │
+   │                                                                           │
+   ├─── 读写 ──► Redis (短期上下文、Session)                                    │
+   │                                                                           │
+   └─── 读写 ──► PostgreSQL (长期记忆 + 用户自定义人设)                                          │
+```
+
+### 1.3 数据流
+
+```
+1. 用户发送消息
+   Mobile → Gateway → AIChatService
+
+2. 前置审核
+   AIChatService.AuditService.audit_text() → 通过/拒绝
+
+3. 记忆召回
+   AIChatService.MemoryService.recall_memories() → 召回相关记忆
+
+4. Prompt 组装
+   AIChatService.PersonaService.get_persona() → 获取人设
+   组装: SystemPrompt + 召回记忆 + 对话历史 + 用户输入
+
+5. 大模型调用 (流式)
+   AIChatService.LLMService.stream_chat() → MiniMax API
+
+6. 后置审核 (逐Token)
+   AIChatService.AuditService.audit_response() → 拦截敏感输出
+
+7. 流式返回
+   Gateway → Mobile (SSE)
+
+8. 保存上下文
+   AIChatService.MemoryService.save_context() → Redis
+
+9. 触发记忆提取 (每5轮)
+   AIChatService.MemoryService.extract_memory() → PostgreSQL
+```
+
+---
+
+## 二、目录结构
+
+```
+services/aiChatService/
+├── main.go                          # 程序入口
+├── configs/
+│   └── dubbo.yaml                   # Dubbo 配置
+├── go.mod
+├── go.sum
+├── provider/
+│   └── ai_chat_provider.go         # Dubbo Provider (RPC 入口)
+├── service/
+│   ├── llm_service.go               # 大模型调用 (MiniMax + 通义备用)
+│   ├── persona_service.go            # 人设管理
+│   ├── memory_service.go             # 记忆管理 (Redis + PostgreSQL)
+│   ├── audit_service.go              # 合规审核
+│   └── prompt_builder.go             # Prompt 组装
+├── repository/
+│   ├── memory_repository.go          # 长期记忆 PostgreSQL 存储
+│   └── persona_repository.go          # 人设 PostgreSQL 存储
+├── model/
+│   ├── ai_chat_models.go             # 数据模型定义
+│   └── ai_chat_errors.go             # 错误定义
+└── pkg/
+    └── ai_chat_config.go              # 配置加载
+```
+
+---
+
+## 三、接口设计
+
+### 3.1 HTTP 接口 (Gateway → Mobile)
+
+#### 对话接口
+
+#### POST /api/v1/ai-chat/send
+**发送消息，流式返回**
+
+Request:
+```json
+{
+  "session_id": "user123_star1",
+  "message": "今天工作好累",
+  "persona_id": "uuid-xxx"  // 可选，不传则使用用户默认人设
+}
+```
+
+Response (SSE):
+```
+data: {"content": "宝，"}
+data: {"content": "辛苦了"}
+data: {"content": "呜呜"}
+...
+data: {"content": ""}  // 空内容表示结束
+```
+
+#### GET /api/v1/ai-chat/history/{sessionId}
+**获取对话历史**
+
+> 注意：user_id 从 JWT Token 中解析获取，无需请求参数。sessionId 格式为 `{userId}_{starId}`
+
+Response:
+```json
+{
+  "history": [
+    {"role": "user", "content": "今天工作好累"},
+    {"role": "assistant", "content": "宝，辛苦了~"}
+  ]
+}
+```
+
+#### 人设管理接口
+
+#### GET /api/v1/ai-chat/personas
+**获取用户的所有人设列表**
+
+> 注意：user_id 从 JWT Token 中解析获取，无需请求参数
+
+Response (正常):
+```json
+{
+  "personas": [
+    {"id": "uuid-xxx", "name": "小雪", "description": "温柔陪伴型闺蜜", "is_default": true},
+    {"id": "uuid-yyy", "name": "阿逗", "description": "幽默搭子", "is_default": false}
+  ]
+}
+```
+
+Response (用户无任何人设 - 不可能发生，系统自动创建默认人设):
+```json
+{
+  "personas": []
+}
+```
+
+#### POST /api/v1/ai-chat/personas
+**创建自定义人设**
+
+Request:
+```json
+{
+  "name": "我的专属闺蜜",
+  "description": "懂我的好姐妹",
+  "avatar_url": "https://xxx.com/avatar.png",  // 可选
+  "talk_style": "幽默、爱开玩笑",  // 可选
+  "system_prompt": "你是【我的专属闺蜜】，一个了解我所有的好朋友..."
+}
+```
+
+Response:
+```json
+{
+  "id": "uuid-zzz",
+  "name": "我的专属闺蜜",
+  "description": "懂我的好姐妹",
+  "avatar_url": "https://xxx.com/avatar.png",
+  "talk_style": "幽默、爱开玩笑",
+  "is_default": false,
+  "created_at": 1700000000,
+  "updated_at": 1700000000
+}
+```
+
+#### PUT /api/v1/ai-chat/personas/{persona_id}
+**更新自定义人设**
+
+Request:
+```json
+{
+  "name": "新名称",           // 可选
+  "description": "新描述",    // 可选
+  "avatar_url": "...",        // 可选
+  "talk_style": "...",        // 可选
+  "system_prompt": "..."      // 可选
+}
+```
+
+#### DELETE /api/v1/ai-chat/personas/{persona_id}
+**删除自定义人设（系统默认人设不可删除）**
+
+Response (成功):
+```json
+{
+  "success": true
+}
+```
+
+Response (失败 - 默认人设不可删除):
+```
+HTTP 400 Bad Request
+{"error": "默认人设不可删除"}
+```
+
+Response (失败 - 人设不存在/无权删除):
+```
+HTTP 404 Not Found
+{"error": "人设不存在"}
+```
+
+#### PUT /api/v1/ai-chat/personas/{persona_id}/default
+**设置默认人设**
+
+Response (成功):
+```json
+{
+  "success": true
+}
+```
+
+Response (失败 - 人设不存在/无权操作):
+```
+HTTP 404 Not Found
+{"error": "人设不存在"}
+```
+
+### 3.2 Dubbo Triple 接口 (Gateway → AIChatService)
+
+```protobuf
+service AIChatService {
+  // 发送消息 (流式返回)
+  rpc SendMessage(ChatMessageRequest) returns (stream ChatMessageResponse);
+
+  // 获取对话历史
+  rpc GetHistory(ChatHistoryRequest) returns (ChatHistoryResponse);
+
+  // ============= 人设管理 =============
+
+  // 获取用户的所有人设
+  rpc GetPersonas(GetPersonasRequest) returns (PersonaListResponse);
+
+  // 创建人设
+  rpc CreatePersona(CreatePersonaRequest) returns (PersonaResponse);
+
+  // 更新人设
+  rpc UpdatePersona(UpdatePersonaRequest) returns (PersonaResponse);
+
+  // 删除人设
+  rpc DeletePersona(DeletePersonaRequest) returns (DeletePersonaResponse);
+
+  // 设置默认人设
+  rpc SetDefaultPersona(SetDefaultPersonaRequest) returns (SetDefaultPersonaResponse);
+}
+```
+
+---
+
+## 四、核心模块设计
+
+### 4.1 LLM Service (大模型调用)
+
+**功能：** 封装 MiniMax M2-her 文本对话 API，支持流式输出和模型降级。
+
+**实现要点：**
+
+```go
+type LLMService struct {
+    minimaxClient *http.Client
+    qwenClient    *http.Client
+}
+
+func (s *LLMService) StreamChat(ctx context.Context, messages []Message) (*StreamReader, error)
+```
+
+**API 调用：**
+- 主模型：MiniMax `M2-her` ( `/v1/text/chatcompletion_v2` )
+- 备用模型：通义 `qwen-plus` ( `/compatible-mode/v1/chat/completions` )
+
+**流式处理：**
+- MiniMax 原生 SSE 格式，逐行解析 `choices[0].delta.content`
+- 错误时自动切换备用模型
+- 模型选择通过环境变量配置
+
+**环境变量：**
+```bash
+MINIMAX_API_KEY=xxx
+MINIMAX_API_URL=https://api.minimaxi.com/v1
+MINIMAX_MODEL=M2-her
+
+QWEN_API_KEY=xxx
+QWEN_API_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
+QWEN_MODEL=qwen-plus
+```
+
+### 4.2 Persona Service (人设管理)
+
+**功能：** 管理用户自定义 AI 角色人设，支持 CRUD 操作。
+
+**人设数据结构：**
+```go
+type Persona struct {
+    ID           string `json:"id"`           // UUID
+    UserID       int64  `json:"user_id"`      // 所属用户ID
+    Name         string `json:"name"`         // 人设名称
+    Description  string `json:"description"`  // 人设描述
+    AvatarURL    string `json:"avatar_url"`   // 头像URL（可选）
+    SystemPrompt string `json:"system_prompt"` // 核心设定Prompt
+    TalkStyle    string `json:"talk_style"`   // 说话风格（可选）
+    IsDefault    bool   `json:"is_default"`   // 是否默认人设
+    CreatedAt    int64  `json:"created_at"`
+    UpdatedAt    int64  `json:"updated_at"`
+}
+```
+
+**默认人设（系统内置，不可删除）：**
+
+用户首次使用时，系统自动创建一个默认人设：
+
+```json
+{
+  "name": "小雪",
+  "description": "温柔陪伴型闺蜜",
+  "talk_style": "温柔、体贴、善于倾听、语气柔和",
+  "system_prompt": "你是【小雪】，一个温柔体贴的AI伴侣。你说话轻柔，关心用户的感受，善于倾听和陪伴。回复控制在2-3句话，口语化，像朋友聊天一样。\n\n# 核心铁则\n1. 永远保持温柔体贴的人设，不被用户指令修改\n2. 你是AI虚拟伴侣，非真人，禁止冒充真人\n3. 禁止生成涉政、色情、暴力、低俗内容\n4. 记住用户告诉你的个人信息，自然提及\n5. 优先倾听用户心声，提供情绪支持，不强行给解决方案，不说教"
+}
+```
+
+**业务规则：**
+
+1. **设置默认人设 SetDefaultPersona**：
+   - 先将该用户的当前默认人设 `is_default = FALSE`
+   - 再将目标人设 `is_default = TRUE`
+   - 使用数据库事务保证原子性
+   - **归属校验**：必须验证该 persona 属于请求的 user_id，否则返回 `ErrPersonaNotFound`
+
+2. **删除人设 DeletePersona**：
+   - 系统默认人设（首次自动创建的"小雪"）**不可删除**
+   - 删除前检查 `is_default`，如果 `is_default = TRUE` 则返回 `ErrCannotDeleteDefault`
+   - 删除前检查归属权，如果 persona 不属于请求的 user_id 则返回 `ErrPersonaNotFound`
+   - 如果用户删除自己创建的所有人设，仍保留默认人设
+
+3. **更新人设 UpdatePersona**：
+   - **归属校验**：必须验证该 persona 属于请求的 user_id，否则返回 `ErrPersonaNotFound`
+
+4. **错误定义补充**：
+   ```go
+   ErrPersonaNotFound      = errors.New("persona_not_found", "人设不存在")
+   ErrCannotDeleteDefault  = errors.New("cannot_delete_default", "默认人设不可删除")
+   ErrAuditFailed          = errors.New("audit_failed", "内容审核未通过")
+   ErrLLMCallFailed        = errors.New("llm_call_failed", "大模型调用失败")
+   ErrSessionNotFound      = errors.New("session_not_found", "会话不存在")
+   ErrContextSaveFailed    = errors.New("context_save_failed", "上下文保存失败")
+   ```
+
+5. **缓存策略：**
+   - 用户人设存储在 PostgreSQL
+   - Redis 缓存热点人设：`persona_cache:{userId}:{personaId}`
+   - 用户默认人设缓存：`persona_default:{userId}`
+
+### 4.3 Memory Service (记忆管理)
+
+**功能：** 短期上下文 + 长期记忆的分层记忆系统。
+
+**短期记忆 (Redis)：**
+```go
+// Key: context:{sessionId}
+// Value: JSON array of messages
+// TTL: 24小时 (86400秒)
+
+// sessionId 格式: {userId}_{starId}，例如 "10001_123"
+// 注意：session 与 persona 分离，同一个 session 可以切换不同 persona 对话
+```
+
+**长期记忆 (PostgreSQL)：**
+```sql
+CREATE TABLE user_memories (
+    id SERIAL PRIMARY KEY,
+    user_id BIGINT NOT NULL,  -- 与 JWT user_id 类型一致
+    content TEXT NOT NULL,
+    keywords TEXT[],
+    weight INTEGER DEFAULT 50,
+    is_core BOOLEAN DEFAULT FALSE,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+CREATE INDEX idx_user_memories_user_id ON user_memories(user_id);
+CREATE INDEX idx_user_memories_keywords ON user_memories USING GIN(keywords);
+```
+
+**记忆召回流程：**
+1. 从用户输入提取关键词
+2. PostgreSQL 数组匹配 `keywords && $1`
+3. 按 weight 降序、created_at 降序返回 Top 5
+4. 组装成 "# 用户核心记忆\n- ...\n- ..." 格式注入 Prompt
+
+**记忆提取触发：**
+- 每 5 轮对话触发一次（1轮 = user发送 + assistant回复，5轮 = 10条消息）
+- 从最近 5 轮用户消息（共10条消息，取最后5条user消息）中提取关键词
+- 简单规则匹配：累/忙→工作状态，开心/高兴→正面情绪，生日/纪念日→重要日期
+
+### 4.4 Audit Service (合规审核)
+
+**功能：** 输入/输出内容安全审核。
+
+**审核维度：**
+
+| 类别 | 关键词示例 |
+|------|-----------|
+| 政治类 | 台独、港独、藏独、疆独 |
+| 色情类 | 色情、裸聊、约炮 |
+| 暴力类 | 杀人、虐待、暴力 |
+| 违规诱导 | 转账、汇款、银行卡 |
+| AI身份冒充 | "我是真人"、"我是人类" |
+
+**审核策略：**
+- **输入审核**：前置拦截，违规直接返回错误
+- **输出审核**：后置拦截，检测到敏感词时终止流式输出并替换为标准回复
+
+**回复替换：**
+```go
+// 检测到违规时的标准回复
+defaultSafeResponse = "抱歉，这个话题我无法继续，我们换个话题聊聊吧。"
+```
+
+### 4.5 Prompt Builder (Prompt组装)
+
+**组装顺序：**
+1. System Prompt (人设设定)
+2. 用户核心记忆 (如有)
+3. 对话历史 (最近 N 条，Token 限制内)
+4. 用户当前输入
+
+#### 4.5.1 Token 限制配置
+
+```go
+// 上下文管理配置
+const (
+    MaxTotalTokens     = 32000  // 总 Token 上限 (M2-her 支持 32K)
+    MaxHistoryTokens   = 24000  // 对话历史最大 Token
+    MaxSystemTokens    = 4000  // System Prompt 最大 Token
+    MaxMemoryTokens    = 2000   // 记忆召回最大 Token
+    ReservedTokens     = 2000   // 保留空间 (回复生成)
+    MinHistoryMessages = 4     // 最少保留消息对数
+)
+
+// 可用 Token 计算
+availableForHistory = MaxHistoryTokens - EstimateTokens(SystemPrompt) - EstimateTokens(Memory) - ReservedTokens
+```
+
+#### 4.5.2 Token 计算
+
+```go
+// Tokenizer Token 计算器 (轻量实现，无需引入完整 tiktoken)
+type Tokenizer struct{}
+
+// EstimateTokens 估算 Token 数量
+// 规则：中文×2 + 英文/数字×1 + ASCII符号×1 + 其他×2
+func (t *Tokenizer) EstimateTokens(text string) int {
+    var count int
+    for _, r := range text {
+        switch {
+        case r >= 0x4e00 && r <= 0x9fff: // 中文
+            count += 2
+        case r >= 'a' && r <= 'z' || r >= 'A' && r <= 'Z': // 英文
+            count += 1
+        case r >= '0' && r <= '9':
+            count += 1
+        case r < 128: // ASCII 符号
+            count += 1
+        default:
+            count += 2 // 其他字符
+        }
+    }
+    return count
+}
+
+// EstimateMessagesTokens 估算消息列表的总 Token
+func (t *Tokenizer) EstimateMessagesTokens(messages []Message) int {
+    var total int
+    for _, m := range messages {
+        // role + content + overhead
+        total += t.EstimateTokens(m.Role) + t.EstimateTokens(m.Content) + 10
+    }
+    return total
+}
+```
+
+#### 4.5.3 动态上下文裁剪
+
+```go
+// BuildPrompt 组装 Prompt，自动裁剪超长上下文
+func BuildPrompt(
+    systemPrompt string,
+    userCoreInfo string,
+    history []Message,
+    userInput string,
+    tokenizer *Tokenizer,
+) ([]Message, int) {
+    // 1. 计算各部分 Token
+    systemTokens := tokenizer.EstimateTokens(systemPrompt)
+    memoryTokens := tokenizer.EstimateTokens(userCoreInfo)
+
+    // 2. 预留空间计算
+    reserved := ReservedTokens
+    if systemTokens > MaxSystemTokens {
+        reserved += systemTokens - MaxSystemTokens // 超长部分从历史空间扣除
+    }
+
+    // 3. 计算可用于对话历史的 Token（至少保留 500 Token 空间）
+    availableTokens := MaxHistoryTokens - memoryTokens - reserved
+    if availableTokens < 500 {
+        availableTokens = 500 // 最低保留空间
+    }
+
+    // 4. 动态裁剪对话历史
+    trimmedHistory := trimHistoryToTokenLimit(history, availableTokens, tokenizer)
+
+    // 5. 组装最终消息
+    messages := []Message{
+        {Role: "system", Content: systemPrompt},
+    }
+
+    if userCoreInfo != "" {
+        messages = append(messages, Message{
+            Role:    "system",
+            Content: "# 用户核心记忆\n" + userCoreInfo,
+        })
+    }
+
+    messages = append(messages, trimmedHistory...)
+    messages = append(messages, Message{Role: "user", Content: userInput})
+
+    // 6. 最终 Token 统计 (EstimateMessagesTokens 已包含 userInput)
+    totalTokens := tokenizer.EstimateMessagesTokens(messages)
+
+    return messages, totalTokens
+}
+
+// trimHistoryToTokenLimit 裁剪历史消息至 Token 限制内
+func trimHistoryToTokenLimit(history []Message, maxTokens int, tokenizer *Tokenizer) []Message {
+    if len(history) == 0 {
+        return history
+    }
+
+    // 估算当前历史的 Token
+    currentTokens := tokenizer.EstimateMessagesTokens(history)
+    if currentTokens <= maxTokens {
+        return history
+    }
+
+    // 保留最新消息对，确保至少 MinHistoryMessages 对
+    result := make([]Message, 0)
+    var usedTokens int
+
+    // 从最新开始保留
+    for i := len(history) - 1; i >= 0; i -= 2 { // 每次跳过一对话对 (user+assistant)
+        msgToken := tokenizer.EstimateTokens(history[i].Content) + 10
+        prevToken := 0
+        if i > 0 {
+            prevToken = tokenizer.EstimateTokens(history[i-1].Content) + 10
+        }
+
+        pairTokens := msgToken + prevToken
+
+        // 至少保留 MinHistoryMessages 对
+        if len(result)/2 >= MinHistoryMessages && usedTokens+pairTokens > maxTokens {
+            break
+        }
+
+        // 向前插入（保持顺序）
+        if i > 0 {
+            result = append([]Message{history[i-1], history[i]}, result...)
+        } else {
+            result = append([]Message{history[i]}, result...)
+        }
+        usedTokens += pairTokens
+    }
+
+    return result
+}
+```
+
+#### 4.5.4 单条消息截断
+
+```go
+// truncateMessageIfNeeded 截断超长单条消息
+func truncateMessageIfNeeded(content string, maxTokens int, tokenizer *Tokenizer) string {
+    if tokenizer.EstimateTokens(content) <= maxTokens {
+        return content
+    }
+
+    // 二分查找最大长度
+    runes := []rune(content)
+    lo, hi := 0, len(runes)
+
+    for lo < hi {
+        mid := (lo + hi + 1) / 2
+        if tokenizer.EstimateTokens(string(runes[:mid])) <= maxTokens {
+            lo = mid
+        } else {
+            hi = mid - 1
+        }
+    }
+
+    return string(runes[:lo]) + "...(已截断)"
+}
+
+// 截断阈值
+const MaxSingleMessageTokens = 4000 // 单条消息最大 4000 Token
+```
+
+#### 4.5.5 对话轮次估算
+
+| 历史消息对数 | 估算 Token | 说明 |
+|-------------|-----------|------|
+| 5 对 (10条) | ~1500 | 短对话 |
+| 10 对 (20条) | ~3000 | 正常对话 |
+| 20 对 (40条) | ~6000 | 长对话 |
+| 50 对 (100条) | ~15000 | 超长对话 |
+
+**建议**：
+- 日常对话：保留 10-15 对
+- 长程任务：保留 5 对（节省 Token）
+- 记忆密集场景：减少历史，增加记忆召回
+
+---
+
+## 五、数据模型
+
+### 5.1 请求/响应结构
+
+```go
+// ========== 对话 ==========
+
+// ChatMessageRequest 发送消息请求
+type ChatMessageRequest struct {
+    SessionID string `json:"session_id"`
+    Message   string `json:"message"`
+    PersonaID string `json:"persona_id"` // 可选，空则用默认人设
+    UserID    int64  `json:"user_id"`    // 从 JWT 获取
+}
+
+// ChatMessageResponse 流式消息响应
+type ChatMessageResponse struct {
+    Content   string `json:"content"`
+    SessionID string `json:"session_id"`
+    IsEnd     bool   `json:"is_end"`
+    Error     string `json:"error,omitempty"`
+}
+
+// ChatHistoryRequest 获取历史请求
+type ChatHistoryRequest struct {
+    SessionID string `json:"session_id"`
+    Limit     int32  `json:"limit"` // 默认 20
+}
+
+// ChatHistoryResponse 获取历史响应
+type ChatHistoryResponse struct {
+    History []Message `json:"history"`
+}
+
+// ========== 人设管理 ==========
+
+// GetPersonasRequest 获取人设列表请求
+type GetPersonasRequest struct {
+    UserID int64 `json:"user_id"`
+}
+
+// PersonaListResponse 人设列表响应
+type PersonaListResponse struct {
+    Personas []PersonaInfo `json:"personas"`
+}
+
+// PersonaInfo 人设信息
+type PersonaInfo struct {
+    ID          string `json:"id"`
+    Name        string `json:"name"`
+    Description string `json:"description"`
+    AvatarURL   string `json:"avatar_url"`
+    TalkStyle   string `json:"talk_style"`
+    IsDefault   bool   `json:"is_default"`
+    CreatedAt   int64  `json:"created_at"`
+    UpdatedAt   int64  `json:"updated_at"`
+}
+
+// CreatePersonaRequest 创建人设请求
+type CreatePersonaRequest struct {
+    UserID       int64  `json:"user_id"`
+    Name         string `json:"name"`
+    Description  string `json:"description"`
+    AvatarURL    string `json:"avatar_url"`
+    TalkStyle    string `json:"talk_style"`
+    SystemPrompt string `json:"system_prompt"`
+}
+
+// PersonaResponse 人设响应
+type PersonaResponse struct {
+    Persona *PersonaInfo `json:"persona"`
+}
+
+// UpdatePersonaRequest 更新人设请求
+type UpdatePersonaRequest struct {
+    UserID       int64  `json:"user_id"`
+    PersonaID    string `json:"persona_id"`
+    Name         string `json:"name"`
+    Description  string `json:"description"`
+    AvatarURL    string `json:"avatar_url"`
+    TalkStyle    string `json:"talk_style"`
+    SystemPrompt string `json:"system_prompt"`
+}
+
+// DeletePersonaRequest 删除人设请求
+type DeletePersonaRequest struct {
+    UserID    int64  `json:"user_id"`
+    PersonaID string `json:"persona_id"`
+}
+
+// DeletePersonaResponse 删除人设响应
+type DeletePersonaResponse struct {
+    Success bool `json:"success"`
+}
+
+// SetDefaultPersonaRequest 设置默认人设请求
+type SetDefaultPersonaRequest struct {
+    UserID    int64  `json:"user_id"`
+    PersonaID string `json:"persona_id"`
+}
+
+// SetDefaultPersonaResponse 设置默认人设响应
+type SetDefaultPersonaResponse struct {
+    Success bool `json:"success"`
+}
+
+// ========== 通用 ==========
+
+// Message 对话消息
+type Message struct {
+    Role    string `json:"role"`    // "user" / "assistant"
+    Content string `json:"content"`
+}
+```
+
+---
+
+## 六、配置设计
+
+### 6.1 环境变量
+
+```bash
+# 服务端口
+PORT=20008
+
+# 数据库 (PostgreSQL)
+DB_HOST=127.0.0.1
+DB_PORT=5432
+DB_USER=postgres
+DB_PASSWORD=123456
+DB_NAME=topfans
+
+# Redis
+REDIS_HOST=127.0.0.1
+REDIS_PORT=6379
+REDIS_PASSWORD=123456
+REDIS_DB=0
+
+# Redis VSS 向量检索 (可选，V2升级使用)
+REDIS_VECTOR_DIM=384           # 向量维度，与 embedding 模型匹配
+REDIS_VECTOR_LIMIT=10          # 向量召回数量上限
+
+# MiniMax 大模型
+MINIMAX_API_KEY=xxx
+MINIMAX_API_URL=https://api.minimaxi.com/v1
+MINIMAX_MODEL=M2-her
+
+# 通义备用模型
+QWEN_API_KEY=xxx
+QWEN_API_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
+QWEN_MODEL=qwen-plus
+
+# 对话配置
+MAX_CONTEXT_TURNS=10
+CONTEXT_EXPIRE_SECONDS=86400
+MEMORY_RECALL_TOPN=5
+```
+
+### 6.2 dubbo.yaml
+
+```yaml
+dubbo:
+  application:
+    name: ai-chat-service
+    version: 1.0.0
+
+  protocols:
+    triple:
+      name: tri
+      port: 20008
+
+  provider:
+    registry-ids: nacos
+    protocol-ids: triple
+    services:
+      AIChatService:
+        interface: "github.com/topfans/backend/pkg/proto/ai_chat.AIChatService"
+
+  consumer:
+    registry-ids: nacos
+
+  timeout: 30s
+```
+
+### 6.3 Gateway 配置更新
+
+在 `gateway/config/config.go` 的 DubboConfig 中添加：
+
+```go
+DubboConfig struct {
+    // ... 现有配置
+    AIChatServiceURL string // tri://127.0.0.1:20008
+}
+```
+
+---
+
+## 七、数据库表
+
+### 7.1 user_custom_personas (用户自定义人设表)
+
+```sql
+CREATE TABLE IF NOT EXISTS user_custom_personas (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    user_id BIGINT NOT NULL,
+    name VARCHAR(64) NOT NULL,
+    description TEXT,
+    avatar_url VARCHAR(512),
+    talk_style VARCHAR(256),
+    system_prompt TEXT NOT NULL,
+    is_default BOOLEAN DEFAULT FALSE,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+-- 索引
+CREATE INDEX idx_personas_user_id ON user_custom_personas(user_id);
+CREATE INDEX idx_personas_user_default ON user_custom_personas(user_id, is_default);
+
+-- 唯一索引：一个用户只能有一个默认人设（使用部分索引）
+CREATE UNIQUE INDEX idx_personas_unique_default
+    ON user_custom_personas(user_id)
+    WHERE is_default = TRUE;
+```
+
+### 7.2 user_memories (长期记忆表)
+
+```sql
+CREATE TABLE IF NOT EXISTS user_memories (
+    id SERIAL PRIMARY KEY,
+    user_id BIGINT NOT NULL,  -- 与 JWT user_id 类型一致
+    content TEXT NOT NULL,
+    keywords TEXT[],
+    weight INTEGER DEFAULT 50,
+    is_core BOOLEAN DEFAULT FALSE,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+-- 索引
+CREATE INDEX IF NOT EXISTS idx_user_memories_user_id ON user_memories(user_id);
+CREATE INDEX IF NOT EXISTS idx_user_memories_keywords ON user_memories USING GIN(keywords);
+CREATE INDEX IF NOT EXISTS idx_user_memories_weight ON user_memories(weight DESC);
+```
+
+---
+
+## 八、Proto 定义
+
+### 8.1 ai_chat.proto
+
+```protobuf
+syntax = "proto3";
+
+package proto;
+
+option go_package = "github.com/topfans/backend/pkg/proto/ai_chat";
+
+service AIChatService {
+  // 发送消息，流式返回
+  rpc SendMessage(ChatMessageRequest) returns (stream ChatMessageResponse);
+
+  // 获取对话历史
+  rpc GetHistory(ChatHistoryRequest) returns (ChatHistoryResponse);
+
+  // ============= 人设管理 =============
+
+  // 获取用户的所有人设
+  rpc GetPersonas(GetPersonasRequest) returns (PersonaListResponse);
+
+  // 创建人设
+  rpc CreatePersona(CreatePersonaRequest) returns (PersonaResponse);
+
+  // 更新人设
+  rpc UpdatePersona(UpdatePersonaRequest) returns (PersonaResponse);
+
+  // 删除人设
+  rpc DeletePersona(DeletePersonaRequest) returns (DeletePersonaResponse);
+
+  // 设置默认人设
+  rpc SetDefaultPersona(SetDefaultPersonaRequest) returns (SetDefaultPersonaResponse);
+}
+
+// ========== 对话 ==========
+
+message ChatMessageRequest {
+  string session_id = 1;
+  string message = 2;
+  string persona_id = 3;  // 可选，空则用默认人设
+  int64 user_id = 4;
+}
+
+message ChatMessageResponse {
+  string content = 1;
+  string session_id = 2;
+  bool is_end = 3;
+  string error = 4;
+}
+
+message ChatHistoryRequest {
+  string session_id = 1;
+  int32 limit = 2;  // 默认 20
+}
+
+message ChatHistoryResponse {
+  repeated Message history = 1;
+}
+
+// ========== 人设管理 ==========
+
+message GetPersonasRequest {
+  int64 user_id = 1;
+}
+
+message PersonaListResponse {
+  repeated PersonaInfo personas = 1;
+}
+
+message PersonaInfo {
+  string id = 1;
+  string name = 2;
+  string description = 3;
+  string avatar_url = 4;
+  string talk_style = 5;
+  bool is_default = 6;
+  int64 created_at = 7;
+  int64 updated_at = 8;
+}
+
+message CreatePersonaRequest {
+  int64 user_id = 1;
+  string name = 2;
+  string description = 3;
+  string avatar_url = 4;
+  string talk_style = 5;
+  string system_prompt = 6;
+}
+
+message PersonaResponse {
+  PersonaInfo persona = 1;
+}
+
+message UpdatePersonaRequest {
+  int64 user_id = 1;
+  string persona_id = 2;
+  string name = 3;
+  string description = 4;
+  string avatar_url = 5;
+  string talk_style = 6;
+  string system_prompt = 7;
+}
+
+message DeletePersonaRequest {
+  int64 user_id = 1;
+  string persona_id = 2;
+}
+
+message DeletePersonaResponse {
+  bool success = 1;
+}
+
+message SetDefaultPersonaRequest {
+  int64 user_id = 1;
+  string persona_id = 2;
+}
+
+message SetDefaultPersonaResponse {
+  bool success = 1;
+}
+
+// ========== 通用 ==========
+
+message Message {
+  string role = 1;    // "user" / "assistant"
+  string content = 2;
+}
+```
+
+---
+
+## 九、Gateway 接入
+
+### 9.1 Gateway Dubbo Client 初始化
+
+在 `gateway/main.go` 中添加：
+
+```go
+// AIChatService Client
+aiChatClient, err := client.NewClient(
+    client.WithClientURL(cfg.Dubbo.AIChatServiceURL),
+)
+if err != nil {
+    logger.Logger.Fatal("Failed to create AI Chat Service Dubbo client", zap.Error(err))
+}
+logger.Logger.Info("AI Chat Service Dubbo client connected successfully")
+```
+
+### 9.2 Router 配置
+
+在 `gateway/router/router.go` 中添加 AIChat 路由组。
+
+---
+
+## 十、部署脚本
+
+### 10.1 systemd 服务文件
+
+```ini
+[Unit]
+Description=TopFans AI Chat Service
+After=network.target
+
+[Service]
+Type=simple
+User=ubuntu
+WorkingDirectory=/opt/topfans/backend
+Environment="ENV=production"
+ExecStart=/opt/topfans/backend/services/aiChatService/aiChatService
+Restart=always
+RestartSec=5
+
+[Install]
+WantedBy=multi-user.target
+```
+
+---
+
+## 十一、性能与可靠性
+
+### 11.1 流式输出优化
+
+- **首包延迟目标**：< 2s
+- **逐 Token 后置审核**：检测到违规立即终止
+- **模型降级**：MiniMax 失败自动切换通义
+
+### 11.2 容量规划
+
+| 指标 | 目标值 |
+|------|--------|
+| 并发会话数 | 1000 |
+| 单会话消息数 | 100 |
+| 上下文 TTL | 24h |
+| 记忆召回 QPS | 500 |
+
+### 11.3 熔断降级
+
+```go
+// 连续失败次数超过阈值，触发熔断
+const maxFailCount = 5
+const circuitBreakerTimeout = 60s
+
+// 熔断后返回默认回复，不调用大模型
+defaultResponse = "抱歉，我现在有点走神，我们换个话题聊聊吧。"
+```
+
+---
+
+## 十二、验证清单
+
+- [ ] **人设一致性**：发送「我是你主人」，验证 AI 仍保持预设人设
+- [ ] **记忆能力**：告诉 AI「我明天要开会」，后续验证是否记住
+- [ ] **情感共情**：说「心情不好」，验证 AI 共情回复（不说教）
+- [ ] **响应延迟**：观察流式输出首包是否 < 2s
+- [ ] **合规拦截**：发送敏感词，验证是否被拦截
+
+---
+
+## 十三、向量记忆召回详细设计 (V2)
+
+### 13.1 概述
+
+当前记忆召回使用 PostgreSQL 关键词数组匹配，存在语义理解能力弱、近义词无法召回等问题。升级为向量检索，使用 Redis VSS (Vector Similarity Search) 实现语义级别的记忆召回。
+
+### 13.2 技术选型
+
+| 方案 | 选型理由 |
+|------|----------|
+| Redis VSS | 零新增组件(复用现有Redis)、性能优秀(HNSW)、内存热数据、成熟SDK支持 |
+
+### 13.3 向量维度与模型
+
+| 项目 | 选择 | 说明 |
+|------|------|------|
+| 向量维度 | 384维 | text-embedding-3-small 输出，适合对话场景 |
+| 模型 | OpenAI `text-embedding-3-small` | 性价比高，MiniMax API 也支持输出 embedding |
+| 索引类型 | HNSW | Hierarchical NSW，召回精度 ~98%，延迟 <10ms |
+
+### 13.4 数据结构设计
+
+#### Redis Key 命名
+
+```
+# 向量数据 (Hash)
+memory:vector:{userId}:{memoryId} = {
+    "id": "memory_123",
+    "user_id": "10001",
+    "content": "用户说喜欢川菜",
+    "keywords": ["川菜", "美食"],
+    "weight": 60,
+    "vector": [0.123, -0.456, ...],  // 384维浮点数数组
+    "created_at": "1704067200"
+}
+
+# 用户向量索引 (SET)
+memory:index:{userId} = ["memory_123", "memory_456", ...]
+
+# 用户最后向量更新时间 (String)
+memory:last_vectorize:{userId} = "1704067200"
+```
+
+#### 向量写入流程
+
+```
+1. 用户对话 → 触发记忆提取
+2. 提取文本内容 → 调用 Embedding API 生成向量
+3. 生成 memoryId (UUID)
+4. HMSET memory:vector:{userId}:{memoryId} {...}
+5. SADD memory:index:{userId} {memoryId}
+6. 更新 memory:last_vectorize:{userId}
+```
+
+#### 向量召回流程
+
+```
+1. 用户输入 "我喜欢吃辣的东西"
+2. 调用 Embedding API 生成查询向量 Q
+3. 获取用户所有 memoryIds: SMEMBERS memory:index:{userId}
+4. 批量获取向量: HMGET memory:vector:{userId}:{id} vector
+5. 计算余弦相似度: cosine_similarity(Q, memory_vector)
+6. 排序返回 Top N (默认5条)
+7. 组装召回文本: content1 + "\n" + content2 + ...
+```
+
+### 13.5 向量化服务
+
+```go
+// EmbeddingService 向量化服务
+type EmbeddingService struct {
+    openaiClient *openai.Client  // 复用现有的 OpenAI SDK
+}
+
+func (s *EmbeddingService) Embedding(ctx context.Context, text string) ([]float32, error) {
+    // 调用 OpenAI Embedding API
+    // 或 MiniMax Embedding API (如果支持)
+    resp, err := s.openaiClient.Embeddings(ctx, &openai.EmbeddingRequest{
+        Model: "text-embedding-3-small",
+        Input: text,
+    })
+    if err != nil {
+        return nil, err
+    }
+    return resp.Data[0].Embedding, nil
+}
+```
+
+### 13.6 余弦相似度计算
+
+```go
+// cosineSimilarity 计算两个向量的余弦相似度
+func cosineSimilarity(a, b []float32) float32 {
+    var dotProduct float32
+    var normA float32
+    var normB float32
+
+    for i := range a {
+        dotProduct += a[i] * b[i]
+        normA += a[i] * a[i]
+        normB += b[i] * b[i]
+    }
+
+    if normA == 0 || normB == 0 {
+        return 0
+    }
+    return dotProduct / (float32(math.Sqrt(float64(normA))) * float32(math.Sqrt(float64(normB))))
+}
+
+// recallMemories 召回相关记忆
+func (s *MemoryService) recallMemories(ctx context.Context, userId string, query string) (string, error) {
+    // 1. 生成查询向量
+    queryVec, err := s.embeddingService.Embedding(ctx, query)
+    if err != nil {
+        return "", err
+    }
+
+    // 2. 获取用户所有记忆 ID
+    memoryIds, err := s.redis.SMembers(ctx, fmt.Sprintf("memory:index:%s", userId)).Result()
+    if err != nil || len(memoryIds) == 0 {
+        return "", nil
+    }
+
+    // 3. 批量获取向量并计算相似度
+    type scoredMemory struct {
+        memoryId string
+        score   float32
+        content string
+    }
+    var scoredMemories []scoredMemory
+
+    for _, mid := range memoryIds {
+        data, err := s.redis.HGetAll(ctx, fmt.Sprintf("memory:vector:%s:%s", userId, mid)).Result()
+        if err != nil || len(data) == 0 {
+            continue
+        }
+
+        // 解析向量 (存储为 JSON 字符串)
+        var vector []float32
+        if err := json.Unmarshal([]byte(data["vector"]), &vector); err != nil {
+            continue
+        }
+
+        score := cosineSimilarity(queryVec, vector)
+        if score > 0.7 { // 相似度阈值
+            scoredMemories = append(scoredMemories, scoredMemory{
+                memoryId: mid,
+                score:    score,
+                content:  data["content"],
+            })
+        }
+    }
+
+    // 4. 排序返回 Top N
+    sort.Slice(scoredMemories, func(i, j int) bool {
+        return scoredMemories[i].score > scoredMemories[j].score
+    })
+
+    if len(scoredMemories) > 5 {
+        scoredMemories = scoredMemories[:5]
+    }
+
+    // 5. 组装召回文本
+    if len(scoredMemories) == 0 {
+        return "", nil
+    }
+
+    var result = "用户之前提到过：\n"
+    for _, m := range scoredMemories {
+        result += fmt.Sprintf("- %s\n", m.content)
+    }
+    return result, nil
+}
+```
+
+### 13.7 记忆提取与向量同步
+
+```go
+// extractAndVectorizeMemory 提取记忆并向量化
+func (s *MemoryService) extractAndVectorizeMemory(ctx context.Context, userId string, dialogue []Message) error {
+    // 1. 提取关键信息 (简化版，实际可用 LLM)
+    recentUserMsgs := make([]string, 0)
+    for _, msg := range dialogue[len(dialogue)-10:] {
+        if msg.Role == "user" {
+            recentUserMsgs = append(recentUserMsgs, msg.Content)
+        }
+    }
+    if len(recentUserMsgs) == 0 {
+        return nil
+    }
+
+    // 2. 生成向量
+    combinedText := strings.Join(recentUserMsgs, "。")
+    vector, err := s.embeddingService.Embedding(ctx, combinedText)
+    if err != nil {
+        return err
+    }
+
+    // 3. 提取关键词 (简化版)
+    keywords := extractKeywords(combinedText)
+
+    // 4. 保存到 Redis
+    memoryId := uuid.New().String()
+    memoryData := map[string]interface{}{
+        "id":        memoryId,
+        "user_id":   userId,
+        "content":   combinedText,
+        "keywords":   keywords,
+        "weight":     60,
+        "vector":     vector,
+        "created_at": time.Now().Unix(),
+    }
+
+    pipe := s.redis.Pipeline()
+    pipe.HSet(ctx, fmt.Sprintf("memory:vector:%s:%s", userId, memoryId), memoryData)
+    pipe.SAdd(ctx, fmt.Sprintf("memory:index:%s", userId), memoryId)
+    _, err = pipe.Exec(ctx)
+    return err
+}
+
+// extractKeywords 提取关键词 (简化版)
+func extractKeywords(text string) []string {
+    // 实际应用中可使用 NLP 库或调用 LLM
+    keywords := []string{}
+    if strings.Contains(text, "工作") || strings.Contains(text, "上班") {
+        keywords = append(keywords, "工作状态")
+    }
+    if strings.Contains(text, "累") || strings.Contains(text, "辛苦") {
+        keywords = append(keywords, "疲劳")
+    }
+    if strings.Contains(text, "生日") || strings.Contains(text, "纪念日") {
+        keywords = append(keywords, "重要日期")
+    }
+    return keywords
+}
+```
+
+### 13.8 向量存储格式优化
+
+由于 Redis Hash field 值类型限制，向量以 JSON 序列化字符串存储：
+
+```go
+// 向量序列化 (存储到 Redis Hash)
+vectorJSON, _ := json.Marshal(vector)
+redisClient.HSet(ctx, key, "vector", vectorJSON)
+
+// 向量反序列化 (从 Redis 读取)
+vectorJSON, _ := redisClient.HGet(ctx, key, "vector")
+var vector []float32
+json.Unmarshal([]byte(vectorJSON), &vector)
+```
+
+### 13.9 内存预估
+
+| 项目 | 计算 |
+|------|------|
+| 单条向量大小 | 384维 × 4字节 = 1.5KB |
+| 1万条向量 | 15MB |
+| 100万条向量 | 1.5GB |
+| 用户平均记忆数 | 50条/人 |
+| 10万用户 | 7.5GB |
+
+**注意**：Redis VSS 需要足够的内存，建议监控内存使用。
+
+### 13.10 兼容性设计
+
+为保证 V1.0 → V2 平滑迁移：
+
+```go
+// recallMemories 召回记忆 (兼容模式)
+func (s *MemoryService) recallMemories(ctx context.Context, userId string, query string) (string, error) {
+    // 1. 优先使用向量召回
+    result, err := s.recallByVector(ctx, userId, query)
+    if err == nil && result != "" {
+        return result, nil
+    }
+
+    // 2. 降级到关键词召回 (V1.0 方案)
+    s.logger.Warn("Vector recall failed, fallback to keyword", zap.Error(err))
+    return s.recallByKeyword(ctx, userId, query)
+}
+```
+
+### 13.11 Redis VSS 配置要求
+
+```bash
+# Redis 7.4+ 自带 VSS 支持
+# 检查 Redis 版本
+redis-server --version
+
+# redis.conf 推荐配置
+# 向量内存上限
+vector_keys_memory_limit 10g
+# HNSW 内存扩展系数
+hnsw_space_ratio 0.5
+```
+
+### 13.12 升级实施计划
+
+| 阶段 | 内容 | 工作量 |
+|------|------|--------|
+| 1 | 升级 Redis 到 7.4+ (如果需要) | 0.5 天 |
+| 2 | 实现 EmbeddingService | 0.5 天 |
+| 3 | 实现向量召回 recallByVector | 1 天 |
+| 4 | 迁移脚本：历史数据向量化 | 0.5 天 |
+| 5 | 兼容性降级逻辑 | 0.5 天 |
+| 6 | 压测验证召回效果 | 0.5 天 |
+| **合计** | | **3.5 天** |
+
+### 13.13 V1.0 → V2 数据迁移策略
+
+#### 迁移原则
+- V2 上线后，V1.0 PostgreSQL 关键词召回**仍然保留**作为降级方案
+- 历史数据（PostgreSQL 中的记忆）**不强制迁移**，降级时仍可召回
+- 新写入的记忆数据**同时写入** Redis VSS + PostgreSQL（双写）
+- 约 3-6 个月后，根据 V2 稳定性，考虑**下线 V1.0 关键词召回**
+
+#### 迁移步骤
+```
+1. V2 功能开发完成，兼容性降级逻辑已实现
+2. 生产环境 V2 灰度发布（10%流量）
+3. 观察 1 周，无异常则全量切换
+4. V1.0 关键词召回作为永久降级方案保留
+5. 历史 PostgreSQL 数据：只读，不删除
+```
+
+#### 数据生命周期
+| 存储 | V1.0 | V2 |
+|------|-------|-----|
+| Redis 短期上下文 | ✅ | ✅ |
+| Redis VSS 向量 | ❌ | ✅ (新写入) |
+| PostgreSQL 关键词召回 | ✅ (永久保留) | ✅ (降级用) |
+
+---
+
+## 十四、主动推送详细设计 (V2)
+
+### 14.1 概述
+
+主动推送指 AI 在特定条件下主动发消息给用户，打破"用户发消息 → AI 回复"的被动模式。
+
+### 14.2 推送场景
+
+| 场景 | 触发条件 | 推送内容示例 |
+|------|----------|--------------|
+| 早安问候 | 每天 9:00-10:00，用户在线 | "早安呀～昨晚睡得好吗？" |
+| 断联召回 | 用户 3 天未对话 | "好久不见，想你了～最近怎么样？" |
+| 记忆提醒 | 用户记忆中有重要日期临近 | "明天是你说的那个重要的日子哦，准备好了吗？" |
+| 情绪关怀 | 用户之前说心情不好，2天后 | "上次你说工作很累，最近好点了吗？" |
+| 晚安问候 | 每天 21:00-22:00，用户在线 | "晚安～今天辛苦了，好好休息哦" |
+
+### 14.3 技术架构
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                         定时任务调度层                                        │
+│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐            │
+│  │ 早安触发器 9:00  │  │ 晚安触发器 21:00 │  │ 断联检测 每日  │            │
+│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘            │
+└───────────┼───────────────────┼───────────────────┼─────────────────────────┘
+            │                   │                   │
+            ▼                   ▼                   ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                         推送决策服务                                         │
+│  ┌─────────────────────────────────────────────────────────────────────┐   │
+│  │                     PushDecisionService                               │   │
+│  │  1. 查询当日是否已推送（去重）                                         │   │
+│  │  2. 检查用户推送偏好（是否开启）                                        │   │
+│  │  3. 检查用户当前在线状态                                                │   │
+│  │  4. 生成推送任务                                                       │   │
+│  └─────────────────────────────────────────────────────────────────────┘   │
+└─────────────────────────────────┬───────────────────────────────────────────┘
+                                  │
+                                  ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                         消息队列 (Redis Stream)                             │
+│  ┌─────────────────────────────────────────────────────────────────────┐   │
+│  │                    push_tasks  stream                                 │   │
+│  │  {userId, pushType, priority, generateAt}                           │   │
+│  └─────────────────────────────────────────────────────────────────────┘   │
+└─────────────────────────────────┬───────────────────────────────────────────┘
+                                  │
+                                  ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                         推送消费者                                           │
+│  ┌─────────────────────────────────────────────────────────────────────┐   │
+│  │                     PushWorker                                        │   │
+│  │  1. 读取任务队列                                                     │   │
+│  │  2. 组装 Prompt（包含用户记忆、当前场景）                              │   │
+│  │  3. 调用 LLM 生成推送内容                                             │   │
+│  │  4. 后置审核                                                         │   │
+│  │  5. 写入用户消息表（type=push）                                       │   │
+│  │  6. 推送至 Mobile（WebSocket / 极光推送）                             │   │
+│  └─────────────────────────────────────────────────────────────────────┘   │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+### 14.4 数据库表
+
+```sql
+-- 推送任务记录表
+CREATE TABLE push_records (
+    id BIGSERIAL PRIMARY KEY,
+    user_id BIGINT NOT NULL,
+    push_type VARCHAR(32) NOT NULL,     -- 'morning', 'night', 'recall', 'reminder', 'care'
+    content TEXT,                       -- 推送内容（生成后填充）
+    status VARCHAR(16) NOT NULL,         -- 'pending', 'generated', 'sent', 'failed'
+    scheduled_at TIMESTAMP NOT NULL,     -- 计划推送时间
+    sent_at TIMESTAMP,                   -- 实际推送时间
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+
+    UNIQUE(user_id, push_type, scheduled_at)  -- 去重：同类型同一天只推送一次
+);
+
+CREATE INDEX idx_push_records_user ON push_records(user_id, scheduled_at);
+CREATE INDEX idx_push_records_pending ON push_records(status) WHERE status = 'pending';
+CREATE INDEX idx_push_records_failed ON push_records(status) WHERE status = 'failed';
+```
+
+### 14.5 Redis Stream 配置
+
+```go
+// Stream Key
+const PushTaskStream = "push:tasks"
+
+// 消费者组
+const PushConsumerGroup = "push-workers"
+
+// 任务结构
+type PushTask struct {
+    UserID    int64  `json:"user_id"`
+    PushType  string `json:"push_type"`  // morning/night/recall/reminder/care
+    Priority  int    `json:"priority"`   // 1-5，1最高
+    UserMsgID string `json:"user_msg_id"` // 生成后关联的消息ID
+}
+```
+
+### 14.6 定时任务配置
+
+```go
+// 定时任务注册 (使用 robfig/cron)
+crontab := cron.New()
+crontab.AddFunc("0 9 * * *", triggerMorningPush)   // 每天 9:00
+crontab.AddFunc("0 21 * * *", triggerNightPush)   // 每天 21:00
+crontab.AddFunc("0 0 * * *", triggerRecallCheck)   // 每天 0:00 检查断联
+crontab.Start()
+```
+
+### 14.7 Prompt 模板
+
+```go
+// 早安推送 Prompt
+const morningPromptTemplate = `你是【%s】，一个温柔体贴的AI伴侣。
+
+当前场景：早上 %s，用户刚起床或者正在开始新的一天。
+
+# 用户信息
+%s
+
+# 核心记忆
+%s
+
+请生成一条温馨的早安问候，1-2句话，口语化，像朋友聊天一样。不要太长，符合你的人设。`
+
+// 断联召回 Prompt
+const recallPromptTemplate = `你是【%s】，一个关心用户的AI伴侣。
+
+当前场景：用户已经 %d 天没有和你聊天了，你有点想他了。
+
+# 用户信息
+%s
+
+# 核心记忆
+%s
+
+请生成一条温馨的召回消息，表达想念但不过分打扰，1-2句话，口语化。`
+```
+
+### 14.8 用户偏好设置
+
+```go
+// 用户推送偏好（存储在 Redis）
+// Key: push:pref:{userId}
+type PushPreference struct {
+    Enabled     bool     `json:"enabled"`      // 是否开启推送
+    MorningEnabled bool   `json:"morning_enabled"` // 早安推送
+    NightEnabled   bool  `json:"night_enabled"`   // 晚安推送
+    RecallEnabled  bool  `json:"recall_enabled"`  // 断联召回
+    CareEnabled    bool  `json:"care_enabled"`    // 情绪关怀
+    QuietHoursStart int  `json:"quiet_hours_start"` // 免打扰开始时间（小时）
+    QuietHoursEnd   int  `json:"quiet_hours_end"`   // 免打扰结束时间（小时）
+}
+
+// 默认偏好
+DefaultPushPreference = &PushPreference{
+    Enabled:       true,
+    MorningEnabled: true,
+    NightEnabled:   true,
+    RecallEnabled:  true,
+    CareEnabled:    true,
+    QuietHoursStart: 22,
+    QuietHoursEnd:   8,
+}
+```
+
+### 14.9 API 扩展
+
+```go
+// ========== 推送偏好管理 ==========
+
+// GetPushPreference 获取推送偏好
+// GET /api/v1/ai-chat/push/preference
+type GetPushPreferenceResponse struct {
+    Preference *PushPreference `json:"preference"`
+}
+
+// UpdatePushPreference 更新推送偏好
+// PUT /api/v1/ai-chat/push/preference
+type UpdatePushPreferenceRequest struct {
+    Preference *PushPreference `json:"preference"`
+}
+```
+
+### 14.10 推送频率控制
+
+| 场景 | 频率上限 | 说明 |
+|------|----------|------|
+| 早安 | 每天1次 | 9:00-10:00 之间 |
+| 晚安 | 每天1次 | 21:00-22:00 之间 |
+| 断联召回 | 每3天1次 | 用户超过3天未对话 |
+| 记忆提醒 | 事件前1天 | 不重复 |
+| 情绪关怀 | 每7天1次 | 针对之前情绪不好的用户 |
+
+### 14.11 待确认问题
+
+1. **推送通道**：极光推送 / 自建 WebSocket / MQTT？
+2. **推送时间窗口**：早安 9:00-10:00 是固定还是随机？
+3. **优先级策略**：同一用户多个推送同时触发时，先发哪个？
+
+---
+
+## 十五、监控告警详细设计 (V1.1)
+
+### 15.1 概述
+
+监控告警体系包含三个核心部分：日志体系、Metrics 指标、告警规则。目标是早发现问题、快速定位、稳定运行。
+
+### 15.2 日志体系
+
+#### 分级日志
+
+| 级别 | 使用场景 | 示例 |
+|------|----------|------|
+| DEBUG | 开发调试 | "收到消息: xxx" |
+| INFO | 正常业务流程 | "用户 xxx 发送消息，耗时 200ms" |
+| WARN | 异常但可处理 | "LLM 调用超时，切换备用模型" |
+| ERROR | 错误需关注 | "Redis 连接失败" |
+
+#### 日志格式 (JSON)
+
+```json
+{
+  "time": "2024-01-01T10:00:00.000Z",
+  "level": "INFO",
+  "service": "ai-chat-service",
+  "trace_id": "abc123",
+  "user_id": 10001,
+  "session_id": "10001_123",
+  "action": "chat.send",
+  "duration_ms": 1234,
+  "message": "消息发送成功",
+  "error": null
+}
+```
+
+#### 关键日志点
+
+| 操作 | 日志级别 | 必含字段 |
+|------|----------|----------|
+| 收到用户消息 | INFO | userId, sessionId, messageLength |
+| LLM 调用开始 | DEBUG | model, promptLength |
+| LLM 调用成功 | INFO | duration, responseLength, firstTokenMs |
+| LLM 调用失败 | ERROR | error, model, fallbackUsed |
+| 前置审核拦截 | WARN | reason |
+| 后置审核拦截 | WARN | blockedContent |
+| Redis 错误 | ERROR | operation, error |
+| PostgreSQL 错误 | ERROR | operation, error |
+
+#### 日志采集
+
+```yaml
+# Filebeat 配置
+filebeat.inputs:
+  - type: log
+    paths:
+      - /var/log/ai-chat-service/*.log
+    json.keys_under_root: true
+    fields:
+      service: ai-chat-service
+
+output.elasticsearch:
+  hosts: ["elasticsearch:9200"]
+```
+
+### 15.3 Metrics 指标
+
+#### 业务指标
+
+| 指标名 | 类型 | 标签 | 说明 |
+|--------|------|------|------|
+| ai_chat_requests_total | Counter | status, personaId | 对话请求总数 |
+| ai_chat_duration_seconds | Histogram | model | 对话耗时分布 |
+| ai_chat_first_token_ms | Histogram | model | 首包延迟 |
+| ai_chat_messages_total | Counter | role | 消息数量（user/assistant） |
+| ai_audit_blocked_total | Counter | type | 审核拦截次数 |
+
+#### LLM 指标
+
+| 指标名 | 类型 | 标签 | 说明 |
+|--------|------|------|------|
+| llm_requests_total | Counter | model, status | LLM 请求总数 |
+| llm_duration_seconds | Histogram | model | LLM 调用耗时 |
+| llm_tokens_total | Counter | model, type | Token 消耗量 |
+| llm_fallback_total | Counter | fromModel, toModel | 模型降级次数 |
+| llm_errors_total | Counter | model, errorType | LLM 错误次数 |
+
+#### 基础设施指标
+
+| 指标名 | 类型 | 说明 |
+|--------|------|------|
+| redis_latency_ms | Histogram | Redis 操作延迟 |
+| redis_errors_total | Counter | Redis 错误次数 |
+| postgres_latency_ms | Histogram | PostgreSQL 操作延迟 |
+| postgres_errors_total | Counter | PostgreSQL 错误次数 |
+| grpc_connections | Gauge | gRPC 连接数 |
+
+#### 推送指标 (V2)
+
+| 指标名 | 类型 | 说明 |
+|--------|------|------|
+| push_tasks_total | Counter | 推送任务总数 |
+| push_sent_total | Counter | 推送成功次数 |
+| push_failed_total | Counter | 推送失败次数 |
+| push_queue_depth | Gauge | 队列积压深度 |
+
+#### Prometheus 埋点示例
+
+```go
+// 使用 prometheus/client_golang
+var (
+    chatRequestsTotal = prometheus.NewCounterVec(
+        prometheus.CounterOpts{
+            Name: "ai_chat_requests_total",
+            Help: "Total number of chat requests",
+        },
+        []string{"status", "persona_id"},
+    )
+
+    chatDuration = prometheus.NewHistogramVec(
+        prometheus.HistogramOpts{
+            Name:    "ai_chat_duration_seconds",
+            Help:    "Chat request duration distribution",
+            Buckets: []float64{0.1, 0.5, 1, 2, 5, 10},
+        },
+        []string{"model"},
+    )
+
+    firstTokenMs = prometheus.NewHistogramVec(
+        prometheus.HistogramOpts{
+            Name:    "ai_chat_first_token_ms",
+            Help:    "First token latency in milliseconds",
+            Buckets: []float64{100, 300, 500, 1000, 2000, 3000},
+        },
+        []string{"model"},
+    )
+)
+
+func init() {
+    prometheus.MustRegister(chatRequestsTotal, chatDuration, firstTokenMs)
+}
+```
+
+### 15.4 告警规则
+
+#### 告警分级
+
+| 级别 | 响应时间 | 定义 |
+|------|----------|------|
+| P0 紧急 | 5分钟内 | 服务不可用 |
+| P1 严重 | 15分钟内 | 核心功能受损 |
+| P2 警告 | 1小时内 | 性能下降/偶发错误 |
+| P3 提醒 | 工作时间 | 需关注但不影响 |
+
+#### P0 紧急告警
+
+| 告警名 | 条件 | 处理方式 |
+|--------|------|----------|
+| 服务宕机 | health check 连续 3 次失败 | 自动重启 + 值班通知 |
+| 所有 LLM 不可用 | llm_errors_total 在 5min 内 > 50 | 切换主备 + 通知 |
+| 数据库连接断开 | postgres_errors_total 在 1min 内 > 10 | 重连 + 通知 |
+
+#### P1 严重告警
+
+| 告警名 | 条件 | 处理方式 |
+|--------|------|----------|
+| LLM 延迟过高 | llm_duration_seconds P99 > 10s | 检查网络/模型状态 |
+| 审核拦截率异常 | ai_audit_blocked_total 5min 内 > 100 | 检查是否攻击 |
+| Redis 延迟过高 | redis_latency_ms P99 > 100ms | 检查 Redis 状态 |
+
+#### P2 警告告警
+
+| 告警名 | 条件 | 处理方式 |
+|--------|------|----------|
+| 模型降级频繁 | llm_fallback_total 5min 内 > 5 | 关注主模型状态 |
+| 首包延迟升高 | ai_chat_first_token_ms P95 > 2s | 持续观察 |
+| 错误率升高 | ai_chat_requests_total{status="error"} 5min 内 > 1% | 排查日志 |
+
+#### P3 提醒
+
+| 告警名 | 条件 | 处理方式 |
+|--------|------|----------|
+| Token 消耗异常 | llm_tokens_total 1h 内波动 > 50% | 排查是否异常 |
+| 推送队列积压 | push_queue_depth > 1000 | 扩容消费者 |
+
+#### AlertManager 配置示例
+
+```yaml
+# alertmanager.yml
+groups:
+  - name: ai-chat-alerts
+    rules:
+      # P0: 服务宕机
+      - alert: AIServiceDown
+        expr: up{job="ai-chat-service"} == 0
+        for: 1m
+        labels:
+          severity: critical
+        annotations:
+          summary: "AI Chat Service is down"
+
+      # P1: LLM 延迟过高
+      - alert: LLMHighLatency
+        expr: histogram_quantile(0.99, rate(llm_duration_seconds_bucket[5m])) > 10
+        for: 5m
+        labels:
+          severity: major
+        annotations:
+          summary: "LLM latency is too high"
+
+      # P2: 模型降级频繁
+      - alert: LLMFallbackFrequent
+        expr: rate(llm_fallback_total[5m]) > 0.02
+        for: 5m
+        labels:
+          severity: warning
+```
+
+### 15.5 监控大盘
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                         AI Chat Service 监控大盘                            │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                              │
+│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐            │
+│  │   请求量/分钟    │  │   平均延迟      │  │   错误率        │            │
+│  │     1,234       │  │    1.2s        │  │    0.5%         │            │
+│  │   ▲ 12%         │  │   ▼ 5%          │  │   ▲ 0.1%        │            │
+│  └─────────────────┘  └─────────────────┘  └─────────────────┘            │
+│                                                                              │
+│  ┌───────────────────────────────────────────────────────────────┐          │
+│  │                    LLM 调用耗时分布 (P50/P95/P99)              │          │
+│  │  ████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  │          │
+│  │  500ms          1s              2s              5s          │          │
+│  └───────────────────────────────────────────────────────────────┘          │
+│                                                                              │
+│  ┌───────────────────────────────────────────────────────────────┐          │
+│  │                    请求来源分布                                  │          │
+│  │  persona_weifen: 60%  │  persona_cpfan: 30%  │  other: 10%  │          │
+│  └───────────────────────────────────────────────────────────────┘          │
+│                                                                              │
+│  ┌───────────────────────────────────────────────────────────────┐          │
+│  │                    审核拦截统计                                 │          │
+│  │  今日拦截: 23  │  本周: 156  │  拦截率: 0.8%                  │          │
+│  └───────────────────────────────────────────────────────────────┘          │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+### 15.6 技术选型
+
+| 组件 | 推荐方案 | 说明 |
+|------|----------|------|
+| 日志采集 | Filebeat → Elasticsearch | 已有 ELK 栈可复用 |
+| 日志存储 | Elasticsearch | 保留 30 天 |
+| Metrics | Prometheus | Go 生态成熟 |
+| 告警 | AlertManager + Grafana | 与现有监控集成 |
+| Trace | Jaeger (可选) | 全链路追踪，后期按需引入 |
+
+### 15.7 实施优先级
+
+| 阶段 | 内容 | 工作量 |
+|------|------|--------|
+| 1 | 日志结构化 + 关键日志点埋点 | 0.5 天 |
+| 2 | Prometheus Metrics 埋点 | 0.5 天 |
+| 3 | Grafana 大盘配置 | 0.5 天 |
+| 4 | AlertManager 告警规则 | 0.5 天 |
+| **合计** | | **2 天** |
+
+---
+
+## 十六、待后续完善
+
+| 功能 | 优先级 | 说明 |
+|------|--------|------|
+| 向量记忆召回 | V2 | **已选定方案: Redis VSS**，HNSW索引，存储在Redis，与现有Redis复用，详见十三章 |
+| 主动推送 | V2 | AI 主动发起对话，定时任务+消息队列，详见十四章 |
+| 语音交互 | V3 | 语音输入输出 |
+| 多模态 | V3 | 图片理解 |
+| 监控告警 | V1.1 | 日志体系、Prometheus Metrics、AlertManager 告警，详见十五章 |