docs: self-audit fixes for K8s migration spec
Fixed 9 issues found during self-audit:

1. §4.2.4: stale reference to '§4.2.5 old content' (that section is now Secrets, not healthcheck). Pointed to §10.4 instead.
2. §6 Step 3: wrong cross-reference to §10.2 (which is the Secret strategy). Should be §10.1 (image registry).
3. §5 directory tree: hpa was duplicated in both gateway/ and hpa/. Unified into a single hpa/ directory.
4. §5 directory tree: .gitignore was placed under k8s/. Moved the indication to the repo root with a clearer comment.
5. §5 principles: added 'separation of concerns' for the HPA/Ingress/Secret directories.
6. §6 Step 4: the title was 'gradual cutover', but the content said 'single namespace, all traffic switched at once'. Renamed to 'traffic cutover'.
7. §6 Step 4 + Step 5: the sequence sync step was duplicated with unclear timing. Consolidated into Step 4 as a hard blocker. Step 5 now just stops the VM.
8. §4.4 (data-layer multi-tenancy): duplicated §11.4.4. Deleted §4.4 and kept a pointer in §4.3.
9. §3 comparison table: 'cost of later per-group isolation' for B was undersold as 'low (helm values)'. Corrected to 'medium', with a reference to §11.5 (~2-3 months).

Advisory items left as-is (not blocking): the §1.1 'mid-tier stars' line and the §11.4.3 'approach 2' detail.
parent 4db796f407
commit 12d484e215
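````diff
@@ -174,7 +174,7 @@ Ingress → topfans/gateway:8080
 | Application code changes | 0 | 0 | Minor (Dubbo URLs may need to change) |
 | Self-healing | ❌ | ✅ | ✅ |
 | Autoscaling | ❌ | ✅ | ✅ |
-| Cost of later per-group isolation | N/A | Low (adjust helm values) | Already isolated |
+| Cost of later per-group isolation | N/A | Medium (requires splitting the helm chart plus application-layer group_id changes; see §11.5, ~2-3 months) | Already isolated |
 | Complexity | Lowest | Moderate | High |
 | Fit with Phase 1 goals | ❌ | ✅✅✅ | ✅ (but overkill) |
 | **Recommendation** | ❌ | ✅✅✅ Strongly recommended | ❌ |
````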
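````diff
@@ -295,7 +295,7 @@ spec:
 
 **Configure K8s `livenessProbe` / `readinessProbe` explicitly** in the Deployment inside the Helm chart, instead of relying on the Dockerfile HEALTHCHECK.
 
-> Side note: the original `docker/Dockerfile.services` has a HEALTHCHECK port mismatch bug (see §4.2.5 old content; the HEALTHCHECK for galleryservice and others probes 21001, while the services actually listen on 20001). K8s does not rely on the Dockerfile HEALTHCHECK, so **this bug is not fixed this time**; it is kept for backward compatibility and left to a separate issue.
+> Side note: the original `docker/Dockerfile.services` has a HEALTHCHECK port mismatch bug (the HEALTHCHECK for galleryservice and others probes 21001, while the services actually listen on 20001). K8s does not rely on the Dockerfile HEALTHCHECK, so **Phase 1 does not fix this bug**; it is kept for backward compatibility and left to a separate issue. See §10.4 for details.
 
 #### 4.2.5 Secrets management
 
````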
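The hunk above moves health checking into explicit K8s probes in the Helm Deployment. Below is a minimal sketch of what those probes could look like. It assumes TCP probes against the port the service really listens on, which is 20001 for galleryservice according to the side note. The container spec is built as a Python dict and printed as JSON, which `kubectl apply -f` also accepts; in the spec itself this would live in the Helm Deployment template. The image name and all timing values are hypothetical placeholders.

```python
import json


def tcp_probe(port: int, initial_delay_s: int, period_s: int = 10, failure_threshold: int = 3) -> dict:
    """Return a fresh Kubernetes tcpSocket probe (timing values are placeholders)."""
    return {
        "tcpSocket": {"port": port},
        "initialDelaySeconds": initial_delay_s,
        "periodSeconds": period_s,
        "failureThreshold": failure_threshold,
    }


def service_container(name: str, image: str, port: int) -> dict:
    """Container spec whose probes target the same port the process listens on,
    so a 21001-vs-20001 style mismatch cannot slip in."""
    return {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Liveness: after failureThreshold consecutive failed TCP connects, kubelet restarts the container.
        "livenessProbe": tcp_probe(port, initial_delay_s=30),
        # Readiness: the pod stays out of Service endpoints until the port accepts connections.
        "readinessProbe": tcp_probe(port, initial_delay_s=5),
    }


if __name__ == "__main__":
    # Hypothetical image reference; 20001 is the port galleryservice actually listens on.
    container = service_container(
        "galleryservice", "registry.example.com/topfans/galleryservice:dev", 20001
    )
    print(json.dumps(container, indent=2))
```

Deriving both probes and `containerPort` from a single `port` argument is the point of the design. In a Helm template, the same effect would come from referencing one values key in all three places.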
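````diff
@@ -370,19 +370,7 @@ spec:
 - **values.yaml structure**: use a nested structure (`gateway.xxx`, `userservice.xxx`) so the chart can be split later
 - **ConfigMap / Secret naming**: do not hard-code group names (e.g. `gateway-config` rather than `group-a-gateway-config`)
 - **Service discovery**: Phase 1 uses short K8s DNS names; if Phase 2 needs cross-namespace calls, use fully qualified names (`userservice.topfans-group-a.svc.cluster.local`)
-- **Data-layer multi-tenancy**: a design sketch is recorded in §11 of this document; Phase 2 will get its own detailed design document
-
-### 4.4 Data-layer multi-tenancy design (application-layer changes, Phase 2)
-
-Not part of the Phase 1 K8s migration. Isolating data per group requires supporting changes in the application layer, briefly:
-
-| Table | Add group_id column | Middleware propagates group_id |
-|---|---|---|
-| users | ✅ | JWT carries group_id |
-| galleries / assets / stars | ✅ | Propagated via Dubbo attachments |
-| Comments / likes / favorites | ✅ | Same as above |
-
-**Not implemented in Phase 1; Phase 2 will get a dedicated design document.**
+- **Data-layer multi-tenancy**: a design sketch is recorded in §11.4.4 of this document; Phase 2 will get its own detailed design document
 
 ---
 
````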
````diff
@@ -401,7 +389,7 @@ k8s/ (new directory, at repo root level)
 │   ├── external-db/
 │   │   ├── postgres-external.yaml   ExternalName → RDS
 │   │   └── redis-external.yaml      ExternalName → ElastiCache
-│   ├── gateway/                     deployment + service + ingress + hpa + configmap
+│   ├── gateway/                     deployment + service + configmap
 │   ├── userservice/                 deployment + service + configmap
 │   ├── assetservice/                same as above
 │   ├── galleryservice/              same as above
````
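The `external-db/` entries map in-cluster names to managed databases. Below is a sketch of what `postgres-external.yaml` might contain. It assumes an ExternalName Service in a namespace called `topfans`; the namespace is inferred from the `topfans/gateway` reference above, and the hostnames are made-up placeholders. With an ExternalName Service, cluster DNS answers the service name with a CNAME to the external host. No proxying or port mapping happens, so clients still connect to the database's own port.

```python
import json


def external_name_service(name: str, external_host: str, namespace: str = "topfans") -> dict:
    """Service of type ExternalName: <name>.<namespace>.svc.cluster.local resolves
    via CNAME to external_host. No selector, no cluster IP, no proxying."""
    # externalName must be a bare DNS name, not a URL or host:port pair.
    if "/" in external_host or ":" in external_host:
        raise ValueError("externalName must be a bare DNS name, without scheme or port")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"type": "ExternalName", "externalName": external_host},
    }


if __name__ == "__main__":
    # Placeholder endpoints; the real RDS / ElastiCache hostnames would come from values.
    for svc in (
        external_name_service("postgres", "pg.example.internal"),
        external_name_service("redis", "redis.example.internal"),
    ):
        print(json.dumps(svc, indent=2))
```

Because applications only see `postgres` / `redis`, pointing them at a different managed endpoint later is a one-field change to the Service.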
````diff
@@ -412,10 +400,11 @@ k8s/ (new directory, at repo root level)
 │   ├── aichatservice/               same as above
 │   ├── lasercompositor/             same as above
 │   ├── hpa/                         all HorizontalPodAutoscalers
+│   ├── ingress.yaml                 cluster-level Ingress → gateway:8080
 │   ├── secrets/                     all Secrets (DB/OSS/JWT/AI keys)
 │   └── future-services/             (admin/review/ai-*) .gitkeep placeholders
 │
-└── .gitignore                       ignores files with real values, e.g. values-prod.yaml
+└── (repo-root .gitignore)           ignores files with real values, e.g. values-prod.yaml
 ```
 
 **Principles**:
````
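With HPAs now kept only in `hpa/`, each entry there only needs to point at a Deployment by name. Below is a sketch of one such entry. It assumes CPU-based scaling of the gateway Deployment through the `autoscaling/v2` API. The replica bounds and the 70% target are illustrative, not values from the spec. CPU utilization targets are measured against the containers' CPU requests, so they only work if requests are set and the cluster runs a metrics source such as metrics-server.

```python
import json


def cpu_hpa(deployment: str, min_replicas: int, max_replicas: int, cpu_percent: int) -> dict:
    """autoscaling/v2 HorizontalPodAutoscaler that scales a Deployment on average CPU
    utilization, expressed as a percentage of the pods' CPU requests."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("expected 1 <= min_replicas <= max_replicas")
    if cpu_percent <= 0:
        raise ValueError("cpu_percent must be positive")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment},
        "spec": {
            # Refers to the Deployment defined in the service's own directory, by name only.
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Illustrative numbers only; real bounds would presumably come from values.yaml.
    print(json.dumps(cpu_hpa("gateway", min_replicas=2, max_replicas=6, cpu_percent=70), indent=2))
```

The coupling between `hpa/` and a service directory is a single name, so removing the HPA, or adding one for another service, never touches that service's deployment + service + configmap files.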
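````diff
@@ -423,8 +412,10 @@ k8s/ (new directory, at repo root level)
 - The `docker/` directory is kept and continues to support local development (`docker-compose.local.yml`)
 - `k8s/` is a new deployment dimension that coexists with `docker/`
 - **A single Helm chart** (`topfans/`) covers all of Phase 1; splitting the chart will be considered in Phase 2
+- HPA / Ingress / Secret are managed separately (`hpa/`, `ingress.yaml`, `secrets/`); each service subdirectory only holds deployment + service + configmap (separation of concerns)
 - The four future services are **not implemented this time**; only `.gitkeep` placeholders are left
 - Image builds continue to use `docker/Dockerfile.services` (multi-stage); no reinventing the wheel
+- Files with real values such as `values-prod.yaml` stay out of git, while `values-prod.example.yaml` is committed (together with the repo-root `.gitignore`)
 
 ---
 
````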
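````diff
@@ -454,13 +445,13 @@ k8s/ (new directory, at repo root level)
 
 - [ ] Option 1: Alibaba Cloud ACR registry (recommended); write `.github/workflows/` or `gitlab-ci.yml` to build and push
 - [ ] Option 2: keep `deploy.sh`, but change its target to push to ACR instead of SSH-ing into the server
-- [ ] Pick one, and state whether deploy.sh is kept or removed (see §10.2)
+- [ ] Pick one, and state whether deploy.sh is kept or removed (see §10.1)
 
-### Step 4: Gradual cutover (single namespace, all traffic switched at once)
+### Step 4: Traffic cutover (single namespace, all traffic switched at once)
 
 - [ ] Move all secrets in the existing `.env.prod` into K8s Secrets (real values stay out of git and are injected by CI)
 - [ ] Prepare a DNS cutover plan (`api.example.com` points to K8s first; the old VM is kept for rollback)
-- [ ] **After the first deployment to K8s, verify sequence sync**: per the CLAUDE.md conventions, `setval('xxx_id_seq', ...)` must be completed **before** the traffic cutover
+- [ ] **PostgreSQL sequence sync (hard blocker)**: per the CLAUDE.md conventions, **before** cutting over traffic, run `setval('xxx_id_seq', (SELECT MAX(id) FROM xxx))` as a K8s Job for every BIGSERIAL table, and verify that `pg_sequences` reports `is_healthy=true` for all of them (see §10.3)
 - [ ] Cut DNS over to K8s
 - [ ] Observe for 1-2 weeks
 
````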
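The hard blocker in Step 4 lends itself to a small script that the K8s Job could run. Below is a minimal sketch that makes three assumptions: every BIGSERIAL column is named `id`, sequences ascend by 1 from 1, and table names come from a trusted, hard-coded list. The script emits one `setval` per table. It uses PostgreSQL's `pg_get_serial_sequence` rather than hard-coding `xxx_id_seq`. It also flags sequences whose `last_value` lags the table's `MAX(id)`. That rule is only an approximation; the authoritative `is_healthy` check is the one defined in CLAUDE.md. The table names and readings are illustrative.

```python
from typing import Dict, List, Optional, Tuple


def setval_statements(tables: List[str], id_column: str = "id") -> List[str]:
    """One setval per table. Names are interpolated directly, so they must come
    from a trusted, hard-coded list (never from user input)."""
    return [
        f"SELECT setval(pg_get_serial_sequence('{t}', '{id_column}'), "
        f"(SELECT MAX({id_column}) FROM {t}));"
        for t in tables
    ]


def unhealthy_sequences(state: Dict[str, Tuple[Optional[int], Optional[int]]]) -> List[str]:
    """state maps table -> (last_value from pg_sequences, MAX(id) of the table).
    last_value is None if the sequence has not been read yet; MAX(id) is None for an
    empty table. A table is flagged when the next nextval() could return an id <= MAX(id)."""
    bad = []
    for table, (last_value, max_id) in state.items():
        if max_id is None:
            continue  # empty table: no existing id to collide with
        if last_value is None or last_value < max_id:
            bad.append(table)
    return sorted(bad)


if __name__ == "__main__":
    for stmt in setval_statements(["users", "galleries"]):
        print(stmt)
    # Hypothetical readings: the galleries sequence lags behind its data.
    print(unhealthy_sequences({"users": (120, 120), "galleries": (40, 95), "stars": (None, None)}))
    # → ['galleries']
```

Running the check after the `setval` statements, and failing the Job when the returned list is non-empty, keeps the cutover blocked until every sequence is ahead of its data.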
````diff
@@ -468,7 +459,6 @@ k8s/ (new directory, at repo root level)
 
 - [ ] After verifying the K8s deployment is stable, stop `docker-compose` on the VM
 - [ ] Release the VM resources
-- [ ] First time: uncomment all `setval` syncs in `init-db.sql` (when starting from manually migrated data; see the CLAUDE.md conventions)
 
 ### Step 6: Follow-up optimizations (doable within Phase 1)
 
````