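# Random User Algorithm Design Document

## Requirements
Filter the `fan_profiles` table by `star_id` and return the `user_id` and `nickname` of one randomly chosen user. All queries below also restrict the result to active rows (`is_active = true`).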

## Algorithm comparison

### Option 1: ORDER BY RANDOM() ❌ Not recommended
```sql
SELECT user_id, nickname 
FROM fan_profiles 
WHERE star_id = ? AND is_active = true 
ORDER BY RANDOM() 
LIMIT 1;
```
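**Cons:**
- Every matching row must be read and given a random sort key; a full sort costs O(n log n)
- Performance degrades badly on large data sets (beyond roughly 100,000 rows a query may take more than 1 second)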

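### Option 2: Offset-based random algorithm ✅ Recommended
```sql
-- Step 1: count the matching rows
SELECT COUNT(*) FROM fan_profiles WHERE star_id = ? AND is_active = true;

-- Step 2: generate a random offset in the application and fetch that row
SELECT user_id, nickname 
FROM fan_profiles 
WHERE star_id = ? AND is_active = true 
ORDER BY id ASC  -- stable order that an index can provide
LIMIT 1 OFFSET random_offset;
```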
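**Pros:**
- Only two queries, and both can be served from an index
- The COUNT query only has to visit the index entries for one `star_id`
- Given a suitable index, the SELECT query reads rows in `id` order and stops after `random_offset + 1` rows, with no sort step
- Uniform: every matching row is equally likely, even when `id` values have gaps left by deletions
- Space complexity: O(1)

**Cons:**
- Time is not O(1): COUNT visits every matching row and `OFFSET k` skips k rows, so both steps grow linearly with the number of matching rows for the star
- Needs two database queries
- The two queries are not atomic: if rows are deleted or deactivated in between, the offset can land past the last row and the SELECT returns nothing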

### Option 3: ID-range random algorithm ⚠️ Fallback
```sql
-- Step 1: get the smallest and largest matching ID
SELECT MIN(id), MAX(id) 
FROM fan_profiles 
WHERE star_id = ? AND is_active = true;

-- Step 2: generate a random ID within that range and fetch the first row at or after it
SELECT user_id, nickname 
FROM fan_profiles 
WHERE star_id = ? AND is_active = true AND id >= random_id 
ORDER BY id ASC 
LIMIT 1;
```
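**Pros:**
- Fast: with a suitable index, both the MIN/MAX lookup and the `id >= random_id` lookup are index seeks whose cost does not depend on how many rows match
- Only two queries

**Cons:**
- Biased when IDs are not contiguous (for example after many deletions): the row that follows a gap is returned for every `random_id` that falls inside the gap, so it is picked far more often than its neighbours
- Edge cases need handling: `random_id` itself may not exist (covered by `id >= random_id`), and a retry may be needed if rows disappear between the two queries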
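
The size of that bias can be computed exactly. The sketch below assumes `random_id` is drawn uniformly from `[MIN(id), MAX(id)]` and that the input IDs are distinct; `idRangeProbabilities` is a hypothetical helper, not part of the design.

```go
package main

import "sort"

// idRangeProbabilities returns, for each ID, the probability that the
// ID-range method selects it: the row with a given ID is returned for every
// random_id in (previous ID, ID], so its share is the size of that interval
// divided by the size of the whole [min, max] range.
func idRangeProbabilities(ids []int64) map[int64]float64 {
	probs := make(map[int64]float64, len(ids))
	if len(ids) == 0 {
		return probs
	}
	sorted := append([]int64(nil), ids...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	span := float64(sorted[len(sorted)-1] - sorted[0] + 1)
	prev := sorted[0] - 1
	for _, id := range sorted {
		probs[id] = float64(id-prev) / span
		prev = id
	}
	return probs
}
```

For IDs `{1, 2, 3, 100}` the range holds 100 candidate values, of which 97 (4 through 100) resolve to ID 100, so that row is chosen 97% of the time while the other three get 1% each.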

### Option 4: TABLESAMPLE random sampling ⚠️ PostgreSQL only
```sql
SELECT user_id, nickname 
FROM fan_profiles 
TABLESAMPLE SYSTEM(1)  -- sample about 1% of the table's pages, before the WHERE filter
WHERE star_id = ? AND is_active = true 
LIMIT 1;
```
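**Pros:**
- A single query that reads only a small fraction of the table

**Cons:**
- May return no row, because the sample is taken from the whole table before `star_id` is filtered (retry required)
- Requires PostgreSQL 9.5 or later
- The sampling rate is hard to tune: too low and a star with few fans is rarely hit, too high and the query reads much of the table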

## Recommended approach: offset-based random algorithm

### Algorithm flow

```
1. Validate parameters
   - star_id > 0
   - count = 1 (default)

2. Count the matching rows
   - SELECT COUNT(*) FROM fan_profiles WHERE star_id = ? AND is_active = true
   - If total = 0, return an empty result

3. Generate a random offset
   - random_offset = rand.Intn(total)  // [0, total-1]

4. Fetch the random user
   - SELECT user_id, nickname 
     FROM fan_profiles 
     WHERE star_id = ? AND is_active = true 
     ORDER BY id ASC 
     LIMIT 1 OFFSET random_offset

5. Return the result
   - {user_id, nickname}
```

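### Performance analysis

The figures below are expected latencies, assuming both queries are served from an index. The SELECT cost also depends on the offset that was drawn, because the database walks past `random_offset` rows before returning one.

| Matching rows | COUNT query | SELECT query | Total |
|---------------|-------------|--------------|-------|
| 1,000         | <1ms        | <1ms         | <2ms  |
| 10,000        | <1ms        | <1ms         | <2ms  |
| 100,000       | <5ms        | <5ms         | <10ms |
| 1,000,000     | <10ms       | <10ms        | <20ms |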

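### Optimization suggestions

1. **Indexing**
   - Make sure an index covers the `star_id` and `is_active` filter, for example a composite index on `(star_id, is_active)`
   - `id` is already indexed as the primary key

2. **Caching** (optional)
   - For frequently requested `star_id` values, cache the COUNT result
   - Cache TTL: 5-10 minutes
   - Use Redis or an in-process cache
   - A cached count can be stale: if rows were deactivated in the meantime, the offset may point past the last row and the SELECT returns nothing

3. **Edge cases**
   - total = 0: return an empty result
   - total = 1: offset = 0
   - Query failure: retry, at most 3 times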

### Implementation outline

```go
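import (
    "errors"
    "fmt"
    "math/rand"
    // plus the project's own models package (import path not shown here)
)

// RandomUserInfo holds the fields returned for a randomly selected user.
type RandomUserInfo struct {
    UserID   int64
    Nickname string
}

// GetRandomUsersByStar returns up to count active fans of a star using the
// offset-based algorithm. When count > 1 the rows form one contiguous window
// in id order that starts at a random position; they are not independent samples.
func (r *socialRepositoryImpl) GetRandomUsersByStar(starID int64, count int) ([]*RandomUserInfo, error) {
    // 1. Validate parameters.
    if starID <= 0 {
        return nil, errors.New("star_id must be greater than 0")
    }
    if count <= 0 {
        count = 1 // default: return one user
    }
    if count > 100 {
        count = 100 // hard upper limit
    }

    // 2. Count the matching rows.
    var total int64
    err := r.db.Model(&models.FanProfile{}).
        Where("star_id = ? AND is_active = ?", starID, true).
        Count(&total).Error
    if err != nil {
        return nil, fmt.Errorf("failed to count fan profiles: %w", err)
    }

    if total == 0 {
        return []*RandomUserInfo{}, nil // no data: return an empty list
    }
    if int64(count) > total {
        count = int(total) // never ask for more rows than exist
    }

    // 3. Pick an offset that leaves room for count rows, then query.
    // Before Go 1.20 the global generator must be seeded with rand.Seed,
    // otherwise it produces the same sequence on every run.
    randomOffset := rand.Int63n(total - int64(count) + 1) // [0, total-count]

    var profiles []models.FanProfile
    err = r.db.Model(&models.FanProfile{}).
        Select("user_id", "nickname").
        Where("star_id = ? AND is_active = ?", starID, true).
        Order("id ASC").
        Limit(count).
        Offset(int(randomOffset)).
        Find(&profiles).Error

    if err != nil {
        return nil, fmt.Errorf("failed to get random users: %w", err)
    }

    // 4. Convert to the result type.
    result := make([]*RandomUserInfo, 0, len(profiles))
    for _, profile := range profiles {
        result = append(result, &RandomUserInfo{
            UserID:   profile.UserID,
            Nickname: profile.Nickname,
        })
    }

    return result, nil
}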
```

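### Index recommendation

```sql
-- Partial index over active rows only (created if it does not exist yet).
-- Keying it on (star_id, id) lets one index serve both the COUNT filter and
-- the ORDER BY id ... OFFSET scan; is_active is already fixed by the WHERE
-- clause, so it does not need to be a key column.
CREATE INDEX IF NOT EXISTS idx_fan_profiles_star_active 
ON fan_profiles(star_id, id) 
WHERE is_active = true;

-- Primary key index (exists by default)
-- PRIMARY KEY (id)
```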

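## Summary

**Recommended: Option 2, the offset-based random algorithm**

- ✅ Good performance: two index-backed queries whose cost grows with the number of matching rows per star, not with the size of the table
- ✅ Simple to implement: only two database queries
- ✅ Reliable: every matching row is equally likely, regardless of gaps in `id`
- ✅ Extensible: can return multiple random users
- ✅ Portable: relies only on `COUNT`, `ORDER BY`, and `LIMIT ... OFFSET`, which PostgreSQL, MySQL, and SQLite all support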