xiaoyu/topfans

zerosaturation afb235afd3 feat: merge code, resolve conflicts

2026-05-16 02:42:32 +08:00

Random User Algorithm Design Document

Requirements

From the fan_profiles table, filter by star_id and return the user_id and nickname of one randomly selected user.

Comparison of Approaches

Approach 1: ORDER BY RANDOM() ❌ Not recommended

SELECT user_id, nickname 
FROM fan_profiles 
WHERE star_id = ? AND is_active = true 
ORDER BY RANDOM() 
LIMIT 1;

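Drawbacks:

Every matching row must be given a random value and sorted: O(n log n)
Performs very poorly on large data sets (may exceed 1 second above 100,000 rows)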

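Approach 2: Offset-based random selection ✅ Recommended

-- Step 1: get the number of matching rows
SELECT COUNT(*) FROM fan_profiles WHERE star_id = ? AND is_active = true;

-- Step 2: generate a random offset in [0, count-1] and fetch that row
SELECT user_id, nickname 
FROM fan_profiles 
WHERE star_id = ? AND is_active = true 
ORDER BY id ASC  -- ordered by an indexed column
LIMIT 1 OFFSET random_offset;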

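Advantages:

Only two queries, and neither needs a random sort
The COUNT query can be served from an index and is very fast
The SELECT query can walk an index in id order and stop after OFFSET + 1 rows
Time complexity: linear in the number of matching rows with a small constant (COUNT reads them all, and OFFSET k skips k rows), so it is not strictly O(1)
Space complexity: O(1)

Drawbacks:

If rows are deleted or deactivated between the two queries, the offset can fall past the end and the SELECT returns nothing (gaps in id left by deletions do not bias the choice, because OFFSET counts matching rows, not id values)
Requires two database queries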
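
The two steps can be mimicked in memory to see why ID gaps do not matter. The following is a minimal sketch, not the project's code: the slice of IDs stands in for the rows matching the WHERE clause, and pickByOffset and its intn parameter are names made up for this illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickByOffset mimics the two queries on an in-memory slice of matching IDs
// (sorted ascending): COUNT(*) becomes len(ids), and
// "ORDER BY id LIMIT 1 OFFSET k" becomes ids[k].
// intn must return a value in [0, n-1]; the bool result is false when
// there are no matching rows.
func pickByOffset(ids []int64, intn func(n int) int) (int64, bool) {
	total := len(ids) // step 1: COUNT(*)
	if total == 0 {
		return 0, false
	}
	offset := intn(total)    // uniform in [0, total-1]
	return ids[offset], true // step 2: LIMIT 1 OFFSET offset
}

func main() {
	// Hypothetical IDs with a large gap, as left behind by deletions.
	ids := []int64{1, 2, 3, 1000}
	rng := rand.New(rand.NewSource(1))

	hits := make(map[int64]int)
	for i := 0; i < 40000; i++ {
		id, _ := pickByOffset(ids, rng.Intn)
		hits[id]++
	}
	// Each ID should be chosen roughly a quarter of the time despite the gap.
	fmt.Println(hits)
}
```

Because the offset indexes positions among matching rows rather than ID values, every row has the same chance of 1/total.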

Approach 3: ID-range random selection ⚠️ Fallback

-- Step 1: get the minimum and maximum ID
SELECT MIN(id), MAX(id) 
FROM fan_profiles 
WHERE star_id = ? AND is_active = true;

-- Step 2: generate a random ID within that range and fetch the first row at or after it
SELECT user_id, nickname 
FROM fan_profiles 
WHERE star_id = ? AND is_active = true AND id >= random_id 
ORDER BY id ASC 
LIMIT 1;

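Advantages:

Near-constant cost with a suitable index (an index seek instead of skipping rows)
Only two queries

Drawbacks:

If IDs are not contiguous (many deletions), rows that follow a large gap are chosen more often, and variants that look up an exact ID may need several attempts
Edge cases must be handled (the generated ID does not exist)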
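
The effect of ID gaps on this approach can be seen with a small in-memory simulation. This is a sketch under assumptions: the sorted slice stands in for the matching rows, and pickByIDRange and its int63n parameter are hypothetical names used only here.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// pickByIDRange mimics approach 3 on a sorted in-memory slice of matching IDs:
// MIN(id) and MAX(id) give the range, and
// "id >= random_id ORDER BY id LIMIT 1" becomes a binary search.
// int63n must return a value in [0, n-1].
func pickByIDRange(ids []int64, int63n func(n int64) int64) (int64, bool) {
	if len(ids) == 0 {
		return 0, false
	}
	minID, maxID := ids[0], ids[len(ids)-1]
	randomID := minID + int63n(maxID-minID+1) // uniform in [minID, maxID]
	// First position whose id is >= randomID; always found since randomID <= maxID.
	i := sort.Search(len(ids), func(i int) bool { return ids[i] >= randomID })
	return ids[i], true
}

func main() {
	// IDs 4..99 are missing, so every random_id in that gap resolves to 100.
	ids := []int64{1, 2, 3, 100}
	rng := rand.New(rand.NewSource(1))

	hits := make(map[int64]int)
	for i := 0; i < 100000; i++ {
		id, _ := pickByIDRange(ids, rng.Int63n)
		hits[id]++
	}
	// ID 100 is expected to win about 97% of the draws.
	fmt.Println(hits)
}
```

With these four IDs, 97 of the 100 possible random_id values land on the row with ID 100, which is the bias the drawback list refers to.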

Approach 4: TABLESAMPLE random sampling ⚠️ PostgreSQL-specific

SELECT user_id, nickname 
FROM fan_profiles 
TABLESAMPLE SYSTEM(1)  -- sample about 1% of the table
WHERE star_id = ? AND is_active = true 
LIMIT 1;

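Advantages:

A single query that reads only the sampled fraction of the table

Drawbacks:

May return an empty result (a retry is needed)
Only supported in PostgreSQL 9.5 and later
The sampling rate is hard to tune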
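
Because a sample can come back empty, the caller needs a bounded retry loop around the query. A minimal sketch follows, assuming the query is wrapped in a function that reports whether it found a row; sampleWithRetry, errNoRow, and the stand-in sample function are hypothetical names, not part of the project.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoRow signals that every sampling attempt matched no rows.
var errNoRow = errors.New("sample returned no row")

// sampleWithRetry calls sample up to maxAttempts times and returns the first
// row it yields. An empty sample (found == false) triggers another attempt;
// a real error aborts immediately.
func sampleWithRetry(maxAttempts int, sample func() (userID int64, found bool, err error)) (int64, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		id, found, err := sample()
		if err != nil {
			return 0, err
		}
		if found {
			return id, nil
		}
	}
	return 0, errNoRow
}

func main() {
	// Stand-in for the TABLESAMPLE query: empty twice, then a row.
	calls := 0
	sample := func() (int64, bool, error) {
		calls++
		if calls < 3 {
			return 0, false, nil
		}
		return 42, true, nil
	}
	id, err := sampleWithRetry(5, sample)
	fmt.Println(id, err, calls) // prints: 42 <nil> 3
}
```

When all attempts come back empty, a caller could fall back to one of the other approaches instead of failing.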

Recommended Approach: Offset-Based Random Selection

Algorithm Flow

1. Validate parameters
   - star_id > 0
   - count = 1 (default)

2. Query the total
   - SELECT COUNT(*) FROM fan_profiles WHERE star_id = ? AND is_active = true
   - If total = 0, return an empty result

3. Generate a random offset
   - random_offset = rand.Intn(total)  // [0, total-1]

4. Query the random user
   - SELECT user_id, nickname 
     FROM fan_profiles 
     WHERE star_id = ? AND is_active = true 
     ORDER BY id ASC 
     LIMIT 1 OFFSET random_offset

5. Return the result
   - {user_id, nickname}
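
The five steps can be sketched against an abstract store so the control flow is testable without a database. Everything named here (fan, fanStore, memStore, pickRandomFan) is an assumption made for illustration; the project's real repository code may be organized differently.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// fan holds the two fields the requirement asks for.
type fan struct {
	UserID   int64
	Nickname string
}

// fanStore abstracts the two queries of the flow (hypothetical interface).
type fanStore interface {
	CountActive(starID int64) (int64, error)
	// ActiveAt returns the row at the given offset in id order;
	// found is false when the offset is past the end.
	ActiveAt(starID, offset int64) (f fan, found bool, err error)
}

// pickRandomFan follows steps 1-5. It returns (nil, nil) when no user exists.
func pickRandomFan(store fanStore, starID int64, int63n func(int64) int64) (*fan, error) {
	if starID <= 0 { // step 1: validate
		return nil, errors.New("star_id must be greater than 0")
	}
	total, err := store.CountActive(starID) // step 2: count
	if err != nil {
		return nil, fmt.Errorf("count: %w", err)
	}
	if total == 0 {
		return nil, nil
	}
	offset := int63n(total)                         // step 3: offset in [0, total-1]
	f, found, err := store.ActiveAt(starID, offset) // step 4: fetch
	if err != nil {
		return nil, fmt.Errorf("fetch: %w", err)
	}
	if !found { // the row disappeared between the two queries
		return nil, nil
	}
	return &f, nil // step 5: return
}

// memStore is an in-memory fanStore used only for the demonstration.
type memStore map[int64][]fan

func (m memStore) CountActive(starID int64) (int64, error) {
	return int64(len(m[starID])), nil
}

func (m memStore) ActiveAt(starID, offset int64) (fan, bool, error) {
	rows := m[starID]
	if offset < 0 || offset >= int64(len(rows)) {
		return fan{}, false, nil
	}
	return rows[offset], true, nil
}

func main() {
	store := memStore{7: {{UserID: 101, Nickname: "alice"}, {UserID: 102, Nickname: "bob"}}}
	f, err := pickRandomFan(store, 7, rand.Int63n)
	fmt.Println(f, err)
}
```

Passing the random source as a function keeps the flow deterministic in tests while production code can pass rand.Int63n.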

Performance Analysis

Row count	COUNT query	SELECT query	Total time
1,000	<1ms	<1ms	<2ms
10,000	<1ms	<1ms	<2ms
100,000	<5ms	<5ms	<10ms
1,000,000	<10ms	<10ms	<20ms

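Optimization Suggestions

Index optimization
- Make sure a composite index exists on (star_id, is_active)
- Make sure id has a primary-key index (present by default)
Cache optimization (optional)
- For frequently requested star_id values, the COUNT result can be cached
- Cache TTL: 5-10 minutes
- Use Redis or an in-memory cache
Edge-case handling
- total = 0: return an empty result
- total = 1: offset = 0
- Query failure: retry (at most 3 times)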
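
The optional COUNT cache could look like the in-memory sketch below. It is only an illustration under assumptions: countCache, newCountCache, and the load callback are hypothetical names, the clock is injected as Unix seconds to keep the example testable, and a Redis-backed version would replace the map.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// countEntry is one cached COUNT result and its expiry (Unix seconds).
type countEntry struct {
	total   int64
	expires int64
}

// countCache caches COUNT results per star_id for a fixed TTL.
type countCache struct {
	mu         sync.Mutex
	ttlSeconds int64
	now        func() int64 // injectable clock, returns Unix seconds
	entries    map[int64]countEntry
}

func newCountCache(ttlSeconds int64, now func() int64) *countCache {
	return &countCache{ttlSeconds: ttlSeconds, now: now, entries: make(map[int64]countEntry)}
}

// get returns the cached total for starID, calling load on a miss or after
// expiry. The lock is held during load, which keeps the sketch simple but
// serializes concurrent loads.
func (c *countCache) get(starID int64, load func(starID int64) (int64, error)) (int64, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[starID]; ok && c.now() < e.expires {
		return e.total, nil
	}
	total, err := load(starID)
	if err != nil {
		return 0, err
	}
	c.entries[starID] = countEntry{total: total, expires: c.now() + c.ttlSeconds}
	return total, nil
}

func main() {
	cache := newCountCache(300, func() int64 { return time.Now().Unix() }) // 5-minute TTL
	// Stand-in for the COUNT(*) query.
	load := func(starID int64) (int64, error) {
		fmt.Println("running COUNT for star", starID)
		return 1234, nil
	}
	for i := 0; i < 3; i++ {
		total, _ := cache.get(7, load)
		fmt.Println("total:", total)
	}
}
```

A cached total can be stale, so an offset derived from it may point past the last row; the caller should treat an empty SELECT as a cue to refresh the count and try again.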

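Implementation Code Structure

import (
    "errors"
    "fmt"
    "math/rand"
)

// models.FanProfile and socialRepositoryImpl are defined elsewhere in the
// project; r.db is assumed to be a GORM handle.

// RandomUserInfo holds the fields returned for a randomly selected user.
type RandomUserInfo struct {
    UserID   int64
    Nickname string
}

// GetRandomUsersByStar returns up to count active users of the given star
// using the offset-based algorithm. For count > 1 the rows are consecutive
// (in id order) starting at a random offset, not independently sampled.
func (r *socialRepositoryImpl) GetRandomUsersByStar(starID int64, count int) ([]*RandomUserInfo, error) {
    // 1. Validate parameters
    if starID <= 0 {
        return nil, errors.New("star_id must be greater than 0")
    }
    if count <= 0 {
        count = 1  // default: return one user
    }
    if count > 100 {
        count = 100  // upper limit: 100 users
    }

    // 2. Query the total
    var total int64
    err := r.db.Model(&models.FanProfile{}).
        Where("star_id = ? AND is_active = ?", starID, true).
        Count(&total).Error
    if err != nil {
        return nil, fmt.Errorf("failed to count fan profiles: %w", err)
    }

    if total == 0 {
        return []*RandomUserInfo{}, nil  // no data, return an empty list
    }

    // 3. Generate a random offset and query. The offset is capped at
    // total-count so that a full window of count rows is available.
    maxOffset := total - int64(count)
    if maxOffset < 0 {
        maxOffset = 0
    }
    randomOffset := rand.Int63n(maxOffset + 1)  // [0, maxOffset]
    
    var profiles []models.FanProfile
    err = r.db.Model(&models.FanProfile{}).
        Select("user_id", "nickname").
        Where("star_id = ? AND is_active = ?", starID, true).
        Order("id ASC").
        Limit(count).
        Offset(int(randomOffset)).
        Find(&profiles).Error
    
    if err != nil {
        return nil, fmt.Errorf("failed to get random users: %w", err)
    }

    // 4. Convert to the result type
    result := make([]*RandomUserInfo, 0, len(profiles))
    for _, profile := range profiles {
        result = append(result, &RandomUserInfo{
            UserID:   profile.UserID,
            Nickname: profile.Nickname,
        })
    }

    return result, nil
}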
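
When several users should be chosen independently rather than as neighboring rows, one option is to draw distinct offsets first and run one LIMIT 1 OFFSET query per offset. The sketch below uses Floyd's sampling algorithm for the draw; distinctOffsets is a hypothetical helper and not part of the repository code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// distinctOffsets returns min(count, total) distinct offsets in [0, total-1]
// using Floyd's sampling algorithm, which needs exactly one random draw per
// offset and never loops on collisions. int63n must return a value in [0, n-1].
func distinctOffsets(total int64, count int, int63n func(n int64) int64) []int64 {
	if total <= 0 || count <= 0 {
		return nil
	}
	if int64(count) > total {
		count = int(total)
	}
	seen := make(map[int64]struct{}, count)
	out := make([]int64, 0, count)
	for j := total - int64(count); j < total; j++ {
		t := int63n(j + 1) // uniform in [0, j]
		if _, dup := seen[t]; dup {
			t = j // j cannot have been chosen yet, since earlier picks are < j
		}
		seen[t] = struct{}{}
		out = append(out, t)
	}
	return out
}

func main() {
	// Five distinct offsets among 1000 matching rows.
	fmt.Println(distinctOffsets(1000, 5, rand.Int63n))
}
```

This trades one query for count queries, so it only makes sense for small counts.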

Index Recommendations

-- Make sure the composite index exists (create it if missing)
CREATE INDEX IF NOT EXISTS idx_fan_profiles_star_active 
ON fan_profiles(star_id, is_active) 
WHERE is_active = true;

-- Primary-key index (present by default)
-- PRIMARY KEY (id)

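Summary

Approach 2, offset-based random selection, is recommended:

✅ Good performance: two indexed queries and no random sort (under 20 ms at 1,000,000 rows in the table above)
✅ Simple to implement: only two database queries
✅ Reliable: uniform over the matching rows, independent of how the IDs are distributed
✅ Extensible: can return multiple users
✅ Portable: relies only on COUNT and LIMIT/OFFSET, which most SQL databases support (the syntax varies slightly between them)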
