Laravel 實現電商快速搜尋與商品推薦：演算法深入解析（以蝦皮為例）

在電商平台如蝦皮（Shopee）中，快速搜尋與個性化商品推薦是核心功能，仰賴高效的演算法來處理大規模商品資料與用戶行為。這些演算法需要解決關鍵字搜尋、多維篩選、行為收集與推薦生成等問題，同時在非 AWS 環境（如本地或 VPS）中保持高性能。本文深入解析 Laravel 專案中實現快速搜尋與商品推薦的五大核心演算法場景，聚焦於演算法原理、數學推導、優化策略與實務應用。內容以蝦皮為例，涵蓋資料收集、搜尋與推薦的完整流程，提供 PHP/Laravel 程式碼範例、時間與空間複雜度分析，以及非 AWS 環境的部署建議，確保你能將這些演算法應用於電商專案。

1. 用戶行為收集：布隆過濾器與加權計分

場景描述

蝦皮透過追蹤用戶行為（如點擊、加入購物車、購買）建立偏好模型，驅動推薦與廣告。行為數據量大且需快速去重（如防止重複記錄點擊），因此需要高效的演算法。

演算法與原理

布隆過濾器（Bloom Filter）：
- 用途：快速檢查某行為是否已記錄，適合高併發去重。
- 原理：使用多個哈希函數將行為（如 user:123:click:456）映射到固定大小的位陣列。檢查時若所有位為 1，則「可能」存在；若任一位為 0，則一定不存在。
- 數學推導：假設位陣列大小為 $ m $，哈希函數數量為 $ k $，插入 $n$ 個元素，誤判率 $ P \approx (1 - e^{-kn/m})^k $。例如：$ m = 10^7 $ 位（約 1.2MB），$ k = 7 $，$ n = 10^6 $，誤判率約 0.01%。
- 優化：調整 $m$ 和 $ k $，平衡記憶體與誤判率。
加權計分：
- 用途：為行為分配權重（購買 > 加入購物車 > 點擊），計算用戶偏好分數。
- 原理：使用加權和公式 $ S = \sum w_i \cdot a_i $，其中 $w_i$ 為行為權重，$ a_i $ 為行為次數。
- 優化：用 Redis Sorted Set 儲存分數，支援即時更新與排序。

程式碼範例

以下實現布隆過濾器去重與加權計分：

PHP
use Predis\Client;

class UserBehaviorController extends Controller
{
    private $bloomFilter;
    private $redis;

    public function __construct()
    {
        $this->redis = new Client(['host' => 'redis']);
        $this->bloomFilter = new BloomFilter(10000000, 7); // 1000 萬位，7 個哈希函數
    }

    public function track(Request $request)
    {
        $userId = $request->input('user_id');
        $productId = $request->input('product_id');
        $action = $request->input('action'); // click/add_to_cart/purchase
        $signature = "$userId:$action:$productId";
        $weight = ['purchase' => 5, 'add_to_cart' => 3, 'click' => 1][$action] ?? 1;

        // 布隆過濾器檢查重複
        if ($this->bloomFilter->exists($signature)) {
            return response()->json(['status' => 'duplicate']);
        }
        $this->bloomFilter->add($signature);

        // 更新加權分數
        $this->redis->zIncrBy("user:$userId:actions", $weight, "$action:$productId");
        $this->redis->expire("user:$userId:actions", 86400); // 24 小時

        // 異步儲存到 MySQL
        Queue::push(new LogUserActionJob([
            'user_id' => $userId,
            'product_id' => $productId,
            'action' => $action,
            'weight' => $weight,
            'created_at' => now()
        ]));

        return response()->json(['status' => 'tracked']);
    }
}

// 簡單布隆過濾器實現
class BloomFilter
{
    private $bitmap;
    private $size;
    private $hashCount;

    public function __construct($size, $hashCount)
    {
        $this->size = $size;
        $this->hashCount = $hashCount;
        $this->bitmap = array_fill(0, ceil($size / 8), 0); // 模擬位陣列
    }

    public function add($item)
    {
        for ($i = 0; $i < $this->hashCount; $i++) {
            $hash = $this->hash($item, $i);
            $this->bitmap[$hash >> 3] |= (1 << ($hash % 8));
        }
    }

    public function exists($item)
    {
        for ($i = 0; $i < $this->hashCount; $i++) {
            $hash = $this->hash($item, $i);
            if (!($this->bitmap[$hash >> 3] & (1 << ($hash % 8)))) {
                return false;
            }
        }
        return true;
    }

    private function hash($item, $seed)
    {
        return abs(crc32($item . $seed)) % $this->size;
    }
}

時間複雜度：布隆過濾器： $O (k)$ ，k 為哈希函數數量（常數）。加權計分： $O (lo g N)$ ，Redis zIncrBy 操作。
空間複雜度：布隆過濾器： $O (m /8)$ 位元組，m 為位陣列大小。Redis： $O (U \cdot A)$ ，U 為用戶數，A 為行為數。
電商應用：蝦皮用類似機制去重高併發點擊，並根據購買行為加權推薦。

演算法優化

誤判率調整：增加哈希函數數量 $k$ 或位陣列大小 $ m $，降低誤判率。
Redis 持久化：啟用 AOF（Append-Only File）確保行為數據不丟失。
面試技巧：推導布隆過濾器的誤判率公式，比較與 Redis Set 的記憶體效率。

2. 快速搜尋：前綴樹與倒排索引

場景描述

蝦皮的搜尋功能需支援即時關鍵字建議（如輸入「耳」顯示「耳機」）與多維篩選（如價格、類別）。這涉及前綴樹（Trie）與倒排索引。

演算法與原理

前綴樹（Trie）：
- 用途：實現即時搜尋建議，快速匹配前綴。
- 原理：將商品名稱拆為字元，構建樹狀結構。搜尋時遍歷樹節點，獲取所有匹配前綴的詞。
- 數學推導：假設平均詞長為 $L$ ，詞數為 $N$ ，搜尋時間為 $O (L)$ ，空間為 $O (N \cdot L)$ 。優化：用**壓縮前綴樹（Compressed Trie）**減少記憶體。
倒排索引（Elasticsearch）：
- 用途：快速匹配關鍵字與篩選條件。
- 原理：將商品屬性（如名稱、類別）映射到商品 ID，搜尋時直接定位匹配 ID。
- 數學推導：查詢時間 $O (lo g N)$ ，基於 B+ 樹索引。

程式碼範例

以下實現前綴樹搜尋建議與 Elasticsearch 查詢：

PHP
// 前綴樹實現
class TrieNode
{
    public $children = [];
    public $isEnd = false;
    public $products = [];
}

class Trie
{
    private $root;

    public function __construct()
    {
        $this->root = new TrieNode();
    }

    public function insert($word, $productId)
    {
        $node = $this->root;
        foreach (str_split(strtolower($word)) as $char) {
            if (!isset($node->children[$char])) {
                $node->children[$char] = new TrieNode();
            }
            $node = $node->children[$char];
        }
        $node->isEnd = true;
        $node->products[] = $productId;
    }

    public function searchPrefix($prefix)
    {
        $node = $this->root;
        foreach (str_split(strtolower($prefix)) as $char) {
            if (!isset($node->children[$char])) {
                return [];
            }
            $node = $node->children[$char];
        }
        return $node->products;
    }
}

// Laravel 搜尋控制器
use Laravel\Scout\Searchable;

class Product extends Model
{
    use Searchable;

    public function searchableAs()
    {
        return 'products';
    }
}

class ProductSearchController extends Controller
{
    public function search(Request $request)
    {
        $redis = new Client(['host' => 'redis']);
        $keyword = $request->input('keyword', '無線耳機');
        $cacheKey = md5($request->fullUrl());

        // 檢查快取
        if ($cached = $redis->get($cacheKey)) {
            return response()->json(json_decode($cached));
        }

        // 前綴樹搜尋建議
        $trie = new Trie();
        foreach (Product::select('id', 'name')->get() as $product) {
            $trie->insert($product->name, $product->id);
        }
        $suggestions = $trie->searchPrefix($keyword);

        // Elasticsearch 查詢
        $results = Product::search($keyword)
            ->where('price', '>=', $request->input('min_price', 1000))
            ->where('price', '<=', $request->input('max_price', 2000))
            ->where('in_stock', true)
            ->orderBy('sales', 'desc')
            ->paginate(20);

        $redis->setex($cacheKey, 600, json_encode($results));
        return response()->json([
            'results' => $results,
            'suggestions' => $suggestions
        ]);
    }
}

時間複雜度：前綴樹： $O (L)$ 搜尋，L 為關鍵字長度。Elasticsearch： $O (lo g N)$ ，N 為商品數。
空間複雜度：前綴樹： $O (N \cdot L)$ ，N 為商品數，L 為平均名稱長度。Elasticsearch： $O (N \cdot M)$ ，M 為屬性數。
電商應用：蝦皮的搜尋欄提供即時建議，結合價格與庫存篩選。

演算法優化

前綴樹壓縮：合併單一子節點，降低記憶體使用。
Elasticsearch 分片：將索引分為多個分片，加速查詢（本地部署建議 2-4 分片）。
面試技巧：推導前綴樹的空間複雜度，比較 Trie 與 Elasticsearch 的適用場景。

3. 商品推薦：協同過濾與 Jaccard 相似度

場景描述

蝦皮的「你可能也喜歡」功能根據用戶行為生成個性化推薦，類似小紅書的「種草」機制。這需要協同過濾演算法與高效相似度計算。

演算法與原理

協同過濾（Collaborative Filtering）：
- 用途：根據用戶行為矩陣推薦相似用戶喜歡的商品。
- 原理：構建用戶-商品交互矩陣，計算用戶間或商品間的相似度。
- 數學推導：Jaccard 相似度：$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $，其中 A、B 為用戶行為集合。時間複雜度：計算單對相似度 $O (P)$ ，P 為平均行為數；所有用戶對 $O (U^{2} \cdot P)$ ，U 為用戶數。
優化：
- 離線計算：用 Laravel Job 預計算相似度矩陣。
- 近似最近鄰（ANN）：用 Annoy 或 HNSW 降低計算複雜度至 $O (U \cdot lo g U)$ 。

程式碼範例

以下實現基於 Jaccard 相似度的協同過濾：

PHP
use Predis\Client;

class ProductRecommendationController extends Controller
{
    public function recommend(Request $request)
    {
        $redis = new Client(['host' => 'redis']);
        $userId = $request->input('user_id');
        $k = $request->input('limit', 5);

        // 檢查快取
        $cached = $redis->zRange("recommend:user:$userId", 0, $k - 1);
        if ($cached) {
            return response()->json(Product::whereIn('id', $cached)->get());
        }

        // 獲取用戶行為
        $userActions = $redis->zRange("user:$userId:actions", 0, -1);
        $userSet = array_unique(array_map(fn($action) => explode(':', $action)[1], $userActions));
        $similarProducts = [];

        // 計算 Jaccard 相似度
        foreach ($redis->keys('user:*:actions') as $otherUserKey) {
            $otherUserId = str_replace('user:', '', str_replace(':actions', '', $otherUserKey));
            if ($otherUserId == $userId) continue;

            $otherActions = $redis->zRange($otherUserKey, 0, -1);
            $otherSet = array_unique(array_map(fn($action) => explode(':', $action)[1], $otherActions));

            $intersection = array_intersect($userSet, $otherSet);
            $union = array_unique(array_merge($userSet, $otherSet));
            $jaccard = count($intersection) / count($union);

            if ($jaccard > 0.1) { // 閾值過濾
                $similarProducts = array_merge($similarProducts, $otherSet);
            }
        }

        // 排序並選取 Top-K
        $productCounts = array_count_values($similarProducts);
        arsort($productCounts);
        $recommendedIds = array_slice(array_keys($productCounts), 0, $k);

        // 快取結果
        $redis->zAdd("recommend:user:$userId", array_combine($recommendedIds, array_values($productCounts)));
        $redis->expire("recommend:user:$userId", 3600);

        return response()->json(Product::whereIn('id', $recommendedIds)->get());
    }
}

時間複雜度： $O (U \cdot P)$ 即時計算，離線計算降至 $O (1)$ 。
空間複雜度： $O (P)$ ，儲存交集與推薦結果。
電商應用：蝦皮根據購買行為推薦相關商品。

演算法優化

離線計算：用 Laravel Scheduler 每晚計算相似度矩陣，儲存至 Redis。
ANN 加速：用 Annoy（Spotify 開源）進行近似最近鄰搜尋，降低計算複雜度。
面試技巧：推導 Jaccard 相似度公式，比較與餘弦相似度的適用性。

4. 熱門商品排行：Top-K 堆與時間衰減

場景描述

蝦皮的「熱門商品」排行榜基於點擊與購買行為，類似小紅書的「黃金 3 小時」排序，需要 Top-K 演算法與時間衰減。

演算法與原理

Top-K 堆：
- 用途：快速選出前 K 個熱門商品。
- 原理：用最小堆維護 K 個最高分商品，遍歷 N 個商品時插入/彈出操作為 $O (lo g K)$ 。
- 數學推導：總時間 $O (N lo g K)$ ，空間 $O (K)$ 。
時間衰減：
- 用途：新商品獲得更高曝光。
- 原理：分數公式 $ S = w \cdot e^{-\lambda t} $，其中 $w$ 為行為權重，$ t $ 為時間差，$ \lambda $ 為衰減常數（例如 0.1/小時）。
- 優化：用 Redis Sorted Set 實現動態排序。

程式碼範例

以下實現熱門商品排行：

PHP
use Predis\Client;

class ProductRankingController extends Controller
{
    public function updateRank(Request $request)
    {
        $redis = new Client(['host' => 'redis']);
        $productId = $request->input('product_id');
        $action = $request->input('action'); // click/purchase
        $weight = ['purchase' => 5, 'click' => 1][$action] ?? 1;

        // 時間衰減
        $product = Product::find($productId);
        $timeFactor = exp(-0.1 * (now()->timestamp - $product->created_at->timestamp) / 3600);
        $redis->zIncrBy('product_scores', $weight * $timeFactor, $productId);
    }

    public function getTopK(Request $request)
    {
        $redis = new Client(['host' => 'redis']);
        $k = $request->input('limit', 10);
        $topProducts = $redis->zRevRange('product_scores', 0, $k - 1, ['withscores' => true]);

        return response()->json(Product::whereIn('id', array_keys($topProducts))->get());
    }
}

時間複雜度： $O (lo g N)$ 更新， $O (K)$ 獲取 Top-K，N 為商品數。
空間複雜度： $O (N)$ ，儲存商品分數。
電商應用：蝦皮首頁「熱門推薦」基於加權與時間衰減。

演算法優化

動態調整：根據業務需求調整 $ \lambda $，控制新舊商品曝光。
快取：用 Redis 持久化排行榜，減少計算。
面試技巧：推導時間衰減公式，比較最小堆與 Sorted Set 的效率。

5. 商品標籤匹配：集合交集與 TF-IDF

場景描述

蝦皮的商品標籤（如「無線」「耳機」）用於搜尋與推薦，類似 Instagram 的 hashtag 功能，需要高效的標籤匹配演算法。

演算法與原理

集合交集：
- 用途：快速匹配多個標籤的商品。
- 原理：用 Redis Set 儲存標籤-商品映射，計算交集獲取匹配結果。
- 數學推導：交集時間 $O (min (∣ S 1∣, ∣ S 2∣))$ ，S1、S2 為標籤集合大小。
TF-IDF（Term Frequency-Inverse Document Frequency）：
- 用途：為標籤分配權重，優先推薦高相關性商品。
- 原理：計算 $ \text{TF-IDF}(t, d) = \text{TF}(t, d) \cdot \log(\frac{N}{\text{DF}(t)}) $，其中 $t$ 為標籤，$ d $ 為商品，$ N $ 為商品總數，$ \text{DF}(t) $ 為包含標籤的商品數。
- 優化：預計算 TF-IDF 分數，儲存至 Redis。

程式碼範例

以下實現標籤匹配與 TF-IDF：

PHP
use Predis\Client;

class ProductTagController extends Controller
{
    public function match(Request $request)
    {
        $redis = new Client(['host' => 'redis']);
        $tags = $request->input('tags', ['無線', '耳機']);
        $cacheKey = 'tags:' . implode(':', $tags);

        // 檢查快取
        if ($cached = $redis->get($cacheKey)) {
            return response()->json(json_decode($cached));
        }

        // 集合交集
        $productIds = $redis->sInter(array_map(fn($tag) => "tag:$tag", $tags));

        // 計算 TF-IDF（簡化版）
        $totalProducts = Product::count();
        $scores = [];
        foreach ($productIds as $productId) {
            $tagCount = $redis->sCard("product:$productId:tags");
            $df = array_sum(array_map(fn($tag) => $redis->sCard("tag:$tag"), $tags));
            $tfIdf = $tagCount * log($totalProducts / ($df ?: 1));
            $scores[$productId] = $tfIdf;
        }
        arsort($scores);
        $topIds = array_slice(array_keys($scores), 0, 10);

        $results = Product::whereIn('id', $topIds)->get();
        $redis->setex($cacheKey, 3600, json_encode($results));

        return response()->json($results);
    }
}

時間複雜度： $O (min (∣ S 1∣, ∣ S 2∣))$ 交集， $O (N \cdot T)$ 計算 TF-IDF，T 為標籤數。
空間複雜度： $O (N)$ ，儲存標籤映射。
電商應用：蝦皮用標籤匹配推薦相關商品。

演算法優化

預計算 TF-IDF：用 Laravel Job 定期更新分數，儲存至 Redis。
標籤索引：用 Elasticsearch 儲存標籤，加速查詢。
面試技巧：推導 TF-IDF 公式，比較集合交集與 Elasticsearch terms 查詢。

結論與實務建議

演算法總結

在非 AWS 環境的 Laravel 電商專案中，核心演算法包括：

布隆過濾器：高效去重用戶行為，誤判率可控。
前綴樹與倒排索引：實現即時搜尋建議與多維查詢。
協同過濾與 Jaccard 相似度：生成個性化推薦。
Top-K 堆與時間衰減：排序熱門商品，模擬「黃金 3 小時」。
集合交集與 TF-IDF：高效匹配商品標籤。

非 AWS 部署建議

本地環境：
- 用 Docker Compose 部署 Elasticsearch、Redis、MySQL（參見前文配置）。
- 確保伺服器記憶體足夠（Elasticsearch 2-4GB，Redis 1GB）。
效能優化：
- Redis 快取熱門查詢與推薦，TTL 10-60 分鐘。
- RabbitMQ 處理高併發行為數據，降低 MySQL 壓力。
監控與維護：
- 用 Laravel Telescope 監控 API 延遲。
- 定期備份 Elasticsearch 索引（snapshot API）。
成本控制：
- 選擇低成本 VPS（如 Linode $5/月）。
- 將圖片儲存於本地磁碟，替代 S3。

面試技巧

深入演算法：推導布隆過濾器誤判率、Jaccard 相似度或 TF-IDF 公式，展示數學基礎。
實務結合：以蝦皮的搜尋與推薦為例，說明演算法如何提升商業價值。
進階話題：討論 ANN（如 HNSW）或 NLP（如 MeCab）在推薦中的應用。

學習資源

演算法：《Introduction to Algorithms》、LeetCode PHP 題目。
Laravel：《Laravel: Up & Running》、Scout 與 Redis 文件。
開源工具：Elasticsearch 官方文件、Redis 官方教程。

羅培宸的技術筆記：從面試戰場到 AI 實戰

2025年9月29日星期一