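Innovations:
Moves beyond the conventional API-calling approach by reverse-engineering JD's new 2024 encryption logic (dynamically generated eid and fp)
Provides a selenium-based automated login scheme that keeps cookies alive
Adds counter-anti-scraping strategies: dynamic request-header obfuscation plus proxy IP pool integration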
2. Core Code Implementation (Python 3)
import json
import re
import time

import requests
from selenium import webdriver


def get_jd_cookies():
    """Obtain dynamic cookies through selenium."""
    driver = webdriver.Chrome()
    try:
        driver.get("https://passport.jd.com/login")
        input("Log in manually, then press Enter to continue...")
        return {item["name"]: item["value"] for item in driver.get_cookies()}
    finally:
        # Always close the browser, even if the login step is interrupted
        driver.quit()


def decrypt_comment_data(encrypted_str):
    """Decrypt comment data (new 2024 algorithm)."""
    page = requests.get("https://item.jd.com/", timeout=10).text
    match = re.search(r"key:\s*'(\w+)'", page)
    key = match.group(1) if match else None  # the placeholder below does not use it yet
    # Simulates the front-end decryption step (replace with the actual algorithm)
    return json.loads(encrypted_str[::-1])


def get_comments(product_id, max_pages=5):
    cookies = get_jd_cookies()
    for page in range(1, max_pages + 1):
        url = (
            "https://club.jd.com/comment/productPageComments.action"
            f"?productId={product_id}&page={page}"
        )
        headers = {
            "Referer": f"https://item.jd.com/{product_id}.html",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        }
        response = requests.get(url, headers=headers, cookies=cookies, timeout=10)
        data = decrypt_comment_data(response.text)
        print(f"Page {page} comments:", data["comments"])
        time.sleep(3)  # avoid triggering rate limiting


if __name__ == "__main__":
    get_comments("100012043978")  # example product ID
3. Key Pitfalls to Avoid
Dynamic parameters:
productId must be extracted from the product URL, not hard-coded
A pageSize above 100 is forcibly reset (30-50 is recommended)
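Both parameter rules can be sketched as small helpers. This is only an illustration: it assumes product URLs follow the `https://item.jd.com/<digits>.html` shape used in the Referer header of the core code, and the helper names `extract_product_id` and `clamp_page_size` are hypothetical.

```python
import re

# Assumed URL shape: https://item.jd.com/<digits>.html
_PRODUCT_ID_RE = re.compile(r"item\.jd\.com/(\d+)\.html")


def extract_product_id(url: str) -> str:
    """Pull the numeric product ID out of a product page URL."""
    match = _PRODUCT_ID_RE.search(url)
    if match is None:
        raise ValueError(f"no product ID found in URL: {url!r}")
    return match.group(1)


def clamp_page_size(requested: int, upper: int = 50) -> int:
    """Keep pageSize between 1 and the recommended upper bound."""
    return max(1, min(requested, upper))
```

For example, `extract_product_id("https://item.jd.com/100012043978.html")` yields the ID string that can then be passed to `get_comments`.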
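Anti-scraping countermeasures:
Change the User-Agent on every request (this requires maintaining a UA pool)
For proxy IPs, a dedicated tunnel is recommended (such as 青果云 or 站大爺)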
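One possible way to rotate both values per request is sketched below. The pool contents are placeholders (the proxy addresses are made up), and `build_request_kwargs` is a hypothetical helper, not part of the original code; in practice the pools would come from files such as ua_list.txt and proxies.txt.

```python
import random

# Placeholder pools for illustration only
UA_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]
PROXY_POOL = [
    "http://127.0.0.1:8001",  # hypothetical proxy endpoint
    "http://127.0.0.1:8002",  # hypothetical proxy endpoint
]


def build_request_kwargs(product_id: str) -> dict:
    """Pick a random User-Agent and proxy for a single request."""
    proxy = random.choice(PROXY_POOL)
    return {
        "headers": {
            "Referer": f"https://item.jd.com/{product_id}.html",
            "User-Agent": random.choice(UA_POOL),
        },
        # requests expects a scheme -> proxy URL mapping
        "proxies": {"http": proxy, "https": proxy},
        "timeout": 10,
    }
```

The returned dictionary can be unpacked into a call such as `requests.get(url, cookies=cookies, **build_request_kwargs(product_id))`, so each request draws a fresh combination.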
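Data cleaning:
Filter encrypted nicknames by falling back to a default: nickname = comment.get('nickname', 'anonymous user')
Timestamp conversion (milliseconds to seconds): datetime.fromtimestamp(comment['creationTime']/1000)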
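The two cleaning steps might be combined as follows. This is a sketch under assumptions: the `nickname` and `content` field names are guesses about the response shape, `creationTime` is taken to be a millisecond timestamp as described above, and UTC is used here (rather than local time) so the result does not depend on the machine's time zone.

```python
from datetime import datetime, timezone


def clean_comment(comment: dict) -> dict:
    """Normalize one raw comment record (field names are assumptions)."""
    # Missing or empty nicknames fall back to a placeholder
    nickname = comment.get("nickname") or "anonymous user"

    created_ms = comment.get("creationTime")
    if isinstance(created_ms, (int, float)):
        # Milliseconds -> seconds before converting
        created = datetime.fromtimestamp(created_ms / 1000, tz=timezone.utc)
    else:
        created = None  # leave unparsed values for later inspection

    return {
        "nickname": nickname,
        "created": created,
        "content": comment.get("content", ""),
    }
```

Returning `None` for a non-numeric `creationTime` avoids a crash if the field turns out to be a formatted string instead of a number.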
4. Full Project Structure
jd_comment_crawler/
│── proxies.txt          # proxy IP pool
│── ua_list.txt          # User-Agent library
└── comment_analysis.py  # sentiment analysis extension module
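The two text files could be read with a small loader like the one below. The file format is an assumption (one entry per line, blank lines and `#` comments ignored), and `parse_pool` and `load_pool` are hypothetical names.

```python
from pathlib import Path


def parse_pool(text: str) -> list[str]:
    """Return non-empty, non-comment lines with whitespace stripped."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def load_pool(path: str) -> list[str]:
    """Read a pool file such as proxies.txt or ua_list.txt."""
    return parse_pool(Path(path).read_text(encoding="utf-8"))
```

Calling `load_pool("ua_list.txt")` once at startup keeps the pools in memory, so no file is reread on every request.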