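Need to migrate images from Zendesk Community to Discourse? Here's a Python script!
Hi everyone,
I've been working on migrating a Zendesk Community to Discourse and ran into a frustrating problem: exporting images from Zendesk. The issue? Zendesk's CDN starts blocking access after a few images have been retrieved directly, which makes bulk downloading very difficult.
After trying several approaches, I ended up creating a Python script (with the help of AI) that works around this limitation. The script uses Selenium to open each image URL in a browser, takes a screenshot of the image, and saves it locally. It's not as clean as downloading the images directly, but it works reliably, and the exported images are high quality.
If you're facing a similar migration problem, I hope this script helps!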
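What you'll need
- Python: installed and ready to go.
- ChromeDriver: download it from Chrome for Testing, unzip it, and update the driver path in the script.
- CSV file:
  - Create a CSV file with a single column whose header is URL.
  - Fill it with the direct URLs of the images you want to export from Zendesk.
  - Update the file path in the script.
- Save location: update the folder path in the script to point to where you want the images saved.
Finally, you'll need to install a couple of Python libraries:
pip install selenium pillow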
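The CSV file described in the list above can also be generated from Python if you already have the URLs in a list. This is only a minimal sketch: the helper name `write_url_csv` and the example.com URLs are hypothetical and not part of the original script. The only thing taken from the script is the `URL` column header.

```python
import csv


def write_url_csv(urls, csv_path):
    """Write image URLs to a one-column CSV whose header is 'URL'.

    Blank entries and exact duplicates are skipped. Returns the number
    of URLs written.
    """
    seen = set()
    with open(csv_path, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['URL'])  # header the migration script looks up
        for url in urls:
            url = url.strip()
            if url and url not in seen:
                seen.add(url)
                writer.writerow([url])
    return len(seen)


if __name__ == '__main__':
    # Placeholder URLs for illustration only
    example_urls = [
        'https://example.com/images/abc123.png',
        'https://example.com/images/def456.jpg',
    ]
    write_url_csv(example_urls, 'image_urls.csv')
```

Skipping duplicates up front means the browser never has to open the same image twice.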
The script
Here's the Python script. Feel free to modify it to suit your setup:
import csv
import io
import os
import re

from PIL import Image
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
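

# Function to extract the image ID (the filename without its extension) from the URL
def extract_image_id(url):
    match = re.search(r'/([^/]+)\.(png|jpg|jpeg|gif)$', url, re.IGNORECASE)
    if match:
        return match.group(1)
    # Fallback name when the URL does not end in a known image extension
    return 'image'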


# Function to screenshot an image URL and save the cropped result
def download_image(driver, image_url, download_folder):
    driver.get(image_url)

    # Wait for the image to load
    wait = WebDriverWait(driver, 20)
    img_element = wait.until(EC.presence_of_element_located((By.TAG_NAME, 'img')))

    # Get the image element's location and size.
    # Note: these are CSS pixels; on a display scaled to something other
    # than 100% the screenshot may use a different scale, so check the crop.
    location = img_element.location
    size = img_element.size

    # Take a screenshot of the visible page
    screenshot = driver.get_screenshot_as_png()

    # Convert screenshot to PIL Image
    screenshot_image = Image.open(io.BytesIO(screenshot))

    # Define the bounding box for the image (left, top, right, bottom)
    left = location['x']
    top = location['y']
    right = left + size['width']
    bottom = top + size['height']
    bbox = (left, top, right, bottom)

    # Crop the image to the bounding box
    cropped_image = screenshot_image.crop(bbox)

    # Extract the image ID from the URL
    image_id = extract_image_id(image_url)

    # Save the cropped image with the image ID as the filename in the download folder
    cropped_image.save(os.path.join(download_folder, f'{image_id}.png'))


# Function to load URLs from a CSV file
def load_urls_from_csv(csv_file):
    urls = []
    with open(csv_file, mode='r', newline='', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        for row in reader:
            urls.append(row['URL'])  # The CSV has a single column with the header 'URL'
    return urls


# Set up ChromeDriver service
service = Service("C:\\Users\\tslam\\Downloads\\chromedriver-win64\\chromedriver-win64\\chromedriver.exe")
driver = webdriver.Chrome(service=service)

try:
    # Maximize the browser window
    driver.maximize_window()

    # Load URLs from the CSV file
    csv_file = 'C:\\Users\\tslam\\Zendesk Migration\\image_urls.csv'
    image_urls = load_urls_from_csv(csv_file)

    # Define the download folder path
    download_folder = 'C:\\Users\\tslam\\Zendesk Migration\\downloads'

    # Ensure the download folder exists
    os.makedirs(download_folder, exist_ok=True)

    # Process each image URL
    for url in image_urls:
        download_image(driver, url, download_folder)
finally:
    # Always close the browser, even if a download fails
    driver.quit()
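
One thing to watch for: the script saves every file as the image ID plus .png, so two URLs that produce the same ID (including the 'image' fallback for URLs that don't end in an image extension) would overwrite each other. Below is a minimal sketch of one way to avoid that. The helper `unique_image_name` is hypothetical and not part of the script above; `extract_image_id` is repeated only so the block runs on its own.

```python
import re


def extract_image_id(url):
    # Same logic as in the script above, repeated so this block is self-contained
    match = re.search(r'/([^/]+)\.(png|jpg|jpeg|gif)$', url, re.IGNORECASE)
    if match:
        return match.group(1)
    return 'image'


def unique_image_name(url, used_names):
    """Return a filename stem for url that is not yet in used_names.

    used_names is a set that the caller keeps for the whole run; the
    chosen name is added to it before returning.
    """
    base = extract_image_id(url)
    name = base
    counter = 1
    while name in used_names:
        counter += 1
        name = f'{base}_{counter}'  # e.g. image_2, image_3, ...
    used_names.add(name)
    return name
```

To use it, you could create `used_names = set()` before the loop, pass it into `download_image`, and build the save path from `unique_image_name(image_url, used_names)` instead of `image_id`.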