Python script to export Zendesk Community images

Need to migrate images from Zendesk Community to Discourse? Here's a Python script!

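Hi everyone,

I've been working on migrating a Zendesk Community to Discourse and ran into a frustrating problem: exporting the images from Zendesk. The issue? Zendesk's CDN starts blocking access after a few images have been fetched directly, which makes bulk downloading very difficult.

After trying several approaches, I ended up creating a Python script (with the help of AI) that works around this restriction. The script uses Selenium to open each image URL in a browser, takes a screenshot of the image, and saves it locally. It is not as clean as downloading the image files directly, but it works reliably, and the exported images are high quality.

If you're facing a similar migration problem, I hope this script helps!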


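What you'll need

  1. Python: installed and ready to go.
  2. ChromeDriver:
    Download it from Chrome for Testing, unzip it, and update the driver path in the script.
  3. CSV file:
    • Create a CSV file with a single column named URL.
    • Fill it with the direct URLs of the images you want to export from Zendesk.
    • Update the file path in the script.
  4. Save location:
    Update the folder path in the script to wherever you want the images saved.

Finally, you'll need to install a couple of Python libraries: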
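
If your image URLs already live in a Python list (for example, pulled out of an export), the CSV described in item 3 could be produced with something like the sketch below. The function name `write_url_csv` and the example.com URLs are placeholders of mine, not part of the script that follows; the only fixed requirement is the `URL` header, which is what the script reads.

```python
import csv

def write_url_csv(csv_path, urls):
    """Write a one-column CSV with a 'URL' header, one image URL per row."""
    with open(csv_path, mode='w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['URL'])      # header name expected by the main script
        for url in urls:
            writer.writerow([url])

# Hypothetical usage (placeholder URLs and file name):
# write_url_csv('image_urls.csv', [
#     'https://example.com/images/abc123.png',
#     'https://example.com/images/def456.jpg',
# ])
```
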

pip install selenium pillow

The script

Here is the Python script. Feel free to modify it to suit your setup:

import csv
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
from PIL import Image
import io
import re
import os

# Function to extract the image ID from the URL
def extract_image_id(url):
    # Also match URLs that carry a query string or fragment after the extension
    match = re.search(r'/([^/?#]+)\.(png|jpg|jpeg|gif)(?:[?#].*)?$', url, re.IGNORECASE)
    if match:
        return match.group(1)
    return 'image'

# Function to download and save an image from a URL
def download_image(driver, image_url, download_folder):
    driver.get(image_url)
    
    # Wait for the image to load
    wait = WebDriverWait(driver, 20)
    img_element = wait.until(EC.presence_of_element_located((By.TAG_NAME, 'img')))

    # Get the image element's location and size
    location = img_element.location

    size = img_element.size

    # Take a screenshot of the entire page
    screenshot = driver.get_screenshot_as_png()
    
    # Convert screenshot to PIL Image
    screenshot_image = Image.open(io.BytesIO(screenshot))

    # Define the bounding box for the image (left, top, right, bottom)
    left = location['x']
    top = location['y']
    right = left + size['width']
    bottom = top + size['height']
    bbox = (left, top, right, bottom)

    # Crop the image to the bounding box
    cropped_image = screenshot_image.crop(bbox)

    # Extract the image ID from the URL
    image_id = extract_image_id(image_url)

    # Save the cropped image with the image ID as the filename in the download folder
    cropped_image.save(os.path.join(download_folder, f'{image_id}.png'))

# Function to load URLs from a CSV file
def load_urls_from_csv(csv_file):
    urls = []
    with open(csv_file, mode='r', newline='', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        for row in reader:
            urls.append(row['URL'])  # The CSV must have a column header named 'URL'
    return urls

# Set up ChromeDriver service
service = Service("C:\\Users\\tslam\\Downloads\\chromedriver-win64\\chromedriver-win64\\chromedriver.exe")
driver = webdriver.Chrome(service=service)

try:

    # Maximize browser window to full screen
    driver.maximize_window()
    
    # Load URLs from the CSV file
    csv_file = 'C:\\Users\\tslam\\Zendesk Migration\\image_urls.csv'
    image_urls = load_urls_from_csv(csv_file)
    
    # Define the download folder path
    download_folder = 'C:\\Users\\tslam\\Zendesk Migration\\downloads'
    
    # Ensure the download folder exists (no error if it is already there)
    os.makedirs(download_folder, exist_ok=True)

    # Process each image URL
    for url in image_urls:
        download_image(driver, url, download_folder)

finally:
    driver.quit()
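
One caveat about the cropping step: Selenium reports an element's location and size in CSS pixels, while the screenshot is captured in device pixels. On a display with scaling enabled (125%, 150%, a HiDPI laptop, and so on) the two can differ, and the crop may come out offset or cut short. A minimal sketch of a fix, assuming the page's `window.devicePixelRatio` matches the screenshot scale; `scaled_bbox` is a helper name I made up for illustration:

```python
def scaled_bbox(location, size, pixel_ratio=1.0):
    """Convert an element's CSS-pixel box into screenshot (device) pixels.

    location and size are the dicts returned by a Selenium WebElement's
    .location and .size; pixel_ratio is the browser's devicePixelRatio.
    """
    left = round(location['x'] * pixel_ratio)
    top = round(location['y'] * pixel_ratio)
    right = round((location['x'] + size['width']) * pixel_ratio)
    bottom = round((location['y'] + size['height']) * pixel_ratio)
    return (left, top, right, bottom)

# Hypothetical use inside download_image(), replacing the manual bbox math:
#   ratio = driver.execute_script("return window.devicePixelRatio")
#   bbox = scaled_bbox(img_element.location, img_element.size, ratio)
#   cropped_image = screenshot_image.crop(bbox)
```

With a ratio of 1.0 this gives the same box as the script above, so it should be safe to use on unscaled displays too. Another option that may avoid the cropping altogether is the element-level screenshot (`img_element.screenshot_as_png`), though I have not compared the results.
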


Great! This could even be used for other migrations that run into the same problem, not just Zendesk. Thanks so much for sharing.
