Python Script to Export Images from Zendesk Community

Need to Migrate Images from Zendesk Community to Discourse? Here’s a Python Script!

Hi everyone,

I’ve been working on migrating a Zendesk Community over to Discourse and ran into a frustrating issue: exporting images from Zendesk. The problem? Zendesk’s CDN starts blocking access after you retrieve a few images directly, which makes bulk downloading a real challenge.

After trying several approaches, I ended up creating a Python script (with some help from AI) that gets around this limitation. The script uses Selenium to open each image URL in a browser, takes a screenshot of the page, crops it to the image, and saves the result locally as a PNG. It’s not as clean as directly downloading the images, but it works reliably, and the images come out in high quality.

If you’re dealing with a similar migration, I hope this helps you out!


What You’ll Need

  1. Python: Installed and ready to go.
  2. ChromeDriver:
    Download it from Chrome for Testing, extract it, and update the script with the path to your driver.
  3. CSV File:
    • Create a CSV file with one column named URL.
    • Populate it with the direct URLs of the images you want to export from Zendesk.
    • Update the script with the path to this file.
  4. A Save Location:
    Update the script with the folder path where you want the images to be saved.

Lastly, you’ll need to install a couple of Python libraries:

pip install selenium pillow
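
If your image URLs currently live in a plain Python list (for example, collected from exported post HTML), a minimal sketch like the one below could produce the CSV described above. The helper name `write_url_csv` is hypothetical and not part of the script; only the `URL` column header comes from the setup steps.

```python
import csv

def write_url_csv(urls, csv_path):
    """Write image URLs to a CSV with a single 'URL' column.

    Blank entries and exact duplicates are skipped. Returns the number
    of URLs written. (Hypothetical helper, for illustration only.)
    """
    seen = set()
    with open(csv_path, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=['URL'])
        writer.writeheader()
        for url in urls:
            url = url.strip()
            if url and url not in seen:
                seen.add(url)
                writer.writerow({'URL': url})
    return len(seen)
```

Calling `write_url_csv(my_urls, 'image_urls.csv')` should give you a file that `csv.DictReader` can read back with `row['URL']`, which is how the script below loads it.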

The Script

Here’s the Python script. Feel free to tweak it to suit your setup:

import csv
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from PIL import Image
import io
import re
import os

# Function to extract the image ID from the URL
def extract_image_id(url):
    match = re.search(r'/([^/]+)\.(png|jpg|jpeg|gif)$', url, re.IGNORECASE)
    if match:
        return match.group(1)
    return 'image'

# Function to download and save an image from a URL
def download_image(driver, image_url, download_folder):
    driver.get(image_url)
    
    # Wait (up to 20 seconds) until an <img> element is present in the page
    wait = WebDriverWait(driver, 20)
    img_element = wait.until(EC.presence_of_element_located((By.TAG_NAME, 'img')))

    # Get the image element's location and size
    location = img_element.location
    size = img_element.size

    # Take a screenshot of the entire page
    screenshot = driver.get_screenshot_as_png()
    
    # Convert screenshot to PIL Image
    screenshot_image = Image.open(io.BytesIO(screenshot))

    # Define the bounding box for the image (left, top, right, bottom)
    left = location['x']
    top = location['y']
    right = left + size['width']
    bottom = top + size['height']
    bbox = (left, top, right, bottom)

    # Crop the image to the bounding box
    cropped_image = screenshot_image.crop(bbox)

    # Extract the image ID from the URL
    image_id = extract_image_id(image_url)

    # Save the cropped image with the image ID as the filename in the download folder
    cropped_image.save(os.path.join(download_folder, f'{image_id}.png'))

# Function to load URLs from a CSV file
def load_urls_from_csv(csv_file):
    urls = []
    with open(csv_file, mode='r', newline='', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        for row in reader:
            urls.append(row['URL'])  # The CSV has a single column named 'URL'
    return urls

# Set up ChromeDriver service
service = Service("C:\\Users\\tslam\\Downloads\\chromedriver-win64\\chromedriver-win64\\chromedriver.exe")
driver = webdriver.Chrome(service=service)

try:

    # Maximize browser window to full screen
    driver.maximize_window()
    
    # Load URLs from the CSV file
    csv_file = 'C:\\Users\\tslam\\Zendesk Migration\\image_urls.csv'
    image_urls = load_urls_from_csv(csv_file)
    
    # Define the download folder path
    download_folder = 'C:\\Users\\tslam\\Zendesk Migration\\downloads'
    
    # Ensure the download folder exists
    os.makedirs(download_folder, exist_ok=True)

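    # Process each image URL; report failures and keep going so one
    # bad URL does not abort the whole run
    for url in image_urls:
        try:
            download_image(driver, url, download_folder)
        except Exception as exc:
            print(f'Failed to save {url}: {exc}')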

finally:
    driver.quit()
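
Two optional tweaks, sketched under some assumptions. First, the regex in `extract_image_id` only matches when the URL ends in the file extension, so any URL with a query string falls back to `image` and each such file overwrites the last. Second, on high-DPI displays the screenshot may be larger than the page's CSS dimensions, which would shift the crop. The helpers below (`filename_for_url` and `scaled_crop_box`, both hypothetical names) show one way to handle each; they are not part of the original script.

```python
import posixpath
import re
from urllib.parse import urlparse

def filename_for_url(url, used_names):
    """Derive a unique file stem from the URL path, ignoring any query string."""
    path = urlparse(url).path                       # excludes '?query' and '#fragment'
    stem = posixpath.splitext(posixpath.basename(path))[0] or 'image'
    stem = re.sub(r'[^A-Za-z0-9._-]', '_', stem)    # keep the name filesystem-safe
    candidate, counter = stem, 1
    while candidate in used_names:                  # avoid overwriting earlier files
        candidate = f'{stem}_{counter}'
        counter += 1
    used_names.add(candidate)
    return candidate

def scaled_crop_box(location, size, pixel_ratio=1.0):
    """Convert an element's location and size (CSS pixels) to a crop box
    in screenshot pixels by multiplying with the device pixel ratio."""
    left = int(round(location['x'] * pixel_ratio))
    top = int(round(location['y'] * pixel_ratio))
    right = int(round((location['x'] + size['width']) * pixel_ratio))
    bottom = int(round((location['y'] + size['height']) * pixel_ratio))
    return (left, top, right, bottom)
```

To try this, you could create `used_names = set()` before the loop, pass it into `download_image`, and read the ratio with `driver.execute_script('return window.devicePixelRatio')` before building the bounding box. With a ratio of 1 the crop box is the same as in the script above.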


Nice! This could even be used for other migrations that face the same issue, not only Zendesk. Thank you so much for sharing.
