Need to Migrate Images from Zendesk Community to Discourse? Here’s a Python Script!
Hi everyone,
I’ve been working on migrating a Zendesk Community over to Discourse and ran into a frustrating issue: exporting images from Zendesk. The problem? Zendesk’s CDN starts blocking access after you retrieve a few images directly, which makes bulk downloading a real challenge.
After trying several approaches, I ended up creating a Python script (with some help from AI) that gets around this limitation. The script uses Selenium to open each image URL in a browser, takes a screenshot of the image, and saves it locally. It’s not as clean as directly downloading the images, but it works reliably, and the images come out in high quality.
If you’re dealing with a similar migration, I hope this helps you out!
What You’ll Need
- Python: Installed and ready to go.
- ChromeDriver: Download it from Chrome for Testing, extract it, and update the script with the path to your driver.
- CSV File:
  - Create a CSV file with one column named URL.
  - Populate it with the direct URLs of the images you want to export from Zendesk.
  - Update the script with the path to this file.
- A Save Location: Update the script with the folder path where you want the images to be saved.
Lastly, you’ll need to install a couple of Python libraries:
pip install selenium pillow
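If your image URLs are currently sitting in a Python list (or something you can easily turn into one), the CSV described above can be generated rather than typed by hand. This is only a rough sketch: the helper name `write_url_csv`, the output filename, and the example.com URLs are placeholders I made up for illustration, so swap in your own.

```python
import csv


def write_url_csv(urls, csv_path):
    """Write urls to csv_path under a single 'URL' header (hypothetical helper)."""
    with open(csv_path, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['URL'])  # the header the main script reads via row['URL']
        for url in urls:
            writer.writerow([url.strip()])


# Placeholder URLs; replace with your own Zendesk image links
example_urls = [
    'https://example.com/attachments/12345.png',
    'https://example.com/attachments/67890.jpg',
]
write_url_csv(example_urls, 'image_urls.csv')
```

The header has to be exactly `URL` (uppercase), since that is the key the script looks up in each row.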
The Script
Here’s the Python script. Feel free to tweak it to suit your setup:
import csv
import io
import os
import re

from PIL import Image
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Function to extract the image ID from the URL
def extract_image_id(url):
    match = re.search(r'/([^/]+)\.(png|jpg|jpeg|gif)$', url, re.IGNORECASE)
    if match:
        return match.group(1)
    # Fallback name when the URL does not end in a known image extension
    return 'image'

# Function to download and save an image from a URL
def download_image(driver, image_url, download_folder):
    driver.get(image_url)
    # Wait for the image to load
    wait = WebDriverWait(driver, 20)
    img_element = wait.until(EC.presence_of_element_located((By.TAG_NAME, 'img')))
    # Get the image element's location and size
    location = img_element.location
    size = img_element.size
    # Take a screenshot of the visible browser window
    screenshot = driver.get_screenshot_as_png()
    # Convert screenshot to PIL Image
    screenshot_image = Image.open(io.BytesIO(screenshot))
    # Define the bounding box for the image (left, top, right, bottom)
    left = location['x']
    top = location['y']
    right = left + size['width']
    bottom = top + size['height']
    bbox = (left, top, right, bottom)
    # Crop the image to the bounding box
    cropped_image = screenshot_image.crop(bbox)
    # Extract the image ID from the URL
    image_id = extract_image_id(image_url)
    # Save the cropped image with the image ID as the filename in the download folder
    cropped_image.save(os.path.join(download_folder, f'{image_id}.png'))

# Function to load URLs from a CSV file
def load_urls_from_csv(csv_file):
    urls = []
    with open(csv_file, mode='r', newline='', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        for row in reader:
            urls.append(row['URL'])  # The CSV has a single column named 'URL'
    return urls

# Set up ChromeDriver service
service = Service("C:\\Users\\tslam\\Downloads\\chromedriver-win64\\chromedriver-win64\\chromedriver.exe")
driver = webdriver.Chrome(service=service)
try:
    # Maximize the browser window
    driver.maximize_window()
    # Load URLs from the CSV file
    csv_file = 'C:\\Users\\tslam\\Zendesk Migration\\image_urls.csv'
    image_urls = load_urls_from_csv(csv_file)
    # Define the download folder path
    download_folder = 'C:\\Users\\tslam\\Zendesk Migration\\downloads'
    # Ensure the download folder exists
    os.makedirs(download_folder, exist_ok=True)
    # Process each image URL
    for url in image_urls:
        download_image(driver, url, download_folder)
finally:
    driver.quit()
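
One thing to watch for: extract_image_id only matches URLs that end in an image extension, so any URL with a query string (or no extension) falls back to the name 'image', and each such file overwrites the previous one. Two different URLs that share a filename will also collide. Here is one possible way to derive names that stay distinct. It is not part of the script above; `unique_image_name` is a hypothetical helper and the example.com URLs are placeholders.

```python
import hashlib
import os
from urllib.parse import urlparse


def unique_image_name(url, used_names):
    """Return a filename stem for url that is not already in used_names.

    Hypothetical helper, not part of the original script.
    """
    # Drop any query string or fragment and keep only the last path segment
    path = urlparse(url).path
    stem = os.path.splitext(os.path.basename(path))[0] or 'image'
    name = stem
    if name in used_names:
        # Append a short hash of the full URL so the names stay distinct
        digest = hashlib.sha1(url.encode('utf-8')).hexdigest()[:8]
        name = f'{stem}-{digest}'
    used_names.add(name)
    return name


used = set()
print(unique_image_name('https://example.com/a/photo.png?size=large', used))
print(unique_image_name('https://example.com/b/photo.png', used))
```

To try it, you could create a `used` set before the loop and pass `unique_image_name(url, used)` in place of the extract_image_id call when building the save path.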