As you wrote, I originally thought this was a network stack issue; however, repeated OpenAI API calls from the very same container work fine.
This is also on a very recent build, with commits as of Feb 21.
To prove this (at the cost of burning tokens), I whipped up a quick script to check the OpenAI network stack.
- Runs for 600 seconds (10 mins)
- Makes one chat completion call per second
- Changes the prompt to avoid cache
Inside the container (`./launcher enter app`), save the script below, make it executable with `chmod +x test_openai.sh`, and then call it with `OPENAI_API_KEY=.... ./test_openai.sh`
test_openai.sh
#!/bin/bash
# Duration to run
DURATION_SECS=600
# Initialize counters
successful=0
unsuccessful=0
declare -A error_messages
# Function to calculate percentage
calc_percentage() {
  local total=$(($1 + $2))
  if [ $total -eq 0 ]; then
    echo "0.00"
  else
    echo "scale=2; ($2 * 100) / $total" | bc
  fi
}
# Function to print statistics
print_stats() {
  local percent=$(calc_percentage $successful $unsuccessful)
  echo "-------------------"
  echo "Successful calls: $successful"
  echo "Failed calls: $unsuccessful"
  echo "Failure rate: ${percent}%"
  echo "Error messages:"
  for error in "${!error_messages[@]}"; do
    echo " - $error (${error_messages[$error]} times)"
  done
}
end_time=$((SECONDS + DURATION_SECS))
counter=1
while [ $SECONDS -lt $end_time ]; do
  # Make the API call with timeout
  response=$(curl -s -w "\n%{http_code}" \
    -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d "{
      \"model\": \"gpt-4o-mini\",
      \"messages\": [{\"role\": \"user\", \"content\": \"Use this number to choose a one word response: $counter\"}]
    }" \
    --connect-timeout 5 \
    --max-time 10 \
    https://api.openai.com/v1/chat/completions 2>&1)
  # Get the last line (status code) and response body
  http_code=$(echo "$response" | tail -n1)
  body=$(echo "$response" | sed '$d')
  # Check if the call was successful
  if [ "$http_code" = "200" ]; then
    ((successful++))
  else
    ((unsuccessful++))
    # Extract error message (allow whitespace after the colon, in case the JSON is pretty-printed)
    error_msg=$(echo "$body" | grep -o '"message":[[:space:]]*"[^"]*"' | head -n1 | cut -d'"' -f4)
    if [ -z "$error_msg" ]; then
      error_msg="Connection error: $body"
    fi
    # Increment error message counter (plain assignment, so the key is never parsed as arithmetic)
    count=${error_messages["$error_msg"]:-0}
    error_messages["$error_msg"]=$((count + 1))
  fi
  # Print current statistics
  print_stats
  ((counter++))
  # Wait for 1 second before next call
  sleep 1
done
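If you want to check the error-extraction step on its own without spending tokens, here is a minimal sketch. The function name `extract_error_message` and the sample body are made up for illustration; the sample is not a captured API response.

```shell
#!/bin/bash
# Hypothetical helper: pull the first "message" value out of a JSON error body.
# The optional whitespace after the colon covers pretty-printed JSON.
extract_error_message() {
  printf '%s\n' "$1" \
    | grep -o '"message":[[:space:]]*"[^"]*"' \
    | head -n1 \
    | cut -d'"' -f4
}

# Made-up sample body for illustration only.
sample_body='{
  "error": {
    "message": "Rate limit reached",
    "type": "requests"
  }
}'

extract_error_message "$sample_body"
```

This grep/cut approach stops at the first double quote, so a message containing escaped quotes would be cut short. If `jq` happens to be installed in the container, `jq -r '.error.message'` would probably be more robust, assuming the error body has that shape.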
For the test script, my failure rate was under 0.5%, which is acceptable at that scale.
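For reference, the failure rate the script prints is failed / (successful + failed) × 100. Below is a rough sketch of the same calculation using `awk` instead of `bc`, which may help if `bc` isn't installed in your container (I'm assuming `awk` is). The name `failure_rate` and the counts are hypothetical, not figures from my run.

```shell
#!/bin/bash
# Hypothetical helper: same formula as calc_percentage, but computed with awk.
# Arguments: $1 = successful calls, $2 = failed calls.
failure_rate() {
  local ok=$1 failed=$2
  awk -v ok="$ok" -v failed="$failed" 'BEGIN {
    total = ok + failed
    if (total == 0) { print "0.00"; exit }
    printf "%.2f\n", (failed * 100) / total
  }'
}

# Illustrative counts only: 2 failures out of 400 calls.
failure_rate 398 2
```

Two small differences from the `bc` version: `printf` rounds to two decimals where `bc` with `scale=2` truncates, and it keeps the leading zero (`bc` prints `.50` rather than `0.50`).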
This tells me that the issue lies with the Discourse software rather than the container or network stack that powers it.
If it hasn’t been fixed by a recent commit, I’ll look into it further.