The rebuild likely reset the container’s cgroup placement, which would explain why it’s stable again.
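Given the original “can’t alloc thread” errors, and that everything else (ulimits, TasksMax, the Docker PIDs limit) is unlimited, the remaining suspect is the cgroup PID limit. The pids controller counts every thread as well as every process, so new threads fail to start once pids.current reaches pids.max.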
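Could you check these from inside the container during normal load (cgroup v2 paths; on cgroup v1 the files are under /sys/fs/cgroup/pids/):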
cat /sys/fs/cgroup/pids.current
cat /sys/fs/cgroup/pids.max
If pids.current climbs past ~2000 against a pids.max of ~2285, that would confirm the container was hitting the cgroup PID ceiling during the scheduler / Redis reconnect bursts.
That would also explain why the issue only appeared after the upgrade (higher thread churn), and why the rebuild temporarily cleared it.
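A single cat can easily miss a short burst. Here is a rough sketch of sampling over a window instead. It assumes cgroup v2 with the container’s cgroup visible at /sys/fs/cgroup, and sample_pids is a hypothetical helper name. The function records the peak pids.current, reports it against pids.max, and reads the “max” counter from pids.events, which roughly counts fork/clone attempts rejected because of the limit.

```shell
#!/bin/sh
# Hypothetical helper: sample_pids [DIR] [SAMPLES] [INTERVAL_SECONDS]
# Assumes a cgroup v2 directory containing pids.current and pids.max
# (pids.events is optional; it only exists on non-root cgroups).
sample_pids() {
  dir=${1:-/sys/fs/cgroup}
  samples=${2:-60}
  interval=${3:-1}

  # Track the highest pids.current seen across the sampling window.
  peak=0
  i=0
  while [ "$i" -lt "$samples" ]; do
    cur=$(cat "$dir/pids.current")
    if [ "$cur" -gt "$peak" ]; then
      peak=$cur
    fi
    i=$((i + 1))
    sleep "$interval"
  done

  # pids.max holds either a number or the literal string "max" (no limit).
  max=$(cat "$dir/pids.max")
  if [ "$max" = "max" ]; then
    usage="unlimited"
  else
    usage="$((peak * 100 / max))%"
  fi

  # The "max" line in pids.events counts forks/clones refused by the limit.
  rejected=$(awk '$1 == "max" { print $2 }' "$dir/pids.events" 2>/dev/null)

  echo "peak=$peak max=$max used=$usage rejected=${rejected:-n/a}"
}
```

For example, running sample_pids /sys/fs/cgroup 300 1 inside the container across a reconnect burst would show how close the peak gets to the limit. A non-zero rejected count would be more direct evidence than the peak alone.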