Yes.
One idea, among several, is to generate image overlays with text callouts, each pointing to a specific component (e.g., a wire, switch, or connector).
Earlier image generation models struggled to render technical text accurately: text was not treated as symbolic content but emerged from the diffusion process itself, which often produced distorted or unreadable results.
In contrast, the latest model appears to handle text more explicitly, seemingly constructing bounded regions (e.g., rectangular containers) and then placing text within them.
A close inspection of the coffee machine image background reveals multiple layered and interconnected rectangles. These structures are likely artifacts of the image generation pipeline rather than intentional visual elements.
You may also want to review this OpenAI topic, where community members are actively sharing insights and findings.