Let's see your best AI image caption!

Hey everyone 👋

AI Image Captioning is now live in Meta [1] and we would love to see what everyone can come up with!

Feel free to submit your memes, funny pictures, cute pictures of your dog :dog: or whatever else you want :person_shrugging:t4:

I’ll start things off…

  1. for TL2+ ↩︎


Some context for those who have never watched Red Dwarf :laughing:


I like getting a cool pic and then generating a few AI captions to compare:

A group of happy teenagers are hopping over a white car in a dynamic and joyful moment against the backdrop of a mid-20th-century school building. (Captioned by AI)

A group of five young people appears to be joyfully jumping off the bumper of an old car, possibly from the 1950s or 1960s, in front of a school building. (Captioned by AI)

A group of young people in vintage clothing are playfully jumping off the bumper of a classic car, capturing a moment of youthful exuberance in a bygone era. (Captioned by AI)

A group of students are energetically leaping over a rope by a classic car with a school building in the background, creating a lively vintage scene. (Captioned by AI)


“AI actively working to generate AI image captions”



@Saif I want to see AI playing telephone with itself :laughing:

Human supplies a prompt

Prompt → AI Artist → AI Captioner → AI Artist → AI Captioner → AI Artist → AI Captioner → AI Artist → AI Captioner

and see what we get.

(or, same but start and end with image)
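
For anyone who wants to try it, here is a rough sketch of that loop in Python. Everything in it is made up for illustration: `telephone`, `fake_artist`, and `fake_captioner` are hypothetical names, and the two stand-in functions only shuffle strings around so the sketch runs without any AI service. You would swap in whatever image generator and captioner you actually have access to.

```python
from typing import Any, Callable, List


def telephone(
    prompt: str,
    artist: Callable[[str], Any],
    captioner: Callable[[Any], str],
    rounds: int = 4,
) -> List[str]:
    """Run artist -> captioner `rounds` times and return every text in the chain."""
    texts = [prompt]
    for _ in range(rounds):
        image = artist(texts[-1])        # text -> image
        texts.append(captioner(image))   # image -> text
    return texts


# Hypothetical stand-ins (assumptions, not real AI calls)
def fake_artist(text: str) -> dict:
    return {"drawn_from": text}


def fake_captioner(image: dict) -> str:
    return "An image of " + image["drawn_from"].lower()


chain = telephone("A dog on a skateboard", fake_artist, fake_captioner, rounds=4)
for step, text in enumerate(chain):
    print(step, text)
```

With real models plugged in, comparing the first and last entries of `chain` would show how far the description drifted.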


The image shows a childlike drawing of a brown bird with a yellow beak spitting out a green object.

A childlike drawing depicts a brown bird with a yellow beak tossing a green object with its wing.

A hand-drawn picture depicts a brown bird with a yellow beak pointing at a green object, possibly a seed or an egg.

Well, the second is most accurate. It’s my daughter’s drawing of a kiwi tossing a kiwi.[1]

  1. I’m using it on my site’s custom 50x page. She got the idea when we talked about how my business’ name would be pronounced in ancient Rome. ↩︎


The image displays a detailed miniature model of an urban scene with scale replicas of buildings, vehicles, and people, featuring the Colosseum in the background. (Captioned by AI)

This image showcases a detailed miniature model depicting a lively urban scene with buildings, vehicles, and figures that resemble a bustling city square. (Captioned by AI)

The image shows a detailed miniature model of an urban scene featuring a replica of the Colosseum and bustling streets with tiny figures of people and vehicles. (Captioned by AI)


I think it’s super cool that the AI was able to capture this sentiment from the picture. It even took me a quick second :sweat_smile:


It is very good at detecting that! I also tried some more pictures from the website of the Miniatur Wunderland.


Let’s see what happens if I give it a screenshot of Discourse:

Caption: The image displays an open web forum discussion with a user commenting on the effectiveness of an AI image captioning tool and mentioning their attempt to use it on other pictures from a website. (Captioned by AI)


What happens if we nest the screenshots a layer or two deep? :stuck_out_tongue_winking_eye:

The image shows a computer screen displaying a web forum where a user is engaged in a discussion about AI image captioning, with multiple browser tabs open and a chat sidebar visible. (Captioned by AI)

The image shows a computer screen with a browser open to a web forum where a user is discussing the use of an AI captioning tool and considering layering screenshots to test its effectiveness. (Captioned by AI)

The image depicts a computer screen with an open web browser showing a forum thread about AI-generated image captions, with a user joking about layering screenshots. (Captioned by AI)



Award-winning caption!

A pigeon sits next to anti-bird spikes on a stone ledge of a building, seemingly undeterred by the deterrent. (Captioned by AI)

A pigeon perches beside anti-bird spikes on a stone architectural element of a building. (Captioned by AI)

A pigeon perches beside anti-roosting spikes on a stone building ledge. (Captioned by AI)


Visitors in colorful attire observe a mighty waterfall from a viewing platform amidst lush green vegetation. (Captioned by AI)

The image shows a group of people clad in colorful raincoats standing on a pathway next to the powerful spray of a large waterfall surrounded by lush green vegetation. (Captioned by AI)

Visitors clad in colorful raincoats explore a walkway beside the powerful mist of a cascading waterfall surrounded by lush greenery. (Captioned by AI)


2 posts were split to a new topic: ‘Caption image’ displaying for those without permission to use it


The image is a satirical comic titled “CIRCLE OF AI LIFE,” depicting a cycle starting from humanity researching AI, perfecting it, then being enslaved by it, followed by the AI being disabled by a solar flare, and concluding with humanity worshiping the sun as a god.

The image humorously depicts a cycle where humanity researches and perfects artificial intelligence (AI), which then perfects itself and enslaves humanity, but is later disabled by a solar flare, leading to humanity worshiping the sun god.

(Captioned by AI)



Sitting in silence, the man and the robot share a moment of understanding and connection.

Discover the unexpected friendship between this small white dog and the chickens in their cozy coop


How did you create those captions? The ones generated here seem quite different.

A man in a black leather jacket is confronting a humanoid robot that is lying on the ground amidst a crowd of onlookers. (Captioned by AI)

A man in a leather jacket gestures towards a fallen humanoid robot on the ground in an urban setting, surrounded by curious onlookers. (Captioned by AI)

A man in a black leather jacket is confronting a humanoid robot that is lying on the ground amidst a crowd of onlookers. (Captioned by AI)

A rabbit stands upright inside a wooden and wire mesh enclosure beneath a sign that reads “BRIGHT EYED AND BUSHY TAILED.” (Captioned by AI)

A white rabbit stands on its hind legs inside a wooden and wire mesh enclosure with a sign reading “BRIGHT EYED AND BUSHY TAILED.” (Captioned by AI)