The questions you ask yourself when creating the primary text alternative for an image should not be
It should be
“In context” is important.
Think of your friend across the room, scrolling Tumblr on their own phone and not looking at yours, which you are also using to scroll Tumblr. A short post makes you laugh. You’d probably read out whatever text is in the post (potentially abbreviated), and then if, say, a meme was used, you would say what the meme is, not what the photo is.
Imagine (I say, as if you have never seen this online) if every time someone used a memetic image they included a wall of text explaining the content of the image, with subjective, redundant or even incorrect details:
A photograph of a man in partial profile from the right arm up. He is standing outside with city buildings and a teenage or young adult boy’s face in the background. He has light brown skin, dark coily hair, and a short beard. His mouth is open, and he is wearing a backwards neon pink baseball cap, silver-framed futuristic sunglasses, a candy necklace, a neon green t-shirt, and a wide silver cuff bracelet. He has one wired earbud in, and is holding up a white bottle without a label that is shaped like a bottle for salad dressing. The caption on the bottom is yellow text all in lower case. It reads, “cheers I’ll drink to that bro.” The logo in the bottom right corner says “bracket adult swim dot com bracket”.
instead of its moniker/purpose
meme: Eric Andre cheers I’ll drink to that bro
If you know the meme, you know what that means. In the context of Tumblr, a user can be expected to know the purpose & significance of a meme, because that is literally what a meme is!
If you don’t know the meme, in nine words and 36 characters you have enough information to search for or ask for more. With the long description, you have no clue what the image actually is as a whole, or what purpose it serves on the page, until the second-to-last sentence.
It takes me about 2 seconds to look at the meme in question (shown below), recognize it, and read the caption.

The long text description above takes more than 51 seconds for Natural Reader text-to-speech software to read aloud.
The short one takes only four (4) seconds.
Which text alternative do you think provides the most equivalent experience to someone looking at the image?
For additional information on established web accessibility standards for text alternatives, you can go to: