I am a moderator on a Discourse forum (I have no access to the backend). Someone has posted numerous topics where images have been hotlinked to a third-party hosting provider (in this case, Google Docs). They left the company, and all those image links are now broken.
I have manually gone through some of their posts to find and fix broken images (thanks, Internet Archive), but that’s laborious. I’d like to get a list of every topic containing these broken image URLs so we can collectively fix them by re-uploading the images to the site.
I can of course search for with:images #tutorials, but I cannot search inside the image URLs for, say, googleusercontent. Is that possible without API or backend rake access?
An admin could create a data explorer query that finds those posts.
But if the admin had wanted to prevent this, they’d have kept download remote images to local turned on. It’s a problem they have created, and it’s not really a moderator’s job to fix it.
Does that mean you can’t install data explorer either? That would be the tool of choice for this.
How are the images formatted in the posts? Do they only show the plain URL, or use [img], <img>, ![](url)…?
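The answer matters because each of those forms needs its own pattern if you end up scanning raw post text. Here is a minimal sketch in Python, assuming you already have the raw text of a post. The function name `image_urls` is made up for illustration, and the patterns are deliberately loose rather than a full Markdown parser.

```python
import re

# One loose pattern per markup form; none of these is a complete parser.
MARKDOWN_IMG = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)")
BBCODE_IMG = re.compile(r"\[img\](.+?)\[/img\]", re.IGNORECASE)
HTML_IMG = re.compile(r"<img\b[^>]*?\bsrc\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE)


def image_urls(raw):
    """Return image URLs found in raw post text, grouped by markup form
    (Markdown first, then BBCode, then HTML)."""
    urls = []
    for pattern in (MARKDOWN_IMG, BBCODE_IMG, HTML_IMG):
        urls.extend(pattern.findall(raw))
    return urls


def has_hotlink(raw, needle="googleusercontent"):
    """True if any image URL in the raw text contains the given substring."""
    return any(needle in url for url in image_urls(raw))
```

Plain URLs on a line of their own are not covered here; a simple substring check on the whole raw text would catch those as well.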
Just to illustrate your issue. A post could contain a broken image url, such as https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNaW4QQ43EQ-8qqQPntDP7so6Cg19PVSLN9bXv3ZhQqHZtomb8CGY3XArx3GIaZ04d0p9K3V-buaf73-M5dpq2wPuvnjsapStHdTkTVoPj2q9RAmcdczmE12HYz57PNOdVuft1/s1600-h/eastern_coastal_pcn_ap.jpg
If I search for googleusercontent, zero results are returned. Yet I can find posts containing images referenced by a URL that includes the text googleusercontent. I don’t know whether it is a bug or a feature that Discourse doesn’t search the URLs of markdown-formatted image links.
I believe Discourse search is performed on the processed post, which contains HTML.
The search ignores HTML tags, and img tags contain no text, so it cannot return what you’re looking for.
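To illustrate the idea (this is not Discourse’s actual indexing code, just a sketch of what “ignoring tags” implies), here is a small Python example that keeps only the text nodes of some processed HTML. The sample markup and the shortened URL are invented for the demonstration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only text nodes, discarding tags and their attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(cooked_html):
    """Return the text a tag-ignoring indexer would be left with."""
    parser = TextOnly()
    parser.feed(cooked_html)
    parser.close()
    return "".join(parser.chunks)


# Hypothetical processed post: the URL lives only in the src attribute.
cooked = (
    '<p>Coastal map: '
    '<img src="https://blogger.googleusercontent.com/img/b/example.jpg" alt="">'
    "</p>"
)
text = visible_text(cooked)
print(repr(text))                   # 'Coastal map: '
print("googleusercontent" in text)  # False
```

The URL sits entirely inside an attribute, so once the tags are dropped there is nothing left for a text search to match.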
Why can’t you use the API?
You could write a local script that runs a search query for the user’s posts containing images, iterates through the results (slowly enough to stay under the rate limits; you can also fetch the raw post content if needed), and outputs the posts containing the substring you’re looking for.
Maybe there’s a simpler solution, but that’s what I would go for if there were no other option. It’s fairly simple to do.
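Something along these lines, as a rough Python sketch rather than a finished tool. It assumes the site allows anonymous reads of `/search.json` and `/posts/{id}.json` (a login-required site would need a session cookie or API key added to the request), that the post JSON carries a `raw` field, and that search results stop after a limited number of pages. The function names, the delay, and the page cap are my own choices.

```python
import json
import time
import urllib.parse
import urllib.request


def fetch_json(url):
    """Fetch a URL anonymously and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def find_posts(base_url, query, needle, fetch=fetch_json, delay=2.0, max_pages=50):
    """Return links to posts matching `query` whose raw text contains `needle`."""
    hits = []
    seen = set()
    for page in range(1, max_pages + 1):
        params = urllib.parse.urlencode({"q": query, "page": page})
        posts = fetch(f"{base_url}/search.json?{params}").get("posts") or []
        # Stop when a page is empty or only repeats posts already checked.
        fresh = [p for p in posts if p["id"] not in seen]
        if not fresh:
            break
        for post in fresh:
            seen.add(post["id"])
            time.sleep(delay)  # stay well under the rate limits
            detail = fetch(f"{base_url}/posts/{post['id']}.json")
            if needle in detail.get("raw", ""):
                hits.append(f"{base_url}/t/{post['topic_id']}/{post['post_number']}")
    return hits
```

You would call it with something like `find_posts("https://forum.example.com", "@someuser with:images #tutorials", "googleusercontent")`, where the forum address and username are placeholders. Passing the fetch function as a parameter keeps the loop easy to try out against canned responses before pointing it at the real site.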
I haven’t requested an API key (bureaucracy), and I wasn’t aware that I’d need one to do what I perceived as a “simple” search query. I also wasn’t aware that search doesn’t look inside the HTML tags in the content. So that explains that, thank you.
It’s not a problem the admin created. It’s just a situation the admins and content creators were not aware of until someone left the company and access to Google Docs was shut off for that account, making the images disappear/break.
I agree that I could ask for an API key, or write something locally to scrape the site and find the offending posts. I’ll do one of those things.
You don’t need an API key to do a simple search, but I don’t see the point in “using the API” to do a simple search.
Perhaps I misunderstood the issue. It sounded like an issue that wouldn’t have happened if download remote images to local had been on, and it’s on by default. But it’s also likely that the admin turned it off for some bureaucratic reason. I think it’s going to be unnecessarily hard to solve your problem without the data explorer plugin or access to Rails.