PR was merged and the new server outlet was added to beta and stable, so now the plugin can have all the code that overwrites robots.txt in bulk amended without worrying about backwards compat.
After recent updates my https://forum.dobreprogramy.pl/newssitemap.xml returns 404.
Sorry, I forgot to mention it. The news sitemap location has changed.
So your URL should be https://forum.dobreprogramy.pl/sitemap/news.xml
Great, this works - thanks.
BTW the news sitemap returns an error in Google Webmaster Tools:
This XML tag has an invalid value.
Parent tag: publication
<language>is the language of your publication. It should be an ISO 639 Language Code (either 2 or 3 letters). Exception: For Chinese, please use zh-cn for Simplified Chinese or zh-tw for Traditional Chinese.
So as per Google’s documentation, the language should be pl instead of pl_PL. But in Discourse we use language codes with a country suffix for many languages, so we have to remove the suffix during sitemap generation.
Not quite. I just fixed this for RSS feeds: you simply need to replace the _ with - and it will be valid. Give me a sec, I’ll find my commit, which may help.
Here is the PR I did for RSS
And the conversation around that commit
But they say the language code should be 2 or 3 letters, so I guess pl-PL is also not valid for sitemaps.
Edit: Anyway, I will try your suggestion too. Thank you.
Does their webmaster tool let you copy and paste in a sitemap? If so, taking an existing one and putting pl-PL as the language could quickly tell you whether it’ll validate.
Also, those exceptions seem like they would permit the hyphenated approach, but I could very well be wrong.
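For anyone who wants to try that check, here is a rough sketch that builds a one-entry news sitemap with a chosen language value, ready to paste into a validator. The function name, the example.com URL, the publication name, the date and the title are all made-up placeholders; only the two namespace URIs and the element names come from the sitemap and news-sitemap schemas.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_sample_news_sitemap(language: str) -> str:
    """Return a minimal news sitemap whose <news:language> is `language`.

    Hypothetical helper for testing only; all values below are placeholders.
    """
    # Register prefixes so the output uses xmlns and xmlns:news.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = (
        "https://forum.example.com/t/sample-topic/1"
    )

    news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
    publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
    ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = "Example Forum"
    ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
    ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = "2024-01-01"
    ET.SubElement(news, f"{{{NEWS_NS}}}title").text = "Sample topic"

    return ET.tostring(urlset, encoding="unicode")


# Generate one sample per candidate code and paste each into the validator.
for candidate in ("pl", "pl-PL", "pl_PL"):
    print(build_sample_news_sitemap(candidate))
    print()
```

The script only produces the XML; whether a given language value is accepted is still up to the validator.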
That language code is two letters. The other two letters are the country code.
i.e. The language is Polish, the country is Poland
FYI, https://www.google.com/schemas/sitemap-news/0.9/sitemap-news.xsd reports the
language element should be
Language of the publication. It should be an ISO 639 Language Code (either 2 or 3 letters); see: ISO 639-2 Language Code List - Codes for the representation of names of languages (Library of Congress) Exception: For Chinese, please use zh-cn for Simplified Chinese or zh-tw for Traditional Chinese. Required.
Which is the same ISO spec RSS supports, so I’d be really surprised if it doesn’t work using a hyphen.
I think the confusion is about what is meant by “exception”. Most are combinations of language-country. Chinese is more like language-variant.
This would be a bit analogous to russian-latin and russian-cyrillic or german-low and german-high
That is not totally correct. You should read LL-CC as “Language (ISO 639) LL as it is spoken in Country CC”.
The language code is NOT just the first two letters, since the spelling can differ.
For instance, en-us: color and en-gb: colour. So en-us and en-gb are two different languages; the language is not defined by the first two letters alone.
You can’t always reduce things like pl-pl to pl, because for some European languages there is a difference (there is nl, nl-be and nl-nl: Dutch, Dutch as spoken in Belgium, Dutch as spoken in the Netherlands, and another example is pt, pt-br and pt-pt: Portuguese, Portuguese as spoken in Brazil and Portuguese as spoken in Portugal).
Now there is a difference between RSS and sitemaps.
RSS does NOT use bare ISO 639 codes. The list of supported codes for RSS is here: RSS Language Codes. You can see that there are many LL-CC type codes.
For sitemaps, like @vinothkannans said, it’s always two letters, and the only exceptions are zh-cn and zh-tw.
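To make the LL-CC reading concrete, here is a small sketch that separates a code into its two parts without throwing the second one away. The function name `split_locale` is made up for illustration, and accepting both `-` and `_` as separators is an assumption, based on both forms appearing in this thread.

```python
import re
from typing import Optional, Tuple


def split_locale(tag: str) -> Tuple[str, Optional[str]]:
    """Split 'nl-BE' or 'nl_BE' into ('nl', 'BE'); a bare 'nl' gives ('nl', None)."""
    parts = re.split(r"[-_]", tag.strip(), maxsplit=1)
    language = parts[0].lower()
    # The second part, when present, is the country the language is spoken in.
    country = parts[1].upper() if len(parts) > 1 and parts[1] else None
    return language, country


for tag in ("nl", "nl-be", "nl-nl", "pt", "pt-br", "pt-pt", "en-us", "en-gb"):
    print(tag, "->", split_locale(tag))
```

Two codes with the same first part, such as en-us and en-gb, still come out as different pairs, which is the distinction RSS keeps and the sitemap language field drops.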
lol, it seems I’m the one that’s confused
So unlike the RSS fix of replacing underscores with hyphens, a sitemap’s publication tag needs only the part up to the underscore, except for Chinese.
A bit more logic but not that much more I guess.
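As a rough illustration of that extra logic (a sketch only, not the code from the merged PR; the function name and the lowercase lookup table are assumptions), the rule "keep what is before the underscore, except for Chinese" could look like this:

```python
# Hypothetical helper; the two exceptions are the ones named in Google's
# description of the <language> tag quoted earlier in this thread.
CHINESE_EXCEPTIONS = {"zh_cn": "zh-cn", "zh_tw": "zh-tw"}


def sitemap_language(locale: str) -> str:
    """Map a locale such as 'pl_PL' to a news-sitemap language code."""
    # Accept either separator and ignore case before looking anything up.
    normalized = locale.strip().replace("-", "_").lower()
    if normalized in CHINESE_EXCEPTIONS:
        return CHINESE_EXCEPTIONS[normalized]
    # Otherwise keep only the part before the first underscore.
    return normalized.split("_", 1)[0]


for locale in ("pl_PL", "en", "pt_BR", "zh_CN", "zh_TW"):
    print(locale, "->", sitemap_language(locale))
```

With these inputs, pl_PL becomes pl and pt_BR becomes pt, while zh_CN and zh_TW become zh-cn and zh-tw.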
While checking a sample news sitemap with the online validator below, it accepts neither pl_PL nor pl-PL as the language code. It only accepts pl, without the country suffix.
I’ve merged the PR, thank you!
These generated sitemaps should be cached. My site has 1 million posts, and it returns a 502 error when the sitemap is regenerated again and again.
It already has a cache. Is there any problem below?