When using Google to search for LibreOffice help, it is common to be redirected to very old help pages. For instance, today I was searching for the Basic IDE "Watch Window", so I typed "libreoffice basic ide watch window" into Google. The first result was:

https://help.libreoffice.org/6.2/en-US/text/sbasic/shared/01020100.html?&DbPAR=WRITER&System=UNIX

which is the page from LibreOffice 6.2. Most users wouldn't notice that this refers to an outdated version of LibreOffice and would keep on reading the help page, and there's no easy way to either (i) know that it is outdated or (ii) switch to the latest release.

This isn't even the worst case... sometimes I get redirected to help pages written in Hebrew (even though I'm in Brazil and have never searched for anything in Hebrew). On other occasions I get redirected to the old Wiki help pages.

This is a major problem because:
1) If the user lands on an outdated help page, the instructions might differ from the ones needed for the LO version being used
2) Recent help pages have more content and more detailed instructions
3) In the latest releases the help pages are much more visually appealing than in older versions

I know Google's algorithm is at fault here... However, I would like to propose a workaround. As soon as a LibreOffice version becomes unsupported, we could add a warning message at the top of the page saying:

"You are accessing a help page that refers to an unsupported version of LibreOffice. Click here if you would like to switch to the latest version of this help page."

Clicking that link would simply rebuild the URL and send the user to the most recent version of the same help page.
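Not an actual implementation, just to illustrate the idea: "rebuild the URL" boils down to swapping the version segment for "latest". A minimal Python sketch, assuming the version is always the first path segment; a real link would also need a fallback for pages that no longer exist under /latest/:

-------------------------8<----------------------
from urllib.parse import urlsplit, urlunsplit

def latest_equivalent(url: str) -> str:
    """Point an old help URL at the 'latest' tree instead.

    Assumes the first path segment is the version (e.g. '6.2');
    the rest of the path, the query (DbPAR, System, ...) and the
    fragment are preserved unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.lstrip("/").split("/")
    segments[0] = "latest"   # swap only the version segment
    return urlunsplit(parts._replace(path="/" + "/".join(segments)))

old = "https://help.libreoffice.org/6.2/en-US/text/sbasic/shared/01020100.html?&DbPAR=WRITER&System=UNIX"
print(latest_equivalent(old))
# https://help.libreoffice.org/latest/en-US/text/sbasic/shared/01020100.html?&DbPAR=WRITER&System=UNIX
-------------------------8<----------------------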
Along the same lines, since translations to many languages are only partial: https://ask.libreoffice.org/t/website-bug-multiple-languages/36451/2
(In reply to fpy from comment #1)
> Along the same lines, since translations to many languages are only partial:
> https://ask.libreoffice.org/t/website-bug-multiple-languages/36451/2

Just a note on that thread: in 2018 we were using AskBot, not Discourse, for the Q&A forum. It is not useful to mix such cases together.
I asked Guilhem if we could use https://nginx.org/en/docs/http/ngx_http_sub_module.html to inject this note into old pages.

<guilhem> i don't see the value of doing that dynamically given all pages there are static
<guilhem> all updates are manually done by olivier already
<guilhem> i mean compiled locally and rsynced to a place where they can be served

Olivier: manual mass-replacement seems error-prone and also quite tedious to do (and to remember to do), but is this the way you would prefer to do it? Should we only use English or start agonising about multi-language warnings?
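For comparison, the manual route doesn't have to stay manual: it could be scripted as a one-shot pass over the static trees of the unsupported versions. A rough Python sketch; the docroot path, version list and banner markup below are made up for illustration and do not reflect the actual help build layout, and a real script would also have to check that the /latest/ target exists for each page:

-------------------------8<----------------------
# Sketch: inject a warning banner right after the opening <body> tag of
# every page belonging to an unsupported version tree.
import re
from pathlib import Path

HELP_ROOT = Path("/srv/help.libreoffice.org")   # hypothetical docroot
UNSUPPORTED = ["6.2", "6.3", "6.4", "7.0"]      # example versions only

def inject(page: Path, version: str) -> None:
    rel = page.relative_to(HELP_ROOT / version)
    latest = "/latest/" + rel.as_posix()        # same page, latest tree
    banner = ('<div class="eol-warning">You are reading the help of an '
              'unsupported LibreOffice version. '
              f'<a href="{latest}">Switch to the latest version of this page.</a></div>')
    text = page.read_text(encoding="utf-8")
    if "eol-warning" in text:                   # already patched, skip
        return
    patched = re.sub(r"<body[^>]*>",
                     lambda m: m.group(0) + "\n" + banner,
                     text, count=1)
    if patched != text:
        page.write_text(patched, encoding="utf-8")

for version in UNSUPPORTED:
    for page in (HELP_ROOT / version).rglob("*.html"):
        inject(page, version)
-------------------------8<----------------------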
For the record, this was in @tdf-infra on IRC:
-------------------------8<----------------------
07:41:14 AM - guilhem: iirc the sitemap only list latest, could the same for robots.txt
07:42:16 AM - ohallot: The sitemap list the latest.
07:44:21 AM - ohallot: "latest" and "master" are links to 24.8/ and 25.2/ respectively.
07:46:54 AM - ohallot: must be simpler to fix robots.txt
07:57:07 AM - ohallot: just checked... there are no robots.txt in help.libreoffice.org
07:57:44 AM - guilhem: there is
07:57:46 AM - guilhem: it's in salt
07:59:42 AM - ohallot: okay then.
08:15:42 AM - guilhem: `curl https://help.libreoffice.org/robots.txt`
-------------------------8<----------------------
and the result of the curl command is:
-------------------------8<----------------------
olivier@olivier-ntbk:/tmp$ curl https://help.libreoffice.org/robots.txt
User-agent: *
Disallow: /6.
Disallow: /7.
Disallow: /24.
Disallow: /25.
Disallow: /26.
Disallow: /27.
Disallow: /28.
Disallow: /29.
Disallow: /30.
Disallow: /31.
Disallow: /32.
Disallow: /33.
Disallow: /34.
Disallow: /35.
Allow: /latest/
Disallow: /master/
Sitemap: https://help.libreoffice.org/sitemap.xml
-------------------------8<----------------------

Is this enough to fix the issue?
From https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt#syntax

"A page that's disallowed in robots.txt can still be indexed if linked to from other sites. While Google won't crawl or index the content blocked by a robots.txt file, we might still find and index a disallowed URL if it is linked from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the page can still appear in Google search results. To properly prevent your URL from appearing in Google search results, password-protect the files on your server, use the noindex meta tag or response header, or remove the page entirely."

Since old versions are not updated(?), it would be easy to add <meta name="robots" content="noindex"> and also add a sort of banner in the content.

Q: what's the process to make and publish these changes?
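If we go that way, the noindex hint could be added by a similar one-shot pass over the unsupported trees. Same assumptions as the banner sketch above (hypothetical docroot, static layout), and it could of course be folded into the same script:

-------------------------8<----------------------
# Sketch: add a noindex hint to every page of an unsupported version tree.
import re
from pathlib import Path

NOINDEX = '<meta name="robots" content="noindex">'

def mark_noindex(page: Path) -> None:
    text = page.read_text(encoding="utf-8")
    if 'name="robots"' in text:          # already carries a robots hint
        return
    # insert the tag right after the opening <head> tag
    patched = re.sub(r"<head[^>]*>",
                     lambda m: m.group(0) + "\n" + NOINDEX,
                     text, count=1)
    if patched != text:
        page.write_text(patched, encoding="utf-8")

# hypothetical docroot; one unsupported tree shown as an example
for page in Path("/srv/help.libreoffice.org/6.2").rglob("*.html"):
    mark_noindex(page)
-------------------------8<----------------------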