The following API documentation page: https://api.libreoffice.org/docs/idl/ref/servicecom_1_1sun_1_1star_1_1document_1_1MediaDescriptor.html#a777c03d61101090a1539aafc6ba1c4ca says that the FilterName option passed using com.sun.star.beans.PropertyValue should be an internal filter name, and refers to the following Online Help page: https://help.libreoffice.org/latest/en-US/text/shared/guide/convertfilters.html

However, the Filter Names listed on this Online Help page are actually the UI Names (or Localized Names) of those filters, not the API Names. For instance:

* On the Online Help page the Filter Name for ODT is "Writer 8", but according to filter/source/config/fragments/filters/writer8.xcu in the source code the API name for ODT should be "writer8". The API name, not the UI name, should be used in MediaDescriptor::FilterName; otherwise the API call is wrong and the API falls back to the default filter.
* The Online Help page says "Word 2007–365", but the API name should be "MS Word 2007 XML".

The solution is not to replace the UI names on the Online Help page with API names, but to add a new column indicating their corresponding API Names, just as the OpenOffice wiki page did: https://wiki.openoffice.org/wiki/Framework/Article/Filter/FilterList_OOo_3_0
There is a helper script, helpcontent2/helpers/convertfilters.py, which can be used to update the convertfilters.xhp file. Olivier Hallot: I see you have updated that script before; are you going to update it to include the API Names column? If not, I will take this. The information should be taken from the oor:name attribute, such as: <node oor:name="MS Excel 4.0" oor:op="replace"> This column will be marked as localize="false" so that no translation is needed.
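To illustrate the idea of taking the API Name from the oor:name attribute, here is a minimal Python sketch. It is not the code in convertfilters.py. The helper name api_name and the SAMPLE_NODE string are made up for illustration, and the xmlns:oor declaration is added to the sample only so that the fragment can be parsed on its own.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "oor" prefix in the configuration files.
OOR_NS = "http://openoffice.org/2001/registry"

# Simplified stand-in for a filter node (assumption: the real files carry
# the namespace declaration on an enclosing element, not on the node itself).
SAMPLE_NODE = (
    '<node xmlns:oor="http://openoffice.org/2001/registry" '
    'oor:name="MS Excel 4.0" oor:op="replace"/>'
)


def api_name(node_xml: str) -> str:
    """Return the value of the oor:name attribute of a single filter node."""
    node = ET.fromstring(node_xml)
    # ElementTree stores namespaced attributes under "{namespace-uri}local-name".
    return node.attrib[f"{{{OOR_NS}}}name"]


if __name__ == "__main__":
    print(api_name(SAMPLE_NODE))
```

The attribute value is used as-is, including spaces and capitalisation, because that exact string is what MediaDescriptor::FilterName expects.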
Yes, I created helpcontent2/helpers/convertfilters.py, based on Eike Rathke's advice. Please feel free to take it over for improvements, and let me know if you need assistance.
A patch has been submitted to Gerrit for review: https://gerrit.libreoffice.org/c/help/+/116405

In my patch I made some tweaks to the Python script. The important changes are the following:

1. The previous code generated the filter list from all the Types nodes. The problem is that there are duplicated entries (e.g. for writer.xcd there are two oor:component-data/node[@oor:name="Types"] nodes), which causes duplicates. The revised script takes the last oor:name="Filter" node as the starting point and gets the UI Name and API Name from that node; it then digs into the oor:name="Types" node to query the Media Type and Extension info, using the API Name as the key. Since the API Names are unique, no duplicates are generated. The order is also taken directly from the xcd files, and no sorting is done.

2. The previous code used random numbers or sequential "count" numbers for the IDs. This causes problems for translators: once the xhp file is regenerated with the script, the IDs change and the translated strings become fuzzy in the PO files. The revised script builds the IDs from fixed words (i.e., the API Names, which are unlikely to change over time), so (to the best of my knowledge) the translations remain valid after regeneration as long as the strings themselves are unchanged.

In theory we could edit the xhp file manually, but there are a lot of entries, mistakes may happen, and future maintenance would not be easy. I am aware that this change will require the l10n team to re-translate the strings on this page, but it will pay off in the long run.
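The two points in this comment (de-duplicating by API Name while keeping file order, and deriving IDs from the API Name instead of a counter) can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the patch: the function names stable_id and build_rows, the ID format, and the sample data are all hypothetical.

```python
import re


def stable_id(api_name: str, column: str) -> str:
    """Derive an ID from the API Name so it is identical on every run.

    Hypothetical scheme: non-alphanumeric runs become "_" and the result is
    lower-cased. The real script may build its IDs differently.
    """
    slug = re.sub(r"[^A-Za-z0-9]+", "_", api_name).strip("_").lower()
    return f"{column}_{slug}"


def build_rows(filters, types):
    """Build one table row per unique API Name, preserving input order.

    filters: list of (api_name, ui_name) pairs in file order.
    types:   dict mapping api_name -> (media_type, extensions).
    """
    rows = []
    seen = set()
    for api, ui in filters:
        if api in seen:  # skip entries that occur more than once
            continue
        seen.add(api)
        media_type, extensions = types.get(api, ("", ""))
        rows.append({
            "id": stable_id(api, "api"),
            "ui_name": ui,
            "api_name": api,
            "media_type": media_type,
            "extensions": extensions,
        })
    return rows


# Illustrative sample data only; the duplicate "writer8" entry mimics the
# duplicated nodes described above.
SAMPLE_FILTERS = [
    ("writer8", "Writer 8"),
    ("MS Word 2007 XML", "Word 2007–365"),
    ("writer8", "Writer 8"),
]
SAMPLE_TYPES = {
    "writer8": ("application/vnd.oasis.opendocument.text", "odt"),
    "MS Word 2007 XML": (
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "docx",
    ),
}

if __name__ == "__main__":
    for row in build_rows(SAMPLE_FILTERS, SAMPLE_TYPES):
        print(row["id"], "|", row["ui_name"], "|", row["api_name"])
```

Because the ID depends only on the API Name, regenerating the table from the same input gives the same IDs. One caveat of this particular slug scheme is that two API Names differing only in case or punctuation would collide, so a real implementation would need to check for that.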
Kevin Suo committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/help/commit/226a545d33667a0c9526593a5182ac0a849933e2 tdf#142417: Improve convertfilters.py and add API Names column