Created attachment 130543
Visio doc where Russian text is imported as garbage
Currently there's no way to define the default language that libvisio uses when importing files in which an object does not define a codepage.
In that case, the default clause in libvisio::VSDContentCollector::appendCharacters() sets the hardcoded value "windows-1252".
Could a way be added to libvisio (and maybe other DLP libraries) to specify the default language on init, e.g. as an (optional?) argument in writerperfect::detail::ImportFilterImpl<Generator>::filter()? Then it would be possible to use LO's default document language for that purpose (and thus it would be user-controllable), as is currently done for e.g. DXF.
The attachment is a VSD document whose content is in Russian, but it isn't detected as such and is imported as garbage.
No need to add a default encoding. I found the bug: we are basically assuming that all text is ANSI in versions <= 6. It will nevertheless take some time to fix, since one block of text can have several fonts and thus several encodings. We will have to change the way we iterate over the chars.
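To illustrate the point about several encodings in one text block, here is a minimal C++ sketch. It is not libvisio's code: the names Encoding, TextRun, decodeByte and decodeRuns are hypothetical, and the decoder handles only ASCII plus the 0xC0..0xFF range of windows-1251 and the 0xA0..0xFF range of windows-1252, which is enough to show the effect.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical encodings for this sketch; not libvisio's real types.
enum class Encoding { Windows1251, Windows1252 };

// One run of bytes that shares a single font, and therefore one encoding.
struct TextRun {
    std::string bytes;
    Encoding encoding;
};

// Append a code point to a UTF-8 string (BMP only, which is enough here).
void appendUtf8(std::string &out, char32_t cp) {
    assert(cp <= 0xFFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Map one byte to a code point. Simplification: bytes outside the ranges
// handled below become U+FFFD instead of being looked up in a full table.
char32_t decodeByte(unsigned char b, Encoding enc) {
    if (b < 0x80)
        return b;                        // ASCII is shared by both codepages
    if (enc == Encoding::Windows1251 && b >= 0xC0)
        return 0x0410 + (b - 0xC0);      // Cyrillic capital A .. small ya
    if (enc == Encoding::Windows1252 && b >= 0xA0)
        return b;                        // identical to Latin-1 in this range
    return 0xFFFD;
}

// Decode run by run, so the encoding can change inside one block of text.
std::string decodeRuns(const std::vector<TextRun> &runs) {
    std::string out;
    for (const TextRun &run : runs)
        for (unsigned char b : run.bytes)
            appendUtf8(out, decodeByte(b, run.encoding));
    return out;
}
```

With this sketch, the bytes CF F0 E8 E2 E5 F2 decode to "Привет" when the run is tagged windows-1251, and to "Ïðèâåò" when the same bytes are treated as windows-1252, which is the kind of garbage seen in the attachment.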
IMHO, this commit https://cgit.freedesktop.org/libreoffice/libvisio/commit/?id=94f36d00499808d7588a0970ce0dc7470d1245c7 fixes the issue.
Thank you Fridrich! We will see that as soon as libvisio gets updated, right?
(In reply to Mike Kaganski from comment #3)
> Thank you Fridrich! We will see that as soon as libvisio gets updated, right?
Depends. If you use a distro that has a system libvisio, just building the current git master and installing it on your system could work, as long as you use the same install path as the system one. If you are on Windows, you will have to wait for an official release and its integration into LibreOffice.