| Summary: | SDI file with greek letters does not work | ||
|---|---|---|---|
| Product: | LibreOffice | Reporter: | Michael Herbst <herbst> |
| Component: | Writer | Assignee: | Andreas Heinisch <andreas.heinisch> |
| Status: | RESOLVED DUPLICATE | ||
| Severity: | normal | CC: | buzea.bogdan, serval2412, xiscofauli |
| Priority: | medium | ||
| Version: | 3.3.0 release | ||
| Hardware: | All | ||
| OS: | Windows (All) | ||
| See Also: | https://bugs.documentfoundation.org/show_bug.cgi?id=108910 | ||
| Whiteboard: | target:7.4.0 target:7.3.1 | ||
| Crash report or crash signature: | Regression By: | ||
| Bug Depends on: | |||
| Bug Blocks: | 89606 | ||
| Attachments: | SDI File requested by Xisco Faulí | ||
|
Description
Michael Herbst
2019-05-25 22:02:01 UTC
Thank you for reporting the bug. Please attach a sample document, as this makes it easier for us to verify the bug. (Please note that the attachment will be public; remove any sensitive information before attaching it. See https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F for help on how to do so.)
I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the requested document is provided.

Created attachment 152086
SDI File requested by Xisco Faulí

It works fine for me in
Version: 6.1.4.2
Build ID: 1:6.1.4-0ubuntu0.16.04.1~lo2
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3;
Locale: en-AU (ca_ES.UTF-8); Calc: group threaded
Could you please paste the info from Help - About LibreOffice?
I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the information has been provided.

About has:
Version: 6.1.6.3
Build ID: 5896ab1714085361c45cf540f76f60673dd96a72
CPU threads: 4; OS: Windows 6.1; UI render: default;
Locale: de-DE (de_DE); Calc: group threaded

It's obviously the encoding which scrambles the Greek letters.

On Windows 10 with 6.2.4 or with master sources updated today, I could reproduce this.
I just opened Writer, then File/Open and selected the sdi file.
Here are console logs which may be relevant:
Throwing InvalidHeaderException
Throwing InvalidHeaderException
warn:oox.storage:36748:26504:oox/source/helper/zipstorage.cxx:67: ZipStorage::ZipStorage exception opening input storage com.sun.star.io.IOException
Throwing InvalidHeaderException
Throwing InvalidHeaderException
AbiDocument::isFileFormatSupported Found xml parser severity error Document is empty
Throwing InvalidHeaderException
warn:oox.storage:36748:26504:oox/source/helper/zipstorage.cxx:67: ZipStorage::ZipStorage exception opening input storage com.sun.star.io.IOException
...
VisioDocument: version 0 Found xml parser severity error Document is empty

Sorry, please disregard my previous comment.
After reading up on sdi files, I followed this link (in French) to open the sdi file correctly: https://dutailly.net/un-fichier-de-concordance-pour-indexer-un-document
On Win10 with master sources updated today, I get scrambled letters but no specific console logs.

The UI comes from "createautomarkdialog.ui".
This file is used by sw/source/ui/index/cnttab.cxx.
Searching for "encod" there gives 3 locations:
3815 void SwEntryBrowseBox::ReadEntries(SvStream& rInStr)
3816 {
3817     AutoMarkEntry* pToInsert = nullptr;
3818     rtl_TextEncoding eTEnc = osl_getThreadTextEncoding();
3819     while (rInStr.good())

3866 void SwEntryBrowseBox::WriteEntries(SvStream& rOutStr)
3867 {
3868     //check if the current controller is modified
...
3878     rtl_TextEncoding eTEnc = osl_getThreadTextEncoding();
3879     for(std::unique_ptr<AutoMarkEntry> & rpEntry : m_Entries)

3956 IMPL_LINK_NOARG(SwAutoMarkDlg_Impl, OkHdl, Button*, void)
3957 {
3958     bool bError = false;
3959     if(m_pEntriesBB->IsModified() || bCreateMode)
3960     {
3961         SfxMedium aMed( sAutoMarkURL,
3962                         bCreateMode ? StreamMode::WRITE
3963                                     : StreamMode::WRITE| StreamMode::TRUNC );
3964         SvStream* pStrm = aMed.GetOutStream();
3965         pStrm->SetStreamCharSet( RTL_TEXTENCODING_MS_1253 );

So it seems the code doesn't try to detect the encoding of the file.
Also, line 3965 seems odd to me: why a fixed encoding, RTL_TEXTENCODING_MS_1253?
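
As a side illustration of why an encoding mismatch scrambles Greek text, here is a minimal standalone C++ sketch. It is not LibreOffice code. It assumes the file holds UTF-8 bytes and that the reader treats every byte as one Windows-1252 character; the helper names `appendUtf8` and `misreadAsWindows1252` are hypothetical, and mapping the five unassigned Windows-1252 bytes to U+FFFD is a simplification made for this sketch. Each Greek letter takes two bytes in UTF-8, so it comes out as two unrelated characters.

```cpp
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of a code point to `out`.
// Only code points up to U+FFFF are handled, which is enough for this sketch.
static void appendUtf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: interpret every input byte as one Windows-1252
// character (what a reader does when it assumes the wrong encoding) and
// return the resulting text as UTF-8.
std::string misreadAsWindows1252(const std::string& bytes)
{
    // Windows-1252 differs from Latin-1 only in the range 0x80..0x9F.
    // Unassigned bytes are mapped to U+FFFD here (an assumption of this sketch).
    static const char32_t high[32] = {
        0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
        0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
    };

    std::string out;
    for (unsigned char b : bytes) {
        char32_t cp = b;                 // 0x00..0x7F and 0xA0..0xFF map 1:1
        if (b >= 0x80 && b <= 0x9F)
            cp = high[b - 0x80];
        appendUtf8(out, cp);
    }
    return out;
}
```

For example, `misreadAsWindows1252("\xCE\xB1\xCE\xB2")`, where the argument is the UTF-8 byte sequence for αβ, returns the UTF-8 text Î±Î². Plain ASCII passes through unchanged, which is why only the non-ASCII entries of such a file would look wrong.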

I checked the attached file with an online hex editor and it doesn't contain a BOM (it should be the sequence 0xEF,0xBB,0xBF since it's UTF-8, see https://en.wikipedia.org/wiki/Byte_order_mark).
Anyway, I also tried a file with a BOM. It doesn't change anything: the letters are still scrambled, which is no surprise considering the LO code (see my previous comment).

Continuing to debug, I put some traces on the 3 methods quoted in comment 7.
I confirm that opening the file goes into SwEntryBrowseBox::ReadEntries.
There, "osl_getThreadTextEncoding()" returns 1 (so "RTL_TEXTENCODING_MS_1252", see https://opengrok.libreoffice.org/xref/core/include/rtl/textenc.h?r=189abcf0#38), which is assigned to the "eTEnc" variable (type "rtl_TextEncoding").
Forcing "eTEnc" to "RTL_TEXTENCODING_UTF8" makes the Greek characters display correctly.

*** This bug has been marked as a duplicate of bug 108910 ***

Andreas Heinisch committed a patch related to this issue. It has been pushed to "master":
https://git.libreoffice.org/core/commit/7e6e0fd63eac57de0f76ab1efdb1283c22ad6e6c
tdf#108910, tdf#125496 - read/write index entries using utf8
It will be available in 7.4.0.
The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.

Andreas Heinisch committed a patch related to this issue. It has been pushed to "libreoffice-7-3":
https://git.libreoffice.org/core/commit/4dc4dfe0f249f454291a2d57e28f11342421bb00
tdf#108910, tdf#125496 - read/write index entries using utf8
It will be available in 7.3.1.
The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback. |