Description:
I built an index of sentences in various languages for a book, using a separate SDI file for each language. All of them worked as expected except the Greek one. You can easily see why if you open the attached file in Writer's SDI editor, which also did not recognize an additionally supplied UTF-8 BOM. I regard this as a bug, because any plain-text editor that is not completely outdated (including good old Notepad) reads the file without any problem.

Steps to Reproduce:
1. Open the attached SDI file in the SDI editor and you will see that it cannot work.

Actual Results:
Greek letters are scrambled.

Expected Results:
Show the letters unscrambled and process the file.

Reproducible: Always

User Profile Reset: No

Additional Info:
[Information automatically included from LibreOffice]
Locale: en-GB
Module: TextDocument
[Information guessed from browser]
OS: Windows (All)
OS is 64bit: no
Thank you for reporting the bug. Please attach a sample document, as this makes it easier for us to verify the bug. (Please note that the attachment will be public, remove any sensitive information before attaching it. See https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F for help on how to do so.) I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the requested document is provided.
Created attachment 152086 [details] SDI File requested by Xisco Faulí
It works fine for me in

Version: 6.1.4.2
Build ID: 1:6.1.4-0ubuntu0.16.04.1~lo2
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3;
Locale: en-AU (ca_ES.UTF-8); Calc: group threaded

Could you please paste the info from Help - About LibreOffice? I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the information has been provided.
The About dialog shows:

Version: 6.1.6.3
Build ID: 5896ab1714085361c45cf540f76f60673dd96a72
CPU threads: 4; OS: Windows 6.1; UI render: default;
Locale: de-DE (de_DE); Calc: group threaded

It is obviously the encoding which scrambles the Greek letters.
On Windows 10 with 6.2.4, or with master sources updated today, I could reproduce this. I just opened Writer, then used File > Open and selected the SDI file. Here are console logs which may be relevant:

Throwing InvalidHeaderException
Throwing InvalidHeaderException
warn:oox.storage:36748:26504:oox/source/helper/zipstorage.cxx:67: ZipStorage::ZipStorage exception opening input storage com.sun.star.io.IOException
Throwing InvalidHeaderException
Throwing InvalidHeaderException
AbiDocument::isFileFormatSupported
Found xml parser severity error Document is empty
Throwing InvalidHeaderException
warn:oox.storage:36748:26504:oox/source/helper/zipstorage.cxx:67: ZipStorage::ZipStorage exception opening input storage com.sun.star.io.IOException
...
VisioDocument: version 0
Found xml parser severity error Document is empty
Sorry, please disregard my previous comment. While looking for information about SDI files, I followed this guide (in French) to open the SDI file correctly: https://dutailly.net/un-fichier-de-concordance-pour-indexer-un-document

On Win10 with master sources updated today, I get scrambled letters but no specific console logs.
The UI comes from "createautomarkdialog.ui". This file is used by sw/source/ui/index/cnttab.cxx. Searching for "encod" there gives 3 locations:

3815 void SwEntryBrowseBox::ReadEntries(SvStream& rInStr)
3816 {
3817     AutoMarkEntry* pToInsert = nullptr;
3818     rtl_TextEncoding eTEnc = osl_getThreadTextEncoding();
3819     while (rInStr.good())

3866 void SwEntryBrowseBox::WriteEntries(SvStream& rOutStr)
3867 {
3868     //check if the current controller is modified
...
3878     rtl_TextEncoding eTEnc = osl_getThreadTextEncoding();
3879     for(std::unique_ptr<AutoMarkEntry> & rpEntry : m_Entries)

3956 IMPL_LINK_NOARG(SwAutoMarkDlg_Impl, OkHdl, Button*, void)
3957 {
3958     bool bError = false;
3959     if(m_pEntriesBB->IsModified() || bCreateMode)
3960     {
3961         SfxMedium aMed( sAutoMarkURL,
3962                         bCreateMode ? StreamMode::WRITE
3963                                     : StreamMode::WRITE| StreamMode::TRUNC );
3964         SvStream* pStrm = aMed.GetOutStream();
3965         pStrm->SetStreamCharSet( RTL_TEXTENCODING_MS_1253 );

So it seems the code does not try to detect the encoding of the file; it simply uses the thread's text encoding. Also, line 3965 looks odd to me: why a fixed encoding of RTL_TEXTENCODING_MS_1253?
I checked the attached file with an online hex editor and it does not contain a BOM (which would be the byte sequence 0xEF, 0xBB, 0xBF for UTF-8, see https://en.wikipedia.org/wiki/Byte_order_mark). I also tried a file with a BOM, and it does not change anything: the letters are still scrambled, which is no surprise considering the LO code (see my previous comment).
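For anyone who wants to repeat the BOM check without a hex editor, here is a minimal sketch. The helper name hasUtf8Bom is hypothetical and not part of the LibreOffice code base; it only tests whether a byte buffer starts with the three bytes mentioned above.

```cpp
#include <string>

// Hypothetical helper (not LibreOffice code): returns true when the buffer
// starts with the UTF-8 byte order mark 0xEF 0xBB 0xBF.
bool hasUtf8Bom(const std::string& bytes)
{
    return bytes.size() >= 3
        && static_cast<unsigned char>(bytes[0]) == 0xEF
        && static_cast<unsigned char>(bytes[1]) == 0xBB
        && static_cast<unsigned char>(bytes[2]) == 0xBF;
}
```

Reading the first few bytes of the SDI file into a std::string and passing them to this function should be enough to tell the two test files apart.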
Continuing to debug, I added traces to the 3 methods quoted in comment 7. I can confirm that opening the file goes through SwEntryBrowseBox::ReadEntries. There, osl_getThreadTextEncoding() returns 1 (that is, RTL_TEXTENCODING_MS_1252, see https://opengrok.libreoffice.org/xref/core/include/rtl/textenc.h?r=189abcf0#38), which is assigned to the eTEnc variable (type rtl_TextEncoding). Forcing eTEnc to RTL_TEXTENCODING_UTF8 makes the Greek characters display correctly.
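To illustrate why a single-byte encoding scrambles the text, here is a rough sketch that is independent of the LibreOffice code. The function name misreadAsSingleByte is hypothetical. It treats every byte of a UTF-8 string as one Latin-1 character and re-encodes the result as UTF-8. This is only an approximation of Windows-1252: the two agree for bytes 0xA0 to 0xFF, while bytes 0x80 to 0x9F map differently and are not modelled here.

```cpp
#include <string>

// Hypothetical illustration (not LibreOffice code): decode each input byte
// as one Latin-1 character, then encode that character as UTF-8.
// Assumption: Latin-1 stands in for Windows-1252, which differs for
// bytes 0x80-0x9F.
std::string misreadAsSingleByte(const std::string& utf8Bytes)
{
    std::string out;
    for (unsigned char b : utf8Bytes)
    {
        if (b < 0x80)
        {
            // ASCII bytes are identical in all encodings involved.
            out.push_back(static_cast<char>(b));
        }
        else
        {
            // Code points U+0080..U+00FF take two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}
```

For example, the Greek letter alpha (U+03B1) is stored as the bytes 0xCE 0xB1 in UTF-8. Read byte by byte, it becomes the two characters U+00CE and U+00B1 ("Î±"), which matches the kind of scrambling seen in the dialog.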
*** This bug has been marked as a duplicate of bug 108910 ***
Andreas Heinisch committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/7e6e0fd63eac57de0f76ab1efdb1283c22ad6e6c tdf#108910, tdf#125496 - read/write index entries using utf8 It will be available in 7.4.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Andreas Heinisch committed a patch related to this issue. It has been pushed to "libreoffice-7-3": https://git.libreoffice.org/core/commit/4dc4dfe0f249f454291a2d57e28f11342421bb00 tdf#108910, tdf#125496 - read/write index entries using utf8 It will be available in 7.3.1. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.