Created attachment 132299: source file and utf8-Concordance.sdi for reproducing the bug

Repro steps:
- Open srcFile.odt and go to the end of the file.
- Insert -> Indexes and Tables -> Indexes and Tables -> Index/Table
- In the Type box, select "Alphabetical Index".
- In the Options area, select the Concordance file check box.
- Click the File button, and then choose Open.
- Open the utf8-Concordance.sdi file.
- Click 'OK'.

As shown at the end of srcFile.odt, all the English words are indexed with page numbers, while all non-English Unicode words are ignored. This makes the concordance file unusable with non-English text!
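For anyone who wants to rebuild a comparable test file without the attachment, here is a minimal sketch in Python. It assumes the documented concordance line layout (Search term;Alternative entry;1st key;2nd key;Match case;Word only) and writes UTF-8 without a BOM. The helper names `concordance_line` and `write_concordance` are made up for illustration, and the sample terms are the ones mentioned in this report, not the contents of the attached file.

```python
from pathlib import Path


def concordance_line(term, alt="", key1="", key2="",
                     match_case=False, word_only=False):
    """Build one concordance entry as six semicolon-separated fields."""
    flags = ["1" if match_case else "0", "1" if word_only else "0"]
    return ";".join([term, alt, key1, key2] + flags)


def write_concordance(path, terms):
    """Write one entry per term, encoded as UTF-8 without a BOM."""
    text = "\n".join(concordance_line(t) for t in terms) + "\n"
    # write_bytes avoids any platform-dependent default text encoding.
    Path(path).write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical output name; the terms come from the comments in this report.
    write_concordance("sample-Concordance.sdi",
                      ["Bắc Việt", "Cựu Kim Sơn", "Nguyễn Khánh"])
```

Selecting the generated file in the Concordance file step above should exercise the same code path as the attached utf8-Concordance.sdi, assuming the source document contains the same terms.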
Not reproduced. I do get Bắc Việt, Cựu Kim Sơn and Nguyễn Khánh in the index. Perhaps you should try 5.3.

Arch Linux 64-bit, KDE Plasma 5
Version: 5.3.2.2
Build ID: 5.3.2-1
CPU Threads: 8; OS Version: Linux 4.10; UI Render: default; VCL: kde4; Layout Engine: new;
Locale: fi-FI (fi_FI.UTF-8); Calc: group

Set to NEEDINFO.
Change back to UNCONFIRMED, if the problem persists in 5.3.
Change to RESOLVED WORKSFORME, if the problem went away.
Please note: this bug might not be reproducible on Linux, as one of the developers has reported, but it is a bug on the MS Windows platform. It is reproducible in both LibreOffice 5.2.6 and 5.3 on Windows XP, 7, 8, and 10.
You are right. I repro on Win.

Version: 5.4.0.0.alpha0+ (x64)
Build ID: 193f8966135064a32164c9da08d01dab9c1fc15d
CPU threads: 4; OS: Windows 6.19; UI render: default;
TinderBox: Win-x86_64@42, Branch:master, Time: 2017-03-25_02:08:45
Locale: fi-FI (fi_FI); Calc: group
** Please read this message in its entirety before responding **

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT:
- Update the version field
- Reply via email (please reply directly on the bug tracker)
- Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help, you can test to see if your issue is a REGRESSION. To do so:
1. Download and install the oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from http://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword

Feel free to come ask questions or to say hello in our QA chat: https://kiwiirc.com/nextclient/irc.freenode.net/#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug
Retested with LibreOffice 5.4.6.2 on Win10, Win8, and Win7.
Retested with LibreOffice 6.0.2 on Win10, Win8, and Win7.
The bug is still present.
Still present in:

Version: 7.4.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: f344df3721b3fc5c9657fe5f7dce26af45de7bc6
CPU threads: 6; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-US
Calc: CL

Should we detect the charset of the .sdi file, or should we just assume that the files are in UTF-8? I tested it, and it seems LO writes these files in UTF-8 without a BOM.
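To make the two options concrete, here is a rough sketch in Python of one possible import strategy: honor a BOM if there is one, otherwise try strict UTF-8, and only then fall back to a legacy single-byte encoding. This is an illustration of the idea, not LibreOffice code and not a proposed patch. The function name `decode_concordance` and the `cp1252` fallback are assumptions chosen for the example.

```python
import codecs


def decode_concordance(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode concordance file bytes, preferring UTF-8 over a legacy charset."""
    # 1. An explicit UTF-8 BOM settles the question; strip it before decoding.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # 2. A UTF-16 BOM: the "utf-16" codec reads the BOM to pick the byte order.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")
    # 3. No BOM: strict UTF-8 first. Text in a legacy single-byte encoding
    #    that contains non-ASCII characters is rarely valid UTF-8.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # 4. Last resort: the assumed legacy encoding (hypothetical default).
        return raw.decode(fallback, errors="replace")
```

With this ordering, a BOM-less UTF-8 file such as the ones LO appears to write is read correctly on every platform, while files saved earlier in a Windows code page would still load through the fallback.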
Andreas Heinisch committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/f7a5be583f0b3b99f7e9def6be8be02ae645bd75 tdf#106899 - Import concordance file using appropriate charset It will be available in 7.4.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Andreas Heinisch committed a patch related to this issue. It has been pushed to "libreoffice-7-3": https://git.libreoffice.org/core/commit/ddf9b2e23768a33041a3efe20840f1e11abff434 tdf#106899 - Import concordance file using appropriate charset It will be available in 7.3.1. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Andreas Heinisch committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/4f3b6eac84e0a5381f6a9637d29418ae9353deb5 tdf#106899 - Remove header definition of buffer size It will be available in 7.4.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.