Bug 106899 - Unicode Index Entries that are not English is ignored.
Summary: Unicode Index Entries that are not English is ignored.
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 5.2.6.2 release (earliest affected)
Hardware: x86 (IA32) Windows (All)
Importance: medium normal
Assignee: Andreas Heinisch
URL:
Whiteboard: target:7.4.0 target:7.3.1
Keywords:
Depends on:
Blocks: TableofContents-Indexes
 
Reported: 2017-03-31 15:23 UTC by dlphan
Modified: 2022-02-23 08:50 UTC

See Also:
Crash report or crash signature:
Regression By:


Attachments
Source File and utf8-Concordance.sdi for reproducing the bug (25.94 KB, application/x-7z-compressed)
2017-03-31 15:23 UTC, dlphan

Description dlphan 2017-03-31 15:23:47 UTC
Created attachment 132299
Source File and utf8-Concordance.sdi for reproducing the bug

Repro Steps:
- Open the srcFile.odt, go to end of file
- Insert -> Indexes and Tables -> Indexes and Tables -> Index/Table
- In the Type box, select "Alphabetical Index".
- In the Options area, select the Concordance file check box.
- Click the File button, and then choose Open
- Select the utf8-Concordance.sdi file and open it
- Click 'OK' to insert the index using the concordance file utf8-Concordance.sdi
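
For anyone without the attachment, a similar test file can be generated. This is a minimal sketch, not the attached utf8-Concordance.sdi: it assumes the semicolon-separated field order described in the LibreOffice help for concordance files (search term; alternative entry; 1st key; 2nd key; match case; word only), and the helper names `concordance_line` and `build_concordance` are made up for illustration. The sample terms are the ones mentioned later in this report.

```python
def concordance_line(term, alt="", key1="", key2="",
                     match_case=False, word_only=False):
    """Build one concordance entry (assumed field order, see lead-in)."""
    flags = ["1" if match_case else "0", "1" if word_only else "0"]
    return ";".join([term, alt, key1, key2] + flags)


def build_concordance(terms):
    """Return the file content as UTF-8 bytes without a BOM."""
    lines = [concordance_line(t) for t in terms]
    return ("\n".join(lines) + "\n").encode("utf-8")


# Hypothetical sample entries, not the content of the attachment.
sample = build_concordance(["Bắc Việt", "Cựu Kim Sơn", "Nguyễn Khánh"])
print(sample.decode("utf-8"), end="")

# To save it for a manual test (file name is an assumption):
#   from pathlib import Path
#   Path("sample-concordance.sdi").write_bytes(sample)
```

The bytes are written without a BOM on purpose, so that the file has no encoding marker for the importer to rely on.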

As the index at the end of srcFile.odt shows, all English words are indexed with page numbers, while all non-English Unicode words are ignored.

This makes the concordance file unusable with non-English text!
Comment 1 Buovjaga 2017-04-17 12:08:58 UTC
Not reproduced. I do get Bắc Việt, Cựu Kim Sơn and Nguyễn Khánh in the index.

Perhaps you should try 5.3.

Arch Linux 64-bit, KDE Plasma 5
Version: 5.3.2.2
Build ID: 5.3.2-1
CPU Threads: 8; OS Version: Linux 4.10; UI Render: default; VCL: kde4; Layout Engine: new; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group

Set to NEEDINFO.
Change back to UNCONFIRMED, if the problem persists in 5.3. Change to RESOLVED WORKSFORME, if the problem went away.
Comment 2 dlphan 2017-04-17 15:15:06 UTC
Please note: this bug might not be reproducible on Linux, as one of the developers has reported, but it is a bug on the MS Windows platform. It is reproducible in both LibreOffice 5.2.6 and 5.3 on Windows XP, 7, 8, and 10.
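
One plausible explanation for the platform difference, offered here as an assumption rather than something confirmed in this report, is that the concordance file was decoded with the system's default encoding: UTF-8 on a typical Linux locale, a legacy code page on Windows. The sketch below uses `cp1252` as a stand-in for such a code page; `find_term` is a hypothetical helper, not LibreOffice code.

```python
def find_term(document_text, raw_entry, encoding):
    """Decode a concordance search term and look it up in the text."""
    term = raw_entry.decode(encoding, errors="replace")
    return term in document_text


document_text = "Nguyễn Khánh visited Bắc Việt."

# The file on disk is UTF-8, so these are the bytes the importer sees.
vietnamese = "Bắc Việt".encode("utf-8")
english = "visited".encode("utf-8")

# Decoded as UTF-8 (assumed Linux default): both terms are found.
print(find_term(document_text, vietnamese, "utf-8"))   # True
print(find_term(document_text, english, "utf-8"))      # True

# Decoded as cp1252 (assumed Windows default): only the ASCII term survives.
print(find_term(document_text, vietnamese, "cp1252"))  # False
print(find_term(document_text, english, "cp1252"))     # True
```

ASCII bytes decode identically in both encodings, which would match the observation that English words are indexed while other entries are silently dropped.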
Comment 3 Buovjaga 2017-04-17 15:44:27 UTC
You are right. I repro on Win.

Version: 5.4.0.0.alpha0+ (x64)
Build ID: 193f8966135064a32164c9da08d01dab9c1fc15d
CPU threads: 4; OS: Windows 6.19; UI render: default; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2017-03-25_02:08:45
Locale: fi-FI (fi_FI); Calc: group
Comment 4 QA Administrators 2018-04-18 02:34:11 UTC Comment hidden (obsolete)
Comment 5 dlphan 2018-04-18 03:58:27 UTC
Retested with LibreOffice 5.4.6.2 on Win10, Win8, Win7.
Retested with LibreOffice 6.0.2 on Win10, Win8, Win7.
The bug is still there.
Comment 6 QA Administrators 2019-04-19 03:00:49 UTC Comment hidden (obsolete)
Comment 7 Andreas Heinisch 2022-01-09 16:18:31 UTC
Still present in:

Version: 7.4.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: f344df3721b3fc5c9657fe5f7dce26af45de7bc6
CPU threads: 6; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-US
Calc: CL

Should we detect the charset in the .sdi file? Or should we just assume that the files are in UTF-8? I tested it, and it seems LO writes these files in UTF-8 without a BOM.
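
The two options can be combined: honor a BOM if one is present, otherwise assume UTF-8. The following is a minimal sketch of that idea, not the patch that was later committed; `decode_concordance` is a hypothetical name, and only the UTF-8 and UTF-16 BOMs are checked.

```python
import codecs

# BOMs to check, paired with the codec used for the bytes after the BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_concordance(data: bytes) -> str:
    """Decode concordance file bytes: use the BOM if present, else UTF-8."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # No BOM: assume UTF-8, matching what LO appears to write.
    return data.decode("utf-8")


entry = "Cựu Kim Sơn;;;;0;0"
print(decode_concordance(entry.encode("utf-8")) == entry)
print(decode_concordance(codecs.BOM_UTF8 + entry.encode("utf-8")) == entry)
print(decode_concordance(codecs.BOM_UTF16_LE + entry.encode("utf-16-le")) == entry)
```

A file in a legacy 8-bit code page would still be misread (or raise `UnicodeDecodeError`) under this scheme; handling that case would need either a user-visible encoding choice or heuristic detection.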
Comment 8 Commit Notification 2022-01-20 06:50:32 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/f7a5be583f0b3b99f7e9def6be8be02ae645bd75

tdf#106899 - Import concordance file using appropriate charset

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 9 Commit Notification 2022-01-31 12:52:21 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "libreoffice-7-3":

https://git.libreoffice.org/core/commit/ddf9b2e23768a33041a3efe20840f1e11abff434

tdf#106899 - Import concordance file using appropriate charset

It will be available in 7.3.1.

Comment 10 Commit Notification 2022-01-31 13:23:56 UTC
Andreas Heinisch committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/4f3b6eac84e0a5381f6a9637d29418ae9353deb5

tdf#106899 - Remove header definition of buffer size

It will be available in 7.4.0.
