Bug 153034 - Three wrong Greek characters in WordPerfect 5 import
Summary: Three wrong Greek characters in WordPerfect 5 import
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.3.0 release
Hardware: All All
Importance: medium normal
Assignee: Julien Nabet
URL:
Whiteboard: target:24.2.0 target:7.6.2
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-15 21:23 UTC by em36
Modified: 2023-09-19 09:54 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
WPDOS 5.1 file with Greek characters affected by issue (669 bytes, application/vnd.wordperfect)
2023-05-18 00:18 UTC, em36
The GREEKWP5.WP file opened in WPWin and saved to DOCX format (11.98 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-05-18 00:18 UTC, em36
GREEKWP5.WP opened in LibreOffice and saved as ODT (13.11 KB, application/vnd.oasis.opendocument.text)
2023-05-18 00:19 UTC, em36
screenshot with master sources (7.77 KB, image/png)
2023-09-09 12:47 UTC, Julien Nabet
screenshot with master sources + patch (7.05 KB, image/png)
2023-09-09 12:52 UTC, Julien Nabet

Description em36 2023-01-15 21:23:14 UTC
Description:
When importing Greek text from WordPerfect 5.x documents, WP characters 8,38 and 8,39 are converted to the wrong Unicode characters. This problem does not occur when importing from WP 6.x and later.

The wrong characters are:

WP character 8,38, SIGMA terminal (i.e. upper-case terminal Sigma), should be Unicode 0x03a3 (Greek Capital Letter Sigma). In WP 5 import, the character 0x03f9 is used instead; this is a different character, the Greek Capital Lunate Sigma Symbol, which is not a terminal Sigma. The correct character is present in the WP 6 set.

WP character 8,39, sigma terminal, should be Unicode 0x03c2 (Greek Small Letter Final Sigma). In WP 5 import, the wrong character is used: 0x03db, which is Greek Small Letter Stigma rather than a final sigma. The correct character is imported from WP 6+ documents.
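To make the two mappings concrete, here is a minimal Python sketch that records, for WP character set 8, the code points the WP 5 import produced before the fix and the ones it should produce, and looks up their official names with the standard `unicodedata` module. The dictionary layout and the names `WP5_GREEK_BEFORE_FIX`, `WP5_GREEK_EXPECTED` and `describe` are hypothetical illustrations; they are not libwpd's actual (C++) mapping table. Only the code points come from this report.

```python
import unicodedata

# Hypothetical illustration, NOT libwpd's real data structure:
# (WP character set, character number) -> Unicode code point.
WP5_GREEK_BEFORE_FIX = {
    (8, 38): 0x03F9,  # produced for SIGMA terminal by the WP 5 import
    (8, 39): 0x03DB,  # produced for sigma terminal by the WP 5 import
}

WP5_GREEK_EXPECTED = {
    (8, 38): 0x03A3,  # capital sigma, as the WP 6 import already produces
    (8, 39): 0x03C2,  # small final sigma, as the WP 6 import already produces
}


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' for a code point, using the Unicode database."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


if __name__ == "__main__":
    for charset, char in sorted(WP5_GREEK_EXPECTED):
        key = (charset, char)
        print(f"WP {charset},{char}: "
              f"got {describe(WP5_GREEK_BEFORE_FIX[key])}, "
              f"expected {describe(WP5_GREEK_EXPECTED[key])}")
```

Running it prints:

```
WP 8,38: got U+03F9 GREEK CAPITAL LUNATE SIGMA SYMBOL, expected U+03A3 GREEK CAPITAL LETTER SIGMA
WP 8,39: got U+03DB GREEK SMALL LETTER STIGMA, expected U+03C2 GREEK SMALL LETTER FINAL SIGMA
```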

The error seems to derive from a mistaken commit in libwpd, as documented here:

https://sourceforge.net/p/libwpd/tickets/22/

It seems that the changes were made in order to accommodate a program called Printer Polyglott. But changes made to accommodate that obsolete program should not be carried through to LibreOffice.

It seems that the developers of libwpd will not fix this error, so perhaps LibreOffice can fix it?

Steps to Reproduce:
1. Obtain a WordPerfect 5.x file containing WP characters 8,38 and 8,39: either WordPerfect 5.1's CHARACTR.DOC, or a file created in WordPerfect 5.x.
2. Open the WP 5.x file in LibreOffice.

Actual Results:
The converted WP 5 document contains the wrong Unicode characters for WP characters 8,38 and 8,39.

Expected Results:
The correct characters appear, as they do when converting from WP 6.x and later.
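One way to check a converted file for this bug is to look for the two pre-fix code points in its text. An ODT file is a ZIP archive whose body text lives in `content.xml`; the minimal Python sketch below reads that member with the standard `zipfile` module, extracts its text with `xml.etree.ElementTree` (so numeric character references are resolved too), and counts occurrences of U+03F9 and U+03DB. The helper name `find_suspect_greek` is hypothetical, and a plain count is a simplification: a document that legitimately uses a lunate sigma or a stigma would also be flagged.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Code points the WP 5 import produced for WP characters 8,38 and 8,39 before the fix.
SUSPECT = {
    "\u03f9": "expected U+03A3 (capital sigma) for WP 8,38",
    "\u03db": "expected U+03C2 (small final sigma) for WP 8,39",
}


def find_suspect_greek(odt_path: str) -> Counter:
    """Count suspect code points in the text of an ODT file's content.xml."""
    with zipfile.ZipFile(odt_path) as odt:
        root = ET.fromstring(odt.read("content.xml"))
    text = "".join(root.itertext())  # element text only, entities resolved
    return Counter(ch for ch in text if ch in SUSPECT)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        counts = find_suspect_greek(path)
        if not counts:
            print(f"{path}: no suspect characters found")
        for ch, n in counts.items():
            print(f"{path}: U+{ord(ch):04X} x{n} ({SUSPECT[ch]})")
```

For example, saving the imported document as ODT and running the script on it (e.g. `python find_suspect_greek.py GREEKWP5.WP.odt`, script name hypothetical) should report no suspect characters once the fix is in place.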


Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 7.4.4.2 / LibreOffice Community
Build ID: 85569322deea74ec9134968a29af2df5663baa21
CPU threads: 8; OS: Mac OS X 13.1; UI render: default; VCL: osx
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 1 Julien Nabet 2023-01-16 07:02:40 UTC
David/Fridrich: do you think the patch https://sourceforge.net/p/libwpd/code/ci/0bacfbb3e035174308cb7dd87acfca320dda3912 can be reverted in libwpd, or should we just add a patch in LO to revert it only there? (Or perhaps you have another idea?)
Comment 2 em36 2023-01-16 20:42:59 UTC
There are three changed lines in the original commit. I think the first and third lines need to be reverted, but the second line MAY correct a real error. I would have to take another look at the WP6 code to be certain.
Comment 3 em36 2023-05-18 00:18:02 UTC
Created attachment 187360 [details]
WPDOS 5.1 file with Greek characters affected by issue
Comment 4 em36 2023-05-18 00:18:54 UTC
Created attachment 187361 [details]
The GREEKWP5.WP file opened in WPWin and saved to DOCX format
Comment 5 em36 2023-05-18 00:19:29 UTC
Created attachment 187362 [details]
GREEKWP5.WP opened in LibreOffice and saved as ODT
Comment 6 em36 2023-05-18 00:23:46 UTC
I'd like to revive this bug. I've attached three files:

GREEKWP5.WP - a WPDOS 5.1 document containing the four Greek characters relevant to this issue

GREEKWP5fromWPWin.docx - the same WPDOS 5.1 file, opened in WordPerfect for Windows 2021 and saved from WPWin in DOCX format, showing the correct Unicode mappings of the characters.

GREEKWP5.WP.odt - the same WPDOS 5.1 file opened in LibreOffice, and saved in ODT format, showing the three wrong character mappings in libwpd as used by LibreOffice.
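The comparison between the DOCX and ODT attachments can also be done mechanically. Both formats are ZIP archives; the main text is in `word/document.xml` for DOCX and in `content.xml` for ODT. The minimal Python sketch below collects the characters from the Unicode Greek and Coptic block (U+0370 to U+03FF) in each file and reports those that appear in only one of them. The helper names `greek_chars` and `compare` are hypothetical, and restricting the check to that single block is an assumption: it covers the sigma code points discussed here but would miss, for example, polytonic Greek in U+1F00 to U+1FFF.

```python
from __future__ import annotations

import sys
import unicodedata
import zipfile
import xml.etree.ElementTree as ET

# Unicode "Greek and Coptic" block; the sigma code points in this bug fall inside it.
GREEK_AND_COPTIC = range(0x0370, 0x0400)


def greek_chars(path: str, member: str) -> set[str]:
    """Return the Greek and Coptic characters in the text of one XML member of a ZIP-based document."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read(member))
    return {ch for ch in "".join(root.itertext()) if ord(ch) in GREEK_AND_COPTIC}


def compare(docx_path: str, odt_path: str) -> tuple[set[str], set[str]]:
    """Return (characters only in the DOCX, characters only in the ODT)."""
    docx = greek_chars(docx_path, "word/document.xml")
    odt = greek_chars(odt_path, "content.xml")
    return docx - odt, odt - docx


if __name__ == "__main__" and len(sys.argv) == 3:
    only_docx, only_odt = compare(sys.argv[1], sys.argv[2])
    for label, chars in (("only in DOCX", only_docx), ("only in ODT", only_odt)):
        for ch in sorted(chars):
            print(f"{label}: U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}")
```

Comparing sets of characters rather than the text itself sidesteps the fact that DOCX and ODT split the same paragraph into runs and spans differently.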

The wrong mappings were introduced many years ago by someone who wanted to print WP files in obsolete software. That is no reason to continue using the wrong mappings today.
Comment 7 Fridrich Strba 2023-05-18 04:47:07 UTC
(In reply to em36 from comment #6)
> The wrong mappings were introduced many years ago by someone who wanted to
> print WP files in obsolete software. That is no reason to continue using the
> wrong mappings today.

I reverted the whole change from 2010. Now, can you cross-check and indicate whether I changed too many things with my commit? I have no way to generate the documents now. If you find that my change was too zealous, please indicate which Unicode code point I should replace with which one.

The original commit did these changes:
- replace lunate small sigma with small stigma
- replace one occurrence of capital Sigma by lunate capital sigma
- replace one occurrence of capital Upsilon by small eta with tonos
- replace variant of small rho by rho with tonos
Comment 8 em36 2023-05-18 13:00:18 UTC
Thank you! I commented on this on the libwpd SourceForge site. The reversion makes three characters correct, but restores an error that was evidently fixed after the original bad commit. I've specified exactly which hex string to change in my comment on SourceForge.

And thank you for this quick response!
Comment 9 em36 2023-05-19 01:59:41 UTC
Just to repeat what I wrote on SourceForge: the latest commit leaves one character incorrect. I've posted the details on the libwpd SourceForge site.
Comment 10 em36 2023-05-19 12:22:57 UTC
Thanks to Fridrich, this is now fixed. I hope the fix can be incorporated in the LibreOffice code before long.
Comment 11 Julien Nabet 2023-09-09 10:37:23 UTC
I've submitted this patch:
https://gerrit.libreoffice.org/c/core/+/156768

where I picked up Fridrich's commits concerning this part.

I have no idea when libwpd 0.10.4 will be released, but let's not wait any longer here for just 2 changed lines.
Comment 12 em36 2023-09-09 12:30:22 UTC
This is already fixed in 7.6.0.3. No need to do anything more about it.
Comment 13 Julien Nabet 2023-09-09 12:47:52 UTC
Created attachment 189459 [details]
screenshot with master sources

Here's the result I got with master sources updated today.
Comment 14 Julien Nabet 2023-09-09 12:52:29 UTC
Created attachment 189460 [details]
screenshot with master sources + patch

Here's the same export with master sources + the patch.
Comment 15 em36 2023-09-09 14:38:52 UTC
I'm sorry - you're right and I was wrong. (I used the wrong test file.) The patch is needed. Apologies for wasting bandwidth!
Comment 16 Julien Nabet 2023-09-09 14:40:07 UTC
(In reply to em36 from comment #15)
> I'm sorry - you're right and I was wrong. (I used the wrong test file.) The
> patch is needed. Apologies for wasting bandwidth!

No problem :-)
Comment 17 Commit Notification 2023-09-10 07:37:46 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/5424bcd28f89b3622f85783633c725be643a0595

tdf#153034: Three wrong Greek characters in WordPerfect 5 import

It will be available in 24.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 18 Julien Nabet 2023-09-10 07:38:30 UTC
Cherry-pick on 7.6 waiting for review here:
https://gerrit.libreoffice.org/c/core/+/156736
Comment 19 Commit Notification 2023-09-19 09:54:17 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "libreoffice-7-6":

https://git.libreoffice.org/core/commit/ffbbc643fdac9ef23387f59373437a06a669fea7

tdf#153034: Three wrong Greek characters in WordPerfect 5 import

It will be available in 7.6.2.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.