Bug 158793 - EDITING: WordPerfect 4.2 import shows wrong extended characters
Status: NEW
Alias: None
Product: Document Liberation Project
Classification: Unclassified
Component: General
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL: https://sourceforge.net/p/libwpd/tick...
Whiteboard: libwpd
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-20 13:27 UTC by em36
Modified: 2024-07-26 06:55 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
WordPerfect 4.2 file with extended characters (72 bytes, text/plain)
2023-12-20 13:28 UTC, em36
Description em36 2023-12-20 13:27:28 UTC
Description:
WordPerfect 4.2 documents with extended characters (non-USASCII) display the wrong characters in LibreOffice. The correct characters appear when the documents are opened in WordPerfect for Windows.

The libwpd import filter includes a conversion table for WP4.2 extended characters, but doesn't seem to use it.

Sample file (will be attached after filing the report)
https://www.dropbox.com/scl/fi/91jqfwdtw1akouxhw7d6k/EXTENDED.WPD?rlkey=hhaioasc48ycd1of2hm3cstiu&dl=0

Steps to Reproduce:
1. Open the linked file.



Actual Results:
LibreOffice displays †ÖÑÉáÇ, etc.

Expected Results:
LibreOffice should display àáäâç etc


Reproducible: Always


User Profile Reset: Yes

Additional Info:
This probably never worked correctly, but here is the version tested.

Version: 7.6.0.3 (X86_64) / LibreOffice Community
Build ID: 69edd8b8ebc41d00b4de3915dc82f8f0fc3b6265
CPU threads: 8; OS: Mac OS X 14.2.1; UI render: Skia/Metal; VCL: osx
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 1 em36 2023-12-20 13:28:09 UTC
Created attachment 191531
WordPerfect 4.2 file with extended characters
Comment 2 em36 2023-12-20 13:36:03 UTC
And, to state the obvious, the WP4.2 character table is in the file libwpd_internal.cpp in the libwpd code. It seems to be correct, though I haven't checked it thoroughly. It seems not to be used during import, however.
Comment 3 raal 2023-12-30 18:21:22 UTC
Confirm with Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 71c28942fbc7f36e5bcd46c5a6cdfbb3fcbcd6a0
CPU threads: 4; OS: Linux 6.2; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded

and Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)
Comment 4 Mike Kaganski 2024-07-26 06:55:05 UTC
(In reply to em36 from comment #2)
> It seems not to be used, however, during import.

It is definitely used. The question is how correctly.

The libwpd code parses the first character in the file and finds an internal extended character 0xa0 (in WP1Parser::parseDocument). It calls WP1ExtendedCharacterGroup::parse, then WP1ContentListener::insertExtendedCharacter, where the character is mapped to the UCS4 character U+2020 (†, dagger) using macRomanCharacterMap. The result is then simply converted to UTF-8 and passed to LibreOffice.

The library does something wrong.
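
A quick way to sanity-check that reading of the output, sketched in Python rather than in libwpd itself: take the characters quoted under "Actual Results", undo the Mac Roman mapping to get the raw byte values back, then decode those bytes with a different table. The use of IBM code page 437 below is an assumption made only for illustration; nothing in this report confirms that it matches the WP4.2 table in libwpd_internal.cpp.

```python
# Characters LibreOffice displayed, as quoted under "Actual Results".
displayed = "†ÖÑÉáÇ"

# Undo the Mac Roman mapping to recover the raw extended-character bytes.
raw = displayed.encode("mac_roman")

# Reinterpret the same bytes as IBM code page 437.
# ASSUMPTION: CP437 is used only as a plausible DOS-era candidate; it is
# not confirmed to be identical to the WP4.2 table shipped in libwpd.
reinterpreted = raw.decode("cp437")

print(raw.hex(" "))    # the first byte is a0, as seen in the parser
print(reinterpreted)   # accented Latin letters instead of †ÖÑÉáÇ
```

The first recovered byte is 0xa0, the same value the parser sees, and under CP437 the six bytes decode to the accented letters áàäâçé, which is close to the expected result. That is consistent with the Mac Roman table being applied to bytes that were meant for a different table, but it does not prove which table the filter should use.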