158793 – EDITING: WordPerfect 4.2 import shows wrong extended characters

Summary: EDITING: WordPerfect 4.2 import shows wrong extended characters

Status:	NEW

Alias:	None

Product:	Document Liberation Project
Classification:	Unclassified
Component:	General
Version: (earliest affected)	unspecified
Hardware:	All All

Importance:	medium normal
Assignee:	Not Assigned

URL:	https://sourceforge.net/p/libwpd/tick...
Whiteboard:	libwpd
Keywords:

Depends on:
Blocks:

Reported:	2023-12-20 13:27 UTC by em36
Modified:	2024-07-26 06:55 UTC
CC List:	2 users

See Also:	162186
Crash report or crash signature:

Attachments
WordPerfect 4.2 file with extended characters (72 bytes, text/plain) 2023-12-20 13:28 UTC, em36

Description em36 2023-12-20 13:27:28 UTC

Description:
WordPerfect 4.2 documents with extended (non-US-ASCII) characters display the wrong characters in LibreOffice. The correct characters appear when the same documents are opened in WordPerfect for Windows.

The libwpd import filter includes a conversion table for WP4.2 extended characters, but doesn't seem to use it.

Sample file (will be attached after filing the report)
https://www.dropbox.com/scl/fi/91jqfwdtw1akouxhw7d6k/EXTENDED.WPD?rlkey=hhaioasc48ycd1of2hm3cstiu&dl=0

Steps to Reproduce:
1. Open the linked file.



Actual Results:
LibreOffice displays †ÖÑÉáÇ, etc.

Expected Results:
LibreOffice should display àáäâç etc


Reproducible: Always


User Profile Reset: Yes

Additional Info:
This probably never worked correctly, but here is the version tested.

Version: 7.6.0.3 (X86_64) / LibreOffice Community
Build ID: 69edd8b8ebc41d00b4de3915dc82f8f0fc3b6265
CPU threads: 8; OS: Mac OS X 14.2.1; UI render: Skia/Metal; VCL: osx
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

Comment 1 em36 2023-12-20 13:28:09 UTC

Created attachment 191531
WordPerfect 4.2 file with extended characters

Comment 2 em36 2023-12-20 13:36:03 UTC

And, to state the obvious, the WP4.2 character table is in the file libwpd_internal.cpp in the libwpd code. It seems to be correct, though I haven't checked it thoroughly. However, it seems not to be used during import.

Comment 3 raal 2023-12-30 18:21:22 UTC

Confirm with Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 71c28942fbc7f36e5bcd46c5a6cdfbb3fcbcd6a0
CPU threads: 4; OS: Linux 6.2; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded

and Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)

Comment 4 Mike Kaganski 2024-07-26 06:55:05 UTC

(In reply to em36 from comment #2)
> It seems not to be used, however, during import.

It is definitely used. The question is how correctly.

The libwpd code parses the first character in the file and finds an internal extended character 0xa0 (in WP1Parser::parseDocument). It calls WP1ExtendedCharacterGroup::parse, then WP1ContentListener::insertExtendedCharacter. There, the character is mapped to the UCS4 character U+2020 (†, dagger) using macRomanCharacterMap. The result is then simply converted to UTF-8 and passed to LibreOffice.

The library does something wrong.
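
The mapping described in comment 4 can be reproduced outside libwpd. The following is a minimal sketch, not libwpd code. It relies on assumptions: only the byte 0xa0 is confirmed in this report, the other byte values are chosen for illustration, and the guess that the file's extended characters are encoded in IBM code page 437 is a hypothesis suggested by the expected output, not something stated in the thread. The helper name decode_as is hypothetical. Python's built-in mac_roman and cp437 codecs stand in for the character tables.

```python
# Illustration only: decode the same bytes with two different character maps.
# 0xA0 is the byte reported in comment 4; the remaining bytes are assumed
# sample values, not taken from the attached file.
raw = bytes([0xA0, 0x85, 0x84, 0x83, 0x87])


def decode_as(data: bytes, codec: str) -> str:
    """Decode single-byte extended characters with the named codec."""
    return data.decode(codec)


# Mac Roman maps 0xA0 to U+2020 (dagger), matching what the report shows.
as_mac_roman = decode_as(raw, "mac_roman")

# Code page 437 (assumed source encoding) maps the same bytes to accented
# lowercase letters like the ones listed under "Expected Results".
as_cp437 = decode_as(raw, "cp437")

print(as_mac_roman)  # †ÖÑÉá
print(as_cp437)      # áàäâç
```

If this assumption holds, the same bytes give the characters listed under "Actual Results" when read through a Mac Roman table, and the accented letters under "Expected Results" when read through a code page 437 table. That would point at the choice of table rather than the UTF-8 conversion step.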