Description:
WordPerfect 4.2 documents with extended (non-US-ASCII) characters display the wrong characters in LibreOffice. The correct characters appear when the documents are opened in WordPerfect for Windows. The libwpd import filter includes a conversion table for WP4.2 extended characters, but doesn't seem to use it.

Sample file (will be attached after filing the report):
https://www.dropbox.com/scl/fi/91jqfwdtw1akouxhw7d6k/EXTENDED.WPD?rlkey=hhaioasc48ycd1of2hm3cstiu&dl=0

Steps to Reproduce:
1. Open the linked file.

Actual Results:
LibreOffice displays †ÖÑÉáÇ, etc.

Expected Results:
LibreOffice should display àáäâç, etc.

Reproducible: Always

User Profile Reset: Yes

Additional Info:
This probably never worked correctly, but here is the version tested.

Version: 7.6.0.3 (X86_64) / LibreOffice Community
Build ID: 69edd8b8ebc41d00b4de3915dc82f8f0fc3b6265
CPU threads: 8; OS: Mac OS X 14.2.1; UI render: Skia/Metal; VCL: osx
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Created attachment 191531
WordPerfect 4.2 file with extended characters
And, to say what is obvious, the WP4.2 character table is in the file libwpd_internal.cpp in the libwpd code. It seems to be correct, though I haven't checked it thoroughly. It seems not to be used, however, during import.
Confirmed with

Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 71c28942fbc7f36e5bcd46c5a6cdfbb3fcbcd6a0
CPU threads: 4; OS: Linux 6.2; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded

and

Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)
(In reply to em36 from comment #2)
> It seems not to be used, however, during import.

It is definitely used. The question is how correctly.

The libwpd code parses the first character in the file and finds an internal extended character 0xa0 (in WP1Parser::parseDocument). It calls WP1ExtendedCharacterGroup::parse, then WP1ContentListener::insertExtendedCharacter, where the character is mapped to the UCS4 character U+2020 (†, dagger) using macRomanCharacterMap. The result is then simply converted to UTF-8 and passed to LibreOffice. The library does something wrong.
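For anyone who wants to reproduce the mapping outside libwpd, here is a minimal Python sketch. It only uses Python's built-in codecs, not libwpd itself. Only the first byte (0xa0) comes from the trace above. The remaining byte values are an assumption, back-computed from the "Actual Results" string, and the use of code page 437 as the intended table is likewise a guess that happens to match the "Expected Results" characters. Neither has been verified against the sample file or against the WP4.2 table in libwpd_internal.cpp.

```python
# Assumed byte values: 0xA0 is taken from the trace in this comment;
# the rest are inferred from the reported output, not read from the file.
raw = bytes([0xA0, 0x85, 0x84, 0x83, 0x87, 0x82])

# What the import apparently does: treat each byte as Mac Roman.
as_mac_roman = raw.decode("mac_roman")

# Hypothetical alternative: treat the same bytes as IBM PC code page 437.
as_cp437 = raw.decode("cp437")

# The Mac Roman result for 0xA0 is U+2020 (dagger), as in the trace;
# it is then serialized as UTF-8 before being handed on.
dagger_utf8 = as_mac_roman[0].encode("utf-8")

print(as_mac_roman)        # †ÖÑÉáÇ
print(as_cp437)            # áàäâçé
print(dagger_utf8.hex())   # e280a0
```

The first line of output matches the string in "Actual Results", and the second contains the characters listed in "Expected Results" (in byte order, so the first two are swapped relative to the report). That is consistent with the wrong character map being selected, though it does not by itself show where in the library the choice goes wrong.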