Bug 63721 - Add Support for Internal charset of Works 2
Summary: Add Support for Internal charset of Works 2
Status: RESOLVED FIXED
Alias: None
Product: Document Liberation Project
Classification: Unclassified
Component: General
Version: unspecified (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: osnola
URL:
Whiteboard: target:5.0.0
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-19 11:11 UTC by Urmas
Modified: 2015-05-19 08:29 UTC



Attachments
Codepage 1251 (934 bytes, application/vnd.ms-works)
2013-04-19 11:11 UTC, Urmas
Codepage 1252 (930 bytes, application/vnd.ms-works)
2013-04-19 11:12 UTC, Urmas
Screenshot (12.54 KB, image/png)
2013-06-26 11:02 UTC, Urmas

Description Urmas 2013-04-19 11:11:51 UTC
Created attachment 78231
Codepage 1251

Works for Windows 2.0 uses some kind of internal charset for storing documents.

It is not recognized while opening them with LO.

Additionally, standard font suffixes CE or Cyr are not used.
Comment 1 Urmas 2013-04-19 11:12:21 UTC
Created attachment 78232
Codepage 1252
Comment 2 Joel Madero 2013-05-02 14:48:37 UTC
@Urmas - is this an enhancement request ? Do we claim to support this anywhere? Also when you say we don't support them, the file opens so what should I be looking for to see that we don't support them?
Comment 3 Urmas 2013-05-04 21:12:56 UTC
The file is opened, but in the invalid encoding, so that is a bug.

As is evident from the file names, the correct document contents are representations of the 1251 and 1252 codepages.
Comment 4 Thomas Hackert 2013-06-20 08:32:01 UTC
Hello Urmas, *,
I still do not seem able to understand what you want us to look for ... :(

If I open your attached documents, they are set to "English (USA)" and only the font seems to be different. In the first document I see "Courier New Cyr" as font, in the second one it is set to "Courier".

What exactly is the bug here? Which OS are you using (really Win 2.0, as mentioned in Comment #0? This is rather old, isn't it ;? )?

Tested with LO Version: 4.1.0.1
Build ID: 1b3956717a60d6ac35b133d7b0a0f5eb55e9155 under Debian Testing AMD64 with installed Germanophone lang- as well as helppack ... ;)

If you could provide us with further information (be it step-by-step instructions, screenshots, or the like), we can test it ... ;)
TIA
Thomas.
Comment 5 retired 2013-06-25 23:07:12 UTC
Setting to NEEDINFO (also as of Comment 4). Been reading through this. Also not sure what to make of this.

Urmas: I think it would be easiest to post a screenshot showing the correct and the bugged version and mark the difference to look for. Then this should be easy to confirm.
Comment 6 Urmas 2013-06-26 11:02:16 UTC
Created attachment 81462
Screenshot

The upper image is a LO screenshot.
The lower image is a CP1251 screenshot.

As you can see they are not identical.

Also, note that LO is displaying Western characters despite them being formatted with a Cyrillic font.
Comment 7 osnola 2013-06-28 08:00:47 UTC
Hello,
currently, libwps uses only the DOS_850 encoding for Microsoft Works MSDos 1-3 and Microsoft Works Windows 2.0 (*).

We can probably use the same method as in Microsoft Works Windows 3.0 to check for different encodings in MSDos 3 and Windows 2 files (but revert to DOS_850 if no different encoding is found). Urmas, can you check whether the function unicodeFromCP1251, visible in https://sourceforge.net/p/libwps/code/ci/master/tree/src/lib/libwps_tools_win.cpp,
defines the correct encoding?
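
To make the "check for a different encoding, else revert to DOS_850" idea concrete, here is a minimal Python sketch. It assumes the font-name suffix (such as Cyr or CE, mentioned in the description) is the signal; the helper names and the suffix table are hypothetical and are not the libwps API.

```python
# Hypothetical sketch: names and the suffix table are illustrative, not libwps code.
FONT_SUFFIX_TO_ENCODING = {
    " cyr": "cp1251",  # Cyrillic fonts, e.g. "Courier New Cyr"
    " ce": "cp1250",   # Central European fonts, e.g. "Arial CE"
}
DEFAULT_ENCODING = "cp850"  # the DOS_850 fallback named above


def pick_encoding(font_name: str) -> str:
    """Choose a codec from the font-name suffix, else fall back to cp850."""
    lowered = font_name.strip().lower()
    for suffix, encoding in FONT_SUFFIX_TO_ENCODING.items():
        if lowered.endswith(suffix):
            return encoding
    return DEFAULT_ENCODING


def decode_run(raw: bytes, font_name: str) -> str:
    """Decode one text run with the codec chosen for its font."""
    return raw.decode(pick_encoding(font_name), errors="replace")
```

With this table, a run formatted with "Courier New Cyr" is decoded as CP1251, while a run formatted with plain "Courier" still goes through CP850.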

    osnola


(*) in fact, libwps does not know how to differentiate a Microsoft Works MSDos 3 file from a Microsoft Works Windows 2.0 file :-~

Note:
- it would probably be more appropriate to post this bug at https://sourceforge.net/p/libwps/bugs/
Comment 8 Urmas 2013-08-03 15:11:19 UTC
I was unable to create any bugs for libwps on SourceForge.

The codepage is stored at the word @10h, shifted 4 bits left.

The encoding of Windows text depends on its font, so it's not mutually exclusive with Windows codepages.

There should be an OEM->ANSI codepage correspondence table to implement the conversion properly.

So it should be File--(866,437,etc)-->Windows encoding--(1250,1251,etc)-->Unicode.
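
The two-stage chain above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the OEM->ANSI pairing table is a guess based on common Windows locale defaults, the function names are hypothetical, and the byte-to-byte OEM->ANSI step is emulated by going through Python's built-in codecs rather than a real Works table.

```python
# Assumed OEM -> ANSI codepage pairing (common Windows locale defaults).
OEM_TO_ANSI = {437: 1252, 850: 1252, 852: 1250, 866: 1251}


def oem_to_ansi_bytes(raw: bytes, oem_cp: int) -> bytes:
    """Stage 1: map file bytes (OEM codepage) to Windows (ANSI) bytes."""
    ansi_cp = OEM_TO_ANSI[oem_cp]
    text = raw.decode(f"cp{oem_cp}", errors="replace")
    return text.encode(f"cp{ansi_cp}", errors="replace")


def ansi_bytes_to_unicode(ansi: bytes, font_cp: int) -> str:
    """Stage 2: interpret the Windows bytes using the font's codepage."""
    return ansi.decode(f"cp{font_cp}", errors="replace")


# Two CP866 bytes for the first two Cyrillic capital letters.
ansi = oem_to_ansi_bytes(b"\x80\x81", 866)
cyrillic = ansi_bytes_to_unicode(ansi, 1251)  # Cyrillic font
western = ansi_bytes_to_unicode(ansi, 1252)   # Western font
```

The second stage is where the font matters: the same Windows bytes read as Cyrillic letters under 1251 and as accented Latin letters under 1252.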

The creator version (@2h) > 10000 could be used to determine whether a v.2 file was created in Windows.
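
A possible reading of the two header fields, as a Python sketch. Several points are assumptions rather than facts from this report: both fields are taken to be 16-bit little-endian words, and "shifted 4 bits left" is taken to mean that the stored word equals the codepage number shifted left by 4. The function names are hypothetical.

```python
import struct


def read_header_hints(header: bytes) -> tuple[int, int]:
    """Return (creator_version, codepage_guess) from the start of a file."""
    if len(header) < 0x12:
        raise ValueError("header too short")
    # Assumption: 16-bit little-endian words at offsets 0x02 and 0x10.
    (creator,) = struct.unpack_from("<H", header, 0x02)
    (raw_cp,) = struct.unpack_from("<H", header, 0x10)
    # Assumption: stored value == codepage << 4, so undo the shift.
    return creator, raw_cp >> 4


def created_in_windows(creator: int) -> bool:
    """Apply the 'creator version > 10000' rule suggested above."""
    return creator > 10000
```

The values would still need to be checked against real Works 2 files before relying on either rule.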
Comment 9 osnola 2013-08-04 08:50:16 UTC
Hello Urmas,
>I was unable to create any bugs for libwps on SourceForge.
I just checked the permissions of libwps on SourceForge; it should probably work now...

> The codepage is stored at the word @10h, shifted 4 bits left. ... Unicode.
I am not sure that I understand this part, so I will try to contact you by email.

> The creator version (@2h) > 10000 could be used to determine whether v.2 file
> was created in Windows.
Ok, but what happens if a Windows version exports the file as a DOS file: does it set the creator version to a number 0,1,da1 or to a Windows version number?
Comment 10 QA Administrators 2014-06-01 21:30:49 UTC
Dear Bug Submitter,

This bug has been in NEEDINFO status with no change for at least 6 months. Please provide the requested information as soon as possible and mark the bug as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in NEEDINFO status with no change in 30 days the QA team will close the bug as INVALID due to lack of needed information.

For more information about our NEEDINFO policy please read the wiki located here: 
https://wiki.documentfoundation.org/QA/FDO/NEEDINFO

If you have already provided the requested information, please mark the bug as UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed.


Thank you for helping us make LibreOffice even better for everyone!


Warm Regards,
QA Team
Comment 11 QA Administrators 2014-07-08 17:18:34 UTC
Dear Bug Submitter,

Please read this message in its entirety before proceeding.

Your bug report is being closed as INVALID due to inactivity and a lack of information which is needed in order to accurately reproduce and confirm the problem. We encourage you to retest your bug against the latest release. If the issue is still present in the latest stable release, we need the following information (please ignore any that you've already provided):

a) Provide details of your system including your operating system and the latest version of LibreOffice that you have confirmed the bug to be present

b) Provide easy to reproduce steps – the simpler the better

c) Provide any test case(s) which will help us confirm the problem

d) Provide screenshots of the problem if you think it might help

e) Read all comments and provide any requested information

Once all of this is done, please set the bug back to UNCONFIRMED and we will attempt to reproduce the issue. 
Please do not:
a) respond via email 
b) update the version field in the bug or any of the other details on the top section of FDO
Comment 12 Urmas 2014-07-09 02:43:31 UTC
Still present in master.
Comment 13 QA Administrators 2014-07-09 03:06:16 UTC
Back to UNCONFIRMED - never confirmed by an independent person.
Comment 14 steve 2014-10-28 00:20:15 UTC
This is a bug where a dev has to read through 5 or more comments to get an idea of what the issue even is. From what I understand, opening the test file Codepage 1251 with LO should produce the same result as seen in the bottom part of the screenshot.

I can confirm this is not the case, thus setting this bug to NEW.

I'm not sure if Works for Windows 2 files are something LO should be supporting, or if that is anything dev time should be spent on. Maybe the dev department can elaborate on that, please.
Comment 15 Joel Madero 2014-10-28 01:56:41 UTC
Seems pretty obvious that this is an enhancement request. Marking as such.
Comment 16 osnola 2015-02-08 15:46:15 UTC
(In reply to Urmas from comment #12)
> Still present in master.

I am not sure that I understand:
- the first file: 1251.wps contains the characters:
> b020b120b220b320b420b520b620b720b820b920ba20bb20bc20bd20be20bf200d0a
> c020c120c220c320c420c520c620c720c820c920ca20cb20cc20cd20ce20cf200d0a
> ff20f620f720d020fd208320d220d320f020d420f2201120d620d720d820f4200d0a
> d920da20f820fb20a320db20dc20f920f120fc20f3201020df20fa20fe20f5200d0a
> 80208120822083208420852086208720882089208a208b208c208d208e208f200d0a
> 90209120922093209420952096209720982099209a209b209c209d209e209f200d0a
> a020a120a220a320a420a520a620a720a820a920aa20ab20ac20ad20ae20af200d0a
> e020e120e220e320e420e520e620e720e820e920ea20eb20ec20ed20ee20ef200d0a
in LibreOffice 4.4, it is converted using CP1251
http://en.wikipedia.org/wiki/Windows-1251 
as the font is 'Courier New Cyr' (which seems normal to me ),
- the cp12511252 picture seems to show, in its upper part, the old libwps
  output for 1251.wps (converted using DOS Latin CP850) and, in its lower
  part, the CP1251 table.

So is the lower picture of cp12511252 a real picture of the 1251.wps
document, or simply a picture of the CP1251 table?
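
The difference between the two readings can be reproduced on a single row of the 1251.wps dump above. This is only an illustrative Python sketch using the standard codecs; the helper name is hypothetical and it does not call libwps.

```python
# Second row of the 1251.wps dump quoted above (bytes separated by spaces).
ROW_HEX = "c020c120c220c320c420c520c620c720c820c920ca20cb20cc20cd20ce20cf200d0a"


def decode_row(row_hex: str, encoding: str) -> str:
    """Decode one hex-dumped row and drop the trailing CR/LF."""
    raw = bytes.fromhex(row_hex)
    return raw.decode(encoding, errors="replace").rstrip("\r\n")


as_cp1251 = decode_row(ROW_HEX, "cp1251")  # what LibreOffice 4.4 does here
as_cp850 = decode_row(ROW_HEX, "cp850")    # what the old libwps did

print(as_cp1251)
print(as_cp850)
```

Under CP1251 the row comes out as the first sixteen Cyrillic capital letters, while under CP850 the same bytes land mostly in the box-drawing range.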

Concerning 1252.wps, which contains the characters:
> 5f205f2027205f2022203a20c520d8205f2025205f203c205f205f205f205f200d0a
> 5f20272027202220222007202d202d205f2054205f203e205f205f205f205f200d0a
> ff20f620f7205f20fd205f20b3201520f0206320f2203c20bf202d205220f4200d0a
> f8202b205f205f205f20e7201420fa20f120fc20f3203e205f205f205f20f5200d0a
> 80208120822083208420852086208720882089208a208b208c208d208e208f200d0a
> 90209120922093209420952096209720982099209a209b209c209d209e209f200d0a
> a020a120a220a320a420a520a620a720a820a920aa20ab20ac20ad20ae20af200d0a
> e020e120e220e320e420e520e620e720e820e920ea20eb20ec20ed20ee20ef200d0a
in LibreOffice 4.4, it is converted using the encoding CP850; a new
version of libwps (when it is released) will allow changing the
encoding back to CP1252 or to another encoding...