Bug 118796 - Lotus 1-2-3(Japanese) garbled characters
Summary: Lotus 1-2-3(Japanese) garbled characters
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 5.4.0.0.beta1
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, regression
Depends on:
Blocks: CJK-Japanese
Reported: 2018-07-17 08:51 UTC by baffclan
Modified: 2019-02-10 23:10 UTC (History)
CC List: 4 users

See Also:
Crash report or crash signature:
Regression By:


Attachments
Screenshots, Import Lotus Files Dialog (126.32 KB, image/jpeg)
2018-07-17 08:54 UTC, baffclan
Screenshots(AOO vs. LibO) (275.47 KB, image/jpeg)
2018-07-19 10:25 UTC, baffclan
new result (23.63 KB, application/vnd.oasis.opendocument.spreadsheet)
2018-08-01 12:22 UTC, osnola

Description baffclan 2018-07-17 08:51:52 UTC
Description:
Japanese Lotus 1-2-3 files open with garbled characters.
The "Character set:" dropdown does not offer the correct charset.


Steps to Reproduce:
1. Drop a Japanese Lotus 1-2-3 file onto the LibO icon on the Desktop
2. The LibO window appears with the "Import Lotus file" dialog
3. Open the "Character set:" dropdown


Actual Results:
4a. The only Japanese entry is Japanese(Windows-932)
4b. Characters are garbled when the file is opened


Expected Results:
4. Japanese(Shift-JIS) should be offered for Japanese files



Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 6.2.0.0.alpha0+
Build ID: fa62b9c4b857eab162282972bc33d2aa001f73e4
CPU threads: 4; OS: Windows 10.0; UI render: default; 
TinderBox: Win-x86@62-TDF, Branch:MASTER, Time: 2018-07-09_14:20:29
Locale: ja-JP (ja_JP); Calc: group threaded

Version: 6.1.0.1
Build ID: 378e26bd4f22a135cef5fa17afd5d4171d8da21a
CPU threads: 4; OS: Windows 10.0; UI render: default; 
Locale: ja-JP (ja_JP); Calc: group threaded


AOO offers Japanese(Shift-JIS), and there are no garbled characters.
AOO420m1(Build:9800)  -  Rev. 1835436
Comment 1 baffclan 2018-07-17 08:54:17 UTC
Created attachment 143589 [details]
Screenshots, Import Lotus Files Dialog
Comment 2 m.a.riosv 2018-07-17 22:06:38 UTC
A screenshot doesn't help for testing; please attach a sample test file and set the status to unconfirmed again.
Comment 3 baffclan 2018-07-18 11:58:26 UTC
(In reply to m.a.riosv from comment #2)
The file I tried is a business file, so I cannot attach it.
I will look for a file that I can attach.
Comment 4 baffclan 2018-07-19 10:25:51 UTC
Created attachment 143639 [details]
Screenshots(AOO vs. LibO)

(In reply to m.a.riosv from comment #2)
> A screenshot doesn't help for testing; please attach a sample test file and
> set the status to unconfirmed again.
I found a file you can try.

STR:
1. Open the Apache OpenOffice Forum
2. https://forum.openoffice.org/ja/forum/viewtopic.php?f=10&t=651#p2353
3. Download bugdoc.zip and extract it

LibO:
4. Drop the bugdoc.123 file onto the LibO icon on the Desktop
5. The LibO window appears with the "Import Lotus file" dialog
6. Open the "Character set:" dropdown
7. Select Japanese(Windows-932)
8. Characters are garbled when the file is opened

AOO:
4. Drop the bugdoc.123 file onto the AOO icon on the Desktop
5. The AOO window appears with the "Import Lotus file" dialog
6. Open the "Character set:" dropdown
7. Select Japanese(Shift-JIS)
8. No garbled characters when the file is opened
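
For readers unfamiliar with this kind of garbling: the sketch below shows, in general terms, how text becomes unreadable when bytes are decoded with a charset other than the one used to write them. It is not the import filter's code; the sample string and the choice of Latin-1 as the "wrong" charset are assumptions made only for illustration.

```python
# Illustration only: decoding Shift-JIS bytes with the wrong charset.
# The sample text and the "wrong" charset (Latin-1) are assumptions.

def decode_cells(raw: bytes, charset: str) -> str:
    """Decode raw cell bytes with the charset chosen in an import dialog."""
    return raw.decode(charset)

original = "日本語"
raw_bytes = original.encode("shift_jis")      # bytes as stored in a file

correct = decode_cells(raw_bytes, "shift_jis")  # right charset
garbled = decode_cells(raw_bytes, "latin-1")    # wrong charset: mojibake

print(correct)
print(garbled)
```

The bytes themselves are intact in both cases; only the interpretation differs, which is why picking the right entry in the "Character set:" dropdown matters.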
Comment 5 m.a.riosv 2018-07-19 10:41:40 UTC
Confirmed.
It looks like the Japanese filters available in AOO are missing here.
Comment 7 himajin100000 2018-07-23 07:40:00 UTC Comment hidden (obsolete)
Comment 8 himajin100000 2018-07-23 07:48:44 UTC Comment hidden (obsolete)
Comment 9 Norbert Thiebaud 2018-07-23 14:57:19 UTC
confusion between the bibisect commit and the underlying commit.

the source commit is

commit b0067c89e6b2a4e29465d9da9a731ae30a66dce6
Author: osnola <alonso@loria.fr>
Date:   Sun Apr 2 09:42:59 2017 +0200

    libwps import filter improvements
    
      + some astyle modifications,
      + add .wk4 and .123 to the list of file extensions,
      + add support to open Lotus files protected by a password.
    
    Change-Id: I94d4afffd73f0999ff2b1958704cb3985fcd0cc9
Comment 10 himajin100000 2018-07-23 16:24:04 UTC Comment hidden (obsolete)
Comment 11 osnola 2018-07-24 08:10:51 UTC
Hello,
yes, currently the libwps filter does not support Shift-JIS. One solution is to choose the old filter, "Lotus-1-2-3", in the filter dialog; it should still exist :-~

Later, it is possible to add support for the Shift-JIS encoding in libwps, but I will need a file which contains some SJIS characters and its corresponding screenshot (to check that the conversion is valid).

If I must compare the old filter and the libwps filter: clearly the libwps filter supports fewer encodings, while it should be able to open password-protected files, retrieve more styles, ...

Note: There is also a big problem in the old filter. A 123 file can contain (and does contain) many substreams, each of which can be stored as one consecutive block or split into several sub-blocks. The list of streams is stored at the end of the 123 file, and the file begins with the main stream, which contains the sheet's data, styles, ... (which explains why the old filter often works). But if a chart or a picture is inserted in a spreadsheet, the content of the file becomes:
  main Zone (part 1)
  picture Zone
  main Zone (part 2)
  ...
  metadata Zone
  file structure (list of stream)
(So the old parser will begin to parse main Zone (part 1) and will probably stop somewhere in the picture zone.)
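
The layout described above can be illustrated with a toy model. This is only a sketch: the byte contents, the directory of (offset, length) fragments, and all function names are hypothetical and do not reflect the real .123 binary format, which is not specified in this report.

```python
# Toy model of a file whose main stream is split around a picture zone.
# Everything here (contents, directory shape, names) is hypothetical.
from typing import Dict, List, Tuple

Fragment = Tuple[int, int]  # (offset, length) of one sub-block


def build_toy_file() -> Tuple[bytes, Dict[str, List[Fragment]]]:
    """Lay out zones in file order and record where each stream's parts are."""
    zones = [
        ("main", b"MAIN-part1;"),
        ("picture", b"PICTUREDATA;"),
        ("main", b"MAIN-part2;"),
        ("metadata", b"META;"),
    ]
    data = b""
    directory: Dict[str, List[Fragment]] = {}
    for name, payload in zones:
        directory.setdefault(name, []).append((len(data), len(payload)))
        data += payload
    return data, directory


def read_stream(data: bytes, directory: Dict[str, List[Fragment]], name: str) -> bytes:
    """Reassemble a stream by joining its fragments in directory order."""
    return b"".join(data[off:off + length] for off, length in directory[name])


def naive_sequential_read(data: bytes, expected_length: int) -> bytes:
    """Read from offset 0 as if the main stream were one consecutive block."""
    return data[:expected_length]


data, directory = build_toy_file()
main_stream = read_stream(data, directory, "main")
naive = naive_sequential_read(data, len(main_stream))
print(main_stream)  # both parts of the main zone, joined
print(naive)        # part 1 followed by picture bytes
```

In this model the sequential reader runs into picture bytes right after part 1, while the directory-based reader recovers both parts of the main zone.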
Comment 12 osnola 2018-08-01 12:22:54 UTC
Created attachment 143883 [details]
new result

I modified libwps (see https://sourceforge.net/p/libwps/code/ci/42a5c4e27e3b2c7e4de1bcf045ede9a8737f559d/ ), so it should now be able to use the Japanese(Shift-JIS) encoding. I still need to do the same for the wps v1-v4 filter and the new filter for QuattroPro qpw files.

Notes:
- concerning https://forum.openoffice.org/ja/forum/viewtopic.php?f=10&t=651#p2353, I am
  not sure that the problem pointed out in this thread is fixed, i.e. when I opened this
  file with OpenOffice and LibreOffice (using the old parser), saving the result seemed
  to take forever; but my versions of OpenOffice and LibreOffice are not the latest
  ones...
- concerning libwps, I guess that we will need to use libicu to read more encodings,
  ...
Comment 13 baffclan 2018-08-12 05:09:14 UTC
(In reply to osnola from comment #12)

I chose Japanese(Windows-932) in Win-x86_64@42, and after opening the file, there were no garbled characters.

Version: 6.2.0.0.alpha0+ (x64)
Build ID: 0a1a4ffb4f87adff7fbbbc60202b6a0e42fedd0c
CPU threads: 4; OS: Windows 10.0; UI render: default; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2018-08-08_23:17:46
Locale: ja-JP (ja_JP); Calc: group threaded
Comment 14 osnola 2018-08-12 07:54:41 UTC
Hello,
yes, normally this should be OK in master, i.e. I released a new version of libwps which should allow parsing the CP932 and CP950 encodings in its different filters, and David integrated it with https://gerrit.libreoffice.org/#/c/58704/ .

Note:
    CP_037, CP_424, CP_437, CP_500, CP_737, CP_775,
    DOS_850, CP_852, CP_855, CP_856, CP_857, CP_858, CP_860,
    CP_861, CP_862, CP_863, CP_864, CP_865, CP_866,
    CP_869, CP_874, CP_878, CP_875, CP_932, CP_936, CP_950,
    CP_970, CP_1006, CP_1026,
    EUC_JP,
    ISO_2022_CN, ISO_2022_JP, ISO_2022_KR,
    WIN3_ARABIC, WIN3_BALTIC, WIN3_CEUROPE,
    WIN3_CYRILLIC, WIN3_GREEK, WIN3_HEBREW, WIN3_TURKISH,
    WIN3_VIETNAMESE, WIN3_WEUROPE,
    LICS,

    MAC_ARABIC, MAC_CELTIC, MAC_CEUROPE, MAC_CROATIAN,
    MAC_CYRILLIC, MAC_DEVANAGA, MAC_FARSI, MAC_GAELIC,
    MAC_GREEK, MAC_GUJARATI, MAC_GURMUKHI, MAC_HEBREW,
    MAC_ICELAND, MAC_INUIT, MAC_ROMAN, MAC_ROMANIAN,
    MAC_THAI, MAC_TURKISH,
Comment 15 osnola 2018-08-12 08:00:21 UTC
Oops...

Note:
- I also created a branch of libwps to test the integration with libicu
  (https://sourceforge.net/p/libwps/code/ci/testICU/tree/),
  which may be able to read:
    CP_037, CP_424, CP_437, CP_500, CP_737, CP_775,
    DOS_850, CP_852, CP_855, CP_856, CP_857, CP_858, CP_860,
    CP_861, CP_862, CP_863, CP_864, CP_865, CP_866,
    CP_869, CP_874, CP_878, CP_875, CP_932, CP_936, CP_950,
    CP_970, CP_1006, CP_1026,
    CP_1250, CP_1251, CP_1252, CP_1253, CP_1254,
    CP_1255, CP_1256, CP_1257, CP_1258,
    EUC_JP,
    ISO_2022_CN, ISO_2022_JP, ISO_2022_KR,

    MAC_ARABIC, MAC_CELTIC, MAC_CEUROPE, MAC_CROATIAN,
    MAC_CYRILLIC, MAC_DEVANAGA, MAC_FARSI, MAC_GAELIC,
    MAC_GREEK, MAC_GUJARATI, MAC_GURMUKHI, MAC_HEBREW,
    MAC_ICELAND, MAC_INUIT, MAC_ROMAN, MAC_ROMANIAN,
    MAC_THAI, MAC_TURKISH,
  If you see that some encodings are missing, I can add them...
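
For anyone who wants to prepare test data in some of the encodings named in this thread, here is a small sketch using Python's standard codecs. The mapping from the labels above to Python codec names is an assumption made for illustration; it is not taken from libwps or libicu, and it covers only a few of the labels.

```python
# Hypothetical mapping from a few labels used in this thread to Python
# codec names; the pairing is an assumption, not libwps/libicu behaviour.
import codecs
from typing import Dict, Optional

HYPOTHETICAL_NAME_MAP: Dict[str, str] = {
    "CP_437": "cp437",
    "CP_932": "cp932",
    "CP_950": "cp950",
    "CP_1252": "cp1252",
    "EUC_JP": "euc_jp",
    "ISO_2022_JP": "iso2022_jp",
    "MAC_ROMAN": "mac_roman",
}


def available_codecs(name_map: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the canonical Python codec name per label, or None if unknown."""
    found: Dict[str, Optional[str]] = {}
    for label, py_name in name_map.items():
        try:
            found[label] = codecs.lookup(py_name).name
        except LookupError:
            found[label] = None
    return found


codec_table = available_codecs(HYPOTHETICAL_NAME_MAP)
for label, canonical in codec_table.items():
    print(label, "->", canonical)

# Sample bytes for a Japanese test cell (the text itself is an assumption).
sample = "日本語".encode(HYPOTHETICAL_NAME_MAP["CP_932"])
```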
Comment 16 baffclan 2018-12-21 12:19:21 UTC
Cannot reproduce with LibreOffice 6.2.0 RC1.
No problem with the release version (RC1).

Version: 6.2.0.1 (x64)
Build ID: 0412ee99e862f384c1106d0841a950c4cfaa9df1
CPU threads: 4; OS: Windows 10.0; UI render: default; VCL: win; 
Locale: ja-JP (ja_JP); UI-Language: en-US
Calc: threaded
Comment 17 baffclan 2019-02-09 09:18:47 UTC
Cannot reproduce with LibO 6.2.0R.
Thanks for fixing this!
Comment 18 Xisco Faulí 2019-02-10 23:10:10 UTC
(In reply to baffclan from comment #17)
> Cannot reproduce with LibO 6.2.0R.
> Thanks for fixing this!

Thanks for retesting with the latest version.
Setting to RESOLVED WORKSFORME as the commit fixing this issue hasn't been identified.