Bug 50607 - [TASK, METABUG] FILEOPEN, FILESAVE, FORMATTING : Japanese ruby-character handling is broken
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: Other All
Importance: medium critical
Assignee: Not Assigned
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 49073 44784
Blocks: CJK-METABUG DOCX-BUGS
Reported: 2012-06-02 01:54 UTC by zephyrus00jp
Modified: 2016-12-14 03:32 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Files in various formats created by MS-WORD 2007 and LibreOffice 3.6 (357.12 KB, application/zip)
2012-09-11 13:47 UTC, zephyrus00jp
Compatibility of files that use Ruby characters created by MS-Word 2007 and LibreOffice 3.6 (15.30 KB, application/vnd.oasis.opendocument.spreadsheet)
2012-09-11 13:49 UTC, zephyrus00jp

Description zephyrus00jp 2012-06-02 01:54:28 UTC
Hi,

Someone kindly pointed me to two Bugzilla entries related to
the use of ruby characters in Japanese (and possibly other languages):

Bug 49073 - FILEOPEN: Furigana (ruby text) and characters with them are missing in opened .docx files.


Bug 44784 - FORMATTING: Japanese Ruby Characters require an offset control in Asian Phonetic Guide

I am creating this maser bug entry regarding Ruby character handling for
gathering hints from external sources.

In The Document Foundation's mailing-list archive, there has been a
discussion thread (in Japanese) about the functional deficiencies of
ruby characters in OpenOffice.org (OO) and LibreOffice (LO) with regard
to compatibility between MS Word (.docx) and the .odt format.

http://nabble.documentfoundation.org/MS-Word-td3303042.html#a3987515

It started on Sept 12, 2011 and has continued on and off since then.

In it, the broken compatibility when reading Japanese documents with
ruby characters was raised first and then discussed.

Someone known as "NON" did a comprehensive study using simple test data
with LibreOffice 3.4.3 and Word 2010 under Windows 7. (He/she later
followed up that LO 3.5.4 didn't change the behavior.)

OBSERVATION:

When the original Japanese data was created using LO 3.4.3:

 - doc (Word 97/2000/2003) format
   Word can read the file, the display of ruby characters is OK.
   LO   can read the file, the display of ruby characters is OK.

 - docx (Office Open XML output) format
   Word can NOT read the file :-(
   LO   can     read the file, but ruby characters are gone! :-(

   So LO effectively loses the ruby content, even when only LO is used,
   if the poor user chooses the docx format! :-(

 - docx (Word 2007/2010 XML output) format
   Word can NOT read the file :-(
   LO   can     read the file, but ruby characters are gone! :-(

 - xml (Office 2003 XML) format
   Word can read the file, but ruby characters are gone! :-(
   LO   ditto.

  Stranger still (and worse): even when the file could be read, the
  font size changed from 10.5 pt to 12 pt for no obvious reason, and the
  ruby characters became relatively smaller. (I had also noticed this earlier.)

When the original data was created by Word 2010:

 - docx format
   LO can read the file, but ruby characters are gone! :-(
   Word can read the file and the display of ruby characters is OK.

 - doc format
   LO can read the file, and the display of ruby characters is OK.
   Word can read the file and the display of ruby characters is OK.

 - xml (Word 2003 XML) format
   LO can read the file, but ruby characters are gone! :-(
   Word can read the file and the display of ruby characters is OK.

 - xml (Word XML) format
   LO can't read the file (General I/O error!) :-(
   Word can read the file and the display of ruby characters is OK.

 - odt format
   LO can read the file and the display of ruby characters is OK.
   Word can read the file and the display of ruby characters is OK.

His/her conclusion was that the routine LO uses to parse Word's XML
formats seems to have a problem, while LO's handling of the
conventional DOC format is OK, and ODT produced by Word is also OK.
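Several observations above come down to whether the ruby markup survives in the saved file at all. One rough way to check that independently of how either application renders the document is to count ruby elements in the package XML. This assumes that in .docx, ruby is stored as `w:ruby` elements in `word/document.xml`, and in .odt as `text:ruby` elements in `content.xml`. The sketch below (the function name `count_ruby` is our own, hypothetical choice) only inspects the main body part, not headers, footers, or footnotes, and does not handle binary .doc or the single-file XML formats listed above.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Assumed namespaces of the ruby elements in each format.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Per extension: (main body part inside the ZIP package, qualified ruby tag).
# Headers, footers, footnotes, etc. are deliberately not inspected.
PARTS = {
    ".docx": ("word/document.xml", f"{{{W_NS}}}ruby"),
    ".odt": ("content.xml", f"{{{TEXT_NS}}}ruby"),
}


def count_ruby(path):
    """Hypothetical helper: count ruby elements in the body of a .docx or .odt file."""
    for ext, (part, tag) in PARTS.items():
        if path.lower().endswith(ext):
            with zipfile.ZipFile(path) as zf:
                root = ET.fromstring(zf.read(part))
            # iter() walks the whole tree, so nested runs are counted too.
            return sum(1 for _ in root.iter(tag))
    raise ValueError(f"unsupported file type: {path}")


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(p, count_ruby(p))
```

For example, running it on a file before and after a save round trip through LO gives one count per file; a count that drops to 0 would match the "ruby characters are gone" symptom at the file level, as opposed to a pure display problem.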

BUT, I hope everybody agrees that we have a serious usability problem
in a heterogeneous environment where people pass document files
around.
(I have argued that this problem is a deal killer for OO and LO
in the educational market in Japan.)

For those who wish to fix the problem but are hesitant because they
are unfamiliar with Japanese layout, the W3C's guide
"Requirements for Japanese Text Layout" is a treasure trove for anyone
attempting proper Japanese layout in print, on screen, etc. It is
mentioned in https://bugs.freedesktop.org/show_bug.cgi?id=44784#c6

PS: I was bitten by this bug a few years ago,
when OO didn't take the following into account:
MS Word uses different characters to delimit ruby characters
from the main text characters under different locales.
(I raised this in the now-defunct OO Bugzilla, among other places.)
This was also discussed on The Document Foundation's Japanese mailing
list, and someone pointed out that the following fix
is now in the LibreOffice core:

http://cgit.freedesktop.org/libreoffice/core/commit/?h=libreoffice-3-5&id=b0539229b1a31925d28a13f9bbda6fd672729bd6

It seems it was not put into the later OO core due to the
shuffle of hosting sites, etc. :-(
If the patch had been put into the mainline earlier, the
current problem might have surfaced earlier and thus had a chance of
being fixed earlier. :-(
Comment 1 zephyrus00jp 2012-06-02 02:02:24 UTC
Added 49073 and 44784 to the "Depends on" field.

> I am creating this maser bug entry regarding Ruby character handling for

I meant to say "master" here. I don't intend to supersede the two bug entries;
I just want to add the knowledge from external sources to them.

TIA
Comment 2 Joel Madero 2012-09-07 15:10:24 UTC
Does this "meta" bug contain anything that isn't in 49073 and 44784 or is it just repetition/summarizing those bugs? If it's nothing new I'm going to close it as we don't really use meta bugs even if there are 10+ bugs associated with one issue. Developers in general don't like them.
Comment 3 zephyrus00jp 2012-09-11 13:43:52 UTC
I had difficulty submitting the following comment before the weekend because of
a temporary loss of account access:
--- begin quote ---
I think the problems mentioned in 50607, especially the file compatibility issues with MS Word, are broader than 49073 alone.
But I do agree that we need more files here that back up the problems and symptoms mentioned in 50607.

Over this weekend, I will try to upload files created using MS Word 2007 that show the compatibility issues.

TIA
--- end quote

Now I am uploading a set of files.

One is a table that summarizes what happens when a file created by LibreOffice in a certain format is read by MS Word 2007, and vice versa.

The other is a zip file containing the files used to produce the summary.
(Note that some files contain extra characters, added to explain problems that were noticed during testing.)

Two PNG files show the dialogs displayed by LibreOffice 3.6 and MS Word 2007 when I select a line to add ruby characters above the words on that line.
LibreOffice 3.6 incorrectly breaks the line into words in a very awkward manner.
The breakdown shown in the MS Word 2007 case is the correct one. This can possibly wait until the inability to read document files is corrected.

TIA
Comment 4 zephyrus00jp 2012-09-11 13:47:16 UTC
Created attachment 66972
Files in various formats created by MS-WORD 2007 and LibreOffice 3.6

The files produced in various formats by MS Word 2007 and LibreOffice 3.6

One PDF file was created to show what the page looks like when rendered
correctly, in the MS Word 2007 and LibreOffice 3.6 cases.

Two PNG files show the dialog for entering ruby characters: one from MS Word 2007, and the other from LibreOffice 3.6.
Comment 5 zephyrus00jp 2012-09-11 13:49:33 UTC
Created attachment 66974
Compatibility of files that use Ruby characters created by MS-Word 2007 and LibreOffice 3.6

A summary of what happens when files created in various formats using MS Word 2007 are read by LibreOffice 3.6, and vice versa.
Comment 6 ishikawa 2013-09-11 04:41:41 UTC
Fixed a misspelling in the title.