Bug 120629 - Hebrew numbering sequence choice reverted to 1,2,3... when saving .doc files, export filter issue
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: All, OS: All
Importance: medium normal
Assignee: Justin L
URL:
Whiteboard: target:7.3.0
Keywords:
Depends on:
Blocks: RTL-CTL Numbering-Formats
 
Reported: 2018-10-15 18:00 UTC by Eyal Rozenberg
Modified: 2021-06-24 19:49 UTC
CC List: 6 users

See Also:
Crash report or crash signature:
Regression By:


Attachments
ODT document for reproducing the bug (8.51 KB, application/vnd.oasis.opendocument.text)
2019-10-21 08:35 UTC, Eyal Rozenberg
bug-120629RT2010.doc: this needs to be fixed as an import first (33.50 KB, application/msword)
2020-03-18 07:29 UTC, Justin L

Description Eyal Rozenberg 2018-10-15 18:00:54 UTC
Description:
If you create a numbering style using the Hebrew letter sequence and save the document to a .doc file (Word 97-2003), the style reverts to regular numbers (1,2,3,...) when you reopen it.

Steps to Reproduce:
1. Create a new LO Writer document
2. Set the paragraph direction to RTL (this may not be necessary)
3. Create a new numbering style using Hebrew letters (א,ב,ג,...י,יא,יב,...)
4. Set the paragraph to the new style
5. Save the document in Word 97-2003 format (.doc)
6. Close the document
7. Open the document you saved

Actual Results:
The single paragraph is numbered 1.

Expected Results:
The single paragraph should be numbered א.


Reproducible: Always


User Profile Reset: No



Additional Info:
A numbering style whose name begins with WWNum has likely been created. If you change the numbering sequence in that style, then save, close and reopen, you get the same effect.

Seen with:

Version: 6.2.0.0.alpha0+
Build ID: ad6adb1bfadf49af3187a0bb3ceffbf355e9eed1
CPU threads: 4; OS: Linux 4.9; UI render: default; VCL: gtk2; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2018-09-29_02:45:20
Locale: en-US (en_IL); Calc: threaded

... but I believe this must be an old bug.
Comment 1 Eyal Rozenberg 2018-10-15 18:02:35 UTC
The bug also manifests when you use Arabic letters as the numeral sequence - but _not_ if you use Latin letters (A,B,C,...)
Comment 2 V Stuart Foote 2018-10-17 13:25:42 UTC
And, what happens if saving to OOXML .docx? 

Otherwise, I believe that for bug 66212 we provided native Hebrew numbering [1], but it must be selected, not hand-built as in step 3 of the STR.

Or, is there an issue with that?

=-ref-=
https://cgit.freedesktop.org/libreoffice/core/commit/?id=08fb6d73f8c22d98ab806dd93f4afe3f78b4ff83
Comment 3 Eyal Rozenberg 2018-10-17 13:44:36 UTC
(In reply to V Stuart Foote from comment #2)
> And, what happens if saving to OOXML .docx? 

Oh, I guess I should have mentioned that. This issue does NOT manifest when saving to OOXML/.docx; at least, not that I know of, and not in simple cases like the one described here.

I also don't quite understand what you mean by "selected" as opposed to "hand built as in 3 of STR".
Comment 4 V Stuart Foote 2018-10-17 15:18:55 UTC
OK, confirmed on Windows 10 Home 64-bit en-US with
Version: 6.2.0.0.alpha0+ (x64)
Build ID: b63d48a146c3615f56b6ec83361b3c02ebcbb215
CPU threads: 4; OS: Windows 10.0; UI render: GL; VCL: win; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2018-10-14_01:02:47
Locale: en-US (en_US); Calc: threaded

With Tools -> Language Settings -> Languages: Complex text layout set to Hebrew

1. New custom numbering style
2. select the customize tab
3. number drop list select Hebrew numbering (added for bug 66212 [1])
4. apply the custom style to a series of paragraphs
5. save to .fodt
6. save copy as .docx and also as .doc
7. close

8. open the .docx version in MS Word: exported correctly, with Hebrew numbering intact
9. open the .doc version in MS Word: incorrect export, numbering shows 1.2.3.

10. open the .docx version in Writer, imports correctly as in .fodt
11. open the .doc version in Writer, incorrect 1.2.3. numbering (probably not exported).

So the export filter to .DOC is dropping the numbering. I guess there is additional work to be done. Also related: line numbering that uses Hebrew numbering does not survive export to OOXML or .DOC, so that didn't make it into the export filters either.

@Yossi, back to you?

=-ref-=
[1] https://cgit.freedesktop.org/libreoffice/core/commit/?id=08fb6d73f8c22d98ab806dd93f4afe3f78b4ff83
Comment 5 Xisco Faulí 2018-10-18 10:38:58 UTC
Reproduced back to

Version: 5.4.0.0.alpha0+
Build ID: 08fb6d73f8c22d98ab806dd93f4afe3f78b4ff83
CPU Threads: 4; OS Version: Linux 4.15; UI Render: default; VCL: gtk3; 
Locale: en-US (ca_ES.UTF-8); Calc: group

when bug 66212 was implemented
Comment 6 yossi zahn 2018-10-19 11:53:46 UTC
(In reply to V Stuart Foote from comment #4)
> ...
> 3. number drop list select Hebrew numbering (added for bug 66212)
> ...
> So the export filter to .DOC is dropping the numbering. Guess there is
> additional work to be done. Also related, applying line numbering using the
> Hebrew numbering does not survive export to OOXML or .DOC so that didn't
> make it into the export filters.
> 
> @Yossi, back to you?


This doesn't only affect NumberingType::NUMBER_HEBREW (the one bug 66212 was requesting). It also affects NumberingType::CHARS_HEBREW, which existed prior to my fix, and a few other numbering types, as pointed out in bug 103345 and in Eyal Rozenberg's report above.

Does anyone have pointers to where the code that translates MS Word numbering types to LibreOffice's internal types, and vice versa, lives?
Comment 7 QA Administrators 2019-10-21 02:29:34 UTC Comment hidden (obsolete)
Comment 8 Eyal Rozenberg 2019-10-21 08:35:27 UTC
Created attachment 155179 [details]
ODT document for reproducing the bug

A document for shortening the bug reproduction instructions.

To reproduce the bug, you only need to:

1. Open this attached ODT document; the numbering should be using a Hebrew letter (א)
2. Save it as an MS-Word DOC file (not DOCX)
3. Close the document.
4. Open the DOC file; the bug manifests if the numbering now uses a (Western-) Arabic numeral (1).
Comment 9 Eyal Rozenberg 2019-10-21 08:35:48 UTC
Bug still manifests with:

Version: 6.3.2.2
Build ID: 1:6.3.2-1
CPU threads: 4; OS: Linux 5.2; UI render: default; VCL: gtk3; 
Locale: he-IL (en_IL); UI-Language: en-US
Comment 10 Justin L 2020-03-18 07:29:42 UTC
Created attachment 158769 [details]
bug-120629RT2010.doc: this needs to be fixed as an import first

I used Word 2010 to create this .doc file, and LibreOffice doesn't import that using the Hebrew alphabet.
Comment 11 Justin L 2021-03-29 11:32:53 UTC
It looks like the importer needs to understand
2.2.1.3 MSONFC: This specifies the list of numbering formats that can be used for a group of automatically numbered objects.
msonfcHebrew1 0x2D hebrew1  (numbering)
msonfcArabic1 0x2E arabicAlpha
msonfcHebrew2 0x2F hebrew2 (alphabet)

MSONFC is also used for page numbers (sprmSNfcPgn - which is not working for DOC or DOCX).

MSONFC is also used for footnote numbers (sprmSNfcFtnRef - also not working for DOC or DOCX).

For this particular bug, it looks like this is stored in "rgnfc (9 bytes): An array of 8-bit MSONFC elements" which is a part of 2.9.159 NumRM.
It seems we ignore it completely on DOC import:
{NS_sprm::PNumRM::val,            nullptr},
Comment 12 Eyal Rozenberg 2021-03-29 11:40:36 UTC
(In reply to Justin L from comment #11)

Thanks! Were we ignoring it before 5.4.0.0 as well, or is this a regression?
Comment 13 Justin L 2021-03-30 13:12:42 UTC
(In reply to Eyal Rozenberg from comment #12)
> Thanks! Were we ignoring it before 5.4.0.0 as well, or is this a regression?
This is not a regression. (Nor is it likely to be fixed for DOC.)

DOCX problems were reported in bug 141341.
Comment 14 Eyal Rozenberg 2021-03-30 16:42:50 UTC
(In reply to Justin L from comment #13)
> (In reply to Eyal Rozenberg from comment #12)
> > Thanks! Were we ignoring it before 5.4.0.0 as well, or is this a regression?
> This is not a regression. (Nor is it likely to be fixed for DOC.)

Wait, why is this not likely to be fixed for DOC?
Comment 15 Justin L 2021-04-20 12:53:41 UTC
For import, WW8ListManager::GetSvxNumTypeFromMSONFC might be the place to look.
Comment 16 Justin L 2021-06-23 14:21:59 UTC
(In reply to Justin L from comment #15)
> For import, WW8ListManager::GetSvxNumTypeFromMSONFC might be the place to look.
Yup - I guess import wasn't ignoring MSONFC after all. Proposed DOC fix at https://gerrit.libreoffice.org/c/core/+/117736.
Comment 17 Commit Notification 2021-06-23 16:50:20 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/1d50b8a7e93178e1ceec0bf95ed6794f73e2f184

tdf#120629 doc {im,ex}port: accept known numberingTypes

It will be available in 7.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 18 Eyal Rozenberg 2021-06-23 17:16:46 UTC
Thanks, Justin :-)
Comment 19 Commit Notification 2021-06-24 19:48:16 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/4493f2191d95a35f8a29cd16912a1378d3c21ced

tdf#120629 ms formats: better exporting of hindiVowels etc.

It will be available in 7.3.0.

Comment 20 Justin L 2021-06-24 19:49:57 UTC
Numbering is used in various places in Writer (and Word).
1.) Line numbering: there is no option that I can see in Word to configure this; it is always 1,2,3.
2.) Page numbering: this is a field in Word. AttributeOutputBase::GetNumberPara handles a limited subset for this. I didn't see documentation on the possible numtypes. Unofficially, I found https://www.informit.com/articles/article.aspx?p=2455715&seqNum=5. LO doesn't import this field properly.
3.) Footnote numbering: This should now match DOCX support.
4.) Paragraph numbering: This should be the same as Footnote numbering.
5.) Captions can also be numbered. Export works OK.