Bug 118214 - FILEOPEN FILESAVE: handle \noproof(RTF) and w:noProof(DOCX) to disable spell checking on a run/style.
Summary: FILEOPEN FILESAVE: handle \noproof(RTF) and w:noProof(DOCX) to disable spell checking on a run/style.
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:doc, filter:docx, filter:rtf
Depends on:
Blocks: RTF DOCX DOC DOCX-Character DOC-Character RTF-Character
Reported: 2018-06-17 18:55 UTC by Phil Krylov
Modified: 2021-04-07 10:12 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
An RTF document to reproduce the bug (32.08 KB, text/rtf)
2018-06-17 19:07 UTC, Phil Krylov

Description Phil Krylov 2018-06-17 18:55:33 UTC
Description:
The RTF specification says that language 1024 (0x400) denotes text without a language set. LibreOffice Writer treats \lang1024 in RTF as "Switch to system-default language" (English here) and does not switch off language proofing. Conversely, if one creates some text in Writer, sets its language to None and saves as RTF, Writer emits \lang255 for this text. \lang255 is not a valid value per the RTF Specification. However, when an RTF document containing \lang255 is opened in Writer, it is treated the way \lang1024 should be, that is, the text language is set to None.
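The behavior described above can be modeled with a minimal C++ sketch. It assumes a simplified 16-bit stand-in for LibreOffice's LanguageType and reuses the constant values quoted later in this report from include/i18nlangtag/lang.h; importRtfLang, exportRtfLang and kSystemLanguage are hypothetical names, not LibreOffice's actual RTF filter code. It shows why None survives a Writer round trip (via \lang255) while Word's \lang1024 turns into the system language.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for LibreOffice's LanguageType (a strong 16-bit
// integer type in the real code base); values as quoted from lang.h.
using LanguageType = std::uint16_t;
constexpr LanguageType LANGUAGE_PROCESS_OR_USER_DEFAULT = 0x0400; // 1024
constexpr LanguageType LANGUAGE_NONE = 0x00FF;                    // 255
constexpr LanguageType LANGUAGE_ENGLISH_US = 0x0409;              // 1033

// Hypothetical: the system language that "use default" resolves to.
constexpr LanguageType kSystemLanguage = LANGUAGE_ENGLISH_US;

// Models the reported import behavior: \lang1024 becomes the system
// default (so spell checking stays on); any other value, including the
// out-of-spec 255, is taken as-is, so \lang255 reads back as None.
LanguageType importRtfLang(int rtfValue) {
    if (rtfValue == LANGUAGE_PROCESS_OR_USER_DEFAULT)
        return kSystemLanguage;
    return static_cast<LanguageType>(rtfValue);
}

// Models the reported export behavior: in this sketch the value is written
// unchanged, so Language None (0x00FF) is emitted as \lang255.
int exportRtfLang(LanguageType lang) {
    return lang;
}
```

In this model, `importRtfLang(exportRtfLang(LANGUAGE_NONE))` yields None again, while `importRtfLang(1024)` yields English (USA), which matches the symptom in the attached document.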

Steps to Reproduce:
1. Open the attached RTF document (generated from MS Word).

Actual Results:
The word "proofeng" is visually marked as misspelled, and when you put the caret on it, it shows English (USA) as its language.

Expected Results:
The word "proofeng" should not be marked as misspelled and its Language should be set to None


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 6.0.4.2
Build ID: 9b0d9b32d5dcda91d2f1a96dc04c645c450872bf
CPU threads: 4; OS: Mac OS X 10.9.5; UI render: default; 
Locale: en-US (en.UTF-8); Calc: group

The bug is still present in a recent master build:

Version: 6.2.0.0.alpha0+
Build ID: b292a27698e85fd9d60c03613c3b0c67835c4dc1
CPU threads: 4; OS: Mac OS X 10.9.5; UI render: default; 
TinderBox: MacOSX-x86_64@49-TDF, Branch:master, Time: 2018-06-06_23:25:55
Locale: en-US (en.UTF-8); Calc: group threaded
Comment 1 Phil Krylov 2018-06-17 19:06:08 UTC
The bug is also seen when opening DOC and DOCX documents.

I am not sure if I should file another bug for the RTF export issue (and probably DOC/DOCX as well). I hope there is a single place where both import and export can be fixed.
Comment 2 Phil Krylov 2018-06-17 19:07:39 UTC
Created attachment 142829
An RTF document to reproduce the bug
Comment 3 Buovjaga 2018-06-24 17:15:29 UTC
Confirmed.

Arch Linux 64-bit
Version: 6.2.0.0.alpha0+
Build ID: 5b42a17dc99fba2ccf8dd8d0a8e0e4e836e30120
CPU threads: 8; OS: Linux 4.17; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group threaded
Built on June 22nd 2018

Arch Linux 64-bit
LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4
Comment 4 QA Administrators 2019-06-25 02:42:46 UTC Comment hidden (obsolete, spam)
Comment 5 Justin L 2021-04-05 08:50:49 UTC
When a document with \lang1024 is loaded in Word 2016, the language is English (UK), and on-the-fly spell checking checks for British spelling. So LO behaves the same way as Word in this case. [The problem is that LO doesn't handle \noproof.]


From include/i18nlangtag/lang.h
/*! use only for import/export of MS documents, number formatter maps it to
 *! LANGUAGE_SYSTEM and then to effective system language */
#define LANGUAGE_PROCESS_OR_USER_DEFAULT    LanguageType(0x0400)  //aka 1024

And, as Phil indicated, this is the value used for Language: None:
#define LANGUAGE_NONE                       LanguageType(0x00FF)  //aka  255

DOCX:
LO exports "[None]" to DOCX as w:lang w:val="zxx".
Microsoft doesn't really let you select either the system default or None as a language. It does have a flag to disable spell checking on a character run (w:noProof). However, LO does not import it:
    case NS_ooxml::LN_EG_RPrBase_noProof: // no grammar and spell checking, unsupported

Since MS doesn't seem to have a direct equivalent of LO's Language: None, it makes sense to "invent" a language (zxx) or a number (255) so that LO can round-trip its own settings.
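The round-trip idea above, together with the ignored w:noProof flag, can be illustrated with a hedged C++ sketch. The names DocxRunProps, exportDocxLang and importDocxRun are hypothetical, the tag table covers only three languages, and real filter code would go through LibreOffice's language-tag handling rather than a hand-written switch. The sketch only models what this comment describes: "[None]" is written as zxx and read back, while w:noProof is parsed but has no effect.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins; numeric values are the Windows LCIDs.
using LanguageType = std::uint16_t;
constexpr LanguageType LANGUAGE_NONE = 0x00FF;       // LO "[None]"
constexpr LanguageType LANGUAGE_GERMAN = 0x0407;     // de-DE
constexpr LanguageType LANGUAGE_ENGLISH_US = 0x0409; // en-US

// Hypothetical run properties as read from a DOCX <w:rPr> element.
struct DocxRunProps {
    std::string langVal; // <w:lang w:val="..."/>
    bool noProof;        // true if <w:noProof/> is present
};

// Export: "[None]" has no Word counterpart, so it is written as the
// "zxx" tag (no linguistic content) so that LO can read it back.
std::string exportDocxLang(LanguageType lang) {
    switch (lang) {
    case LANGUAGE_NONE: return "zxx";
    case LANGUAGE_GERMAN: return "de-DE";
    case LANGUAGE_ENGLISH_US: return "en-US";
    default: return ""; // other languages are outside this sketch
    }
}

// Import: maps the tag back to a language. rPr.noProof is deliberately
// ignored, mirroring the "unsupported" case quoted above.
LanguageType importDocxRun(const DocxRunProps& rPr) {
    if (rPr.langVal == "zxx") return LANGUAGE_NONE;
    if (rPr.langVal == "de-DE") return LANGUAGE_GERMAN;
    return LANGUAGE_ENGLISH_US; // hypothetical fallback for this sketch
}
```

With this model, a run that Word marked w:noProof still comes back as an ordinary English run, so Writer would spell-check it, which is the symptom reported for DOCX.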
Comment 6 Justin L 2021-04-05 17:03:47 UTC
Proposed import fix at https://gerrit.libreoffice.org/c/core/+/113614
Comment 7 Justin L 2021-04-05 18:03:41 UTC Comment hidden (obsolete)
Comment 8 Justin L 2021-04-07 10:12:13 UTC
(In reply to Justin L from comment #6)
> Proposed import fix at https://gerrit.libreoffice.org/c/core/+/113614

Word supports the following scenario: 1) you set the language to German 2) you disable spellcheck for a piece of text 3) later you enable it again 4) Word knows it should do German spellcheck.
We lose data with this change and just trade one problem for another. What could be done instead is to add a new text portion property of type bool, map Word's noproof to it on import, and do the opposite on export. (And of course implement that bool in SW to disable spell checking of that text, with all the UI visibility that goes along with that.)
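The alternative proposed in this comment can be sketched as follows, assuming a hypothetical PortionAttrs struct in place of Writer's real text attributes and the same simplified 16-bit LanguageType stand-in. The key point of the design is that the noproof flag is stored next to the language instead of overwriting it, so Word's four-step scenario keeps the German setting.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for Writer's language type and values.
using LanguageType = std::uint16_t;
constexpr LanguageType LANGUAGE_NONE = 0x00FF;
constexpr LanguageType LANGUAGE_GERMAN = 0x0407;

// Hypothetical text-portion attributes: the language plus a separate
// boolean "no proofing" flag that \noproof / w:noProof would map to.
struct PortionAttrs {
    LanguageType language = LANGUAGE_NONE;
    bool noProof = false;
};

// Spell checking runs only when proofing is enabled and a language is set.
// Toggling noProof never modifies the stored language.
bool shouldSpellCheck(const PortionAttrs& a) {
    return !a.noProof && a.language != LANGUAGE_NONE;
}
```

Walking through Word's scenario: set the language to German, set `noProof = true` (no spell checking), then set it back to `false`, and German spell checking resumes because the language was never touched. Mapping noproof to LANGUAGE_NONE instead would make step 4 impossible, since the German setting would already have been discarded on import.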