Bug 123703 - FILEOPEN RTF Size of space sequence is different in Word and Writer
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.5.0 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:6.4.0
Keywords: filter:rtf
Depends on:
Blocks: RTF
Reported: 2019-02-25 12:59 UTC by NISZ LibreOffice Team
Modified: 2023-01-22 18:35 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Screenshot of the original document side by side in Word and Writer. (137.40 KB, image/png)
2019-02-25 13:00 UTC, NISZ LibreOffice Team
The original file. (40.96 KB, application/rtf)
2019-02-25 13:00 UTC, NISZ LibreOffice Team
test file saved as DOCX in MSO 2016 -> resulting normal spaces (13.36 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-08-07 16:39 UTC, László Németh
single spaces and space sequences in RTF (37.42 KB, application/rtf)
2019-08-08 10:15 UTC, László Németh
single spaces and space sequences in DOCX (11.96 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-08-08 10:15 UTC, László Németh
RTF and DOCX space sequences have different lengths in Word with the default font settings (1.73 KB, image/png)
2019-08-08 10:18 UTC, László Németh
Relevant Word's Options page (27.98 KB, image/png)
2023-01-21 10:13 UTC, Mike Kaganski

Description NISZ LibreOffice Team 2019-02-25 12:59:29 UTC
Description:
Spaces change size when the attached file is opened in LibreOffice Writer.

Steps to Reproduce:
    1. Open the attached RTF file in LibreOffice Writer.
    2. Open a copy in Microsoft Word.
    3. Compare the original file opened in Writer and Word.
    4. View the attached screenshot.

Actual Results:
Spaces are narrower in Writer than in Word.

Expected Results:
Spaces should be the same size in Word and Writer.


Reproducible: Always


User Profile Reset: No



Additional Info:
LibreOffice details:
Version: 6.3.0.0.alpha0+
Build ID: f22ce685260b8b7b792f1f132472c88c6b655589
CPU threads: 4; OS: Windows 10.0; UI render: GL; VCL: win; 
Locale: hu-HU (hu_HU); UI-Language: en-US
Calc: threaded

Reproducible with:
LibreOffice 3.5.0rc3 
Build ID: 7e68ba2-a744ebf-1f241b7-c506db1-7d53735
Comment 1 NISZ LibreOffice Team 2019-02-25 13:00:05 UTC
Created attachment 149576 [details]
Screenshot of the original document side by side in Word and Writer.
Comment 2 NISZ LibreOffice Team 2019-02-25 13:00:33 UTC
Created attachment 149577 [details]
The original file.
Comment 3 Durgapriyanka 2019-02-25 17:19:16 UTC
Thank you for reporting the bug. I can confirm the bug in

Version: 6.3.0.0.alpha0+
Build ID: b6b28931435e44aca92b8c0e1659f701e3ed1a87
CPU threads: 2; OS: Windows 6.1; UI render: default; VCL: win; 
TinderBox: Win-x86@42, Branch:master, Time: 2019-01-30_06:57:04
Locale: en-US (en_US); UI-Language: en-US
Calc: threaded
Comment 4 NISZ LibreOffice Team 2019-02-28 13:00:30 UTC
After further examination, we determined the following:
In some cases, Microsoft Word uses wider spaces when opening an RTF file. Whether Word uses the wider or the normal space width depends on the state of the \*\defchp entity:
- If the \*\defchp entity is not present, Word uses the wider space width;
- If the \*\defchp entity is present without any parameters ({\*\defchp}), Word uses the normal space width;
- If the \*\defchp entity is present as {\*\defchp \fs22\loch\af31506\hich\af31506\dbch\af31505 }, Word uses the wider space width;
- If the \*\defchp entity is present as {\*\defchp \fs22\loch\af31506\hich\af31506\dbch\af31506 }, Word uses the normal space width.

The file attached to this bug report uses the {\*\defchp \fs22\loch\af31506\hich\af31506\dbch\af31505 } entity, which is why the spaces are wide. Changing \dbch\af31505 to \dbch\af31506 also changes the width of the space character.

According to the RTF specification, the \*\defchp entity specifies the default character-level properties, \dbch stands for double-byte characters, and the \af31505 and \af31506 parameters are associated font numbers (probably both referring to a different version or subset of Times New Roman, but I am not entirely sure).

I am not sure how this information could help solve the issue, but it seems there are at least two possible outcomes (wide or normal space width) depending on that parameter, and we don't know which other \af numbers might indicate that wider spaces should be used.

Note that the direct character formatting of the text in the document does not matter: even if the text is actually formatted as Calibri 36pt, the space width (wide or normal) depends only on the state of the \*\defchp entity.
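The observations listed in this comment can be restated as a small classifier. This is a minimal Python sketch with hypothetical function names, not LibreOffice importer code; it assumes the \*\defchp group contains no nested groups (true for the samples quoted above), and it only encodes the four observed cases, so any other \dbch\af value is reported as unknown rather than guessed.

```python
import re
from typing import Optional

# Matches {\*\defchp ...}, assuming no nested groups inside it.
DEFCHP_RE = re.compile(r"\{\\\*\\defchp([^{}]*)\}")
# \dbch followed by \afN: the associated font number for double-byte text.
DBCH_AF_RE = re.compile(r"\\dbch\\af(\d+)")


def defchp_group(rtf: str) -> Optional[str]:
    """Return the body of the \\*\\defchp group, or None if it is absent."""
    m = DEFCHP_RE.search(rtf)
    return m.group(1).strip() if m else None


def observed_space_width(rtf: str) -> str:
    """Classify Word's space width using only the cases observed above."""
    body = defchp_group(rtf)
    if body is None:
        return "wide"      # \*\defchp not present
    if body == "":
        return "normal"    # {\*\defchp} without parameters
    m = DBCH_AF_RE.search(body)
    if m is None:
        return "unknown"
    # Only \dbch\af31505 and \dbch\af31506 were tested in this comment.
    return {"31505": "wide", "31506": "normal"}.get(m.group(1), "unknown")
```

For example, observed_space_width(r"{\rtf1{\*\defchp}}") returns "normal". The explicit "unknown" result reflects the open question in this comment about other \af numbers.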
Comment 5 László Németh 2019-08-07 16:39:50 UTC
Created attachment 153205 [details]
test file saved as DOCX in MSO 2016 -> resulting normal spaces

It seems this RTF feature is not OOXML-compatible.
Comment 6 László Németh 2019-08-08 10:14:45 UTC
Single spaces are the same size in Writer and Word, but space sequences are not; see the attached test files x_from_word.rtf and x_from_word.docx and the screenshot.

Workaround for document templates: use tabs instead of space sequences to position parts of the document.
Comment 7 László Németh 2019-08-08 10:15:25 UTC
Created attachment 153220 [details]
single spaces and space sequences in RTF
Comment 8 László Németh 2019-08-08 10:15:43 UTC
Created attachment 153221 [details]
single spaces and space sequences in DOCX
Comment 9 László Németh 2019-08-08 10:18:23 UTC
Created attachment 153222 [details]
RTF and DOCX space sequences have different lengths in Word with the default font settings
Comment 10 Commit Notification 2019-08-24 09:38:48 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/24b04db5a63b57a74e58a7616091437ad68548ac%5E%21

tdf#123703 RTF import: fix length of space character sequence

It will be available in 6.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 11 Commit Notification 2019-08-24 09:40:20 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/9ee96273a2090b63e0f579a1e9c9cef780756e6d%5E%21

tdf#123703 strip six-em-space (U+2006) at line break

It will be available in 6.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 12 László Németh 2019-08-24 09:41:04 UTC
tdf#123703 RTF import: fix length of space character sequence

A default RTF space character is longer by an extra six-per-em
space in a space sequence. To get the same layout of documents
formatted with consecutive spaces, insert a six-per-em space before
every space in a space sequence. The extra six-per-em spaces are
removed during RTF export.

Note: This is a workaround to get the same layout in documents based on
RTF templates, which are often used, for example, by business applications.
Instead of adding a new RTF-specific core/text layout feature, this
workaround keeps the layout compatible with ODT and DOCX documents, too.
(In contrast, MSO's DOCX export silently messes up the document layout,
shortening the length of the space sequence.)
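The workaround in this commit message can be illustrated as two string transformations. On import, every space in a run of two or more spaces gets a U+2006 SIX-PER-EM SPACE in front of it; single spaces are left alone, since comment 6 found them to be the same width in both programs. On export, a U+2006 immediately followed by a space is dropped again. This is a hypothetical Python illustration of the described behaviour, not the actual import/export code; the function names are invented, and the export rule assumes that only U+2006 characters directly before a space were added by the importer. The helper extra_width_pt shows the arithmetic: each space in a sequence nominally gains 1/6 em, and RTF \fs values are in half-points, so \fs22 is 11 pt.

```python
import re

SIX_PER_EM = "\u2006"  # U+2006 SIX-PER-EM SPACE


def import_space_sequences(text: str) -> str:
    """Prefix each space of a run of 2+ spaces with a six-per-em space."""
    return re.sub(r" {2,}", lambda m: (SIX_PER_EM + " ") * len(m.group()), text)


def export_space_sequences(text: str) -> str:
    """Drop six-per-em spaces that precede a space (assumed to be import-added)."""
    return re.sub(SIX_PER_EM + "(?= )", "", text)


def extra_width_pt(n_spaces: int, fs_half_points: int) -> float:
    """Nominal extra width of a run of n spaces: 1/6 em per space, em = font size."""
    if n_spaces < 2:
        return 0.0  # single spaces already match
    font_size_pt = fs_half_points / 2
    return n_spaces * font_size_pt / 6
```

For a run of ten spaces at \fs22, extra_width_pt(10, 22) gives 10 × 11 / 6 ≈ 18.3 pt. Because the export step only removes what the import step inserts, import followed by export restores the original text.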
Comment 13 Mike Kaganski 2023-01-21 10:13:14 UTC
Created attachment 184819 [details]
Relevant Word's Options page

(In reply to NISZ LibreOffice Team from comment #4)

The actual cause here is the \fdbminor entry in the corresponding \f31505 font definition, which is used as the associated font in \defchp. The \fdbminor control word indicates that the "font entry uses East Asian variation of the “Body” theme font".

When opening the file, Word selects the "Compress only punctuation" setting on its Typography Options page.

This indeed needs a *proper* fix, using a *compatibility option* that would be set based on information that import filter reads, and that would affect text layout.
Comment 14 Mike Kaganski 2023-01-22 18:35:48 UTC
(In reply to Mike Kaganski from comment #13)
> The actual cause here is the \fdbminor entry

Or maybe \dntblnsbdb (and the corresponding layout option "Balance SBCS characters and DBCS characters" in Options->Advanced).
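Both candidate causes from comments 13 and 14 can be checked on a given file. The Python sketch below, with hypothetical function names, reads the font number that \*\defchp assigns through \dbch\afN, finds that font's entry in \fonttbl and tests it for \fdbminor, and separately checks whether the \dntblnsbdb control word appears anywhere in the file. It tracks brace depth so that font entries with nested \*\panose groups are handled, but it is not a full RTF tokenizer (for example, \bin data is not treated specially), so treat it as a diagnostic aid under those assumptions.

```python
import re
from typing import List, Optional

# {\*\defchp ...\dbch\afN ...}: the associated font used for double-byte text.
DEFCHP_DBCH_RE = re.compile(r"\{\\\*\\defchp[^{}]*?\\dbch\\af(\d+)")
# First \fN control word (font number) inside a font-table entry.
FONT_NUM_RE = re.compile(r"\\f(\d+)")


def group_at(rtf: str, start: int) -> str:
    """Return the balanced {...} group that opens at index start."""
    depth, i = 0, start
    while i < len(rtf):
        c = rtf[i]
        if c == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                return rtf[start:i + 1]
        i += 1
    raise ValueError("unbalanced RTF group")


def child_groups(group: str) -> List[str]:
    """Split a group into its direct child groups (e.g. font-table entries)."""
    children: List[str] = []
    depth, start, i = 0, 0, 0
    while i < len(group):
        c = group[i]
        if c == "\\":
            i += 2
            continue
        if c == "{":
            depth += 1
            if depth == 2:
                start = i
        elif c == "}":
            if depth == 2:
                children.append(group[start:i + 1])
            depth -= 1
        i += 1
    return children


def has_control_word(text: str, word: str) -> bool:
    """True if \\word occurs as a whole control word, not a prefix of a longer one."""
    return re.search(r"\\" + word + r"(?![a-zA-Z])", text) is not None


def font_entry(rtf: str, font_num: int) -> Optional[str]:
    """Return the \\fonttbl entry that defines font number font_num, if any."""
    idx = rtf.find(r"{\fonttbl")
    if idx < 0:
        return None
    for entry in child_groups(group_at(rtf, idx)):
        m = FONT_NUM_RE.search(entry)
        if m and int(m.group(1)) == font_num:
            return entry
    return None


def diagnose(rtf: str) -> dict:
    """Report both candidate causes: \\fdbminor on the \\dbch font, and \\dntblnsbdb."""
    m = DEFCHP_DBCH_RE.search(rtf)
    dbch_font = int(m.group(1)) if m else None
    entry = font_entry(rtf, dbch_font) if dbch_font is not None else None
    return {
        "defchp_dbch_font": dbch_font,
        "dbch_font_is_fdbminor": entry is not None and has_control_word(entry, "fdbminor"),
        "has_dntblnsbdb": has_control_word(rtf, "dntblnsbdb"),
    }
```

Reporting the two signals separately, rather than combining them into one verdict, leaves open which of the two hypotheses is the real trigger.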