Bug 126256 - FORMATTING, FILEOPEN: Plain text document in (codepage 437) with commercial font displays incorrectly
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 6.2.5.2 release (earliest affected)
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Font-Rendering
Reported: 2019-07-06 20:41 UTC by robert
Modified: 2022-09-30 04:04 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Original input file - extension is .h-c, file is plain CP437 text (88.01 KB, application/octet-stream)
2019-07-06 20:41 UTC, robert
File exported as (hybrid) PDF (101.92 KB, application/pdf)
2019-07-06 20:44 UTC, robert
Correct file attached - shouldn't really matter (199.11 KB, text/plain)
2020-05-04 08:58 UTC, robert
Screen print of page 16 showing the ragged formatting (11.61 KB, image/png)
2020-05-05 14:12 UTC, robert

Description robert 2019-07-06 20:41:43 UTC
Created attachment 152605
Original input file - extension is .h-c, file is plain CP437 text

The attached document is encoded in the original IBM code page 437 (CP437). When I open the document with Word XP (Word 2002 10.6866.6870) SP3, a "freebie" that came with my PC, Word has the decency to ask me if I want the document to be converted, suggesting "Encoded Text", and after I select "MS-DOS", it opens without problems.

Changing the formatting to use margins of 1 cm and landscape mode, and then setting the font size to 6pt, displays the document neatly, with the font defaulting to Courier New.

Changing the font to "Cubiculum" https://www.myfonts.com/fonts/johan-winge/cubiculum/ still displays the document, the formatting remains OK, although the box-characters of this font don't seem to have long enough ascenders and descenders to connect the vertical lines.

Now open the document in Writer (Version 6.2.5.2)...

Writer does not ask if the document might be in a non-Windows/UTF-8/etc. encoding; it just opens it, and that results in the box characters being replaced by all kinds of accented characters.
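
For illustration, the same kind of replacement can be reproduced outside Writer. This is a minimal Python sketch, assuming the wrong guess is Windows-1252; the report does not say which encoding Writer actually falls back to, so `cp1252` here is only an assumption. The four bytes are the CP437 codes for a top-left corner, two horizontal bars, and a top T-junction.

```python
# Four bytes as they would appear in a CP437 text file:
# top-left corner, two horizontal bars, top T-junction.
raw = bytes([0xDA, 0xC4, 0xC4, 0xC2])

# Intended reading: IBM code page 437.
as_cp437 = raw.decode("cp437")

# Assumed wrong guess: Windows-1252 (not confirmed by the report).
as_cp1252 = raw.decode("cp1252")

print(as_cp437)   # box-drawing characters
print(as_cp1252)  # accented Latin letters instead
```

The bytes are identical in both cases; only the code page used to interpret them differs, which is why the choice has to be made at import time.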

To open the file, one has to use Ctrl-O, select "Text - Choose Encoding (*.txt)", which, sigh, also has no option to actually show all files, type in "mnth.h-c", and click Open. In the next dialogue, the "Character Set" has to be set to "Western Europe (DOS/OS2-437/US)", the "Default fonts" to "Courier New", and, probably irrelevant here, the "Language" to "English (UK)", with "Paragraph break" set to "CR & LF".

Clicking OK opens the document, and going through the same motions (setting the margins to 1cm, selecting Landscape, and reducing the font size to 6pt) delivers a nice-looking document, with one hell of an annoyance: changing the font size after a "Select All" scrolls the document to page 9.

However, a subsequent Ctrl-A "Select All", followed by changing the font to "Cubiculum", completely destroys the formatting, and I have no clue as to what happens with the font; the also attached PDF tells me that the document contains no less than four fonts...
Comment 1 robert 2019-07-06 20:44:39 UTC
Created attachment 152606
File exported as (hybrid) PDF
Comment 2 robert 2019-07-06 20:49:54 UTC
Please note that the Cubiculum font does not contain any box-characters, so the issue is one of font-substitution. Word seems to have no problems with this, Writer does!
Comment 3 V Stuart Foote 2019-07-22 13:55:14 UTC
You had not first selected the document to open in Writer (via the OS file picker with all files *.*) before trying to assign the import filter to use.

The file picker is provided by the os/DE, the import filters are provided by LibreOffice.

Of course if you want the file to appear when not first selecting it--change its file type extension from ".h-c" to simply ".txt".

On selection of the file, applying the 'Text - Choose Encoding (.txt)' filter, and making the selection from the 'ASCII Filter Options' dialog, the text of the document is brought into LibreOffice and assigned a style of "Preformatted Text" on the "Default" page styles.

No selection of text/paragraph is necessary.

Rather, you should modify both styles (Paragraph, Page) to apply them to the imported text. Save to .ott template--useful if you have multiple instances of this format to import--and load its styles in advance.
Comment 4 V Stuart Foote 2019-07-22 14:25:47 UTC
Removing the Font-substitution meta (bug 103342), as this is a usage issue.

Modifying the style(s) and applying it will replace all references to the not present "Cubiculum" font for the ODF document embedded into the 'hybrid' PDF. There is potentially an issue in the PDF export filter if you chose to embed the fonts, but that does not appear to be the case, as the File -> Properties -> Font tab of the hybrid PDF's document shows font embedding not enabled.

The original 'document' has no assigned font or layout, just its encoding.

With that, LibreOffice does the correct thing with the appropriate import filter.
Comment 5 robert 2019-07-23 11:36:20 UTC
(In reply to V Stuart Foote from comment #3)

Completely irrelevant reply. I open it like I would with Word. LO should allow me to follow exactly the same steps!
Comment 6 robert 2019-07-23 11:38:54 UTC
(In reply to V Stuart Foote from comment #4)

Nice to be so negative, calling it 'document'. It's plain text, one paragraph per line and a monospace font. Is that not a format?
Comment 7 V Stuart Foote 2019-07-23 12:55:13 UTC
Sorry, I am not intentionally being negative in setting this NAB, there just is no way to put it that does not point to incorrect usage of the os/DE and LibreOffice features for import of this type of text document.

(In reply to robert from comment #5)
> 
> Completely irrelevant reply. I open it like I would with Word. LO should
> allow me to follow exactly the same steps!

Sorry, but it would if you make a file association for an .h-c file extension in the os/DE or change the extension. But you'd still have to select the encoding.

(In reply to robert from comment #6)
> 
> Nice to be so negative, calling it 'document'. It's plain text, one
> paragraph per line and a monospace font. Is that not a format?

When imported via the "Text - Choose Encoding (.txt)" filter after selecting its encoding, LibreOffice will assign defaults for the 'Preformatted text' style laid out on the default 'Page' style. 

And if you adjust those style defaults (either in advance, or after import) the document will be correctly laid out with no need to select content and apply direct formatting.
 
That includes assigning the monospaced Cubiculum font mentioned for use as the 'Preformatted text' style if preferred to the default Liberation Mono.

Or you could select other common monospaced fonts: Noto Mono, Courier New, Consolas, etc. The alignment of the box line drawing glyphs would be controlled by each font's metrics, so there is always some potential for gaps.

All of these are usage issues.

LibreOffice is doing the correct thing on import of the text with the correct filter.

Would you please clearly state what you think the issue is?
Comment 8 robert 2019-07-28 20:53:52 UTC
1) Start Writer
2) Ctrl-O
3) Select days.h-c (or whatever you want to rename it to)
4) Click on "All files(*.*)" and select "Text - Choose encoding (*.txt)"
5) Click Open
6) Select Character set: Western Europe (DOS/OS2-437/US)
7) Use Liberation Mono as default font
8) Language here is English (UK)
9) Select CR & LF
10) Click OK

Result pretty horrible to see, but that's OK:

11) Format > Page
12) Select A4
13) Select Landscape
14) Set all margins to 1cm
15) OK

It's getting better

16) Ctrl-A
17) Change font size to 6 (and this flucking scrolls the document to page 13, WHY, WHY, WHY?)

And sito-presto, the document shows as it should show. 12 pages, four columns per page, and one final blank page.

18) Ctrl-A
19) Change the font to Courier New

And apart from some vertical bars that seem to display (and display only) unaligned at sub-100% zoom levels, the document is still OK.

20) Ctrl-A
21) Change font to Cubiculum (which does not contain box characters)

Bang, kaboom, the vertical bars are substituted by a character that's UTF-8 encoded as 226/148/130 (decimal); all other box-characters, when selected, still claim to be Cubiculum.

22) Select the very first two box-characters while in "Cubiculum" mode (the top-left corner and one horizontal bar)
23) Change the font to Courier New

What the flipping 'ell? On ONLY the first displayed line (containing only three of the four "boxed" table headings) all horizontal box-chars change to Courier New, EXCEPT the one next to the top-left corner. The top-T chars are left unchanged.

I don't have a clue as to what is happening, but there is absolutely a problem with font-substitutions. If there's a way to record (running W7-64) what's happening, I'm happy to do so!
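
For reference, the byte sequence 226/148/130 from the note after step 21 can be decoded to see which character it is. A minimal Python sketch using only the standard library:

```python
import unicodedata

# The three decimal byte values quoted in the note after step 21.
seq = bytes([226, 148, 130])

# Interpret them as one UTF-8 encoded character.
ch = seq.decode("utf-8")
code_point = f"U+{ord(ch):04X}"
name = unicodedata.name(ch)

# The same character, encoded back into the file's original code page.
cp437_byte = ch.encode("cp437")

print(code_point, name)
print(cp437_byte)
```

So the character itself is still the vertical box-drawing bar from the CP437 file; what changes after step 21 is the font used to draw it, not the character.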
Comment 9 QA Administrators 2019-07-30 03:20:23 UTC Comment hidden (obsolete)
Comment 10 Dieter 2020-02-03 19:46:20 UTC
Hello Robert, a new major release of LibreOffice has become available since this bug was reported. Could you please try to reproduce it with the latest version of LibreOffice from https://www.libreoffice.org/download/libreoffice-fresh/ ? I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' if the bug is still present in the latest version.
Comment 11 robert 2020-02-03 20:21:52 UTC
Following the steps in Comment 8 above, the results are exactly the same!
Comment 12 Dieter 2020-05-04 08:39:19 UTC
Robert, I've tried to follow your steps from comment 8, but failed at step 3: I couldn't find the document days.h-c. The attachment in comment 0 is mnth.h-c.
If I try step 4, I can't find "Select Text - choose encoding"

Tested with
Version: 7.0.0.0.alpha0+ (x64)
Build ID: 8c8b3a4f83f67882b284ddc3b3fe10d3fe6dedf4
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI-Language: en-GB
Calc: CL

Could you please check the steps or give some more hints?
=> NEEDINFO
Comment 13 robert 2020-05-04 08:58:09 UTC
Created attachment 160311
Correct file attached - shouldn't really matter

Dieter,

mnth.h-c and days.h-c are for all intents and purposes interchangeable, both contain plain text with box characters from the old IBM CP437 code page. Anyway, I've also attached days.h-c (renamed to days.h-c.txt) just to make the comment match the file name.
Comment 14 robert 2020-05-04 09:14:15 UTC
As for "If I try step 4, I can't find 'Select Text - choose encoding'":

It's in the "All files (*.*)" dropdown, the third entry under "Rich Text (*.rtf)", at least here on 

Version: 6.4.3.2 (x64)
Build ID: 747b5d0ebf89f41c860ec2a39efd7cb15b54f2d8
CPU threads: 8; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: default; VCL: win; 
Locale: en-GB (en_GB); UI-Language: en-US
Calc: threaded
Comment 15 QA Administrators 2020-05-05 03:44:37 UTC Comment hidden (obsolete)
Comment 16 Dieter 2020-05-05 12:23:02 UTC
(In reply to robert from comment #8)

My result:

> 1) Start Writer
> ...
> 10) Click OK
> 
> Result pretty horrible to see

I agree
> 
> 11) Format > Page
> 12) ... 
> 16) Ctrl-A
> 17) Change Font size to 6
> 
> And sito-presto, the document shows as it should show. 12 pages, four
> columns per page, and one final blank page.

I have 17 pages (4 pages days in trip order, 4 pages days in distance order, 4 pages days in time order, 4 pages days in velocity order, one empty page)


> 
> 18) Ctrl-A
> 19) Change to font to Courier New
> 
> And other than that some vertical bars seem to display (and display only)
> unaligned on sub-100% zoom levels, the document is still OK.

I'm not sure about vertical bars. I can't see them
> 
> 20) Ctrl-A
> 21) Change font to Cubiculum (which does not contain box characters)

I don't have that font, so I can't do the next steps.

Two questions:
1. Is it possible to add a screencast to make your results more visible?
2. Please describe what results you expect => NEEDINFO

I'm not an expert with .txt files, but perhaps it is worth checking the relationship to other bug reports, like bug 61703.
Comment 17 robert 2020-05-05 14:11:25 UTC
(In reply to Dieter from comment #16)
> (In reply to robert from comment #8)
> 
> My result:
> 
> > 1) Start Writer
> > ...
> > 10) Click OK
> > 
> > Result pretty horrible to see
> 
> I agree

Good

> > And sito-presto, the document shows as it should show. 12 pages, four
> > columns per page, and one final blank page.
> 
> I have 17 pages (4 pages days in trip order, 4 pages days in distance order,
> 4 pages days in time order, 4 pages days in velocity order, one empty page)

days.h-c is a live file, so it's correct that you now have 17 pages.
 
> > 18) Ctrl-A
> > 19) Change to font to Courier New
> > 
> > And other than that some vertical bars seem to display (and display only)
> > unaligned on sub-100% zoom levels, the document is still OK.
> 
> I'm not sure about vertical bars. I can't see them

Scale to 90% and look at the table on page 16. On my screen (Full HD, 1920x1080) the very top of the box has "dips" at the "T" box-chars, see the attached page-16.png. For what it's worth, this may not be a LO issue, but just a rendering issue due to the screen resolution.

> > 20) Ctrl-A
> > 21) Change font to Cubiculum (which does not contain box characters)
> 
> I don't have that font, so I can't do the next steps.

Can you contact me directly - robert@prino.org
 
> Two questions:
> 1. Is it possible to add a screencast to make your results more visible?

Any suggestions as to what program I could use for this? 

> 2. Please describe what results do you expect => NEEDINFO
> 
> I'm not an expert with .txt files, but perhaps it is worth to check
> relationship to other bug reports, like bug 61703.
Comment 18 robert 2020-05-05 14:12:33 UTC
Created attachment 160381
Screen print of page 16 showing the ragged formatting

Comment 19 QA Administrators 2020-05-06 03:45:26 UTC Comment hidden (obsolete)
Comment 20 Mike Kaganski 2020-07-07 07:10:58 UTC
The attached PDF (unfortunately, hybrid one -> opens as Writer document) still allows to see the resulting problem, which is definitely a substitution problem.

If you open the PDF in a viewer (I wish I could open it in Draw!), copy the text, and paste into Writer, you will be able to inspect the fonts of the characters - and then it's obvious that the first line (the top border of the table) is a mix of Calibri-Light (proportional-width font!) and Consolas; the second line (with the column titles) consists of OpenSymbol and Cubiculum (it shows its name even if you don't have the font, like myself); the third is Consolas...

So the problem is that LibreOffice not only fails to take monospace font character width into account when searching for substitutes; it doesn't even pick a monospace substitute. It's clear that the system has the necessary monospace fonts, but the resulting substitution may well be a proportional font like Calibri or OpenSymbol.
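
To make the expected behaviour concrete, here is a minimal Python sketch of a pitch-aware fallback. It is not LibreOffice's actual substitution code: the `Font` class, the `pick_fallback` function, and the glyph coverage sets are all hypothetical and invented for the demonstration. The only facts taken from this report are that Cubiculum is monospaced and has no box characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Font:
    """Hypothetical font record: name, pitch, and drawable code points."""
    name: str
    monospace: bool
    glyphs: frozenset


def pick_fallback(ch, requested, installed):
    """Return the font to draw ch with, preferring the requested font's pitch."""
    cp = ord(ch)
    if cp in requested.glyphs:
        return requested  # no substitution needed
    candidates = [f for f in installed if cp in f.glyphs]
    # Stable sort: fonts whose pitch matches the requested font come first.
    candidates.sort(key=lambda f: f.monospace != requested.monospace)
    return candidates[0] if candidates else None


# Coverage sets below are invented for the demo, not real font data.
BOX = frozenset(range(0x2500, 0x2580))   # Unicode Box Drawing block
LATIN = frozenset(range(0x20, 0x7F))     # printable ASCII

cubiculum = Font("Cubiculum", True, LATIN)  # monospaced, no box glyphs
installed = [
    Font("Calibri Light", False, LATIN | BOX),
    Font("Consolas", True, LATIN | BOX),
]

print(pick_fallback("\u2502", cubiculum, installed).name)  # Consolas
print(pick_fallback("A", cubiculum, installed).name)       # Cubiculum
```

With this ordering a proportional font is chosen only when no installed monospaced font covers the character, which is the behaviour the mixed Calibri-Light/Consolas line in the PDF suggests is missing.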

For the test, you may use Noto Mono instead of Cubiculum.

Setting to NEW.
Comment 21 robert 2020-09-03 21:58:46 UTC
Same steps as before, and the result is the same, still a total mess:

Version: 7.0.1.2 (x64)
Build ID: 7cbcfc562f6eb6708b5ff7d7397325de9e764452
CPU threads: 8; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: Skia/Raster; VCL: win
Locale: en-GB (en_GB); UI: en-US
Calc: threaded
Comment 22 QA Administrators 2022-09-30 03:52:27 UTC
Dear robert,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword


Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug