Bug 44548

Summary: Functionality request for FILESAVE: add size optimization that is applied during filesave (comment 9)
Product: LibreOffice
Reporter: David <davidhodges.nz>
Component: Writer
Assignee: Not Assigned <libreoffice-bugs>
Status: RESOLVED WORKSFORME    
Severity: enhancement
CC: bugs, LibreOffice, sasha.libreoffice
Priority: medium    
Version: 3.4.4 release   
Hardware: All   
OS: All   
Whiteboard:
Crash report or crash signature:
Regression By:
Attachments:
- My result
- a test document that illustrates the fault
- a much smaller document that demonstrates the fault

Description David 2012-01-07 00:16:06 UTC
so, for example 

"bad concussion - head hit by steel crate in motor accident." 

is saved as 

</FONT><FONT FACE="Arial, sans-serif">bad</FONT><FONT FACE="Arial, sans-serif">
</FONT><FONT FACE="Arial, sans-serif">concussion</FONT><FONT FACE="Arial, sans-serif">
– </FONT><FONT FACE="Arial, sans-serif">head</FONT><FONT FACE="Arial, sans-serif">
</FONT><FONT FACE="Arial, sans-serif">hit</FONT><FONT FACE="Arial, sans-serif">
</FONT><FONT FACE="Arial, sans-serif">by</FONT><FONT FACE="Arial, sans-serif">
</FONT><FONT FACE="Arial, sans-serif">steel</FONT><FONT FACE="Arial, sans-serif">
</FONT><FONT FACE="Arial, sans-serif">crate</FONT><FONT FACE="Arial, sans-serif">
</FONT><FONT FACE="Arial, sans-serif">in</FONT><FONT FACE="Arial, sans-serif">
</FONT><FONT FACE="Arial, sans-serif">motor</FONT><FONT FACE="Arial, sans-serif">
</FONT><FONT FACE="Arial, sans-serif">accident.</FONT><FONT FACE="Arial, sans-serif">

even though the font has not changed - it is the same for every word.

This is hugely inefficient and also makes it harder to find expressions using grep or other search tools - if I grep for "steel crate" it won't find it - instead I have to use the more complicated and inefficient grep -A1 steel |grep crate.
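
A minimal sketch of a search workaround, assuming the saved HTML can be treated as plain text with tags removed: strip every tag, decode entities, and collapse whitespace, so that a phrase split across lines and FONT tags can be matched again. The function names `visible_text` and `contains_phrase` are made up for this illustration and are not part of LibreOffice or grep.

```python
import re
from html import unescape


def visible_text(html: str) -> str:
    """Drop tags, decode entities, and collapse whitespace runs to single spaces."""
    without_tags = re.sub(r"<[^>]+>", "", html)
    return re.sub(r"\s+", " ", unescape(without_tags)).strip()


def contains_phrase(html: str, phrase: str) -> bool:
    """Return True if the phrase occurs in the tag-free text of the HTML."""
    return phrase in visible_text(html)


# Two lines taken from the saved output quoted above.
sample = (
    '</FONT><FONT FACE="Arial, sans-serif">steel</FONT><FONT FACE="Arial, sans-serif">\n'
    '</FONT><FONT FACE="Arial, sans-serif">crate</FONT><FONT FACE="Arial, sans-serif">\n'
)

print(contains_phrase(sample, "steel crate"))
```

This only works around the search problem; it does nothing about the size of the saved file.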
Comment 1 Rainer Bielefeld Retired 2012-01-08 09:42:25 UTC
Created attachment 55301 [details]
My result

NOT reproducible with "LibreOffice 3.4.5 RC1  - WIN7 Home Premium (64bit) German UI [Build ID: OOO340m1 (Build:501)]" and 
NOT reproducible with a parallel Dev-Installation of "LibreOffice 3.5.0 Beta2 - WIN7 Home Premium (64bit) German UI [Build-ID : 8589e48-760cc4d-f39cf3d-1b2857e-60db978]"

I opened a new WRITER-HTML document, copied and pasted the text line from this page into the document, and saved it. Result: see attachment.

@reporter:
May I ask you to read the hints at <http://wiki.documentfoundation.org/BugReport> carefully?
Then please:
- Attach a sample document (not only a screenshot)
- Attach screenshots with comments if you believe that might explain the 
  problem better than a text comment. The best way is to insert your screenshots
  into a DRAW document and add comments that explain what you want to show
- Contribute step-by-step instructions containing every key press and every 
  mouse click needed to reproduce your problem (and, if possible, how to create a 
  sample document from scratch)
- Add information 
  -- concerning your OS (Version, Distribution, Language)
  -- concerning your LibO version and localization (UI language, Locale setting)
  -- LibO settings that might be related to your problem 
  -- how you launch LibO and how you opened the sample document
  -- everything else that crosses your mind after reading the linked texts
Comment 2 David 2012-01-08 11:38:30 UTC
Created attachment 55307 [details]
a test document that illustrates the fault
Comment 3 David 2012-01-08 11:44:41 UTC
Fault is reproducible in LibreOffice 3.4.5 rc2 on Linux Ubuntu Oneiric Ocelot - English.

To reproduce: open the attached document (in Micro$oft Word format) in LibreOffice Writer and save it as HTML.
LibreOffice 3.4.4 and 3.4.5 rc2 produce identical output (compared using cmp).

The very first line of text "HM302 Applied Herbal & Natural Therapeutics..." is split up as documented in my first comment.
I notice, however, that this does not happen for every single word in the file: several paragraphs are not split up this way, starting at "Licorice also inhibits CYP3A4..."
Comment 4 Rainer Bielefeld Retired 2012-01-08 22:49:48 UTC
@David:
I see the effect in your sample, but that's an obscure document from an obscure source. Please contribute instructions on how to reproduce the problem from scratch.
Comment 5 David 2012-01-08 23:21:49 UTC
I tested this with 14 other Micro$oft Word format documents I had created with a slightly older version of LibreOffice or OpenOffice on Windows. All of them exhibited this fault to a greater or lesser extent (one of them only for one paragraph out of several dozen). I can't upload them as they all contain confidential information.
Comment 6 Rainer Bielefeld Retired 2012-01-08 23:46:28 UTC
Maybe those old LibO or OOo versions added all that format information? Currently it seems that the bug is in those old versions and not in current LibO, so there is nothing that can be fixed.
Comment 7 David 2012-01-08 23:51:08 UTC
Created attachment 55319 [details]
a much smaller document that demonstrates the fault

Here is a much simpler test case which I generated by copying and pasting two paragraphs of non-confidential text out of one of my confidential documents.
The first paragraph saves fine, the second exhibits the bug. I have no idea why as there does not appear to be anything different about the second paragraph in the original text.
Comment 8 David 2012-01-08 23:56:39 UTC
Yes, I think you are probably correct that the bug is in a previous version.
If I save the last attachment as plain text to remove all formatting, there is one strange character at the start of the first paragraph. If I remove that and then save as HTML, both paragraphs are saved fine.
Comment 9 sasha.libreoffice 2012-03-15 03:49:32 UTC
I opened the file "HM302 Assignment 1 - IBS.doc", saved it as an odt file and unpacked it. I see that style information is provided for each word separately. Obviously this is because it was like that in the original document. But why not merge two or more adjacent pieces of text with the same style into one?
Maybe such behaviour is useful for something. And maybe this bug report should be transformed into a Functionality Request, something like "Add document internals optimization to Writer filesave filter".
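
To illustrate the kind of merge meant here, a minimal sketch, assuming text is modelled as a flat list of (style name, text) pairs. This model, the `Run` type and the function `merge_adjacent_runs` are hypothetical and are not Writer's internal representation or any existing filter code.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical model of a text run: (style name, text).
Run = Tuple[str, str]


def merge_adjacent_runs(runs: List[Run]) -> List[Run]:
    """Join neighbouring runs that share a style; the order of runs is preserved."""
    merged: List[Run] = []
    # groupby only groups consecutive items, so non-adjacent runs stay separate.
    for style, group in groupby(runs, key=lambda run: run[0]):
        merged.append((style, "".join(text for _, text in group)))
    return merged


runs = [("T1", "steel"), ("T1", " "), ("T1", "crate"), ("T2", " in"), ("T1", " motor")]
print(merge_adjacent_runs(runs))
```

Only neighbouring runs with the same style are joined, so a run with a different style in between (here "T2") still splits the text.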
Comment 10 Florian Reisinger 2012-08-14 14:01:05 UTC
Dear bug submitter!

Because there are a lot of NEEDINFO bugs with no answer within the last six months, we are closing all of these bugs.

To keep this message short, more information is available at https://wiki.documentfoundation.org/QA/NeedinfoClosure#Statement

Thanks for understanding and hopefully updating your bug, so that everything is prepared for developers to fix your problem.

Yours!

Florian
Comment 14 sasha.libreoffice 2012-08-21 08:38:34 UTC
changing to Enhancement request, see comment 9
Comment 15 Roman Eisele 2012-08-22 07:49:33 UTC
IMHO this (see comment #9) is a valid and very reasonable enhancement request, therefore I change the status of this report to NEW.

If some further explanations are necessary, I can write a detailed specification for the feature requested ;-)
Comment 16 David 2012-08-22 08:32:11 UTC
This appears to be fixed in LibreOffice 3.6.0
Comment 17 sasha.libreoffice 2012-08-22 08:40:51 UTC
I tried saving as fodt and the problem remains. With HTML the situation is much better, but not completely fixed: unneeded style definitions inside the text are still present. But that may be because spaces have one style and words another.