Bug 49889 - Exporting Text format with Layout
Summary: Exporting Text format with Layout
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.5.3 release (earliest affected)
Hardware: Other All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Save-Text
Reported: 2012-05-13 17:26 UTC by Matthew B
Modified: 2017-06-17 19:43 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Illustration of problem with converting richtext to plaintext. Possible Solution (14.29 KB, application/vnd.oasis.opendocument.text)
2012-05-13 17:26 UTC, Matthew B

Description Matthew B 2012-05-13 17:26:33 UTC
Created attachment 61591
Illustration of problem with converting richtext to plaintext.  Possible Solution

http://ask.libreoffice.org/question/2615/preserve-paragraphline-spacing-when-converting

Is there a way to convert a document with rich text (.doc, .odx, .rtf) to plain text (.txt) AND keep the original paragraph alignment?

[The attached document explains this next paragraph:
It doesn't have to be perfect. Keeping only the alignment of each paragraph's first line by means of spaces or tabs would be OK. Preferably, however, there would be a way to tell the script the anticipated page dimensions, so that it knows where the soft line breaks fall in the text (i.e., with 1" margins on 8.5x11 paper, the program would calculate where to insert the hard line breaks, which would be at 7.5 inches, measured from the left edge of the page). The bad thing, though, is that if the user changes the font or the font size, this will throw off the formatting. So it might be best to stick with only inserting spaces for the first line, relative to the indentation in the rich text document.]
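
The line-break calculation described above could be sketched roughly as follows. This is only an illustration in Python, not anything the Writer text filter does today. It assumes a fixed-width output font at 10 characters per inch; the names `wrap_columns` and `hard_wrap` and the `chars_per_inch` parameter are made up for this sketch. With 1" margins on 8.5" wide paper the text area is 8.5 - 2 x 1 = 6.5 inches wide, which is 65 columns under that assumption.

```python
import textwrap


def wrap_columns(page_width_in, left_margin_in, right_margin_in, chars_per_inch=10):
    """Return how many fixed-width characters fit between the margins.

    chars_per_inch=10 is an assumption about the plain-text viewer's font,
    not a value taken from the document being exported.
    """
    text_width_in = page_width_in - left_margin_in - right_margin_in
    return int(text_width_in * chars_per_inch)


def hard_wrap(paragraph, columns):
    """Insert hard line breaks so that no line is longer than `columns`."""
    return textwrap.fill(paragraph, width=columns)


if __name__ == "__main__":
    columns = wrap_columns(8.5, 1.0, 1.0)
    sample = (
        "Line one with no indent. And some text that wraps: "
        '"sed ut perspiciatis unde omnis iste natus error sit..."'
    )
    print(columns)
    print(hard_wrap(sample, columns))
```

As noted in the paragraph above, any fixed characters-per-inch value is only a guess about the reader's font, which is why the first-line-only approach may be the more robust choice.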

If using spaces/tabs to preserve some part of the rich text layout, would there be a method to distinguish between the tabs/spaces which are part of the original document and those which are merely used to preserve the original alignment?

For example, assume a rich text document has indentations created NOT using spaces or tabs, but only the indentation settings on the ruler.  It would convert to this:

Line one with no indent. And some text that wraps: "sed ut perspiciatis unde omnis iste natus error sit..."
     Line 2 with .5" indent. And some text that wraps: "sed ut perspiciatis unde 
     omnis iste natus error sit..."
         Line 3 with 1" indent. And some text that wraps: "sed ut perspiciatis 
         unde omnis iste natus error sit..."

As alluded to above, one possible solution would be to write a script that identifies the indentation of each paragraph (i.e., .5", 1", 1.5"...) and inserts the correct number of spaces to fill that length.
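
A minimal sketch of such a script, in Python, is shown here purely as an illustration. It assumes the paragraph indentation is already known in inches and that the plain-text output is viewed in a fixed-width font at 10 characters per inch. The helper names `indent_to_spaces` and `render_paragraph`, and the `first_line_only` switch, are hypothetical and not part of LibreOffice.

```python
import textwrap

CHARS_PER_INCH = 10  # assumption: fixed-width font, 10 characters per inch


def indent_to_spaces(indent_inches, chars_per_inch=CHARS_PER_INCH):
    """Convert a ruler indentation in inches to a run of spaces."""
    return " " * round(indent_inches * chars_per_inch)


def render_paragraph(text, indent_inches, columns=65, first_line_only=False):
    """Wrap one paragraph and pad it with spaces to mimic its indentation.

    With first_line_only=True only the first line is padded, which is the
    simpler fallback mentioned in the description.
    """
    pad = indent_to_spaces(indent_inches)
    return textwrap.fill(
        text,
        width=columns,
        initial_indent=pad,
        subsequent_indent="" if first_line_only else pad,
    )


if __name__ == "__main__":
    body = (
        "And some text that wraps: "
        '"sed ut perspiciatis unde omnis iste natus error sit..."'
    )
    for label, indent in [("no", 0.0), ('.5"', 0.5), ('1"', 1.0)]:
        print(render_paragraph(f"Line with {label} indent. {body}", indent))
```

The spaces produced this way cannot be told apart from spaces typed in the original document, so this sketch does not answer the question about distinguishing the two.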
Comment 1 Thomas Hackert 2013-06-24 15:48:52 UTC
Hello Matthew, *,
I am not sure if I understand you correctly, but when I open your attachment in Writer, use "Save as..." to convert it to txt, and reopen the txt file in kate 3.8.4 (from KDE 4.8.4) under Debian Testing AMD64, the txt looks nearly identical in both editors.

Would you be so kind as to test it with a newer version of LO than 3.5.3, please? Which OS/architecture and which viewer/editor are you using for the text file? I have used LO Version: 4.1.0.1 Build ID: 1b3956717a60d6ac35b133d7b0a0f5eb55e9155 under Debian Testing AMD64 and kate 3.5.3, as written before ... ;) If you have done the conversion in a different way, it would be nice if you could give us a clearer step-by-step description ... ;)

But as cited on http://en.wikipedia.org/wiki/Plain_text#Plain_text.2C_the_Unicode_definition:

<quote>
«Plain text represents character content only, not its appearance. »
</quote>

and

<quote>
«If the same plain text sequence is given to disparate rendering processes, there is no expectation that rendered text in each instance should have the same appearance. »
</quote>

, I am not really sure whether our developers could do anything about it ... :(
Sorry for the inconvenience
Thomas.
Comment 2 QA Administrators 2014-02-02 02:07:09 UTC
Dear Bug Submitter,

Please read the entire message before proceeding.

This bug has been in NEEDINFO status with no change for at least 6 months. Please provide the requested information as soon as possible and mark the bug as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in NEEDINFO status with no change in 30 days the QA team will close the bug as INVALID due to lack of needed information.

For more information about our NEEDINFO policy please read the wiki located here: 
https://wiki.documentfoundation.org/QA/FDO/NEEDINFO

If you have already provided the requested information, please mark the bug as UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed.


Thank you for helping us make LibreOffice even better for everyone!


Warm Regards,
QA Team
Comment 3 QA Administrators 2014-02-26 19:31:37 UTC Comment hidden (obsolete)
Comment 4 Urmas 2014-02-27 07:45:24 UTC
An enhancement request.
Comment 5 Joel Madero 2014-11-06 00:33:31 UTC
Never confirmed so moving to UNCONFIRMED for QA team to evaluate. Thanks!
Comment 6 Robinson Tryon (qubit) 2014-12-22 05:23:51 UTC
(In reply to Matthew B from comment #0)
> Is there a way to convert a document with rich text (.doc, .odx, .rtf) to
> plain text (.txt) AND keep the original paragraph alignment?
> ... 
> For example, assume a rich text document has indentations created NOT using
> spaces or tabs, but only the indentation settings on the ruler.  It would
> convert to this:
> 
> Line one with no indent. And some text that wraps: "sed ut perspiciatis unde
> omnis iste natus error sit..."
>      Line 2 with .5" indent. And some text that wraps: "sed ut perspiciatis
> unde 
>      omnis iste natus error sit..."
>          Line 3 with 1" indent. And some text that wraps: "sed ut
> perspiciatis 
>          unde omnis iste natus error sit..."
> 
> As alluded to above, one possible solution would be to write a script that
> identifies the indentation of each line (ie .5", 1", 1.5"...) and places the
> correct number of spaces to fill that length.

This seems like one plausible approach to providing plaintext output that retains higher fidelity to the original document than the filter currently provides.

Status -> NEW