49889 – Exporting Text format with Layout

Status:	NEW

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Writer
Version: (earliest affected)	3.5.3 release
Hardware:	Other All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	Save-Text

Reported:	2012-05-13 17:26 UTC by Matthew B
Modified:	2023-07-28 22:14 UTC
CC List:	2 users

See Also:	152469
Crash report or crash signature:

Attachments
Illustration of problem with converting richtext to plaintext. Possible Solution (14.29 KB, application/vnd.oasis.opendocument.text) 2012-05-13 17:26 UTC, Matthew B

Description Matthew B 2012-05-13 17:26:33 UTC

Created attachment 61591
Illustration of problem with converting richtext to plaintext.  Possible Solution

http://ask.libreoffice.org/question/2615/preserve-paragraphline-spacing-when-converting

Is there a way to convert a document with rich text (.doc, .odx, .rtf) to plain text (.txt) AND keep the original paragraph alignment?

[The attached document illustrates this next paragraph:
It doesn't have to be perfect. Keeping only the alignment of each paragraph's first line by means of spaces or tabs would be OK. Preferably, however, there would be a way to tell the script the anticipated position of the soft line breaks in the text (i.e. with 1" margins on 8.5x11 paper, the program would calculate where to insert the hard line breaks, which would be at 7.5 inches). The drawback is that if the user changes the font or font size, this will throw off the formatting. So it might be best to stick with inserting spaces only for the first line, relative to the indentation in the rich text document.]

If using spaces/tabs to preserve some part of the rich text layout, would there be a method to distinguish between the tabs/spaces which are part of the original document and those which are merely used to preserve the original alignment?

For example, assume a rich text document has indentations created NOT using spaces or tabs, but only the indentation settings on the ruler.  It would convert to this:

Line one with no indent. And some text that wraps: "sed ut perspiciatis unde omnis iste natus error sit..."
     Line 2 with .5" indent. And some text that wraps: "sed ut perspiciatis unde 
     omnis iste natus error sit..."
         Line 3 with 1" indent. And some text that wraps: "sed ut perspiciatis 
         unde omnis iste natus error sit..."

As alluded to above, one possible solution would be to write a script that identifies the indentation of each line (ie .5", 1", 1.5"...) and places the correct number of spaces to fill that length.
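
A minimal sketch of such a script, in Python, might look like the following. It is not how Writer's text export filter works; it only illustrates the idea under some loudly stated assumptions: the paragraph indentation is already known in inches, the plain text will be viewed in a monospace font at 10 characters per inch, and the usable text width is 6.5" (8.5" paper minus two 1" margins, so the right margin sits 7.5" from the left paper edge). The names indent_to_spaces and render_paragraph are hypothetical.

```python
import textwrap

# Assumption: monospace output at 10 characters per inch.
CHARS_PER_INCH = 10


def indent_to_spaces(indent_inches, chars_per_inch=CHARS_PER_INCH):
    """Return the run of spaces that fills the given indentation."""
    return " " * round(indent_inches * chars_per_inch)


def render_paragraph(text, indent_inches=0.0, text_width_inches=6.5,
                     chars_per_inch=CHARS_PER_INCH):
    """Wrap one paragraph with hard line breaks and indent every line.

    text_width_inches is the assumed distance between the left and
    right margins; the indent is counted inside that width.
    """
    prefix = indent_to_spaces(indent_inches, chars_per_inch)
    width = round(text_width_inches * chars_per_inch)
    return textwrap.fill(text, width=width,
                         initial_indent=prefix,
                         subsequent_indent=prefix)


if __name__ == "__main__":
    sample = ('And some text that wraps: "sed ut perspiciatis unde '
              'omnis iste natus error sit..."')
    for label, indent in (("no", 0.0), ('.5"', 0.5), ('1"', 1.0)):
        print(render_paragraph(f"Line with {label} indent. {sample}", indent))
```

With these assumptions a .5" indent becomes 5 spaces and a 1" indent becomes 10. As noted above, the result only lines up with the original document if the viewer uses a monospace font at the assumed pitch; a different font or font size throws the layout off.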

Comment 1 Thomas Hackert 2013-06-24 15:48:52 UTC

Hello Matthew, *,
I am not sure I understand you correctly, but when I open your attachment in Writer, use "Save as..." to convert it to txt, and reopen the txt file in kate 3.8.4 (from KDE 4.8.4) under Debian Testing AMD64, the txt looks nearly identical in both editors.

Would you be so kind as to test it with a newer version of LO than 3.5.3, please? Which OS/architecture and which viewer/editor are you using for the text file? I have used LO Version: 4.1.0.1 Build ID: 1b3956717a60d6ac35b133d7b0a0f5eb55e9155 under Debian Testing AMD64 and kate 3.5.3, as written before ... ;) If you have done the conversion in a different way, it would be nice if you could give us a clearer step-by-step description ... ;)

But as cited on http://en.wikipedia.org/wiki/Plain_text#Plain_text.2C_the_Unicode_definition:

<quote>
«Plain text represents character content only, not its appearance. »
</quote>

and

<quote>
«If the same plain text sequence is given to disparate rendering processes, there is no expectation that rendered text in each instance should have the same appearance. »
</quote>

, I am not really sure, if our developers could do anything about it ... :(
Sorry for the inconvenience
Thomas.

Comment 2 QA Administrators 2014-02-02 02:07:09 UTC

Dear Bug Submitter,

Please read the entire message before proceeding.

This bug has been in NEEDINFO status with no change for at least 6 months. Please provide the requested information as soon as possible and mark the bug as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in NEEDINFO status with no change in 30 days the QA team will close the bug as INVALID due to lack of needed information.

For more information about our NEEDINFO policy please read the wiki located here: 
https://wiki.documentfoundation.org/QA/FDO/NEEDINFO

If you have already provided the requested information, please mark the bug as UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed.


Thank you for helping us make LibreOffice even better for everyone!


Warm Regards,
QA Team

Comment 3 QA Administrators 2014-02-26 19:31:37 UTC Comment hidden (obsolete)

Dear Bug Submitter,

Please read this message in its entirety before proceeding.

Your bug report is being closed as INVALID due to inactivity and a lack of information which is needed in order to accurately reproduce and confirm the problem. We encourage you to retest your bug against the latest release. If the issue is still present in the latest stable release, we need the following information (please ignore any that you've already provided):

a) Provide details of your system including your operating system and the latest version of LibreOffice that you have confirmed the bug to be present

b) Provide easy to reproduce steps – the simpler the better

c) Provide any test case(s) which will help us confirm the problem

d) Provide screenshots of the problem if you think it might help

e) Read all comments and provide any requested information

Once all of this is done, please set the bug back to UNCONFIRMED and we will attempt to reproduce the issue. 
Please do not:
a) respond via email 
b) update the version field in the bug or any of the other details on the top section of FDO

Comment 4 Urmas 2014-02-27 07:45:24 UTC

An enhancement request.

Comment 5 Joel Madero 2014-11-06 00:33:31 UTC

Never confirmed so moving to UNCONFIRMED for QA team to evaluate. Thanks!

Comment 6 Robinson Tryon (qubit) 2014-12-22 05:23:51 UTC

(In reply to Matthew B from comment #0)
> Is there a way to convert a document with rich text (.doc, .odx, .rtf) to
> plain text (.txt) AND keep the original paragraph alignment?
> ... 
> For example, assume a rich text document has indentations created NOT using
> spaces or tabs, but only the indentation settings on the ruler.  It would
> convert to this:
> 
> Line one with no indent. And some text that wraps: "sed ut perspiciatis unde
> omnis iste natus error sit..."
>      Line 2 with .5" indent. And some text that wraps: "sed ut perspiciatis
> unde 
>      omnis iste natus error sit..."
>          Line 3 with 1" indent. And some text that wraps: "sed ut
> perspiciatis 
>          unde omnis iste natus error sit..."
> 
> As alluded to above, one possible solution would be to write a script that
> identifies the indentation of each line (ie .5", 1", 1.5"...) and places the
> correct number of spaces to fill that length.

This seems like one plausible approach to providing plaintext output that retains higher fidelity to the original document than what the filter currently provides.

Status -> NEW