Bug 105274 - Idea: Add Support to Save Draw (and PDF) file to ODT format (without text boxes, PDF Reflow)
Status: RESOLVED DUPLICATE of bug 32249
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: needsUXEval
Depends on:
Blocks:
 
Reported: 2017-01-12 06:40 UTC by Kevin Suo
Modified: 2017-01-13 17:50 UTC

See Also:
Crash report or crash signature:


Attachments
simple.pdf (15.07 KB, application/pdf)
2017-01-12 06:40 UTC, Kevin Suo
complicated.pdf (289.81 KB, application/pdf)
2017-01-12 06:40 UTC, Kevin Suo
sample "complicated" pdf directly imported to Writer ODT (236.73 KB, application/vnd.oasis.opendocument.text)
2017-01-12 16:40 UTC, V Stuart Foote

Description Kevin Suo 2017-01-12 06:40:27 UTC
Created attachment 130339 [details]
simple.pdf

Currently Draw can import PDF files very well. The idea is to add support to save the Draw file as an ODT file directly.

-------------------
Steps to reproduce:
-------------------

1. Open a PDF file with Draw.
2. The File menu should offer a menu item, "Save as Text Document".

As most PDF files are very complicated, users who choose this item should be warned that some formatting may not be preserved in the exported text document.

-----------------------------
More Details in a Simple Way
-----------------------------

The attached "simple.pdf" file was created using Writer. It contains two simple paragraphs, "I love LibreOffice!" and "I love LibreOffice very much!", with some simple formatting.

Now open this PDF with Draw. You will see exactly the same text and formatting, except that the text is in text boxes, whereas in Writer it was in paragraphs.
Save it as "simple.odg" and unzip it; in content.xml you will find both pieces of text, each inside a draw:text-box node. The idea is to assemble the text back into the same paragraphs as in Writer, while retaining as much formatting as possible.
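
The inspection described above can be sketched in Python. This is only an illustration, not part of LibreOffice: the function name text_boxes is hypothetical, and it assumes the saved .odg is a normal ODF zip package whose content.xml uses the standard ODF drawing and text namespaces.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard ODF namespace URIs for the draw: and text: prefixes.
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def text_boxes(odg_file):
    """Return the text of every draw:text-box in content.xml, in document order.

    odg_file may be a path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(odg_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    boxes = []
    for box in root.iter(f"{{{DRAW_NS}}}text-box"):
        # Each text:p inside the box is one paragraph; itertext() also
        # collects the text of nested text:span elements.
        paragraphs = ["".join(p.itertext()) for p in box.iter(f"{{{TEXT_NS}}}p")]
        boxes.append("\n".join(paragraphs))
    return boxes
```

Calling text_boxes("simple.odg") on the file saved from Draw should list the text fragments one box at a time, which makes it easy to see how the original paragraphs were split up.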

----------------------------------
More Complicated Real Case Example
----------------------------------

The attached "complicated.pdf" is an Issues and Decision Memo issued by the U.S. Department of Commerce regarding an anti-dumping case. Assume that I open this file in Draw and then want to click "File - Save as Text Document" to save it as an ODT file so that I can do some editing.

If you open this file in Draw, you see a great many text boxes which hold the real text. When you unzip the ODG file you will see thousands of <draw:text-box> elements in content.xml.
Currently, if I want to turn this file into an ODT file, I have to do it manually by copying and pasting the text of each text box from Draw to Writer. The idea is to assemble the text from the text boxes into Writer, so that we have nicely formatted text in paragraphs, while retaining as much formatting as possible.
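
To make the "assemble the text boxes into paragraphs" idea concrete, here is a minimal sketch in Python. It is not an existing LibreOffice feature or algorithm. The Box type, the reflow function and both thresholds are assumptions chosen for illustration, and it assumes that boxes on the same line share the same y value.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float   # left edge of the text box, in cm (hypothetical unit)
    y: float   # top edge of the text box, in cm
    text: str


def reflow(boxes, line_height=0.5, paragraph_gap=1.5):
    """Join positioned text boxes into paragraphs.

    Boxes are read top to bottom, then left to right. A vertical jump
    larger than paragraph_gap * line_height starts a new paragraph;
    anything smaller is treated as the same paragraph.
    """
    paragraphs = []
    current = []
    previous_y = None
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        if previous_y is not None and box.y - previous_y > paragraph_gap * line_height:
            paragraphs.append(" ".join(current))
            current = []
        current.append(box.text.strip())
        previous_y = box.y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

A real implementation would also have to deal with columns, tables, headers and footers, hyphenation and character formatting, which is where most of the work in this request lies.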

-------------

I know this feature needs a lot of work, but it is worth a try, as it may be very useful and may make LibreOffice more competitive with other products.
Comment 1 Kevin Suo 2017-01-12 06:40:55 UTC
Created attachment 130340 [details]
complicated.pdf
Comment 2 V Stuart Foote 2017-01-12 16:40:41 UTC
Created attachment 130365 [details]
sample "complicated" pdf directly imported to Writer ODT

@Kevin, *

I'm not sure we really need this. Kind of out of scope for LibreOffice as we make no claims to being a PDF editor.

I believe the existing import filter, which handles PDF import directly to Writer [1] rather than passing through Draw, does an acceptable rendering of the document. The converted ODF is attached.

Yes there are times that having the ability to edit a PDF would be handy. But we already provide our HybridPDF format to embed an ODF document within the PDF.

Even Adobe's own Acrobat does not provide the ability to directly edit the structure of the PDF content--just touch-up. Why would we want to drag LibreOffice into that at all?

=-ref-=
[1] by setting Tools -> Options -> General: Open/Save Dialogs "Use LibreOffice dialogs" and then selecting the "PDF - Portable Document Format (Writer)(*.pdf)" import filter.
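
For batch use, the same import filter can probably be selected from the command line instead of the dialog. The sketch below only builds the command. It assumes that the soffice binary is on the PATH and that the internal name of the filter mentioned in [1] is writer_pdf_import; the function name pdf_to_odt_command is hypothetical.

```python
import subprocess


def pdf_to_odt_command(pdf_path, out_dir="."):
    """Build a soffice command that imports a PDF via the Writer PDF filter
    and converts it to ODT (the filter name is an assumption, see above)."""
    return [
        "soffice",
        "--headless",
        "--infilter=writer_pdf_import",
        "--convert-to", "odt",
        "--outdir", out_dir,
        pdf_path,
    ]


def convert(pdf_path, out_dir="."):
    # Runs the conversion; raises CalledProcessError if soffice fails.
    subprocess.run(pdf_to_odt_command(pdf_path, out_dir), check=True)
```

For example, convert("complicated.pdf", "out") would be expected to write out/complicated.odt.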
Comment 3 Kevin Suo 2017-01-13 00:17:30 UTC
Your attached converted ODT file is well formatted, but it is far from an editable ODT file, as each line is inside a text box.

The key to my idea is to remove the text boxes.

This feature is important for business use. One use case: the Department of Commerce issues a Questionnaire to a Company requesting information, but the Questionnaire is in PDF format (in fact the Department always issues Questionnaires in PDF format). When the Company receives the Questionnaire, they have to convert it to an editable text document in order to respond to the questions. As far as I know, people are using Office 365 to do this conversion, since starting with Word 2013 it has a feature called PDF Reflow.
Comment 5 m_a_riosv 2017-01-13 09:30:38 UTC

*** This bug has been marked as a duplicate of bug 32249 ***
Comment 6 Heiko Tietze 2017-01-13 09:39:34 UTC
I don't think that "export Draw as a Writer document" is the same request as "enhance PDF import filter towards a full editor". And actually I don't see a reason for the first one, and the second one is intentionally out of scope.
Comment 7 m_a_riosv 2017-01-13 16:26:48 UTC
If I'm not wrong, Kevin's goal is to have the PDF editable with Writer, proposing saving ODG as ODT as a way/workaround to achieve it.
Comment 8 V Stuart Foote 2017-01-13 16:45:27 UTC
(In reply to m.a.riosv from comment #7)
> If I'm not wrong, Kevin's goal is to have the PDF editable with Writer,
> proposing saving ODG as ODT as a way/workaround to achieve it.

Why? We already provide filter import of the PDF directly to the Writer canvas. But folks do not realize the capability exists. If you need the PDF content in Writer, filter import it there.

As to performing "reflow" to reparse the fixed PDF layout back into "editable" paragraphs--that is out-of-scope and is the subject of dupe bug 32249, IMHO best handled as an extension as it does not belong in LO core. Existing filter import of PDF is an appropriate level of support.
Comment 9 m_a_riosv 2017-01-13 17:50:53 UTC
I know it can be imported in Writer (by the way, on Windows this can be done with the system dialog), but there is no difference from how it is imported in Draw: text in text boxes.

If someone thinks it is not a dup and must be reopened, no problem, do it.

If someone thinks the resolution must be wontfix, then I think that is better done on the duplicated bug.

Maybe today is not one of my days; I don't know what we are discussing :)