Bug 137469 - Text filter should have an option to hide hidden paragraphs
Summary: Text filter should have an option to hide hidden paragraphs
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 6.4.6.2 release (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard: target:7.2.0
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-14 07:29 UTC by Oleg Shchelykalnov
Modified: 2021-06-09 09:11 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
PDF generated by script (11.45 KB, application/pdf)
2020-10-14 08:43 UTC, Oleg Shchelykalnov
Text file generated by script (219 bytes, text/plain)
2020-10-14 08:43 UTC, Oleg Shchelykalnov

Description Oleg Shchelykalnov 2020-10-14 07:29:54 UTC
When I export a Writer document as Text, paragraphs that are hidden in the PDF export are shown in the TXT file:

Example template: https://ask.libreoffice.org/upfiles/15929228703192809.odt

Example python3 code: https://pastebin.com/i7u7X9gB

PDF contains:

Simple paragraphs with variable abc
Hidden paragraph if abc
Simple paragraph with variable 3.65
Hidden paragraph if greater 5
Numeric user field 123,458.67

but TXT file contains:

Simple paragraphs with variable abc
Hidden paragraph if abc
Hidden paragrap if not abc
Simple paragraph with variable 3.65
Hidden paragraph if less 5
Hidden paragraph if greater 5
Numeric user field 123,458.67

To generate the TXT file, replace the last line of the script with:

model.storeToURL("file:///tmp/output/test1.txt", [PropertyValue("FilterName", -1, "Text", 0)])
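For reference, the difference between the two listings above can be computed mechanically. The following is only a small sketch in plain Python: it does not call LibreOffice, the line lists are copied from this report, and the helper name extra_lines is made up for illustration.

```python
# Lines copied verbatim from the PDF and TXT listings in this report.
pdf_lines = [
    "Simple paragraphs with variable abc",
    "Hidden paragraph if abc",
    "Simple paragraph with variable 3.65",
    "Hidden paragraph if greater 5",
    "Numeric user field 123,458.67",
]
txt_lines = [
    "Simple paragraphs with variable abc",
    "Hidden paragraph if abc",
    "Hidden paragrap if not abc",
    "Simple paragraph with variable 3.65",
    "Hidden paragraph if less 5",
    "Hidden paragraph if greater 5",
    "Numeric user field 123,458.67",
]


def extra_lines(reference, candidate):
    """Return the lines of candidate that never appear in reference, in order."""
    seen = set(reference)
    return [line for line in candidate if line not in seen]


# These are the paragraphs that the PDF export hides but the Text export keeps.
leaked = extra_lines(pdf_lines, txt_lines)
print(leaked)
```

The two lines it reports are exactly the paragraphs whose hide condition is true in the example template.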
Comment 1 Mike Kaganski 2020-10-14 08:41:36 UTC Comment hidden (obsolete)
Comment 2 Oleg Shchelykalnov 2020-10-14 08:43:10 UTC
Created attachment 166361 [details]
PDF generated by script
Comment 3 Oleg Shchelykalnov 2020-10-14 08:43:43 UTC
Created attachment 166362 [details]
Text file generated by script
Comment 4 Oleg Shchelykalnov 2020-10-14 08:44:34 UTC Comment hidden (obsolete)
Comment 5 Mike Kaganski 2020-10-14 08:48:48 UTC
Anyway, all the complexity with Python scripts and code modifications is not needed to see the problem (why would people not try doing simple things in the GUI when filing bugs, to simplify reproduction steps?)

Open the document in Writer, and save as Text.

Repro with Version: 7.0.2.2 (x64)
Build ID: 8349ace3c3162073abd90d81fd06dcfb6b36b994
CPU threads: 12; OS: Windows 10.0 Build 19041; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: en-US
Calc: CL
Comment 6 Oleg Shchelykalnov 2020-10-14 09:56:34 UTC
I suppose the sw/qa/python/var_fields.py test could check the content of the exported text to cover this issue.
Comment 7 Mike Kaganski 2020-10-14 10:05:29 UTC
I still think it's a bug, but OTOH there might be a consideration/distinction related to the difference between an export-only format like PDF and an editable document format like TXT: the former should keep the visual representation, while the latter is expected to keep as much information as possible ... keeping all the text, and removing all of the text's unsupported properties.

Miklos, Michael: what do you think: Is it better to keep text and drop its "hidden" attribute, or to keep the "hidden" attribute value by dropping the text in Text filter?
Comment 8 Miklos Vajna 2020-10-14 10:16:25 UTC
Whatever you do unconditionally, somebody will be upset. So I guess keeping the status quo makes sense, so at least the ones who are happy already are not disturbed.

You could add an option for this to make everyone happy, but then you have the cost of one more option. :-)
Comment 9 Oleg Shchelykalnov 2020-10-14 14:36:19 UTC
Could FilterOptions be used to enable new behavior?
Comment 10 Mike Kaganski 2020-10-14 14:52:01 UTC
(In reply to Oleg Shchelykalnov from comment #9)

Yes, but which filter should that be?
We have a "Text" filter without any options, and we have a "Text (encoded)" filter, with encoding-related settings.

Personally I'd love to see the two filters become one, with only one option in file dialogs, with the new setting added to the current encoding-related dialog (and to the FilterOptions). But...

Anyway, shouldn't the setting (if needed) currently be added to the already configurable "Text (encoded)" filter?
Comment 11 Oleg Shchelykalnov 2020-10-27 09:21:03 UTC
I've tried to look into it, but I cannot find where the sources of the "Text (encoded)" filter are located.
Comment 12 Oleg Shchelykalnov 2020-11-11 08:02:09 UTC
I found the SwAsciiOptions and SwASCWriter classes, which are used here.

Should SwAsciiOptions accept a sixth option that controls whether hidden text is included, with the default set to yes?
Comment 13 Mike Kaganski 2020-11-11 08:43:16 UTC
(In reply to Oleg Shchelykalnov from comment #12)
> Should SwAsciiOptions accept sixth option to include or not hidden text and
> set it yes by default?

Yes, that's needed if you want this implemented of course. I don't see a problem in this, if you keep backward compatibility (so it should by default behave as before).
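
A rough model of that backward-compatible behavior, written in plain Python rather than against the real SwAsciiOptions code: the option string is treated as comma-separated tokens, and a missing or empty sixth token keeps the old behavior. The function names, the (text, hidden) paragraph tuples and the "false" token spelling are assumptions made for illustration only.

```python
def parse_include_hidden(filter_options):
    """Hypothetical parser: the sixth comma-separated token controls hidden text.

    A missing or empty sixth token means "include hidden text", so option
    strings written before the new setting existed behave as before.
    """
    tokens = filter_options.split(",")
    if len(tokens) < 6:
        return True
    # Assumed spelling: only an explicit "false" disables hidden text.
    return tokens[5].strip().lower() != "false"


def export_text(paragraphs, filter_options=""):
    """Join paragraph texts, dropping hidden ones only when asked to.

    paragraphs is a list of (text, hidden) tuples, a stand-in for the
    document model and not a LibreOffice structure.
    """
    include_hidden = parse_include_hidden(filter_options)
    kept = [text for text, hidden in paragraphs if include_hidden or not hidden]
    return "\n".join(kept)


doc = [("Visible one", False), ("Hidden one", True), ("Visible two", False)]
print(export_text(doc, "UTF8,LF,,,"))        # five tokens: old behavior
print(export_text(doc, "UTF8,LF,,,,false"))  # sixth token opts out of hidden text
```

With this shape, existing callers that pass five tokens or no options at all still get the hidden text, and only callers that opt in see a change.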
Comment 14 Oleg Shchelykalnov 2020-11-12 08:29:38 UTC
I've prepared unit tests for it, and now I'm stuck on how hidden paragraphs and hidden text are represented in the LibreOffice code.

New option and unit test for it: https://gerrit.libreoffice.org/c/core/+/105625

Also, I found out that HTML export includes hidden paragraphs too.
Comment 15 Oleg Shchelykalnov 2020-11-12 10:01:06 UTC
Hidden paragraphs turned out to be simple. I managed to do it in https://gerrit.libreoffice.org/c/core/+/105631 but somehow Gerrit shows a merge conflict even though the branch was rebased against master.

Also, I tried to include hidden text in the test, but it looks like it's broken in another way: it isn't included in the output file when the hidden condition is true, whichever way the option is set.
Comment 16 Commit Notification 2021-05-21 07:57:20 UTC
Oleg Shchelykalnov committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/aafe21d8765158d223dd359e6737b64ed1b34549

tdf#137469 Add option to disable hidden text in text filter

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 17 Oleg Shchelykalnov 2021-05-26 20:25:40 UTC
I've rebased and fixed the other changesets that fix this issue.
Comment 18 Commit Notification 2021-06-09 09:10:18 UTC
Oleg Shchelykalnov committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/c96b61f86ef3f4cdc34f84043fed2724b6d9732b

tdf#137469 Prepare tests for encoded text filter

It will be available in 7.2.0.

Comment 19 Commit Notification 2021-06-09 09:11:29 UTC
Oleg Shchelykalnov committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b5e07b1339f73841664b28c65639f1638bd7edf4

tdf#137469 Implement and test excluding hidden text in text filter

It will be available in 7.2.0.
