Bug 130008 - PDF export: Whitespaces in contents removed when copying from Preview on macOS (comment 35)
Summary: PDF export: Whitespaces in contents removed when copying from Preview on macO...
Status: RESOLVED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version (earliest affected): 6.3.4.2 release
Hardware: All All
Importance: low minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:pdf
Depends on:
Blocks: PDF-Export
 
Reported: 2020-01-15 08:57 UTC by christoph_egger
Modified: 2023-07-03 16:08 UTC
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
a single slide powerpoint with a list of contents (58.67 KB, application/vnd.openxmlformats-officedocument.presentationml.presentation)
2020-01-15 08:59 UTC, christoph_egger
PDF from macOS LO (11.53 KB, application/pdf)
2020-01-16 16:55 UTC, Roman Kuznetsov
PDF from Powerpoint (30.70 KB, application/pdf)
2020-01-16 17:28 UTC, christoph_egger
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint (right side) (406.83 KB, image/png)
2020-02-19 07:39 UTC, christoph_egger
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint (right side) (431.06 KB, image/png)
2020-02-19 08:02 UTC, christoph_egger
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint (right side) (477.10 KB, image/png)
2020-02-19 08:26 UTC, christoph_egger
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint (right side) with all spaces prefixed with a slash (426.19 KB, image/png)
2020-03-25 07:47 UTC, christoph_egger
powerpoint presentation with all spaces prefixed with a slash (61.68 KB, application/vnd.openxmlformats-officedocument.presentationml.presentation)
2020-04-06 10:39 UTC, christoph_egger
PDF from Windows LO 6.4 (22.28 KB, application/pdf)
2020-05-03 08:33 UTC, eisa01
PDF from macOS LO 6.4 Export as PDF (18.77 KB, application/pdf)
2020-05-03 08:34 UTC, eisa01
screenshot of "pdiff express 2" showing the diff between pdf from attachment 160267 (left side) and pdf from attachment 160268 (right side) (370.62 KB, image/png)
2020-05-04 06:43 UTC, christoph_egger
screenshot of "pdiff express 2" showing the diff between pdf from ppt 2008 (left side) and pdf from ppt 2019 (right side) (453.12 KB, image/png)
2020-08-14 08:27 UTC, christoph_egger
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint 2008 (right side) (447.40 KB, image/png)
2020-11-02 09:23 UTC, christoph_egger
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint 2008 (right side) with all spaces prefixed with a slash (472.00 KB, image/png)
2020-11-02 09:34 UTC, christoph_egger
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint 2019 (right side) with all spaces prefixed with a slash (454.64 KB, image/png)
2020-11-02 09:37 UTC, christoph_egger

Description christoph_egger 2020-01-15 08:57:42 UTC
Description:
I export a presentation as PDF. When I copy text from the contents list in the PDF to the clipboard, the whitespace is removed.

Steps to Reproduce:
1. Open attached presentation in Impress.
2. Export presentation as pdf.
3. Open pdf.
4. Select text from contents and copy to clipboard.
5. Text in the clipboard has no whitespaces.

Actual Results:
When I paste the text from the clipboard somewhere, the whitespace is missing.

Example:
In step 4 I select "The attributes of the 144000 (9 slides)"
After step 5 the clipboard contains "Theattributesofthe144000(9slides)"

Expected Results:
The whitespace should be preserved in the clipboard.

Example:
In step 4 I select "The attributes of the 144000 (9 slides)"
After step 5 the clipboard should contain:
"The attributes of the 144000 (9 slides)"



Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 6.3.4.2
Build ID: 60da17e045e08f1793c57c00ba83cdfce946d0aa
CPU threads: 8; OS: Mac OS X 10.14.6; UI render: default; VCL: osx; 
Locale: en-US (en_DE.UTF-8); UI-Language: en-US
Calc: threaded
Comment 1 christoph_egger 2020-01-15 08:59:16 UTC
Created attachment 157160
a single slide powerpoint with a list of contents
Comment 2 Roman Kuznetsov 2020-01-16 14:58:34 UTC
What PDF viewer do you use? 

I can't repro it on Windows in 

Version: 6.5.0.0.alpha0+ (x64)
Build ID: 147af9e2cf7f937ed83ab00574b6a418a2cb629e
CPU threads: 4; OS: Windows 10.0 Build 18362; UI render: default; VCL: win; 
Locale: ru-RU (ru_RU); UI-Language: ru-RU
Calc: threaded

and I use PDF-XChange Viewer there.

So possibly it's Mac-only, or a problem with only some PDF viewers?
Comment 3 christoph_egger 2020-01-16 15:16:12 UTC
I can export the pdf in two ways:

1. File -> Export As -> Export Directly as PDF
2. File -> Print -> PDF -> Save PDF as

Both ways give the same result as reported.

PDF viewers I use:

- Preview App
- PDiff Express 2 App

I can reproduce this with both pdf viewers.

I found this issue with "PDiff Express 2" when I compared
the pdf produced with powerpoint with the pdf produced with Impress
because it showed me a diff I didn't expect.

The Preview app shows the spaces but drops them when I copy to the clipboard;
"PDiff Express 2" does not show the spaces at all.

It looks like the text layer in the PDF has no whitespace?
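A rough way to look at the text layer directly is to list the operands of the text-showing operators `Tj` and `TJ` in a page's content stream. The Python sketch below is hypothetical and deliberately simplified: it assumes the stream is already decompressed, it ignores hex strings `<...>` and unescaped nested parentheses, and its byte-0x20 test is only meaningful for simple single-byte fonts (with Identity-H CID fonts the bytes are glyph IDs, not characters).

```python
import re

# Simplified patterns (assumption: not a full PDF lexer).
# A literal string "(...)": backslash escapes are allowed,
# unescaped nested parentheses are not handled.
_STRING = rb"\(((?:[^()\\]|\\.)*)\)"
# Either "(string) Tj" or "[ ...array... ] TJ".
_SHOW = re.compile(_STRING + rb"\s*Tj|\[(.*?)\]\s*TJ", re.S)


def text_operands(stream: bytes) -> list[bytes]:
    """Return the string operands of Tj and TJ operators, in stream order."""
    found = []
    for m in _SHOW.finditer(stream):
        if m.group(1) is not None:  # "(...) Tj"
            found.append(m.group(1))
        else:  # "[...] TJ": keep only the strings, drop the numeric offsets
            found.extend(re.findall(_STRING, m.group(2), re.S))
    return found


def has_space_glyph(stream: bytes) -> bool:
    """True if any shown string contains byte 0x20 (a space in simple encodings)."""
    return any(b" " in s for s in text_operands(stream))


if __name__ == "__main__":
    # Word gap drawn with a space glyph vs. word gap drawn only as a TJ offset.
    with_glyph = b"BT /F1 12 Tf (The attributes) Tj ET"
    offset_only = b"BT /F1 12 Tf [(The)-250(attributes)] TJ ET"
    print(text_operands(offset_only))  # [b'The', b'attributes']
    print(has_space_glyph(with_glyph), has_space_glyph(offset_only))  # True False
```

The `-250` in the `TJ` array is a horizontal adjustment in thousandths of a text-space unit (a negative value moves the next glyph to the right), so a word gap can be drawn without any space glyph at all; whether a copy then contains a space is up to the viewer.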
Comment 4 christoph_egger 2020-01-16 15:36:26 UTC Comment hidden (obsolete)
Comment 5 Roman Kuznetsov 2020-01-16 16:46:46 UTC
Reproduced in

Version: 6.5.0.0.alpha0+
Build ID: b6295e4a1b7735c148174f44f6d28221f4f52302
CPU threads: 4; OS: Mac OS X 10.15.2; UI render: GL; VCL: osx; 
Locale: ru-RU (ru_RU.UTF-8); UI-Language: ru-RU
Calc: threaded

I used the standard PDF viewer in macOS
Comment 6 Roman Kuznetsov 2020-01-16 16:54:42 UTC
I tested the PDF from macOS LibreOffice on a Linux machine with the Okular PDF viewer. Text pasted from that PDF into a document looks fine; it has all the spaces.
I think it isn't a problem in LibreOffice but in the macOS PDF viewer.

Not a bug? Xisco, any opinions?
Comment 7 Roman Kuznetsov 2020-01-16 16:55:13 UTC
Created attachment 157193
PDF from macOS LO
Comment 8 christoph_egger 2020-01-16 17:28:54 UTC
Created attachment 157194
PDF from Powerpoint
Comment 9 christoph_egger 2020-01-16 17:37:03 UTC
(In reply to Roman Kuznetsov from comment #6)
> I tested PDF from macOS's LibreOffice on Linux machine with Okular PDF
> viewer. Text pasted from that PDF into document looks fine, it has all
> spaces.
> I think it isn't a problem in LibreOffice, but it's a macOS PDF viewer's
> problem.
> 
> not a bug? Xisco, any opinions?

I don't think it is a problem with the macOS PDF viewer, because the spaces are there when I use the PDF from PowerPoint that I attached.

Note the difference in file size:
PDF from PowerPoint: 30.7 KB

PDF from LO (see attachment): 11.53 KB,
or 25 KB when I create it from LO.
Comment 10 eisa01 2020-02-18 21:17:26 UTC
For me, neither the LO-produced nor the PowerPoint-produced PDF includes the whitespace when I copy from Preview.

In Acrobat Reader both work.

Although yes, your PDF from PowerPoint shows correctly. How did you produce it? I'm on 16.34

So on the face of it I would call this maybe not our bug.

Version: 7.0.0.0.alpha0+
Build ID: 0cb4f304abf6f8dd6b40eb800788d2fe80581813
CPU threads: 4; OS: Mac OS X 10.14.6; UI render: default; VCL: osx; 
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded
Comment 11 christoph_egger 2020-02-19 07:39:50 UTC
Created attachment 157996
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint (right side)
Comment 12 christoph_egger 2020-02-19 08:02:23 UTC
Created attachment 157997
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint (right side)
Comment 13 christoph_egger 2020-02-19 08:05:34 UTC
(In reply to eisa01 from comment #10)
> For me neither the LO or PowerPoint produced PDF has whitespace included
> when I copy from Preview
> 
> In Acrobat Reader both work
> 
> Although yes, your PDF from PowerPoint shows correctly. How did you produce
> it? I'm on 16.34
> 

> How did you produce it?

File -> Print -> Save PDF

> I'm on 16.34

What do you mean? I don't understand.

> So at the face of it I would call this maybe not our bug
> 

I attached a screenshot (see attachment 157996) of a PDF comparison in
"Pdiff Express 2" that shows the difference between
the PDFs produced by LO (left side)
and by PowerPoint (right side).
There you can clearly see that the spaces are missing in the PDF from LO.

I will be happy when "Pdiff Express 2" says both PDFs are equal.

I created both PDFs via File -> Print -> Save PDF
(I rotated the PDF from LO in Preview before the comparison, due to bug 130007).

For the second screenshot (see attachment 157997) I did it the other way round:
I rotated the PDF from PowerPoint before the comparison, to show that the rotation
operation in Preview does not touch the text layer in the PDF.

You can get "Pdiff Express 2" from the Mac App Store.
The trial version allows comparing the first three pages, which is enough for testing this issue.
Comment 14 christoph_egger 2020-02-19 08:26:12 UTC
Created attachment 157998
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint (right side)

The PDF from LO is rotated due to bug 130007.
Comment 15 christoph_egger 2020-02-19 08:31:32 UTC
(In reply to christoph_egger from comment #13)

I created the screenshot (see attachment 157998) of "pdiff express 2" *without* using Preview before the comparison.
Comment 16 christoph_egger 2020-02-19 08:51:10 UTC
In case you ask how "pdiff express 2" relates to the clipboard (with respect to Comment 0):

Answer: the content of the clipboard matches what "pdiff express 2" reports as the text layer of the PDF.

Examples:

"pdiff express 2" says "Introduction (1 slide)" and I have "Introduction (1 slide)" in the clipboard.
"pdiff express 2" says "Introduc/on(1slide)" and I have "Introduc/on(1slide)" in the clipboard.
Comment 17 Oliver Grimm 2020-03-24 22:45:17 UTC
Since it is specifically the "ti" ligature in the word "introduction" that is broken in the example, this bug might be related to incorrect parsing of escape sequences in one of the programs, or to incorrect forwarding of escape-sequenced text between them. 

What do the spaces in an export look like if you prefix all spaces in the original document with a slash "/"?
Comment 18 christoph_egger 2020-03-25 07:47:53 UTC
Created attachment 158968
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint (right side) with all spaces prefixed with a slash

Screenshot made per request in Comment 17
Comment 19 christoph_egger 2020-03-25 08:01:10 UTC
(In reply to Oliver Grimm from comment #17)
> Since specifically the "ti" ligature in the word "introduction" is broken in
> the example this bug might be related to incorrect parsing of escape
> sequences in one of the programs or incorrect forwarding of escape sequenced
> text between them. 

If you think that the parsing of the PDF is incorrect, how do you explain
that the PDF from PowerPoint is parsed correctly?

What data must the PDF export contain so that *any* program
has a chance of parsing it correctly?
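For reference, one mechanism the PDF specification defines for this is the font's ToUnicode CMap, which maps each glyph code to the Unicode text it stands for; a single code may map to several characters, which is how a ligature glyph can be extracted as "ti" (marked content with `/ActualText` is another option). The hypothetical helper below only formats `bfchar` entries to show the shape of that data; the code values are made up and say nothing about what LO or PowerPoint actually write.

```python
def bfchar_entry(code: int, text: str, code_bytes: int = 1) -> str:
    """Format one 'bfchar' mapping: <source code> <UTF-16BE destination>.

    A ligature glyph can map to several characters, e.g. one code -> "ti".
    """
    src = f"<{code:0{2 * code_bytes}X}>"
    dst = "<" + text.encode("utf-16-be").hex().upper() + ">"
    return f"{src} {dst}"


def bfchar_block(mapping: dict[int, str]) -> str:
    """Wrap several mappings in a beginbfchar/endbfchar section, sorted by code."""
    lines = [f"{len(mapping)} beginbfchar"]
    lines += [bfchar_entry(c, t) for c, t in sorted(mapping.items())]
    lines.append("endbfchar")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical codes: 0x03 for the space glyph, 0x0F for a "ti" ligature glyph.
    print(bfchar_block({0x03: " ", 0x0F: "ti"}))
```

An extractor that honors such a table can turn the ligature glyph back into "ti" and the space glyph into U+0020 regardless of how the font itself is encoded.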

> How do the spaces in an export look like if you prefix all spaces in the
> original document with a dash "/"?

See attachment 158968.
FYI: I made the requested modification in PowerPoint.
Comment 20 christoph_egger 2020-04-06 10:39:42 UTC
Created attachment 159358
powerpoint presentation with all spaces prefixed with a slash

Same as attachment 157160 but with all spaces prefixed with a slash
Comment 21 eisa01 2020-05-03 08:32:59 UTC
OK, first of all, you can use Export -> Export as PDF to avoid the rotation bug.

However, that does not avoid the issue.

I also did an export test from LO on Windows, and that resulting file also misses spaces when shown in macOS Preview on Mojave. As such, this is not specific to LO on macOS.

However, in e.g. the Edge PDF viewer I'm able to copy and paste the output just fine, with spaces.

I guess the question is whether LO produces valid PDF output, or whether this is a bug in Preview. The second question is then whether we should change the output so that Preview can read it.

(Currently, Save as PDF in the print dialog produces quite broken output when copy-pasted, due to the rotation: lots of extra spaces.)
Comment 22 eisa01 2020-05-03 08:33:45 UTC
Created attachment 160267
PDF from Windows LO 6.4
Comment 23 eisa01 2020-05-03 08:34:24 UTC
Created attachment 160268
PDF from macOS LO 6.4 Export as PDF
Comment 24 christoph_egger 2020-05-04 06:38:21 UTC
(In reply to eisa01 from comment #21)
> Ok,
> first of all you can use Export -> Export as PDF to avoid the rotation bug.
> 

That sounds like you can reproduce bug 130007.
Please make a note there.
Comment 25 christoph_egger 2020-05-04 06:43:23 UTC
Created attachment 160307
screenshot of "pdiff express 2" showing the diff between pdf from attachment 160267 (left side) and pdf from attachment 160268 (right side)
Comment 26 christoph_egger 2020-05-04 06:49:50 UTC
(In reply to eisa01 from comment #21)
> Ok,
> first of all you can use Export -> Export as PDF to avoid the rotation bug.
> 
> However, that does not avoid the issue
> 
> I also did an export test from LO on Windows, and that resulting file also
> misses spaces when shown in macOS Preview on Mojave. As such, this is not a
> mac bug
> 
> However, in e.g., the Edge PDF viewer I'm able to copy paste the output just
> fine with spaces.
> 
> I guess this is a question whether LO produce valid PDF output or not, or if
> this is a bug in Preview? Second question is then if we should fix the
> output to be able to be read by Preview or not
> 
> (the now save as PDF in print dialog produce quite broken output when copy
> pasted due to the rotation, lots of extra spaces)

I attached a screenshot from "pdiff express 2" comparing your two PDFs
(see attachment 160307).
It shows that both PDFs have *no* spaces in the text layer.

Question: where does the Edge PDF viewer get the spaces from when you copy the text?
Comment 27 christoph_egger 2020-05-04 06:52:43 UTC
(In reply to eisa01 from comment #21)
> Ok,
> first of all you can use Export -> Export as PDF to avoid the rotation bug.
> 
> (the now save as PDF in print dialog produce quite broken output when copy
> pasted due to the rotation, lots of extra spaces)

When I do that from PowerPoint on Mac, the resulting PDF is correct.
Can you confirm that?
Comment 28 christoph_egger 2020-08-14 08:27:38 UTC
Created attachment 164300
screenshot of "pdiff express 2" showing the diff between pdf from ppt 2008 (left side) and pdf from ppt 2019 (right side)

I recently bought MS Office 2019 for Mac.
I produced a PDF with it and compared it with one from my older MS Office 2008.

Notably, MS Office 2019 also removes the whitespace in the text layer.
I don't know what happens with MS Office 2013 and 2016, but I think
that explains why this is hard to reproduce for many people.
Comment 29 christoph_egger 2020-08-14 08:39:50 UTC
(In reply to christoph_egger from comment #28)


I retested with LO version

Version: 7.0.0.3
Build ID: 8061b3e9204bef6b321a21033174034a5e2ea88e
CPU threads: 8; OS: Mac OS X 10.14.6; UI render: default; VCL: osx
Locale: en-US (en_DE.UTF-8); UI: en-US
Calc: threaded

It still removes the whitespace in the text layer,
and it still produces "Introduc/on" instead of "Introduction".


MS Office 2008 produces "Introduction (1 slide)"
MS Office 2019 produces "Introduction(1slide)"
LO 7.0.0 produces "Introduc/on(1slide)"


The question that arises:
What is the correct behavior according to the PDF specification?

I found that LO has the same issue with the word "Revelation":
it produces "Revela/on".
Comment 30 christoph_egger 2020-08-14 09:23:43 UTC
(In reply to christoph_egger from comment #29)

Retested with version

Version: 7.1.0.0.alpha0+
Build ID: <buildversion>
CPU threads: 8; OS: Mac OS X 10.14.6; UI render: default; VCL: osx
Locale: en-US (en_DE.UTF-8); UI: en-US
Calc: threaded

I downloaded it from:

https://dev-builds.libreoffice.org/daily/master/MacOSX-x86_64@tb81-TDF/2020-08-14_07.26.35/LibreOfficeDev_7.1.0.0.alpha0_MacOS_x86-64.dmg

Same result as with 7.0.0.3 above
Comment 31 christoph_egger 2020-11-02 09:23:36 UTC
Created attachment 166928
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint 2008 (right side)

Replaces the screenshot from attachment 157998 with a newer version, now that bug 130007 is addressed.
Comment 32 christoph_egger 2020-11-02 09:34:29 UTC
Created attachment 166929
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint 2008 (right side) with all spaces prefixed with a slash

Replaces the screenshot from attachment 158968 with a new one, now that bug 130007 is addressed.
Comment 33 christoph_egger 2020-11-02 09:37:49 UTC
Created attachment 166930
screenshot of "pdiff express 2" showing the difference between pdf from lo (left side) and pdf from powerpoint 2019 (right side) with all spaces prefixed with a slash

Same screenshot as attachment 166929, but against PowerPoint 2019.
Comment 34 Timur 2020-11-02 13:27:52 UTC
This bug is unreadable. I tried, but I gave up. If there's a bug, it looks Mac-only; at least I didn't see where it was confirmed on another OS.

christoph, please do not quote yourself. When you add new info like "Repro 7.0", you should mark the previous equivalent comment ("Repro 6.3") as obsolete.
Instead, write one complete comment that minimally summarizes the issue (you may refer to attachment 123) and explains why this is not NotOurBug (use Adobe Reader for PDF); then please go through from the beginning, keep only the valuable comments and mark all others as obsolete.
Comment 35 christoph_egger 2020-11-02 15:57:00 UTC
Let's restart the bug report so that people can follow it again:

Description:
I export a presentation as PDF. When I copy text from the contents list in the PDF to the clipboard, the whitespace is removed or the content is changed.

Steps to Reproduce:
1. Open attached presentation in Impress.
2. Export presentation as pdf.
3. Open pdf.
4. Select text from contents and copy to clipboard.
5. Text in the clipboard has no whitespaces.

Actual Results:
When I paste the text from the clipboard somewhere, the whitespace is removed or the content is changed.

Example:
In step 4 I select "Introduction (1 slide)"
After step 5 the clipboard contains "Introduc/on (1 slide)"
or "Introduction(1slide)"

Expected Results:
The whitespace should be in the clipboard and the content unchanged.

Example:
In step 4 I select "Introduction (1 slide)"
After step 5 the clipboard should contain:
"Introduction (1 slide)"


Reproducible: Always


User Profile Reset: No


Which PDF viewers do I use?
1. Preview App
2. PDiff Express 2
3. Adobe Acrobat Reader for Mac


When I create the PDF with PowerPoint 2008 for Mac and follow the steps to reproduce, I get "Introduction (1 slide)" into the clipboard with all three PDF viewers. This is the expected result.

When I create the PDF with PowerPoint 2019 for Mac and follow the steps to reproduce, I get "Introduction(1slide)" into the clipboard with Preview App and PDiff Express 2. With Acrobat Reader I get "Introduction (1 slide)" into the clipboard. The removal of the whitespace is unexpected.

When I create the PDF with LO 7.1+ and follow the steps to reproduce,
I get "Introduc/on(1slide)" into the clipboard with Preview App and PDiff Express 2. With Adobe Acrobat Reader for Mac I get "Introduc/on (1 slide)" into the clipboard. This content change is unexpected.
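The results above fall into the two failure kinds named in the description: whitespace removed, or content changed. A small hypothetical helper could classify each clipboard result mechanically; this sketch counts any difference that vanishes once all whitespace is stripped as a whitespace-only change, and uses the strings reported in this comment (reading the LO/Preview result as "Introduc/on(1slide)").

```python
import re


def _strip_ws(s: str) -> str:
    """Remove every whitespace character."""
    return re.sub(r"\s+", "", s)


def classify(expected: str, actual: str) -> str:
    """Compare a clipboard result with the expected text."""
    if actual == expected:
        return "identical"
    if _strip_ws(actual) == _strip_ws(expected):
        return "whitespace changed"
    return "content changed"


EXPECTED = "Introduction (1 slide)"

# Clipboard results as reported above: (producer, viewer) -> text.
RESULTS = {
    ("PowerPoint 2008", "Preview / PDiff Express 2 / Acrobat"): "Introduction (1 slide)",
    ("PowerPoint 2019", "Preview / PDiff Express 2"): "Introduction(1slide)",
    ("PowerPoint 2019", "Acrobat Reader"): "Introduction (1 slide)",
    ("LO 7.1+", "Preview / PDiff Express 2"): "Introduc/on(1slide)",
    ("LO 7.1+", "Acrobat Reader"): "Introduc/on (1 slide)",
}

if __name__ == "__main__":
    for (producer, viewer), text in RESULTS.items():
        print(f"{producer:16} {viewer:38} {classify(EXPECTED, text)}")
```

Under this classification the PowerPoint 2019 result is a whitespace-only change, while both LO results count as content changes because of the "ti" ligature, independent of the spacing.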




Comment 17 asks what happens when each space is prefixed with a slash.

I created attachment 159358, attachment 166929 and attachment 166930 in response to that question.

These are the test results for the question in Comment 17:

When I create the PDF with PowerPoint 2008 for Mac and follow the steps to reproduce, I get "Introduction/ (1/ slide)" into the clipboard with all three PDF viewers. This is the expected result.

When I create the PDF with PowerPoint 2019 for Mac and follow the steps to reproduce, I get "Introduction/(1/slide)" into the clipboard with Preview App and PDiff Express 2. With Adobe Acrobat I get "Introduction/ (1/ slide)" into the clipboard. The removal of the whitespace is unexpected.

When I create the PDF with LO 7.1+ and follow the steps to reproduce,
I get "Introduc/on(1slide)" into the clipboard with Preview App and with "PDiff Express 2". With Adobe Acrobat I get "Introduc/on/ (1/ slide)" into the clipboard. This content change is unexpected.
Comment 36 Buovjaga 2021-07-27 12:23:19 UTC
(In reply to christoph_egger from comment #35)
> Let's restart the bug report that people can follow again:
> 
> Description:
> I export a presentation as pdf. In the pdf when I copy the text from
> contents into clipboard the Whitespaces are removed or content changed.
> 
> Steps to Reproduce:
> 1. Open attached presentation in Impress.
> 2. Export presentation as pdf.
> 3. Open pdf.
> 4. Select text from contents and copy to clipboard.
> 5. Text in the clipboard has no whitespaces.
> 
> Actual Results:
> When I paste the text from the clipboard somewhere whitespaces are removed
> or content changed.
> 
> Example:
> In step 4 I select "Introduction (1 slide)"
> After step 5 the clipboard contains "Introduc/on (1 slide)"
> or "Introduction(1slide)"

Tested with attachment 157160 and Okular, and I don't see the problem.

NixOS
Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: 67e47070a7580a17804adce812cc2f98bfe7b51f
CPU threads: 16; OS: Linux 5.13; UI render: default; VCL: x11
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded
Comment 37 christoph_egger 2021-07-31 13:52:45 UTC
(In reply to Buovjaga from comment #36)
> Tested with attachment 157160 and Okular and I don't see the
> problem

No one on Windows can reproduce it either.
This really is Mac OS X only.


I can still reproduce on

Version: 7.1.5.2 / LibreOffice Community
Build ID: 85f04e9f809797b8199d13c421bd8a2b025d52b5
CPU threads: 12; OS: Mac OS X 10.15.7; UI render: default; VCL: osx
Locale: en-GB (en_DE.UTF-8); UI: en-US
Calc: threaded

and on

Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: 4677345e3695bac158bb04048b4d5c608ed764b4
CPU threads: 12; OS: Mac OS X 10.15.7; UI render: default; VCL: osx
Locale: en-US (en_DE.UTF-8); UI: en-US
Calc: threaded
Comment 38 Denys Prokhorov 2021-08-01 10:04:45 UTC
When opening the PDF in LibreOffice, PDF Expert 2.5.18 or Chrome 92.0.4515.107, the error does not reproduce.

I was able to reproduce the bug in the standard OS X 11.0 (999.4) viewer: copied text is missing spaces.
If you remove the numbering and go through all the steps again, the error disappears.

Additional Info:
Version: 7.3.0.0.alpha0+ / LibreOffice Community
Build ID: 246f24ebc8227345f13784d9e2f055d813f3d24f
CPU threads: 12; OS: Mac OS X 10.15.7; UI render: default; VCL: osx
Locale: ru-RU (ru_UA.UTF-8); UI: en-US
Calc: threaded

Version: 7.1.5.2 / LibreOffice Community
Build ID: 85f04e9f809797b8199d13c421bd8a2b025d52b5
CPU threads: 12; OS: Mac OS X 10.15.7; UI render: default; VCL: osx
Locale: ru-RU (ru_UA.UTF-8); UI: en-US
Calc: threaded
Comment 39 christoph_egger 2021-08-01 11:08:55 UTC
(In reply to Denys Prokhorov from comment #38)
> When opening the PDF in the LibreOffice, PDF Expert 2.5.18, Crome
> 92.0.4515.107 the error is not reproduced.
> 
> I was able to reproduce the bug in OS X 11.0 (999.4) standard viewer when
> copying text missing spaces.
> If you remove the numbering and go through all the steps again, the error
> disappears.

This is an interesting discovery. I can confirm it:

With the numbering I get:

Introduc/on(1slide)

Without the numbering I get:

Introduc/on (1 slide)

Indeed, the problem with the whitespaces disappears.

This problem with whitespaces is related to the numbering format.
The other problem with the 'ti' -> '/' change is unrelated to it.
Comment 40 eisa01 2023-03-18 20:42:50 UTC
(In reply to Buovjaga from comment #36)
> Tested with attachment 157160 and Okular and I don't see the
> problem
> 
> NixOS
> Version: 7.3.0.0.alpha0+ / LibreOffice Community
> Build ID: 67e47070a7580a17804adce812cc2f98bfe7b51f
> CPU threads: 16; OS: Linux 5.13; UI render: default; VCL: x11
> Locale: fi-FI (fi_FI.UTF-8); UI: en-US
> Calc: threaded

As per comment #21, this is present when the PDF is exported from Windows and opened in Preview on macOS. It did not have the problem in Edge, so I guess Okular doesn't have it either.

The question is whether it's rather a bug in Preview, though, as e.g. opening the PDF in Firefox for macOS and copying the text out works fine.

Still present with PDFs produced in 7.5, if anyone wonders.
Comment 41 ⁨خالد حسني⁩ 2023-07-03 16:08:28 UTC
All the spaces are there in the PDF; LibreOffice is exporting them correctly.

Text extraction from PDF is a mess, and every PDF viewer has its own undocumented behavior and little bugs; there is not much we can do about it.
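As an illustration of why viewers can disagree: a common extraction heuristic, when deciding where words break, looks at the horizontal gap between glyph runs and inserts a space only if the gap exceeds some fraction of the font size. The sketch below uses made-up geometry and arbitrary thresholds; it is a generic technique, not a description of what Preview, Okular or PDiff Express actually do.

```python
from dataclasses import dataclass


@dataclass
class Run:
    text: str     # characters drawn by one glyph run
    x: float      # left edge, in user-space units
    width: float  # advance width of the run, same units


def join_runs(runs: list[Run], font_size: float, threshold: float) -> str:
    """Concatenate runs left to right, inserting a space wherever the gap
    to the previous run exceeds threshold * font_size."""
    out: list[str] = []
    prev_end = None
    for run in sorted(runs, key=lambda item: item.x):
        if prev_end is not None and run.x - prev_end > threshold * font_size:
            out.append(" ")
        out.append(run.text)
        prev_end = run.x + run.width
    return "".join(out)


# Made-up geometry for "Introduction (1 slide)" with 2.5-unit word gaps at 10 pt.
SAMPLE = [Run("Introduction", 0.0, 60.0), Run("(1", 62.5, 8.0), Run("slide)", 73.0, 25.0)]

if __name__ == "__main__":
    print(join_runs(SAMPLE, font_size=10.0, threshold=0.15))  # Introduction (1 slide)
    print(join_runs(SAMPLE, font_size=10.0, threshold=0.35))  # Introduction(1slide)
```

The same geometry yields both clipboard results reported in this bug, depending only on the threshold an extractor happens to pick.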

The same goes for the ti ligature: all the information needed to get its textual representation is in the PDF, but the PDiff tool is ignoring it.
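The ligature case can be sketched the same way. A PDF shows glyph codes, and the font's ToUnicode table tells an extractor which text each code stands for; a tool that ignores the table has to fall back on something else. Here the code value 0x2F for the "ti" glyph is an assumption, chosen because 0x2F reads as "/" under a naive byte-as-Latin-1 decoding, which would match the "Introduc/on" symptom; it has not been checked against the actual exported file.

```python
from typing import Optional

# Hypothetical: assume the subset font stores the "ti" ligature glyph at code 0x2F,
# which is "/" if the byte is misread as a character. Not verified against LO output.
LIGATURE_CODE = 0x2F
SHOWN = b"Introduc" + bytes([LIGATURE_CODE]) + b"on"

# ToUnicode-style table: ordinary codes map to themselves, the ligature to "ti".
TO_UNICODE = {c: chr(c) for c in SHOWN if c != LIGATURE_CODE}
TO_UNICODE[LIGATURE_CODE] = "ti"


def extract(codes: bytes, to_unicode: Optional[dict[int, str]]) -> str:
    """Turn single-byte glyph codes into text.

    With a ToUnicode table each code is looked up (U+FFFD if missing);
    without one, this sketch falls back to reading the bytes as Latin-1.
    """
    if to_unicode is None:
        return codes.decode("latin-1")
    return "".join(to_unicode.get(c, "\ufffd") for c in codes)


if __name__ == "__main__":
    print(extract(SHOWN, TO_UNICODE))  # Introduction
    print(extract(SHOWN, None))        # Introduc/on
```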

The PDF from PowerPoint is behaving better by sheer luck, and I don't see anything specific in its output that makes it behave differently. Unless someone can point to a specific PDF feature that is present in the PowerPoint output and missing from ours, there is nothing to be done here.