Bug 159525 - Make Simple HTML paste format the preferred instead of RTF
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): 24.8.0.0 alpha0+
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Paste-From-MSO
Reported: 2024-02-02 15:58 UTC by Gabor Kelemen (allotropia)
Modified: 2024-11-07 03:15 UTC (History)
0 users

See Also:
Crash report or crash signature:


Attachments
HTML paste from Word in Free Clipboard Viewer (55.34 KB, image/png)
2024-02-02 15:58 UTC, Gabor Kelemen (allotropia)
Result of RTF paste from Word to PP (73.74 KB, image/png)
2024-02-02 15:59 UTC, Gabor Kelemen (allotropia)
Result of HTML paste from Word to PP (76.44 KB, image/png)
2024-02-02 16:00 UTC, Gabor Kelemen (allotropia)

Description Gabor Kelemen (allotropia) 2024-02-02 15:58:34 UTC
Created attachment 192347
HTML paste from Word in Free Clipboard Viewer

This is a followup to bug 157363

When an external app (in our case, an MSO app) places copied content on the clipboard in both HTML and RTF formats, it could be useful to have an option to prefer HTML over RTF as the default paste format, since some apps (such as MSO) may put more content into the HTML than into the RTF.

1. Open attachment 192289 from bug 159478 in Word (though really any complicated document should do)
2. Copy everything
3. In PowerPoint go to Paste - Paste Special
4.a Insert the content as HTML
-> A table is pasted. The font formatting, however, is lost (e.g. FAX is not in a large bold font)
4.b Insert the content as RTF
-> No table, only text is pasted. Font formatting is retained.

So both source formats have pros and cons; it may make sense to:
(1) make the preferred format configurable via a config key that can be tweaked per deployment.

(2) Another option may be to actually look inside the clipboard content and check for something like this:
<meta name=Generator content="Microsoft Word 15">
<meta name=Generator content="Microsoft PowerPoint 15">
and if this is found in the HTML clipboard content, prefer HTML automatically.
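
A rough sketch of how options (1) and (2) could combine, written in Python only for illustration (the real paste code is not shown in this report). The names `HTML_PREFERRED_GENERATORS`, `find_generator`, `prefer_html` and the `force_html` parameter are all hypothetical; `force_html` stands in for the config key from option (1).

```python
from html.parser import HTMLParser

# Hypothetical list of Generator prefixes for which HTML would be preferred.
HTML_PREFERRED_GENERATORS = ("Microsoft Word", "Microsoft PowerPoint")


class _GeneratorFinder(HTMLParser):
    """Remembers the content of the first <meta name=Generator ...> tag."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        # Attribute names arrive lowercased; values may be None if omitted.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "generator":
            self.generator = attr_map.get("content", "")


def find_generator(html_text):
    """Return the Generator string found in the HTML, or None."""
    finder = _GeneratorFinder()
    finder.feed(html_text)
    finder.close()
    return finder.generator


def prefer_html(html_text, force_html=None):
    """Decide whether HTML should win over RTF for this clipboard content.

    force_html: True/False overrides detection (the assumed config key);
    None means "decide from the Generator meta tag".
    """
    if force_html is not None:
        return force_html
    generator = find_generator(html_text)
    return generator is not None and generator.startswith(HTML_PREFERRED_GENERATORS)
```

With this sketch, `prefer_html('<meta name=Generator content="Microsoft Word 15">')` would select HTML, while content without such a tag would keep the existing RTF-first behaviour unless the key forces otherwise.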
Comment 1 Gabor Kelemen (allotropia) 2024-02-02 15:59:44 UTC
Created attachment 192348
Result of RTF paste from Word to PP

Just to demonstrate how it looks elsewhere. In LO, interpretation of the content may be different.
Comment 2 Gabor Kelemen (allotropia) 2024-02-02 16:00:16 UTC
Created attachment 192349
Result of HTML paste from Word to PP
Comment 3 Mike Kaganski 2024-11-06 07:14:38 UTC
This is based on wrong premises.
An RTF file containing a table is opened; so, indeed, everything in the file is representable in that format.

It is copied to the clipboard and then pasted as RTF into another program (here: PowerPoint). The assumption seems to have been: it's an MS program, hence we can be sure that what we get is the true clipboard content. No, this is a wrong assumption: what we see is what MS imported from RTF into PowerPoint, using their code. And indeed, pasting as RTF into another program (e.g., Writer) shows that the RTF does contain the table, so the PowerPoint test was completely uninformative.

The formats may have pros and cons; but the description is 100% off the point.

Additionally: there is the *rule* that MS created for its clipboard management, designed *exactly* to resolve the issue raised here. Quoting a quote from bug 156214 comment 8:

> Clipboard formats that contain the most information should be placed on the clipboard
> first, followed by less descriptive formats. A window pasting information from the
> clipboard typically retrieves a clipboard object in the first format it recognizes.
> Because clipboard formats are enumerated in the order they are placed on the clipboard,
> the first recognized format is also the most descriptive.

And indeed, as I mention there, too, it's not that simple. The format that the *source* application may consider "more descriptive" may be supported to a lesser degree by the *receiving* application (the MS documentation seems to assume an always-perfect-support case). See e.g. bug 74801, for a case where pasting HTML into Writer would produce a worse result than RTF, even though the two formats would produce decent results in Word.
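
To make the trade-off concrete, here is a minimal sketch in Python (chosen only for illustration) of "first recognized format wins", with an optional receiver-side override. The function `pick_paste_format`, its parameters, and the short format labels are hypothetical and are not the real clipboard format identifiers.

```python
def pick_paste_format(offered, supported, receiver_preference=()):
    """Choose which clipboard format to paste.

    offered: format labels in the order the source placed them on the
        clipboard (most descriptive first, per the quoted guideline).
    supported: set of labels the receiving application can read.
    receiver_preference: optional labels the receiver wants to try first,
        e.g. because its importer for that format is known to be better.
    """
    # Receiver-side override: try the receiver's own ranking first.
    for fmt in receiver_preference:
        if fmt in offered and fmt in supported:
            return fmt
    # Default rule: first offered format that the receiver recognizes.
    for fmt in offered:
        if fmt in supported:
            return fmt
    return None
```

For example, with `offered=["RTF", "HTML", "TEXT"]` and all three supported, the default rule picks `"RTF"`; passing `receiver_preference=("HTML",)` would pick `"HTML"` instead, which is the kind of override this bug asks for.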

In general, it is safe to assume that (1) HTML (a Web format) wouldn't have as much page-description information as RTF (with a specific exception for MS-created HTML, which carries it as an MS-specific HTML extension); and (2) Writer's support for HTML is many times poorer than for RTF (we only have support for some *subset* of HTML4 (!), and some tiny subset of CSS; while for RTF, even if not perfect, we keep improving support to have decent fidelity - both because it's one of the office file formats, and because it's a *standard* clipboard format used by many different programs, at least on Windows).
Comment 4 Mike Kaganski 2024-11-06 07:18:36 UTC
... and the interesting fact: MS Word puts RTF *before* HTML in the clipboard in the case of the test described in comment 0; so obviously, *assuming* MS Word follows MS Windows clipboard guidelines, they consider RTF clipboard format more descriptive in this case.