Bug 144576 - Copy a table from Writer to plain text editor or as unformatted text pastes a list instead of matrix (like Calc does)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: high normal
Assignee: Adam664
URL:
Whiteboard:
Keywords: difficultyInteresting, easyHack, skillCpp
Duplicates: 157605
Depends on:
Blocks: Writer-Tables Unify-Across-Apps Cut-Copy
Reported: 2021-09-18 01:29 UTC by Israel Enriquez
Modified: 2024-03-27 08:53 UTC (History)
CC List: 10 users

See Also:
Crash report or crash signature:


Attachments
Sample doc for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 9 (10.49 KB, application/vnd.oasis.opendocument.text)
2024-03-18 09:59 UTC, Michael Weghorn
Sample doc for further discussion in https://gerrit.libreoffice.org/c/core/+/164833 PS 11 (11.09 KB, application/vnd.oasis.opendocument.text)
2024-03-19 08:28 UTC, Michael Weghorn
Screenshot for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 16 (15.41 KB, image/png)
2024-03-26 16:15 UTC, Michael Weghorn
Sample doc for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 16 (10.27 KB, application/vnd.oasis.opendocument.text)
2024-03-27 08:53 UTC, Michael Weghorn

Description Israel Enriquez 2021-09-18 01:29:14 UTC
Description:
The plain-text "table" format should use a tab to start a new column and \n (new line) to start a new row.
Calc produces this format correctly, but Writer does not.

Example:
I opened two documents, one in Calc and the other in Writer, each with a 2x2 table. This is what I get when I copy each table and paste it right here:

Calc (correct):
(1,1)	(1,2)
(2,1)	(1,2)

Writer (incorrect):
(1,1)
(1,2)
(2,1)
(1,2)



Steps to Reproduce:
1.Open Writer
2.Create a table
3.Copy the table
4.Paste it into a text editor such as Notepad, or into this text box

Actual Results:
A non "table-formatted" string.
Cells separated only with new lines.

Expected Results:
A "table-formatted" string.
Cells separated with new lines and tabs.


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 7.2.1.2 (x64) / LibreOffice Community
Build ID: 87b77fad49947c1441b67c559c339af8f3517e22
CPU threads: 8; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL
Comment 1 m_a_riosv 2021-09-18 09:40:48 UTC Comment hidden (obsolete)
Comment 2 Israel Enriquez 2021-09-18 19:57:42 UTC
This is not a problem with copying from Calc to Writer; it is a problem with copying from Writer to plain text.
Comment 3 QA Administrators 2021-09-21 04:54:17 UTC Comment hidden (obsolete)
Comment 4 raal 2022-01-25 21:00:43 UTC
I can confirm with Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: 0c3b8792b712e939d2ad524d554f96616b4844be
CPU threads: 4; OS: Linux 5.11; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded Jumbo
and Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)


paste table 2x2 from Calc:
1	2
3	4

paste table 2x2 from Writer:
1
2
3
4
Comment 5 Timur 2022-01-26 07:45:30 UTC
Copying a table from MSO Word to Notepad pastes a matrix, as it should, not a list.
So the bug is correctly confirmed.
It seems to be inherited from OOo.
Comment 6 Heiko Tietze 2023-10-20 09:05:29 UTC
*** Bug 157605 has been marked as a duplicate of this bug. ***
Comment 7 Stéphane Guillou (stragu) 2023-10-20 09:48:06 UTC
Copying my comment from duplicate bug 157605:

Same in OOo 3.3, so inherited.
Reproduced in recent trunk build:

Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: b83f069101f1e6d8aaac09a805f02bbc4c619e7a
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Table copied from OnlyOffice gives the OP's expected results:

1	Line 1
2	Line 2

(and uses tabs to separate columns, as it should)

This is the same as copy-pasting from Calc:

1	Line 1
2	Line 2

So, to me, it's sensible to make Writer tables behave the same as Calc when copy-pasted.

Hossein, could this qualify as an easy hack? Merged cells should be tested too.
Comment 8 Hossein 2023-10-30 12:59:09 UTC
(In reply to Stéphane Guillou (stragu) from comment #7)
> Hossein, could this qualify as an easy hack? Merged cells should be tested
> too.
Yes, I think this can be an EasyHack with difficultyMedium.

Code pointers:

Copy/paste involves many steps, including data/format conversion and clipboard format handling. The key point here is that the document is converted to plain text via the "text" filter.

The plaintext (ascii) filter is located here in the LibreOffice core source code:

sw/source/filter/ascii

Therefore, to change the copy/paste output, you have to fix the ascii filter. This has the added benefit of fixing plain text export as well, as requested here.

In this folder, there are a few files:

$ ls sw/source/filter/ascii/
ascatr.cxx  parasc.cxx  wrtasc.cxx  wrtasc.hxx

To change the output, you have to edit this file:

sw/source/filter/ascii/wrtasc.cxx

In this file, there is a loop that creates the output.

 // Output all areas of the pam into the ASC file
 do {
     bool bTstFly = true;
    ...
 }

Inside this loop, the code iterates over the nodes in the document structure and extracts the text from them. To check this for yourself, add the one line marked with + below, build LO, and then test. You will see that a * is written before the text of each node.

 SwTextNode* pNd = m_pCurrentPam->GetPoint()->GetNode().GetTextNode();
 if( pNd )
 {
+   Strm().WriteUChar('*');
  ...
 }

For example, given this table, with one blank paragraph above it and one below:

A | B
--|--
C | D

You will get this after copy/paste into a plain text editor:

*
*a
*b
*c
*d
*

To fix the bug, you have to differentiate between table cells and other nodes. Then, you have to keep track of the table columns and print a tab between them.

As a next step, you can add the star only before table cells:

 if( pNd )
 {
     SwTableNode *pTableNd = pNd->FindTableNode();
     if (pTableNd)
     {
         Strm().WriteUChar('*');
     }
     ...
 }
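
As a rough illustration of the separator logic only (not the actual wrtasc.cxx change), here is a minimal standalone sketch. The Node struct and toPlainText() are hypothetical stand-ins, not LibreOffice types; the sketch assumes each node already knows whether it sits in a table cell and in which row, that every cell holds exactly one paragraph, and that two tables never directly follow each other:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a text node; this is NOT a LibreOffice type.
struct Node
{
    std::string text;
    std::optional<int> row; // set only when the node is inside a table cell
};

// Join the node texts: a tab between two cells of the same table row,
// a newline between everything else.
std::string toPlainText(const std::vector<Node>& nodes)
{
    std::string out;
    for (std::size_t i = 0; i < nodes.size(); ++i)
    {
        if (i > 0)
        {
            const Node& prev = nodes[i - 1];
            const bool sameRow = prev.row && nodes[i].row && *prev.row == *nodes[i].row;
            out += sameRow ? '\t' : '\n';
        }
        out += nodes[i].text;
    }
    return out;
}
```

In the real filter, the row information would have to be derived from the table structure around the node found via FindTableNode(); how to do that is left open here.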

You can look at how other filters handle tables. For example, in sw/source/filter/html/htmltab.cxx you will see how a table is managed: the first cell is tracked, and the appropriate functions for handling an HTML table are called.

For merged cells, I suggest the EasyHacker first check the behavior in other software, and then design and implement the appropriate behavior.
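
To make the merged-cell question concrete, here is a sketch of just one possible policy: a horizontally merged cell writes its text once, followed by an empty field for each column it covers, so that later columns stay aligned. This is an assumption for discussion, not a confirmed behavior of any application; Cell, colSpan and rowToPlainText() are hypothetical names, not LibreOffice API:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cell model; this is NOT a LibreOffice type.
struct Cell
{
    std::string text;
    int colSpan; // 1 for a normal cell, >1 for a horizontally merged cell
};

// Serialize one table row: tab between cells, plus one extra tab
// (an empty field) for every additional column a merged cell covers.
std::string rowToPlainText(const std::vector<Cell>& row)
{
    std::string out;
    for (std::size_t i = 0; i < row.size(); ++i)
    {
        if (i > 0)
            out += '\t';
        out += row[i].text;
        for (int covered = 1; covered < row[i].colSpan; ++covered)
            out += '\t';
    }
    return out;
}
```

With this policy, a row whose first cell spans two columns and holds "AB", followed by a normal cell "C", becomes "AB", an empty field, then "C". Vertically merged cells are not covered by this sketch.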

To gain a better understanding of the Writer document model / layout, please see this document:

Writer/Core And Layout
https://wiki.openoffice.org/wiki/Writer/Core_And_Layout

And also this presentation:

Introduction to Writer Development - LibreOffice 2023 Conference Workshop
Miklos Vajna
https://www.youtube.com/watch?v=oM0tB1A0JHA
Comment 9 Michael Weghorn 2024-03-18 09:59:36 UTC
Created attachment 193175 [details]
Sample doc for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 9
Comment 10 Michael Weghorn 2024-03-19 08:28:33 UTC
Created attachment 193189 [details]
Sample doc for further discussion in https://gerrit.libreoffice.org/c/core/+/164833 PS 11
Comment 11 Michael Weghorn 2024-03-26 16:15:41 UTC
Created attachment 193324 [details]
Screenshot for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 16
Comment 12 Michael Weghorn 2024-03-27 08:53:00 UTC
Created attachment 193334 [details]
Sample doc for discussion on https://gerrit.libreoffice.org/c/core/+/164833 PS 16

Output I currently get with sample file:

aabc         defghij
ySome longer text that is only in the second row    z