Bug 109409 - TEXTJOIN and CONCAT handle array/matrix column-wise instead of row-wise
Summary: TEXTJOIN and CONCAT handle array/matrix column-wise instead of row-wise
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
5.4.0.3 release
Hardware: All All
: medium normal
Assignee: Eike Rathke
URL:
Whiteboard: target:7.1.0 target:7.0.2 target:7.0.1
Keywords:
Depends on:
Blocks: Calc-Function
  Show dependency treegraph
 
Reported: 2017-07-26 23:24 UTC by Wolfgang Jäger
Modified: 2020-09-03 09:49 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments
The demo mentioned in the bug report (16.57 KB, application/vnd.oasis.opendocument.spreadsheet)
2017-07-26 23:25 UTC, Wolfgang Jäger
Details
A second demo also containing a workaround. May clarify even better the issue this way. (14.65 KB, application/vnd.oasis.opendocument.spreadsheet)
2017-09-23 12:24 UTC, Wolfgang Jäger
Details
The new demo showing the different treatment of parameters passing references as compared with parameters passing calculatd arrays. (10.59 KB, application/vnd.oasis.opendocument.spreadsheet)
2020-08-16 20:58 UTC, Wolfgang Jäger
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Wolfgang Jäger 2017-07-26 23:24:31 UTC
If an array is used to pass textpieces to TEXTJOIN the pieces are joined row by row from left to right running through the rows top down. 

If the transposition of the same array is passed to the otherwise unchanged TEXTJOIN call the result does not change. 

Expected:
The result should be the same as if the original array was first transposed to an output range elsewhere and the result passed to TEXTJOIN as parameter. 
This would return a result as if the columns of the original array were joined one by one top down running through the columns from left to right.

Respective results for CONCAT. 

See attached demo.
Comment 1 Wolfgang Jäger 2017-07-26 23:25:55 UTC
Created attachment 134884 [details]
The demo mentioned in the bug report
Comment 2 Buovjaga 2017-08-20 16:21:08 UTC
Confirmed with demo.

Arch Linux 64-bit, KDE Plasma 5
Version: 6.0.0.0.alpha0+
Build ID: 668ab2e739397e6b095372a1a468bd4f515927b6
CPU threads: 8; OS: Linux 4.12; UI render: default; VCL: kde4; 
Locale: fi-FI (fi_FI.UTF-8); Calc: group
Built on August 20th 2017
Comment 3 Wolfgang Jäger 2017-09-23 12:24:26 UTC
Created attachment 136486 [details]
A second demo also containing a workaround. May clarify even better the issue this way.

See also the new demo I made and attached to an answer in the ask.libreoffice site.
Comment 4 QA Administrators 2018-09-24 02:48:58 UTC Comment hidden (obsolete)
Comment 5 Wolfgang Jäger 2018-09-24 09:27:24 UTC
Concerning V 6.1.1.2: Behaviour unchanged.

Additional considerations:
There is no specification for the functions CONCAT and TEXTJOIN in OpenFormula_1.2. The de-facto specification is "Do as Excel 2016 does!".  

Trying to shape probable specifications in a way analogous to the specifications of numerical accumulating functions (SUM, COUNT, PRUDUCT, ...) I came to the conclusion that there would be needed (at least) additional types like 'TextSequence' and 'TextSequenceList' to do so.  

Concerning conversion to NumberSequence, NumberSequenceList and related types (See OpenFormula_1.2. subchapters 6.3.7 through 6.3.9 and 6.3.11) there is no specification regarding order. Since all the operations to perform in these cases are (extended) commutative in math (NOT exactly in machine arithmetic!) such a specification may be assumed obsolete. Concerning concatenation there is no commutativity.  

Anyway, regarding the term, a sequence is not the same as a set, and any technical accumulation will be a serial process in time. Therefore conversion (of results gotten by range references e.g.) to sequence should explicitly specify an order explicitly.
Comment 6 Wolfgang Jäger 2019-06-03 10:46:45 UTC
Issue still present in V 6.3.0.0.alpha1

[Again emphasizing that for functions accumulating texts the order of processing any given or converted-to sequence should (must) be specified. 

The relevant numerical operations being used in accumulating functions are addition and multiplication which both are associative and commutative. Thus a specification of the order in processing may be omitted - though processor arithmetic (floating point with fix bit-width) has its issues insofar.]
Comment 7 Wolfgang Jäger 2020-08-16 20:58:56 UTC
Created attachment 164361 [details]
The new demo showing the different treatment of parameters passing references as compared with parameters passing calculatd arrays.

It looks as if CONCAT and TEXTJOIN internally distinguish the cases of 
1) a parameter being a reference
2) the parameter being a calculated array
and treat the first case "one row at a time, rows top down",
but the second case "one column at a time, columns left to right"?
This results in a diffrenece like applying TRANSPOSE() would cause it.

Since the specifications for sequences is lacking a decision which one of the orders should be used, I cannot tell what's actually the bug. 
However, the treatment should be consistent, of course.
Comment 8 Eike Rathke 2020-08-17 11:05:03 UTC
Taking a look at the implementation it indeed iterates all rows for each column for the array/matrix case. I think this is simply an implementation error, unless someone can spot the same behaviour in Excel..
Comment 9 Wolfgang Jäger 2020-08-17 12:24:20 UTC
(In reply to Eike Rathke from comment #8)
> Taking a look at the implementation it indeed iterates all rows for each
> column for the array/matrix case. I think this is simply an implementation
> error, unless someone can spot the same behaviour in Excel..

I well remember that TEXTJOIN and CONCAT originally were dedicated to compatibility with Excel, and that there were a few months of discussions about the doubtable implementations owed to that fact.

The idea of sticking to the original implementation in case it was proven to behave as the functions did in Exel looked absurd to me then, because what was claimed to be returned by Excel under array-evaluation was so absurd. 
 
Now the issue is clearly less serious. Nonetheless the question if "compatible" should be preferred over "good and consistent" deserves thorough consideration in every case.
Comment 10 Eike Rathke 2020-08-17 18:28:41 UTC
(In reply to Wolfgang Jäger from comment #9)
> I well remember that TEXTJOIN and CONCAT originally were dedicated to
> compatibility with Excel, and that there were a few months of discussions
> about the doubtable implementations owed to that fact.
I doubt the current behaviour was intentional, the more because that column-wise handling is also applied to external range references as those are obtained as a matrix.
Comment 11 Eike Rathke 2020-08-19 18:20:32 UTC
Looking at bug 97831 that discussed the implementation of TEXTJOIN() and CONCAT() there's nothing that explicitly states the handling of order in array/matrix cases. Also the Excel function help pages https://support.microsoft.com/en-us/office/textjoin-function-357b449a-ec91-49d0-80c3-0e8fc845691c and https://support.microsoft.com/en-us/office/concat-function-9b1a9a3f-94ff-41af-9736-694cbd6b4ca2 do not explicitly mention any different than general behaviour.

I still think our column-wise handling is an implementation error.

Bug 99625 illustrates that not even the simple Excel online and full Excel 365/2016 agreed on array formulas (not related to array arguments though), so any test case for this should be produced in the full-fledged Excel.
Comment 12 Eike Rathke 2020-08-19 18:21:34 UTC
@Mike:
Can you please test this in Excel?
Values in A1:B2 (a1,b1,a2,b2)
Formula: =TEXTJOIN(",";;TRANSPOSE(A1:B2))
Result is
1) "a1,b1,a2,b2"
or
2) "a1,a2,b1,b2"

My guess is #2.

Similarly, an inline array
=TEXTJOIN(",";;{"a1","b1";"a2","b2"})
where the array column separator is , comma and the array row separator is ; semicolon IMHO should return "a1,b1,a2,b2" (which in Calc currently returns "a1,a2,b1,b2").
Comment 13 Mike Kaganski 2020-08-19 19:03:00 UTC
Unfortunately, I only have Excel 2016 (TEXTJOIN appeared only in Excel 2019).

But Office Online (available in my profile) gives this:

=TEXTJOIN(",";;TRANSPOSE(A1:B2)) => a1,a2,b1,b2
=TEXTJOIN(",";;{"a1","b1";"a2","b2"}) => gives unspecified error (just marks the cell with red border, without any hints anywhere, possibly due to Online version limitations).
Comment 14 Mike Kaganski 2020-08-19 19:25:33 UTC
(In reply to Eike Rathke from comment #11)
> Bug 99625 illustrates that not even the simple Excel online and full Excel
> 365/2016 agreed on array formulas (not related to array arguments though),
> so any test case for this should be produced in the full-fledged Excel.

heh, I see how my previous comment had no value...
Comment 15 Eike Rathke 2020-08-19 20:33:22 UTC
=TEXTJOIN(",";;TRANSPOSE(A1:B2)) => a1,a2,b1,b2
is enough confirmation for me.
Thanks Mike!
Comment 16 Commit Notification 2020-08-20 00:15:38 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/ff3955db7161b8644699d7a0128ec4a6e7e525ec

Resolves: tdf#109409 TEXTJOIN() CONCAT() handle array/matrix row-wise

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 17 Eike Rathke 2020-08-20 11:56:33 UTC
Pending review
https://gerrit.libreoffice.org/c/core/+/101058 for 7-0-1
https://gerrit.libreoffice.org/c/core/+/101052 for 7-0
Comment 18 Commit Notification 2020-08-20 20:51:42 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-7-0":

https://git.libreoffice.org/core/commit/2f4653f69103ff38324b26e7ba146de8121ad4d5

Resolves: tdf#109409 TEXTJOIN() CONCAT() handle array/matrix row-wise

It will be available in 7.0.2.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 19 Commit Notification 2020-08-27 18:33:55 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-7-0-1":

https://git.libreoffice.org/core/commit/cd7243cd2a1d7790832fbd7cfd43d1870b6c544a

Resolves: tdf#109409 TEXTJOIN() CONCAT() handle array/matrix row-wise

It will be available in 7.0.1.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 20 Eike Rathke 2020-08-28 12:00:23 UTC
Added to 7.0 release notes
https://wiki.documentfoundation.org/ReleaseNotes/7.0#Changed_spreadsheet_functions
Comment 21 Wolfgang Jäger 2020-08-30 10:05:46 UTC
Verified with 7.0.1.2
Comment 22 Commit Notification 2020-09-03 09:49:31 UTC
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/da157e5e723cfa329b7f149c37c4c36c1444ca6f

tdf#109409: sc_subsequent_filters_test: Add unittest

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.