Bug 56036 - DATALOSS IMPORT XLSX: CRLF (newline) in formula garbles the formula
Summary: DATALOSS IMPORT XLSX: CRLF (newline) in formula garbles the formula
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage (show other bugs)
Version:
(earliest affected)
Inherited From OOo
Hardware: All All
: medium major
Assignee: Mike Kaganski
URL:
Whiteboard: target:5.0.0
Keywords:
: 84616 88634 89350 91929 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-10-16 16:20 UTC by ckickoff
Modified: 2015-11-09 03:16 UTC (History)
7 users (show)

See Also:
Crash report or crash signature:


Attachments
attachment-4252-0.html (3.28 KB, text/html)
2013-06-03 11:29 UTC, ckickoff
Details
attachment-4252-1.dat (1 bytes, multipart/alternative)
2013-06-03 11:29 UTC, ckickoff
Details
Bux OpenOffice.xlsx (8.71 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2013-06-03 11:29 UTC, ckickoff
Details
LO Newline Test.xlsx (8.50 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2015-01-21 02:16 UTC, Ed
Details

Note You need to log in before you can comment on or make changes to this bug.
Description ckickoff 2012-10-16 16:20:55 UTC
In Excel, the following formula has been entered with CRLF to facilitate visualization.

=IF(G9=1;(SI(G27=1;G21;
                       IF(G27=2;G22;
                       IF(G27=3;G23;
                       IF(G27=4;G24;
                       IF(G27=5;G25;
                       IF(G27>5;G26;"Jahr?")))))));IF(G9=2;"";"blabla"))

After importing into Libreoffice / Openoffice, the formula is:

=IF(G9=1,(IF(G27=1,G21,0)))

CRLF's are considered as "end of formula". The same may or may not be true for other formulas.
Comment 1 Joel Madero 2012-10-19 17:18:59 UTC
Please provided a test case for us so that we can reproduce.

Marking as NEEDINFO. Once you upload a test case mark as UNCONFIRMED and I'll confirm it and prioritize. Thanks for your help
Comment 2 Joel Madero 2013-05-30 16:17:50 UTC
Dear Bug Submitter,

This bug has been in NEEDINFO status with no change for at least 6 months. Please provide the requested information as soon as possible and mark the bug as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in NEEDINFO status with no change in 30 days the QA team will close the bug as INVALID due to lack of needed information.

For more information about our NEEDINFO policy please read the wiki located here: 
https://wiki.documentfoundation.org/QA/FDO/NEEDINFO

If you have already provided the requested information, please mark the bug as UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed.


Thank you for helping us make LibreOffice even better for everyone!


Warm Regards,
QA Team
Comment 3 ckickoff 2013-06-03 11:29:17 UTC
Created attachment 80214 [details]
attachment-4252-0.html

Sorry I didn't see the previous email. The bug appears in the attachment, a
file created with Excel 2003 saved in 2007 format, with CRLF and spaces
within nested IFS. If you open the file in Excel, it works, if you open it
in OpenOffice and recalc, all cells B2:B5 turn to 0.

Good luck.

Chris


On Thu, May 30, 2013 at 6:17 PM, <bugzilla-daemon@freedesktop.org> wrote:

>  Joel Madero <jmadero.dev@gmail.com> changed bug 56036<https://bugs.freedesktop.org/show_bug.cgi?id=56036>
>  What Removed Added  QA Contact   qa-admin@libreoffice.org
>
>  *Comment # 2 <https://bugs.freedesktop.org/show_bug.cgi?id=56036#c2> on bug
> 56036 <https://bugs.freedesktop.org/show_bug.cgi?id=56036> from Joel
> Madero <jmadero.dev@gmail.com> *
>
> Dear Bug Submitter,
>
> This bug has been in NEEDINFO status with no change for at least 6 months.
> Please provide the requested information as soon as possible and mark the bug
> as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in
> NEEDINFO status with no change in 30 days the QA team will close the bug as
> INVALID due to lack of needed information.
>
> For more information about our NEEDINFO policy please read the wiki located
> here: https://wiki.documentfoundation.org/QA/FDO/NEEDINFO
>
> If you have already provided the requested information, please mark the bug as
> UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed.
>
>
> Thank you for helping us make LibreOffice even better for everyone!
>
>
> Warm Regards,
> QA Team
>
>  ------------------------------
> You are receiving this mail because:
>
>    - You are on the CC list for the bug.
>    - You reported the bug.
>
>
Comment 4 ckickoff 2013-06-03 11:29:17 UTC
Created attachment 80215 [details]
attachment-4252-1.dat
Comment 5 ckickoff 2013-06-03 11:29:17 UTC
Created attachment 80216 [details]
Bux OpenOffice.xlsx
Comment 6 Mike Kaganski 2013-09-18 12:46:57 UTC
Reproducible in 3.3.0.4 - 4.1.2.1 and AOO 4.0 under Win7x64.

I must say, that starting with 4.0.0.0.beta1, there is a progress: the spreadsheet opens with all cells having correct values; and recalculation (e.g. F9) works as if there are correct formulas in those cells (maybe internally, calc does have correct formulas for them). But the formula bar shows the same truncated formula, and if the file is saved and reopened, the cells start to show wrong values, "as expected".

This is data loss on import. And the progress, that I mentioned, may become a problem as well, as a person could open the file, check that everything is displayed correctly, edit and save the document, and suddenly it will become corrupt, with good chances to overlook this next time the document is open. Thus, I raise the severity, despite the apparent lack of duplicates.
Comment 7 Mike Kaganski 2014-03-18 08:32:24 UTC
Since 4.2.0.0.beta1, all formulas containing newline character result in Err:501 (Invalid Character).

==Some information==
MS Excel allows to use newline in formulas (using Alt+Enter).
LO allows to insert newline into a formula (using Ctrl+Enter), but automatically converts it to space character upon finished editing the cell.

OASIS OpenFormula specification v.1.2, clause 5.14 "Whitespace", allows using line feed and carriage return as whitespace characters:
> Whitespace ::= #x20 | #x09 | #x0a | #x0d
(see http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#__RefHeading__1017970_715980110)

So, this bug is not an ODF limitation, but a serious defect in import filter.
Comment 8 Mike Kaganski 2014-03-18 09:53:23 UTC
Also, importing such formulas (with newlines) from XLS result in correct formulas - newlines are simply converted to spaces. So, XLSX (OOXML) specific.
Comment 9 Urmas 2014-10-03 03:53:24 UTC
*** Bug 84616 has been marked as a duplicate of this bug. ***
Comment 10 m.a.riosv 2015-01-20 23:24:22 UTC
*** Bug 88634 has been marked as a duplicate of this bug. ***
Comment 11 Ed 2015-01-21 00:12:21 UTC
I reported bug 88634.  I don't care if you mark 88634 as a duplicate or not.  However, for ME, this bug is a showstopper and must be fixed.  I don't care if you replace the ALT-ENTER with a space so long as the formulas work.

As it is, if I receive an Excel spreadsheet from someone else that has a Newline in the formula it is unusable.

In the best case LO reports an error and I have to ask the author (who is hopefully still around) or someone with a copy of Excel to provide me with the contents of the cell(s) in error.

In the worst case LO does NOT report the error and the spreadsheet provides incorrect answers.  I can't give you a specific case for this, but it did appear that LO automatically provided a closing parenthesis that had been truncated, along with many parts of a formula, after a newline.

I would GUESS that the following Excel formula would NOT provide an error in LO

=SUM(1 + 2
+ 3 + 4
+ 5)

In LO it would may appear as =SUM(1 + 2)

If you want I can try and reproduce this.  I'm just pressed for time right now.  Let me know.
Comment 12 Ed 2015-01-21 02:16:40 UTC
Created attachment 112578 [details]
LO Newline Test.xlsx

Simple Excel sheet that shows how confusing this bug is in the real world.
Comment 13 Ed 2015-01-21 02:17:47 UTC
FWIW I created an Excel sheet whose A1 cell has this formula:

=(1 + 2
+ 3 + 4 +5)

When opened with LO A1 displays Err:501, which at least tells you that something is wrong.  However, when you click on the cell its contents are:

=(1 + 2)

As a user I would go nuts trying to figure out what was wrong with "=(1 + 2"
Comment 14 Ed 2015-01-21 02:19:27 UTC
Whoops.  Missed a closing quote in the prior comment.
The LO cell's contents are "=(1 + 2)"
Comment 15 Joel Madero 2015-01-21 21:14:15 UTC
@Ed - feel free to submit a patch or pay a developer to have it resolved. Else - you'll have to wait like everyone else for a VOLUNTEER to CHOOSE to resolve it.
Comment 16 Ed 2015-01-22 02:54:30 UTC
Joe,
What is the standard rate for something such as this?

Any pointers on how to set up a low-cost development environment?  It's been a few years since I've done any serious programming.

Just for curiosity, does any of my donation money go toward bug fixes?

I've still got functional MS office, so I'm in no hurry.

I'm just a newbie who is trying to understand all this.
Comment 17 Joel Madero 2015-01-22 21:29:49 UTC
Hey Ed - 

Standard rate is pretty hard to define - for one single individual it is probably too expensive. My understanding is that developers charge between $40-$100/hr and that it can take several hours just to locate the code that is causing the problem (we have about 10 million lines of code).

As for crowd sourcing - TDF does not endorse any but there are a few that exists (I am yet to see it be very successful, everyone thinks their own bug is so serious that everyone will contribute to fix it and later find out that hardly anyone is really willing to donate ... and those that do typically donate a few bucks...well short of the needed amount). One possibility is freedom sponsor.

Lastly, donations do not pay for developers. TDF has no developers on staff for a variety of reasons. There are reasons why we should have them, but the Board has addressed this a number of times and the negatives far outweigh the positives. I should write a blog about this (I've done a similar one about crowd sourcing generally (https://joelmadero.wordpress.com/2014/09/12/why-tdf-doesnt-do-crowd-funding/).

Donations pay for a few paid staff (part time/full time) that maintain the infrastructure and coordinate projects (to some extent). It also pays for events (such as hackfests where people get together and do a "blitz" of hacking, our experience is that these are highly productive). Lastly, it pays for the actual infrastructure - tens of millions of downloads along with all the fancy tools, bug trackers, etc... require a lot of infrastructure.
Comment 18 m.a.riosv 2015-02-12 22:23:41 UTC
*** Bug 89350 has been marked as a duplicate of this bug. ***
Comment 19 Mike Kaganski 2015-04-29 13:59:44 UTC
A patch submitted to gerrit: https://gerrit.libreoffice.org/15566
Comment 20 Commit Notification 2015-04-29 16:44:15 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1b4ae9a937e25c371a6cbfcbeeb9d7c93563acc7

tdf#56036: allow calc compiler treat tabs and newlines as spacers

It will be available in 5.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 21 m.a.riosv 2015-06-08 14:18:52 UTC
*** Bug 91929 has been marked as a duplicate of this bug. ***
Comment 22 dmusil 2015-06-09 15:53:56 UTC
Looks like that in version (5.0.0 beta 1) this fix is NOT included. Works like before.
Comment 23 Mike Kaganski 2015-06-09 20:55:08 UTC
(In reply to dmusil from comment #22)
> Looks like that in version (5.0.0 beta 1) this fix is NOT included. Works
> like before.

I have just tested with beta1 - the test documents attached here work as they should (the newlines are converted to spaces on import, formulas work OK). The fix does not retain the newlines, so the formula formatting is lost, but that's to be expected - keeping newlines (and tabs) is expected to be fixed sometime later in the bug 76310.

Could you share a file that fails to work for you, along with expected and actual results you see with 5.0.0.beta1?
Comment 24 Mike Kaganski 2015-11-09 03:16:09 UTC
Checked fixed with Version: 5.0.3.2 (x64)
Build ID: e5f16313668ac592c1bfb310f4390624e3dbfb75
Locale: ru-RU (ru_RU)