Bug 53325 - FILEOPEN "CSV" import not correctly recognizing space and text field delimiters, i.e. web log file
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 3.6.0.4 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Eike Rathke
URL:
Whiteboard: target:3.7.0 target:3.6.1
Keywords: regression
Depends on:
Blocks: mab3.6
Reported: 2012-08-10 08:29 UTC by Sean Carlos
Modified: 2012-08-21 07:47 UTC

Description Sean Carlos 2012-08-10 08:29:43 UTC
Apache web server log files contain text data separated by spaces; data fields containing spaces are enclosed in double quotes.

LibreOffice 3.4.x parses this data properly; 3.6.0.4 does not. It recognizes only the first and last double quotes on a line and dumps everything in between into one column instead of splitting it across columns.

I don't know about 3.5.x, nor have I tried to confirm this on other platforms.

Example data line (w/o wrapping!)

66.249.73.206 - - [10/Aug/2012:10:03:45 +0200] "GET /news/tema/anteprima-istantanea HTTP/1.1" 500 7055 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
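
For reference, here is a minimal sketch of the split I would expect, using Python's standard csv module with a space as the field separator and " as the text delimiter. It only illustrates the expected result; it is not how LibreOffice implements its import, and the variable names are my own.

```python
import csv

# The example data line from this report, as a single unwrapped line.
line = (
    '66.249.73.206 - - [10/Aug/2012:10:03:45 +0200] '
    '"GET /news/tema/anteprima-istantanea HTTP/1.1" 500 7055 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

# Space as separator, double quote as text delimiter.
fields = next(csv.reader([line], delimiter=' ', quotechar='"'))

# Print one field per line, numbered like spreadsheet columns.
for column, value in enumerate(fields, start=1):
    print(column, value)
```

Each quoted item ends up in a column of its own, with the surrounding quotes removed. Note that the bracketed timestamp is not quoted, so it is split at its inner space into two columns.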
Comment 1 Sean Carlos 2012-08-17 07:23:22 UTC
I've verified that 3.5.6.2 does not have this bug - and I noticed that a lot of work on CSV import went into 3.6, so this appears to be a regression due to work in this area.
Comment 2 Roman Eisele 2012-08-20 09:07:32 UTC
REPRODUCIBLE with LibreOffice 3.6.1.1 (Build ID: 4db6344), German langpack installed, on MacOS X 10.6.8 (Intel).

NOT reproducible with LibreOffice 3.5.6.2, therefore added keyword "regression".

What I did to test:

Using a text editor, I created a new text file containing just four copies of the example data line from comment #0 (line wrapping removed!). I saved this file as "Sample.csv". When I try to open this file with LibO, both LibO 3.5.6.2 and 3.6.1.1 recognize the file type correctly and show the "Text Import" dialog window. In the section "Separator Options", I select "Separated by" and check the check box "Space". In the field "Text delimiter", I leave the (pre-selected) ". Then I click "OK".

LibO 3.5.6.2 handles the quoted items correctly, i.e. puts
  "GET /news/tema/anteprima-istantanea HTTP/1.1"
  "-"
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
each in a single cell of its own.

LibO 3.6.1.1 puts all quoted items together, i.e.:
   GET /news/tema/anteprima-istantanea HTTP/1.1" 500 7055 "-" "Mozilla/5.0 
   (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
in a single cell.

To describe the problem in regular expression terms, I would say that LibO 3.6.1.1 matches the " too greedily; i.e., it searches for the contents of a quoted item using
  "(.*)"
instead of using
  "([^"]*)"
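
The difference between the two patterns can be checked with a short Python sketch on the example data line. This is only an analogy for the observed behavior, not a claim about what the import code actually does internally; the variable names are my own.

```python
import re

# The example data line from comment #0, as a single unwrapped line.
line = (
    '66.249.73.206 - - [10/Aug/2012:10:03:45 +0200] '
    '"GET /news/tema/anteprima-istantanea HTTP/1.1" 500 7055 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

# Greedy: runs from the first " to the last " on the line,
# which resembles what LibO 3.6.1.1 puts into a single cell.
greedy = re.findall(r'"(.*)"', line)

# Bounded: stops at the next ", so each quoted item is found separately,
# which resembles the LibO 3.5.6.2 result.
bounded = re.findall(r'"([^"]*)"', line)

print(len(greedy), greedy)
print(len(bounded), bounded)
```

The greedy pattern yields one match spanning all three quoted items, while the bounded pattern yields three separate matches.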
Comment 3 Roman Eisele 2012-08-20 09:12:55 UTC
@Calc Team:
Hello Kohei, Markus, and Eike,
please take a look at this nasty bug. It is a regression probably introduced during the work on CSV import for LibreOffice 3.6. I hope that it should be rather easy to fix this issue -- just a simple oversight here or there ...

Thank you very much in advance!
Comment 4 Eike Rathke 2012-08-20 10:05:30 UTC
Taking over.
Comment 5 Not Assigned 2012-08-20 12:56:16 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=b44a402d5a05dd32aa2e1ab80c9ea75b560dc3b9

resolved fdo#53325 CSV space delimiter and quoted field
Comment 6 Roman Eisele 2012-08-20 13:54:17 UTC
(In reply to comment #5)
> Eike Rathke committed a patch related to this issue.

Hello Eike,
thank you very much for fixing this issue so fast!

If the patch works as intended, you will backport it to 3.6.x, won’t you? ;-)
Thank you again!
Comment 8 Not Assigned 2012-08-20 15:30:42 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-3-6":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=0e176a7411beced06ce27c5f059aa97e7de4212d&g=libreoffice-3-6

resolved fdo#53325 CSV space delimiter and quoted field


It will be available in LibreOffice 3.6.2.
Comment 9 Not Assigned 2012-08-20 22:31:15 UTC
Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-3-6-1":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=76ae3173bb16f5ce4899026bb2bed109ecee6ce4&g=libreoffice-3-6-1

resolved fdo#53325 CSV space delimiter and quoted field


It will be available already in LibreOffice 3.6.1.
Comment 10 Roman Eisele 2012-08-21 07:47:34 UTC
(In reply to comment #9)
> It will be available already in LibreOffice 3.6.1.

@Eike:
Thank you again!