Bug 46180 - FILEOPEN - LO Base fails to connect to "*.DBF" files but connects OK to "*.dbf"
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base (show other bugs)
Version: 3.5.0 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: difficultyMedium, easyHack, skillCpp
Duplicates: 58722
Depends on:
Blocks: Database-Connectivity
Reported: 2012-02-16 08:26 UTC by Aleksey
Modified: 2023-12-08 15:21 UTC (History)
CC List: 8 users

See Also:
Crash report or crash signature:


Attachments
The example of odb and dbf files. The odb is a connection to dbase. LO recognizes table sample_base.dbf but doesn't see sample_base_bug.DBF. (331.30 KB, application/x-tar)
2012-02-19 02:58 UTC, Aleksey
Description Aleksey 2012-02-16 08:26:59 UTC
If the dBase file is named like "*.DBF", LO Base doesn't recognize it. If it is named like "*.dbf", Base opens it OK.
Comment 1 Julien Nabet 2012-02-19 01:17:06 UTC
Did you try starting LO from a console to see whether there are any error or warning messages?
Could you attach a dbf file?
Comment 2 Aleksey 2012-02-19 02:58:23 UTC
Created attachment 57271 [details]
The example of odb and dbf files. The odb is a connection to dbase. LO recognizes table sample_base.dbf but doesn't see sample_base_bug.DBF.

Here's an example odb file in attachment.

It seems my terminal output doesn't contain anything related to dBase:

(soffice:23687): Gtk-WARNING **: /opt/libreoffice3.5/program/../ure-link/lib/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by /usr/lib64/gtk-2.0/2.10.0/engines/liboxygen-gtk.so)

(soffice:23687): Gtk-WARNING **: /opt/libreoffice3.5/program/../ure-link/lib/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by /usr/lib64/gtk-2.0/2.10.0/engines/liboxygen-gtk.so)
Comment 3 Julien Nabet 2012-02-19 05:46:18 UTC
I reproduced the problem on a Debian x86-64 PC with master.

At first I thought the problem was in dbaccess/source/ui/dlg/dbfindex.cxx, function ODbaseIndexDialog::Init(),
so I changed the comparison there to ignore case, but that made no difference.
(Perhaps it should ignore case anyway, though.)

Then, thanks to Opengrok and gdb, I found that the problem is in FDatabaseMetaData.cxx, function ODatabaseMetaData::getTables.
Line 288: if ( sThisContentExtension == aFilenameExtension )
is false because sThisContentExtension = "DBF"
and aFilenameExtension = "dbf".

I must admit that, for the moment, I haven't tried to understand the boolean variables bKnowCaseSensivity and bCaseSensitiveDir, which would allow changing sThisContentExtension to "dbf".
Comment 4 Lionel Elie Mamane 2012-02-19 15:50:12 UTC
Why is this a bug in the first place? If the proper extension for a dbase table file is ".dbf", why should LibreOffice recognise any other extension? On a case-sensitive filesystem, ".dbf" and ".DBF" are not the same, and could point to two different files.

Julien: bKnowCaseSensivity seems to be meant as "we know whether this filesystem is case-sensitive (or not)" and bCaseSensitiveDir "this directory is on a case-sensitive filesystem".

But indeed, this check seems to be buggy anyway. Even on a case-insensitive filesystem (on Unix, e.g. a CIFS mount of a Windows server), it still sets bCaseSensitiveDir to sal_True, and that's a bug and a real-world problem because e.g. a Windows/CIFS mount may have arbitrary mixed case.

It comes down to line 166 of connectivity/source/drivers/file/FDatabaseMetaData.cxx, function isCaseSensitiveParentFolder:

  if ( 0 == xProvider->compareContentIds( xID1, xID2 ) )

this should return 0 (*only* on a case-insensitive filesystem), but returns non-zero. Stepping it in GDB, it ends up in fileaccess::FileProvider::compareContentIds, which just (case-sensitively) compares URLs... It should rather do a stat() call and compare the respective st_dev and st_ino fields. That is, on Unix. I don't know what it should do on Windows... There is a _stat, but its st_ino is useless (always zero). A few quickly googled links:

http://stackoverflow.com/questions/1866454/unique-file-identifier-in-windows
http://stackoverflow.com/questions/3892592/unique-file-identifiers-on-ntfs-and-object-id
http://discuss.fogcreek.com/joelonsoftware/default.asp?cmd=show&ixPost=42340
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364952%28v=vs.85%29.aspx

The GetFileInformationByHandle / dwVolumeSerialNumber, nFileIndexHigh, nFileIndexLow route seems the most promising?
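
To make the stat() idea concrete, here is a minimal POSIX-only sketch. The helper names sameFile and looksCaseInsensitive are hypothetical; they are not part of fileaccess::FileProvider or any other LibreOffice API, and the Windows route via GetFileInformationByHandle is not covered here.

```cpp
#include <sys/stat.h>

#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if both paths resolve to the same file,
// judged by device and inode number rather than by comparing the names.
bool sameFile(const std::string& a, const std::string& b)
{
    struct stat sa{}, sb{};
    if (::stat(a.c_str(), &sa) != 0 || ::stat(b.c_str(), &sb) != 0)
        return false; // at least one path does not exist or cannot be read
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

// Hypothetical probe: swap the ASCII case of the last path component and
// check whether the swapped name still reaches the same file.
bool looksCaseInsensitive(const std::string& existingPath)
{
    assert(!existingPath.empty());
    const std::size_t slash = existingPath.find_last_of('/');
    const std::size_t start = (slash == std::string::npos) ? 0 : slash + 1;

    std::string flipped = existingPath;
    for (std::size_t i = start; i < flipped.size(); ++i)
    {
        char& c = flipped[i];
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>(c - 'a' + 'A');
        else if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');
    }
    if (flipped == existingPath)
        return false; // no ASCII letters to flip, so the probe cannot tell
    return sameFile(existingPath, flipped);
}
```

A caveat of this sketch: two hard links whose names differ only in case would make a case-sensitive directory look case-insensitive, so a real implementation would need to handle that corner case.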
Comment 5 Lionel Elie Mamane 2012-02-20 04:55:34 UTC
I had an illumination: dBase files are actually reliably content-sniffable (one can look at a small part of the *contents* of the file and know almost for sure whether it is a dBase file or not), so we should actually do content sniffing instead of mucking with filename extensions at all.

But the described problem will remain for CSV-style files anyway, so we still need a solution.

(The code involved here, in connectivity/source/drivers/file/FDatabaseMetaData.cxx, is shared between dBase and CSV-style databases.)
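
As a rough illustration of what such content sniffing could look like, here is a simplified heuristic. It assumes the commonly documented dBase III header layout: a 32-byte fixed part, a 16-bit little-endian header length at offset 8, one 32-byte descriptor per field, and a 0x0D byte terminating the header. The names mayBeDBase and readLE16 are hypothetical; this is not the existing detection code, and variants that store extra data after the terminator would need additional handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a 16-bit little-endian value; the caller guarantees the range is valid.
std::size_t readLE16(const std::vector<std::uint8_t>& bytes, std::size_t offset)
{
    assert(offset + 1 < bytes.size());
    return bytes[offset] | (static_cast<std::size_t>(bytes[offset + 1]) << 8);
}

// Heuristic only: checks the structural consistency of the header,
// not the version byte, the date, or the record data.
bool mayBeDBase(const std::vector<std::uint8_t>& bytes)
{
    constexpr std::size_t kFixedHeader = 32;     // fixed part of the header
    constexpr std::size_t kFieldDescriptor = 32; // one descriptor per column
    constexpr std::uint8_t kTerminator = 0x0D;   // ends the descriptor list

    if (bytes.size() < kFixedHeader + 1)
        return false; // too short to hold a header plus terminator

    const std::size_t headerLen = readLE16(bytes, 8);
    if (headerLen < kFixedHeader + 1 || headerLen > bytes.size())
        return false; // header length field is implausible

    if ((headerLen - kFixedHeader - 1) % kFieldDescriptor != 0)
        return false; // descriptors must fill the header exactly

    return bytes[headerLen - 1] == kTerminator;
}
```

A plain CSV file fails these checks almost always, because the two bytes at offset 8 are ordinary text characters and decode to a header length far beyond the plausible range.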
Comment 6 Robert Großkopf 2012-12-29 09:08:21 UTC
*** Bug 58722 has been marked as a duplicate of this bug. ***
Comment 7 Julien Nabet 2014-10-12 09:04:37 UTC
No one is assigned or at least has answered for more than 2 years, resetting "Assigned to" field.
Comment 8 Alex Thurgood 2015-01-03 17:39:03 UTC Comment hidden (no-value)
Comment 9 QA Administrators 2016-01-17 20:03:58 UTC Comment hidden (obsolete)
Comment 10 Robert Großkopf 2017-01-10 19:35:25 UTC
(In reply to QA Administrators from comment #9)

Bug still appears in
Version: 5.4.0.0.alpha0+
Build ID: a3cf075880db31f77cd0550e0ee25eca931c6a40
CPU Threads: 4; OS Version: Linux 4.1; UI Render: default; VCL: kde4; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2017-01-05_01:21:50
Locale: de-DE (de_DE.UTF-8); Calc: group
Comment 11 Julien Nabet 2017-06-11 15:26:39 UTC
Rereading this, I wonder why we care about case sensitivity at all.

There are interesting parts in sc about detection, see http://opengrok.libreoffice.org/xref/core/sc/source/ui/unoobj/scdetect.cxx#detect and http://opengrok.libreoffice.org/xref/core/sc/source/ui/unoobj/scdetect.cxx#lcl_MayBeDBase
But since we tell LO we're going to deal with dBase files, why bother about it?

Indeed, suppose I create a file EXAMPLE.DBF on Windows, then put it on a USB key and plug that into a Linux machine. LO Base with the dBase option should just open the file.

Searching about dBase, I found that it was created on CP/M, and about CP/M I found this:
"The CPM CPP module converts commands into upper case before they are executed which leads many to believe that the CPM file system is not case sensitive, when in fact the CPM file system is case sensitive."
see http://www.shaels.net/index.php/cpm80-22-documents/using-cpm/3-file-names
(sorry, I didn't find a more official source)
So "dbf" and "DBF" should be considered as a single option.
Comment 12 QA Administrators 2018-06-12 02:33:33 UTC Comment hidden (obsolete)
Comment 13 Julien Nabet 2018-06-12 18:29:55 UTC
This bug concerns dbf but what about the other extensions?
I mean how csv/CSV files are managed? Indeed I suppose we encounter the same issues:
- open csv/CSV files in Calc or Base
- csv/CSV files may be on Windows, Linux, network shares, etc.

Eike: I noticed the function lcl_MayBeAscii in https://opengrok.libreoffice.org/xref/core/sc/source/ui/unoobj/scdetect.cxx#154 but it's disabled.
Any thoughts or any idea who may help here?
Comment 14 Eike Rathke 2018-06-13 09:03:04 UTC
(In reply to Julien Nabet from comment #13)
> This bug concerns dbf but what about the other extensions?
> I mean how csv/CSV files are managed? Indeed I suppose we encounter the same
> issues:
> - open csv/CSV files in Calc or Base
> - csv/CSV files may be on Windows, Linux, network shares, etc.
There's no problem at least with Calc opening .CSV (on Linux); also, .DBF opens fine in Calc, i.e. the attached sample_base_bug.DBF
If there's still a problem it seems to be Base-only.

> Eike: I noticed the function lcl_MayBeAscii in
> https://opengrok.libreoffice.org/xref/core/sc/source/ui/unoobj/scdetect.cxx#154
> but it's disabled.
> Any thoughts or any idea who may help here?

It was disabled when moving to the new format detection framework, see https://cgit.freedesktop.org/libreoffice/core/commit/?id=e69aa9572bb2206313cd2aa7edd13da91460f2c4

Note that was *after* this old bug was reported, and this bug has nothing to do with Calc. Also, lcl_MayBeAscii() attempts to detect Unicode|Binary vs possible ASCII (or other one-byte encoded) text and isn't relevant here. Of more interest could be lcl_MayBeDBase() right below it, which tries to identify whether file content could be in a dBase format.
Comment 15 Andreas Säger 2018-08-25 12:47:04 UTC
(In reply to Julien Nabet from comment #13)
> This bug concerns dbf but what about the other extensions?
> I mean how csv/CSV files are managed? Indeed I suppose we encounter the same
> issues:

Instead of studying the source code you could call File>New>Database...
[X] Connect to existing
Type: Text
In the next step you have to define the file name extension which identifies your text tables.

Why is *.DBF a problem? Because applications produce *.DBF files. Every night I copy a bunch of DBF files to a directory of dbf files so I can use data from our business application with LibreOffice.
Comment 16 QA Administrators 2019-10-08 02:27:57 UTC Comment hidden (obsolete)
Comment 17 QA Administrators 2021-10-08 03:52:48 UTC Comment hidden (obsolete)
Comment 18 Vasudev 2022-01-17 12:26:40 UTC
Bug still appears in
Version: 7.2.5.2 / LibreOffice Community
Build ID: 20(Build:2)
CPU threads: 4; OS: Linux 5.11; UI render: default; VCL: kf5 (cairo+xcb)
Locale: en-US (en_IN); UI: en-US
Ubuntu package version: 1:7.2.5~rc2-0ubuntu0.20.04.1~lo1
Calc: threaded
Comment 19 Mike Kaganski 2023-04-30 18:33:51 UTC
I do not quite understand the discussion here. This is an old bug, with a known code pointer; it may only be qualified "notabug" from a purist point of view; it is completely unclear which use case would break if the comparison were case-insensitive.
Comment 20 Robert Großkopf 2023-05-01 13:08:26 UTC
(In reply to Mike Kaganski from comment #19)
> I do not quite understand the discussion here. This is an old bug, with a
> known code pointer; it may only be qualified "notabug" from a purist point
> of view; it is completely unclear which use case would break if the
> comparison were case-insensitive.

Don't know why it isn't fixed. But it is the same behavior as with *.csv: both can be opened in Base only if the extension is written in lower case. But both can be opened with lower-case and upper-case extensions if I try to open them with Calc.

I would prefer to get the same behavior in Base as it is in Calc.
Comment 21 Mike Kaganski 2023-05-01 13:43:03 UTC
FTR: various "language reference" documentations on http://www.dbase.com/dbasesql/dbase-documentation-download/ mention *both* .DBF *and* .dbf variants; v.2.8 mentions DBF more often than dbf.
Comment 22 Mike Kaganski 2023-05-02 05:40:30 UTC
So based on comment 3, this is an easyhack. The fix needs to: use OUString's equalsIgnoreAsciiCase unconditionally instead of creating lowercase copies of the original extensions; remove the checks for case-insensitivity of the filesystems; and change connectivity::file::OConnection, which would likely no longer need setCaseSensitiveExtension and the related functionality. Whenever file paths are used, as usual, they must not change case when used/stored; but the file type checks should use case-insensitive comparison of extensions.

If this later reveals some breakage of an existing scenario, this could be converted to case-sensitive comparison *based on user preference* (a setting stored in ODB).
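
A minimal standalone sketch of that rule, using std::string instead of OUString so it compiles outside the LibreOffice tree. The free function equalsIgnoreAsciiCase here is only a stand-in for the OUString member of the same name, and extensionOf and listTables are hypothetical helpers, not functions from connectivity.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stand-in for OUString::equalsIgnoreAsciiCase: only A-Z/a-z are folded.
bool equalsIgnoreAsciiCase(const std::string& a, const std::string& b)
{
    auto lower = [](char c) {
        return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
    };
    return a.size() == b.size()
           && std::equal(a.begin(), a.end(), b.begin(),
                         [&](char x, char y) { return lower(x) == lower(y); });
}

// Hypothetical helper: text after the last dot of a bare file name, or empty.
std::string extensionOf(const std::string& fileName)
{
    const auto dot = fileName.find_last_of('.');
    return dot == std::string::npos ? std::string() : fileName.substr(dot + 1);
}

// Hypothetical helper: select table files by extension, ignoring ASCII case.
// The file names themselves are returned exactly as they were passed in.
std::vector<std::string> listTables(const std::vector<std::string>& fileNames,
                                    const std::string& wantedExtension)
{
    // The wanted extension is expected without a leading dot, e.g. "dbf".
    assert(wantedExtension.find('.') == std::string::npos);

    std::vector<std::string> tables;
    for (const auto& name : fileNames)
        if (equalsIgnoreAsciiCase(extensionOf(name), wantedExtension))
            tables.push_back(name); // case of the stored name is untouched
    return tables;
}
```

With the two files from the attachment, listTables({"sample_base.dbf", "sample_base_bug.DBF"}, "dbf") would return both names with their original case, which is the behavior requested for the file type check.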
Comment 23 Adam664 2023-06-28 17:26:10 UTC
Confirming that the bug still exists in: 

Version: 7.5.3.2 (X86_64)
Build ID: 50(Build:2)
CPU threads: 1; OS: Linux 6.3; UI render: default; VCL: kf5 (cairo+xcb)
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

and

Version: 7.7.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 5a2c6f4df7149f8c1f543f120fe19bd66abfc189
CPU threads: 1; OS: Linux 6.3; UI render: default; VCL: kf5 (cairo+xcb)
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 24 kolAflash 2023-12-08 13:30:16 UTC
For a first test I only replaced
  if ( sThisContentExtension == aFilenameExtension )
with
  if ( sThisContentExtension.equalsIgnoreAsciiCase(aFilenameExtension) )
here:
https://git.libreoffice.org/core/+/55124c3dbde79ce35e0c90c0d8271117f59b93ce/connectivity/source/drivers/file/FDatabaseMetaData.cxx#267

Now the sample_base_bug.DBF from comment 2 shows up in the Tables view. But when trying to open it I get this error:

  The data content could not be loaded. at [...]/connectivity/source/commontools/dbtools.cxx:746
  The query cannot be executed. It contains no valid columns. at [...]/connectivity/source/commontools/dbexception.cxx:403

Is my code modification not OK?

Or could it be, that the DB driver internally also doesn't allow lower-case file name extensions?