Bug Hunting Session
Bug 46480 - Calc 3.5 Cannot Filter Date/Time Fields from Data Sources
Summary: Calc 3.5 Cannot Filter Date/Time Fields from Data Sources
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base (show other bugs)
Version:
(earliest affected)
3.5.0 release
Hardware: All All
: high major
Assignee: Lionel Elie Mamane
URL:
Whiteboard: target:3.7.0 target:3.6.2
Keywords: regression
Depends on:
Blocks: mab3.5
  Show dependency treegraph
 
Reported: 2012-02-22 13:42 UTC by Steven Shelton
Modified: 2012-08-30 03:51 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Steven Shelton 2012-02-22 13:42:01 UTC
Recreate thusly:

1. Register a data source (say, a Base file).
2. Launch Calc.
3. Hit F4 to open the Data Sources panel.
4. Select a table from your data source.
5. Click on the "Standard Filter" icon on the Data Sources toolbar.
6. Enter search criteria (i.e., "Date >= 02/01/2012").
7. Hit OK.

Expected result: Only those rows in the table matching the criteria are displayed.

Actual result: All of the data is displayed, as if no filter was run at all. If you click on the Standard Filter button again, the dialog will show that the changes you made to the filter when you last opened the dialog did not "stick".

Interestingly, the data source filter seems to work fine in LibO 3.4.5 and works in Writer in 3.5; as far as I can tell, only Calc is affected. This bug also affects Ubuntu installs, so I am assuming it's a problem on all operating systems.

I've debated whether this was a "Major", "Critical", or "Blocker" bug. I decided on the lower of the three levels--since it affects only one component of the suite (Calc) and still allows functionality of a great deal of the rest of the suite--but it's enough of a blocker for my use of the software that I have to remove 3.5 and revert to 3.4.5 (where this problem did not occur).
Comment 1 Urmas 2012-02-22 17:20:33 UTC
Which driver/database is used?
Comment 2 Steven Shelton 2012-02-22 17:28:02 UTC
Yeah, that would probably be important. Connecting to a HSQL database via the java connector.
Comment 3 Steven Shelton 2012-03-08 17:50:12 UTC
Can this be changed to confirmed? I've seen at least three other reports from people about this bug now on the listservs.
Comment 4 sasha.libreoffice 2012-04-04 09:16:38 UTC
reproduced in 3.3.4 and 3.5.2 on Fedora 64 bit
used build in database Bibliography, column "Pages", filters <300 and >300
Filter almost works, but with some errors. May be it is different bug
Comment 5 sasha.libreoffice 2012-04-04 22:15:41 UTC
@ Lionel
Sorry for interrupt. What do You think about this bug?
Comment 6 Lionel Elie Mamane 2012-04-04 22:41:05 UTC
Setting to NEW (confirmed) as per comment 4.

I'm about to go to vacation in an hour and a half, I'll try to take a look after I come back (16 April).
Comment 7 sasha.libreoffice 2012-04-05 01:45:53 UTC
Thanks for answer and I wish You happy vacations
Comment 8 Steven Shelton 2012-07-30 20:54:43 UTC
Long vacation!   :-)
Comment 9 Steven Shelton 2012-08-14 16:22:19 UTC
I should add that I have confirmed this bug is present in LibO 3.6 as well.
Comment 10 Noel Power 2012-08-16 12:03:08 UTC
interestingly it seems to work for me on master ( beware I am clueless about base but was looking at another bug that provided an odb document ) I registered that doc ( very simple a single table with text and integer fields with just 2 rows ) as a datasource. On master I was able to follow the instructions from comment #1 ( filtered the integer field with '> than' and the table was filtered as expected, also the filter seems to 'remember' the filter details, I see the same from 3.6.0.4 e.g. works as expected. You say "Connecting to a HSQL database via the java connector." Is there some way to set that up ( or is it how I understand that this is the what is used anyway when using an odb file ( e.g. internal db ) ) 
Or perhaps I am just doing the completely wrong thing, the example file I used was  https://bugs.freedesktop.org/attachment.cgi?id=49130
Comment 11 Steven Shelton 2012-08-17 12:20:30 UTC
I think your database was too small for the problem to arise. You won't have much to filter with only two rows. I tested it with the built-in Bibliography database in the latest 3.6.1 release (I can't tell if it's RC1 or RC2), and THAT one worked . . . but again, it's a smaller database and it's not an external database that has been registered. I will do further testing with the 3.6.1 release candidate when I am at my office and have access to my real databases.
Comment 12 Steven Shelton 2012-08-17 14:22:32 UTC
Okay. Checked at my office. Here's what I discovered in the latest RC of 3.6.1:
the filter works in Calc for most things. However, certain types of data cannot
be filtered. For instance, dates and times cannot be filtered. You get the
exact behaviour I noted in the first report. It may actually be that the bug
was related only to date/time filtering in 3.5 as well . . . to be honest, that
was what I was testing and I don't know if I ever thought to test another type
of field and maybe never noticed I hadn't tested other types.
Comment 13 Noel Power 2012-08-17 15:19:21 UTC
(In reply to comment #12)
> Okay. Checked at my office. Here's what I discovered in the latest RC of 3.6.1:
> the filter works in Calc for most things. However, certain types of data cannot
> be filtered. For instance, dates and times cannot be filtered. You get the
> exact behaviour I noted in the first report. It may actually be that the bug
> was related only to date/time filtering in 3.5 as well . . . to be honest

certainly dates seem to be a problem here, I can reproduce the problem with the simple test db doc attached earlier ( I just added a new date field and created a couple of entries to filter )

I am taking this bug for the moment, if it's happening in the c++ code I reckon I can have a good crack at fixing it, if I have no luck then I will return the bug to the pool again.
Comment 14 Noel Power 2012-08-17 17:37:16 UTC
hmm ok, seems like when the filter query is formed ( for a query of date > 1/1/1999 ) we get

"( "mytable"."date" > '{D ''1999-01-01'' }' )"

in 3.4 the query is 

"( "mytable"."date" > {D '1999-01-01' } )"

notice the extra '(s) ( presumably these are used to escape ? ) I only speculate as I am not that familiar with query stuff. Anyway, hardcoding the old type filter string and it works.
So.. I'll try and take a look on monday to find where query is built to see if I can spot anything obviously bogus there
Comment 15 Lionel Elie Mamane 2012-08-17 17:42:23 UTC
(In reply to comment #14)
> hmm ok, seems like when the filter query is formed ( for a query of date >
> 1/1/1999 ) we get
> 
> "( "mytable"."date" > '{D ''1999-01-01'' }' )"
> 
> in 3.4 the query is 
> 
> "( "mytable"."date" > {D '1999-01-01' } )"
> 
> notice the extra '(s) ( presumably these are used to escape ? )

Yes, the single quotes are the SQL string delimiters; they are doubled inside a string to be escaped. It looks like the date literal is constructed (as a LibreOffice internal string), but then treated as a string (so encoded as an SQL string).

The {D ...} is the ODBC syntax for date literals.
Comment 16 Noel Power 2012-08-20 20:52:00 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > hmm ok, seems like when the filter query is formed ( for a query of date >
> > 1/1/1999 ) we get
> > 
> > "( "mytable"."date" > '{D ''1999-01-01'' }' )"
> > 
> > in 3.4 the query is 
> > 
> > "( "mytable"."date" > {D '1999-01-01' } )"
> > 
> > notice the extra '(s) ( presumably these are used to escape ? )
> 
> Yes, the single quotes are the SQL string delimiters; they are doubled inside a
> string to be escaped. It looks like the date literal is constructed (as a
> LibreOffice internal string), but then treated as a string (so encoded as an
> SQL string).
> 
> The {D ...} is the ODBC syntax for date literals.

http://opengrok.libreoffice.org/xref/core/dbaccess/source/core/api/SingleSelectQueryComposer.cxx#1474

   
   1496                     else
   1497                     {
   1498                         sValue = i_aPredicateInputController.getPredicateValue(pAndIter->Name,sValue,sal_True);
   1499                     }

the else clause here seems largely responsible for the double quoting. Note the whole if/else clause has been added new since 3.4. We double up with the because the sValue has already been quoted from a previous getPredicateValue from http://opengrok.libreoffice.org/xref/core/dbaccess/source/ui/dlg/queryfilter.cxx#360

removing that else clause of course helps in this scenario but I have no idea what implications there might be for doing that, perhaps some heuristics [*] could be used to prevent the code in the else block running. But I have little idea.

@lionel would such heuristics work or is there a better 'real' solution

[*] some checking if sValue starts with '{' and ends with '}' or starts with '{D' and ends with '}'
Comment 17 Lionel Elie Mamane 2012-08-21 06:02:36 UTC
(In reply to comment #16)
> (In reply to comment #15)
>> (In reply to comment #14)
>>> hmm ok, seems like when the filter query is formed ( for a query of date >
>>> 1/1/1999 ) we get
>>> 
>>> "( "mytable"."date" > '{D ''1999-01-01'' }' )"
>>> 
>>> in 3.4 the query is 
>>> 
>>> "( "mytable"."date" > {D '1999-01-01' } )"
>>> 
>>> notice the extra '(s) ( presumably these are used to escape ? )

> http://opengrok.libreoffice.org/xref/core/dbaccess/source/core/api/SingleSelectQueryComposer.cxx#1474
> 
> 
>    1496                     else
>    1497                     {
>    1498                         sValue =
> i_aPredicateInputController.getPredicateValue(pAndIter->Name,sValue,sal_True);
>    1499                     }
> 
> the else clause here seems largely responsible for the double quoting. Note the
> whole if/else clause has been added new since 3.4. We double up with the
> because the sValue has already been quoted from a previous getPredicateValue
> from
> http://opengrok.libreoffice.org/xref/core/dbaccess/source/ui/dlg/queryfilter.cxx#360

> removing that else clause of course helps in this scenario but I have no idea
> what implications there might be for doing that, perhaps some heuristics [*]
> could be used to prevent the code in the else block running.

> @lionel would such heuristics work or is there a better 'real' solution
> 
> [*] some checking if sValue starts with '{' and ends with '}' or starts with
> '{D' and ends with '}'

The cleanest (and maybe only "true") solution is to be clear as to when things are escaped.
I see three solutions:

 - When lcl_getCondition is called, all values are already SQL-encoded/escaped,
   it should *never* encode/escape things again.
   From looking at it for 5 minutes, this looks like the general design
   of the stuff; it would be interesting to see why the encoding again
   was added here in 3.4.

 - When lcl_getCondition is called, nothing is SQL-encoded/escaped,
   it should *always* encode/escape things.
   Also, the way it currently goes at it, it seems to me that
   it loses information by first extracting a string from the value,
   and then SQL-encoding that string: it lost the information of the
   type of the value. If the original value was a date, then the ODBC
   SQL-encoding of it is "{D YYYY-MM-DD}", and that's it.
   If it was a string to begin with, it should be quote-escaped and all that.

 - If for some deep/weird reason none of the above is possible,
   then some boolean flag that tells whether it should encode/escape.

I'll try to take a look at it today.
Comment 18 Lionel Elie Mamane 2012-08-21 06:17:58 UTC
Also, the implementation of OPredicateInputController::getPredicateValue
http://opengrok.libreoffice.org/xref/core/connectivity/source/commontools/predicateinput.cxx#332
looks highly suspicious in part:

    343         if ( nType == DataType::OTHER || sField.isEmpty() )
    344         {
    345             // first try the international version
    346             ::rtl::OUString sSql;
    347             sSql += ::rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("SELECT * "));
    348             sSql += ::rtl::OUString(RTL_CONSTASCII_USTRINGPARAM(" FROM x WHERE "));
    349             sSql += sField;
    350             sSql += _rPredicateValue;
    351             ::std::auto_ptr<OSQLParseNode> pParseNode( const_cast< OSQLParser& >( m_aParser ).parseTree( sError, sSql, sal_True ) );
    352             nType = DataType::DOUBLE;
    353             if ( pParseNode.get() )
    354             {
    355                 OSQLParseNode* pColumnRef = pParseNode->getByRule(OSQLParseNode::column_ref);
    356                 if ( pColumnRef )
    357                 {
    358                 }
    359             }
    360         }

The whole outer if here seems to no nothing useful, the inner if has *no* body (and no side effect in the condition).
Comment 19 Noel Power 2012-08-21 08:55:51 UTC
(In reply to comment #17)
> The cleanest (and maybe only "true") solution is to be clear as to when things
> are escaped.
> I see three solutions:
> 
>  - When lcl_getCondition is called, all values are already SQL-encoded/escaped,
>    it should *never* encode/escape things again.
>    From looking at it for 5 minutes, this looks like the general design
>    of the stuff; it would be interesting to see why the encoding again
>    was added here in 3.4.
see http://cgit.freedesktop.org/libreoffice/core/commit/?id=897ade45672a36ffd278b1cb9ba21694c0ee45c7
> 
>  - When lcl_getCondition is called, nothing is SQL-encoded/escaped,
>    it should *always* encode/escape things.
my gut feeling is that this the above is the case, it appears some effort ( in addition to the commit above ) was probably made at some stage to amalgamate the code in dbaccess and here ( of course I may be completely wrong ) and that that there remain some 'holes' in that rework 
>  - If for some deep/weird reason none of the above is possible,
>    then some boolean flag that tells whether it should encode/escape.
the uno api documentation ( since this stuff is triggered from the uno api ) didn't throw any light onto things for me.

Looking at this again I *think* that the desired query result might be obtained by multiple calls to uno method 'appendFilterByColumn' which seems to it might more or less do the same thing as what DlgFilterCrit::BuildWherePart() is attempting to achieve. However at this stage I think I will respectfully bow out of this bug ( I only dropped by because it was a regression marked against spreadsheet ) Since I started looking I just continued but I am sorry I didn't really progress it much further. Although I can understand the what the code does I don't understand the why :-) I am woefully non-base enabled and feel I just lack to problem domain knowledge to make the calls here. I am reassigning to you Lionel, I am sure you will have the best idea for a solution. Thanks!!
Comment 20 Lionel Elie Mamane 2012-08-21 13:36:40 UTC
(In reply to comment #16)

> We double up with the
> because the sValue has already been quoted from a previous getPredicateValue
> from
> http://opengrok.libreoffice.org/xref/core/dbaccess/source/ui/dlg/queryfilter.cxx#360

> @lionel would such heuristics work or is there a better 'real' solution
 
> [*] some checking if sValue starts with '{' and ends with '}' or starts with
> '{D' and ends with '}'

My testing shows that one will there see the database-specific date format:

1) for ODBC it will be as you described
2) for PostgreSQL-SDBC 'YYYY-MM-DD'
3) for embedded HSQLDB, it will be #DD/MM/YYY# (or is it #MM/DD/YYYY#? Not sure)


The whole structure of that code is mightily weird:

It will parse the string (putting a ColumnName = Value, even if the operator is not =, but e.g. >=), only to convert it back to a string again (just the Value part). Not sure why that is. Is this intended as some kind of canonicalisation? These functions clearly expect already canonicalised data, so <shrug>. One also clearly sees in the UI that the data is canonicalised on entry, immediately after focus moves out of the text entry area.

The value is always treated as a string instead of keeping its native type (it is an Any! you can shove a date in there!), which smells fishy.

I'm trying very hard to find a way to touch this mess without completely rewriting it, in a way that would be OK for the stable branch.
Comment 21 Not Assigned 2012-08-21 17:35:24 UTC
Lionel Elie Mamane committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a598600a494b9fe9548a6cd909b453376ead0b04

fdo#46480 do not double-escape (e.g. date) literals as strings
Comment 22 Noel Power 2012-08-22 08:44:12 UTC
(In reply to comment #21)
> Lionel Elie Mamane committed a patch related to this issue.
> It has been pushed to "master":
> 
> http://cgit.freedesktop.org/libreoffice/core/commit/?id=a598600a494b9fe9548a6cd909b453376ead0b04
> 
> fdo#46480 do not double-escape (e.g. date) literals as strings

urgh! it turned out to be that simple in the end :( ( I did see that boolean influenced things, but the name of the '_bForStatementUse' made me think that should/could not be changed ) anyway, thanks for fixing
Comment 23 Not Assigned 2012-08-22 15:33:26 UTC
Lionel Elie Mamane committed a patch related to this issue.
It has been pushed to "libreoffice-3-6":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=0ada624c20b44bdc2bea751a6c2f6e47af0ae4f1&g=libreoffice-3-6

fdo#46480 do not double-escape (e.g. date) literals as strings


It will be available in LibreOffice 3.6.2.