Bug 46859 - Incorrect "str" type checks in some Python uno wrapper classes
Summary: Incorrect "str" type checks in some Python uno wrapper classes
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice (show other bugs)
Version:
(earliest affected)
3.5.1 RC1
Hardware: All All
: medium normal
Assignee: Michael Stahl (CIB)
URL:
Whiteboard: target:3.6.0 target:3.5.2
Keywords: regression
Depends on:
Blocks:
 
Reported: 2012-03-01 18:51 UTC by David Bolen
Modified: 2012-03-12 07:03 UTC (History)
1 user (show)

See Also:
Crash report or crash signature:


Attachments
Patch to revert to 3.4.5 unicode type checking (1.97 KB, patch)
2012-03-01 18:51 UTC, David Bolen
Details
patch should work both with Python 2 and 3 (2.10 KB, patch)
2012-03-05 10:23 UTC, Michael Stahl (CIB)
Details
Updated Python 3 compatible patch (2.33 KB, patch)
2012-03-05 11:59 UTC, David Bolen
Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Bolen 2012-03-01 18:51:42 UTC
Created attachment 57890 [details]
Patch to revert to 3.4.5 unicode type checking

It appears that some of the Python 3 compatibility changes (I believe from a09ce46818fd4d5e08b3af9a478501cd8ef5b4fe) break compatibility with the included 2.6.1 interpreter.

It looks like the bug was present in any 3.5.x release but I was affected by the the python uno issues in 3.5.0 (bug 45696) so this is the first 3.5 series libreoffice I've run the script with.  I've attached a small patch essentially returning to the 3.4.5 tests that corrects the issue, with the trade-off of no longer being Python 3 compatible.

For example, the Char class in uno.py in 3.5.1RC1 has:

    # @param value pass a Unicode string with length 1
    def __init__(self,value):
        assert isinstance(value, str)
        assert len(value) == 1
        self.value=value

Now, in Python 3, "str" will work for a unicode string, but not in Python 2.6.1.  I ran into this with an existing Writer script that that copies the DecimalChar from an existing tab stop to a new tab stop.  Even just referencing the prior DecimalChar value fails at this point, since it is a unicode string.  So you can no longer retrieve or round trip Char values, for example.

In 3.4.5 this code tested against "unicode" (or in some cases the tuple of str and unicode).

The Char class also has similar failure points in its __eq__ method as does Bool (the latter when checking for literal true/false strings).  In general, the switch from unicode to string doesn't appear safe with Python 2.x, especially as the Libreoffice internals are returning unicode.  From looking at the changeset reference above, it seems that these issues are localized to uno.py.  I'm also not sure if all needed changes were covered anyway, since ByteString is still using "str" but I don't think that's accurate for Python 3.

I'm not positive of the best way to be compatible with 2.x and 3.x with a single source easily, as neither unicode nor basestring is in 3.  Perhaps just defining a dummy unicode class in Py 3 and then always checking either str or unicode.  Or perhaps more cleanly, just separating out the classes that need to care into separate modules that are imported into the uno wrapper namespace appropriately.

But even without an immediately solution for Py3, I'd think that if given the choice at the moment, of the two, compatibility with the included Python 2.6.1 interpreter should be given preference.

-- David
Comment 1 Michael Stahl (CIB) 2012-03-05 10:23:49 UTC
Created attachment 58037 [details]
patch should work both with Python 2 and 3
Comment 2 Michael Stahl (CIB) 2012-03-05 10:28:40 UTC
hi David,

thanks for your patch, but it seems to me that it would break Python 3,
because that has no "unicode" type.

i've tried to change it in a way that it works with both Python 2 and 3,
see attachment.

tested with a little program on the bundled Python 2 (but i think
my program never used the broken features since it worked before);
please tell me if this fixes the problem for you.
Comment 3 David Bolen 2012-03-05 10:42:00 UTC
Yes, agreed, as submitted the patch just reverted to LO 3.4.5 behavior but wasn't compatible with Python 3.

With respect to the new patch, it is a little more permissive under Python 2 than LO 3.4.5 was, in that it will let non-unicode strings through Char's constructor whereas before it wouldn't.  Not sure how big a deal that is since the extension module re-checks.

Actually, having been poking at the source since opening this issue, given the
checks occurring within the extension module, it's not entirely clear to me why
the type checking in needed in the uno.py wrapper in the first place.  It seems
like PyChar2Unicode (in pyuno_type.cxx) is doing exactly the same checks in C, so another alternative would be to just remove the assertion entirely.

-- David
Comment 4 David Bolen 2012-03-05 11:59:30 UTC
Created attachment 58044 [details]
Updated Python 3 compatible patch

I did some testing with the second patch (and with no checks at all) and found I can pass through regular (non-Unicode) strings in Python 2 without the errors I expected, and the internal "value" field remains a "str", so I'm not exactly sure what is happening at the uno C++ layer without further investigation.

It's probably more conservative to keep the same type checks (enforce only unicode on construction) for now as existed in 3.4.5.

The attached patch leaves the checks the same as in 3.4.5/3.5.0, but is Python 3 compatible by aliasing unicode to str in that case.  (It's essentially the same
as my first patch but with the unicode definition)

-- David
Comment 5 Not Assigned 2012-03-07 02:21:12 UTC
David Bolen committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=4634cbc237239da771e0f6a81f78171ecec726ba

fdo#46859: adapt string type checks to work with both Python 2 and 3
Comment 6 Björn Michaelsen 2012-03-11 06:17:43 UTC
@mst: The commit suggests this is RESOLVED/FIXED, please update if that is not a misconception.
Comment 7 Not Assigned 2012-03-12 06:51:52 UTC
David Bolen committed a patch related to this issue.
It has been pushed to "libreoffice-3-5":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=fc290187f08981c734d1f2d3f6649c94e3ac6f99&g=libreoffice-3-5

fdo#46859: adapt string type checks to work with both Python 2 and 3


It will be available in LibreOffice 3.5.2.