Bug 109241 - python: Win32: urllib on https URLs fails due to loading wrong OpenSSL libraries
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 5.3.4.2 release (earliest affected)
Hardware: x86 (IA32) Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:6.0.0 target:5.3.7 target:5.4.2
Keywords:
Duplicates: 108316
Depends on:
Blocks:
 
Reported: 2017-07-20 20:12 UTC by kiloran.public+bugzilla
Modified: 2017-09-21 10:10 UTC

See Also:
Crash report or crash signature:


Attachments
Dependency Walker view of _ssl.pyd (24.40 KB, application/vnd.oasis.opendocument.spreadsheet)
2017-08-24 10:12 UTC, kiloran.public+bugzilla
Macro to demonstrate the problem (18.85 KB, application/vnd.oasis.opendocument.spreadsheet)
2017-08-30 19:37 UTC, kiloran.public+bugzilla
Procmon results (264.38 KB, text/plain)
2017-08-30 19:38 UTC, kiloran.public+bugzilla
ProcMon results (26.18 KB, text/plain)
2017-08-30 19:38 UTC, kiloran.public+bugzilla

Description kiloran.public+bugzilla 2017-07-20 20:12:17 UTC
I'm having problems web scraping https sites using LibreOffice's Python. I have LibreOffice 5.3.4.2 (x86) on Windows 7, and can demonstrate the problem with this simple script:

import urllib.request
myUrl = 'https://ask.libreoffice.org/en/questions/'
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib.request.Request(url=myUrl, headers=hdr)
response = urllib.request.urlopen(req)

This fails immediately with "urlopen error unknown url type: https". It works fine with an http url, but fails with any https url.
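
For context: in CPython, urllib.request only registers its HTTPS handler when http.client defines HTTPSConnection, and that class is only defined when "import ssl" succeeds. The "unknown url type: https" error therefore usually points at the _ssl extension module failing to load rather than at urllib itself. A minimal diagnostic sketch follows; the helper name check_ssl_import is made up for illustration and is not part of any library:

```python
import importlib


def check_ssl_import():
    """Return (ok, detail) describing whether the ssl module can be imported.

    check_ssl_import is a hypothetical helper name used only in this sketch.
    """
    try:
        ssl = importlib.import_module("ssl")
    except ImportError as exc:
        # On Windows the message is often "DLL load failed ..." when
        # _ssl.pyd cannot resolve its OpenSSL dependencies.
        return False, str(exc)
    # OPENSSL_VERSION names the OpenSSL build that was actually loaded.
    return True, ssl.OPENSSL_VERSION


if __name__ == "__main__":
    ok, detail = check_ssl_import()
    print("ssl import OK:" if ok else "ssl import FAILED:", detail)
```

Running this with the python.exe bundled in the LibreOffice "program" directory and again with a standalone Python should show whether the two interpreters differ in their ability to load ssl.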

I tried the above in a LibreOffice Calc document with this embedded script and it failed. It also failed when I tried running it in a terminal window from C:\Program Files (x86)\LibreOffice 5\program\python-core-3.3.0\bin\python.exe

The script works fine with my standalone Python 3.3.2 running from a terminal window.

I've also tried various LibreOffice Portable installations I have:

4.0.2.2: Works OK
5.3.1.2: Fails
5.3.2.2: Fails
I've tried uninstalling and reinstalling 5.3.4.2 more times than I can count and cannot get it to work. Yet when I install it on Windows 10 in a virtual machine on the same PC, it works fine.

I tried the Safe Mode in LibreOffice 5 and the script works fine. Went back to normal mode and it failed again. Uninstalled LibreOffice 5.3.4.2 and then deleted everything I could find relating to LibreOffice. Reinstalled 5.3.4.2 x86 and the behaviour is unchanged: it works OK in Safe Mode and fails in normal mode.

I did find a fix/workaround:

I renamed _ssl.pyd in C:\Program Files (x86)\LibreOffice 5\program\python-core-3.3.0\lib\ to _ssl.pyd(old).

I then copied _ssl.pyd from my standalone Python installation at C:\Program Files (x86)\Python\DLLs\ and pasted it into the above folder.

LibreOffice now works OK, even though the original _ssl.pyd was just 48 kB and the replacement is 1162 kB, so they are very different.

Any idea why I am getting this problem on Windows 7?
Comment 1 kiloran.public+bugzilla 2017-07-21 10:51:38 UTC
I also tried LibreOffice Portable 5.3.4 and have the same problem
Comment 2 Michael Stahl (allotropia) 2017-08-08 11:01:19 UTC
do you perhaps have some files "ssleay32.dll" and "libeay32.dll"
somewhere in C:/Windows ? that is known to ruin one's day.

you could try to verify that the correct dlls (from the LO
installation's "program" directory) are loaded by
running Dependency Walker from http://www.dependencywalker.com/
and looking at the _ssl.pyd file - that should display
*errors* about ssleay32.dll and libeay32.dll and python35.dll
not being found - if these libraries *are* found somewhere
that's a problem.

(when you run LO, they are found via %PATH%, which has a lower
priority than system directories)
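
A rough way to look for stray copies of these DLLs without Dependency Walker is to scan the directories Windows may search. The sketch below checks every directory on PATH; the function name find_openssl_dlls is hypothetical, and the Windows and system directories would have to be added to the list by hand:

```python
import os

# The two OpenSSL DLL names mentioned in this comment, lower-cased.
OPENSSL_DLL_NAMES = {"ssleay32.dll", "libeay32.dll"}


def find_openssl_dlls(directories):
    """Return paths of OpenSSL DLLs found directly in the given directories.

    Names are compared case-insensitively, as Windows does.
    Directories that do not exist or cannot be read are skipped.
    """
    hits = []
    for directory in directories:
        try:
            entries = os.listdir(directory)
        except OSError:
            continue
        for entry in entries:
            if entry.lower() in OPENSSL_DLL_NAMES:
                hits.append(os.path.join(directory, entry))
    return hits


if __name__ == "__main__":
    # Report hits in PATH order, which is the order Windows would try them.
    path_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    for hit in find_openssl_dlls(path_dirs):
        print(hit)
```

Any hit that appears before the LibreOffice "program" directory is a candidate for being loaded instead of the bundled library.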
Comment 3 Buovjaga 2017-08-08 11:31:07 UTC
Tip from L. Godard: https://bugs.documentfoundation.org/show_bug.cgi?id=77354#c2
Comment 4 kiloran.public+bugzilla 2017-08-24 10:12:15 UTC
Created attachment 135759
Dependency Walker view of _ssl.pyd
Comment 5 kiloran.public+bugzilla 2017-08-24 10:12:36 UTC
I did not find any occurrences of "ssleay32.dll" or "libeay32.dll" in "C:/Windows". I can only find them in subfolders of "C:/Program Files" and "C:/Program Files (x86)" or "My Documents/Portable Apps/"

I tried Dependency Walker (though I confess that I don't really understand the detail of its operation). I used Dependency Walker to open file _ssl.pyd and the results are shown in the attached Calc document "DependencyWalker Bug 109241.ods"
Comment 6 Michael Stahl (allotropia) 2017-08-25 14:10:39 UTC
so Dependency Walker found these dlls:

c:\program files (x86)\intel\icls client\SSLEAY32.DLL
c:\program files (x86)\intel\icls client\LIBEAY32.DLL

that is interesting...

now the question is, does the soffice.bin process also
find these libraries, or the ones bundled with LO;
presumably the directory
"c:\program files (x86)\intel\icls client"
is contained in the $PATH variable?

... actually it looks like the soffice.exe PATH extension code was removed with commit 827430c8c0417396b3c1d2a049ccddb818c89646 (which removed "URE/bin") and earlier commit b786a33cfdca2e8a4114ddef0340e0e0628dd09c (which removed "program"), so the PATH is passed unchanged to soffice.bin ...

there is this call in sal_detail_initialize()
which is one of the first things main() does:

        p = GetProcAddress(h, "SetDllDirectoryW");
        if (p != nullptr) {
            reinterpret_cast< BOOL (WINAPI *)(LPCWSTR) >(p)(L"");
        }
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686203(v=vs.85).aspx
claims that that will remove the current directory from the search order, but that contradicts experimental evidence on Windows 7; the effect is that the current directory is moved up in the search order.

the LO osl_loadModule functions will do 2 calls to load a library:

    h = LoadLibraryW(reinterpret_cast<LPCWSTR>(Module->buffer));
    if (h == nullptr)
        h = LoadLibraryExW(reinterpret_cast<LPCWSTR>(Module->buffer), nullptr,
                                  LOAD_WITH_ALTERED_SEARCH_PATH);


the first one uses the default search order, which is:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms682586(v=vs.85).aspx

"If SafeDllSearchMode is disabled, the search order is as follows:
1.    The directory from which the application loaded.
2.    The current directory.
3.    The system directory. Use the GetSystemDirectory function to get the path of this directory.
4.    The 16-bit system directory. There is no function that obtains the path of this directory, but it is searched.
5.    The Windows directory. Use the GetWindowsDirectory function to get the path of this directory.
6.    The directories that are listed in the PATH environment variable. Note that this does not include the per-application path specified by the App Paths registry key. The App Paths key is not used when computing the DLL search path."

however, when CPython loads a .pyd module, it does it differently:

workdir/UnpackedTarball/python3/Python/dynload_win.c

        /* We use LoadLibraryEx so Windows looks for dependent DLLs
            in directory of pathname first. */
        hDLL = LoadLibraryExW(wpathname, NULL,
                              LOAD_WITH_ALTERED_SEARCH_PATH);

this doesn't use default search order but this one:

"If SafeDllSearchMode is disabled, the alternate search order is as follows:
1.    The directory specified by lpFileName.
2.    The current directory.
3.    The system directory. Use the GetSystemDirectory function to get the path of this directory.
4.    The 16-bit system directory. There is no function that obtains the path of this directory, but it is searched.
5.    The Windows directory. Use the GetWindowsDirectory function to get the path of this directory.
6.    The directories that are listed in the PATH environment variable. Note that this does not include the per-application path specified by the App Paths registry key. The App Paths key is not used when computing the DLL search path."

notably, step 1 is now the directory of _ssl.pyd, not the directory of soffice.exe or python.exe ("program"); at first glance it appears that the latter is not searched at all, but in that case it would never work...

aha, it works because soffice.exe forces the current working directory of soffice.bin to "program" via the 8th parameter of CreateProcess, so it hits step 2.
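
To make the comparison of search orders easier to follow, here is a simplified model of the alternate search order quoted above. It is only a sketch: it ignores already-loaded modules, KnownDLLs and SafeDllSearchMode, and the function names and the include_current_dir switch are assumptions made for illustration (the switch models the documented effect of SetDllDirectoryW(L"") removing the current directory):

```python
import os


def alternate_search_order(module_dir, current_dir, system_dirs, path_dirs,
                           include_current_dir=True):
    """Directories in the order quoted for LOAD_WITH_ALTERED_SEARCH_PATH.

    A simplified model only; hypothetical helper, not a Windows API.
    """
    order = [module_dir]                 # 1. directory of lpFileName
    if include_current_dir:
        order.append(current_dir)        # 2. current directory
    order.extend(system_dirs)            # 3.-5. system and Windows dirs
    order.extend(path_dirs)              # 6. PATH entries
    return order


def resolve_dll(name, search_dirs):
    """Return the first existing path for name in search_dirs, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With the current directory set to "program", the bundled DLL wins; with include_current_dir=False, whichever PATH entry comes first wins.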

if i try this out i can see in Process Monitor that libeay32.dll is first searched in the "program/python-core-3.5.4/lib" directory, and then in "program", where it is found.

but: with all of this, it's still a mystery how the files in "c:\program files (x86)\intel\icls client" could be found via PATH, because the "program" directory comes before it in both search orders.
Comment 7 Michael Stahl (allotropia) 2017-08-25 15:47:51 UTC
maybe this will give some insight:

download Process Monitor from:

https://docs.microsoft.com/en-us/sysinternals/downloads/procmon

run it, enable tracing with the "magnifying glass" button and then run your "urllib" script, then disable tracing again

this will log a huge number of events, the interesting ones are from the first one that mentions "_ssl.pyd" until the last one that mentions "libeay32.dll" (about 25 lines here);
use Edit->Find to search, select them, use Edit->Copy and paste them into a text file, attach it here.
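
If the log is saved to a text file instead, the same range can be cut out with a few lines of Python. This is a sketch that assumes one event per line; the function name extract_window is hypothetical:

```python
def extract_window(lines, first_marker="_ssl.pyd", last_marker="libeay32.dll"):
    """Return the lines from the first one mentioning first_marker through
    the last one mentioning last_marker (case-insensitive).

    Returns an empty list if either marker is missing.
    """
    lowered = [line.lower() for line in lines]
    first_marker = first_marker.lower()
    last_marker = last_marker.lower()

    # Index of the first line that mentions the first marker.
    first = next(
        (i for i, text in enumerate(lowered) if first_marker in text), None
    )
    if first is None:
        return []

    # Search backwards for the last line that mentions the last marker.
    last = None
    for i in range(len(lowered) - 1, first - 1, -1):
        if last_marker in lowered[i]:
            last = i
            break
    if last is None:
        return []
    return lines[first:last + 1]
```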
Comment 8 Michael Stahl (allotropia) 2017-08-29 10:47:11 UTC
Stephan points out that i somehow missed this bit in the description:

"I tried the Safe Mode in LibreOffice 5 and the script works fine. Went back to normal mode and it failed again."

some highly creative speculation:
* you have an extension installed that bundles its own OpenSSL DLLs
* your OpenGL driver bundles its own OpenSSL DLLs

both of these would be disabled by "Safe Mode" and could lead
to the OpenSSL DLLs already being loaded before the _ssl Python
module is loaded, in which case Windows will not load LO's
bundled OpenSSL DLLs.

using "Process Monitor" as described in comment #7 could clarify
the situation.
Comment 9 kiloran.public+bugzilla 2017-08-30 19:35:56 UTC
I've downloaded Process Monitor and run some tests. I'll attach 3 files:
1...Demo_of_bug_109241.ods Contains a macro which exhibits my reported problem
2...ProcMon_for_bug109241_normal mode.txt which contains the ProcMon records for the standard version 5.3.4.2
3...ProcMon_for_bug109241_modified.txt which contains the ProcMon records for 5.3.4.2 with the normal _ssl.pyd replaced by that from LibreOffice Portable 4.0.2.2

In each case, I had Demo_of_bug_109241.ods open, started the ProcMon capture, ran the macro and then stopped the ProcMon capture. I then extracted the ProcMon records from the first occurrence of _ssl.pyd to the last occurrence of LIBEAY32.DLL 

As previously reported, the standard version of 5.3.4.2 fails with the macro, but the modified version works OK.

From my limited understanding, it seems to me that the correct instance of _ssl.pyd is being used. However, I do wonder about LIBEAY32.DLL. In both ProcMon reports, there seem to be many references to LIBEAY32.DLL "NAME NOT FOUND". Is this normal?

I tried getting ProcMon records for the same process when running LibreOffice in safe mode. The macro ran fine, but I could not see any reference to _ssl.pyd or LIBEAY32.DLL

The only non-standard extension I have installed is this: https://extensions.libreoffice.org/extensions/apso-alternative-script-organizer-for-python
Comment 10 kiloran.public+bugzilla 2017-08-30 19:37:23 UTC
Created attachment 135882
Macro to demonstrate the problem

See comment 9
Comment 11 kiloran.public+bugzilla 2017-08-30 19:38:02 UTC
Created attachment 135883
Procmon results

See comment 9
Comment 12 kiloran.public+bugzilla 2017-08-30 19:38:56 UTC
Created attachment 135884
ProcMon results

See comment 9
Comment 13 Michael Stahl (allotropia) 2017-08-31 21:21:21 UTC
the important part from attachment https://bugs.documentfoundation.org/attachment.cgi?id=135883

soffice.bin	7992	CreateFile	C:\Program Files (x86)\LibreOffice 5\program\python-core-3.3.0\lib\LIBEAY32.dll	NAME NOT FOUND
soffice.bin	7992	CreateFile	C:\Windows\SysWOW64\LIBEAY32.dll	NAME NOT FOUND
soffice.bin	7992	CreateFile	C:\Windows\system\LIBEAY32.dll	NAME NOT FOUND
soffice.bin	7992	CreateFile	C:\Windows\LIBEAY32.dll	NAME NOT FOUND
soffice.bin	7992	CreateFile	C:\Program Files (x86)\LibreOffice 5\LIBEAY32.dll	NAME NOT FOUND
soffice.bin	7992	CreateFile	C:\ProgramData\Oracle\Java\javapath\LIBEAY32.dll	NAME NOT FOUND
soffice.bin	7992	CreateFile	C:\Program Files (x86)\PC Connectivity Solution\LIBEAY32.dll	NAME NOT FOUND
soffice.bin	7992	CreateFile	C:\Perl64\site\bin\LIBEAY32.dll	NAME NOT FOUND
soffice.bin	7992	CreateFile	C:\Perl64\bin\LIBEAY32.dll	NAME NOT FOUND
soffice.bin	7992	CreateFile	C:\Program Files\Common Files\Microsoft Shared\Windows Live\LIBEAY32.dll	NAME NOT FOUND
soffice.bin	7992	CreateFile	C:\Program Files (x86)\Common Files\microsoft shared\Windows Live\LIBEAY32.dll	NAME NOT FOUND
soffice.bin	7992	CreateFile	C:\Program Files (x86)\Intel\iCLS Client\libeay32.dll	SUCCESS
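
The excerpt above can be reduced to "which directories were tried, and where was the DLL finally found" with a small parser. This is a sketch that assumes the five tab-separated columns seen in the excerpt (process, PID, operation, path, result); the function name first_successful_lookup is hypothetical:

```python
def first_successful_lookup(lines, dll_name="libeay32.dll"):
    """Return (failed_paths, found_path) for lookups of dll_name, in log order.

    found_path is None if no lookup succeeded. Lines that do not have
    exactly five tab-separated fields are ignored.
    """
    failed = []
    suffix = "\\" + dll_name.lower()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue
        _process, _pid, _operation, path, result = fields
        if not path.lower().endswith(suffix):
            continue
        if result == "SUCCESS":
            return failed, path
        failed.append(path)
    return failed, None
```

Applied to the excerpt, the failed list is the effective search order up to the hit, which makes it easy to see where the "program" directory does or does not appear.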

not sure how this entry got in there "C:\Program Files (x86)\LibreOffice 5\LIBEAY32.dll" - it shouldn't be the current dir?

this is what you would expect if the call to SetDllDirectoryW() from sal_detail_initialize() worked as documented.

... which it does now, for me, too.

no, really: last week as described in comment #6 i saw the current directory being searched before the windows directories (which weren't searched at all until i disabled that SetDllDirectory call), but today i see very similar results to what you see, and the "program" directory is searched at the end of the PATH.

i haven't changed anything (other than pull the latest LO master) - no idea why this changed.

but, well, good to learn that windows at least sometimes behaves as documented :)
Comment 14 Michael Stahl (allotropia) 2017-09-01 21:57:56 UTC
okay i think i've fixed this on master, at least as far as $PATH is concerned.

you can still screw it up by putting OpenSSL in C:/Windows/System/ and such but hopefully nobody does that.
Comment 15 Commit Notification 2017-09-01 21:58:54 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=9990e98d67bf14003cde8f0138d2dcfa804406ac

tdf#109241 desktop: Win32: prepend "program" dir to $PATH

It will be available in 6.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
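
For readers unfamiliar with the technique named in the commit title, the idea of prepending a directory to PATH before starting a child process can be sketched as below. This is not the committed patch; the function name prepend_to_path and the choice to drop an existing duplicate entry are assumptions made for illustration:

```python
import os


def prepend_to_path(env, directory, sep=os.pathsep):
    """Return a copy of env with directory as the first PATH entry.

    An existing occurrence of the same directory is dropped so that it is
    not listed twice. The input mapping is left unmodified.
    """
    new_env = dict(env)
    parts = [p for p in new_env.get("PATH", "").split(sep) if p]
    wanted = os.path.normcase(directory)
    parts = [p for p in parts if os.path.normcase(p) != wanted]
    new_env["PATH"] = sep.join([directory] + parts)
    return new_env
```

The returned mapping could be passed as the env argument of subprocess.Popen, so that the child searches the given directory before any other PATH entry. As comment 14 notes, directories that Windows searches before PATH are not affected by this.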
Comment 16 Noel Grandin 2017-09-02 06:00:17 UTC
(In reply to Michael Stahl from comment #14)
> 
> you can still screw it up by putting OpenSSL in C:/Windows/System/ and such
> but hopefully nobody does that.

There is in fact another bugzilla where exactly that happens.
Some other idiot program installs its SSL DLLs into the System folder.
Comment 17 Commit Notification 2017-09-04 07:26:48 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-5-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=4ce1f36e6f4fd7ea923cf2ae81895f6e45919ba6&h=libreoffice-5-3

tdf#109241 desktop: Win32: prepend "program" dir to $PATH

It will be available in 5.3.7.

Comment 18 Commit Notification 2017-09-04 07:26:59 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-5-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=e510fbc21f6dec877cda04e17f1433f09fa00066&h=libreoffice-5-4

tdf#109241 desktop: Win32: prepend "program" dir to $PATH

It will be available in 5.4.2.

Comment 19 Aron Budea 2017-09-21 10:10:37 UTC
*** Bug 108316 has been marked as a duplicate of this bug. ***