I'm having problems web scraping https sites using LibreOffice Python. I have LibreOffice 5.3.4.2 (x86) on Windows 7, and can demonstrate the problem with this simple script:

    import urllib.request
    myUrl = 'https://ask.libreoffice.org/en/questions/'
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib.request.Request(url=myUrl, headers=hdr)
    response = urllib.request.urlopen(req)

This fails immediately with "urlopen error unknown url type: https". It works fine with an http URL, but fails with any https URL.

I tried the above in a LibreOffice Calc document with this embedded script and it failed. It also failed when I tried running it in a terminal window from C:\Program Files (x86)\LibreOffice 5\program\python-core-3.3.0\bin\python.exe

The script works fine with my standalone Python 3.3.2 running from a terminal window.

I've also tried various LibreOffice Portable installations I have:

    4.0.2.2: Works OK
    5.3.1.2: Fails
    5.3.2.2: Fails

I've tried uninstalling and reinstalling 5.3.4.2 more times than I can count and cannot get it to work. Yet when I install it on Windows 10 in a virtual machine on the same PC, it works fine.

I tried the Safe Mode in LibreOffice 5 and the script works fine. Went back to normal mode and it failed again.

Uninstalled LibreOffice 5.4.3.2 and then deleted everything I could find relating to LibreOffice. Reinstalled 5.4.3.2 x86 and the behaviour is unchanged... works OK in Safe Mode and fails in normal mode.

I did find a fix/workaround: I renamed _ssl.pyd in C:\Program Files (x86)\LibreOffice 5\program\python-core-3.3.0\lib\ to _ssl.pyd(old). I then copied _ssl.pyd from my standalone Python installation at C:\Program Files (x86)\Python\DLLs\ and pasted it into the above folder. LibreOffice now works OK, even though the original _ssl.pyd was just 48kB and the replacement is 1162kB, so they are very different.

Any idea why I am getting this problem on Windows 7?
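One way to narrow this down is to ask the interpreter itself whether its ssl module can be imported. In CPython, urllib.request only defines HTTPSHandler when http.client was able to import ssl, so a missing handler goes together with the "unknown url type: https" error. The following is a diagnostic sketch only; the helper name https_support_report is made up for illustration and is not part of LibreOffice or Python.

```python
import importlib
import urllib.request


def https_support_report():
    """Describe whether this interpreter can open https URLs (hypothetical helper)."""
    report = {}
    try:
        # Importing ssl pulls in the _ssl extension module (_ssl.pyd on Windows).
        ssl = importlib.import_module("ssl")
    except ImportError as exc:
        report["ssl_importable"] = False
        report["import_error"] = str(exc)
    else:
        report["ssl_importable"] = True
        report["openssl_version"] = ssl.OPENSSL_VERSION
    # urllib.request defines HTTPSHandler only if http.client could import ssl.
    report["https_handler"] = hasattr(urllib.request, "HTTPSHandler")
    return report


if __name__ == "__main__":
    for key, value in https_support_report().items():
        print(key, "=", value)
```

Running it with both the LibreOffice python.exe and the standalone Python would show whether the failing installation cannot import ssl at all, and if so, the recorded import error may hint at which DLL is at fault.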
I also tried LibreOffice Portable 5.3.4 and have the same problem.
do you perhaps have some files "ssleay32.dll" and "libeay32.dll" somewhere in C:/Windows ? that is known to ruin one's day. you could try to verify that the correct dlls (from the LO installation's "program" directory) are loaded by running Dependency Walker from http://www.dependencywalker.com/ and looking at the _ssl.pyd file - that should display *errors* about ssleay32.dll and libeay32.dll and python35.dll not being found - if these libraries *are* found somewhere that's a problem. (when you run LO, they are found via %PATH%, which has a lower priority than system directories)
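As a complement to Dependency Walker, a short script can list every directory on %PATH% that contains one of the OpenSSL DLLs. This is a sketch under assumptions: the function name find_dlls_on_path is hypothetical, and it only inspects the directories it is given, so it does not reproduce the loader's real search order. System directories can be checked by passing them in the path argument.

```python
import os


def find_dlls_on_path(names, path=None):
    """Return (directory, filename) pairs for each PATH entry holding one of `names`.

    Matching is case-insensitive, as file names are on Windows.
    """
    wanted = {name.lower() for name in names}
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')
        if not directory or not os.path.isdir(directory):
            continue
        try:
            entries = os.listdir(directory)
        except OSError:
            continue  # unreadable directories are skipped
        for entry in entries:
            if entry.lower() in wanted:
                hits.append((directory, entry))
    return hits


if __name__ == "__main__":
    for directory, entry in find_dlls_on_path(["ssleay32.dll", "libeay32.dll"]):
        print(os.path.join(directory, entry))
```

Any hit outside the LO installation's "program" directory is a candidate for the copy that gets loaded by mistake.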
Tip from L. Godard: https://bugs.documentfoundation.org/show_bug.cgi?id=77354#c2
Created attachment 135759 Dependency Walker view of _ssl.pyd
I did not find any occurrences of "ssleay32.dll" or "libeay32.dll" in "C:/Windows". I can only find them in subfolders of "C:/Program Files" and "C:/Program Files (x86)" or "My Documents/Portable Apps/".

I tried Dependency Walker (though I confess that I don't really understand the detail of its operation). I used Dependency Walker to open file _ssl.pyd and the results are shown in the attached Calc document "DependencyWalker Bug 109241.ods".
so Dependency Walker found these dlls:

    c:\program files (x86)\intel\icls client\SSLEAY32.DLL
    c:\program files (x86)\intel\icls client\LIBEAY32.DLL

that is interesting... now the question is, does the soffice.bin process also find these libraries, or the ones bundled with LO; presumably the directory "c:\program files (x86)\intel\icls client" is contained in the $PATH variable?

... actually it looks like the soffice.exe PATH extension code was removed with commit 827430c8c0417396b3c1d2a049ccddb818c89646 (which removed "URE/bin") and earlier commit b786a33cfdca2e8a4114ddef0340e0e0628dd09c (which removed "program"), so the PATH is passed unchanged to soffice.bin

... there is this call in sal_detail_initialize() which is one of the first things main() does:

    p = GetProcAddress(h, "SetDllDirectoryW");
    if (p != nullptr) {
        reinterpret_cast< BOOL (WINAPI *)(LPCWSTR) >(p)(L"");
    }

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686203(v=vs.85).aspx claims that that will remove the current directory from the search order, but that contradicts experimental evidence on Windows 7; the effect is that the current directory is moved up in the search order.

the LO osl_loadModule functions will do 2 calls to load a library:

    h = LoadLibraryW(reinterpret_cast<LPCWSTR>(Module->buffer));
    if (h == nullptr)
        h = LoadLibraryExW(reinterpret_cast<LPCWSTR>(Module->buffer), nullptr, LOAD_WITH_ALTERED_SEARCH_PATH);

the first one uses the default search order, which is:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms682586(v=vs.85).aspx

"If SafeDllSearchMode is disabled, the search order is as follows:
1. The directory from which the application loaded.
2. The current directory.
3. The system directory. Use the GetSystemDirectory function to get the path of this directory.
4. The 16-bit system directory. There is no function that obtains the path of this directory, but it is searched.
5. The Windows directory. Use the GetWindowsDirectory function to get the path of this directory.
6. The directories that are listed in the PATH environment variable. Note that this does not include the per-application path specified by the App Paths registry key. The App Paths key is not used when computing the DLL search path."

however, when CPython loads a .pyd module, it does it differently:

workdir/UnpackedTarball/python3/Python/dynload_win.c

    /* We use LoadLibraryEx so Windows looks for dependent DLLs
       in directory of pathname first. */
    hDLL = LoadLibraryExW(wpathname, NULL, LOAD_WITH_ALTERED_SEARCH_PATH);

this doesn't use default search order but this one:

"If SafeDllSearchMode is disabled, the alternate search order is as follows:
1. The directory specified by lpFileName.
2. The current directory.
3. The system directory. Use the GetSystemDirectory function to get the path of this directory.
4. The 16-bit system directory. There is no function that obtains the path of this directory, but it is searched.
5. The Windows directory. Use the GetWindowsDirectory function to get the path of this directory.
6. The directories that are listed in the PATH environment variable. Note that this does not include the per-application path specified by the App Paths registry key. The App Paths key is not used when computing the DLL search path."

notably, step 1 is now the directory of _ssl.pyd, not the directory of soffice.exe or python.exe ("program"); at first glance it appears that the latter is not searched at all, but in that case it would never work...

aha, it works because soffice.exe forces the current working directory of soffice.bin to "program" via the 8th parameter of CreateProcess, so it hits step 2.

if i try this out i can see in Process Monitor that libeay32.dll is first searched in the "program/python-core-3.5.4/lib" directory, and then in "program", where it is found.

but: with all of this, it's still a mystery how the files in "c:\program files (x86)\intel\icls client" could be found via PATH, because the "program" directory comes before it in both search orders.
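The two quoted search orders differ only in step 1, which makes them easy to model. The sketch below is a simplified illustration and not the Windows loader: it assumes SafeDllSearchMode is disabled, treats the three system locations as a plain list, and ignores DLLs that are already loaded. The names search_order and resolve are hypothetical.

```python
import os


def search_order(first_dir, current_dir, system_dirs, path_dirs):
    """Model of the quoted search order with SafeDllSearchMode disabled.

    `first_dir` is the application directory for a plain LoadLibraryW call, or the
    directory of the module being loaded with LOAD_WITH_ALTERED_SEARCH_PATH.
    """
    return [first_dir, current_dir] + list(system_dirs) + list(path_dirs)


def resolve(dll_name, directories):
    """Return the first directory in `directories` that contains `dll_name`, or None."""
    wanted = dll_name.lower()
    for directory in directories:
        try:
            entries = os.listdir(directory)
        except OSError:
            continue  # missing or unreadable directories are skipped
        if any(entry.lower() == wanted for entry in entries):
            return directory
    return None
```

With first_dir set to the python-core "lib" directory and current_dir set to "program", resolve() returns "program" before any PATH entry is consulted. If the current directory is not searched at step 2, the first PATH entry holding the DLL is returned instead.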
maybe this will give some insight: download Process Monitor from: https://docs.microsoft.com/en-us/sysinternals/downloads/procmon

run it, enable tracing with the "magnifying glass" button, then run your "urllib" script, then disable tracing again.

this will log a huge number of events; the interesting ones are from the first one that mentions "_ssl.pyd" until the last one that mentions "libeay32.dll" (about 25 lines here). use Edit->Find to search, select them, use Edit->Copy and paste them into a text file, and attach it here.
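If the whole trace has been saved as text, the same selection can be made with a few lines of Python instead of Edit->Find. This is a sketch that assumes one event per line and makes no assumption about the exact Process Monitor export format; extract_window is a hypothetical helper name.

```python
def extract_window(lines, first_marker, last_marker):
    """Return the lines from the first one containing `first_marker` through the
    last one containing `last_marker` (case-insensitive), or [] if either is absent."""
    first_marker = first_marker.lower()
    last_marker = last_marker.lower()
    start = end = None
    for index, line in enumerate(lines):
        lowered = line.lower()
        if start is None and first_marker in lowered:
            start = index
        if last_marker in lowered:
            end = index
    if start is None or end is None or end < start:
        return []
    return lines[start:end + 1]
```

For example, extract_window(all_lines, "_ssl.pyd", "libeay32.dll") keeps only the events of interest, where all_lines is a list of the saved log lines.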
Stephan points out that i somehow missed this bit in the description:

"I tried the Safe Mode in LibreOffice 5 and the script works fine. Went back to normal mode and it failed again."

some highly creative speculation:

* you have an extension installed that bundles its own OpenSSL DLLs
* your OpenGL driver bundles its own OpenSSL DLLs

both of these would be disabled by "Safe Mode" and could lead to the OpenSSL DLLs already being loaded before the _ssl Python module is loaded, in which case Windows will not load LO's bundled OpenSSL DLLs.

using "Process Monitor" as described in comment #7 could clarify the situation.
I've downloaded Process Monitor and run some tests. I'll attach 3 files:

1. Demo_of_bug_109241.ods, which contains a macro that exhibits my reported problem
2. ProcMon_for_bug109241_normal mode.txt, which contains the ProcMon records for the standard version 5.3.4.2
3. ProcMon_for_bug109241_modified.txt, which contains the ProcMon records for 5.3.4.2 with the normal _ssl.pyd replaced by that from LibreOffice Portable 4.0.2.2

In each case, I had Demo_of_bug_109241.ods open, started the ProcMon capture, ran the macro and then stopped the ProcMon capture. I then extracted the ProcMon records from the first occurrence of _ssl.pyd to the last occurrence of LIBEAY32.DLL.

As previously reported, the standard version of 5.3.4.2 fails with the macro, but the modified version works OK.

From my limited understanding, it seems to me that the correct instance of _ssl.pyd is being used. However, I do wonder about LIBEAY32.DLL. In both ProcMon reports, there seem to be many references to LIBEAY32.DLL with "NAME NOT FOUND". Is this normal?

I tried getting ProcMon records for the same process when running LibreOffice in safe mode. The macro ran fine, but I could not see any reference to _ssl.pyd or LIBEAY32.DLL.

The only non-standard extension I have installed is this: https://extensions.libreoffice.org/extensions/apso-alternative-script-organizer-for-python
Created attachment 135882 Macro to demonstrate the problem See comment 9
Created attachment 135883 Procmon results See comment 9
Created attachment 135884 ProcMon results See comment 9
the important part from attachment https://bugs.documentfoundation.org/attachment.cgi?id=135883

    soffice.bin 7992 CreateFile C:\Program Files (x86)\LibreOffice 5\program\python-core-3.3.0\lib\LIBEAY32.dll NAME NOT FOUND
    soffice.bin 7992 CreateFile C:\Windows\SysWOW64\LIBEAY32.dll NAME NOT FOUND
    soffice.bin 7992 CreateFile C:\Windows\system\LIBEAY32.dll NAME NOT FOUND
    soffice.bin 7992 CreateFile C:\Windows\LIBEAY32.dll NAME NOT FOUND
    soffice.bin 7992 CreateFile C:\Program Files (x86)\LibreOffice 5\LIBEAY32.dll NAME NOT FOUND
    soffice.bin 7992 CreateFile C:\ProgramData\Oracle\Java\javapath\LIBEAY32.dll NAME NOT FOUND
    soffice.bin 7992 CreateFile C:\Program Files (x86)\PC Connectivity Solution\LIBEAY32.dll NAME NOT FOUND
    soffice.bin 7992 CreateFile C:\Perl64\site\bin\LIBEAY32.dll NAME NOT FOUND
    soffice.bin 7992 CreateFile C:\Perl64\bin\LIBEAY32.dll NAME NOT FOUND
    soffice.bin 7992 CreateFile C:\Program Files\Common Files\Microsoft Shared\Windows Live\LIBEAY32.dll NAME NOT FOUND
    soffice.bin 7992 CreateFile C:\Program Files (x86)\Common Files\microsoft shared\Windows Live\LIBEAY32.dll NAME NOT FOUND
    soffice.bin 7992 CreateFile C:\Program Files (x86)\Intel\iCLS Client\libeay32.dll SUCCESS

not sure how this entry got in there: "C:\Program Files (x86)\LibreOffice 5\LIBEAY32.dll" - it shouldn't be the current dir?

this is what you would expect if the call to SetDllDirectoryW() from sal_detail_initialize() worked as documented.

... which it does now, for me, too. no, really: last week as described in comment #6 i saw the current directory being searched before the windows directories (which weren't searched at all until i disabled that SetDllDirectory call), but today i see very similar results to what you see, and the "program" directory is searched at the end of the PATH. i haven't changed anything (other than pull the latest LO master) - no idea why this changed.

but, well, good to learn that windows at least sometimes behaves as documented :)
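A trace excerpt like the one in this comment can be summarized mechanically: which directories were tried without success, and which one finally supplied the DLL. The sketch below assumes the five-column shape seen in the excerpt (process, PID, operation, path, result); a real Process Monitor export may differ, and the names LOOKUP and summarize_lookups are hypothetical.

```python
import re

# Assumed line shape, taken from the excerpt in this comment:
#   <process> <pid> <operation> <path> <result>
LOOKUP = re.compile(
    r"^(?P<process>\S+)\s+(?P<pid>\d+)\s+(?P<operation>\S+)\s+"
    r"(?P<path>.+?)\s+(?P<result>NAME NOT FOUND|SUCCESS)\s*$"
)


def summarize_lookups(lines, dll_name):
    """Return (directories_missed, directory_found) for `dll_name` lookups.

    `directory_found` is None when no lookup succeeded.
    """
    suffix = "\\" + dll_name.lower()
    missed = []
    for line in lines:
        match = LOOKUP.match(line.strip())
        if match is None or not match.group("path").lower().endswith(suffix):
            continue
        directory = match.group("path")[: -len(suffix)]
        if match.group("result") == "SUCCESS":
            return missed, directory
        missed.append(directory)
    return missed, None
```

Applied to the lines in this comment, the directory found would be "C:\Program Files (x86)\Intel\iCLS Client", with the "program" directory absent from the list of directories tried.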
okay i think i've fixed this on master, at least as far as $PATH is concerned. you can still screw it up by putting OpenSSL in C:/Windows/System/ and such but hopefully nobody does that.
Michael Stahl committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=9990e98d67bf14003cde8f0138d2dcfa804406ac tdf#109241 desktop: Win32: prepend "program" dir to $PATH It will be available in 6.0.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
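The idea named in the commit subject, putting the "program" directory ahead of everything else in $PATH, can be illustrated in a few lines. This is not the committed code, which lives in LibreOffice's C++ desktop launcher; it is a Python sketch of the idea only. The name prepend_to_path is hypothetical, and dropping a later duplicate entry is a choice made here, not something the commit is known to do.

```python
import os


def prepend_to_path(environ, directory):
    """Return a copy of `environ` with `directory` as the first PATH entry.

    A later occurrence of `directory` in PATH is dropped so it is not listed twice
    (an assumption of this sketch). The key is assumed to be spelled "PATH".
    """
    env = dict(environ)
    entries = [e for e in env.get("PATH", "").split(os.pathsep) if e]
    normalized = os.path.normcase(os.path.normpath(directory))
    entries = [
        e for e in entries
        if os.path.normcase(os.path.normpath(e)) != normalized
    ]
    env["PATH"] = os.pathsep.join([directory] + entries)
    return env
```

With "program" first in PATH, a lookup that falls through to the PATH step reaches LO's bundled OpenSSL DLLs before the copies in other directories such as the Intel one.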
(In reply to Michael Stahl from comment #14)
> you can still screw it up by putting OpenSSL in C:/Windows/System/ and such
> but hopefully nobody does that.

There is in fact another bugzilla where exactly that happens. Some other idiot program installs its SSL DLLs into the System folder.
Michael Stahl committed a patch related to this issue. It has been pushed to "libreoffice-5-3": http://cgit.freedesktop.org/libreoffice/core/commit/?id=4ce1f36e6f4fd7ea923cf2ae81895f6e45919ba6&h=libreoffice-5-3 tdf#109241 desktop: Win32: prepend "program" dir to $PATH It will be available in 5.3.7.
Michael Stahl committed a patch related to this issue. It has been pushed to "libreoffice-5-4": http://cgit.freedesktop.org/libreoffice/core/commit/?id=e510fbc21f6dec877cda04e17f1433f09fa00066&h=libreoffice-5-4 tdf#109241 desktop: Win32: prepend "program" dir to $PATH It will be available in 5.4.2.
*** Bug 108316 has been marked as a duplicate of this bug. ***