Bug 127648 - LO on Linux crashes when accessing opend/locked File on SAMBA network share
Summary: LO on Linux crashes when accessing opend/locked File on SAMBA network share
Status: RESOLVED NOTOURBUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice (show other bugs)
Version:
(earliest affected)
5.1 all versions
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: wantBacktrace
Depends on:
Blocks: Network
  Show dependency treegraph
 
Reported: 2019-09-19 21:59 UTC by Jan
Modified: 2020-05-31 22:24 UTC (History)
8 users (show)

See Also:
Crash report or crash signature:
Regression By:


Attachments
libreoffice 6.3.1~rc2-0ubuntu0.19.04.1~lo1 crashes immediatly after accessing a locked file (73.10 KB, image/png)
2019-09-19 22:04 UTC, Jan
Details
gdbtrace-6.0.7-0ubuntu0.18.04.9.log (17.75 KB, text/plain)
2019-09-20 17:34 UTC, Jan
Details
backtrace 6.4.0 alpha 2019-09-20_10.11.28 (146.83 KB, text/plain)
2019-09-20 21:21 UTC, Jan
Details
this shows LO on Linux when file is locked by LO in Windows (37.35 KB, image/png)
2019-09-21 14:45 UTC, Jan
Details
debug build backtrace (27.03 KB, text/x-log)
2019-10-31 22:12 UTC, Jan
Details
gdb output of nBytesToRead and nrc (998 bytes, text/plain)
2019-11-01 15:30 UTC, Jan
Details
strace with debug build (7.36 MB, text/x-log)
2019-11-01 19:26 UTC, Jan
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jan 2019-09-19 21:59:46 UTC
Description:
after crash, LO restarts and all open documents have to be recovered if anything already opened.

Steps to Reproduce:
1.setup SMB Server smb v1 or v2 or v3 with oplocking and smb2leases (tested against Samba Version 4.4.16 synology)
2. Mount shared folder for 2 users eg.
sudo mount -t cifs -o "vers=3.1.1,uid=1000,gid=1000,username=test,file_mode=0600,dir_mode=0700" //NASIP/SHAREDFOLDER         /mnt

2. User1 starts to edit file on shared Folder 
3. at the same time User2 opens same file

Actual Results:
User2 LO crashes, "loosing" all open documents.

Expected Results:
User2 should get warning message, when trying to access a locked file on server.


Reproducible: Always


User Profile Reset: Yes



Additional Info:
clientside WORKAROUND: Set all LibreOffice clients to UseDocumentSystemFileLocking=false
Comment 1 Jan 2019-09-19 22:04:18 UTC
Created attachment 154302 [details]
libreoffice 6.3.1~rc2-0ubuntu0.19.04.1~lo1 crashes immediatly after accessing a locked file
Comment 2 Julien Nabet 2019-09-20 07:22:08 UTC
Could you provide LO version? Indeed, you put 5.1 whereas the screenshot shows 6.3
Is it 6.3.0, 6.3.1?

Also, if you installed last LO version (6.3.1), would it be possible you provide a stacktrace? (see https://wiki.documentfoundation.org/QA/BugReport/Debug_Information#GNU.2FLinux:_How_to_get_a_backtrace)
Comment 3 Jan 2019-09-20 17:34:51 UTC
Created attachment 154332 [details]
gdbtrace-6.0.7-0ubuntu0.18.04.9.log

I followed this guide to get all ubuntu debug symbols for Build-ID: 1:6.0.7-0ubuntu0.18.04.9:
https://wiki.ubuntu.com/Debug%20Symbol%20Packages
sudo apt install libreoffice-*-dbgsym

Then i opened ODS file by 2nd user
soffice --backtrace /mnt/locked\ for\ crash.ods
Comment 4 Julien Nabet 2019-09-20 19:49:03 UTC
Would it be possible you upgrade to 6.3.1 by using LO ppa and provide a backtrace with this version?
Indeed, investigating on an obsolete version is not very relevant.
Comment 5 Jan 2019-09-20 21:21:46 UTC
Created attachment 154334 [details]
backtrace 6.4.0 alpha 2019-09-20_10.11.28

could not firgure out how to get ppa:libreoffice debug symbols.
I hope this helps.
Comment 6 Julien Nabet 2019-09-20 22:24:23 UTC
Thank you for your feedback.

Even if there are not argument values passed to the functions, your bt helps.

I noticed this function:
    117 sal_Int32 SAL_CALL
    118 XInputStream_impl::readBytes(
    119                  uno::Sequence< sal_Int8 >& aData,
    120                  sal_Int32 nBytesToRead )
    121 {
    122     if( ! m_nIsOpen ) throw io::IOException( THROW_WHERE );
    123 
    124     aData.realloc(nBytesToRead);
    125         //TODO! translate memory exhaustion (if it were detectable...) into
    126         // io::BufferSizeExceededException
    127 
    128     sal_uInt64 nrc(0);
    129     if(m_aFile.read( aData.getArray(),sal_uInt64(nBytesToRead),nrc )
    130        != osl::FileBase::E_None)
    131         throw io::IOException( THROW_WHERE );
    132 
    133     // Shrink aData in case we read less than nBytesToRead (XInputStream
    134     // documentation does not tell whether this is required, and I do not know
    135     // if any code relies on this, so be conservative---SB):
    136     if (sal::static_int_cast<sal_Int32>(nrc) != nBytesToRead)
    137         aData.realloc(sal_Int32(nrc));
    138     return static_cast<sal_Int32>(nrc);
    139 }

If nBytesToRead > max sal_Int32, we may get a negative value for nrc.

Noel/Stephan: considering git history of ucb/source/ucp/file/filinpstr.cxx, thought you might be interested in this one.

It may be a dup of tdf#113099
Comment 7 Jan 2019-09-21 14:45:47 UTC
Created attachment 154344 [details]
this shows LO on Linux when file is locked by LO in Windows

Thanks a lot, it seems my samba blocks access to LO-documents when they are opened/locked by LO on Linux. Access to this Document is denied and not even readable by other programs like commandline cp.

When i open/lock the same file on the same SAMBA server by LO on windows, the file is read only and 2nd LO on Linux client works as expected without crash.(see screenshot) 

Locking procedure on Linux should be checked ? 
Anyway LO on Linux should not crash even when access is denied.

Summary:
open/lock file by LO on Linux
  -> SAMBA denies access
     -> 2nd LO on linux crashs
     -> 2nd LO on Windows does not crash but says "read error"
open/lock file by LO on Windows
  -> SAMBA allows read only access
     -> 2nd LO on linux shows correct dialog (see png screenshot)
     -> 2nd LO on Windows shows correct dialog
Comment 8 Stephan Bergmann 2019-09-23 09:54:28 UTC
(In reply to Julien Nabet from comment #6)
> I noticed this function:

(in ucb/source/ucp/file/filinpstr.cxx)

>     117 sal_Int32 SAL_CALL
>     118 XInputStream_impl::readBytes(
>     119                  uno::Sequence< sal_Int8 >& aData,
>     120                  sal_Int32 nBytesToRead )
>     121 {
>     122     if( ! m_nIsOpen ) throw io::IOException( THROW_WHERE );
>     123 
>     124     aData.realloc(nBytesToRead);
>     125         //TODO! translate memory exhaustion (if it were
> detectable...) into
>     126         // io::BufferSizeExceededException
>     127 
>     128     sal_uInt64 nrc(0);
>     129     if(m_aFile.read( aData.getArray(),sal_uInt64(nBytesToRead),nrc )

The data provided so far in this issue seems to imply that m_aFile.read unexpectedly returned nrc > nBytesToRead (and large enough to overflow to a negative value with the below cast to sal_Int32).  (XInputStream_impl::readBytes being called with a negative nBytesToRead, which could presumably also lead to trouble, is ruled out by the fact that the above aData.realloc(nBytesToRead) didn't fire the "### new size must be at least 0!" assert, which only the below aData.realloc(sal_Int32(nrc)); fires.)

>     130        != osl::FileBase::E_None)
>     131         throw io::IOException( THROW_WHERE );
>     132 
>     133     // Shrink aData in case we read less than nBytesToRead
> (XInputStream
>     134     // documentation does not tell whether this is required, and I
> do not know
>     135     // if any code relies on this, so be conservative---SB):
>     136     if (sal::static_int_cast<sal_Int32>(nrc) != nBytesToRead)
>     137         aData.realloc(sal_Int32(nrc));
>     138     return static_cast<sal_Int32>(nrc);
>     139 }
> 
> If nBytesToRead > max sal_Int32, we may get a negative value for nrc.
Comment 9 Stephan Bergmann 2019-09-25 06:32:54 UTC
Unfortunately, I cannot reproduce this issue.  If somebody who can reproduce it has a debug build, it would be interesting to learn the values of nBytesToRead and nrc in the crashing call to XInputStream_impl::readBytes (in ucb/source/ucp/file/filinpstr.cxx).
Comment 10 Pasaiako Udala 2019-10-30 11:10:03 UTC
(In reply to Stephan Bergmann from comment #9)
> Unfortunately, I cannot reproduce this issue.  If somebody who can reproduce
> it has a debug build, it would be interesting to learn the values of
> nBytesToRead and nrc in the crashing call to XInputStream_impl::readBytes
> (in ucb/source/ucp/file/filinpstr.cxx).

we are having this issue all the time. How can we send you the info you need? what can we do?
Comment 11 Stephan Bergmann 2019-10-30 12:05:13 UTC
(In reply to Pasaiako Udala from comment #10)
> we are having this issue all the time. How can we send you the info you
> need? what can we do?

Do a build of LO (see <https://www.libreoffice.org/about-us/source-code/>) configured with --enable-debug, then run the built LO in the debugger (`gdb instdir/program/soffice.bin`) and provide the requested information.
Comment 12 Jan 2019-10-31 22:12:04 UTC
Created attachment 155436 [details]
debug build backtrace
Comment 13 Jan 2019-10-31 22:15:24 UTC
Comment on attachment 155436 [details]
debug build backtrace

i managed to compile a debug build, any new info in this backtrace?
How to get the values of nBytesToRead and nrc in the crashing call?
Comment 14 Stephan Bergmann 2019-10-31 22:55:45 UTC
(In reply to Jan from comment #13)
> Comment on attachment 155436 [details]
> debug build backtrace
> 
> i managed to compile a debug build, any new info in this backtrace?
> How to get the values of nBytesToRead and nrc in the crashing call?

Do `thread 1` and `frame 6` to get to the fileaccess::XInputStream_impl::readBytes call, then `print nBytesToRead` and `print nrc` to get the values.
Comment 15 Jan 2019-11-01 15:30:24 UTC
Created attachment 155452 [details]
gdb output of nBytesToRead and nrc
Comment 16 Stephan Bergmann 2019-11-01 17:44:24 UTC
(In reply to Jan from comment #15)
> Created attachment 155452 [details]
> gdb output of nBytesToRead and nrc

  (gdb) print nrc
  $2 = 4294967283

is clearly bogus.  Maybe an strace log helps pinpoint what exactly is going wrong (otherwise, I'll need to prepare a patch that logs the relevant places in the code, and which you would need to include in your LO build):  With strace installed on your machine, please run

  instdir/program/soffice --strace

and reproduce the issue.  This should leave a (large) strace.log file in the current working dir, which you please attach here.
Comment 17 Jan 2019-11-01 19:26:20 UTC
Created attachment 155456 [details]
strace with debug build
Comment 18 Stephan Bergmann 2019-11-02 10:22:18 UTC
(In reply to Jan from comment #17)
> Created attachment 155456 [details]
> strace with debug build

So that's a kernel bug:

[...]
> 2085  20:13:46.936734 openat(AT_FDCWD, "/mnt/locktest.odt", O_RDONLY) = 22
> 2085  20:13:46.938557 fstat(22, {st_mode=S_IFREG|0600, st_size=8305, ...}) = 0
[...]
> 2085  20:13:46.942594 pread64(22, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4294967283
[...]

So /mnt/locktest.odt is reported to have a size of 8305 bytes, and when we want to read 4096 bytes from the start, pread64 claims it read 4294967283 bytes.  Whatever filesystem is used by that mount point appears to have a bug (smells like an EACESS = 13 is internally reported as -13 as a 32 bit value (i.e., 4294967283 = 0xFFFFFFF3) and then erroneously interpreted as a positive 64-bit value, and propagated out from the kernel as that).
Comment 19 Jan 2019-11-02 11:15:20 UTC
(In reply to Stephan Bergmann from comment #18)
> (In reply to Jan from comment #17)
> > Created attachment 155456 [details]
> > strace with debug build
> 
> So that's a kernel bug:

Great investigation !
All right, so you mean it is a client side kernel bug of cifs kernel module ?
Or Server side bug samba4 / btrfs?
Comment 20 Mike Kaganski 2019-11-02 11:27:44 UTC
Worth reporting to the problematic project (which?).

Does that value reach LO code? In which case, it could be worth a workaround checking if the reported read size is greater than asked size?
Comment 22 Stephan Bergmann 2019-11-02 17:37:47 UTC
(In reply to Jan from comment #19)
> All right, so you mean it is a client side kernel bug of cifs kernel module ?
> Or Server side bug samba4 / btrfs?

Hard to tell.  I would start by reporting it to the relevant client-side kernel part.

(In reply to Mike Kaganski from comment #20)
> Does that value reach LO code? In which case, it could be worth a workaround
> checking if the reported read size is greater than asked size?

I would only want to do that as a very last resort, in case the the assumed bug outside of LO can't be fixed for whatever reason.  Why not wait for an analysis of the assumed kernel bug first?

(Jan, I assume you pass this on as appropriate?  In which case, thank you, and would be great if you could add a link here or report back any findings.)
Comment 23 Roman Kuznetsov 2019-11-03 07:15:04 UTC
someone already sent it into linux kernel bugzilla?
Comment 24 Jan 2019-11-03 13:00:22 UTC
(In reply to Stephan Bergmann from comment #22)
> (Jan, I assume you pass this on as appropriate?  In which case, thank you,
> and would be great if you could add a link here or report back any findings.)
Did not report to bugzilla.kernel.org yet. I am building a testbed with own samba for logfiles...

I noticed -1 EACCES (Permission denied) in my strace.
Meanwhile maybe its possible to tell Libreoffice for Linux to lock files as read only, just like LibreOffice for Windows does?
Comment 25 Pasaiako Udala 2019-11-08 07:08:14 UTC
Thanks Jan for your work, can you post the bugzilla url to track it when you report it? We are very interested to know when this bug will be fixed
Comment 26 Sebastian 2020-03-05 16:48:46 UTC
I am wondering if anyone reported this to bugzilla.kernel.org or one of the developers directly yet? Couldn't find anything related on bugzilla.
Comment 27 Theofilos Intzoglou 2020-03-29 12:03:48 UTC
I created a sample c program to test if the samba server or client reports something wrong when a file is open:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

void printerr(void) {
        printf("%s\n", strerror(errno));
        exit(errno);
}

int main(void) {
    int fd;
    struct stat statbuf;
    int res;
    char buf[4096];

    fd = openat(AT_FDCWD, "/mnt/disk/cv.doc", O_RDONLY);
    if (fd == -1)
        printerr();

    res = fstat(fd, &statbuf);
    if (res == -1)
        printerr();
    printf("st_mode=%d\nst_size=%ld\n", statbuf.st_mode, statbuf.st_size);
    res = pread64(fd, &buf, 4096, 0);
    printf("%d\n", res);
    close(fd);
    exit(0);
}

I have created a simple public samba share with just the cv.doc file and mounted the share in /mnt/disk with the following command:

mount -t cifs //pc/public /mnt/disk -o guest,rw,_netdev,uid=user,vers=1.0

If libreoffice is closed and I run the above program, I get:
st_mode=33261
st_size=11398
4096

where 4096 is the number of bytes read. If I open cv.doc with libreoffice and rerun the program I get:
st_mode=33261
st_size=11398
-13

The error is reported successfully as it seems. nrc is an unsigned int as it is declared as an sal_uInt64 so the 4294967283 value is justified. Is it really not a bug of libreoffice? Unfortunately I cannot fully understand the libreoffice code to detect the problem :-(
Comment 28 Julien Nabet 2020-03-29 12:39:35 UTC
(In reply to Theofilos Intzoglou from comment #27)
> ...
> The error is reported successfully as it seems. nrc is an unsigned int as it
> is declared as an sal_uInt64 so the 4294967283 value is justified. Is it
> really not a bug of libreoffice? Unfortunately I cannot fully understand the
> libreoffice code to detect the problem :-(

Please read from https://bugs.documentfoundation.org/show_bug.cgi?id=127648#c16 to https://bugs.documentfoundation.org/show_bug.cgi?id=127648#c18
For even more details, read the whole bugtracker. A lot of debug tests have been made.
Comment 29 Stephan Bergmann 2020-03-30 09:40:07 UTC
(In reply to Theofilos Intzoglou from comment #27)
> int main(void) {
>     int fd;
>     struct stat statbuf;
>     int res;
>     char buf[4096];
> 
>     fd = openat(AT_FDCWD, "/mnt/disk/cv.doc", O_RDONLY);
>     if (fd == -1)
>         printerr();
> 
>     res = fstat(fd, &statbuf);
>     if (res == -1)
>         printerr();
>     printf("st_mode=%d\nst_size=%ld\n", statbuf.st_mode, statbuf.st_size);
>     res = pread64(fd, &buf, 4096, 0);
>     printf("%d\n", res);
>     close(fd);
>     exit(0);
> }

The return type of pread64 is ssize_t, not int.
Comment 30 Theofilos Intzoglou 2020-03-30 10:15:29 UTC
(In reply to Stephan Bergmann from comment #29)
> (In reply to Theofilos Intzoglou from comment #27)
> > int main(void) {
> >     int fd;
> >     struct stat statbuf;
> >     int res;
> >     char buf[4096];
> > 
> >     fd = openat(AT_FDCWD, "/mnt/disk/cv.doc", O_RDONLY);
> >     if (fd == -1)
> >         printerr();
> > 
> >     res = fstat(fd, &statbuf);
> >     if (res == -1)
> >         printerr();
> >     printf("st_mode=%d\nst_size=%ld\n", statbuf.st_mode, statbuf.st_size);
> >     res = pread64(fd, &buf, 4096, 0);
> >     printf("%d\n", res);
> >     close(fd);
> >     exit(0);
> > }
> 
> The return type of pread64 is ssize_t, not int.

Indeed but it gives the same result. Anyway I think that this bug is a big deal for many offices as it renders libreoffice nearly unusable if you work with multiple files opened at once. One mistake and libreoffice crashes leaving some of its processes open which makes it difficult for people without much experience with dealing with processes in a state that libreoffice can't be started again. I am not the one to decide but if a simple check could solve this problem in libreoffice while we wait for the kernel module to be fixed then it would be much appreciated. Thank you for your time spend investigating this.
Comment 31 Stephan Bergmann 2020-03-30 10:24:00 UTC
(In reply to Theofilos Intzoglou from comment #30)
> (In reply to Stephan Bergmann from comment #29)
> > The return type of pread64 is ssize_t, not int.
> 
> Indeed but it gives the same result.

No, it does not, in general.

> Anyway I think that this bug is a big
> deal for many offices as it renders libreoffice nearly unusable if you work
> with multiple files opened at once.

If you are affected by this issue, please get in touch with the maintainers of whatever software component is causing the error for you.
Comment 32 Mike Kaganski 2020-03-30 10:25:09 UTC
(In reply to Theofilos Intzoglou from comment #30)
> (In reply to Stephan Bergmann from comment #29)
> > (In reply to Theofilos Intzoglou from comment #27)
> > > int main(void) {
> > >     int fd;
> > >     struct stat statbuf;
> > >     int res;
> > >     char buf[4096];
> > > 
> > >     fd = openat(AT_FDCWD, "/mnt/disk/cv.doc", O_RDONLY);
> > >     if (fd == -1)
> > >         printerr();
> > > 
> > >     res = fstat(fd, &statbuf);
> > >     if (res == -1)
> > >         printerr();
> > >     printf("st_mode=%d\nst_size=%ld\n", statbuf.st_mode, statbuf.st_size);
> > >     res = pread64(fd, &buf, 4096, 0);
> > >     printf("%d\n", res);
> > >     close(fd);
> > >     exit(0);
> > > }
> > 
> > The return type of pread64 is ssize_t, not int.
> 
> Indeed but it gives the same result.

Indeed, because even if you declare your res to be ssize_t, if you print it using `printf("%d\n", res)`, you convert the ssize_t to int again here, since "%d" (without "l" length modifier) treats the value as int [1]. The same wrong handling of integer types seems to be the reason of this problem in the kernel.

[1] https://linux.die.net/man/3/printf
Comment 33 Mike Kaganski 2020-03-30 10:33:53 UTC
FTR: https://gerrit.libreoffice.org/c/core/+/81931 is a patch to workaround this, which was abandoned because of course this should be fixed in the kernel.
Comment 34 chuck.pobanz 2020-05-31 22:24:09 UTC
Might this kernel bug be related?

bugzilla.kernel.org/show_bug.cgi?id=201893