Bug 55425 - PDF import: support encryption algorithm value 4 (AES)
Summary: PDF import: support encryption algorithm value 4 (AES)
Status: ASSIGNED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 4.0.0.0.alpha0+ Master
Hardware: All All
Importance: medium enhancement
Assignee: Dave Gilbert
URL:
Whiteboard: target:25.8.0
Keywords: filter:pdf
Duplicates: 114840 142312 143472 146187 149621 155042
Depends on:
Blocks: PDF-Import-Draw Password-Protected
Reported: 2012-09-28 15:06 UTC by Stephan Bergmann
Modified: 2025-03-17 00:49 UTC
CC List: 14 users

See Also:
Crash report or crash signature:


Attachments
incomplete feature branch bundle (7.07 KB, application/octet-stream)
2024-07-05 21:40 UTC, Michael Warner
Patch file created from changes provided by Michael (7.07 KB, patch)
2024-07-08 07:22 UTC, Hossein
Patch file created from changes provided by Michael (12.95 KB, patch)
2024-07-08 07:24 UTC, Hossein
Patch files created from changes provided by Michael (16.59 KB, application/zip)
2024-07-08 07:46 UTC, Hossein

Description Stephan Bergmann 2012-09-28 15:06:20 UTC
For example, the .pdf file attached to <https://bugzilla.redhat.com/show_bug.cgi?id=826526> "cannot import pdf 1-5 format with encrypted sections in otherwise unprotected document" (as <https://bugzilla.redhat.com/attachment.cgi?id=587716>) contains an Encrypt dictionary of

10747 0 obj
<< /Length 128
   /CF << /StdCF << /Length 16
                    /AuthEvent /DocOpen
                    /CFM /AESV2 >> >>
   /Filter /Standard
   /O (...binary...)
   /P -1052
   /R 4
   /U (...binary...)
   /V 4
   /StrF /StdCF
   /StmF /StdCF >>
endobj

whose V entry of 4 specifies an encryption/decryption algorithm that makes use of the CF, StmF, and StrF entries.  This value was introduced with PDF 1.5 (for reference, see Table 20 "Entries common to all encryption dictionaries" in section 7.6.1 of <http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf> "Document management — Portable document format — Part 1: PDF 1.7").
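To make the role of V, CF, StmF, and StrF concrete, here is a minimal C++ sketch of how an importer might pick the cipher for streams and strings, following Table 20 and the crypt filter rules in section 7.6 of the same specification. The `EncryptDict` struct and the `cipherFor` function are hypothetical illustrations, not the structures used in pdfentries.cxx. Only the cases relevant to this bug are handled.

```cpp
#include <map>
#include <optional>
#include <string>

// Ciphers the standard security handler can select for V 1, 2 and 4.
enum class Cipher { Identity, RC4, AES128 };

// Hypothetical, simplified view of a parsed Encrypt dictionary.
struct EncryptDict
{
    int V = 0;                              // algorithm value (/V)
    std::map<std::string, std::string> cf;  // crypt filter name -> its /CFM value
    std::string stmF = "Identity";          // /StmF (spec default)
    std::string strF = "Identity";          // /StrF (spec default)
};

// Returns the cipher used with the crypt filter named `filterName`
// (normally d.stmF or d.strF), or std::nullopt if it is unsupported here.
std::optional<Cipher> cipherFor(const EncryptDict& d, const std::string& filterName)
{
    if (d.V == 1 || d.V == 2)
        return Cipher::RC4;                 // pre-PDF 1.5 algorithms: RC4 only
    if (d.V != 4)
        return std::nullopt;                // V 0, 3 and 5 are out of scope

    if (filterName == "Identity")
        return Cipher::Identity;            // predefined filter: data passed through unchanged
    auto it = d.cf.find(filterName);
    if (it == d.cf.end())
        return std::nullopt;                // filter name not defined in /CF
    if (it->second == "V2")
        return Cipher::RC4;
    if (it->second == "AESV2")
        return Cipher::AES128;              // AES-128 in CBC mode
    return std::nullopt;                    // /None, /AESV3, ... not handled here
}
```

For the dictionary above, `cf` would be `{"StdCF": "AESV2"}`, with StmF and StrF both set to `StdCF`. Both streams and strings would therefore map to AES-128, which an RC4-only code path cannot decrypt.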

But our import code currently only supports older algorithm values 1 and 2 (cf. "m_pData->m_nAlgoVersion > 2" in PDFFile::setupDecryptionData at <http://cgit.freedesktop.org/libreoffice/core/tree/sdext/source/pdfimport/pdfparse/pdfentries.cxx?id=eecaca80bdcf9060a5dd06a835a2c1752b4fec01#n1235>).  The resulting effect is that LO keeps asking for a password to open the document (bAuthenticated can never become true in the loop in checkEncryption at <http://cgit.freedesktop.org/libreoffice/core/tree/sdext/source/pdfimport/wrapper/wrapper.cxx?id=eecaca80bdcf9060a5dd06a835a2c1752b4fec01#n944>).
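Relaxing the version check alone would not be enough. With /CFM /AESV2, each object's key is derived slightly differently, and the encrypted data is laid out differently than with RC4. Per Algorithm 1 of the same specification:

- The n-byte file key is extended with the low-order 3 bytes of the object number and the low-order 2 bytes of the generation number, low-order byte first.
- For AES, the 4 bytes "sAlT" are appended as well.
- The result is hashed with MD5, and the first min(n + 5, 16) bytes of the digest form the object key.
- For AES, the first 16 bytes of each encrypted string or stream are the CBC initialization vector.

The C++ sketch below shows only the byte layout. The helper names are hypothetical, and the MD5 and AES primitives are assumed to come from an existing library, so they are omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Builds the MD5 input for one object's key (Algorithm 1, key extension step).
Bytes objectKeyHashInput(const Bytes& fileKey, std::uint32_t objNum,
                         std::uint16_t genNum, bool aes)
{
    Bytes in(fileKey);
    // Low-order 3 bytes of the object number, low-order byte first.
    in.push_back(static_cast<std::uint8_t>(objNum & 0xFF));
    in.push_back(static_cast<std::uint8_t>((objNum >> 8) & 0xFF));
    in.push_back(static_cast<std::uint8_t>((objNum >> 16) & 0xFF));
    // Low-order 2 bytes of the generation number, low-order byte first.
    in.push_back(static_cast<std::uint8_t>(genNum & 0xFF));
    in.push_back(static_cast<std::uint8_t>((genNum >> 8) & 0xFF));
    // AES only: append "sAlT" (0x73 0x41 0x6C 0x54).
    if (aes)
        in.insert(in.end(), {0x73, 0x41, 0x6C, 0x54});
    return in;
}

// Number of MD5 output bytes used as the object key: n + 5, at most 16.
std::size_t objectKeyLength(std::size_t fileKeyLength)
{
    return std::min<std::size_t>(fileKeyLength + 5, 16);
}

// For AESV2 data: splits off the leading 16-byte CBC initialization vector.
// Returns false if the data is too short to contain one.
bool splitAesIv(const Bytes& data, Bytes& iv, Bytes& cipherText)
{
    if (data.size() < 16)
        return false;
    iv.assign(data.begin(), data.begin() + 16);
    cipherText.assign(data.begin() + 16, data.end());
    return true;
}
```

For the sample file, /Length 128 gives n = 16, so the object key is the full 16-byte MD5 digest.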

So, it would be nice if we also supported algorithm value 4.
Comment 1 Buovjaga 2014-11-08 15:02:32 UTC
I confirm that it can't be imported. Sounds like a good enhancement.

Win 7 64-bit Version: 4.4.0.0.alpha2+
Build ID: c989f5e0e11e295b11ffc921b0d105869e037e47
TinderBox: Win-x86@42, Branch:master, Time: 2014-11-07_22:50:48
Comment 2 Kevin Suo 2021-11-22 11:23:41 UTC Comment hidden (obsolete)
Comment 3 Michael Warner 2021-11-23 14:26:11 UTC
(In reply to Kevin Suo from comment #2)
> back to new as the patch was abandoned due to license issue.

Patch referred to here was this:
https://gerrit.libreoffice.org/c/core/+/124909
Comment 4 Kevin Suo 2021-11-23 15:29:11 UTC
A better approach is to add a "--mode checkEncryption" option to the out-of-process xpdfimport binary in
https://opengrok.libreoffice.org/xref/core/sdext/source/pdfimport/xpdfwrapper/wrapper_gpl.cxx?r=648e4106

It should be called like this:
./xpdfimport --mode checkEncryption filename
./xpdfimport --mode checkEncryption -upw 123456 filename

It should return 0 if the file is not encrypted, or if the file is encrypted and the password given with -upw is correct.

It should exit with 1 (or another nonzero error code), or print a message, if the file is encrypted and no password or a wrong password is provided.


Then call this mode to replace the encryption check in
https://opengrok.libreoffice.org/xref/core/sdext/source/pdfimport/wrapper/wrapper.cxx?r=8b9e5024#1021

Here, we first call xpdfimport in checkEncryption mode without a password. If the process returns 0, we go on with the following osl_executeProcess_WithRedirectedIO process call in normal mode; otherwise we call getPassword as shown in
https://opengrok.libreoffice.org/xref/core/sdext/source/pdfimport/wrapper/wrapper.cxx?r=8b9e5024#928
and repeat until checkEncryption mode returns 0 or the user cancels the password input.
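The proposed control flow on the wrapper side could look roughly like the following C++ sketch. `RunCheck` stands for launching xpdfimport in the proposed checkEncryption mode and returning its exit code. `AskPassword` stands for the existing getPassword dialog. Both are hypothetical callbacks, and how the password reaches the child process is deliberately left open.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical callbacks, not existing LibreOffice functions:
//  RunCheck:    runs xpdfimport in the proposed checkEncryption mode with an
//               optional password and returns the process exit code
//               (0 = not encrypted, or encrypted and the password is correct).
//  AskPassword: asks the user for a password; std::nullopt means "cancelled".
using RunCheck = std::function<int(const std::optional<std::string>&)>;
using AskPassword = std::function<std::optional<std::string>()>;

struct Negotiation
{
    bool ok = false;                      // false if the user cancelled
    std::optional<std::string> password;  // password for the normal import run, if any
};

Negotiation negotiatePassword(const RunCheck& runCheck, const AskPassword& askPassword)
{
    // First try without a password: unencrypted files succeed immediately.
    if (runCheck(std::nullopt) == 0)
        return {true, std::nullopt};

    // Otherwise keep asking until the check passes or the user cancels.
    for (;;)
    {
        std::optional<std::string> pw = askPassword();
        if (!pw)
            return {false, std::nullopt};
        if (runCheck(pw) == 0)
            return {true, pw};
    }
}
```

If `ok` is true, the caller would continue with the normal-mode process call, passing `password` if one was needed.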
Comment 5 Kevin Suo 2021-11-23 15:31:04 UTC Comment hidden (obsolete)
Comment 6 Michael Warner 2021-11-24 15:23:28 UTC
I recommend against providing the password as a command-line argument, because other processes can access it that way. For a demonstration of this you can run either Linux top or Windows Task Manager and show the Command Line column.
Comment 7 Kevin Suo 2021-11-24 15:30:40 UTC
Then we need to provide the password via the stdin (pIn) of the process. I tried but failed: xpdfimport hangs there because pIn is true while no stdin input is provided at the beginning. Maybe someone else can give it a try...
Comment 8 Michael Warner 2021-11-30 03:49:58 UTC Comment hidden (obsolete)
Comment 9 Michael Warner 2022-01-07 14:46:34 UTC Comment hidden (obsolete)
Comment 10 Michael Warner 2022-05-11 13:05:42 UTC
I continue to spend a few minutes on it here and there as I get time, but I admit progress on this has been slow. I have many other interests and obligations in life and LO just isn't at the top of the list. If someone else wants to take this on, I'm fine with uploading what I have for them to use. Otherwise, I will continue grinding away, on my own schedule.
Comment 11 Timur 2022-07-06 10:10:48 UTC
*** Bug 146187 has been marked as a duplicate of this bug. ***
Comment 12 Timur 2022-07-06 10:44:18 UTC
*** Bug 114840 has been marked as a duplicate of this bug. ***
Comment 13 Timur 2022-07-06 12:16:59 UTC
*** Bug 142312 has been marked as a duplicate of this bug. ***
Comment 14 Timur 2022-07-06 12:19:02 UTC
*** Bug 143472 has been marked as a duplicate of this bug. ***
Comment 15 Michael Warner 2023-01-03 04:55:42 UTC
Just as a status update on this, I have some initial code written, but I am still testing and debugging it. I also will need to write some regression tests for it, once I have it working interactively.
Comment 16 Timur 2023-11-24 15:55:11 UTC
*** Bug 155042 has been marked as a duplicate of this bug. ***
Comment 17 Michael Warner 2024-07-05 21:40:19 UTC
Created attachment 195131 [details]
incomplete feature branch bundle

Bundle of my feature branch rebased on master. It doesn't work; I didn't even try to build it after rebasing.
Comment 18 Michael Warner 2024-07-05 21:42:01 UTC
I'm unassigning myself because, I might as well face it, I'm never going to finish this. I attached a bundle with what I have in case anyone else wants to pick it up, but it doesn't completely work.
Comment 19 Buovjaga 2024-07-08 07:06:40 UTC Comment hidden (obsolete)
Comment 20 Hossein 2024-07-08 07:22:00 UTC Comment hidden (obsolete)
Comment 21 Hossein 2024-07-08 07:24:20 UTC Comment hidden (obsolete)
Comment 22 Hossein 2024-07-08 07:46:33 UTC
Created attachment 195158 [details]
Patch files created from changes provided by Michael

I could open the above .bundle file using:

$ git bundle unbundle tdf55425.bundle
Unbundling objects: 100% (35/35), 6.89 KiB | 3.45 MiB/s, done.
Resolving deltas: 100% (23/23), completed with 9 local objects.
f86d04014c0149c7df81d582cfc1cc9b43fe2807 refs/heads/tdf55425_pdf_encryption_4

One can cherry-pick the 4 commits from f86d04014c~3 to f86d04014c on top of master. The attached zip file contains all of them, both as individual patches and squashed into a single file.
Comment 23 Dave Gilbert 2025-02-10 01:25:28 UTC
I've had a quick look through, it looks reasonable; I'll dig a bit further.
Comment 24 Dave Gilbert 2025-02-16 02:05:30 UTC
I think Michael's patch hung because xpdfimport always read the password, even with the option.

I've taken a bit of a different path and have a WIP that seems to read the original test file and a few others I have; but I need to clean it up.

See:
https://gerrit.libreoffice.org/c/core/+/181739/1

What I've done is change xpdfimport more heavily, so that it now has a command loop
in which it takes a password and reports whether it managed to decrypt the file or, if not, whether the failure was an encryption problem.

Now we don't need the separate 'checkEncryption' stage.

However, the other problem is that the checkEncryption in the wrapper is not the only case; there's some separate code for hybrid PDFs that I've not figured out yet.

I'll tidy this up and get the tests and stuff to work and then look at hybrid later.
If we can sort hybrid out we can probably nuke that separate pdfparser code.
Comment 25 Dave Gilbert 2025-02-16 02:10:36 UTC
*** Bug 149621 has been marked as a duplicate of this bug. ***
Comment 26 Commit Notification 2025-03-03 23:55:35 UTC
Dr. David Alan Gilbert committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/99d7822b3932a9c74b8fc33bbbce504c33f3ee7e

tdf#55425 sdext,pdfimport: wrapper: Split out line read

It will be available in 25.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 27 Commit Notification 2025-03-03 23:56:38 UTC
Dr. David Alan Gilbert committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/a77caf540cf4a0b97974da02362153021996d9e4

tdf#55425 sdext,pdfimport: Add a status string for opening the PDF

It will be available in 25.8.0.

Comment 28 Commit Notification 2025-03-03 23:56:41 UTC
Dr. David Alan Gilbert committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/779a77d1a9a467230dd1ae98b89ea76b02b22c42

tdf#55425 sdext,pdfimport: Add a command loop to the poppler wrapper

It will be available in 25.8.0.

Comment 29 Commit Notification 2025-03-03 23:58:44 UTC
Dr. David Alan Gilbert committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/2bbe15a2cea20d2dfb9346992a9bf8ff39b4d42c

tdf#55425 sdext,pdfimport: Replace checkEncryption

It will be available in 25.8.0.

Comment 30 Commit Notification 2025-03-03 23:58:47 UTC
Dr. David Alan Gilbert committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/4aba4d73969dccc983dee52581faecef78ddea2b

tdf#55425:sdext,pdfimport: Add a test

It will be available in 25.8.0.

Comment 31 Commit Notification 2025-03-03 23:58:50 UTC
Dr. David Alan Gilbert committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/1622d672b8cc721d5f9917931f6d8d999f218f7a

tdf#55425:sdext,pdfimport: Document the new protocol

It will be available in 25.8.0.

Comment 32 Dave Gilbert 2025-03-04 00:45:40 UTC
OK, with the set that just went in, this should be mostly fixed - except for hybrid files; I've not figured those out yet.