Created attachment 77352
PDFs created by AutoCAD 2012 cannot be imported to Draw.
Steps to reproduce:
1. Try to import "layers.pdf" or "nolayers.pdf" from the attachment.
Expected result: the file with a circle is imported correctly.
Actual result: "General I/O Error". The file is not imported.
This error does not happen if this file has been processed by a PDF authoring tool, such as PDFill. The attachment contains "layers_pdfill.pdf" and "nolayers_pdfill.pdf" that have been resaved using this tool, and they are imported correctly.
Already [Reproducible] with a server installation of LibreOffice 3.3.3, German UI/locale [OOO330m19 (Build:301) tag libreoffice-126.96.36.199], on German WIN7 Home Premium (64bit), and with OOo with PDF Import Extension version 1.0.4.
So this PDF import problem is inherited from OOo.
As expected, it's also impossible to insert those documents as OLE objects.
PDF-SAM and Ghostscript (GS) have no problems with the documents.
Reproducible with LibreOffice 4.2.5 and 188.8.131.52 on Debian.
I confirm this bug also in LibreOffice 184.108.40.206 (final 4.3.1) win32 on Windows 8.1, 64 bit, both with Italian UI.
I also tried opening the native AutoCAD PDF files (PDF application: "AutoCAD 2012 - Russian 2012 (18.2s (LMS Tech))", PDF author: "pdfplot10.hdi 10.2.205.0") in Adobe Acrobat Reader (AAR) 11.0.08 win32 and then simply saving them with "File -> Save As...". This may be an easy workaround for the moment.
The files saved from AAR seem to import perfectly in the LibreOffice instance above. For example, layers.pdf grows from 1.34 kbyte (AutoCAD original) to 5.42 kbyte (saved from AAR), compared to 1.49 kbyte for the PDFill version.
The problem is in pdf_string_parser::operator(): pdfparse.cxx line 119 (http://opengrok.libreoffice.org/xref/core/sdext/source/pdfimport/pdfparse/pdfparse.cxx#119)
This line is used to skip escaped braces, i.e. a parenthesis preceded by a backslash inside a PDF string.
It pre-increments the scanner to swallow the backslash, and after that, the scanner is incremented again (normally) on line 124.
The operator++ for boost spirit classic scanner does two things:
1. Advances the scanner;
2. Skips whitespace.
So, if the parsed string has this form:
(\ )
i.e. <left parenthesis><backslash><space><right parenthesis>
then the first increment (line 119, taken when the current character is a backslash) skips TWO characters at once, and the next increment skips the closing parenthesis that should have ended the string. Parsing therefore continues past the end of the string. The result is an "incorrect" PDF structure: the check at pdfparse.cxx line 575 evaluates to false, so the whole PDF load fails, and the failure propagates up to sfxbasemodel.cxx line 1929 (http://opengrok.libreoffice.org/xref/core/sfx2/source/doc/sfxbasemodel.cxx#1929), where the error is displayed.
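To make the double increment easier to follow, here is a small standalone sketch of the effect. It is not the actual pdfparse.cxx code and does not use boost spirit. `Cursor` and `findStringEnd` are hypothetical names invented for this illustration, and the whitespace-skipping `step()` only imitates the scanner behaviour described above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the scanner (not a boost spirit type).
// When skipWhitespace is true, step() advances one character and then also
// skips any whitespace that follows, as described for operator++ above.
struct Cursor {
    const std::string& text;
    std::size_t pos;
    bool skipWhitespace;

    bool atEnd() const { return pos >= text.size(); }
    char current() const { return text[pos]; }

    void step() {
        ++pos;
        if (skipWhitespace) {
            while (!atEnd() && std::isspace(static_cast<unsigned char>(text[pos])))
                ++pos;
        }
    }
};

// Returns the index of the ')' that closes the PDF literal string opening at
// text[start], or std::string::npos if no closing parenthesis is found.
std::size_t findStringEnd(const std::string& text, std::size_t start,
                          bool skipWhitespace) {
    assert(start < text.size() && text[start] == '(');
    Cursor cur{text, start + 1, skipWhitespace};
    int depth = 0;  // nesting level of unescaped parentheses inside the string
    while (!cur.atEnd()) {
        const char c = cur.current();
        if (c == ')') {
            if (depth == 0)
                return cur.pos;  // this parenthesis closes the string
            --depth;
        } else if (c == '(') {
            ++depth;
        } else if (c == '\\') {
            cur.step();  // swallow the backslash; the next step skips the escaped char
        }
        cur.step();
    }
    return std::string::npos;  // ran off the end: the string never "closed"
}
```

With the input `(\ ) next`, the whitespace-skipping mode never finds the closing parenthesis, while the plain character-by-character mode finds it at index 3. For an input with no whitespace after the backslash, such as `(a\)b)`, both modes agree, which would explain why only some PDFs trigger the failure.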
I'll try to prepare a patch for that case.
A patch is submitted to gerrit: https://gerrit.libreoffice.org/15562
Mike committed a patch related to this issue.
It has been pushed to "master":
tdf#63054: pdf_string_parser incorrectly handles escapes
It will be available in 5.0.0.
The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.