When converting a Word document to a PDF, the symbols in the PDF (the page numbers) do not match those of the Word doc.
Steps to Reproduce:
1. Convert file to PDF using 'soffice --headless --nolockcheck --nodefault --nofirststartwizard --nologo --norestore --convert-to pdf --outdir /tmp /tmp/test.docx'
2. Open PDF using a viewer
3. Observe the pdf does not have the correct page numbers
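For anyone who wants to script step 1 rather than type it, here is a minimal sketch in Python. It only wraps the exact soffice command from the report; the function names `build_convert_command` and `convert_to_pdf` are made up for illustration, and it assumes `soffice` is on the PATH and that the output file is named after the input with a `.pdf` extension.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, outdir, soffice="soffice"):
    """Return the argv list for the headless PDF conversion from step 1."""
    return [
        soffice,
        "--headless",
        "--nolockcheck",
        "--nodefault",
        "--nofirststartwizard",
        "--nologo",
        "--norestore",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(source),
    ]


def convert_to_pdf(source, outdir, soffice="soffice"):
    """Run the conversion and return the path where the PDF is expected."""
    # Assumption: the soffice binary is reachable via PATH.
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice} not found on PATH")
    subprocess.run(build_convert_command(source, outdir, soffice), check=True)
    # Assumption: the output keeps the input's base name with a .pdf suffix.
    return Path(outdir) / (Path(source).stem + ".pdf")
```

Calling `convert_to_pdf("/tmp/test.docx", "/tmp")` should then run the same conversion as step 1, after which the resulting PDF can be opened in a viewer as in step 2.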
Expected Results:
The PDF should have the same page numbers as the Word doc.
Actual Results:
The PDF does not have the same page numbers as the Word doc.
User Profile Reset: No
Created attachment 160036 [details]
Word doc - working page numbers
The text of the Word doc has been replaced with 'a' to protect the original document.
Created attachment 160037 [details]
Converted pdf - bad page numbers
Report is not correct and precise.
In Actual / Expected you need to write exactly which page numbers you expect, where you expect them, and what you get instead.
Note: headless exports without blank pages so PDF is 10 pages, although DOCX shows 13 with blanks.
Also, even if you are using headless, you need to test GUI export as well, with and without blank pages, and compare.
It's also wrong to attach a sample with all "a", because that makes it harder to compare. You should instead write "page 3" where you expect page 3, etc.
Just wondering if you compared both of the attached files? The reason one page is blank is simply because I wanted to preserve the integrity of the original file that had this issue. I changed all the lettering in the word document to A to preserve client confidentiality while still giving you guys everything you need, everything else is constant.
If you open the Word document in LibreOffice Writer, you will see that the pages are numbered. The Word document is what gets passed in, and the PDF that comes out is MISSING those page numbers and contains only one page number.
The ordering of the pages is the same between the two files. If there is a page number in file 1 (the Word document), it should appear in file 2 (the PDF).
Report is misleading. It's not about headless conversion, it's about LO 6.3 not opening page numbers.
Fine in master 7.0+ so I close as WFM. This is a duplicate of some fixed bug.
Actually, the fix had no bug report of its own, so I change this one to Fixed.
c462ed55e03da0e74d40eb2f0a22949c04fe6b08 is the first good commit
Author: Jenkins Build User <firstname.lastname@example.org>
Date: Tue Jan 21 13:47:21 2020 +0100
Previous source: 8f84922be15d37cb54fa592e1445fa5ab2c37f15
commit 8d58d0ef72162bbfb92cd3a894387f57c62ee8ae [log]
author Miklos Vajna <email@example.com> Fri Jan 10 16:03:43 2020 +0100
committer Michael Stahl <firstname.lastname@example.org> Tue Jan 21 12:12:01 2020 +0100
parent 8f84922be15d37cb54fa592e1445fa5ab2c37f15 [diff]
DOCX import: fix lost objects anchored to an empty linked header
This is really similar to commit
04b2310aaa094794ceedaa1bb6ff1823a2d29d3e (DOCX import: fix lost objects
anchored to the single para of a linked header, 2020-01-10), except here
the header is not just a single-paragraph one, but has no text portions.
Update text-copy.docx to have a header which is not only a single
paragraph, but also has no character content. This keeps testing the
original case, but now also tests the more strict case (single paragraph
-> single empty paragraph).