Bug 160489 - pub2xhtml produces wrong tspan font size and position
Status: NEW
Alias: None
Product: Document Liberation Project
Classification: Unclassified
Component: libmspub
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Format-Filters
Reported: 2024-04-02 17:37 UTC by Manfredi Marceca
Modified: 2025-09-06 13:02 UTC (History)
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
PUB test document (208.00 KB, application/octet-stream)
2024-04-02 17:39 UTC, Manfredi Marceca

Description Manfredi Marceca 2024-04-02 17:37:46 UTC
Description:
Please see the attached file, created with Publisher 2021. When using pub2xhtml, the produced SVG doesn't preserve text and tables, and some shapes are not recognized. There are also some issues when converting to ODG and importing into LibreOffice (although that conversion is generally of better quality): the table borders and background are not visible, and text contained in shapes is positioned incorrectly when spaces or newline characters are used. WordArt is not preserved, but this has already been reported.

Steps to Reproduce:
Load the attached PUB file with libmspub

Actual Results:
 

Expected Results:
 


Reproducible: Always


User Profile Reset: No

Additional Info:
Comment 1 Manfredi Marceca 2024-04-02 17:39:04 UTC
Created attachment 193443
PUB test document
Comment 2 Buovjaga 2024-07-23 07:39:33 UTC
Could you attach a PDF exported from Publisher, so we can see what it's supposed to look like?

It's not clear to me how pub2xhtml is supposed to be used. If I do

pub2xhtml path/to/Publication1.pub > path/to/Publication1.html

I see the file contains multiple SVG elements, but the rendering in a browser only shows text and <hr/> separators. Opening the .pub in LibreOffice, I see four pages with various shapes, and it makes much more sense.

How do you yourself use pub2xhtml?

Also, a bug report should only describe one issue, so you have to choose which one this will be about and create new reports for the others.
Comment 3 Manfredi Marceca 2024-07-23 11:00:49 UTC
To render properly in the browser, you need to save the output as .xhtml rather than .html (probably the explicit SVG namespace is not recognized in standard HTML):

pub2xhtml path/to/Publication1.pub path/to/Publication1.xhtml

In this case the shapes are rendered similarly to what you see in LibreOffice, but the text contained in them is not visible because it's too small and incorrectly positioned.

In the first post I missed that the text is actually there, sorry for the confusion. So this bug report can be renamed to "pub2xhtml produces wrong tspan font size and position", and I will open another for tables.

Also, I use pub2xhtml in combination with headless Edge/Chromium to convert PUB->XHTML->PDF when a full LibreOffice installation isn't available. So I would like the XHTML+SVG to be as close as possible to how the PUB is displayed in LibreOffice Draw; it is expected that it won't be exactly the same as in MS Publisher.
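
For reference, a minimal sketch of that PUB->XHTML->PDF workflow, written as a shell function. It is only an illustration under some assumptions: the helper name pub_to_pdf and the PUB2XHTML and BROWSER_BIN variables are hypothetical, the browser binary name differs between platforms (chromium, chromium-browser, msedge, ...), and pub2xhtml is invoked with the stdout redirection shown in Comment 2.

```shell
#!/bin/sh
# Sketch only: convert a .pub file to PDF via XHTML+SVG.
# pub_to_pdf, PUB2XHTML and BROWSER_BIN are hypothetical names used
# for illustration; they are not part of libmspub or Chromium.
pub_to_pdf() {
    src=$1
    # Resolve the output directory to an absolute path for the browser.
    out_dir=$(cd "$2" && pwd) || return 1
    base=$(basename "$src" .pub)
    xhtml="$out_dir/$base.xhtml"
    pdf="$out_dir/$base.pdf"

    # pub2xhtml writes the document to stdout. Saving it with an
    # .xhtml extension makes the browser parse it as XML.
    "${PUB2XHTML:-pub2xhtml}" "$src" > "$xhtml" || return 1

    # --headless and --print-to-pdf are Chromium command-line switches.
    # Assumption: BROWSER_BIN points at a Chromium-based browser.
    "${BROWSER_BIN:-chromium}" --headless --print-to-pdf="$pdf" "$xhtml" || return 1

    echo "$pdf"
}
```

Example call, assuming the output directory already exists: pub_to_pdf path/to/Publication1.pub path/to/out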
Comment 4 Buovjaga 2024-08-21 17:43:50 UTC
Ok, confirmed the result with

$ pub2xhtml --version
pub2raw 0.1.4