Bug 126041 - No response when trying to save an HTML file into RTF
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 6.2.4.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: perf
Depends on:
Blocks: RTF
Reported: 2019-06-21 14:09 UTC by Simon Urli
Modified: 2022-05-04 17:22 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
HTML file that makes the export crash (42.65 KB, text/html)
2019-06-21 14:10 UTC, Simon Urli
HTML file with its pictures (1.30 MB, application/zip)
2019-06-21 14:12 UTC, Simon Urli

Description Simon Urli 2019-06-21 14:09:59 UTC
Description:
I cannot save a particular HTML file as RTF.
I have not been able to identify which part of the HTML causes the failure.

Note that the HTML file refers to some images: I tried the export both with and without the images, and it made no difference.
Exporting the file to ODT works fine. 

Steps to Reproduce:
1. Open the attached file export_input.html with Writer (not Writer/Web)
2. Save as RTF document


Actual Results:
The save process never seems to finish, and a temporary file is created that keeps growing. It reached 4 GB, at which point I killed the process.

Expected Results:
The document is saved properly as an RTF document.


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 6.2.4.2.0+
Build ID: 6.2.4-1
CPU threads: 8; OS: Linux 5.1; UI render: default; VCL: gtk3;
Locale: fr-FR (fr_FR.UTF-8); UI-Language: fr-FR
Calc: threaded
Comment 1 Simon Urli 2019-06-21 14:10:56 UTC
Created attachment 152345
HTML file that makes the export crash
Comment 2 Simon Urli 2019-06-21 14:12:32 UTC
Created attachment 152346
HTML file with its pictures

This zip file also contains the pictures that the HTML refers to. The bug can be reproduced without them, but I originally hit it with them.
Comment 3 Mike Kaganski 2019-06-21 15:19:21 UTC
Confirmed with Version: 6.2.5.1 (x64)
Build ID: 9a940173fab1747f02322bc89779759d52b3a086
CPU threads: 4; OS: Windows 10.0; UI render: GL; VCL: win; 
Locale: ru-RU (ru_RU); UI-Language: en-US
Calc: threaded

and with current master (tested attachment 152345).

Code pointer from debugging (I don't plan to work on this, so hopefully this saves a few cycles for whoever fixes it): MSWordExportBase::WriteText never finishes, because m_pTableInfo->getNextNode(pCurrentNode) returns node 354 for current node 359, creating an infinite loop.
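
For anyone picking this up, the loop in the code pointer can be modelled in a few lines. This is only a sketch in Python, not LibreOffice code: `next_node` is a hypothetical stand-in for `m_pTableInfo->getNextNode(pCurrentNode)`, `walk` is an invented name, and every node number other than 354 and 359 is made up for illustration. It shows why a successor lookup that returns an earlier node keeps the export from terminating, and how a visited-set guard would report the problem instead of hanging.

```python
def walk(start, end, next_node):
    """Walk nodes from start up to (not including) end.

    Raises RuntimeError if any node is visited twice, which is what
    happens when the successor lookup jumps backwards.
    """
    seen = set()
    order = []
    current = start
    while current < end:
        if current in seen:
            raise RuntimeError(f"cycle detected: node {current} visited twice")
        seen.add(current)
        order.append(current)
        current = next_node(current)
    return order


def healthy_next(node):
    # Assumed well-behaved successor: always moves forward.
    return node + 1


def buggy_next(node):
    # Mirrors the report: the node after 359 is reported as 354.
    return 354 if node == 359 else node + 1
```

Without the `seen` check, `walk(350, 365, buggy_next)` would cycle through nodes 354 to 359 forever, which would fit the ever-growing temporary file if each pass writes more output. Whether a guard like this is the right fix, or whether the table node ordering itself is wrong, is left open here.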
Comment 4 Xisco Faulí 2019-06-28 14:15:27 UTC Comment hidden (obsolete)
Comment 5 Maxim Monastirsky 2019-06-28 14:44:47 UTC
(In reply to Xisco Faulí from comment #4)
> I tried with: instdir/program/soffice --headless --convert-to rtf
> /home/xisco/Baixades/export_input.html --outdir /home/xisco/Baixades/output/
> 
> and it claims:
> Error: no export filter for /home/xisco/Baixades/output/export_input.rtf
> found, aborting.
> Error: no export filter
> 
> How can I export the HTML to RTF ?
You need to add "--writer" to the command. See Bug 40186 and my comment there.
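
A small wrapper can make this reproducible without letting the conversion run away. The following is a sketch in Python under stated assumptions: the function names are hypothetical, the position of `--writer` among the other options is a guess (the comment above only says to add it), and the 120-second timeout is arbitrary. The other options are the ones already used in comment #4.

```python
import subprocess


def build_convert_command(soffice, source, outdir):
    """Return the argument list for a headless HTML-to-RTF conversion.

    "--writer" is added as suggested in comment 5; where exactly it
    goes relative to the other options is an assumption.
    """
    return [
        soffice,
        "--headless",
        "--writer",
        "--convert-to", "rtf",
        source,
        "--outdir", outdir,
    ]


def convert(soffice, source, outdir, timeout_s=120):
    """Run the conversion; return True if it finished, False on timeout."""
    try:
        subprocess.run(
            build_convert_command(soffice, source, outdir),
            check=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The export appears to hang, as described in this bug.
        return False
    return True
```

For example, `convert("instdir/program/soffice", "export_input.html", "output/")` would return `False` if the export is still running after two minutes. Note that `subprocess.run` only kills the process it started; if `soffice` is a launcher for a separate child process on a given install, that child may need to be stopped separately.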
Comment 6 Buovjaga 2022-05-04 17:22:00 UTC
After some recent improvements to the RTF code, I thought I would try this again, but I can still reproduce it with the command-line conversion.

Version: 7.4.0.0.alpha0+ / LibreOffice Community
Build ID: 6ebf46e332facfae5fd6027ec667ccd5993dd493
CPU threads: 8; OS: Linux 5.17; UI render: default; VCL: kf5 (cairo+xcb)
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded