Bug 73682 - FILEOPEN: HTML with .xls extension is opened in Writer/Web when opening from StartCenter
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
(earliest affected) rc
Hardware: Other All
Importance: medium normal
Assignee: Maxim Monastirsky
Whiteboard: target:4.3.0
Keywords: regression
Duplicates: 76114 76234
Depends on:
Reported: 2014-01-16 05:23 UTC by Dave Miller
Modified: 2015-01-24 12:32 UTC

See Also:
Crash report or crash signature:

test file (285 bytes, text/html)
2014-01-16 09:48 UTC, Maxim Monastirsky

Description Dave Miller 2014-01-16 05:23:38 UTC
This is roughly similar to bug 68903, just another variant.

The document in question has an .xls suffix, but actually contains HTML content (the root element is a <div> that contains a <table>) rather than Excel binary data.  Other .xls files I have, which contain proper Excel data, open correctly.

However, I've been downloading these xls files from this same site (it's the data export from an event registration site) for a long time, and in LibreOffice 4.0.x it would open them in Calc.  I had a bunch of the older files sitting around, and they have always been in this format (I just upgraded to 4.1.x from 4.0.x).

In this case, selecting one of the Excel filetypes from the selector results in an error trying to open the file, but if I create a new Calc document, then do File > Open from there, it correctly opens it like it used to.

I've also managed to get it to open correctly if I rename the file to have an .html suffix, and then choose "Web Page Query (Calc)" as the filetype when opening it.  Leaving it with the .xls suffix it had when it was downloaded won't allow me to choose this filetype though.

As mentioned, this is a regression: it opened correctly in Calc in 4.0.x.

The content of the file looks like this when opened in a raw text editor:

	<table cellspacing="0" rules="all" border="1" style="border-collapse:collapse;">
			<th scope="col">Column1Name</th><th scope="col">Column2Name</th><th scope="col">Column3Name</th>
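
As a rough illustration of the mismatch described above, the following Python sketch checks whether a file with an .xls suffix starts with Excel binary data or with HTML markup. This is not LibreOffice code: the function name `sniff_xls` and the short list of HTML markers are assumptions made for this example. The only fixed fact it relies on is the 8-byte OLE2 compound-file signature that legacy binary .xls files begin with.

```python
import os
import tempfile

# Signature at the start of an OLE2 compound file (legacy binary .xls).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Hypothetical, deliberately small set of openings treated as "looks like HTML".
HTML_MARKERS = (b"<!doctype html", b"<html", b"<div", b"<table")


def sniff_xls(path):
    """Return 'excel-binary', 'html', or 'unknown' for the file at path."""
    with open(path, "rb") as f:
        head = f.read(1024)
    if head.startswith(OLE2_MAGIC):
        return "excel-binary"
    # Drop a UTF-8 byte order mark, then leading whitespace, before matching.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    text = head.lstrip().lower()
    if text.startswith(HTML_MARKERS):
        return "html"
    return "unknown"


if __name__ == "__main__":
    # Build a throwaway file shaped like the export quoted in this report.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "export.xls")
        with open(sample, "wb") as f:
            f.write(b'\t<table cellspacing="0" rules="all" border="1">\n')
        print(sniff_xls(sample))
```

A check like this only looks at the first bytes of the file, so it says nothing about whether the rest of the content is valid.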
Comment 1 Maxim Monastirsky 2014-01-16 07:40:44 UTC
Already confirmed at bug 68903 comment 2.

(In reply to comment #0)
> In this case, selecting one of the Excel filetypes from the selector results
> in an error trying to open the file
Of course, since it's not an Excel file, just renamed HTML.
Comment 2 retired 2014-01-16 09:12:52 UTC
Dave, please attach a test file to this bug if possible.
Comment 3 Maxim Monastirsky 2014-01-16 09:14:34 UTC
(In reply to comment #2)
> Dave, please attach a test file to this bug if possible.
Should be reproducible with *any* HTML file renamed to .xls.
Comment 4 retired 2014-01-16 09:44:31 UTC
Dave, so true, but having a test file unifies the testing case and makes sure we receive identical results. Otherwise this bug can become very confusing very quickly. I think we can agree it's a good idea to avoid that, no?
Comment 5 Maxim Monastirsky 2014-01-16 09:48:13 UTC
Created attachment 92211
test file
Comment 6 Dave Miller 2014-01-16 19:15:24 UTC
For the record, I included a test case in the initial description.  You just had to copy/paste it into a file and put an .xls extension on the filename. :)

In fact, that appears to be exactly what Maxim did to create the file he attached. :)
Comment 7 Kohei Yoshida 2014-01-19 17:57:41 UTC
So, this is unfortunately by design.  The HTML file format is generic and not specific to a single application type.  If you want an HTML file to be opened in Calc, you can open it from Calc's file open menu.
Comment 8 Kohei Yoshida 2014-01-19 18:11:31 UTC
Having said that, since we do something similar for CSV, we could try to handle this as well.  The right way to handle it is to have the HTML detection service inspect the file extension and decide which application to open the file in, the same way we do for the CSV file format.
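
A minimal sketch of that idea, written in Python rather than taken from LibreOffice's detection code: once the content has been identified as HTML, the file extension picks the target application. The function name, the extension set, and the returned application labels are all hypothetical and chosen only for illustration.

```python
import os

# Hypothetical set of spreadsheet-style extensions. The list LibreOffice
# actually uses is not given in this report.
SPREADSHEET_EXTENSIONS = {".xls", ".xlsx", ".ods"}


def choose_app_for_html(filename):
    """Pick an application label for a file already detected as HTML."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in SPREADSHEET_EXTENSIONS:
        # HTML content behind a spreadsheet extension goes to Calc.
        return "calc"
    # Everything else keeps the default HTML handler.
    return "writer_web"


if __name__ == "__main__":
    for name in ("export.xls", "index.html"):
        print(name, choose_app_for_html(name))
```

Keeping the decision in one function mirrors the point about doing the extension check inside a single detection step instead of per application.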
Comment 9 Kohei Yoshida 2014-01-19 18:17:58 UTC
The caveat is that the HTML detection code is currently split between Writer and Calc.  The necessary first step would be to combine the two pieces of HTML detection code in a central place, then put a fix in there.  In fact, the CSV detection code used to be split between the Writer and Calc detection code as well; I moved it to where it is now.  We should do the same for HTML.
Comment 10 Commit Notification 2014-01-23 14:50:12 UTC
Maxim Monastirsky committed a patch related to this issue.
It has been pushed to "master":


related: fdo#73682 Introduce HTML detection service

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.
Comment 11 Kohei Yoshida 2014-01-23 19:22:55 UTC
This is now fixed on master. Nice work Maxim!
Comment 12 Maxim Monastirsky 2014-03-18 06:13:59 UTC
*** Bug 76234 has been marked as a duplicate of this bug. ***
Comment 13 Maxim Monastirsky 2014-03-20 11:44:13 UTC
*** Bug 76114 has been marked as a duplicate of this bug. ***
Comment 14 Mateo 2014-10-29 20:42:00 UTC
Maxim, the problem is still present in the 4.3.2 build as well (Windows).  I thought this was fixed for 4.3?
If I change the extension to .html and use "Open with" Calc, it opens fine.
There are no patches out there for 4.3.2.
Comment 15 Maxim Monastirsky 2014-10-29 20:58:25 UTC
Hi msaum,

As a general rule, NEVER reopen old bugs. Instead:

- Open a new bug.
- Attach a test document, that shows the bug. (The bug is mostly fixed, but there could be a problem with a specific file. Therefore I can't say anything without seeing the file you're using.)
- Add me to the CC list of that bug.