Bug 60354 - Import from HTML source limited to about 7000 lines
Summary: Import from HTML source limited to about 7000 lines
Status: RESOLVED DUPLICATE of bug 35756
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version (earliest affected): 3.6.0.4 release
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Vlad
URL:
Whiteboard: BSA
Keywords:
Depends on:
Blocks:
Reported: 2013-02-06 07:50 UTC by Vlad
Modified: 2013-12-15 23:33 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
from 7203 line import corrupted (64.22 KB, image/jpeg)
2013-02-06 07:50 UTC, Vlad
file for test HTML data import (170.92 KB, application/zip)
2013-05-31 08:34 UTC, Vlad

Description Vlad 2013-02-06 07:50:22 UTC
Created attachment 74274
from 7203 line import corrupted

Problem description:
Import from an HTML source is limited to approximately 7000 lines.
If the HTML source contains only one table, the user can use "Page from file...", but if the HTML contains many tables, the user cannot choose which table to import as a page.
Also, "Page from file" does not accept parameters in the URL, such as:
http://info.domain.com/somepage.php?param1=1&param2=2
Such URLs work only with "Link to external data...", and that import is limited to about 7000 lines.

Steps to reproduce:
1. Generate an HTML file larger than 65 KB containing 30000 lines.
2. Create a new ODS document.
3. Insert/Link to external data, selecting the file from step 1.
4. Only about 7000 of the expected 30000 lines appear.
5. Insert/Page from file, selecting the file from step 1.
6. All 30000 expected lines appear.
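Step 1 leaves the test file unspecified. A minimal sketch of a generator, assuming a single table whose cells hold "row-column" values so the last imported row is easy to spot in Calc. The file name "test_import.html", the three-column layout, and the helper names make_test_html and write_test_html are illustrative assumptions, not part of the report.

```python
# Sketch only: file name, row count and column layout are assumptions chosen
# to match the report (30000 lines, well over 65 KB).


def make_test_html(rows: int = 30000, cols: int = 3) -> str:
    """Return an HTML page with a single table of `rows` numbered rows."""
    parts = ["<html><body>", "<table>"]
    for r in range(1, rows + 1):
        # Each cell holds "row-col", so the last imported row is easy to identify.
        cells = "".join(f"<td>{r}-{c}</td>" for c in range(1, cols + 1))
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table></body></html>")
    return "\n".join(parts)


def write_test_html(path: str, rows: int = 30000) -> int:
    """Write the generated page to `path` and return its size in bytes."""
    data = make_test_html(rows).encode("utf-8")
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


if __name__ == "__main__":
    size = write_test_html("test_import.html")
    print(f"wrote test_import.html ({size} bytes)")
```

After linking the file in step 3, the value in the first column of the last row shows how many rows actually arrived. For example, "7203-1" would correspond to the truncation seen in the attached screenshot.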

Current behavior:
Only about 7000 of the expected 30000 lines are imported.

Expected behavior:
All 30000 lines should be imported.

Operating System: Ubuntu
Version: 3.6.0.4 release
Comment 1 Mirosław Zalewski 2013-03-09 17:12:41 UTC
I could not reproduce it on LibreOffice 3.6.5, Debian testing amd64, using the file attached to Bug 35756.

In *both* cases the file is not entirely imported. It is truncated at exactly the same place.

Could you attach the HTML file on which you encounter this problem?

As for the "Page from file does not accept parameters in URL" issue: please file a separate bug report and provide a real URL.
Please do the same for "On Page from file, user cannot select table in multi-table HTML document".
As a general rule of thumb, post one bug report per issue.
Comment 2 Joel Madero 2013-05-29 21:11:41 UTC
Marking as NEEDINFO - the HTML file would make it much easier to triage the bug.

Once you attach an HTML file with which we can easily see the result, mark the bug as UNCONFIRMED and we will test against the master branch.

Thanks!
Comment 3 Vlad 2013-05-31 08:34:14 UTC
Created attachment 80084
file for test HTML data import

This file is shorter in length, so LibreOffice 3.5 imported 33858 of its 50000 lines.

Also, starting with LibreOffice 4.x, named HTML tables can no longer be imported by name. Given:
<table id="data1">
<table id="data2">
in LibreOffice 3.5 these tables can be imported by name as "HTML__data1" and "HTML__data2", or by their order in the file as "HTML__1" and "HTML__2".
But from LibreOffice 4.x on, they can be imported only by order in the HTML file, as "HTML__1" and "HTML__2".
So if the source file is changed to
<table id="data1">
<table id="data1details">
<table id="data2">
<table id="data2details">
LibreOffice 3.5 still imports the data by name as "HTML__data1" and "HTML__data2",
but LibreOffice 4.x imports "HTML__2" from the new file, so the user sees
the data from "data1details".
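The naming regression can be reproduced with a small page that uses the four table ids above, where every cell names its own table so the imported range is obvious. The sketch below also models the two lookup behaviours as described in this comment. The helper names and the lookup functions are hypothetical illustrations of the reported behaviour, not LibreOffice's actual import code.

```python
# Illustration only: builds the table layout from this comment and models the
# naming behaviour described there. This is NOT LibreOffice code.

TABLE_IDS = ("data1", "data1details", "data2", "data2details")
PREFIX = "HTML__"


def make_multi_table_html(ids=TABLE_IDS, rows: int = 3) -> str:
    """Build a page with one small table per id; every cell names its table."""
    parts = ["<html><body>"]
    for tid in ids:
        parts.append(f'<table id="{tid}">')
        for r in range(1, rows + 1):
            parts.append(f"<tr><td>{tid}</td><td>{r}</td></tr>")
        parts.append("</table>")
    parts.append("</body></html>")
    return "\n".join(parts)


def _key(name: str) -> str:
    # Strip the "HTML__" prefix if present.
    return name[len(PREFIX):] if name.startswith(PREFIX) else name


def resolve_by_name_or_order(name: str, ids=TABLE_IDS) -> str:
    """3.5 behaviour as reported: "HTML__<id>" first, then 1-based "HTML__<n>"."""
    key = _key(name)
    if key in ids:
        return key
    if key.isdecimal() and 1 <= int(key) <= len(ids):
        return ids[int(key) - 1]
    raise KeyError(name)


def resolve_by_order_only(name: str, ids=TABLE_IDS) -> str:
    """4.x behaviour as reported: only 1-based "HTML__<n>" is accepted."""
    key = _key(name)
    if key.isdecimal() and 1 <= int(key) <= len(ids):
        return ids[int(key) - 1]
    raise KeyError(name)
```

With the four-table file, resolve_by_order_only("HTML__2") yields "data1details", whereas the name-based lookup "HTML__data2" still yields "data2". This matches the difference described above.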
Comment 4 ign_christian 2013-05-31 09:12:51 UTC
It looks like Bug 35756. Please set it to REOPENED if you think differently.

*** This bug has been marked as a duplicate of bug 35756 ***