Bug 31480 - Find/replace non-printing characters easily
Summary: Find/replace non-printing characters easily
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version (earliest affected): unspecified
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Duplicates: 31509
Depends on:
Blocks: Find-Search
Reported: 2010-11-08 16:01 UTC by David Nelson
Modified: 2016-10-01 09:54 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Description David Nelson 2010-11-08 16:01:27 UTC
Hi, :-)

In Microsoft Office, when you do a find/replace, you have a dropdown list enabling you to easily include many special characters to search for, such as:

- carriage return
- new line
- tab mark
- page break
- non-breaking spaces
- and various others.

In LibO, I think you can only do this via regular expressions. But your average user is incapable of using regular expressions.

Could you possibly add a similar dropdown box?

Thanks if so, and thanks very much for your work. :-)
Comment 1 Don't use this account, use tml@iki.fi 2010-11-09 01:47:33 UTC
But note that being able to search for "carriage return", "new line" and "page break" (and possibly also the other ones you mention) depends on those being present in the internal representation of text. I am not at all sure these "typographical concepts" exist in the internal representation of text in OpenOffice.org/LibreOffice. I think I have been told that OOo/LO uses a much more "structured" approach with separate objects for paragraphs etc., and maybe even stores forced line breaks just as data structures, not as actual embedded carriage return and/or new line characters.

So implementing this might be much more complex than it perhaps is in MS Office. That doesn't mean it wouldn't be useful, of course. Even if we keep the traditional OOo way of storing text in LibreOffice, we could present the user with the illusion that the formatting characters you mention are actually present. That might be useful for people migrating from MS Office.

On the other hand, for the (few...) people who actually prefer to think of documents in a structured fashion and not as a stream of characters including formatting characters, being able to search for, for instance, carriage returns sure would seem unnatural. In an ideal world, that is how one should conceptualize documents, no?

Of course, I might be totally misunderstanding stuff above, and in that case, feel free to correct me, and/or ignore my rambling.
Comment 2 David Nelson 2010-11-09 03:09:26 UTC
Hi Tor, 

Thank you for your comments. I think you've indeed understood what I was on about, but:

I understand what you're talking about as regards LibO/OOo's internal storage.

However, that is invisible to the end user.

I, the dumb end user, pressed the carriage return key while typing. I don't care how the software stores it. But I want to be able to search for that carriage return after.

Same thing when I press Shift-Enter (a "new line" or "soft return"). I want to be able to find those "new line" "characters" after.

Same thing for tab "characters". Etc.

Since I made those keystrokes and they have a result on-screen, they are obviously being stored in some form or other. Otherwise, next time I open it, my doc would look different from the way it looked when I typed it, no? ;-)

I sometimes need to search for the "new line" characters and replace them with a "carriage return" and thus create new paragraphs, etc. Or I need to search for 8 space characters and replace them with a "tab mark" instead.

In MS Office, I have a dropdown list of such "special characters" and it makes life very simple to use them in find/replaces.

Could we get that in LibO, too, please?

Thanks if so. ;-)

Please let me know if I haven't explained clearly. :-)
Comment 3 Kohei Yoshida 2010-11-09 11:30:30 UTC
*** Bug 31509 has been marked as a duplicate of this bug. ***
Comment 4 David Nelson 2010-11-09 11:36:34 UTC
Please note that the term I meant was NON-PRINTING CHARACTERS, not "special characters"...
Comment 5 Gudmund 2011-04-16 09:49:45 UTC
(In reply to comment #3)
> *** Bug 31509 has been marked as a duplicate of this bug. ***

(In reply to comment #2)
> Hi Tor, 
> 
> Thank you for your comments. I think you've indeed understood what I was on
> about, but:
> 
> I understand what you're talking about as regards LibO/OOo's internal storage.
> 
> However, that is invisible to the end user.

Indeed, unlike plain text, where you actually can search for and replace newlines (LF), carriage returns (CR) or combinations (CRLF) if you use the right text handling tools.

Some Unicode pointers:
 LF:    Line Feed, U+000A
 CR:    Carriage Return, U+000D
 CR+LF: CR (U+000D) followed by LF (U+000A)
 NEL:   Next Line, U+0085
 LS:    Line Separator, U+2028
 PS:    Paragraph Separator, U+2029

(I wonder how LibreOffice handles plain text files internally, since those characters really *are* there then...)
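
To make the code points above concrete for the plain-text case, here is a minimal Python sketch. It says nothing about LibreOffice's internal model; it only shows that in plain text these separators are ordinary characters that can be searched and replaced. The helper name `normalize_line_breaks` is made up for illustration.

```python
import re

# CR+LF is listed first so the pair collapses to ONE break, not two.
LINE_BREAK = re.compile("\r\n|[\n\r\u0085\u2028\u2029]")


def normalize_line_breaks(text: str, replacement: str = "\n") -> str:
    """Replace every line-break sequence from the list above with `replacement`."""
    return LINE_BREAK.sub(replacement, text)


# One sample string containing LF, CR, CR+LF, NEL, LS and PS in turn.
sample = "a\nb\rc\r\nd\u0085e\u2028f\u2029g"

# str.splitlines() already treats all of these as line boundaries.
print(sample.splitlines())

# Searching and replacing them is a one-liner once they are real characters.
print(normalize_line_breaks(sample, "|"))
```

Both calls separate the sample into the same seven one-letter pieces, which is exactly what is not possible when the breaks are stored as structure instead of characters.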

> I, the dumb end user, pressed the carriage return key while typing. I don't
> care how the software stores it. But I want to be able to search for that
> carriage return after.
> 
> Same thing when I press Shift-Enter (a "new line" or "soft return"). I want to
> be able to find those "new line" "characters" after.

I can't see why LibreOffice couldn't handle these things by allowing the user an easy way to *both* search *and* replace arbitrary combinations of CR and LF, by handling these things inside the content.xml.

> Since I made those keystrokes and they have a result on-screen, they are
> obviously being stored in some form or other. Otherwise, next time I open it,
> my doc would look different from the way it looked when I typed it, no? ;-)


This is what it can look like:
"<text:p text:style-name="Standard">Two paragraphs starting with this line ending</text:p>
<text:p text:style-name="Standard"/>
-<text:p text:style-name="Standard">Two newlines starting with this line ending<text:line-break/>
<text:line-break/>"

It looks like there may be a few cases to handle. Paragraphs seem to have an opening tag (<text:p text:style-name="Standard">) and a closing tag (</text:p>) only if there was text in the line; an empty paragraph is a single self-closing tag (<text:p text:style-name="Standard"/>). Newlines are always self-closing tags (<text:line-break/>).

Writing a textutils script that can handle this simple example is a bit of work, but surely not too hard, even for a non-programmer like me, so pros like the LibreOffice developers shouldn't find it hard at all ;).

The only potential problem I can see in this simple example is the "Standard" style-name bit inside the opening tag. Is there a policy for this in LibreOffice, like using the closest preceding one, or polling a standard template?
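
As a rough illustration of the kind of script meant above, here is a Python sketch that turns every <text:line-break/> inside a paragraph into a paragraph boundary. It works on a simplified fragment, not on a real content.xml; the function name `split_on_line_breaks` and the wrapper element `body` are made up, inline formatting is flattened to plain text, and copying the style name from the original paragraph is only an assumption, not a known LibreOffice policy.

```python
import xml.etree.ElementTree as ET

# ODF text namespace, as declared in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
P = f"{{{TEXT_NS}}}p"
LINE_BREAK = f"{{{TEXT_NS}}}line-break"
STYLE = f"{{{TEXT_NS}}}style-name"


def split_on_line_breaks(parent):
    """Split each <text:p> child of `parent` at its <text:line-break/> elements."""
    # Walk a snapshot backwards so earlier indexes stay valid while inserting.
    for index, para in reversed(list(enumerate(parent))):
        if para.tag != P or para.find(LINE_BREAK) is None:
            continue
        segments = [para.text or ""]
        for child in para:
            if child.tag == LINE_BREAK:
                # Text after the break starts a new segment.
                segments.append(child.tail or "")
            else:
                # Assumption: other inline children are flattened to their text.
                segments[-1] += "".join(child.itertext()) + (child.tail or "")
        parent.remove(para)
        for offset, segment in enumerate(segments):
            # Assumption: every new paragraph reuses the original style name.
            new_para = ET.Element(P, {STYLE: para.get(STYLE, "Standard")})
            new_para.text = segment or None
            new_para.tail = para.tail
            parent.insert(index + offset, new_para)


doc = ET.fromstring(
    f'<body xmlns:text="{TEXT_NS}">'
    '<text:p text:style-name="Standard">one<text:line-break/>two'
    '<text:line-break/></text:p>'
    '</body>'
)
split_on_line_breaks(doc)
print([p.text for p in doc])
```

The trailing line break produces an empty final paragraph, which mirrors the empty <text:p .../> case in the sample above.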

My guess as to why LibreOffice handles it this way is that it helps it treat paragraphs, newlines etc. in a uniform way across platforms that have different ways of handling new lines.

> I sometimes need to search for the "new line" characters and replace them with
> a "carriage return" and thus create new paragraphs, etc. Or I need to search
> for 8 space characters and replace them with a "tab mark" instead.

You're not alone in this. It's a showstopper for me too and many others, reducing LibreOffice to a very limited number of tasks, forcing me to keep MS Office, which I want to get rid of.
Comment 6 Björn Michaelsen 2011-12-23 11:34:04 UTC Comment hidden (obsolete)
Comment 7 sasha.libreoffice 2012-03-23 07:28:49 UTC
Not implemented yet in 3.5.1.
> - carriage return
> - new line
> - tab mark
IMHO it would be easier to add information to the context Help and tooltips on how to search for these characters using regular expressions than to actually implement them.
The same goes for replacing them.

The problem is with these:
> - page break
> - non-breaking spaces
I do not know how to find them using regular expressions.
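
In plain text at least, a non-breaking space is an ordinary character (U+00A0), so it can be matched by code point. The Python sketch below shows that case only; whether an equivalent escape works in the LibreOffice Find & Replace dialog is not confirmed here, and the function names are made up. A page break is a different matter: as far as I know, ODF normally stores it as a formatting property of the paragraph rather than as a character, so a character-level pattern has nothing to match.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE


def find_nbsp(text: str) -> list[int]:
    """Return the index of every non-breaking space in `text`."""
    return [match.start() for match in re.finditer(NBSP, text)]


def nbsp_to_space(text: str) -> str:
    """Replace every non-breaking space with a normal space."""
    return text.replace(NBSP, " ")


sample = "10\u00a0km and 5\u00a0kg"
print(find_nbsp(sample))      # positions of the two non-breaking spaces
print(nbsp_to_space(sample))  # same text with ordinary spaces
```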
Comment 8 Gryllida 2012-04-26 17:43:22 UTC
Implementing a graphical user interface (a drop-down list) for at least the existing regular expressions, such as \t, \n, $, ^, would be useful to novice users.
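
One way to picture such a drop-down is a table that maps friendly labels to the regular-expression tokens inserted into the search box. The Python sketch below uses Python's `re` semantics on plain text, which differ from LibreOffice's; the labels, the `SPECIAL` table and `build_pattern` are hypothetical and are not a proposal for how LibreOffice should implement it.

```python
import re

# Hypothetical drop-down entries: label shown to the user -> regex token.
SPECIAL = {
    "Tab": r"\t",
    "Line break": r"\n",
    "Start of line": r"^",
    "End of line": r"$",
}


def build_pattern(*parts: str) -> re.Pattern:
    """Join drop-down labels and literal text into one compiled pattern.

    Anything that is not a known label is treated as literal text and escaped,
    so the user never has to type regex syntax by hand.
    """
    pieces = [SPECIAL[p] if p in SPECIAL else re.escape(p) for p in parts]
    return re.compile("".join(pieces), re.MULTILINE)


# The user picks "Start of line", types eight spaces, and replaces with a tab.
pattern = build_pattern("Start of line", " " * 8)
print(repr(pattern.sub("\t", "        indented\nplain")))
```

A real dialog would need to keep picked entries and typed text apart (here a typed word that happens to equal a label would be misread), but the mapping itself is that small.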

There is an add-on ("alternative find and replace" [1]) which does the job (probably including workarounds for the way LibreOffice stores text, since it can actually handle \n in a way different from what the regular expressions page [2] says); it could probably be helpful in implementing this request.

[1] http://extensions.openoffice.org/en/project/AltSearch
[2] http://help.libreoffice.org/Common/List_of_Regular_Expressions
Comment 9 QA Administrators 2014-10-23 17:32:08 UTC Comment hidden (obsolete)
Comment 10 sasha.libreoffice 2014-10-24 11:08:51 UTC
Not implemented yet in 4.3.1.2.
Comment 11 Adolfo Jayme 2014-12-25 11:17:11 UTC
*** Bug 87645 has been marked as a duplicate of this bug. ***