Bug 91337 - identifying and regular expression for invisible non-whitespace characters
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.2.8.2 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-05-16 19:55 UTC by Nick Levinson
Modified: 2015-05-23 17:08 UTC

See Also:
Crash report or crash signature:


Description Nick Levinson 2015-05-16 19:55:35 UTC
Sometimes a file contains invisible characters other than whitespace. I can tell because, even in a monospaced (monopitch) font, pressing the left or right arrow key moves the insertion point only very slightly. (I no longer have an example, so I can't attach a file that demonstrates this.) I'd like a regular expression that finds all invisible characters other than full-width whitespace (i.e., other than at least tabs, hard and soft spaces, and paragraph ends). I'd also like a way to identify them, such as a place to paste them that reports their Unicode values; identification of glyphs is discussed in bug 91029.
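
One way to get the "paste them somewhere and see their Unicode values" behaviour outside LibreOffice is a few lines of Python. This is only a sketch, not a LibreOffice feature: the function name `describe_chars` is hypothetical, and it assumes the suspect text has been copied out of the document into a Python string.

```python
import unicodedata

def describe_chars(text):
    """Return (offset, code point, general category, name) for each character."""
    rows = []
    for offset, ch in enumerate(text):
        rows.append((
            offset,
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),            # e.g. "Cf" = format, "Cc" = control
            unicodedata.name(ch, "<unnamed>"),   # control characters have no name
        ))
    return rows

if __name__ == "__main__":
    # Hypothetical sample: a zero-width space hidden between two letters.
    for row in describe_chars("a\u200bb"):
        print(row)
```

Characters whose category is `Cf` (format) or `Cc` (control) are the likely candidates for the invisible characters described above.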
Comment 1 m_a_riosv 2015-05-16 21:40:33 UTC
Hi @Nick

I think the ICU project's regular-expression documentation may help you find those characters; ICU is the regex engine LibreOffice uses.

http://userguide.icu-project.org/strings/regexp

But I don't know whether all of the metacharacters are implemented in LibreOffice.
Comment 2 Nick Levinson 2015-05-17 20:37:27 UTC
Regarding the ICU project: what would help in LibreOffice is combining a number of regexes into a single search, more or less in the form (\Uhhhhhhhh OR \Uhhhhhhhh OR \Uhhhhhhhh OR \Uhhhhhhhh OR \Uhhhhhhhh OR \Uhhhhhhhh OR \Uhhhhhhhh OR \Uhhhhhhhh), but I don't have a list of all those values. It would be easier to use if one regex could encompass all of the zero-width characters.
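
The "several code points OR-ed together" idea is usually written as a single character class rather than an alternation. The sketch below uses Python's `re` module, not LibreOffice's ICU engine, and the list of code points is an assumed, non-exhaustive sample (soft hyphen, the U+200B to U+200F block, word joiner, and the byte-order mark), not a complete inventory of zero-width characters. The name `find_zero_width` is hypothetical.

```python
import re

# Assumed sample of invisible code points, written as one character class:
#   U+00AD soft hyphen, U+200B-U+200F zero-width space/joiners/direction marks,
#   U+2060 word joiner, U+FEFF zero-width no-break space (BOM).
ZERO_WIDTH = re.compile(r"[\u00AD\u200B-\u200F\u2060\uFEFF]")

def find_zero_width(text):
    """Return (offset, code point) for every match of the class in text."""
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in ZERO_WIDTH.finditer(text)]

if __name__ == "__main__":
    print(find_zero_width("ab\u200bcd\ufeff"))
```

ICU's regex syntax also accepts `\uhhhh` escapes inside a bracketed set, so a class of the same shape may work in Find & Replace, but that has not been verified here against any LibreOffice version.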
Comment 3 Nick Levinson 2015-05-23 17:08:34 UTC
Perhaps a remedy or a correction: "[:cntrl:]" is already in the regex list (it matches a nonprinting character), but I can't test whether it would have accomplished the same thing.

I'm changing the status. If someone else has the problem and can test at least that regex, go ahead and reopen. Thanks.
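
For anyone who wants to test that regex, the following sketch may help set expectations. It assumes that "[:cntrl:]" corresponds to the Unicode general category Cc (control), which is an assumption about the engine and not something confirmed in this report. Under that assumption, zero-width characters such as U+200B belong to a different category, Cf (format), and would not be matched. The function name `split_by_category` is hypothetical.

```python
import unicodedata

def split_by_category(text):
    """Separate control (Cc) characters from format (Cf) characters in text."""
    control = [f"U+{ord(c):04X}" for c in text if unicodedata.category(c) == "Cc"]
    fmt = [f"U+{ord(c):04X}" for c in text if unicodedata.category(c) == "Cf"]
    return control, fmt

if __name__ == "__main__":
    # Hypothetical sample: a BEL control character, a zero-width space, a soft hyphen.
    print(split_by_category("a\x07b\u200bc\u00ad"))
```

Note that tab and line feed are also category Cc, so a control-character search can match ordinary whitespace that the original request wanted to exclude.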