Bug 88964 - Add SOUNDEX() and LEVELSHTEIN_DISTANCE() function in Calc
Summary: Add SOUNDEX() and LEVELSHTEIN_DISTANCE() function in Calc
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice (show other bugs)
Version:
(earliest affected)
4.4.0.3 release
Hardware: All All
: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Calc-Function
  Show dependency treegraph
 
Reported: 2015-01-31 02:25 UTC by zaxebo zaxebo
Modified: 2018-03-05 22:16 UTC (History)
1 user (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description zaxebo zaxebo 2015-01-31 02:25:27 UTC
soundex is a phonetic algorithm, so that strings can be matched even if there is slight differences in spelling (refer: https://en.wikipedia.org/wiki/Soundex )

This is very useful in spreadsheets in case of datasearch, when large amount of data was entered manually
this is already implemented as user defined function at this url http://www.j-walk.com/SS/excel/tips/tip77.htm

It is really extremely useful an very much needed function, which needs to be incorporated in Calc.
Comment 1 zaxebo zaxebo 2015-01-31 02:30:08 UTC
apart from SOUNDEX():
additionally note that ISOWEEKNUM, CEILING.MATH, SQL.REQUEST and FLOOR.MATH functions of MS -Excel are missing in Calc. Hence it would be nice if they are too implemented in calc
Comment 2 Winfried Donkers 2015-01-31 06:45:48 UTC
Yes indeed: SOUNDEX is not in Calc, nor in Excel. It is an interesting algorithm.
The VBA example in the link you gave, can be modified to the Basic LibreOffice uses, so you can make your own macro.
For it to be a Calc function, there are some 'problems': Calc functions return a value (which can be an array), that is shown in the cell(s) where the function is put. But that doesn't allow you to search in your document. Perhaps what you want is an addition to the search toolbar, to make approximate searching (cf. the Soundex algorithm) possible.

Could you please explain what you want a soundex function to do, and whether you're talking of a function -e.g. like VLOOKUP()- or of the functionality?

BTW: ISOWEEKNUM (bug 50950), CEILING.MATH and FLOOR.MATH (bug 69552 and bug 87342) are work in progress. 
AFAICS SQL.REQUEST is an obsolete Excel function, no longer in Excel since Excel2007. I see no use in adding obsolete functions, there are other ways to communicate with (SQL) databases. If you think you really need this function, would you then please enter a separate bug for this (and add me to CC)?
Comment 3 zaxebo zaxebo 2015-02-01 04:50:52 UTC
i occasionally still get some Excel spreadsheets emailed with SQL.REQUEST function, which i always try to open with libreoffice. After your reply , i checked i found it deprecated in MS Excel itself, so in that case, i too do NOT  need this deprecated SQL.REQUEST for any compaitibility reason.
Comment 4 zaxebo zaxebo 2015-02-01 04:52:41 UTC
yes i need soundex as a function, without any userinterface. That previous url was just for understanding.
---
SOUNDEX as a "FUNCTION":

=SOUNDEX('Smith')   this function should return S530
=SOUNDEX('Smythe')  this function should also return S530
 hence, Smith and Smythe are same as per Soundex

SOUNDEX as a function in MicrosoftSQL Server's "SQL"
 https://technet.microsoft.com/en-us/library/aa259235%28v=SQL.80%29.aspx


SOUNDEX as a function in Oracle PL/SQL's "SQL"
 http://www.techonthenet.com/oracle/functions/soundex.php


=========================================================
SOUNDEX ALGORITHM STEPS :

    the exact soundex algorithm is given at
    https://en.wikipedia.org/wiki/Soundex

    its usage,The U.S. census has been making use of SoundEx code
    http://www.creativyst.com/Doc/Articles/SoundEx1/SoundEx1.htm#SoundExAndCensus
==========================================================
Comment 5 zaxebo zaxebo 2015-02-01 04:58:24 UTC
actually when doing manual Dataentry in a large spreadsheet, we need two functions the most for having "dataentry-fault tolerant" analysis

1) SOUNDEX function (as already explained in my above comments)

2) "LEVELSHTEIN DISTANCE" function for calculating Levenshtein distance between "two words". explained at:
https://en.wikipedia.org/wiki/Levenshtein_distance

These two functionalities are large needed in areas like census, survey
Comment 6 zaxebo zaxebo 2015-02-02 09:33:15 UTC
here are people  IMPLEMENTING the widely needed 
soundex() and levenshtein_distance() functions between two words 

http://wiki.lessthandot.com/index.php/Comparing_Words%3A_Levenshtein_Distance


http://wiki.lessthandot.com/index.php/Soundex

When these functions are so much widely used (reference= http://stackoverflow.com/a/1610941 ) for "LOOSE MATCHING OF TWO STRINGS", then why not implement them inbuilt.       :-)
Comment 7 Winfried Donkers 2015-02-02 12:24:21 UTC
Zaxebo,
 
Thank you for your valuable input.
I have set the status of your request to NEW, importance to enhancement and updated the title of this bug report.
I hope to find time to implement the functions, but perhaps someone else may be able to implement them before that.
Comment 8 zaxebo zaxebo 2015-02-03 10:12:14 UTC
Thanks a lot
Comment 9 juscelino quadros 2015-11-01 01:44:21 UTC
This feature is very important for all of us, it has a real use case for us.

use-case of this feature:
We have to get lot of manually typed data from third party Typist/Clerks in the office for e-governance and census related work. So we can pick random sample of that data and match with correct value with this function. So we can have an average data accuracy figure of the random sample, by which we can give additional payments to the more accurate typists and for uploading we can more trust the whole data with better phonetically matching sample.

I hope this will be implemented in libreoffice 5.1 itself.

Now, as per https://wiki.documentfoundation.org/ReleasePlan/5.1#5.1.0_release  , during the  "Nov 16, 2015 - Nov 22, 2015", the libreoffice 5.1 shall be hard feature freezed and branched. And today is 1 Nov 2015.

I pray that somebody may take note of this feature request on priority
Comment 10 Winfried Donkers 2015-11-02 07:07:03 UTC
(In reply to juscelino quadros from comment #9)
> This feature is very important for all of us, it has a real use case for us.

Most of the bug fixes and enhancements of LibreOffice are done by volunteers. For people or organisations that need something done with LibreOffice there is always the possibility to have professionals do the job.
https://www.documentfoundation.org/certification/developers/ lists these. They may be able to help you, at a price of course.

I have these functions on my list to add to Calc, but not on top of the list. I certainly cannot implement them in time for version 5.1.

BTW, if you add yourself to the cc list (header, right column) of this bug report, you will get notice of any changes to this bug report.
Comment 11 Winfried Donkers 2016-02-05 16:47:05 UTC
It seems that the (weighed) LevenShtein distance is already in use in Calc.
When you expand the other options in the search dialog, you can set 3 distances (changed, inserted and deleted) under 'similarity search' which will be used in the search.
Also, I see a 'sounds like' option in master, looking much like soundex. I don't know if that will be included as of version 5.1 or 5.2.
Comment 12 Winfried Donkers 2016-02-17 08:26:35 UTC
Could you please say if the functionality as mentioned in comment #11 is what you need?
If so, this bug report can be closed.
Comment 13 zaxebo zaxebo 2016-02-17 09:18:04 UTC
in comment 11, you have talked about a "search dialogbox" option for soundex() etc , whereas we are requesting for a "function" soundex().

But, We need it as a "function" so that we can calculate using it - in a "cell formula" directly and then apply various other functions on it, as needed.

So request you to, please implement soundex() and levelshtein_distance() as "functions", as soon possible.
Comment 14 Winfried Donkers 2016-02-18 15:39:16 UTC
(In reply to zaxebo zaxebo from comment #13)
Thank you for your clear reply.

However, if you really want the functions implemented asap, you cannot count on volunteers. You will need to use professionals and pay for the effort of adding these functions.

I still have these functions on my list, but my spare time is limited. The only way for me to _make_ time for adding these functions is if you enter the paid path. You can contact me directly for more details.

(I am NOT phishing for money, I'm a professional developer explaining the possible options. You have no obligation at all. )