Mihamina Rakotomandimby wrote on 11 June 2012 at 11:12:
> Hi all,
>
> I have a small job ad website, where some posters tend to flood it with the
> same ad, just to stay at the top when ads are sorted by most recent.
>
> To get past the strict duplicate detection (yes, it's weak), they add
> one or two words that make a difference.
>
> The result is many duplicated ads.
>
> I would like to find duplicates by looking for ads that share 80%-90%
> of their words and treat them as the same, so that I can group them.
>
> Of course, a limiting mechanism or even moderation is planned, but
> I want to process the existing ads first.
>
> I don't want to use MySQL for indexing; I believe text indexers are
> the best tools for this. Am I wrong?
>
> What would you suggest for processing the ads and looking up duplicates
> in this situation?
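A minimal sketch of the "80%-90% same words" test, written in Python for brevity. It assumes that "same words" means the Jaccard overlap of the two ads' lowercase word sets (shared words divided by all distinct words); the helper names `words`, `word_overlap` and `group_duplicates` and the 0.8 default threshold are illustrative, not from any library. Each ad is compared with the first ad of every existing group, so this makes O(n^2) comparisons, which is usually acceptable for a one-off pass over a small site's existing ads.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase the ad and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 1.0  # two empty ads count as identical
    return len(wa & wb) / len(wa | wb)


def group_duplicates(ads: list[str], threshold: float = 0.8) -> list[list[int]]:
    """Greedily group ad indices: an ad joins the first group whose first
    member shares at least `threshold` of its words, else starts a new group."""
    groups: list[list[int]] = []
    for i, ad in enumerate(ads):
        for group in groups:
            if word_overlap(ads[group[0]], ad) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups
```

For example, "Senior PHP developer wanted full time" and "Senior PHP developer wanted full time urgent" share 6 of 7 distinct words (about 0.86), so they land in the same group, while an unrelated ad starts its own. The same logic can be written in PHP with preg_split(), array_unique() and array_intersect().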
Maybe take a look at
http://de.php.net/manual/de/function.similar-text.php
http://de.php.net/manual/de/function.levenshtein.php
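Both functions linked above compare characters; levenshtein() returns an edit distance, and similar_text() can report a percentage through its optional third argument, passed by reference. Since the flooders add whole words, a word-level edit distance normalized to a 0..1 score may match the problem more closely, and it also avoids the 255-character argument limit that PHP's levenshtein() had before PHP 8.0. A minimal Python sketch, assuming whitespace-separated words; `levenshtein` and `edit_similarity` here are illustrative helpers, not PHP's built-ins:

```python
def levenshtein(a: list[str], b: list[str]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b.
    Works on any sequences; below it is given lists of words."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def edit_similarity(text_a: str, text_b: str) -> float:
    """1.0 for identical word sequences, 0.0 when every word differs."""
    wa, wb = text_a.lower().split(), text_b.lower().split()  # split on whitespace
    longest = max(len(wa), len(wb))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(wa, wb) / longest
```

Appending one word to a six-word ad gives a distance of 1 and a score of 1 - 1/7, about 0.86, which would pass an 80% threshold. Unlike the word-set overlap, this score also takes word order into account.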
>
> --
> RMA.
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
Marco Behnke
Dipl. Informatiker (FH), SAE Audio Engineer Diploma
Zend Certified Engineer PHP 5.3
Tel.: 0174 / 9722336
E-mail: marco@behnke.biz
Softwaretechnik Behnke
Heinrich-Heine-Str. 7D
21218 Seevetal
http://www.behnke.biz