
Normalizing of names/titles in links for fund-raising

Posted: Sat Jul 05, 2008 10:00 pm
by Lyle Neff
Hello,

I've noticed that searching for a title now brings up links to Amazon.com, where a purchase contributes 10% to IMSLP.

These links do not work on Amazon if there are diacritics in the titles or composers' names.

Is there a way to normalize the pre-set links to Amazon to remove diacritics? (I'm using "normalize" in the sense that I used to hear in library circles for coding/searching -- i.e., removing any diacritics.) This would also be good for general searching within IMSLP. :D
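
For readers unfamiliar with the library sense of "normalize" used above, here is a minimal sketch of one common way to strip diacritics: decompose each character with Unicode NFD and drop the combining marks. It is written in Python for illustration only; it is not IMSLP's code (the site is discussed here as running PHP), and the function names `strip_diacritics` and `search_query` are hypothetical. It assumes the text has already been decoded into a Unicode string.

```python
import unicodedata
from urllib.parse import quote_plus


def strip_diacritics(text: str) -> str:
    """Return text with combining marks (accents, carons, etc.) removed.

    NFD splits a precomposed letter such as 'ř' into a base letter 'r'
    followed by a combining caron; the combining characters are then dropped.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_query(composer: str, title: str) -> str:
    """Hypothetical helper: build a URL-safe query from stripped names."""
    return quote_plus(strip_diacritics(f"{composer} {title}"))


if __name__ == "__main__":
    print(strip_diacritics("Dvořák"))
    print(search_query("Béla Bartók", "Mikrokosmos"))
```

One limitation of this approach: letters that have no Unicode decomposition, such as "ø", "ł" or "ß", pass through unchanged and would need an explicit mapping table.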

Posted: Wed Jul 09, 2008 5:22 pm
by imslp
This has actually been a big issue since long before the advertising. The sorting order for pages with diacritic letters is out of whack, and I have been trying in vain to fix this for a long time. As simple as one would think removing diacritics would be, it is actually quite hard, especially with all the different encodings. But this is high on my todo list, so don't worry :)
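
To illustrate the sorting problem described above, here is a small Python sketch (not IMSLP's PHP code; `sort_key` is a hypothetical name). A plain code-point sort puts every name starting with an accented capital after all the unaccented ones, whereas sorting on a diacritic-free key keeps such names next to their unaccented neighbours. It assumes the names are already decoded Unicode strings, which sidesteps the encoding issue mentioned in the post.

```python
import unicodedata


def sort_key(name: str) -> tuple:
    """Sort on the diacritic-free, case-folded form; use the original as tiebreak."""
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), name)


names = ["Dvořák", "Čiurlionis", "Debussy", "Chopin", "Dukas"]

# Plain sort compares code points, so "Č" (U+010C) lands after every ASCII letter.
naive_order = sorted(names)

# Sorting on the stripped key places "Čiurlionis" among the other C names.
normalized_order = sorted(names, key=sort_key)

if __name__ == "__main__":
    print(naive_order)
    print(normalized_order)
```

A locale-aware collation library would handle language-specific rules more thoroughly; the key function here is only the simplest version of the idea.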

Posted: Wed Jul 09, 2008 8:21 pm
by ras1
Is there a way we can help remove diacritics? Or does it require more privileges?

Posted: Wed Jul 09, 2008 8:30 pm
by imslp
What I am looking for is PHP code that can actually do a thorough job of this. I have yet to find any, though admittedly other things have been demanding my time.

Posted: Wed Jul 09, 2008 9:15 pm
by Lyle Neff
Thanks for looking into this. I would assume that some kind of accessible "side" index or process would be needed, but I'm no systems person. I have no idea how systems like WorldCat (OCLC), Google, etc. manage to run searches that ignore diacritics typed in by the searcher.

Posted: Wed Jul 09, 2008 10:00 pm
by Generoso
Is this something like what you are looking for?


http://wiki.greenstone.org/wiki/index.p ... diacritics

Good luck
Generoso

Posted: Wed Jul 09, 2008 11:08 pm
by imslp
Unfortunately that is Perl and not PHP, and furthermore it is exactly the kind of method I would like to avoid, because there will very likely be encoding issues.

Fortunately, however, I did some research on this in the past few hours, and have found an elegant and problem-free way to implement it. In fact, I have already added this feature to my own version of the code; I will most likely upload it to IMSLP tomorrow or the day after, along with other fixes.