
FTP Server

Posted: Mon Dec 08, 2008 4:07 pm
by imslp
Hi all!

The FTP server is up, with two accounts:

1) An account for general uploads, for people who have large collections, but no time to submit them to IMSLP themselves.
2) An account for logo-infested files, so that other people can remove the logos and submit them.

If you want access to either one, simply e-mail me and I'll send you the user/pass. Note that at least at the beginning I'll restrict the FTP usage to seasoned IMSLP contributors, or people who have file collections. If you have some specific reason for wanting access, please also note that in the e-mail.

If there are any IMSLP contributors who are willing to manage this project, please reply here.

Posted: Tue Dec 09, 2008 1:22 am
by kcleung
Thanks Feldmahler! This would let us tap into CDROM score resources and, in most cases, save us from scanning scores.

So would the next step be to set up a CDROM-request forum for scores which are known to be available in one of the PD CDROM series but are not yet on IMSLP?

Then for OM, it would be nice if we catalogued the copyright status and progress status of the works and had a system where people register before they rip the CDs and strip the works (separately).

For determining the copyright status of each work, we would need people who are more experienced than me with copyright stuff...

Posted: Tue Dec 09, 2008 1:57 am
by Yagan Kiely
kcleung wrote: So would the next step be to set up a CDROM-request forum for scores which are known to be available in one of the PD CDROM series but are not yet on IMSLP?
I don't believe this is all that necessary; it isn't like we are getting overrun by requests.
kcleung wrote: Then for OM, it would be nice if we catalogued the copyright status and progress status of the works and had a system where people register before they rip the CDs and strip the works (separately).
The project will have only a select few people in it, so a place where people register probably isn't necessary. In terms of copyright, I'm not sure how to work this out yet.

Posted: Tue Dec 09, 2008 12:42 pm
by tilmaen
Maybe we can get horndude77 to use his software to automatically remove the logos from the logo-infested files:
http://github.com/horndude77/image-scripts/tree/master
It'd be awesome to see the Orchestra Musician's CD-ROM Library on IMSLP!

Posted: Thu Dec 11, 2008 8:38 am
by Carolus
Seems to be working quite nicely. I've already added some Beethoven scores for clean-up. We'll have a nice pile of things there before long.

Posted: Thu Dec 11, 2008 8:14 pm
by ras1
And I've thrown in Violin Volume 7 of the orchestral parts.

Posted: Fri Dec 12, 2008 9:03 am
by Generoso
I have just uploaded all 9 volumes of the Orchestra Musicians Cello parts!

Posted: Fri Dec 12, 2008 1:22 pm
by ras1
Amazing! Next week I might have some time to start cleaning things up.

Posted: Sun Dec 14, 2008 5:03 pm
by horndude77
I created a "clean" folder underneath the Tchaikovsky cello section. My program got rid of most of the logos; only a few front pages did not work. Perhaps this can be improved. Take a look.

Posted: Sun Dec 14, 2008 8:29 pm
by ras1
That's great! The only issue I have is that the labels at the top of each page, which weren't removed, are also added by CDSM. Is there an easy way to change the program to take that out too?

Posted: Sun Dec 14, 2008 11:32 pm
by Leonard Vertighel
I see no reason for removing anything but the trademarks. Nothing else violates any laws, and removing it would not hide where those scans come from anyway, since it's trivial to prove that they are pixel-wise identical.

Note however that it's not only front pages where logos were missed. (I've been toying with a script myself, and I've been running into the same kind of problem.)
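
To illustrate the pixel-wise comparison mentioned above, here is a minimal sketch. It assumes each page has already been decoded into rows of grayscale values; the `Image` alias and the `pixel_diff` name are made up for this example and do not come from any script discussed in this thread.

```python
from typing import List, Tuple

# Hypothetical representation: a page is a list of rows of grayscale values.
Image = List[List[int]]


def pixel_diff(a: Image, b: Image) -> Tuple[int, int]:
    """Return (differing_pixels, total_pixels) for two same-sized images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in a)
    # Count every position where the two images disagree.
    differing = sum(
        1
        for ra, rb in zip(a, b)
        for pa, pb in zip(ra, rb)
        if pa != pb
    )
    return differing, total


if __name__ == "__main__":
    original = [[255, 0, 255], [0, 0, 255]]
    cleaned = [[255, 0, 255], [0, 255, 255]]  # one pixel whitened
    print(pixel_diff(original, cleaned))
```

A difference count of zero outside the erased logo area is what makes the origin of a scan easy to demonstrate, whatever labels are or are not removed.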

Posted: Mon Dec 15, 2008 7:25 am
by Carolus
While there is technically no need to remove the added page numbers and titles, I'm generally not very fond of how crude they look, so I replace them in scores I've processed (see the vocal score for Puccini's Edgar for an example).

I do recommend removing all metatags, bookmarks and any other such embedded added info from the files. Giving them less than a leg to stand on is always prudent.

Posted: Mon Dec 15, 2008 9:45 am
by Leonard Vertighel
If we are going to remove the titles as well, then I'm afraid it's manual labor all the way. There is an added title at the top of every single page in the OM scores. (The page numbers on the other hand seem to be the original ones.) Removal based on coordinate position does not work (and even if it did, it would just leave us with no title at all), which is why horndude was working with a pattern matching algorithm for the logos. But since obviously the title is different for each file, this method is not an option. The only theoretical solution would be OCR, but in practice I don't believe this to be feasible.
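
For readers wondering why pattern matching works for the logos but not for the titles, here is a rough sketch of the idea. It is not horndude77's actual script: the `Bitmap` type, the function names and the `min_score` parameter are assumptions made for illustration, and pages are assumed to be pure black-and-white bitmaps.

```python
from typing import List, Tuple

# Hypothetical representation: 1 = black pixel, 0 = white pixel.
Bitmap = List[List[int]]


def find_matches(page: Bitmap, logo: Bitmap, min_score: float = 1.0) -> List[Tuple[int, int]]:
    """Return the (row, col) of every window whose share of pixels
    equal to the logo template is at least min_score."""
    lh, lw = len(logo), len(logo[0])
    ph, pw = len(page), len(page[0])
    hits = []
    for r in range(ph - lh + 1):
        for c in range(pw - lw + 1):
            same = sum(
                page[r + i][c + j] == logo[i][j]
                for i in range(lh)
                for j in range(lw)
            )
            if same / (lh * lw) >= min_score:
                hits.append((r, c))
    return hits


def erase_matches(page: Bitmap, logo: Bitmap, min_score: float = 1.0) -> Bitmap:
    """Return a copy of the page with every matched window painted white."""
    cleaned = [row[:] for row in page]
    for r, c in find_matches(page, logo, min_score):
        for i in range(len(logo)):
            for j in range(len(logo[0])):
                cleaned[r + i][c + j] = 0
    return cleaned
```

The template has to be known in advance, which holds for a logo that is identical on every page but not for a title that changes from file to file. Lowering `min_score` below 1.0 tolerates some scan noise, at the risk of erasing real notation that happens to resemble the logo.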

Posted: Mon Dec 15, 2008 12:06 pm
by Lyle Neff
Carolus wrote: [...] I do recommend removing all metatags, bookmarks and any other such embedded added info from the files. Giving them less than a leg to stand on is always prudent.
That reminds me: The other day I looked at a file (I think it was the first of the PDFs for Tchaikovsky's "The Seasons") and found that the uploader had embedded a yellow pop-up message on the first page (and apparently at the head of each piece) stating that it is from so-and-so's library and warning the reader to observe the copyright rules of his/her own country.

Since IMSLP already contains a warning like that, and since the source should be given in the "scanner" and "uploaded by" fields, that kind of thing should be removed as well, I should think -- although in this case it was apparently the uploader him/herself who had added the annoying messages to the PDF.

Posted: Mon Dec 15, 2008 5:10 pm
by ras1
In those cleaned Tchaikovsky files, are the titles on the first pages left over from the OMCDL labels, or did the program add them?