[BUG] Faulty IMSLP/WIMA redirections
Posted: Sun Dec 11, 2011 9:28 am
by reccmo
On my talk page, WIMA project participant Jeko89 reports errors with some IMSLP/WIMA redirections:
I've completed, after many uploads, the transfer of all files by Simone Stella and Fritz Brodersen.
I've tried to check whether the automatic redirects from WIMA to IMSLP work with all files... but they don't!!
Some examples?
For Brodersen:
It looks like random behaviour to me. Surely it isn't, but... what is the cause? What have I done wrong?
The answer to Jeko89's final question is that he did nothing wrong. I've noticed several such errors myself, and Jeko89's message prompts me to raise the problem in this forum. I guess there are one or more subtle bugs in the web script that generates the WIMA upload log file.
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Sun Dec 11, 2011 2:16 pm
by imslp
Actually all of those links look fine to me in Firefox (they redirect to the right file). I'm guessing that this might be a browser issue. Can you send me some screenshots?
Also, could you copy and paste the actual links that are not working, so that I can be sure we are talking about the same thing.
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Mon Dec 12, 2011 1:11 pm
by jeko89
Hello, I raised this problem yesterday.
The problem isn't solved: try with the Missa da pacem (uploaded by Brodersen) at this link
http://icking-music-archive.org/ByComposer/Despres.php:
- the Credo file is correctly linked to IMSLP
- the Credo: Cantus part file isn't linked to IMSLP
It seems to me that the problem isn't related to the URLs, because they're quite similar.
It's a challenging bug!
P.S: it has happened to many files that I uploaded (~5% of the total)
Cheers, Giacomo.
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Mon Dec 12, 2011 8:56 pm
by reccmo
I started preparing a Perl script that searches the WIMA/IMSLP transfer log file for non-working redirection URLs. The script extracts the IMSLP URLs from the log file and tests their validity by calling one of the *nix utilities 'wget' or 'curl'.
This script has turned out to be definitely no good: the outcome is that the IP address of my home PC got banned for 'ripping'. For example, if I attempt to access one of the redirection URLs, "http://imslp.org/wiki/Special:ReverseLookup/141168", I get this error message:
You have reached this message because the site ripping ban script has been triggered. Site ripping is forbidden; repeated offenders will be banned indefinitely ...
Which utility can I use to check whether the URLs work when I loop through the transfer log file?
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Tue Dec 13, 2011 6:22 am
by imslp
Before I answer anything else, I first want to make sure I understand what exactly the problem is. Are the uploads not registering in wimaredirects.txt? Is Special:ReverseLookup not giving the correct destination? Or is there some other problem? I cannot give a response unless I first know what the problem is (and I cannot seem to replicate it from the details given in this thread).
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Tue Dec 13, 2011 1:55 pm
by Choralia
imslp wrote: Are the uploads not registering in wimaredirects.txt?
I think this is the case. I looked for the "Cantus" file of the "Credo" of "Missa da pacem", as suggested by Reccmo. Specifically, I searched inside wimaredirects.txt for the WIMA filename "3_2_Credo_Cantus.pdf", with no success. So it seems that this upload was not registered in wimaredirects.txt for some reason. I also searched for it inside the .htaccess file on the WIMA website, and naturally it was not present there either.
Max
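A search like the one Max describes can be sketched offline, without touching the IMSLP server. This is only a minimal Python illustration. It assumes each entry in wimaredirects.txt is one line of three whitespace-separated fields (directive, WIMA path, IMSLP URL), as the sample entries quoted elsewhere in this thread suggest. The function names are hypothetical.

```python
def parse_redirects(text):
    """Map each WIMA path to its IMSLP URL (assumed format: 'Redirect <path> <url>')."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "Redirect":
            table[fields[1]] = fields[2]
    return table


def find_by_filename(table, filename):
    """Return (wima_path, imslp_url) pairs whose path ends in the given filename."""
    return [(path, url) for path, url in table.items()
            if path.rsplit("/", 1)[-1] == filename]


# Two entries in the format quoted in this thread, used here as sample input.
SAMPLE = """\
Redirect /scores/haendel/H312/satz_1-02_Oboe-1.pdf http://imslp.org/wiki/Special:ReverseLookup/136565
Redirect /scores/c.raehs/Facsimiles/XM55/13.pdf http://imslp.org/wiki/Special:ReverseLookup/141574
"""

table = parse_redirects(SAMPLE)
print(find_by_filename(table, "13.pdf"))
# A filename that was never registered yields an empty list.
print(find_by_filename(table, "3_2_Credo_Cantus.pdf"))
```

An empty result for a file that was definitely uploaded points to the "upload not registered" case rather than to a broken IMSLP URL.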
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Tue Dec 13, 2011 4:20 pm
by reccmo
Max outlines one type of error. Another type is IMSLP URLs that are recorded in wimaredirects.txt but do not work. When a user accesses a WIMA file that is redirected to one of those non-working IMSLP URLs, she gets one of those bl... 404 errors.
I've tried once again, this time from the WIMA server, to test for invalid redirection URLs systematically. However, even with my Perl script in test mode (it stops at the 10th line of wimaredirects.txt), I still get the 'anti ripper' error message:
You have reached this message because the site ripping ban script has been triggered. Site ripping is forbidden; repeated offenders will be banned indefinitely.
The logic of the Perl script is really simple, just:
Code:
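use strict;
use warnings;

my $infile  = "wimaredirects.txt";
my $outfile = "wimaredirect_errs.txt";
open(my $in,  '<', $infile)  or die "Can't open $infile: $!\n";
open(my $out, '>', $outfile) or die "Can't open $outfile: $!\n";
my $i = 0;
while (<$in>) {
    $i++;
    chomp;
    # Each line: <directive> <WIMA path> <IMSLP URL>
    my (undef, $wima, $imslp) = split;
    next unless defined $imslp;    # skip blank or malformed lines
    # --fail makes curl exit non-zero on HTTP errors such as 404
    my @args = ("curl", "--silent", "--fail", "--output", "/dev/null", $imslp);
    system(@args) == 0
        or print {$out} "$wima\n$imslp\n$?\n\n";    # log failures to the error file
    last if $i >= 10;    # test mode: stop at the 10th line
}
close($in);
close($out);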
So I'm afraid I've no chance to investigate the problem systematically:-(
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Wed Dec 14, 2011 1:42 pm
by imslp
Regarding entries missing from wimaredirects.txt, I have found a bug where, if the submitter makes a mistake during submission and the submission page shows an error on top, wimaredirects.txt will not be updated even if subsequent submission is successful. This bug is fixable but not easily, so I'm thinking of writing an alternate redirect function that can take entire WIMA urls and redirect them to the right place (if submitted; if not it bounces back to WIMA).
Regarding 404 errors, please provide me a sample (or few) of such URLs so that I can figure out what is wrong.
Regarding the script, I would rather prefer to resolve the problem using other methods. The site ripping ban script is there precisely to prevent such bot access to IMSLP so that the server does not get overloaded for everyone else.
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Wed Dec 14, 2011 2:01 pm
by reccmo
imslp wrote: Regarding entries missing from wimaredirects.txt, I have found a bug where, if the submitter makes a mistake during submission and the submission page shows an error on top, wimaredirects.txt will not be updated even if subsequent submission is successful. This bug is fixable but not easily, so I'm thinking of writing an alternate redirect function that can take entire WIMA urls and redirect them to the right place (if submitted; if not it bounces back to WIMA).
Regarding 404 errors, please provide me a sample (or few) of such URLs so that I can figure out what is wrong.
Redirect /scores/haendel/H312/satz_1-02_Oboe-1.pdf http://imslp.org/wiki/Special:ReverseLookup/136565
Redirect /scores/c.raehs/Facsimiles/XM55/13.pdf http://imslp.org/wiki/Special:ReverseLookup/141574
imslp wrote: Regarding the script, I would rather prefer to resolve the problem using other methods. The site ripping ban script is there precisely to prevent such bot access to IMSLP so that the server does not get overloaded for everyone else.
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Wed Dec 14, 2011 5:40 pm
by imslp
This batch of files was removed from the page by the uploader himself because of duplication (see http://imslp.org/index.php?title=Concer ... did=860041). The normal procedure is to delete the file itself afterwards, but in this case that was not done, since the uploader himself removed the files. In any case, there is really nothing I can do except bounce this file back to WIMA (which is what will happen in the new redirection system).
This batch of files was also removed because of duplication (see http://imslp.org/index.php?title=Concer ... did=867907). Here the normal procedure was followed and the file was afterwards deleted entirely from the server (hence the different error message). In the new redirection system the file will of course be bounced back to WIMA.
Note that the only thing wimaredirects.txt does is add an entry when a file is submitted; it does nothing else after that. (And hence the new redirection system.)
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Tue Dec 20, 2011 5:55 pm
by imslp
You can now use http://imslp.org/index.php?title=Specia ... <urlstring>
where <urlstring> is the same as the WIMA URL in wimaredirects.txt (i.e. without http://icking-music-archive.org).
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Tue Dec 20, 2011 9:54 pm
by reccmo
Do you plan to apply that format in 'http://imslp.org/wimaredirects.txt'?
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Wed Dec 21, 2011 1:44 pm
by imslp
No, because that would defeat the purpose of the new page. There are flaws in wimaredirects.txt that are not fixable, so the new URL is designed to be a generic redirect URL handling all WIMA PDF files. Whatever IMSLP has, it redirects to IMSLP; whatever IMSLP does not have, it redirects back to WIMA. You can keep using wimaredirects.txt until the WIMA collection is fully transferred, but after that wimaredirects.txt should probably be deprecated in favor of this new URL scheme.
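The decision rule described above can be sketched roughly as follows. This is a Python illustration only, not IMSLP's actual code. The `transferred` mapping is a hypothetical stand-in for whatever records IMSLP keeps, and the IMSLP-side URL in the example is a placeholder.

```python
WIMA_BASE = "http://icking-music-archive.org"


def resolve(wima_path, transferred):
    """Return the URL a request for wima_path should be redirected to.

    transferred: dict mapping WIMA paths to IMSLP URLs (hypothetical stand-in
    for IMSLP's own records of submitted files).
    """
    imslp_url = transferred.get(wima_path)
    if imslp_url is not None:
        return imslp_url            # IMSLP has the file: redirect to IMSLP
    return WIMA_BASE + wima_path    # IMSLP does not: bounce back to WIMA


# Placeholder data: one transferred file, one file IMSLP does not have.
transferred = {"/scores/example/transferred.pdf": "http://imslp.example/lookup/1"}
print(resolve("/scores/example/transferred.pdf", transferred))
print(resolve("/scores/example/missing.pdf", transferred))
```

Because the lookup happens at request time, a file that is later deleted from IMSLP simply falls through to the second branch instead of producing a 404.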
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Mon Dec 26, 2011 6:20 pm
by reccmo
Max has helped me substantially by providing PHP code for handling the new generic IMSLP redirect on the WIMA server.
The main challenge was to prevent infinite Apache loops for WIMA URLs bounced back to the WIMA server. I believe it's working now, and I have replaced the WIMA redirects based on IMSLP's upload log with the new generic IMSLP redirect. On the WIMA server, a simple .htaccess file in the score root folder and a PHP redirect handler in the document root folder are involved.
I encourage forum readers to test the redirects of WIMA files, transferred as well as not transferred. Please report any errors encountered.
Re: [BUG] Faulty IMSLP/WIMA redirections
Posted: Tue Dec 27, 2011 11:15 am
by reccmo
Unfortunately, it turns out that the new generic redirect method conflicts with IMSLP's WIMA upload logic. When the upload accesses a WIMA file redirected this way, it ends with an error message reporting CURL error 47.
I've been discussing this problem with Max, who says:
Error 47 for CURL is "too many redirects":
CURLE_TOO_MANY_REDIRECTS (47)
Too many redirects. When following redirects, libcurl hit the maximum amount. Set your limit with CURLOPT_MAXREDIRS.
I'm not sure that the problem is due to the downloading being managed by php "per se", as it is essentially transparent to the browser. I think that there might be two possible causes:
1) a recursion issue similar to the one that we solved yesterday, but seen from the IMSLP side rather than from the WIMA side;
2) the php downloading being fragmented into small chunks of 2048 bytes. If CURL counts a redirection for each chunk, it is easy to hit the CURL limit.
Cause 1) would probably have to be fixed by Feldmahler, so let's investigate cause 2) first. I suggest changing this instruction in Redirect.php:
$buffer = fread($fd, 2048);
to use a much larger value than 2048, so that the file is not fragmented, or is fragmented into very few chunks. In the limit, one can use:
$buffer = fread($fd, $fsize);
That should always result in a single file chunk.
If the above modification solves the issue, we may conclude that cause 2) applies. If it doesn't, I guess we will have to investigate cause 1) with Feldmahler.
The WIMA server-side PHP logic includes, among other things, this code:
Code:
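// $content_type, $path_parts, $fsize and $fd (an open file handle) are defined elsewhere in the handler (not shown)
header("Content-type: ".$content_type);
header("Content-Disposition: inline; filename=\"".$path_parts["basename"]."\"");
header("Content-length: $fsize");
header("Cache-control: private");
// Stream the file to the client in 2048-byte chunks
while(!feof($fd)) {
    $buffer = fread($fd, 2048);
    echo $buffer;
}
fclose($fd);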
I've performed another upload with the fread buffer increased to the full file size ($fsize) - and still end up with error #47. So it looks like we need to look further into a recursion issue on the IMSLP server side.
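To illustrate why the buffer size would not be expected to matter: as far as I understand it, a client such as libcurl counts one redirect each time it follows a redirect response to a new URL, not once per chunk of the response body. The Python sketch below simulates that counting with a plain dictionary. The URLs, the `follow` function and the limit of 5 are all hypothetical, and no network access is involved.

```python
class TooManyRedirects(Exception):
    """Raised when the redirect limit is reached (compare CURL error 47)."""


def follow(start, redirects, max_redirects=5):
    """Follow redirects from start and return (final_url, hops).

    redirects: dict mapping a URL to the URL it redirects to.
    """
    url, hops = start, 0
    while url in redirects:
        if hops == max_redirects:
            raise TooManyRedirects(f"gave up after {hops} redirects at {url}")
        url = redirects[url]
        hops += 1
    return url, hops


# A WIMA -> IMSLP -> WIMA cycle: each side sends the client back to the other.
loop = {
    "wima:/scores/x.pdf": "imslp:lookup/x.pdf",
    "imslp:lookup/x.pdf": "wima:/scores/x.pdf",
}
try:
    follow("wima:/scores/x.pdf", loop)
except TooManyRedirects as err:
    print("error:", err)

# A chain that ends at a real file terminates normally.
print(follow("wima:/scores/y.pdf", {"wima:/scores/y.pdf": "imslp:file/y.pdf"}))
```

In this model the limit is reached purely because of the cycle, which fits the observation that reading the whole file in one chunk did not make error #47 go away.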
Can the IMSLP generic redirect be set up to circumvent this problem? If not, a solution might be to let the generic redirect return the WIMA file path modified like this:
'http://icking-music-archive.org/scores/ ... es-SAB.pdf' -> 'http://icking-music-archive.org/scores1 ... es-SAB.pdf'
On the WIMA server I've created a new directory, 'http://icking-music-archive.org/scores1/', containing symlinks to all level 1 directories in 'http://icking-music-archive.org/scores/'.
For now I've put the 'old' redirect method back into production.
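The path modification proposed above could look roughly like this on the redirecting side. It is a Python sketch for illustration only: `to_alias` is a hypothetical name, and the example path is one of the sample WIMA paths quoted earlier in this thread.

```python
def to_alias(url, old="/scores/", new="/scores1/"):
    """Rewrite the first '/scores/' in url to the '/scores1/' alias directory."""
    head, sep, tail = url.partition(old)
    # If '/scores/' does not occur, return the URL unchanged.
    return head + new + tail if sep else url


print(to_alias("http://icking-music-archive.org/scores/haendel/H312/satz_1-02_Oboe-1.pdf"))
# A URL that already points at the alias is left unchanged.
print(to_alias("http://icking-music-archive.org/scores1/haendel/H312/satz_1-02_Oboe-1.pdf"))
```

The idea is that a bounced request lands in a directory whose .htaccess does not redirect to IMSLP again, so the cycle is broken after one bounce.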