http://github.com/horndude77/image-scri ... ter/rotate
I've started working on a PDF deskewer (Linux only, though it might work in Cygwin). It depends on pagetools (http://pagetools.sourceforge.net/) to find the skew of each page and on netpbm (http://netpbm.sourceforge.net/) to read and write PBM files. I wrote my own PBM rotate program because the ImageMagick and netpbm rotate tools both did strange things to the output. I tested it with the Schubert symphony uploaded earlier today (http://imslp.org/wiki/Symphony_No.9,_D. ... rt,_Franz)), in which many pages are badly skewed. Here are the first ten pages of the resulting PDF: http://horndude77.googlepages.com/Schub ... 9_part.pdf. Let me know what you think.
P.S. Is the url tag broken or am I doing something wrong?
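For anyone curious what a standalone bitmap rotate step might look like, here is a minimal Java sketch. It is not the rotate program from the repository above. It assumes the PBM page has already been decoded into a `boolean[][]` (true = black pixel), and the class and method names are made up for illustration. It uses nearest-neighbour inverse mapping, so every output pixel is filled by looking up one source pixel and no holes appear.

```java
// Hypothetical sketch: rotate a bilevel page about its centre.
// Not the author's rotate tool; names and the boolean[][] layout are assumptions.
final class BitmapRotateSketch {

    /**
     * Rotates src by angleDeg degrees about the image centre.
     * Image coordinates are used (x to the right, y downward), so a positive
     * angle moves a point on the right edge towards the bottom.
     * The output has the same size as the input; pixels that map outside
     * the source stay white (false).
     */
    static boolean[][] rotate(boolean[][] src, double angleDeg) {
        int height = src.length;
        int width = src[0].length;
        double t = Math.toRadians(angleDeg);
        double cos = Math.cos(t);
        double sin = Math.sin(t);
        double cx = (width - 1) / 2.0;
        double cy = (height - 1) / 2.0;

        boolean[][] dst = new boolean[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                double dx = x - cx;
                double dy = y - cy;
                // Inverse mapping: rotate the destination coordinate back
                // by -angle to find the source pixel it came from.
                int sx = (int) Math.round(cx + dx * cos + dy * sin);
                int sy = (int) Math.round(cy - dx * sin + dy * cos);
                if (sx >= 0 && sx < width && sy >= 0 && sy < height) {
                    dst[y][x] = src[sy][sx];
                }
            }
        }
        return dst;
    }
}
```

To deskew a page, the skew angle reported by the detection step would be passed in with its sign flipped, assuming both steps use the same sign convention.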
PDF deskew
Moderator: kcleung
-
- active poster
- Posts: 293
- Joined: Sun Apr 23, 2006 5:08 am
- notabot: YES
- notabot2: Bot
- Location: Phoenix, AZ
http://github.com/horndude77/image-scri ... r/pnm_java
I thought the dependencies made it a bit difficult for someone to give it a try, so I've rewritten it in Java with fewer external dependencies (ant for building, pdfimages for extracting images from the PDF, Java for running, and bash and ruby for some scripts). I also know how to parallelize Java across multiple processors, so if you have more than one processor it should use them all for image rotation and skew detection. The deskew method is based on the Hough transform. It seems to work relatively well, though from what I can tell it is neither as fast nor as accurate as the pagetools Radon transform. I've only tested it on Linux, but in this form I think it should be easier to get working on Windows or OS X. If anyone else finds this of use, let me know.
Code:
$ cd pnm_java
$ time ./scripts/deskew_pdf Schubert_Symphony_7__D.729.pdf
real 15m5.383s
user 20m29.213s
sys 1m39.598s
$ #view pdf
$ okular Schubert_Symphony_7__D.729_out.pdf
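To give a rough idea of what Hough-style skew detection involves, here is a small Java sketch. It is not the code from the pnm_java repository. The class name, the `boolean[][]` page representation (true = black pixel), the sum-of-squares score, and the use of a parallel stream over candidate angles are all assumptions made for illustration. For each candidate angle, every black pixel votes for the line through it at that angle, and the angle whose votes pile up into the fewest, tallest accumulator bins wins.

```java
import java.util.stream.IntStream;

// Hypothetical sketch of Hough-style skew estimation; not the author's code.
final class HoughSkewSketch {

    /**
     * Returns the estimated skew in degrees, searched in
     * [-maxAngleDeg, +maxAngleDeg] in steps of stepDeg.
     * Positive means lines descend to the right in image coordinates
     * (y grows downward). Assumes maxAngleDeg is well below 90.
     */
    static double estimateSkewDegrees(boolean[][] black, double maxAngleDeg, double stepDeg) {
        int steps = (int) Math.round(2 * maxAngleDeg / stepDeg) + 1;
        // Each candidate angle is scored independently, so the work can be
        // spread across processors with a parallel stream.
        double[] scores = IntStream.range(0, steps)
                .parallel()
                .mapToDouble(i -> score(black, Math.toRadians(-maxAngleDeg + i * stepDeg)))
                .toArray();
        int best = 0;
        for (int i = 1; i < steps; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return -maxAngleDeg + best * stepDeg;
    }

    /** Sum of squared accumulator counts for one angle; sharper peaks score higher. */
    static double score(boolean[][] black, double theta) {
        int height = black.length;
        int width = black[0].length;
        double sin = Math.sin(theta);
        double cos = Math.cos(theta);
        // rho = y*cos - x*sin lies strictly between -width and height + width,
        // so shifting by width keeps every index inside the array.
        long[] acc = new long[height + 2 * width + 1];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (black[y][x]) {
                    int rho = (int) Math.round(y * cos - x * sin) + width;
                    acc[rho]++;
                }
            }
        }
        double total = 0;
        for (long count : acc) {
            total += (double) count * count;
        }
        return total;
    }
}
```

A call such as `HoughSkewSketch.estimateSkewDegrees(page, 5.0, 0.25)` would search ±5 degrees in quarter-degree steps. A coarse search followed by a finer one around the best angle is a common way to trade speed against accuracy.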
-
- Site Admin
- Posts: 2249
- Joined: Sun Dec 10, 2006 11:18 pm
- notabot: 42
- notabot2: Human
- Contact:
Re: PDF deskew
That's really quite good. Your example looks better than Acrobat Pro's built-in de-skew feature (part of the Optimize Scanned PDF menu command).
-
- active poster
- Posts: 293
- Joined: Sun Apr 23, 2006 5:08 am
- notabot: YES
- notabot2: Bot
- Location: Phoenix, AZ
Re: PDF deskew
http://github.com/horndude77/leptonica- ... s/clean.rb
Thanks! Though I've switched to using leptonica now. It's much faster and gives similar results.