Thursday, 31 May 2012

Efficient pdf and postscript manipulation

We send invoices as a PDF, and we sometimes attach a newsletter or similar to the invoice. The catch is that it can make the PDF rather big.

So I have been experimenting. I have a relatively simple SVG file created in Inkscape, which includes some SVG from the Internet Society (it is about World IPv6 Launch).

The SVG is 65k
Saved as a PDF it is 67k
Saved as a PS it is 1.4M
Saved as an EPS it is 1.3M

The problem is that the postscript is not quite right, for some font-related reason, when it is moved to another machine and run through ps2pdf. So it needs "normalising", which means using ps2ps. Using ps2ps is usually the best way to get "clean" postscript that works on anything.

ps2ps from the PDF is 2M
pdf2ps from the PDF is the same, 2M, as you would expect
ps2ps from the EPS is 2M
ps2ps from the PS is 2M

Basically, all of these use ghostscript to process the input and make the clean postscript output.
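The normalising step could be wrapped up roughly like this. It is only a sketch: the function names `normalise_command` and `normalise` are made up for illustration, and it assumes the ghostscript wrapper scripts `ps2ps` and `pdf2ps` are on the PATH and take an input file followed by an output file.

```python
import os
import shutil
import subprocess


def normalise_command(src, dst):
    """Return the argv that turns src into "clean" postscript at dst.

    Hypothetical helper: a PDF goes through pdf2ps, anything else
    (PS or EPS) goes through ps2ps.
    """
    ext = os.path.splitext(src)[1].lower()
    tool = "pdf2ps" if ext == ".pdf" else "ps2ps"
    return [tool, src, dst]


def normalise(src, dst):
    """Run the conversion and return the size of the result in bytes."""
    cmd = normalise_command(src, dst)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(cmd[0] + " not found; is ghostscript installed?")
    subprocess.run(cmd, check=True)
    return os.path.getsize(dst)
```

Returning the output size makes it easy to compare the results from different inputs and machines, which is all I was really doing by hand.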

Doing the same on another machine makes 31M files not 2M. Hmmm. The first machine has ghostscript version 9.05 and the second has ghostscript version 8.70. It seems the newer one is better - good.
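Since the result depends so much on the ghostscript version, a script could check it before doing any conversion. A minimal sketch, assuming `gs --version` prints just the version number; the helper names and the 9.05 threshold are my own choice, taken from the two machines above rather than from any documented cut-off.

```python
import subprocess


def parse_version(text):
    """Turn a version string such as "9.05" into a comparable tuple (9, 5)."""
    return tuple(int(part) for part in text.strip().split("."))


def ghostscript_version():
    """Ask the installed ghostscript for its version."""
    result = subprocess.run(
        ["gs", "--version"], check=True, capture_output=True, text=True
    )
    return parse_version(result.stdout)


def is_newer_ghostscript(version, minimum=(9, 5)):
    """True if version is at least the (assumed) good version 9.05."""
    return version >= minimum
```

With this, 8.70 parses to (8, 70) and fails the check, while 9.05 parses to (9, 5) and passes.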

But there is another app called pdftops, and using that on the PDF makes a 3.2M PS file. But doing the same on another machine makes a 168k PS file. The large file is from an older pdftops (0.18.4). The newer version (3.00) makes the small file. Again, the newer version seems better.

Oddly the resultant PS from pdftops seemed to be wrong as well when converted to a pdf. So ps2ps to the rescue again.

Using ps2ps on the 3.2M file created a 692k file
Using ps2ps on the 168k file created a 93k file

Sounds good.

Finally, we want a pdf as the actual output, and ps2pdf on the 93k file created a 54k PDF.

It looks OK, but some subtle shading or transparency was lost in the process. That I can live with! It is not rasterised by mistake or anything nasty like that.

So, the whole process took a 65k SVG and made a usable 54k PDF.

Even so, I am not convinced the final PDF has the text as proper font characters rather than outlines, so I am going to do a bit more experimentation. It would be really nice to get them as proper font characters so they are selectable and searchable and even smaller in size...
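One way to check this would be the `pdffonts` tool that ships alongside pdftops: if the text has been turned into outlines, the PDF should list no fonts at all. A rough sketch, assuming the usual `pdffonts` layout of a header line, a line of dashes, then one row per font with the name in the first column; the function names are hypothetical.

```python
import subprocess


def parse_pdffonts(output):
    """Return the font names from pdffonts output.

    Assumes two header lines followed by one row per font, and that
    font names contain no spaces.
    """
    rows = output.splitlines()[2:]
    return [row.split()[0] for row in rows if row.strip()]


def font_names(pdf_path):
    """Run pdffonts on a PDF and return the names of the fonts it uses."""
    result = subprocess.run(
        ["pdffonts", pdf_path], check=True, capture_output=True, text=True
    )
    return parse_pdffonts(result.stdout)
```

An empty list from `font_names` would suggest the text is outlines (or there is no text); a non-empty list suggests real font characters survived.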

Update: The reason the output of pdftops did not seem to work was down to some other postscript I was using with it, I think. Seems odd. But on its own, it works. So the trick is to export as PDF from Inkscape, then use the new version of pdftops (not pdf2ps) to make postscript. Play with the postscript as needed to make the final composite document, and then either print it or use ps2pdf to go back to PDF. The final result was a 44k PDF with searchable and selectable text. A tad better than 31M.
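Put together, the final route looks something like this. It is a sketch only: `pipeline_commands` and `run_pipeline` are names I have invented, the intermediate file name is arbitrary, and the step where the postscript is combined into the composite document is left out because it depends on the invoice itself. It starts from the PDF already exported from Inkscape.

```python
import subprocess


def pipeline_commands(pdf_in, pdf_out, ps_tmp="invoice.ps"):
    """Return the two commands: PDF to postscript, then back to PDF."""
    return [
        ["pdftops", pdf_in, ps_tmp],  # newer pdftops, not pdf2ps
        # (edit or combine ps_tmp here to make the composite document)
        ["ps2pdf", ps_tmp, pdf_out],
    ]


def run_pipeline(pdf_in, pdf_out, ps_tmp="invoice.ps"):
    """Run each step in order, stopping on the first failure."""
    for cmd in pipeline_commands(pdf_in, pdf_out, ps_tmp):
        subprocess.run(cmd, check=True)
```

Keeping the command list separate from the code that runs it makes it easy to print the steps and try them by hand first.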

2 comments:

  1. I'm surprised, given your habits, that you've not downloaded the specification from http://www.adobe.com/devnet/pdf/pdf_reference.html and written your own optimized PDF generator for the invoices...

    Replies
    1. Fair point - but I don't always re-invent the wheel - honest :-)