Wednesday, 13 June 2012

Creating PDFs

I have done some more research on this.

Basically, we are working to make an "info pack" for the broadband lines. We have had these for a long time, but they have not been updated for new types of line (FTTC) and products (SIMs, etc). The reason is that nobody but me knew how to update them.

The way we used to work turns out to be sensible in many ways - it was generating postscript from scripts and then using ps2pdf to make a PDF or simply sending to the printer.

The trick is (a) how to generate the postscript in a more maintainable way, and (b) what sort of postscript makes for a good PDF. We want something that is a lot more maintainable than the previous scripts, and something that other staff can update with no knowledge of postscript. Either that or I do some training on postscript, which I may do anyway.

So, what we are doing is, in most cases, using an svg master for a page - editing in inkscape. We have a simple text substitution script to allow some embedded values.
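As a rough illustration of the substitution step, here is a minimal Python sketch. The `{{name}}` placeholder syntax and the `fill_template` function are assumptions made up for this example, not what our script actually uses. The one thing worth getting right in any version is escaping the values, so that a stray `&` or `<` in a customer name does not break the SVG.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical placeholder syntax for this sketch: {{name}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(svg_text, values):
    """Replace each {{key}} in the SVG source with an XML-escaped value."""

    def replace(match):
        key = match.group(1)
        if key not in values:
            # Fail loudly rather than print a pack with a hole in it
            raise KeyError("no value supplied for placeholder %r" % key)
        return escape(str(values[key]))

    return PLACEHOLDER.sub(replace, svg_text)
```

Used on the text of an SVG master, something like `fill_template(svg, {"name": "Example Ltd", "ip": "192.0.2.1"})` would give back the page with the values filled in.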

SVG is good for this as it allows simple graphics, embedding of images, and text. The text can be positioned text or flowed paragraphs. The latter is very good when we are embedding data, such as line numbers, names, IP addresses, etc.

We convert this to postscript. We can then work on the postscript to do things like add page numbers, reorder and resize to make an A5 booklet, and all sorts. The final result can go through ps2pdf to make a good PDF.
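The reordering for a booklet is just arithmetic on page numbers. Here is a small Python sketch of the usual saddle-stitch ordering, assuming two pages per side of each sheet and padding with blanks up to a multiple of four. The function name `booklet_order` is made up for this example, and this is not necessarily how our own scripts do it.

```python
def booklet_order(page_count):
    """Return page numbers in printing order for a folded booklet.

    Each sheet contributes four entries: front left, front right,
    back left, back right. None marks a blank padding page.
    """
    padded = -(-page_count // 4) * 4  # round up to a multiple of 4

    def page(n):
        return n if n <= page_count else None

    order = []
    lo, hi = 1, padded
    while lo < hi:
        # Outermost remaining pages go on the same sheet
        order += [page(hi), page(lo), page(lo + 1), page(hi - 1)]
        lo += 2
        hi -= 2
    return order
```

So an 8 page pack comes out as 8, 1, 2, 7, 6, 3, 4, 5, which is two sheets printed on both sides.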

The trick is for the postscript itself to be as much like hand-made postscript as possible. I.e. the text needs to be, for example: (text) show
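To show what emitting that sort of postscript from a script involves, here is a minimal Python sketch. The names `ps_string` and `show_text` are made up for this example. The only real work is escaping parentheses and backslashes. The octal escapes for other characters assume the font's encoding matches Latin-1 for those codes, which is an assumption and will not hold for every font.

```python
def ps_string(text):
    """Return text as a postscript string literal, e.g. (hello)."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "()\\":
            out.append("\\" + ch)  # these three must be escaped
        elif 32 <= code < 127:
            out.append(ch)  # printable ASCII passes through
        elif code <= 255:
            # Assumption: the font encoding agrees with Latin-1 here
            out.append("\\%03o" % code)
        else:
            raise ValueError("cannot encode %r in a single byte" % ch)
    return "(" + "".join(out) + ")"


def show_text(x, y, text):
    """Return one line of postscript that places text at x, y (in points)."""
    return "%g %g moveto %s show" % (x, y, ps_string(text))
```

For example, `show_text(72, 700, "Line 1 (FTTC)")` gives `72 700 moveto (Line 1 \(FTTC\)) show`.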

This creates a PDF that is compact and has text you can search and select and so on. It allows copy/paste from the PDF nicely, regardless of font used.

The problem is that a graphical editing tool like inkscape is outputting postscript for the purpose of printing. It knows the metrics of the fonts in use, and positions characters itself. Because postscript does not do utf-8, inkscape will mess with the character coding. It will produce output which prints, but not output which makes for a nice PDF. What is worse is that any issue with font metrics creates a PDF that looks horrid.

We got close by telling inkscape to make a PDF, and then using ps2ps to make a postscript from that, but that only worked with the very basic fonts like Helvetica, for some reason, otherwise you could not copy/paste from the document.

So, back to basics - hand made postscript works, is small, can do things like flowed text and right alignment using the font metrics internally. It makes for good PDFs regardless of font used. How to get simple "hand made" postscript from easy to edit SVG files? Make your own tool. So I have. It is limited in what it understands, but it works for graphics, text, flowed text, and embedded png images. That is all we need.
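To give a flavour of what such a tool does, here is a very cut down Python sketch that handles only simple positioned text. It is not my actual tool. The function name `svg_text_to_ps` is made up, and it assumes the SVG has a unitless `height` in points, that each `text` element carries plain `x`, `y` and `font-size` attributes, and that one font is used throughout. Real inkscape files put most of this in `style` attributes and `tspan` elements, so treat it as an illustration of the idea only.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def svg_text_to_ps(svg_source, font="Helvetica"):
    """Convert simple SVG <text> elements to plain moveto/show postscript."""
    root = ET.fromstring(svg_source)
    height = float(root.get("height"))  # assumed unitless, in points
    lines = ["%!PS"]
    for el in root.iter(SVG_NS + "text"):
        x = float(el.get("x", "0"))
        # SVG measures y down from the top, postscript up from the bottom
        y = height - float(el.get("y", "0"))
        size = float(el.get("font-size", "12"))
        text = "".join(el.itertext())
        escaped = (
            text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        )
        lines.append("/%s findfont %g scalefont setfont" % (font, size))
        lines.append("%g %g moveto (%s) show" % (x, y, escaped))
    lines.append("showpage")
    return "\n".join(lines) + "\n"
```

The output is the sort of thing you would write by hand, so ps2pdf has nothing clever to undo.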

The new info packs will be on the A&A control pages soon.

4 comments:

  1. This is a problem I've had for years. I'd be interested to see your tool for creating the docs. Recently I've been creating documents in Openoffice and using a script to unzip them, do a search and replace on the content xml, zip back up and then call openoffice in headless mode to print to a PDF or PS printer in CUPS. It works well but is a bit slow if you're doing it in bulk.

  2. TeX works well for this sort of thing. Being text, it's easy to substitute whatever you like in, and it produces good PostScript and PDF output. If you want to make it easier to edit, you could create the fixed portions of the page in SVG, export to EPS, then just use TeX to overlay the dynamic text on top in the right places.

  3. I guess the overhead is probably too great, but I'd suggest TeX or similar.

  4. I use the open source Jasper reports. It has a gui for making the report layouts. It accepts many data sources, including XML, and outputs pdf.

    It's written in and for Java, which was a problem for me, but I created a Linux command-line and PHP interface on a web server that serves up the php files it creates.
