Some musings on scientific writing…

Some notes for my fellows at the University of Liverpool. Quite often we have to handle a lot of content: get it, digitize it, and organize it, so that we can finally write something about it.

Let’s start from where we want to end up:

All content should be digitized and indexable. That means all articles and books must end up as PDFs with searchable text, i.e. anything that was scanned needs to be OCR’d. The PDFs then go into a library manager such as BibDesk or DevonThink, from where they can ultimately be cited in LaTeX.

So first, let’s see how we get to those searchable PDFs.

  1. Hardcover books: Cut off the spine using e.g. a QCM-8200M heavy-duty desktop cutter. Scan the loose pages using a fast scanner, e.g. a Fujitsu ScanSnap S1500M. Run OCR on them using e.g. the Abbyy command line version, wrapped into a recursive script like the one I published on pdfocrwrapper.cvs.sourceforge.net.
  2. DRM’d version (VitalSource BookShelf): Unfortunately, these files are not indexable, so we need to export them. Fortunately, that’s relatively easy: just use http://mnott.de/index.php/archives/371 as a reference (works on Mac). If your copy had not been OCR’d yet, you’ll end up with a PDF that looks like a scan, which you can then again run through Abbyy.
  3. ePub: ePubs are not that nice, as you’ll really want PDFs. There are a bunch of options for converting them; the best one I found is http://epub2pdf.com/
  4. Adobe Digital Editions: Another possibility is that your PDFs come with a DRM lock. If that’s so, try http://apprenticealf.wordpress.com/2010/11/18/dedrm-applescript-for-mac-os-x-10-5-10-6/
  5. Books that are not available through the library: Buy them. Likewise, a Google search for your author and title, with appended keywords like download pdf, often helps. Make sure, though, to stay legal and pay for what you use; the same is true for previews, e.g. when using http://ebookmgr.com/content/books-manager-1.2.4-setup.jar
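
To illustrate the "recursive script" idea from step 1, here is a minimal sketch in Python. It is not the script published on SourceForge, just an outline of the approach: walk a directory tree, skip PDFs that already have an OCR'd sibling, and call an OCR tool on the rest. The command `ocr-tool` and its `--input`/`--output` flags are placeholders I made up, as is the `_ocr` naming convention; substitute whatever invocation your OCR engine actually expects.

```python
import subprocess
from pathlib import Path

# Placeholder command template: "ocr-tool" and its flags are hypothetical.
# Replace them with the real invocation of your OCR engine.
OCR_COMMAND = ["ocr-tool", "--input", "{src}", "--output", "{dst}"]
SUFFIX = "_ocr"  # assumed naming convention for OCR'd output files


def pending_jobs(root):
    """Return (source, target) pairs for PDFs under root that still need OCR."""
    jobs = []
    for src in sorted(Path(root).rglob("*.pdf")):
        if src.stem.endswith(SUFFIX):
            continue  # this file is itself an OCR result
        dst = src.with_name(src.stem + SUFFIX + ".pdf")
        if not dst.exists():
            jobs.append((src, dst))
    return jobs


def build_command(src, dst, template=OCR_COMMAND):
    """Fill the command template with concrete paths."""
    return [part.format(src=src, dst=dst) for part in template]


def run_ocr(root, dry_run=True):
    """Run (or, with dry_run, only print) the OCR command for each pending PDF."""
    for src, dst in pending_jobs(root):
        cmd = build_command(src, dst)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `run_ocr("/path/to/scans")` only prints the commands it would run; pass `dry_run=False` once the template matches your tool. Because finished files are skipped, the script can be rerun safely after adding new scans.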

Now that we have what we need as searchable PDFs, our Mac will automatically index them (I assume much the same is true for Windows).

So the first thing to do once we have the content is to put it into a library manager. As I’m using LaTeX, I naturally use BibDesk on the Mac.

For reading PDFs, there’s probably no better application on the Mac than Skim. It also works well with GoodReader on the iPad, and both can sync using Dropbox.

For writing, I use LaTeX with TeXlipse as the frontend. The formatting and automated referencing are so powerful that I even write forum posts with it, using a scratch project. I’ve created some simple scripts that do the word counting and the conversion to HTML, so I can copy and paste the result into the forum.
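
For the word counting, something along these lines would do. This is a rough Python sketch, not my actual script: it strips comments and commands with a few regular expressions and counts what is left. The function name `count_words` and the list of commands whose arguments are ignored (`\cite`, `\ref`, `\label`, `\begin`, `\end`) are choices made for this example; math and unusual macros will throw the count off somewhat.

```python
import re

# Commands whose braced argument is not prose (keys, labels, environment names).
NON_PROSE = re.compile(
    r"\\(?:cite\w*|(?:eq|page)?ref|label|begin|end)\*?(?:\[[^\]]*\])*\{[^}]*\}"
)


def count_words(latex):
    """Roughly count the words in a LaTeX fragment."""
    text = re.sub(r"(?<!\\)%.*", "", latex)  # drop comments, but keep \%
    text = NON_PROSE.sub(" ", text)          # drop \cite{...}, \label{...}, etc.
    # Drop remaining command names but keep their arguments, e.g. \emph{word}.
    text = re.sub(r"\\[a-zA-Z]+\*?(?:\[[^\]]*\])?", " ", text)
    text = re.sub(r"[{}~\\]", " ", text)     # leftover markup characters
    return len(re.findall(r"[A-Za-z0-9][\w'’-]*", text))


if __name__ == "__main__":
    sample = r"Some \emph{important} text \cite{knuth84}. % a comment"
    print(count_words(sample))  # prints 3
```

A regex approach like this is only an approximation, but for checking a forum post against a word limit it is usually close enough.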

HTH,

M
