Instapaper epub issues

I'm a big fan of Instapaper. When I don't want to read long articles and posts online, I squirrel them away via Instapaper and later download them as an epub to my Kobo eReader. Lately my epubs have been illegible, and I finally tracked the problem down to the HTML that Instapaper produces.

Here is an easy way to get it fixed using Tidy.

First, unzip the epub file into a scratch directory. It's really just a zip file. The epub stays in the parent directory, hence the ../ in the path:

mkdir b
cd b
unzip ../Instapaper-ReadLater-2012-01-09.epub
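
If you want to convince yourself that the epub really is a zip archive before unpacking it, one rough check is to look at its first two bytes, which are normally "PK" for a zip file. This is only a sketch: is_zip is a name I made up for the illustration, and it relies on head accepting -c, which GNU and BSD head both do.

```shell
#!/bin/bash
# is_zip is a hypothetical helper name used only for this illustration.
is_zip() {
	# Zip archives normally begin with the two bytes "PK".
	[ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Example: pass the epub as the first argument to check it before unpacking.
if [ -n "$1" ]; then
	if is_zip "$1"; then
		echo "$1 looks like a zip archive"
	else
		echo "$1 does not look like a zip archive"
	fi
fi
```

A file that passes this check can still be a damaged archive, so treat it as a quick sanity test rather than a guarantee.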

Then clean up the HTML files that contain the incorrectly escaped data:

tidy -m *.html
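
Tidy reports its result through its exit status: 0 for success, 1 when there were warnings, and 2 when there were errors it could not fix. A possible way to see which files, if any, it gave up on is sketched here. The helper name tidy_all is my own invention, and the sketch assumes you only care about hard errors, not warnings.

```shell
#!/bin/bash
# tidy_all is a hypothetical helper: run tidy on each file and report hard failures.
tidy_all() {
	local failed=0 f
	for f in "$@"; do
		# -q suppresses the informational banner; -m modifies the file in place.
		tidy -q -m "$f" 2>/dev/null
		# tidy exits 0 on success, 1 on warnings, 2 on errors.
		if [ $? -ge 2 ]; then
			echo "tidy could not repair: $f"
			failed=1
		fi
	done
	return $failed
}
```

Called as tidy_all *.html inside the unpacked directory, it returns non-zero only if at least one file produced errors, so warnings alone do not stop you from rebuilding the epub.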

Lastly, update the original epub in the parent directory with the cleaned files, then clean up the temporary files:

zip ../Instapaper-ReadLater-2012-01-09.epub *
cd ..
rm -rf b

After doing this my Instapaper epub is once again readable.

That's the manual process. To make it quick and easy, use the following script on a box with bash, zip, unzip and tidy:

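#!/bin/bash
# Repair the HTML inside an Instapaper epub in place using tidy.

EPUB=$1

if [ -z "${EPUB}" ]; then
	echo "need to specify the file to fix" >&2
	exit 1
else
	CURRENT_DIR=$(pwd)
	# Resolve to an absolute path so it is still valid after the cd below.
	EPUB=$(readlink -f "${EPUB}")
	# Use a fresh scratch directory; a fixed /tmp/b may already exist and hold stale files.
	TMP_DIR=$(mktemp -d) || exit 1
	cd "${TMP_DIR}" || exit 1
	unzip "${EPUB}"
	# tidy exits non-zero even for warnings, so its status is not checked.
	tidy -m *.html
	# Update the original epub with the cleaned files.
	zip "${EPUB}" *
	cd "${CURRENT_DIR}"
	rm -rf "${TMP_DIR}"
fi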
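
Since the script depends on several external tools, it can help to check that they are installed before running it. The following is a minimal sketch of such a pre-flight check. The function name missing_tools is hypothetical, and the list of tools is simply the commands the script calls.

```shell
#!/bin/bash
# missing_tools is a hypothetical helper: print each named command that is not on the PATH.
missing_tools() {
	local tool
	for tool in "$@"; do
		command -v "$tool" > /dev/null 2>&1 || echo "$tool"
	done
}

# Warn about anything the epub-fixing script needs but cannot find.
missing=$(missing_tools zip unzip tidy readlink)
if [ -n "$missing" ]; then
	echo "please install:" $missing >&2
fi
```

If nothing is printed, all four commands were found and the script should be able to run.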

Update: The script had a bug that caused it to zip the corrected files into a zip file under TMP_DIR rather than into the original epub. The updated version above should work correctly. My apologies.