I’m a big fan of Instapaper. I often don’t want to read long articles and posts online, so I squirrel them away via Instapaper and then download them as an epub to my Kobo eReader. Lately my epubs have been illegible on the reader, and I finally tracked it down to some issues with the HTML that Instapaper produces.
Here is an easy way to get it fixed using Tidy.
First, unzip the epub file. It’s really just a zip file …
mkdir b
cd b
unzip ../Instapaper-ReadLater-2012-01-09.epub
Then clean up the HTML files that contain the incorrectly escaped data:
tidy -m *.html
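If you want to know which files Tidy could not repair, a loop over the files can check its exit status. The following is a minimal sketch, not part of the original steps: `tidy_all` is a hypothetical helper name, and it relies on Tidy's documented exit codes (0 for clean input, 1 for warnings, 2 for errors).

```shell
# tidy_all is a hypothetical helper, not something tidy ships with.
# It runs tidy on each file and reports the ones tidy gave up on.
tidy_all() {
    failed=0
    for f in "$@"; do
        status=0
        # -m rewrites the file in place, -q suppresses the summary banner
        tidy -m -q "$f" || status=$?
        # 0 = clean, 1 = warnings only, 2 = errors tidy could not fix
        if [ "$status" -ge 2 ]; then
            echo "tidy could not repair: $f" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Run it from inside the unzipped directory as `tidy_all *.html`. Warnings are treated as success here, since Tidy reports warnings for most of the problems it fixes.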
Lastly, re-assemble the epub zip and clean up the temporary files:
zip ../Instapaper-ReadLater-2012-01-09.epub *
cd ..
rm -rf b
After doing this my Instapaper epub is once again readable.
That’s the manual process. To make it quick and easy, use the following script on a box with bash, zip, unzip and tidy:
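#!/bin/bash
EPUB=$1
TMP_DIR=/tmp/b

if [ -z "${EPUB}" ]; then
    echo "need to specify the file to fix"
    exit 1
else
    CURRENT_DIR=$(pwd)
    # Resolve to an absolute path so the archive is still reachable after cd
    EPUB=$(readlink -f "${EPUB}")
    mkdir "${TMP_DIR}"
    # Stop here if the cd fails, so zip and rm never run in the wrong directory
    cd "${TMP_DIR}" || exit 1
    unzip "${EPUB}"
    tidy -m *.html
    # Zip the corrected files back into the original archive
    zip "${EPUB}" *
    cd "${CURRENT_DIR}"
    rm -rf "${TMP_DIR}"
fi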
Update: The script had a bug in it which would cause it to zip the corrected files into the zip file under the TMP_DIR. The updated version above should work correctly. My apologies.
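One more thing to watch for: the script works in a fixed /tmp/b, so a leftover directory from an earlier failed run can get in the way. Here is a rough sketch of the same unzip, tidy, zip steps using a private directory from `mktemp -d` that is removed on exit. The function name `fix_epub` is my own invention for illustration, and I have swapped `readlink -f` for a `cd`/`pwd` combination so it does not depend on GNU readlink.

```shell
# fix_epub is a hypothetical name. The body is wrapped in ( ) so it runs
# in a subshell: the cd and the trap do not leak into the calling shell.
fix_epub() (
    epub=$1
    if [ -z "$epub" ]; then
        echo "need to specify the file to fix" >&2
        exit 1
    fi
    if [ ! -f "$epub" ]; then
        echo "no such file: $epub" >&2
        exit 1
    fi

    # Absolute path, so the archive can still be found after cd
    epub_dir=$(cd "$(dirname "$epub")" && pwd) || exit 1
    epub_abs=$epub_dir/$(basename "$epub")

    # Private temp directory, removed when the subshell exits
    tmp_dir=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmp_dir"' EXIT

    cd "$tmp_dir" || exit 1
    unzip -q "$epub_abs" || exit 1
    # tidy exits 1 when it only has warnings, so its status is not checked
    tidy -m -q *.html
    # Update the original archive in place with the corrected files
    zip -q "$epub_abs" *
)
```

Paste the function into a shell (or source it from a file) and call it as `fix_epub Instapaper-ReadLater-2012-01-09.epub`. It returns 1 if the argument is missing or is not a file.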