From WordPress to Hugo

December 27, 2018

I couldn’t find a one-stop tool for migrating my existing WordPress.com blog to Hugo, so here’s what I did instead:

Export the posts (XML) and media from WordPress (My Sites -> Gullickson Laboratories -> Settings -> Export -> Export your content/Export media library)
Use wp2md to convert the XML to Markdown files
Use sed and rpl to format metadata, rework URLs, etc.
Extract media from the .tar archive and copy it into a subdirectory under Hugo’s static directory
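The media step has no command of its own. Below is a minimal sketch. It assumes GNU or BSD tar and that the archive's top level holds the same year/month folders that appear in the original media URLs. It also assumes the files belong in static/wp, so that Hugo serves them under the /wp/ prefix used by the media URL rewrite. The function name install_media and the demo paths are hypothetical.

```shell
#!/bin/sh
set -eu

# install_media ARCHIVE HUGO_SITE  (hypothetical helper)
# Unpacks a WordPress media export (.tar) into HUGO_SITE/static/wp, so a file
# such as 2018/12/photo.jpg is served at /wp/2018/12/photo.jpg.
install_media() {
    archive=$1
    site=$2
    mkdir -p "$site/static/wp"
    tar -xf "$archive" -C "$site/static/wp"
}

# Demo with a throwaway archive; swap in your real export and site path.
work=$(mktemp -d)
mkdir -p "$work/export/2018/12"
printf 'fake image\n' > "$work/export/2018/12/photo.jpg"
tar -cf "$work/media-export.tar" -C "$work/export" .
install_media "$work/media-export.tar" "$work/hugo-site"
ls -R "$work/hugo-site/static/wp"
```

If the real archive wraps everything in an extra top-level folder, tar's --strip-components=1 option can drop it during extraction.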

Here are the specific commands I used to transform the content. If you use them, you’ll need to modify some of the parameters to match your source blog, destination paths, etc.

wp2md -ef "%Y-%m-%d %H:%M:%S" -ps gl-{title}.md -d ./md-output/ your.wordpress.export.xml (exports all content to the md-output directory, prefixes each filename with gl-, and uses a date format that Hugo likes)
sed -i "1i ---" gl* (marks the opening of the Hugo metadata section)
sed -i "13i ---" gl* (marks the closing of the metadata section)
sed -i "2i tags:" gl* (adds a tags section)
sed -i "3i\ - gullicksonlaboratories" gl* (tags the content)
rpl "created:" "date:" gl* (renames the created field to something Hugo understands)
rpl "https://jjg2soc.files.wordpress.com/" "/wp/" gl* (updates media URLs to use local files instead of the originals)
cp gl* /path/to/hugo/content/posts (copies all exported content into the Hugo site)
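The metadata and URL commands above can be bundled into one function that is applied to each post. This is a sketch rather than a drop-in script, and it rests on several assumptions. It requires GNU sed, because the one-line i command and -i without a backup suffix are GNU behavior. It swaps rpl for equivalent sed substitutions in case rpl isn't installed. It also runs against a made-up post whose 11-line header only stands in for real wp2md output. The "13i ---" step implies 11 metadata lines, so check your own files. fix_post and HEADER_LINES are hypothetical names.

```shell
#!/bin/sh
set -eu

HEADER_LINES=11   # wp2md metadata lines per post; "13i ---" implies 11
TAG="gullicksonlaboratories"
OLD_MEDIA="https://jjg2soc\.files\.wordpress\.com/"   # dots escaped for sed
NEW_MEDIA="/wp/"

# fix_post FILE  (hypothetical helper; same steps, same order as above)
fix_post() {
    f=$1
    close=$((HEADER_LINES + 2))
    sed -i "1i ---" "$f"            # open the front matter
    sed -i "${close}i ---" "$f"     # close it after the metadata lines
    sed -i "2i tags:" "$f"          # add a tags section
    sed -i "3i\\ - $TAG" "$f"       # tag the post
    # Like rpl, these rewrite every occurrence in the file, not just the header.
    sed -i "s|created:|date:|g" "$f"
    sed -i "s|$OLD_MEDIA|$NEW_MEDIA|g" "$f"
}

# Demo on a fake post: placeholder header fields, then a body with an image.
work=$(mktemp -d)
post="$work/gl-hello-world.md"
{
    echo "title: Hello World"
    echo "created: 2018-12-27 10:00:00"
    for i in 1 2 3 4 5 6 7 8 9; do echo "field$i: placeholder"; done
    echo ""
    echo "![photo](https://jjg2soc.files.wordpress.com/2018/12/photo.jpg)"
} > "$post"
fix_post "$post"
cat "$post"
```

On real output, running something like for f in gl*.md; do fix_post "$f"; done inside md-output would apply it to every post before the cp step.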

This doesn’t fix everything. The most notable problem is that links to pages within the blog still point to the original server. Another issue (at least in my case) is that a header containing the post title is inserted into the Markdown, so the title is displayed twice on the rendered page. I think both of these issues can be addressed with a little more tweaking, but neither was bad enough to stop me from publishing the site as-is.
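Both leftover problems can probably be handled with two more sed passes. This sketch makes several assumptions. It requires GNU sed for the 0,/regex/ address. It assumes the duplicated title is the first line starting with "# ". It also assumes the Hugo site reproduces the old WordPress paths, so that a root-relative link still resolves. OLD_SITE is a hypothetical placeholder for the blog's original address, and tidy_post is a made-up name.

```shell
#!/bin/sh
set -eu

OLD_SITE="https://example\.wordpress\.com"   # hypothetical; dots escaped

# tidy_post FILE  (hypothetical helper)
tidy_post() {
    f=$1
    # Delete only the first line starting with "# " (GNU 0,/re/ address),
    # assumed to be the duplicated post title.
    sed -i '0,/^# /{/^# /d;}' "$f"
    # Turn absolute links to the old blog into root-relative ones.
    sed -i "s|$OLD_SITE/|/|g" "$f"
}

# Demo on a fake post with a duplicated title and one internal link.
work=$(mktemp -d)
post="$work/gl-hello-world.md"
printf '%s\n' '---' 'title: Hello World' '---' '' '# Hello World' '' \
    'See [my last post](https://example.wordpress.com/2018/12/01/last-post/).' \
    '# A later heading' > "$post"
tidy_post "$post"
cat "$post"
```

If the Hugo paths differ from the WordPress ones, the link substitution needs a real old-to-new mapping instead of simply stripping the host.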

One last thing I’ll mention is that the WordPress XML file can contain some things that wp2md considers invalid (some of it may in fact be invalid XML). In my case the command failed and displayed the offending line number, and it was just a matter of deleting the bad input from the XML file (in my export these were all messages related to failed attempts to share the post on Facebook). Once these are cleaned out, the command should complete successfully.
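Deleting the offending line by hand works fine, but a small helper can make the fix repeatable and keep a backup. This sketch assumes GNU sed, that the line number comes from wp2md's error message, and that the bad input occupies whole lines. drop_line is a hypothetical name. The xmllint check (from libxml2) runs only if that tool is installed.

```shell
#!/bin/sh
set -eu

# drop_line FILE LINE  (hypothetical helper)
drop_line() {
    file=$1
    line=$2
    cp "$file" "$file.bak"        # keep the original in case too much goes
    sed -i "${line}d" "$file"     # delete the reported line
    # Optional well-formedness check if xmllint is available.
    if command -v xmllint >/dev/null 2>&1; then
        xmllint --noout "$file" || echo "still not well-formed" >&2
    fi
}

# Demo: line 3 stands in for a bad entry reported by wp2md.
work=$(mktemp -d)
xml="$work/export.xml"
printf '%s\n' '<?xml version="1.0"?>' '<rss>' '<bad & unescaped>' \
    '<item>ok</item>' '</rss>' > "$xml"
drop_line "$xml" 3
cat "$xml"
```

Each deletion shifts the line numbers below it, so it's simplest to rerun wp2md after every fix and use the freshly reported number.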