A bit of a warning up front - this is more than a little bit technical and nerdy. I'm posting this here for the purposes of sharing knowledge and helping others who have similar plans and schemes. Feel free to skip it if it's not for you.

I recently left Substack behind (for reasons I wrote about here). I chose to repatriate all of my written content back to my own website, not just spin up another newsletter service.

I wanted to share some details about how I did this. Hopefully it’s useful.

Recommendations for normal humans

If you’re a Substack user, and you want to leave, I’d recommend Buttondown as an easy alternative. Buttondown imports from Substack and a variety of other sources. It has many of the same features for newsletter authors, but isn’t attempting to become a social network. The free tier is generous and the paid plans are priced fairly.

If you want a more robust website experience, perhaps look into WordPress.com. They support newsletters and membership options as well.

I’m a developer, and what I’m about to describe is for other developers.

What to expect when you export

Substack allows you to export your content. In the account settings area you can schedule an export. You receive an email when it’s complete.

The export file - a zip archive - is retained and downloadable from account settings.

The zip archive contains:

Posts (HTML format)
Subscriber list (CSV format)
Analytical data about posts (CSV format)

The zip archive does not contain:

Post images
Social media images (thumbnails) for posts

Posts

The posts archive contains every post, including drafts, in HTML format. Each HTML file is a fragment. That is, it contains only the content in HTML format, but lacks the HTML header, styles, metadata, etc. This makes it easy to import into another blogging tool.

The HTML itself is the same HTML used on your Substack website. This means it includes links to larger images, little SVG icons for enlarging images, and so on, but does not include the styling or scripting to make those things work.

If you are importing these things into your own site, you’ll need to either fix the HTML or create new CSS rules to display this to match your new site. I chose to programmatically fix the HTML, which I’ll describe below.

Images

As I mentioned above, no images are included in the export file. All images are linked and are hosted on Substack’s servers. Substack appears to be hosting images on Amazon’s S3 service, but proxying those URLs through an image service which resizes and converts the image to a desirable format like WebP.
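
To illustrate the proxying, here is a minimal sketch of recovering the underlying image address from a proxied one. It assumes the proxied URL carries the original URL percent-encoded as its final path segment, which is how these links commonly look but is not something Substack documents. The function name `originalImageUrl` and the example bucket are hypothetical.

```javascript
// Sketch only: assumes the proxied URL ends with the original image URL,
// percent-encoded, e.g. ".../image/fetch/<options>/https%3A%2F%2F<bucket>...".
// That URL shape is an assumption, not a documented contract.
function originalImageUrl(proxiedUrl) {
  // Find a path segment that starts with an encoded "http://" or "https://".
  const match = proxiedUrl.match(/\/(https?%3A%2F%2F[^?#]+)/i);
  // If nothing matches, the URL is probably not proxied, so return it unchanged.
  return match ? decodeURIComponent(match[1]) : proxiedUrl;
}

// Hypothetical example URL, for illustration only.
const proxied =
  "https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good/" +
  "https%3A%2F%2Fexample-bucket.s3.amazonaws.com%2Fpublic%2Fimages%2Fabc.png";

console.log(originalImageUrl(proxied));
// → https://example-bucket.s3.amazonaws.com/public/images/abc.png
```

Fetching the unwrapped address, if it is reachable, would give the file as uploaded rather than the resized and converted version.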

It is unclear what Substack will do with those images over time. Are they committing to perpetual image hosting? Who knows?

This motivated me to download all of my images.

Buttondown and WordPress.com (WordPress the commercial product, not the free version) will both import those images for you.

Choices

Site

This site is built with Astro. I’ve been happy with Astro. It’s a good HTML-first tool for building custom websites.

Astro has good support for working with content collections in Markdown, JSON, and other data formats.

I already use these features for my portfolio. Incorporating newsletter content works the same way.

Astro has some built-in image processing tools. At build time, Astro will reformat and resize images for use on the web. This means I didn’t have to pay for my own image hosting service, nor roll my own.

Email

The newsletter “format” of sending my writing out via email to people who want it works well. It shows up in your inbox. You can read it now, or read it later. Once downloaded, it lives on your device. I didn’t want to give up sending an email newsletter, so I looked into a couple of options:

Mailjet is a professional tool. I’ve used it at work as a system to bulk send emails to testing platforms (ask me if an emoji works in a particular email client, I can tell you).

Mailjet will absolutely support newsletters, but also requires I configure a special email sending sub-domain (like email.abouthalf.com or something similar).

I’m not ready for that level of commitment. I ended up choosing Buttondown. It’s very similar to Substack, but not problematic.

One of the best features of Buttondown is that I can choose my own “read on web” URL - so I can publish a blog post, copy it into a newsletter, and direct readers to my website instead of Substack.

Process

After reviewing Substack’s anemic export, I realized I wanted all those thumbnail images I carefully chose over the past 2 years.

The RSS feed provided by Substack includes the thumbnail image, but doesn’t include every post, only the last several.

The archive page which Substack generates includes thumbnails, but it’s a dynamic web application. All of the content is generated by a JavaScript application, so there’s nothing “in” the HTML. Unclear if this is just lazy or a deliberate mechanism to prevent writers from getting organic traffic outside of the Substack ecosystem.

The archive page is an “infinite scroller” - that is, when you get to the bottom of the page it loads in the next block of posts. I scrolled all the way to the bottom to load in all the posts, then used my browser’s developer tools to extract the generated HTML. I saved this to a file and wrote a script to find all the images, and then download them.
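
For the download half of that job, a rough sketch might look like the following. It is not the exact script I ran: the function names `fileNameFor` and `downloadAll` are made up for illustration, and it assumes Node 18 or newer, where `fetch` is built in. It takes a list of image URLs that has already been collected from the saved HTML.

```javascript
// Sketch only: derive a tidy local file name from an image URL.
// The index prefix keeps names unique when two images share a name.
function fileNameFor(imageUrl, index) {
  const { pathname } = new URL(imageUrl);
  // Decode first so a proxied, percent-encoded URL still yields "abc.png".
  const last = decodeURIComponent(pathname).split("/").pop() || "image";
  // Replace anything that might be awkward in a file name.
  const safe = last.replace(/[^A-Za-z0-9._-]/g, "_");
  return `${String(index).padStart(3, "0")}-${safe}`;
}

// Download each URL into outDir, one at a time to stay polite to the server.
async function downloadAll(urls, outDir) {
  const { mkdir, writeFile } = await import("node:fs/promises");
  const { join } = await import("node:path");
  await mkdir(outDir, { recursive: true });

  for (const [index, url] of urls.entries()) {
    const response = await fetch(url);
    if (!response.ok) {
      // Skip failures instead of aborting the whole run.
      console.warn(`Skipping ${url}: HTTP ${response.status}`);
      continue;
    }
    const bytes = Buffer.from(await response.arrayBuffer());
    await writeFile(join(outDir, fileNameFor(url, index)), bytes);
  }
}

// Usage (hypothetical URL list and output folder):
// await downloadAll(["https://example.com/a/b/photo%20one.png"], "./thumbnails");
```

Downloading sequentially is slower than running requests in parallel, but for a couple of years of posts it finishes quickly and is less likely to trip rate limits.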

I used cheerio to process the HTML. Cheerio is a NodeJS package which replicates the jQuery API, but in server-side JavaScript. This basically means I can load up an HTML file and pass in a query like: