New release: Substack Python package

Nick Hagar
2 min read · Apr 13, 2024

Today, I have a quick post to walk through a new release of my Substack API Python package.

In a recent post, I talked about obfuscation in Substack’s API calls, and the challenges this presented for data collection. The combination of closed endpoints and SPAs, I argued, made it almost impossible to keep data collection packages lightweight, and maybe demanded new software development approaches.

This is still true for some platforms (TikTok), and it could be true for any platform that chooses to lock down its data at any time. But in the time since that last post, I’ve been able to continue work on Substack — finding endpoints for some types of data, and workarounds for others. Now, in this most recent release, I’m hoping to empower new kinds of data collection and analysis.

The new additions fall into two buckets. First, I added functions to collect user data. Substack is making itself into more of a social media platform: it lets users post on Notes, its Twitter clone, and it displays the newsletters they read on their public profiles. I added functions to collect Notes activity and profile information, including:

  • User metadata
  • Liked posts
  • Notes posted
  • Newsletters a user reads

Second, I added support to collect newsletter recommendations. After The Atlantic’s reporting on Substack’s Nazi problem, many pointed out how recommendations could act as a prominent vector for harmful content. No longer was Substack just a utility for sending emails; it now connected disparate newsletters, allowing ideologically aligned publications to form communities. And so, I think it’s worth systematically understanding how newsletters use this feature and what kinds of associations emerge.

That last point speaks to the main use case I’m trying to support with these additions. Of course, the overarching goal is to increase coverage of the kinds of data a researcher might want to collect. But within that, I think these data are particularly well-suited for understanding networks of reading and recommending. Given a seed set of interesting newsletters, the tools are here to fan out to a much larger network, and to quantitatively examine the engagement and posting behaviors of that network. This is a powerful approach for understanding the dynamics of harm on a platform, or just for getting a better grasp of what interaction looks like across email newsletters.
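To make the fan-out idea concrete, here is a minimal sketch of a breadth-first crawl from a seed set. It is an illustration under stated assumptions, not the package's actual interface: `fetch_recommendations` is a hypothetical callable that you would wire up to whatever function returns a newsletter's recommendations, and the toy `FAKE_RECS` data stands in for real API responses.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple


def snowball(
    seeds: Iterable[str],
    fetch_recommendations: Callable[[str], Iterable[str]],
    max_depth: int = 2,
) -> Tuple[Dict[str, int], List[Tuple[str, str]]]:
    """Breadth-first expansion of a recommendation network.

    `fetch_recommendations` is a placeholder (an assumption for this sketch):
    any callable that maps a newsletter identifier to the identifiers it
    recommends. Returns each discovered newsletter's distance from the seed
    set, plus the directed (recommender, recommended) edges that were seen.
    """
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicates, keep order
    depth: Dict[str, int] = {seed: 0 for seed in unique_seeds}
    edges: List[Tuple[str, str]] = []
    queue = deque(unique_seeds)

    while queue:
        current = queue.popleft()
        if depth[current] >= max_depth:
            continue  # record the node, but do not expand beyond the limit
        for recommended in fetch_recommendations(current):
            edges.append((current, recommended))
            if recommended not in depth:
                depth[recommended] = depth[current] + 1
                queue.append(recommended)

    return depth, edges


# Toy data standing in for real recommendation lookups (purely illustrative).
FAKE_RECS = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": [],
    "d": ["e"],
}

depths, edges = snowball(["a"], lambda name: FAKE_RECS.get(name, []), max_depth=2)
```

The depth limit matters in practice, since recommendation networks can grow quickly with each hop. The resulting edge list can be loaded into a graph library for further analysis.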

This latest release is available via pip, and you can dig into the available functionality on the project’s GitHub repo.

Nick Hagar

PhD student @ Northwestern University. I worked in digital media, now I study it.