The weird state of social media data
Read it first on my Substack
My last post was almost exactly four months ago! In that time I’ve…
- Wrapped up my time at Meta (and put out a research paper from my work)
- Successfully defended my dissertation 🎉
- Started a new gig as a data scientist at the New York Times
As you might imagine, I put this and all other outside projects on hold while I finished the dissertation. Now that I have free time again, I’m hoping to get back into non-academic writing, including in this newsletter.
As a refresher from that last post: this year I’m broadly interested in simulation, sequence encoding, and how they interact with novel attention markets. There’s been a lot of activity in all three areas! I’d like to do a deeper dive on some of this soon, but for now, let’s talk about how weird the current landscape of social media data is (and a Python package I just released).
One way to understand how attention works online is to collect a bunch of data from a social media platform. That depends on getting access to the data from large platforms, through an API or other means. But now open APIs are disappearing, other methods aren’t keeping up with rapidly proliferating platforms, and everything has been complicated by the arrival of generative AI.