Archived: Tumblr and WordPress to Sell Users’ Data to Train AI Tools

This is a simplified archive of the page at https://www.404media.co/tumblr-and-wordpress-to-sell-users-data-to-train-ai-tools/

Internal documents obtained by 404 Media show that Tumblr staff compiled users' data as part of a deal with Midjourney and OpenAI.

tumblr, wordpress, OpenAI, midjourney, automattic

Tumblr and WordPress.com are preparing to sell user data to Midjourney and OpenAI, according to a source with internal knowledge of the deals and internal documentation referring to them.

The exact types of data from each platform going to each company are not spelled out in documentation we’ve reviewed, but internal communications reviewed by 404 Media make clear that deals between Automattic, the platforms’ parent company, and OpenAI and Midjourney are imminent.

The internal documentation details a messy and controversial process within Tumblr itself. One internal post made by Cyle Gage, a product manager at Tumblr, states that a query made to prepare data for OpenAI and Midjourney compiled a huge number of user posts that it was not supposed to include. It is not clear from Gage’s post whether this data has already been sent to OpenAI and Midjourney, or whether Gage was detailing a process for scrubbing the data before it was to be sent.

Gage wrote:

“the way the data was queried for the initial data dump to Midjourney/OpenAI means we compiled a list of all tumblr’s public post content between 2014 and 2023, but also unfortunately it included, and should not have included:

  • private posts on public blogs
  • posts on deleted or suspended blogs
  • unanswered asks (normally these are not public until they’re answered)
  • private answers (these only show up to the receiver and are not public)
  • posts that are marked ‘explicit’ / NSFW / ‘mature’ by our more modern standards (this may not be a big deal, I don’t know)
  • content from premium partner blogs (special brand blogs like Apple’s former music blog, for example, who spent money with us on an ad campaign) that may have creative that doesn’t belong to us, and we don’t have the rights to share with this-parties; this one is kinda unknown to me, what deals are in place historically and what they should prevent us from doing.”
