Reddit Sues Perplexity AI Over Alleged Unauthorized User Data Collection
Reddit has filed a lawsuit against Perplexity AI and three associated companies, claiming that they collected large volumes of Reddit users’ posts and comments without permission to train their artificial intelligence systems. Reddit alleges that, while certain companies have formal agreements to access its content, Perplexity and its partners sidestepped such agreements by using data-scraping services. These services reportedly gathered content by mimicking normal user behavior or pulling posts from search engine results, rather than negotiating proper access or respecting the platform’s terms of service.
According to Reddit, the scraped material includes millions of posts created by everyday users, many of whom likely did not consent to their contributions being used for commercial AI training. The lawsuit describes these tactics as highly aggressive, even likening them to “North Korean hacker” methods, because they deliberately sought to avoid protections designed to prevent unauthorized data collection. Reddit argues that this unauthorized scraping violates its rights and exposes users to the risk of having their content used in ways they never agreed to.
Perplexity AI has pushed back on the claims, stating that it supports access to public information and handles data responsibly. Reddit’s legal team, however, argues that public access to content does not give companies the right to use it for profit, especially at the massive scale required for AI training. The case underscores a growing conflict between social media platforms and AI developers, as companies increasingly seek large datasets to train their systems.
The lawsuit also raises broader questions about user privacy and consent in the age of artificial intelligence. Many social media users do not fully understand that their publicly posted content can be scraped and used to train AI systems, potentially without their knowledge or compensation. Reddit’s case highlights the tension between technological innovation and the protection of individual user data. If the court rules in Reddit’s favor, it could establish an important precedent, potentially requiring AI companies to negotiate licenses or seek explicit permission before using social media content for commercial purposes.
As AI continues to advance, the need for massive amounts of training data is increasing. This case illustrates how platforms like Reddit are seeking to assert control over their content and ensure that user contributions are not exploited without consent. Legal experts are watching closely, as the outcome could have wide-ranging implications for the relationship between online communities, their users, and the rapidly growing artificial intelligence industry.






