GenAI as News Gatekeeper? What Traffic Data Shows
Analysis of Comscore data shows that GenAI tools like ChatGPT and Perplexity send little traffic to news publishers.
By Nick Hagar and Nick Diakopoulos
Generative AI has upended search, shifting it from something that directs users to outside sources into something that extracts and summarizes information on its own, a so-called “answer engine.” This change threatens news publishers, who have historically relied on search engines for a large share of their traffic. The tech companies providing these new interfaces continue to partner with news organizations, and Google claims that people continue to click on links in its AI Overviews. But it’s still early days, and a question looms large over how these interfaces will affect the attention market: Will these new user portals result in any meaningful referral traffic for news publishers?
This post is the first in a series analyzing Comscore data to understand how generative AI tools interact with news publishers. Today, we explore the overall traffic patterns from ChatGPT and Perplexity and what they mean for news organizations. In future posts, we’ll extend the analysis to examine specific referral patterns to news publishers and provide a more detailed assessment of generative AI’s impact on their traffic.
The Stakes: Why ChatGPT and Perplexity Matter
While incumbents such as Google and Microsoft are also integrating generative features into the search experience, this analysis focuses on ChatGPT and Perplexity. Both tools represent serious efforts by new generative AI companies to enter the search space. ChatGPT integrated search in October 2024, and Perplexity, an AI-powered search engine, launched in October 2022. These tools are also widely used: while Perplexity doesn’t report user numbers, one estimate puts it at around 15 million users, and ChatGPT has recently reported more than 400 million weekly users.
News organizations have responded to the rapid growth of these tools in varied ways: Some, like the Associated Press and Axel Springer, have negotiated licensing deals with OpenAI. Others, like the New York Times, have pursued legal action around the use of their content in training LLMs. These decisions reflect how publishers weigh the risk of tech platform partnerships against their potential impact on referral traffic. By analyzing actual traffic patterns between AI tools and news websites, we can provide publishers with empirical evidence to inform their strategies as they navigate this technological shift.
Our Analysis Approach
To understand this relationship, we analyzed data from Comscore, a firm specializing in media measurement. We analyzed traffic data for the five-month period from July to November 2024, covering about 364,000 active U.S. desktop users in Comscore’s panel of people whose internet activity it tracks. This time period encompasses several major launches and updates for both ChatGPT and Perplexity, including the launch of ChatGPT Search, that may have affected traffic to news sites.
Comscore’s dataset provides visit-level data for all referrals from chatgpt.com and perplexity.ai within our five-month period. Before running our analysis, we transformed this data in two ways.
First, we removed any visits that met one of the following criteria:
- Self-referrals (e.g., perplexity.ai to perplexity.ai, chatgpt.com to chatgpt.com or openai.com). This traffic signals that the browser is communicating back to the website for various functionality while the user is on the page.
- Programmatic traffic, defined as any visit with a MIME type of application/*, or a blank MIME type. This traffic typically indicates the browser receiving some data from the site.
- Redirect traffic (e.g., Google sign-in pages, static file serving) and tool-based traffic (e.g., browser extensions and AI plugins), defined via manual review of website domains.
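The three exclusion rules above can be sketched roughly as follows. This is a minimal illustration, not the pipeline we ran: the record fields (`referrer_domain`, `destination_domain`, `mime_type`) are hypothetical names, since Comscore's schema is not described in this post, and `MANUAL_EXCLUDE` stands in for the manually reviewed domain list with a single illustrative entry.

```python
from dataclasses import dataclass


# Hypothetical record layout; Comscore's actual schema is not described here.
@dataclass
class Visit:
    referrer_domain: str
    destination_domain: str
    mime_type: str


# Destinations counted as self-referrals for each tool (from the examples above).
SELF_DOMAINS = {
    "chatgpt.com": {"chatgpt.com", "openai.com"},
    "perplexity.ai": {"perplexity.ai"},
}

# Stand-in for the manually reviewed redirect/tool domains (illustrative entry only).
MANUAL_EXCLUDE = {"accounts.google.com"}


def is_self_referral(v: Visit) -> bool:
    # Exact-match comparison; a real pipeline would likely normalize subdomains first.
    return v.destination_domain in SELF_DOMAINS.get(v.referrer_domain, set())


def is_programmatic(v: Visit) -> bool:
    # Blank MIME type, or any application/* MIME type.
    mime = v.mime_type.strip().lower()
    return mime == "" or mime.startswith("application/")


def keep(v: Visit) -> bool:
    # A visit survives only if none of the three exclusion rules applies.
    return not (
        is_self_referral(v)
        or is_programmatic(v)
        or v.destination_domain in MANUAL_EXCLUDE
    )


# Made-up sample visits, one per rule plus one that should be kept.
sample = [
    Visit("chatgpt.com", "openai.com", "text/html"),           # self-referral
    Visit("perplexity.ai", "example.org", "application/json"),  # programmatic
    Visit("chatgpt.com", "accounts.google.com", "text/html"),   # redirect
    Visit("chatgpt.com", "example.org", "text/html"),           # kept
]
filtered = [v for v in sample if keep(v)]
```

Applying the rules as a single predicate keeps the filter order-independent: a visit that matches several rules is still dropped exactly once.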
Second, to identify where traffic flows from these services, we manually categorized the top 1,000 websites by traffic for each. We bucketed sites into one of 22 categories, which emerged through a bottom-up qualitative coding of each website. Major categories included education websites, news outlets, and programming tools (see Appendix A for a full breakdown of the categories). This helps us see what kinds of sources are getting linked to by ChatGPT and Perplexity and how traffic is flowing to those different sources. In our findings, we combined any categories that make up less than 1% of traffic for either service into an “Other” category.
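The “Other” bucketing step can be sketched like this. The counts are invented for illustration, and the sketch assumes one reading of the rule: a category is merged when its share falls below 1% for at least one of the two services. Under the stricter reading (below 1% for both), `any` would become `all`.

```python
from collections import Counter


def shares(counts):
    # Convert raw visit counts per category into fractions of the total.
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def merge_small(counts_by_service, threshold=0.01):
    # Assumption: merge a category if it is under the threshold for ANY service.
    share_by_service = {s: shares(c) for s, c in counts_by_service.items()}
    categories = set().union(*(c.keys() for c in counts_by_service.values()))
    small = {
        cat
        for cat in categories
        if any(sh.get(cat, 0.0) < threshold for sh in share_by_service.values())
    }
    merged = {}
    for service, counts in counts_by_service.items():
        out = Counter()
        for cat, n in counts.items():
            out["Other" if cat in small else cat] += n
        merged[service] = dict(out)
    return merged


# Invented counts, not figures from our sample.
example = {
    "chatgpt": {"News": 30, "Technical": 180, "Career": 5, "Education": 785},
    "perplexity": {"News": 70, "Technical": 40, "Career": 20, "Education": 870},
}
merged = merge_small(example)
```

In this made-up example, Career is 0.5% of the ChatGPT counts, so it is folded into “Other” for both services, which keeps the two category lists directly comparable.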
These criteria allow us to focus on referral traffic that reflects a user taking action to navigate from one of these generative AI tools to an external website. In other words, we focus on traffic indicating a person actually visited the referred site. After filtering, our sample contains 145,065 visits (106,435 from ChatGPT; 38,630 from Perplexity) from 23,118 unique users (19,876 for ChatGPT; 3,977 for Perplexity), which form the basis for the findings we present. The per-service user counts sum to more than the total because some panelists appear under both tools.
Where Do ChatGPT and Perplexity Send Traffic?
Academic journals and publishers are a major referral category for both services, with about 16% of traffic from ChatGPT and 22% from Perplexity. Both services seem to take advantage of open access scientific resources, such as on PubMed, Semantic Scholar, and ResearchGate, and these types of links are often followed by users, suggesting an interest in seeing the details from these sources.
Our analysis also reveals distinct referral patterns for the two platforms, both in the types of sources and in exactly which sources are clicked. For instance, Perplexity directs roughly 30% of its traffic to educational resources, while ChatGPT sends less than 5% of its traffic to these types of sources. The two also differ in which education sites they refer to: Perplexity leads to sites such as scribd.com and studocu.com along with university domains (e.g., ugm.ac.id, unirioja.es), while ChatGPT drives traffic toward sites such as texthelp.com, coursera.org, and khanacademy.org. Another example: ChatGPT sends about 18% of its traffic to technical sites, including tech companies (e.g., documentation and product pages on microsoft.com and apple.com), developer tools (github.com), and programming documentation (nodejs.org), whereas Perplexity sends only about 4% to such sites.
Notably, while social media platforms and Wikipedia dominate web traffic writ large, both make up relatively small amounts of referral traffic from these tools. Among social platforms, YouTube receives the most traffic from ChatGPT (8% of referrals). Wikipedia receives just 1% of ChatGPT referrals. In addition, referrals to Google products from ChatGPT are large enough to break out into their own bucket (6% of traffic). These referrals go to a range of services — most prominently, Search and Google Scholar, but also Docs, Maps, and other products.
News Sites Receive Negligible GenAI Traffic
News publishers receive minimal traffic from these AI tools: just 3.2% of ChatGPT’s filtered traffic (7% of unique users — approximately 1,400 of 20,000) and 7.4% of Perplexity’s filtered traffic (21% of unique users — approximately 800 of 4,000). Among the 143 news websites in our sample, only 5 received 100 or more unique visitors during the five-month period. In comparing Perplexity and ChatGPT, news appears to be more of a driver for Perplexity, ranking as the 3rd largest category, after education and academic journals. Perplexity linked to 110 unique news sources in comparison to ChatGPT’s 69, and only 37 of those overlapped between the two, indicating that each service is fairly distinct in the news sources it refers to.
Within news referrals, we also analyzed URLs that returned 404 (page not found) errors. These are significant because they represent potential model hallucinations, an issue reported on previously by Nieman Lab, wherein an LLM fabricates a link to a news source as part of its answer. While we found only 21 such URLs in the Perplexity data, accounting for fewer than 50 referrals, missing links from ChatGPT were more common: We identified 205 such URLs. Almost all of these links go to prominent English-language outlets, with 197 of the missing URLs pointing to either hbr.org, nytimes.com, theguardian.com, nationalgeographic.com, or bbc.com.
While we cannot say definitively whether these referrals are hallucinated links, our analysis provides a couple of data points in favor of this argument. First, none of the URLs to those five news sites are archived on the Wayback Machine, which is unusual for large news outlets (nationalgeographic.com, for instance, has over 10,000 URLs archived). Second, manual review of individual cases reveals URLs that don’t seem to correspond to any live stories on these sites. For example, the ChatGPT URL www.nytimes.com/2023/10/15/us/biden-greenhouse-gas-emissions-court.html doesn’t line up with any stories published by the New York Times during October 2023; the closest match is an article about the EPA from the following year. Similarly, this ChatGPT URL linking to The Guardian doesn’t line up with any coverage in May 2023: www.theguardian.com/technology/2023/may/26/ai-jobs-climate-bill-gates. However, it does have a close match in this June 2024 story. These factors indicate that, while it’s possible ChatGPT is referencing web pages that either changed their URLs or were taken down, it is also likely that the model has hallucinated plausible-looking links to authoritative news outlets when responding to user queries.
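One way to run an archive check like the first one is the Internet Archive's public Wayback Availability endpoint. The sketch below is an assumption about tooling, not a description of our exact procedure, and the helper names (`availability_query`, `has_snapshot`, `is_archived`) are hypothetical. It relies on the endpoint returning an empty `archived_snapshots` object when no capture exists.

```python
import json
import urllib.parse
import urllib.request

# Public Wayback Availability endpoint (Internet Archive).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    # Build the lookup URL for a single page.
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": url})


def has_snapshot(payload: dict) -> bool:
    # True only if the response names a closest capture marked as available.
    closest = payload.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))


def is_archived(url: str, timeout: float = 10.0) -> bool:
    # Performs a network request; not called in this sketch.
    with urllib.request.urlopen(availability_query(url), timeout=timeout) as resp:
        return has_snapshot(json.load(resp))


# Offline illustration with hand-written payloads in the endpoint's response shape.
missing = {"url": "example.com/none", "archived_snapshots": {}}
present = {
    "url": "example.com",
    "archived_snapshots": {
        "closest": {"available": True, "status": "200", "timestamp": "20240101000000"}
    },
}
```

A missing snapshot alone does not prove a link was fabricated, which is why we pair this signal with manual review of individual URLs.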
After filtering out 404 errors, we found that international outlets dominate the top-referred news sites, with only hbr.org and bbc.com primarily publishing in English. In the longer tail we do find the likes of apnews.com, sciencedaily.com, reuters.com, ft.com and other brand name publishers, but the traffic to each is minimal. Perplexity’s top news sources are exclusively Indonesian — possibly reflecting its popularity in Indonesia extending to Indonesian communities or connections in the U.S. panel we analyzed, driving these referral patterns. For ChatGPT, Spanish-language publications account for three of its five most-referenced sites. Notably, hbr.org receives ChatGPT referrals despite explicitly prohibiting OpenAI crawlers in its robots.txt file.
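A robots.txt prohibition of this kind can be checked with Python's standard-library parser. The rules below are a made-up example, not a copy of hbr.org's file. `GPTBot` is the user agent OpenAI documents for its crawler, and `can_crawl` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt that blocks one crawler and allows everyone else.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    # Parse the rules from text rather than fetching them over the network.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


gptbot_allowed = can_crawl(SAMPLE_ROBOTS, "GPTBot", "https://example.com/article")
other_allowed = can_crawl(SAMPLE_ROBOTS, "SomeOtherBot", "https://example.com/article")
```

Under these sample rules, `gptbot_allowed` is false and `other_allowed` is true. Running the same check against a publisher's live robots.txt shows which crawlers it has opted out of, though it says nothing about how a given link ended up in a generated answer.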
OpenAI Partnerships Have a Small Effect
Within our sample of news websites, we identified six that had a publicly reported partnership with OpenAI, all announced before the start of our analysis period:
- apnews.com
- nypost.com (News Corp)
- ft.com
- lemonde.fr
- elpais.com (Prisa Media)
- businessinsider.com (Axel Springer)
Statistically, OpenAI’s partner sites don’t receive significantly more traffic than non-partners (Mann-Whitney U = 496.5, p = 0.08). The raw numbers are quite small: partner sites averaged just 35 visitors compared to 17 for non-partners, a mere 18 additional visitors over five months. The optimistic interpretation of these findings is that, while the difference isn’t statistically significant, partner sites do receive roughly double the number of visitors that non-partner sites do. However, as we discuss below, many publishers report getting negligible traffic from GenAI search tools. This, combined with the low percentage of traffic we see in this sample, suggests that these high-profile content deals primarily serve as licensing arrangements rather than meaningful audience development strategies.
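For readers unfamiliar with the test, the Mann-Whitney U statistic compares two groups by rank rather than by mean, which suits skewed visitor counts. The sketch below computes U in plain Python with average ranks for ties. The visitor counts are invented, and it does not reproduce our reported figures. In practice the p-value would typically come from a statistics library such as SciPy's `scipy.stats.mannwhitneyu`.

```python
def midranks(values):
    # Assign 1-based ranks, giving tied values the average of their rank positions.
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    # U for the first sample is its rank sum minus the minimum possible rank sum.
    ranks = midranks(list(x) + list(y))
    r1 = sum(ranks[: len(x)])
    u1 = r1 - len(x) * (len(x) + 1) / 2
    u2 = len(x) * len(y) - u1
    return u1, u2


# Invented visitor counts, not figures from our sample.
partners = [40, 35, 30]
non_partners = [10, 35, 5, 20]
u_partners, u_non_partners = mann_whitney_u(partners, non_partners)
```

Half-integer values of U, like the one we report, arise when observations are tied across groups, since each tie contributes half a point to each side.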
Limitations
Our analysis has a couple of key limitations. First, we examined only desktop traffic, excluding mobile usage, which represents a significant portion of how users access both AI tools and news. Although desktop usage of these tools appears to exceed mobile usage in the U.S., including mobile traffic could shift the picture, since mobile user behavior may differ in clickthrough and source needs. Second, our focus on U.S. users may not reflect global patterns, particularly given the international news sites appearing in our top referrals. These limitations suggest opportunities for future research to develop a more comprehensive picture of how generative AI affects publisher traffic across platforms and regions. In addition, future research should examine the impact of integrating generative AI into traditional search engines such as Google and Bing.
News Remains Peripheral in the GenAI Ecosystem
These findings paint a stark picture for generative AI tools as sources of news traffic. They’re also consistent with broader industry data. A recent analysis found that generative AI tools contribute less than 0.1% of referral traffic to 14 top news publishers. And while some news outlets say that they’re seeing growth in ChatGPT referrals, many see virtually no traffic from the platform. This mirrors our earlier analysis, which found that less than 2% of queries to LLMs were news-related.
Together, these findings suggest that, to date, generative AI has not become a meaningful intermediary for news traffic. While GenAI search tools send referral traffic to resources associated with other kinds of information seeking, such as educational and technical materials, they are not connecting users to authoritative news outlets. Moreover, ChatGPT and Perplexity tend to direct users to somewhat different types of sources as well as different specific news sources.
Our analysis cannot speak to why news referral traffic is low. It’s possible that readers are satisfied with the information they get from generated responses and don’t click through to sources. It’s also possible that, in line with our prior findings, GenAI users don’t leverage these tools for news reading, and are still doing traditional web searches, browsing news homepages, and engaging in other deeply entrenched news consumption habits. In either case, from a traffic or audience development perspective, the value to a news publisher of having their links included in a GenAI search tool’s response appears low.
Appendix A: Full category list
These are the 22 categories that we used to label referral destination sites.
- Academic journals: Peer-reviewed journals, or the publishers of those journals
- Blog: Personal or topic-specific blogs
- Business and legal: Businesses, business consultants, legal services
- Career: Job- and career-oriented resources
- Design: Repositories of design resources (images, videos), design tools
- ECommerce: Online shopping
- Education: Educational resources (online courses, study guides), school websites
- Finance: Financial information, tools, and institutions
- Google products: Search, Scholar, Docs, etc.
- Government: Official government sources
- Health and science: Healthcare, health research, and scientific information
- Interests and hobbies: Recreational and intellectual pursuits
- Malicious or inactive: Spam sites, malware, pornography, offline or otherwise inaccessible
- Marketing: SEO and other marketing resources
- Music and streaming: Streaming services
- News: News outlets
- NGO: Non-Governmental Organizations, professional associations, think tanks
- Publishing and culture: Book publishing, literature, cultural institutions
- Reference: Dictionaries and encyclopedias
- Search: Search engines
- Social: Social media platforms
- Technical: Documentation for software/programming tools, websites for software/tech companies