The New York Times and other top news sites block SearchGPT web crawlers

2024-08-03

Bianniu reported on August 3 that, according to foreign news reports, about a week after OpenAI launched SearchGPT, some top news publishers have made it clear they want nothing to do with the startup’s new search engine.

The New York Times and at least 13 other news sites have blocked OAI-SearchBot, a web crawler that indexes information so OpenAI can retrieve and display relevant results to SearchGPT users.

Originality.ai has been tracking the blocks and found that 14 of the top 1,000 website publishers have blocked OAI-SearchBot. Other publications on the list include Wired, The New Yorker, Vogue, Vanity Fair, and GQ.

Jon Gillham, CEO of Originality.ai, said it was a bit puzzling.

"I'm not sure why publishers would block it," he told Business Insider. "This is traffic that publishers want and need."

When OpenAI released SearchGPT last week, it stressed that OAI-SearchBot does not crawl the web to collect data to train its AI models such as GPT-5. It recommends that website owners allow the new bot so that their sites appear in search results.

Without access to every website, OpenAI's SearchGPT service is likely to be less sophisticated than Google's search engine. Business Insider asked Gillham if any large news publishers had blocked Google's search bots, and he said he didn't know of any that had.

Lack of trust or suspicion about search traffic

OpenAI has another web crawler, called GPTBot, that collects online data for AI model training. Hundreds of websites have blocked it. This makes more sense: you want traffic from search engines, but you don’t want to give up your content to train AI models that might compete with you.
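The article does not say how publishers implement these blocks. The usual mechanism for this kind of per-crawler policy is a site's robots.txt file, so the following is only a minimal sketch under that assumption. The robots.txt content and the example.com URL are hypothetical, not taken from any publisher's real site. It uses Python's standard-library parser to check a policy that blocks the training crawler while allowing the search crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (an assumption for illustration, not a real
# publisher's file): block the training crawler, allow the search crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def crawler_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user agent, whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# example.com is a placeholder domain.
access = crawler_access(
    ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot"],
    "https://example.com/article",
)
print(access)
```

A publisher that wanted to block both crawlers, as the sites in this report have done, would use `Disallow: /` under both user agents. Note that robots.txt is a request to crawlers, not an enforcement mechanism.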

Yet OpenAI has been collecting online data without permission for years. Maybe publishers just didn’t trust it when it said its new search bot wouldn’t secretly steal their content as AI training data?

"I think so," said Gillham.

Another theory: Today’s search results don’t always direct users to the sites that work hard to create original content. One of the goals of the new AI search engines is to retain users by showing them snippets. If publishers no longer see much traffic from search engines, why should they allow those engines' web crawlers?

Complaint from the New York Times

Gillham also noted that OpenAI has been busy this year striking deals with publishers to use their content archives. (Business Insider parent company Axel Springer signed one of those deals.)

Gillham added that this looks like a deliberate series of steps by OpenAI: first getting on good terms with publishers, signing all these partnerships, and then announcing SearchGPT.

The biggest publisher opponent is The New York Times, which has sued OpenAI and Microsoft, accusing the two tech companies of illegally using its work to create competing products.

Charlie Stadtlander, a New York Times spokesman, said in a statement: “The New York Times does not authorize the use of our work for search generation or AI training purposes without an express written agreement, regardless of whether we block or restrict any particular bot from crawling our content.”

In its complaint against OpenAI and Microsoft, The New York Times addressed the issue of search engines becoming more artificially intelligent and potentially siphoning traffic away from publishers.

"Defendants also use Microsoft's Bing search index, which replicates and categorizes The New York Times' online content, to generate responses containing verbatim excerpts and detailed summaries of New York Times articles that are significantly longer and more detailed than what is returned by traditional search engines," the publisher wrote in its complaint. "Defendants' tools serve up New York Times content without The New York Times' permission or authorization, undermining and harming The New York Times' relationship with its readers and depriving The New York Times of subscription, licensing, advertising and affiliate revenue."

OpenAI did not respond to a request for comment.