Artificial Intelligence Companies Continue to Collect Data from the Internet

It turns out that artificial intelligence companies are bypassing the instructions that websites publish in their robots.txt files.

With the rise of artificial intelligence, companies entering this field need huge amounts of data to develop their own tools. The first place that comes to mind to find this data is, of course, the internet. However, not every piece of data or every article on the internet can be used to train artificial intelligence. Websites indicate whether data may be collected from them in a file called robots.txt.
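To illustrate how such a file works, here is a minimal sketch of the check a well-behaved crawler would perform, using Python's standard `urllib.robotparser` module. The robots.txt content, the crawler name `ExampleAIBot`, and the `example.com` URLs are all hypothetical and are not taken from the report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only):
# one named crawler is blocked from the whole site, while every other
# crawler is blocked only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "OtherBot"):
        for url in ("https://example.com/articles/1",
                    "https://example.com/private/notes"):
            print(agent, url, is_allowed(agent, url))
```

Nothing in this check is enforced by the website itself: the crawler has to run it voluntarily and respect the answer, which is why the file can simply be ignored.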

According to Reuters, many artificial intelligence developers choose to bypass the directives in this file and collect data from these sites. Although Perplexity, which describes itself as a “free artificial intelligence search engine”, is one of the companies that has drawn the most criticism in this regard, it is not alone in this practice.

OpenAI, Anthropic…

According to reports, many artificial intelligence developers bypass robots.txt files and continue to pull content from the sites. Although no names were given in the report, it was learned that OpenAI and Anthropic were among these companies. It also turned out that a server used by Perplexity was not following these guidelines. Perplexity CEO Aravind Srinivas had previously said that the company “is not in a position to first bypass the protocol and then lie about it.”

The robots.txt protocol, meanwhile, has been in use since the 1990s and is not legally binding. Perhaps creating a new, stricter and more detailed protocol would help solve the problem.

Source:
https://www.engadget.com/ai-companies-are-reportedly-still-scraping-websites-despite-protocols-meant-to-block-them-132308524.html?src=rss