
CNKI "accuses" Mita AI Search of infringement: "Immediately disconnect the links to our website"

2024-08-16



Titanium Media App reported on August 16 that the domestic AI startup Shanghai Mita Network Technology Co., Ltd. ("Mita Technology") had issued a statement saying that CNKI recently sent it a warning letter. The letter claimed that Mita's AI search results displayed the titles, tables of contents, and abstracts of academic papers without permission, which constituted serious infringement. The infringement notice ran to 28 pages.

Mita Technology stressed that, although it does not understand this move, it respects CNKI's choice. "From now on, Mita AI Search will no longer include the titles and abstracts of CNKI documents; instead, it will include titles and abstracts from other authoritative Chinese and English knowledge bases. We also welcome other databases to discuss cooperation with us."

It is reported that Mita Technology was founded in 2018. The company's CEO Min Kerui was previously the chief scientist of Cheetah Mobile and is currently the chief intelligent scientist of the Legal Artificial Intelligence Laboratory of Peking University.

In 2018, Mita successively launched the legal AI translation product "Mita Translation" and the error correction and proofreading product "Mita Writing Cat"; in 2022, it launched the article generation product "Quantum Sketch", which had more than 10,000 daily active users within one week of its launch.

Since March this year, Mita AI Search, built by Mita Technology, has suddenly taken off and drawn market attention, with its website receiving more than 7 million visits that month. According to the web-traffic monitoring platform Similarweb, Mita Search ranked third by March visits among a group of Chinese AI products, behind only Baidu's Wenxin Yiyan (ERNIE Bot) and Moonshot AI's Kimi, and its traffic grew 550% that month.

In August this year, Mita Technology announced that it had closed a Series A round of more than RMB 100 million led by Ant Group, at a post-money valuation of US$150 million (approximately RMB 1.077 billion). Mita's earlier shareholders include Mingshi Capital, Cheetah Mobile, and Amoy Capital.

From a product perspective, unlike a traditional search engine, an AI search engine answers users' questions directly and attaches links to its sources. The Mita AI Search website offers three answer modes, "concise", "in-depth", and "research", and lets users restrict the search scope to sources such as "entire web", "library", "academic", and "podcast".

As for the follow-up to the notice, Southern Metropolis Daily reported that Wang Yiwei, COO of Mita Technology, said CNKI did not specify in the notice which rights had been violated. He argued that the "Academic" section of Mita AI Search also drives traffic to CNKI: some users activate their CNKI accounts through Mita AI Search and then pay for CNKI, which benefits CNKI. Since CNKI has asked for the links to be removed, however, "we will not force the connection."

According to Wang Yiwei, Mita's AI search results previously linked not only to CNKI but also to other authoritative Chinese and English databases, none of which has so far asked for the links to be removed. Removing the CNKI links does not affect the user experience of Mita's product.

Titanium Media App learned that Tongfang CNKI (Beijing) Technology Co., Ltd., CNKI's parent company, recently built a large model with Huawei called the Chinese Knowledge Large Model (the Huazhi large model). It is designed to support scenarios in knowledge services, scientific research, exploratory learning, production and operations, assisted diagnosis and treatment, and smart justice.

Zhang Hongwei, general manager of Tongfang CNKI, said in July this year that CNKI is a leading digital publishing and knowledge service company, with users in more than 90 countries. Educational institutions, research institutes, think tanks, government agencies, and enterprises are almost all CNKI users; among domestic educational and research institutions in particular, its market share is essentially 100%. Tongfang CNKI is currently affiliated with China National Nuclear Corporation, a central state-owned enterprise. The company has established cooperative relations with more than 20,000 publishing institutions in more than 70 countries, has built an initial global knowledge big-data system, and operates the world's largest Chinese-language knowledge resource library.

Zhang Hongwei said that the company has used the Huazhi large model to thoroughly overhaul CNKI's entire product line, from content processing and annotation to building the tool into its service platforms for various industries. Since Huazhi officially opened to the public in mid-May this year, its user base has grown rapidly, and it now has more than 10 million individual users.

However, CNKI has been fined more than once, and the industry has some doubts about its business model. On December 26, 2022, the State Administration for Market Regulation issued an administrative penalty decision ordering CNKI to stop its illegal conduct and fining it 5% of its 2021 domestic sales of 1.752 billion yuan, or 87.6 million yuan. In September 2023, the Cyberspace Administration of China issued an administrative penalty decision against CNKI following a cybersecurity review, ordering it to stop illegally processing personal information and fining it 50 million yuan. The decision found problems in 14 apps, including CNKI Mobile and CNKI Reading: collecting personal information in violation of the principle of necessity, collecting personal information without consent, failing to disclose or clearly state the rules for collection and use, not providing an account cancellation function, and failing to promptly delete users' personal information after they cancelled their accounts.

Liu Wenjie, a professor at the Institute of Comparative Law of China University of Political Science and Law, believes that an abstract is a condensed summary of a paper's content, especially its ideas and viewpoints. When a search service crawls publicly available data on the Internet to show users paper abstracts, this should be regarded as fair use under copyright law and generally does not constitute copyright infringement.

Recently, Elizabeth Gibney, an editor at the internationally renowned journal Nature, published an article reporting that a growing number of academic publishers are licensing research papers to technology companies for training AI models. One academic publisher earned $23 million from such deals, while the authors earned nothing. In many cases the authors were not consulted about these transactions, which has provoked strong dissatisfaction among some researchers.

"If your paper has not yet been used as AI training data, it will probably soon become part of the training." Gibney pointed out in the article that the authors of academic papers currently have little say when publishers sell their copyrighted works, and there is no established mechanism for confirming whether published articles have been used as AI training data. How to build a fairer mechanism that protects creators' rights when their work is used by large language models deserves broad discussion in academia and the copyright industry.

Large language models (LLMs) are usually trained on large amounts of data scraped from the Internet. This data contains billions of units of language called "tokens", and by learning the patterns among these tokens, the model becomes able to generate fluent text. Because of their rich content and high information density, academic papers are more valuable than ordinary data and are an important source of AI training data. Stefan Baack, a data analyst at the Mozilla Foundation, noted that scientific papers are very helpful in training large language models, especially for improving reasoning on scientific topics. It is precisely because this data is so valuable that major technology companies have spent huge sums to buy datasets.

You Yunting, a lawyer and senior partner at Shanghai Dabang Law Firm, said that the biggest problem with the "academic" section of Mita AI Search is that it can display the full text of articles. "Although the paper PDFs cannot be downloaded from the search results, users can click the PDF link on the results page to read the full article, which infringes the right of communication through information networks." However, if the AI digests the essence of an article and conveys it to users in its own words, this counts as fair use, since the Copyright Law permits appropriately quoting part of a work in order to illustrate a point.

On the use of paper data to train large models, You Yunting said this does not infringe on CNKI. From a copyright-law perspective, training consists of copying and learning: copying means copying articles from the Internet to a server for training, while there is currently no clear legal ruling on whether learning constitutes infringement. In any case, whether the right at issue is copying, learning, or any other right under copyright, CNKI is not the owner of the papers.

Zhang Hongwei acknowledged that in the era of large AI models, CNKI needs to build an ecosystem and pursue cooperation.

"Without the upstream publishing and data industries continuously supplying high-quality data, it will be difficult for our artificial intelligence industry to keep developing with high quality. Solving this problem really requires the wisdom of the entire industry. We need to work together to build an AIGC ecosystem, and we are willing to cooperate with everyone on this to promote the sustainable, high-quality development of the industry," Zhang Hongwei said.