Construction begins on infrastructure to supply high-quality corpora for large AI models

2024-08-20

On August 19, at the 2024 Beijing Artificial Intelligence Ecosystem Conference, construction officially began on a trusted circulation infrastructure for high-value corpora. Protected by cutting-edge information technologies such as blockchain and privacy computing, the corpus data that underpins the high-quality development of large AI models will move away from disorderly circulation and onto a standardized "highway." The project is significant for accelerating the creation of a hub for large AI model training and for helping China's AI sector catch up with and leapfrog its competitors.

In recent years, artificial intelligence technology has developed rapidly and now plays an increasingly important role in key areas of the national economy. The high-value corpus data used to train large models is an important "fuel" for the development of large AI models. However, this data is scattered across organizations, industries, and regions, and it lacks adequate privacy and security safeguards as well as effective incentives for circulation. As a result, owners of high-value corpus data often "dare not share" or are "unwilling to share." The difficulty of supplying, circulating, and using large amounts of high-value corpus data has become a bottleneck for the further development of artificial intelligence in China.

According to reports, the trusted circulation infrastructure for high-value corpora is led by the National Blockchain Technology Innovation Center and Beijing Energy Group. It is being jointly built with more than 10 of China's key corpus data organizations, including the Xinhua News Agency National Key Laboratory, People's Daily Online, Higher Education Press, and China General Technology Group. Blockchain and privacy computing, the leading examples of the new generation of information technology, provide trustworthy evidence, tamper resistance, straightforward confirmation of ownership, and comprehensive protection of data privacy and security. Together they can ensure that corpus data is circulated, used, and managed in a trusted and secure way, effectively addressing the problems described above.

An official in charge at the National Blockchain Technology Innovation Center explained that the infrastructure will use China's independently controllable and leading blockchain software and hardware to build a nationwide bridge interconnecting distributed corpus data. This bridge will link corpus suppliers, processors, and users, and it will provide trusted access to corpus data distributed across the country. Such data can then be discovered and accessed across regions and assembled into high-quality corpus datasets. At the same time, the infrastructure will use innovative privacy computing technology based on the principle that "data does not leave its domain and is usable but not visible." This approach is intended to ensure that high-value corpus data for large models cannot be redistributed without authorization during processing and model training. In addition, the infrastructure will provide on-chain incentives through smart contracts, creating a continuous internal driving force for the supply and circulation of corpus resources.

Building on this infrastructure, China's key corpus data organizations will also develop standards for the trusted and secure circulation of corpus data based on blockchain and privacy computing. The aim is to form a sustainable ecosystem in which high-value corpus data circulates and gains value.

Source: Beijing Daily Client

Reporter: Sun Qiru