Infringement Risks and Legal Governance Paths of Training Data for Large Models

WANG Xiaohua

High-Technology and Commercialization, 2026, Vol. 32, Issue 2: 125.

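Supervised by: Chinese Academy of Sciences
Sponsored by: National Science Library, Chinese Academy of Sciences; China High-Tech Industrialization Association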
ISSN:1006-222X
CN:11-3556/N

Abstract

With the rapid development of large model technology, training data has become fundamental to the performance gains and emergent capabilities of generative artificial intelligence. However, training data is highly complex in its sources, structure, and modes of use, and this complexity continually gives rise to infringement risks in personal information protection, copyright protection, and the allocation of data rights during both model training and content generation. Existing legal norms mostly presuppose traditional models of data processing and content production, so they struggle to respond effectively to the new risks posed by training data for large models. Starting from the source types and legal attributes of training data, this paper systematically surveys the main infringement risks across the full life cycle of training data for large models, analyzes their institutional causes, and on that basis proposes a legal governance path centered on classified governance, risk orientation, and liability allocation, with the aim of providing normative support for the compliant development of large model technology.

Key words

large model / training data / infringement risk / digital rule of law

Cite this article

WANG Xiaohua. Infringement Risks and Legal Governance Paths of Training Data for Large Models[J]. High-Technology and Commercialization. 2026, 32(2): 125
