Infringement Risks and Legal Governance Paths of Training Data for Large Models

High-Technology and Commercialization, 2026, Vol. 32, Issue 2: 125.

WANG Xiaohua

Abstract

With the rapid development of large-model technology, training data has become fundamental to the performance gains and emergent capabilities of generative artificial intelligence. However, because training data is highly complex in its sources, structure, and modes of use, model training and content generation continually give rise to infringement risks in personal information protection, copyright protection, and the allocation of data rights. Existing legal norms largely presuppose traditional models of data processing and content production, and therefore struggle to respond effectively to the new risks posed by training data for large models. Starting from the source types and legal attributes of training data, this paper systematically examines the main infringement risks across the full life cycle of training data for large models, analyzes their institutional causes, and on that basis proposes a legal governance path centered on classified governance, risk orientation, and liability allocation, so as to provide normative support for the compliant development of large-model technology.

Key words

large model / training data / infringement risk / digital rule of law

Cite this article

WANG Xiaohua. Infringement Risks and Legal Governance Paths of Training Data for Large Models[J]. High-Technology and Commercialization. 2026, 32(2): 125