With the rapid development of large model technology, training data has become a foundational element underpinning the performance gains and emergent capabilities of generative artificial intelligence. However, the high complexity of training data in its sources, structures, and modes of use continually gives rise to infringement risks in personal information protection, copyright protection, and the allocation of data rights during model training and content generation. Existing legal norms are largely premised on traditional models of data processing and content production, and therefore struggle to respond effectively to the novel risks posed by training data for large models. Starting from the source types and legal attributes of training data, this paper systematically examines the principal infringement risks across the full life cycle of training data for large models, analyzes their institutional causes, and on that basis proposes a rule-of-law governance path centered on classified governance, risk orientation, and liability allocation, with a view to providing normative support for the compliant development of large model technology.