3.5 LoRA: Principles and Hands-On Practice


Contents

1 The idea behind LoRA:

1.1 Setting:

1.2 Steps:

1.3 A worked example:

2 Environment setup:

3 Hands-on practice (based on the Bloom model):

3.1 Imports

3.2 Load the dataset

3.3 Dataset preprocessing

3.4 Create the model

3.4.1 Configuration

3.4.2 Build the model

3.5 Configure training arguments

3.6 Create the trainer

3.7 Train the model

3.8 Inference

3.9 Load the adapter weights and merge them

3.9.1 Load the adapter weights:

3.9.2 Merge the weights:

3.10 Save the full model


1 The idea behind LoRA:

LoRA (Low-Rank Adaptation) is a method for efficiently fine-tuning large pre-trained models via matrix factorization. Its core idea is to express the change to a weight matrix as the product of two low-rank matrices; this greatly reduces the number of trainable parameters while preserving the model's capability.

Let's walk through a simple example to understand LoRA's matrix factorization.

1.1 Setting:

Suppose one layer of a large model has a weight matrix W of size 768×768. Fine-tuning this matrix directly means updating 768×768 = 589,824 parameters, which is computationally expensive. LoRA instead approximates the weight change ΔW with two low-rank matrices A and B, so that only a small number of parameters needs to be trained.

1.2 Steps:
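  • Freeze the pre-trained weight matrix W; it is not updated during fine-tuning.
  • Introduce two trainable low-rank matrices, A of shape r×768 and B of shape 768×r, with r ≪ 768 (e.g. r = 8). A is typically initialized randomly and B with zeros, so the update ΔW = BA starts at zero.
  • During the forward pass the adapted layer computes Wx + BAx (scaled by lora_alpha / r); only A and B receive gradients.
  • After training, the update can be merged back into the original weights, W' = W + BA, so inference incurs no extra latency.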

1.3 A worked example:
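Using the setting above with rank r = 8: A has shape 8×768 and B has shape 768×8, so LoRA trains only 768×8 + 8×768 = 12,288 parameters instead of the 589,824 in the full matrix W, roughly 2% of the original count.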

By splitting the change to the weight matrix into two low-rank matrices A and B, LoRA drastically reduces the number of parameters that need to be fine-tuned. Even when the original weight matrix is very large, LoRA can adapt the model efficiently by training only a small number of parameters, which makes it particularly well suited to fine-tuning large pre-trained models such as GPT and BERT.
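As a quick sanity check of the arithmetic above, here is a minimal NumPy sketch (purely illustrative, using the 768×768 layer and r = 8 from the example; it is not part of the training code below):

  import numpy as np

  d, r = 768, 8
  W = np.random.randn(d, d)           # frozen pre-trained weight, 768 x 768
  A = np.random.randn(r, d) * 0.01    # low-rank factor A, 8 x 768 (trainable)
  B = np.zeros((d, r))                # low-rank factor B, 768 x 8 (trainable, starts at zero)

  delta_W = B @ A                     # low-rank update, initially all zeros
  W_adapted = W + delta_W             # merged weight that would be used at inference

  print(W.size)                       # 589824 parameters in the full matrix
  print(A.size + B.size)              # 12288 trainable parameters with LoRA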

2 Environment setup:
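The main dependencies are PyTorch plus the Hugging Face transformers, datasets, and peft libraries. A minimal setup sketch (package names only; choose versions that match your CUDA installation):

  pip install torch transformers datasets peft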

3 Hands-on practice (based on the Bloom model):

3.1 Imports

  from datasets import Dataset
  from transformers import AutoTokenizer, AutoModelForCausalLM, DataCollatorForSeq2Seq, TrainingArguments, Trainer

3.2 Load the dataset

  ds = Dataset.load_from_disk("../data/alpaca_data_zh/")
  ds
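The dataset is in Alpaca format: each record contains instruction, input, and output fields, which the preprocessing function below combines into a single training sequence.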

3.3 Dataset preprocessing

  tokenizer = AutoTokenizer.from_pretrained("Langboat/bloom-1b4-zh")
  tokenizer

  def process_func(example):
      MAX_LENGTH = 256
      input_ids, attention_mask, labels = [], [], []
      # Combine instruction and input into the "Human:" prompt, then append the "Assistant:" marker
      instruction = tokenizer("\n".join(["Human: " + example["instruction"], example["input"]]).strip() + "\n\nAssistant: ")
      response = tokenizer(example["output"] + tokenizer.eos_token)
      input_ids = instruction["input_ids"] + response["input_ids"]
      attention_mask = instruction["attention_mask"] + response["attention_mask"]
      # Mask the prompt tokens with -100 so the loss is only computed on the response
      labels = [-100] * len(instruction["input_ids"]) + response["input_ids"]
      if len(input_ids) > MAX_LENGTH:
          input_ids = input_ids[:MAX_LENGTH]
          attention_mask = attention_mask[:MAX_LENGTH]
          labels = labels[:MAX_LENGTH]
      return {
          "input_ids": input_ids,
          "attention_mask": attention_mask,
          "labels": labels
      }

  tokenized_ds = ds.map(process_func, remove_columns=ds.column_names)
  tokenized_ds
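As an optional check (not in the original code), decoding one processed record confirms that the prompt tokens are masked out of the labels:

  print(tokenizer.decode(tokenized_ds[0]["input_ids"]))
  print(tokenizer.decode([t for t in tokenized_ds[0]["labels"] if t != -100]))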

3.4 Create the model

model = AutoModelForCausalLM.from_pretrained("Langboat/bloom-1b4-zh", low_cpu_mem_usage=True)
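Setting low_cpu_mem_usage=True asks transformers to load the checkpoint with a reduced peak CPU memory footprint, which helps when loading a 1.4B-parameter model on a machine with limited RAM.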

To decide which modules to target for fine-tuning, let's first iterate over the model and print the names of all of its parameters:

  for name, parameter in model.named_parameters():
      print(name)
  1. transformer.word_embeddings.weight
  2. transformer.word_embeddings_layernorm.weight
  3. transformer.word_embeddings_layernorm.bias
  4. transformer.h.0.input_layernorm.weight
  5. transformer.h.0.input_layernorm.bias
  6. transformer.h.0.self_attention.query_key_value.weight
  7. transformer.h.0.self_attention.query_key_value.bias
  8. transformer.h.0.self_attention.dense.weight
  9. transformer.h.0.self_attention.dense.bias
  10. transformer.h.0.post_attention_layernorm.weight
  11. transformer.h.0.post_attention_layernorm.bias
  12. transformer.h.0.mlp.dense_h_to_4h.weight
  13. transformer.h.0.mlp.dense_h_to_4h.bias
  14. transformer.h.0.mlp.dense_4h_to_h.weight
  15. transformer.h.0.mlp.dense_4h_to_h.bias
  16. transformer.h.1.input_layernorm.weight
  17. transformer.h.1.input_layernorm.bias
  18. transformer.h.1.self_attention.query_key_value.weight
  19. transformer.h.1.self_attention.query_key_value.bias
  20. transformer.h.1.self_attention.dense.weight
  21. transformer.h.1.self_attention.dense.bias
  22. transformer.h.1.post_attention_layernorm.weight
  23. transformer.h.1.post_attention_layernorm.bias
  24. transformer.h.1.mlp.dense_h_to_4h.weight
  25. transformer.h.1.mlp.dense_h_to_4h.bias
  26. transformer.h.1.mlp.dense_4h_to_h.weight
  27. transformer.h.1.mlp.dense_4h_to_h.bias
  28. transformer.h.2.input_layernorm.weight
  29. transformer.h.2.input_layernorm.bias
  30. transformer.h.2.self_attention.query_key_value.weight
  31. transformer.h.2.self_attention.query_key_value.bias
  32. transformer.h.2.self_attention.dense.weight
  33. transformer.h.2.self_attention.dense.bias
  34. transformer.h.2.post_attention_layernorm.weight
  35. transformer.h.2.post_attention_layernorm.bias
  36. transformer.h.2.mlp.dense_h_to_4h.weight
  37. transformer.h.2.mlp.dense_h_to_4h.bias
  38. transformer.h.2.mlp.dense_4h_to_h.weight
  39. transformer.h.2.mlp.dense_4h_to_h.bias
  40. transformer.h.3.input_layernorm.weight
  41. transformer.h.3.input_layernorm.bias
  42. transformer.h.3.self_attention.query_key_value.weight
  43. transformer.h.3.self_attention.query_key_value.bias
  44. transformer.h.3.self_attention.dense.weight
  45. transformer.h.3.self_attention.dense.bias
  46. transformer.h.3.post_attention_layernorm.weight
  47. transformer.h.3.post_attention_layernorm.bias
  48. transformer.h.3.mlp.dense_h_to_4h.weight
  49. transformer.h.3.mlp.dense_h_to_4h.bias
  50. transformer.h.3.mlp.dense_4h_to_h.weight
  51. transformer.h.3.mlp.dense_4h_to_h.bias
  52. transformer.h.4.input_layernorm.weight
  53. transformer.h.4.input_layernorm.bias
  54. transformer.h.4.self_attention.query_key_value.weight
  55. transformer.h.4.self_attention.query_key_value.bias
  56. transformer.h.4.self_attention.dense.weight
  57. transformer.h.4.self_attention.dense.bias
  58. transformer.h.4.post_attention_layernorm.weight
  59. transformer.h.4.post_attention_layernorm.bias
  60. transformer.h.4.mlp.dense_h_to_4h.weight
  61. transformer.h.4.mlp.dense_h_to_4h.bias
  62. transformer.h.4.mlp.dense_4h_to_h.weight
  63. transformer.h.4.mlp.dense_4h_to_h.bias
  64. transformer.h.5.input_layernorm.weight
  65. transformer.h.5.input_layernorm.bias
  66. transformer.h.5.self_attention.query_key_value.weight
  67. transformer.h.5.self_attention.query_key_value.bias
  68. transformer.h.5.self_attention.dense.weight
  69. transformer.h.5.self_attention.dense.bias
  70. transformer.h.5.post_attention_layernorm.weight
  71. transformer.h.5.post_attention_layernorm.bias
  72. transformer.h.5.mlp.dense_h_to_4h.weight
  73. transformer.h.5.mlp.dense_h_to_4h.bias
  74. transformer.h.5.mlp.dense_4h_to_h.weight
  75. transformer.h.5.mlp.dense_4h_to_h.bias
  76. transformer.h.6.input_layernorm.weight
  77. transformer.h.6.input_layernorm.bias
  78. transformer.h.6.self_attention.query_key_value.weight
  79. transformer.h.6.self_attention.query_key_value.bias
  80. transformer.h.6.self_attention.dense.weight
  81. transformer.h.6.self_attention.dense.bias
  82. transformer.h.6.post_attention_layernorm.weight
  83. transformer.h.6.post_attention_layernorm.bias
  84. transformer.h.6.mlp.dense_h_to_4h.weight
  85. transformer.h.6.mlp.dense_h_to_4h.bias
  86. transformer.h.6.mlp.dense_4h_to_h.weight
  87. transformer.h.6.mlp.dense_4h_to_h.bias
  88. transformer.h.7.input_layernorm.weight
  89. transformer.h.7.input_layernorm.bias
  90. transformer.h.7.self_attention.query_key_value.weight
  91. transformer.h.7.self_attention.query_key_value.bias
  92. transformer.h.7.self_attention.dense.weight
  93. transformer.h.7.self_attention.dense.bias
  94. transformer.h.7.post_attention_layernorm.weight
  95. transformer.h.7.post_attention_layernorm.bias
  96. transformer.h.7.mlp.dense_h_to_4h.weight
  97. transformer.h.7.mlp.dense_h_to_4h.bias
  98. transformer.h.7.mlp.dense_4h_to_h.weight
  99. transformer.h.7.mlp.dense_4h_to_h.bias
  100. transformer.h.8.input_layernorm.weight
  101. transformer.h.8.input_layernorm.bias
  102. transformer.h.8.self_attention.query_key_value.weight
  103. transformer.h.8.self_attention.query_key_value.bias
  104. transformer.h.8.self_attention.dense.weight
  105. transformer.h.8.self_attention.dense.bias
  106. transformer.h.8.post_attention_layernorm.weight
  107. transformer.h.8.post_attention_layernorm.bias
  108. transformer.h.8.mlp.dense_h_to_4h.weight
  109. transformer.h.8.mlp.dense_h_to_4h.bias
  110. transformer.h.8.mlp.dense_4h_to_h.weight
  111. transformer.h.8.mlp.dense_4h_to_h.bias
  112. transformer.h.9.input_layernorm.weight
  113. transformer.h.9.input_layernorm.bias
  114. transformer.h.9.self_attention.query_key_value.weight
  115. transformer.h.9.self_attention.query_key_value.bias
  116. transformer.h.9.self_attention.dense.weight
  117. transformer.h.9.self_attention.dense.bias
  118. transformer.h.9.post_attention_layernorm.weight
  119. transformer.h.9.post_attention_layernorm.bias
  120. transformer.h.9.mlp.dense_h_to_4h.weight
  121. transformer.h.9.mlp.dense_h_to_4h.bias
  122. transformer.h.9.mlp.dense_4h_to_h.weight
  123. transformer.h.9.mlp.dense_4h_to_h.bias
  124. transformer.h.10.input_layernorm.weight
  125. transformer.h.10.input_layernorm.bias
  126. transformer.h.10.self_attention.query_key_value.weight
  127. transformer.h.10.self_attention.query_key_value.bias
  128. transformer.h.10.self_attention.dense.weight
  129. transformer.h.10.self_attention.dense.bias
  130. transformer.h.10.post_attention_layernorm.weight
  131. transformer.h.10.post_attention_layernorm.bias
  132. transformer.h.10.mlp.dense_h_to_4h.weight
  133. transformer.h.10.mlp.dense_h_to_4h.bias
  134. transformer.h.10.mlp.dense_4h_to_h.weight
  135. transformer.h.10.mlp.dense_4h_to_h.bias
  136. transformer.h.11.input_layernorm.weight
  137. transformer.h.11.input_layernorm.bias
  138. transformer.h.11.self_attention.query_key_value.weight
  139. transformer.h.11.self_attention.query_key_value.bias
  140. transformer.h.11.self_attention.dense.weight
  141. transformer.h.11.self_attention.dense.bias
  142. transformer.h.11.post_attention_layernorm.weight
  143. transformer.h.11.post_attention_layernorm.bias
  144. transformer.h.11.mlp.dense_h_to_4h.weight
  145. transformer.h.11.mlp.dense_h_to_4h.bias
  146. transformer.h.11.mlp.dense_4h_to_h.weight
  147. transformer.h.11.mlp.dense_4h_to_h.bias
  148. transformer.h.12.input_layernorm.weight
  149. transformer.h.12.input_layernorm.bias
  150. transformer.h.12.self_attention.query_key_value.weight
  151. transformer.h.12.self_attention.query_key_value.bias
  152. transformer.h.12.self_attention.dense.weight
  153. transformer.h.12.self_attention.dense.bias
  154. transformer.h.12.post_attention_layernorm.weight
  155. transformer.h.12.post_attention_layernorm.bias
  156. transformer.h.12.mlp.dense_h_to_4h.weight
  157. transformer.h.12.mlp.dense_h_to_4h.bias
  158. transformer.h.12.mlp.dense_4h_to_h.weight
  159. transformer.h.12.mlp.dense_4h_to_h.bias
  160. transformer.h.13.input_layernorm.weight
  161. transformer.h.13.input_layernorm.bias
  162. transformer.h.13.self_attention.query_key_value.weight
  163. transformer.h.13.self_attention.query_key_value.bias
  164. transformer.h.13.self_attention.dense.weight
  165. transformer.h.13.self_attention.dense.bias
  166. transformer.h.13.post_attention_layernorm.weight
  167. transformer.h.13.post_attention_layernorm.bias
  168. transformer.h.13.mlp.dense_h_to_4h.weight
  169. transformer.h.13.mlp.dense_h_to_4h.bias
  170. transformer.h.13.mlp.dense_4h_to_h.weight
  171. transformer.h.13.mlp.dense_4h_to_h.bias
  172. transformer.h.14.input_layernorm.weight
  173. transformer.h.14.input_layernorm.bias
  174. transformer.h.14.self_attention.query_key_value.weight
  175. transformer.h.14.self_attention.query_key_value.bias
  176. transformer.h.14.self_attention.dense.weight
  177. transformer.h.14.self_attention.dense.bias
  178. transformer.h.14.post_attention_layernorm.weight
  179. transformer.h.14.post_attention_layernorm.bias
  180. transformer.h.14.mlp.dense_h_to_4h.weight
  181. transformer.h.14.mlp.dense_h_to_4h.bias
  182. transformer.h.14.mlp.dense_4h_to_h.weight
  183. transformer.h.14.mlp.dense_4h_to_h.bias
  184. transformer.h.15.input_layernorm.weight
  185. transformer.h.15.input_layernorm.bias
  186. transformer.h.15.self_attention.query_key_value.weight
  187. transformer.h.15.self_attention.query_key_value.bias
  188. transformer.h.15.self_attention.dense.weight
  189. transformer.h.15.self_attention.dense.bias
  190. transformer.h.15.post_attention_layernorm.weight
  191. transformer.h.15.post_attention_layernorm.bias
  192. transformer.h.15.mlp.dense_h_to_4h.weight
  193. transformer.h.15.mlp.dense_h_to_4h.bias
  194. transformer.h.15.mlp.dense_4h_to_h.weight
  195. transformer.h.15.mlp.dense_4h_to_h.bias
  196. transformer.h.16.input_layernorm.weight
  197. transformer.h.16.input_layernorm.bias
  198. transformer.h.16.self_attention.query_key_value.weight
  199. transformer.h.16.self_attention.query_key_value.bias
  200. transformer.h.16.self_attention.dense.weight
  201. transformer.h.16.self_attention.dense.bias
  202. transformer.h.16.post_attention_layernorm.weight
  203. transformer.h.16.post_attention_layernorm.bias
  204. transformer.h.16.mlp.dense_h_to_4h.weight
  205. transformer.h.16.mlp.dense_h_to_4h.bias
  206. transformer.h.16.mlp.dense_4h_to_h.weight
  207. transformer.h.16.mlp.dense_4h_to_h.bias
  208. transformer.h.17.input_layernorm.weight
  209. transformer.h.17.input_layernorm.bias
  210. transformer.h.17.self_attention.query_key_value.weight
  211. transformer.h.17.self_attention.query_key_value.bias
  212. transformer.h.17.self_attention.dense.weight
  213. transformer.h.17.self_attention.dense.bias
  214. transformer.h.17.post_attention_layernorm.weight
  215. transformer.h.17.post_attention_layernorm.bias
  216. transformer.h.17.mlp.dense_h_to_4h.weight
  217. transformer.h.17.mlp.dense_h_to_4h.bias
  218. transformer.h.17.mlp.dense_4h_to_h.weight
  219. transformer.h.17.mlp.dense_4h_to_h.bias
  220. transformer.h.18.input_layernorm.weight
  221. transformer.h.18.input_layernorm.bias
  222. transformer.h.18.self_attention.query_key_value.weight
  223. transformer.h.18.self_attention.query_key_value.bias
  224. transformer.h.18.self_attention.dense.weight
  225. transformer.h.18.self_attention.dense.bias
  226. transformer.h.18.post_attention_layernorm.weight
  227. transformer.h.18.post_attention_layernorm.bias
  228. transformer.h.18.mlp.dense_h_to_4h.weight
  229. transformer.h.18.mlp.dense_h_to_4h.bias
  230. transformer.h.18.mlp.dense_4h_to_h.weight
  231. transformer.h.18.mlp.dense_4h_to_h.bias
  232. transformer.h.19.input_layernorm.weight
  233. transformer.h.19.input_layernorm.bias
  234. transformer.h.19.self_attention.query_key_value.weight
  235. transformer.h.19.self_attention.query_key_value.bias
  236. transformer.h.19.self_attention.dense.weight
  237. transformer.h.19.self_attention.dense.bias
  238. transformer.h.19.post_attention_layernorm.weight
  239. transformer.h.19.post_attention_layernorm.bias
  240. transformer.h.19.mlp.dense_h_to_4h.weight
  241. transformer.h.19.mlp.dense_h_to_4h.bias
  242. transformer.h.19.mlp.dense_4h_to_h.weight
  243. transformer.h.19.mlp.dense_4h_to_h.bias
  244. transformer.h.20.input_layernorm.weight
  245. transformer.h.20.input_layernorm.bias
  246. transformer.h.20.self_attention.query_key_value.weight
  247. transformer.h.20.self_attention.query_key_value.bias
  248. transformer.h.20.self_attention.dense.weight
  249. transformer.h.20.self_attention.dense.bias
  250. transformer.h.20.post_attention_layernorm.weight
  251. transformer.h.20.post_attention_layernorm.bias
  252. transformer.h.20.mlp.dense_h_to_4h.weight
  253. transformer.h.20.mlp.dense_h_to_4h.bias
  254. transformer.h.20.mlp.dense_4h_to_h.weight
  255. transformer.h.20.mlp.dense_4h_to_h.bias
  256. transformer.h.21.input_layernorm.weight
  257. transformer.h.21.input_layernorm.bias
  258. transformer.h.21.self_attention.query_key_value.weight
  259. transformer.h.21.self_attention.query_key_value.bias
  260. transformer.h.21.self_attention.dense.weight
  261. transformer.h.21.self_attention.dense.bias
  262. transformer.h.21.post_attention_layernorm.weight
  263. transformer.h.21.post_attention_layernorm.bias
  264. transformer.h.21.mlp.dense_h_to_4h.weight
  265. transformer.h.21.mlp.dense_h_to_4h.bias
  266. transformer.h.21.mlp.dense_4h_to_h.weight
  267. transformer.h.21.mlp.dense_4h_to_h.bias
  268. transformer.h.22.input_layernorm.weight
  269. transformer.h.22.input_layernorm.bias
  270. transformer.h.22.self_attention.query_key_value.weight
  271. transformer.h.22.self_attention.query_key_value.bias
  272. transformer.h.22.self_attention.dense.weight
  273. transformer.h.22.self_attention.dense.bias
  274. transformer.h.22.post_attention_layernorm.weight
  275. transformer.h.22.post_attention_layernorm.bias
  276. transformer.h.22.mlp.dense_h_to_4h.weight
  277. transformer.h.22.mlp.dense_h_to_4h.bias
  278. transformer.h.22.mlp.dense_4h_to_h.weight
  279. transformer.h.22.mlp.dense_4h_to_h.bias
  280. transformer.h.23.input_layernorm.weight
  281. transformer.h.23.input_layernorm.bias
  282. transformer.h.23.self_attention.query_key_value.weight
  283. transformer.h.23.self_attention.query_key_value.bias
  284. transformer.h.23.self_attention.dense.weight
  285. transformer.h.23.self_attention.dense.bias
  286. transformer.h.23.post_attention_layernorm.weight
  287. transformer.h.23.post_attention_layernorm.bias
  288. transformer.h.23.mlp.dense_h_to_4h.weight
  289. transformer.h.23.mlp.dense_h_to_4h.bias
  290. transformer.h.23.mlp.dense_4h_to_h.weight
  291. transformer.h.23.mlp.dense_4h_to_h.bias
  292. transformer.ln_f.weight
  293. transformer.ln_f.bias

3.4.1 Configuration

  from peft import LoraConfig, TaskType, get_peft_model

  config = LoraConfig(task_type=TaskType.CAUSAL_LM, target_modules=r".*\.1.*query_key_value", modules_to_save=["word_embeddings"])
  config

Key parameters:

1) r:

  • Meaning: r is the rank used in LoRA's low-rank factorization. It defines the inner dimension of the decomposition, i.e. the rank of the matrices A and B, and is normally much smaller than the dimensions of the original weight matrix.
  • Effect: determines how many parameters are trained during fine-tuning. A smaller r means fewer trainable parameters; a larger r gives the adaptation more expressive capacity.
  • Default: r=8, i.e. a rank-8 decomposition.

2) target_modules:

  • Meaning: specifies which modules or layers of the model get LoRA adapters. It can be a list of module names or a regular expression matched against module names. (Once specified, the model structure is adjusted accordingly.)
  • Effect: determines where LoRA is applied, e.g. the attention query/key/value projections. Regular expressions allow flexible matching of module names.
  • Example: ['query_key_value'] (a regex such as the one used above also works).

3) lora_alpha:

  • Meaning: lora_alpha is a scaling factor applied to the LoRA weight update.
  • Effect: controls the magnitude of the adjustment to the weight matrix. The update ΔW computed during fine-tuning is scaled by lora_alpha / r, which keeps the low-rank update from overwhelming the pre-trained weights.
  • Default: lora_alpha=8, i.e. a scaling factor of 8.

4) lora_dropout:

  • Meaning: lora_dropout is the dropout probability used inside the LoRA layers.
  • Effect: helps prevent overfitting by randomly dropping part of the LoRA path during training, improving generalization.
  • Default: lora_dropout=0.0, i.e. no dropout; setting a non-zero value enables it.

5) modules_to_save:

  • Meaning: a list of modules, besides the LoRA layers themselves, that should also be made trainable and saved in the final checkpoint.
  • Effect: for some tasks (such as sequence or token classification) certain modules (e.g. the classifier/score head) must be trained and saved alongside the adapters for the task to work correctly; see the configuration sketch after this list.
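For illustration only, here is a hedged example that spells out all of the parameters discussed above (the values are arbitrary and differ from the config actually used in this post):

  example_config = LoraConfig(
      task_type=TaskType.CAUSAL_LM,
      r=8,                                  # rank of the low-rank factorization
      lora_alpha=16,                        # the update is scaled by lora_alpha / r
      lora_dropout=0.1,                     # dropout on the LoRA path
      target_modules=["query_key_value"],   # attach adapters to the attention qkv projections
      modules_to_save=["word_embeddings"],  # extra modules trained and saved in full
  )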

3.4.2 Build the model

model = get_peft_model(model, config)
config

LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type=<TaskType.CAUSAL_LM: 'CAUSAL_LM'>, inference_mode=False, r=8, target_modules='.*\\.1.*query_key_value', lora_alpha=8, lora_dropout=0.0, fan_in_fan_out=False, bias='none', modules_to_save=['word_embeddings'], init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={})

  for name, parameter in model.named_parameters():
      print(name)
  1. base_model.model.base_model.model.transformer.word_embeddings.original_module.weight
  2. base_model.model.base_model.model.transformer.word_embeddings.modules_to_save.default.weight
  3. base_model.model.base_model.model.transformer.word_embeddings_layernorm.weight
  4. base_model.model.base_model.model.transformer.word_embeddings_layernorm.bias
  5. base_model.model.base_model.model.transformer.h.0.input_layernorm.weight
  6. base_model.model.base_model.model.transformer.h.0.input_layernorm.bias
  7. base_model.model.base_model.model.transformer.h.0.self_attention.query_key_value.weight
  8. base_model.model.base_model.model.transformer.h.0.self_attention.query_key_value.bias
  9. base_model.model.base_model.model.transformer.h.0.self_attention.query_key_value.lora_A.default.weight
  10. base_model.model.base_model.model.transformer.h.0.self_attention.query_key_value.lora_B.default.weight
  11. base_model.model.base_model.model.transformer.h.0.self_attention.dense.weight
  12. base_model.model.base_model.model.transformer.h.0.self_attention.dense.bias
  13. base_model.model.base_model.model.transformer.h.0.post_attention_layernorm.weight
  14. base_model.model.base_model.model.transformer.h.0.post_attention_layernorm.bias
  15. base_model.model.base_model.model.transformer.h.0.mlp.dense_h_to_4h.weight
  16. base_model.model.base_model.model.transformer.h.0.mlp.dense_h_to_4h.bias
  17. base_model.model.base_model.model.transformer.h.0.mlp.dense_4h_to_h.weight
  18. base_model.model.base_model.model.transformer.h.0.mlp.dense_4h_to_h.bias
  19. base_model.model.base_model.model.transformer.h.1.input_layernorm.weight
  20. base_model.model.base_model.model.transformer.h.1.input_layernorm.bias
  21. base_model.model.base_model.model.transformer.h.1.self_attention.query_key_value.weight
  22. base_model.model.base_model.model.transformer.h.1.self_attention.query_key_value.bias
  23. base_model.model.base_model.model.transformer.h.1.self_attention.query_key_value.lora_A.default.weight
  24. base_model.model.base_model.model.transformer.h.1.self_attention.query_key_value.lora_B.default.weight
  25. base_model.model.base_model.model.transformer.h.1.self_attention.dense.weight
  26. base_model.model.base_model.model.transformer.h.1.self_attention.dense.bias
  27. base_model.model.base_model.model.transformer.h.1.post_attention_layernorm.weight
  28. base_model.model.base_model.model.transformer.h.1.post_attention_layernorm.bias
  29. base_model.model.base_model.model.transformer.h.1.mlp.dense_h_to_4h.weight
  30. base_model.model.base_model.model.transformer.h.1.mlp.dense_h_to_4h.bias
  31. base_model.model.base_model.model.transformer.h.1.mlp.dense_4h_to_h.weight
  32. base_model.model.base_model.model.transformer.h.1.mlp.dense_4h_to_h.bias
  33. base_model.model.base_model.model.transformer.h.2.input_layernorm.weight
  34. base_model.model.base_model.model.transformer.h.2.input_layernorm.bias
  35. base_model.model.base_model.model.transformer.h.2.self_attention.query_key_value.weight
  36. base_model.model.base_model.model.transformer.h.2.self_attention.query_key_value.bias
  37. base_model.model.base_model.model.transformer.h.2.self_attention.query_key_value.lora_A.default.weight
  38. base_model.model.base_model.model.transformer.h.2.self_attention.query_key_value.lora_B.default.weight
  39. base_model.model.base_model.model.transformer.h.2.self_attention.dense.weight
  40. base_model.model.base_model.model.transformer.h.2.self_attention.dense.bias
  41. base_model.model.base_model.model.transformer.h.2.post_attention_layernorm.weight
  42. base_model.model.base_model.model.transformer.h.2.post_attention_layernorm.bias
  43. base_model.model.base_model.model.transformer.h.2.mlp.dense_h_to_4h.weight
  44. base_model.model.base_model.model.transformer.h.2.mlp.dense_h_to_4h.bias
  45. base_model.model.base_model.model.transformer.h.2.mlp.dense_4h_to_h.weight
  46. base_model.model.base_model.model.transformer.h.2.mlp.dense_4h_to_h.bias
  47. base_model.model.base_model.model.transformer.h.3.input_layernorm.weight
  48. base_model.model.base_model.model.transformer.h.3.input_layernorm.bias
  49. base_model.model.base_model.model.transformer.h.3.self_attention.query_key_value.weight
  50. base_model.model.base_model.model.transformer.h.3.self_attention.query_key_value.bias
  51. base_model.model.base_model.model.transformer.h.3.self_attention.query_key_value.lora_A.default.weight
  52. base_model.model.base_model.model.transformer.h.3.self_attention.query_key_value.lora_B.default.weight
  53. base_model.model.base_model.model.transformer.h.3.self_attention.dense.weight
  54. base_model.model.base_model.model.transformer.h.3.self_attention.dense.bias
  55. base_model.model.base_model.model.transformer.h.3.post_attention_layernorm.weight
  56. base_model.model.base_model.model.transformer.h.3.post_attention_layernorm.bias
  57. base_model.model.base_model.model.transformer.h.3.mlp.dense_h_to_4h.weight
  58. base_model.model.base_model.model.transformer.h.3.mlp.dense_h_to_4h.bias
  59. base_model.model.base_model.model.transformer.h.3.mlp.dense_4h_to_h.weight
  60. base_model.model.base_model.model.transformer.h.3.mlp.dense_4h_to_h.bias
  61. base_model.model.base_model.model.transformer.h.4.input_layernorm.weight
  62. base_model.model.base_model.model.transformer.h.4.input_layernorm.bias
  63. base_model.model.base_model.model.transformer.h.4.self_attention.query_key_value.weight
  64. base_model.model.base_model.model.transformer.h.4.self_attention.query_key_value.bias
  65. base_model.model.base_model.model.transformer.h.4.self_attention.query_key_value.lora_A.default.weight
  66. base_model.model.base_model.model.transformer.h.4.self_attention.query_key_value.lora_B.default.weight
  67. base_model.model.base_model.model.transformer.h.4.self_attention.dense.weight
  68. base_model.model.base_model.model.transformer.h.4.self_attention.dense.bias
  69. base_model.model.base_model.model.transformer.h.4.post_attention_layernorm.weight
  70. base_model.model.base_model.model.transformer.h.4.post_attention_layernorm.bias
  71. base_model.model.base_model.model.transformer.h.4.mlp.dense_h_to_4h.weight
  72. base_model.model.base_model.model.transformer.h.4.mlp.dense_h_to_4h.bias
  73. base_model.model.base_model.model.transformer.h.4.mlp.dense_4h_to_h.weight
  74. base_model.model.base_model.model.transformer.h.4.mlp.dense_4h_to_h.bias
  75. base_model.model.base_model.model.transformer.h.5.input_layernorm.weight
  76. base_model.model.base_model.model.transformer.h.5.input_layernorm.bias
  77. base_model.model.base_model.model.transformer.h.5.self_attention.query_key_value.weight
  78. base_model.model.base_model.model.transformer.h.5.self_attention.query_key_value.bias
  79. base_model.model.base_model.model.transformer.h.5.self_attention.query_key_value.lora_A.default.weight
  80. base_model.model.base_model.model.transformer.h.5.self_attention.query_key_value.lora_B.default.weight
  81. base_model.model.base_model.model.transformer.h.5.self_attention.dense.weight
  82. base_model.model.base_model.model.transformer.h.5.self_attention.dense.bias
  83. base_model.model.base_model.model.transformer.h.5.post_attention_layernorm.weight
  84. base_model.model.base_model.model.transformer.h.5.post_attention_layernorm.bias
  85. base_model.model.base_model.model.transformer.h.5.mlp.dense_h_to_4h.weight
  86. base_model.model.base_model.model.transformer.h.5.mlp.dense_h_to_4h.bias
  87. base_model.model.base_model.model.transformer.h.5.mlp.dense_4h_to_h.weight
  88. base_model.model.base_model.model.transformer.h.5.mlp.dense_4h_to_h.bias
  89. base_model.model.base_model.model.transformer.h.6.input_layernorm.weight
  90. base_model.model.base_model.model.transformer.h.6.input_layernorm.bias
  91. base_model.model.base_model.model.transformer.h.6.self_attention.query_key_value.weight
  92. base_model.model.base_model.model.transformer.h.6.self_attention.query_key_value.bias
  93. base_model.model.base_model.model.transformer.h.6.self_attention.query_key_value.lora_A.default.weight
  94. base_model.model.base_model.model.transformer.h.6.self_attention.query_key_value.lora_B.default.weight
  95. base_model.model.base_model.model.transformer.h.6.self_attention.dense.weight
  96. base_model.model.base_model.model.transformer.h.6.self_attention.dense.bias
  97. base_model.model.base_model.model.transformer.h.6.post_attention_layernorm.weight
  98. base_model.model.base_model.model.transformer.h.6.post_attention_layernorm.bias
  99. base_model.model.base_model.model.transformer.h.6.mlp.dense_h_to_4h.weight
  100. base_model.model.base_model.model.transformer.h.6.mlp.dense_h_to_4h.bias
  101. base_model.model.base_model.model.transformer.h.6.mlp.dense_4h_to_h.weight
  102. base_model.model.base_model.model.transformer.h.6.mlp.dense_4h_to_h.bias
  103. base_model.model.base_model.model.transformer.h.7.input_layernorm.weight
  104. base_model.model.base_model.model.transformer.h.7.input_layernorm.bias
  105. base_model.model.base_model.model.transformer.h.7.self_attention.query_key_value.weight
  106. base_model.model.base_model.model.transformer.h.7.self_attention.query_key_value.bias
  107. base_model.model.base_model.model.transformer.h.7.self_attention.query_key_value.lora_A.default.weight
  108. base_model.model.base_model.model.transformer.h.7.self_attention.query_key_value.lora_B.default.weight
  109. base_model.model.base_model.model.transformer.h.7.self_attention.dense.weight
  110. base_model.model.base_model.model.transformer.h.7.self_attention.dense.bias
  111. base_model.model.base_model.model.transformer.h.7.post_attention_layernorm.weight
  112. base_model.model.base_model.model.transformer.h.7.post_attention_layernorm.bias
  113. base_model.model.base_model.model.transformer.h.7.mlp.dense_h_to_4h.weight
  114. base_model.model.base_model.model.transformer.h.7.mlp.dense_h_to_4h.bias
  115. base_model.model.base_model.model.transformer.h.7.mlp.dense_4h_to_h.weight
  116. base_model.model.base_model.model.transformer.h.7.mlp.dense_4h_to_h.bias
  117. base_model.model.base_model.model.transformer.h.8.input_layernorm.weight
  118. base_model.model.base_model.model.transformer.h.8.input_layernorm.bias
  119. base_model.model.base_model.model.transformer.h.8.self_attention.query_key_value.weight
  120. base_model.model.base_model.model.transformer.h.8.self_attention.query_key_value.bias
  121. base_model.model.base_model.model.transformer.h.8.self_attention.query_key_value.lora_A.default.weight
  122. base_model.model.base_model.model.transformer.h.8.self_attention.query_key_value.lora_B.default.weight
  123. base_model.model.base_model.model.transformer.h.8.self_attention.dense.weight
  124. base_model.model.base_model.model.transformer.h.8.self_attention.dense.bias
  125. base_model.model.base_model.model.transformer.h.8.post_attention_layernorm.weight
  126. base_model.model.base_model.model.transformer.h.8.post_attention_layernorm.bias
  127. base_model.model.base_model.model.transformer.h.8.mlp.dense_h_to_4h.weight
  128. base_model.model.base_model.model.transformer.h.8.mlp.dense_h_to_4h.bias
  129. base_model.model.base_model.model.transformer.h.8.mlp.dense_4h_to_h.weight
  130. base_model.model.base_model.model.transformer.h.8.mlp.dense_4h_to_h.bias
  131. base_model.model.base_model.model.transformer.h.9.input_layernorm.weight
  132. base_model.model.base_model.model.transformer.h.9.input_layernorm.bias
  133. base_model.model.base_model.model.transformer.h.9.self_attention.query_key_value.weight
  134. base_model.model.base_model.model.transformer.h.9.self_attention.query_key_value.bias
  135. base_model.model.base_model.model.transformer.h.9.self_attention.query_key_value.lora_A.default.weight
  136. base_model.model.base_model.model.transformer.h.9.self_attention.query_key_value.lora_B.default.weight
  137. base_model.model.base_model.model.transformer.h.9.self_attention.dense.weight
  138. base_model.model.base_model.model.transformer.h.9.self_attention.dense.bias
  139. base_model.model.base_model.model.transformer.h.9.post_attention_layernorm.weight
  140. base_model.model.base_model.model.transformer.h.9.post_attention_layernorm.bias
  141. base_model.model.base_model.model.transformer.h.9.mlp.dense_h_to_4h.weight
  142. base_model.model.base_model.model.transformer.h.9.mlp.dense_h_to_4h.bias
  143. base_model.model.base_model.model.transformer.h.9.mlp.dense_4h_to_h.weight
  144. base_model.model.base_model.model.transformer.h.9.mlp.dense_4h_to_h.bias
  145. base_model.model.base_model.model.transformer.h.10.input_layernorm.weight
  146. base_model.model.base_model.model.transformer.h.10.input_layernorm.bias
  147. base_model.model.base_model.model.transformer.h.10.self_attention.query_key_value.weight
  148. base_model.model.base_model.model.transformer.h.10.self_attention.query_key_value.bias
  149. base_model.model.base_model.model.transformer.h.10.self_attention.query_key_value.lora_A.default.weight
  150. base_model.model.base_model.model.transformer.h.10.self_attention.query_key_value.lora_B.default.weight
  151. base_model.model.base_model.model.transformer.h.10.self_attention.dense.weight
  152. base_model.model.base_model.model.transformer.h.10.self_attention.dense.bias
  153. base_model.model.base_model.model.transformer.h.10.post_attention_layernorm.weight
  154. base_model.model.base_model.model.transformer.h.10.post_attention_layernorm.bias
  155. base_model.model.base_model.model.transformer.h.10.mlp.dense_h_to_4h.weight
  156. base_model.model.base_model.model.transformer.h.10.mlp.dense_h_to_4h.bias
  157. base_model.model.base_model.model.transformer.h.10.mlp.dense_4h_to_h.weight
  158. base_model.model.base_model.model.transformer.h.10.mlp.dense_4h_to_h.bias
  159. base_model.model.base_model.model.transformer.h.11.input_layernorm.weight
  160. base_model.model.base_model.model.transformer.h.11.input_layernorm.bias
  161. base_model.model.base_model.model.transformer.h.11.self_attention.query_key_value.weight
  162. base_model.model.base_model.model.transformer.h.11.self_attention.query_key_value.bias
  163. base_model.model.base_model.model.transformer.h.11.self_attention.query_key_value.lora_A.default.weight
  164. base_model.model.base_model.model.transformer.h.11.self_attention.query_key_value.lora_B.default.weight
  165. base_model.model.base_model.model.transformer.h.11.self_attention.dense.weight
  166. base_model.model.base_model.model.transformer.h.11.self_attention.dense.bias
  167. base_model.model.base_model.model.transformer.h.11.post_attention_layernorm.weight
  168. base_model.model.base_model.model.transformer.h.11.post_attention_layernorm.bias
  169. base_model.model.base_model.model.transformer.h.11.mlp.dense_h_to_4h.weight
  170. base_model.model.base_model.model.transformer.h.11.mlp.dense_h_to_4h.bias
  171. base_model.model.base_model.model.transformer.h.11.mlp.dense_4h_to_h.weight
  172. base_model.model.base_model.model.transformer.h.11.mlp.dense_4h_to_h.bias
  173. base_model.model.base_model.model.transformer.h.12.input_layernorm.weight
  174. base_model.model.base_model.model.transformer.h.12.input_layernorm.bias
  175. base_model.model.base_model.model.transformer.h.12.self_attention.query_key_value.weight
  176. base_model.model.base_model.model.transformer.h.12.self_attention.query_key_value.bias
  177. base_model.model.base_model.model.transformer.h.12.self_attention.query_key_value.lora_A.default.weight
  178. base_model.model.base_model.model.transformer.h.12.self_attention.query_key_value.lora_B.default.weight
  179. base_model.model.base_model.model.transformer.h.12.self_attention.dense.weight
  180. base_model.model.base_model.model.transformer.h.12.self_attention.dense.bias
  181. base_model.model.base_model.model.transformer.h.12.post_attention_layernorm.weight
  182. base_model.model.base_model.model.transformer.h.12.post_attention_layernorm.bias
  183. base_model.model.base_model.model.transformer.h.12.mlp.dense_h_to_4h.weight
  184. base_model.model.base_model.model.transformer.h.12.mlp.dense_h_to_4h.bias
  185. base_model.model.base_model.model.transformer.h.12.mlp.dense_4h_to_h.weight
  186. base_model.model.base_model.model.transformer.h.12.mlp.dense_4h_to_h.bias
  187. base_model.model.base_model.model.transformer.h.13.input_layernorm.weight
  188. base_model.model.base_model.model.transformer.h.13.input_layernorm.bias
  189. base_model.model.base_model.model.transformer.h.13.self_attention.query_key_value.weight
  190. base_model.model.base_model.model.transformer.h.13.self_attention.query_key_value.bias
  191. base_model.model.base_model.model.transformer.h.13.self_attention.query_key_value.lora_A.default.weight
  192. base_model.model.base_model.model.transformer.h.13.self_attention.query_key_value.lora_B.default.weight
  193. base_model.model.base_model.model.transformer.h.13.self_attention.dense.weight
  194. base_model.model.base_model.model.transformer.h.13.self_attention.dense.bias
  195. base_model.model.base_model.model.transformer.h.13.post_attention_layernorm.weight
  196. base_model.model.base_model.model.transformer.h.13.post_attention_layernorm.bias
  197. base_model.model.base_model.model.transformer.h.13.mlp.dense_h_to_4h.weight
  198. base_model.model.base_model.model.transformer.h.13.mlp.dense_h_to_4h.bias
  199. base_model.model.base_model.model.transformer.h.13.mlp.dense_4h_to_h.weight
  200. base_model.model.base_model.model.transformer.h.13.mlp.dense_4h_to_h.bias
  201. base_model.model.base_model.model.transformer.h.14.input_layernorm.weight
  202. base_model.model.base_model.model.transformer.h.14.input_layernorm.bias
  203. base_model.model.base_model.model.transformer.h.14.self_attention.query_key_value.weight
  204. base_model.model.base_model.model.transformer.h.14.self_attention.query_key_value.bias
  205. base_model.model.base_model.model.transformer.h.14.self_attention.query_key_value.lora_A.default.weight
  206. base_model.model.base_model.model.transformer.h.14.self_attention.query_key_value.lora_B.default.weight
  207. base_model.model.base_model.model.transformer.h.14.self_attention.dense.weight
  208. base_model.model.base_model.model.transformer.h.14.self_attention.dense.bias
  209. base_model.model.base_model.model.transformer.h.14.post_attention_layernorm.weight
  210. base_model.model.base_model.model.transformer.h.14.post_attention_layernorm.bias
  211. base_model.model.base_model.model.transformer.h.14.mlp.dense_h_to_4h.weight
  212. base_model.model.base_model.model.transformer.h.14.mlp.dense_h_to_4h.bias
  213. base_model.model.base_model.model.transformer.h.14.mlp.dense_4h_to_h.weight
  214. base_model.model.base_model.model.transformer.h.14.mlp.dense_4h_to_h.bias
  215. base_model.model.base_model.model.transformer.h.15.input_layernorm.weight
  216. base_model.model.base_model.model.transformer.h.15.input_layernorm.bias
  217. base_model.model.base_model.model.transformer.h.15.self_attention.query_key_value.weight
  218. base_model.model.base_model.model.transformer.h.15.self_attention.query_key_value.bias
  219. base_model.model.base_model.model.transformer.h.15.self_attention.query_key_value.lora_A.default.weight
  220. base_model.model.base_model.model.transformer.h.15.self_attention.query_key_value.lora_B.default.weight
  221. base_model.model.base_model.model.transformer.h.15.self_attention.dense.weight
  222. base_model.model.base_model.model.transformer.h.15.self_attention.dense.bias
  223. base_model.model.base_model.model.transformer.h.15.post_attention_layernorm.weight
  224. base_model.model.base_model.model.transformer.h.15.post_attention_layernorm.bias
  225. base_model.model.base_model.model.transformer.h.15.mlp.dense_h_to_4h.weight
  226. base_model.model.base_model.model.transformer.h.15.mlp.dense_h_to_4h.bias
  227. base_model.model.base_model.model.transformer.h.15.mlp.dense_4h_to_h.weight
  228. base_model.model.base_model.model.transformer.h.15.mlp.dense_4h_to_h.bias
  229. base_model.model.base_model.model.transformer.h.16.input_layernorm.weight
  230. base_model.model.base_model.model.transformer.h.16.input_layernorm.bias
  231. base_model.model.base_model.model.transformer.h.16.self_attention.query_key_value.weight
  232. base_model.model.base_model.model.transformer.h.16.self_attention.query_key_value.bias
  233. base_model.model.base_model.model.transformer.h.16.self_attention.query_key_value.lora_A.default.weight
  234. base_model.model.base_model.model.transformer.h.16.self_attention.query_key_value.lora_B.default.weight
  235. base_model.model.base_model.model.transformer.h.16.self_attention.dense.weight
  236. base_model.model.base_model.model.transformer.h.16.self_attention.dense.bias
  237. base_model.model.base_model.model.transformer.h.16.post_attention_layernorm.weight
  238. base_model.model.base_model.model.transformer.h.16.post_attention_layernorm.bias
  239. base_model.model.base_model.model.transformer.h.16.mlp.dense_h_to_4h.weight
  240. base_model.model.base_model.model.transformer.h.16.mlp.dense_h_to_4h.bias
  241. base_model.model.base_model.model.transformer.h.16.mlp.dense_4h_to_h.weight
  242. base_model.model.base_model.model.transformer.h.16.mlp.dense_4h_to_h.bias
  243. base_model.model.base_model.model.transformer.h.17.input_layernorm.weight
  244. base_model.model.base_model.model.transformer.h.17.input_layernorm.bias
  245. base_model.model.base_model.model.transformer.h.17.self_attention.query_key_value.weight
  246. base_model.model.base_model.model.transformer.h.17.self_attention.query_key_value.bias
  247. base_model.model.base_model.model.transformer.h.17.self_attention.query_key_value.lora_A.default.weight
  248. base_model.model.base_model.model.transformer.h.17.self_attention.query_key_value.lora_B.default.weight
  249. base_model.model.base_model.model.transformer.h.17.self_attention.dense.weight
  250. base_model.model.base_model.model.transformer.h.17.self_attention.dense.bias
  251. base_model.model.base_model.model.transformer.h.17.post_attention_layernorm.weight
  252. base_model.model.base_model.model.transformer.h.17.post_attention_layernorm.bias
  253. base_model.model.base_model.model.transformer.h.17.mlp.dense_h_to_4h.weight
  254. base_model.model.base_model.model.transformer.h.17.mlp.dense_h_to_4h.bias
  255. base_model.model.base_model.model.transformer.h.17.mlp.dense_4h_to_h.weight
  256. base_model.model.base_model.model.transformer.h.17.mlp.dense_4h_to_h.bias
  257. base_model.model.base_model.model.transformer.h.18.input_layernorm.weight
  258. base_model.model.base_model.model.transformer.h.18.input_layernorm.bias
  259. base_model.model.base_model.model.transformer.h.18.self_attention.query_key_value.weight
  260. base_model.model.base_model.model.transformer.h.18.self_attention.query_key_value.bias
  261. base_model.model.base_model.model.transformer.h.18.self_attention.query_key_value.lora_A.default.weight
  262. base_model.model.base_model.model.transformer.h.18.self_attention.query_key_value.lora_B.default.weight
  263. base_model.model.base_model.model.transformer.h.18.self_attention.dense.weight
  264. base_model.model.base_model.model.transformer.h.18.self_attention.dense.bias
  265. base_model.model.base_model.model.transformer.h.18.post_attention_layernorm.weight
  266. base_model.model.base_model.model.transformer.h.18.post_attention_layernorm.bias
  267. base_model.model.base_model.model.transformer.h.18.mlp.dense_h_to_4h.weight
  268. base_model.model.base_model.model.transformer.h.18.mlp.dense_h_to_4h.bias
  269. base_model.model.base_model.model.transformer.h.18.mlp.dense_4h_to_h.weight
  270. base_model.model.base_model.model.transformer.h.18.mlp.dense_4h_to_h.bias
  271. base_model.model.base_model.model.transformer.h.19.input_layernorm.weight
  272. base_model.model.base_model.model.transformer.h.19.input_layernorm.bias
  273. base_model.model.base_model.model.transformer.h.19.self_attention.query_key_value.weight
  274. base_model.model.base_model.model.transformer.h.19.self_attention.query_key_value.bias
  275. base_model.model.base_model.model.transformer.h.19.self_attention.query_key_value.lora_A.default.weight
  276. base_model.model.base_model.model.transformer.h.19.self_attention.query_key_value.lora_B.default.weight
  277. base_model.model.base_model.model.transformer.h.19.self_attention.dense.weight
  278. base_model.model.base_model.model.transformer.h.19.self_attention.dense.bias
  279. base_model.model.base_model.model.transformer.h.19.post_attention_layernorm.weight
  280. base_model.model.base_model.model.transformer.h.19.post_attention_layernorm.bias
  281. base_model.model.base_model.model.transformer.h.19.mlp.dense_h_to_4h.weight
  282. base_model.model.base_model.model.transformer.h.19.mlp.dense_h_to_4h.bias
  283. base_model.model.base_model.model.transformer.h.19.mlp.dense_4h_to_h.weight
  284. base_model.model.base_model.model.transformer.h.19.mlp.dense_4h_to_h.bias
  285. base_model.model.base_model.model.transformer.h.20.input_layernorm.weight
  286. base_model.model.base_model.model.transformer.h.20.input_layernorm.bias
  287. base_model.model.base_model.model.transformer.h.20.self_attention.query_key_value.weight
  288. base_model.model.base_model.model.transformer.h.20.self_attention.query_key_value.bias
  289. base_model.model.base_model.model.transformer.h.20.self_attention.query_key_value.lora_A.default.weight
  290. base_model.model.base_model.model.transformer.h.20.self_attention.query_key_value.lora_B.default.weight
  291. base_model.model.base_model.model.transformer.h.20.self_attention.dense.weight
  292. base_model.model.base_model.model.transformer.h.20.self_attention.dense.bias
  293. base_model.model.base_model.model.transformer.h.20.post_attention_layernorm.weight
  294. base_model.model.base_model.model.transformer.h.20.post_attention_layernorm.bias
  295. base_model.model.base_model.model.transformer.h.20.mlp.dense_h_to_4h.weight
  296. base_model.model.base_model.model.transformer.h.20.mlp.dense_h_to_4h.bias
  297. base_model.model.base_model.model.transformer.h.20.mlp.dense_4h_to_h.weight
  298. base_model.model.base_model.model.transformer.h.20.mlp.dense_4h_to_h.bias
  299. base_model.model.base_model.model.transformer.h.21.input_layernorm.weight
  300. base_model.model.base_model.model.transformer.h.21.input_layernorm.bias
  301. base_model.model.base_model.model.transformer.h.21.self_attention.query_key_value.weight
  302. base_model.model.base_model.model.transformer.h.21.self_attention.query_key_value.bias
  303. base_model.model.base_model.model.transformer.h.21.self_attention.query_key_value.lora_A.default.weight
  304. base_model.model.base_model.model.transformer.h.21.self_attention.query_key_value.lora_B.default.weight
  305. base_model.model.base_model.model.transformer.h.21.self_attention.dense.weight
  306. base_model.model.base_model.model.transformer.h.21.self_attention.dense.bias
  307. base_model.model.base_model.model.transformer.h.21.post_attention_layernorm.weight
  308. base_model.model.base_model.model.transformer.h.21.post_attention_layernorm.bias
  309. base_model.model.base_model.model.transformer.h.21.mlp.dense_h_to_4h.weight
  310. base_model.model.base_model.model.transformer.h.21.mlp.dense_h_to_4h.bias
  311. base_model.model.base_model.model.transformer.h.21.mlp.dense_4h_to_h.weight
  312. base_model.model.base_model.model.transformer.h.21.mlp.dense_4h_to_h.bias
  313. base_model.model.base_model.model.transformer.h.22.input_layernorm.weight
  314. base_model.model.base_model.model.transformer.h.22.input_layernorm.bias
  315. base_model.model.base_model.model.transformer.h.22.self_attention.query_key_value.weight
  316. base_model.model.base_model.model.transformer.h.22.self_attention.query_key_value.bias
  317. base_model.model.base_model.model.transformer.h.22.self_attention.query_key_value.lora_A.default.weight
  318. base_model.model.base_model.model.transformer.h.22.self_attention.query_key_value.lora_B.default.weight
  319. base_model.model.base_model.model.transformer.h.22.self_attention.dense.weight
  320. base_model.model.base_model.model.transformer.h.22.self_attention.dense.bias
  321. base_model.model.base_model.model.transformer.h.22.post_attention_layernorm.weight
  322. base_model.model.base_model.model.transformer.h.22.post_attention_layernorm.bias
  323. base_model.model.base_model.model.transformer.h.22.mlp.dense_h_to_4h.weight
  324. base_model.model.base_model.model.transformer.h.22.mlp.dense_h_to_4h.bias
  325. base_model.model.base_model.model.transformer.h.22.mlp.dense_4h_to_h.weight
  326. base_model.model.base_model.model.transformer.h.22.mlp.dense_4h_to_h.bias
  327. base_model.model.base_model.model.transformer.h.23.input_layernorm.weight
  328. base_model.model.base_model.model.transformer.h.23.input_layernorm.bias
  329. base_model.model.base_model.model.transformer.h.23.self_attention.query_key_value.weight
  330. base_model.model.base_model.model.transformer.h.23.self_attention.query_key_value.bias
  331. base_model.model.base_model.model.transformer.h.23.self_attention.query_key_value.lora_A.default.weight
  332. base_model.model.base_model.model.transformer.h.23.self_attention.query_key_value.lora_B.default.weight
  333. base_model.model.base_model.model.transformer.h.23.self_attention.dense.weight
  334. base_model.model.base_model.model.transformer.h.23.self_attention.dense.bias
  335. base_model.model.base_model.model.transformer.h.23.post_attention_layernorm.weight
  336. base_model.model.base_model.model.transformer.h.23.post_attention_layernorm.bias
  337. base_model.model.base_model.model.transformer.h.23.mlp.dense_h_to_4h.weight
  338. base_model.model.base_model.model.transformer.h.23.mlp.dense_h_to_4h.bias
  339. base_model.model.base_model.model.transformer.h.23.mlp.dense_4h_to_h.weight
  340. base_model.model.base_model.model.transformer.h.23.mlp.dense_4h_to_h.bias
  341. base_model.model.base_model.model.transformer.ln_f.weight
  342. base_model.model.base_model.model.transformer.ln_f.bias
model.print_trainable_parameters()

trainable params: 96,077,824 || all params: 1,399,189,504 || trainable%: 6.866677010178601
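Most of these ~96M trainable parameters come from the full copy of word_embeddings made trainable via modules_to_save (46145 × 2048 ≈ 94.5M); the LoRA A/B matrices themselves account for only about 1.6M parameters.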

3.5 Configure training arguments

  args = TrainingArguments(
      output_dir="./chatbot",
      per_device_train_batch_size=1,
      gradient_accumulation_steps=8,
      logging_steps=10,
      num_train_epochs=1
  )
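With per_device_train_batch_size=1 and gradient_accumulation_steps=8, gradients are accumulated over 8 forward passes before each optimizer step, giving an effective batch size of 8 per device.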

3.6 Create the trainer

  trainer = Trainer(
      model=model,
      args=args,
      train_dataset=tokenized_ds,
      # DataCollatorForSeq2Seq pads input_ids/attention_mask and pads labels with -100
      data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
  )

3.7 Train the model

trainer.train()

3.8 Inference

  model = model.cuda()
  ipt = tokenizer("Human: {}\n{}".format("数学考试有哪些技巧?", "").strip() + "\n\nAssistant: ", return_tensors="pt").to(model.device)
  print(tokenizer.decode(model.generate(**ipt, max_length=256, do_sample=True)[0], skip_special_tokens=True))
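Because do_sample=True is set, generation is stochastic and the answer will differ from run to run; note that the prompt follows the same "Human: ...\n\nAssistant: " template used during preprocessing.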

3.9 Load the adapter weights and merge them

3.9.1 Load the adapter weights:

  from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel

  model = AutoModelForCausalLM.from_pretrained("Langboat/bloom-1b4-zh", low_cpu_mem_usage=True)
  tokenizer = AutoTokenizer.from_pretrained("Langboat/bloom-1b4-zh")
  p_model = PeftModel.from_pretrained(model, model_id="./chatbot/checkpoint-500/")
  p_model
  1. PeftModelForCausalLM(
  2. (base_model): LoraModel(
  3. (model): BloomForCausalLM(
  4. (transformer): BloomModel(
  5. (word_embeddings): Embedding(46145, 2048)
  6. (word_embeddings_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  7. (h): ModuleList(
  8. (0): BloomBlock(
  9. (input_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  10. (self_attention): BloomAttention(
  11. (query_key_value): Linear(
  12. in_features=2048, out_features=6144, bias=True
  13. (lora_dropout): ModuleDict(
  14. (default): Identity()
  15. )
  16. (lora_A): ModuleDict(
  17. (default): Linear(in_features=2048, out_features=8, bias=False)
  18. )
  19. (lora_B): ModuleDict(
  20. (default): Linear(in_features=8, out_features=6144, bias=False)
  21. )
  22. (lora_embedding_A): ParameterDict()
  23. (lora_embedding_B): ParameterDict()
  24. )
  25. (dense): Linear(in_features=2048, out_features=2048, bias=True)
  26. (attention_dropout): Dropout(p=0.0, inplace=False)
  27. )
  28. (post_attention_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  29. (mlp): BloomMLP(
  30. (dense_h_to_4h): Linear(in_features=2048, out_features=8192, bias=True)
  31. (gelu_impl): BloomGelu()
  32. (dense_4h_to_h): Linear(in_features=8192, out_features=2048, bias=True)
  33. )
  34. )
  35. (1): BloomBlock(
  36. (input_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  37. (self_attention): BloomAttention(
  38. (query_key_value): Linear(
  39. in_features=2048, out_features=6144, bias=True
  40. (lora_dropout): ModuleDict(
  41. (default): Identity()
  42. )
  43. (lora_A): ModuleDict(
  44. (default): Linear(in_features=2048, out_features=8, bias=False)
  45. )
  46. (lora_B): ModuleDict(
  47. (default): Linear(in_features=8, out_features=6144, bias=False)
  48. )
  49. (lora_embedding_A): ParameterDict()
  50. (lora_embedding_B): ParameterDict()
  51. )
  52. (dense): Linear(in_features=2048, out_features=2048, bias=True)
  53. (attention_dropout): Dropout(p=0.0, inplace=False)
  54. )
  55. (post_attention_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  56. (mlp): BloomMLP(
  57. (dense_h_to_4h): Linear(in_features=2048, out_features=8192, bias=True)
  58. (gelu_impl): BloomGelu()
  59. (dense_4h_to_h): Linear(in_features=8192, out_features=2048, bias=True)
  60. )
  61. )
  62. (2): BloomBlock(
  63. (input_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  64. (self_attention): BloomAttention(
  65. (query_key_value): Linear(
  66. in_features=2048, out_features=6144, bias=True
  67. (lora_dropout): ModuleDict(
  68. (default): Identity()
  69. )
  70. (lora_A): ModuleDict(
  71. (default): Linear(in_features=2048, out_features=8, bias=False)
  72. )
  73. (lora_B): ModuleDict(
  74. (default): Linear(in_features=8, out_features=6144, bias=False)
  75. )
  76. (lora_embedding_A): ParameterDict()
  77. (lora_embedding_B): ParameterDict()
  78. )
  79. (dense): Linear(in_features=2048, out_features=2048, bias=True)
  80. (attention_dropout): Dropout(p=0.0, inplace=False)
  81. )
  82. (post_attention_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  83. (mlp): BloomMLP(
  84. (dense_h_to_4h): Linear(in_features=2048, out_features=8192, bias=True)
  85. (gelu_impl): BloomGelu()
  86. (dense_4h_to_h): Linear(in_features=8192, out_features=2048, bias=True)
  87. )
  88. )
  ... (3)-(23): 21 further BloomBlock modules, identical in structure to the blocks above; every query_key_value Linear carries the injected lora_dropout, lora_A (in_features=2048, out_features=8, bias=False) and lora_B (in_features=8, out_features=6144, bias=False) modules ...
  656. )
  657. (ln_f): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  658. )
  659. (lm_head): Linear(in_features=2048, out_features=46145, bias=False)
  660. )
  661. )
  662. )
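From the printout above you can see that the LoRA injection leaves the original Bloom weights untouched and only adds the small lora_A / lora_B projections to every query_key_value layer. As a quick sanity check on how few parameters are actually trainable (a minimal sketch, assuming the PEFT-wrapped model is the p_model variable used in the merging step below), PEFT can report the ratio directly:

  1. p_model.print_trainable_parameters()
  2. # Prints something like: trainable params: 1,572,864 || all params: ~1.3B || trainable%: ~0.12
  3. # (illustrative numbers for r=8 across 24 blocks; exact values depend on the LoraConfig used)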

3.9.2 Merging the weights:

  1. merge_model = p_model.merge_and_unload()
  2. merge_model

Before fine-tuning:
model = A (the original weights)

During fine-tuning:
model = A + B (the original weights plus the LoRA adapter weights)

After fine-tuning, calling merge_and_unload():
model = C (the merged weights, C = A + B)
The LoRA-specific B layers are no longer needed.

The line merge_model = p_model.merge_and_unload() performs the model-merging step of LoRA (Low-Rank Adaptation) fine-tuning: it folds the trained LoRA weights back into the original model weights, yielding one complete model, and unloads the LoRA layers that are no longer needed. Doing this after fine-tuning simplifies the model structure and removes the extra memory overhead of keeping the adapters around.
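Conceptually, the merge folds each low-rank update back into its base weight: W_merged = W + scaling * (B @ A), with scaling = lora_alpha / r. Below is a minimal sketch of that computation using the shapes seen in the printed query_key_value layers (the tensors and the alpha value are purely illustrative, not the trained weights):

  1. import torch
  2. W = torch.randn(6144, 2048)   # frozen base weight of query_key_value
  3. A = torch.randn(8, 2048)      # lora_A weight (rank r = 8)
  4. B = torch.randn(6144, 8)      # lora_B weight
  5. lora_alpha, r = 32, 8         # illustrative values; use whatever the LoraConfig specified
  6. W_merged = W + (lora_alpha / r) * (B @ A)
  7. print(W_merged.shape)         # torch.Size([6144, 2048]) -- same shape as the original weight

The structure printed below confirms the effect: after merging, the lora_* modules disappear and every query_key_value is once again a plain Linear layer.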

  1. BloomForCausalLM(
  2. (transformer): BloomModel(
  3. (word_embeddings): Embedding(46145, 2048)
  4. (word_embeddings_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  5. (h): ModuleList(
  6. (0): BloomBlock(
  7. (input_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  8. (self_attention): BloomAttention(
  9. (query_key_value): Linear(in_features=2048, out_features=6144, bias=True)
  10. (dense): Linear(in_features=2048, out_features=2048, bias=True)
  11. (attention_dropout): Dropout(p=0.0, inplace=False)
  12. )
  13. (post_attention_layernorm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  14. (mlp): BloomMLP(
  15. (dense_h_to_4h): Linear(in_features=2048, out_features=8192, bias=True)
  16. (gelu_impl): BloomGelu()
  17. (dense_4h_to_h): Linear(in_features=8192, out_features=2048, bias=True)
  18. )
  19. )
  ... (1)-(23): 23 further BloomBlock modules, identical to block (0) above; the lora_dropout / lora_A / lora_B modules are gone and every query_key_value is again a plain Linear(in_features=2048, out_features=6144, bias=True) ...
  342. )
  343. (ln_f): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  344. )
  345. (lm_head): Linear(in_features=2048, out_features=46145, bias=False)
  346. )
  1. ipt = tokenizer("Human: {}\n{}".format("考试有哪些技巧?", "").strip() + "\n\nAssistant: ", return_tensors="pt")
  2. tokenizer.decode(merge_model.generate(**ipt, do_sample=False)[0], skip_special_tokens=True)
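If the merged model has been moved to a GPU, the input tensors must live on the same device. A slightly more explicit variant of the call above (a hedged sketch; max_new_tokens is just an illustrative value):

  1. ipt = tokenizer("Human: {}\n{}".format("考试有哪些技巧?", "").strip() + "\n\nAssistant: ", return_tensors="pt")
  2. ipt = {k: v.to(merge_model.device) for k, v in ipt.items()}            # keep inputs on the model's device
  3. out = merge_model.generate(**ipt, max_new_tokens=128, do_sample=False) # greedy decoding
  4. print(tokenizer.decode(out[0], skip_special_tokens=True))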

3.10 Saving the complete merged model

merge_model.save_pretrained("./chatbot/merge_model")
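To make the saved directory self-contained, it is common to also save the tokenizer next to the merged weights; the merged model can then be reloaded as an ordinary Transformers checkpoint with no PEFT dependency (the path below simply reuses the directory chosen above):

  1. tokenizer.save_pretrained("./chatbot/merge_model")
  2. # Reload later -- the LoRA weights are already folded into the base weights
  3. from transformers import AutoModelForCausalLM, AutoTokenizer
  4. reloaded_model = AutoModelForCausalLM.from_pretrained("./chatbot/merge_model")
  5. reloaded_tokenizer = AutoTokenizer.from_pretrained("./chatbot/merge_model")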
Note: this article is reposted from the CSDN blog of 笨笨sg: https://blog.csdn.net/a131529/article/details/142963913. Copyright belongs to the original author.