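Hadoop series: table of contents
1. Hadoop 3.1.4: brief introduction, deployment, and basic verification
2. HDFS operations: the shell client
3. Using HDFS from Java (read/write, upload, download, traversal, finding files, copying a whole directory, copying files only, listing the files in a folder, deleting files and directories, getting file and folder attributes, etc.)
4. The HDFS Java utility class HDFSUtil and its JUnit tests (common HDFS operations and HA environment configuration)
5. The RESTful HDFS API: WebHDFS
6. HttpFS, the HDFS proxy service
7. Common file storage formats in big data and the compression algorithms supported by Hadoop
8. HDFS memory storage policy support and hot/warm/cold storage
9. Hadoop high-availability (HA) cluster deployment and three ways to verify it
10. A solution for small files in HDFS: Archive
11. Reading, writing, and merging Sequence Files in a Hadoop environment
12. HDFS Trash: introduction and examples
13. HDFS Snapshot
14. HDFS transparent encryption with KMS
15. MapReduce introduction and wordcount
16. Basic MapReduce usage examples: custom serialization, sorting, partitioning, grouping, and top N
17. Introduction to MapReduce partitioning (Partition)
18. MapReduce counters, and reading from and writing to a database with MapReduce
19. Join operations: map-side join and reduce-side join
20. Introduction to MapReduce workflows
21. Reading and writing SequenceFile, MapFile, ORCFile, and ParquetFile files with MapReduce
22. Writing and reading files with Gzip, Snappy, and Lzo compression in MapReduce
23. Memory and CPU allocation, scheduling calculation, and tuning for MapReduce on YARN in a Hadoop cluster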
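I. Introduction
1. What a snapshot is
A snapshot is a record of the state of a data store at a particular point in time. A backup, by contrast, is a full copy of the data store taken at a particular point in time.
An HDFS snapshot is a read-only, point-in-time image of the entire file system or of a single directory.
The image is not updated when the source directory changes afterwards.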
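2. What snapshots are used for
- Data recovery
Take snapshots of important directories; when a user makes a mistake, the data can be restored from a snapshot.
- Data backup
Use snapshots to back up the whole cluster or selected directories and files. An administrator takes the snapshot from one point in time as the starting point of a backup, then compares later snapshots against it to perform incremental backups.
- Data testing
Create a temporary snapshot of the data a user wants to work on, and let the user run experiments and tests against that snapshot.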
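The recovery and backup uses above can be sketched as two small command builders. This is only an illustration, not part of the original article: the function names are hypothetical, the restore relies on the `.snapshot` path convention, and the incremental copy assumes DistCp's `-update -diff` options, which in turn require the target directory to already hold the older snapshot and to be unchanged since it was taken.

```python
import posixpath


def snapshot_path(directory, snapshot, relative):
    """Path of a file as it existed in the given snapshot."""
    return posixpath.join(directory, ".snapshot", snapshot, relative)


def restore_command(directory, snapshot, relative):
    """argv that copies one file from a snapshot back into the live directory."""
    source = snapshot_path(directory, snapshot, relative)
    target = posixpath.join(directory, relative)
    return ["hdfs", "dfs", "-cp", source, target]


def incremental_copy_command(source_dir, target_dir, older, newer):
    """argv for a DistCp run that transfers only the changes between two snapshots."""
    return ["hadoop", "distcp", "-update", "-diff", older, newer, source_dir, target_dir]


if __name__ == "__main__":
    # /backup/testsnapshot is a made-up target path used only for illustration.
    print(" ".join(restore_command("/testsnapshot", "snapshot4", "README.txt")))
    print(" ".join(incremental_copy_command(
        "/testsnapshot", "/backup/testsnapshot", "snapshot1", "snapshot2")))
```

The builders return argument lists rather than running anything, so the commands can be reviewed before they are passed to something like `subprocess.run`.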
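3. How HDFS snapshots work
- An HDFS snapshot is not a plain copy of the data; it records only differences.
- For data that is mostly unchanged, what you see through a snapshot is the content that the current physical path points to. Only inode data that has changed is copied additionally for the snapshot, which is the differential copy referred to above.
- An inode (index node) stores the basic information about a file or directory, including timestamps, name, owner, and group.
- An HDFS snapshot does not copy blocks on the DataNodes; it records only the block list and the file size.
- HDFS snapshots do not adversely affect regular HDFS operations. Modifications are recorded in reverse chronological order, so the current data can be accessed directly, and the snapshot data is computed by subtracting the modifications from the current data.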
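To make the last point concrete, here is a toy in-memory model of the idea. It is not HDFS code and does not mirror the NameNode's data structures; the class and method names are invented for illustration. The live data is always read directly, each snapshot keeps only an "undo" record of what changed after it was taken, and a snapshot view is rebuilt by undoing modifications from newest to oldest.

```python
_ABSENT = object()  # marks "this path did not exist when the snapshot was taken"


class ToySnapshotDir:
    """Toy model of a snapshottable directory (illustration only)."""

    def __init__(self):
        self.current = {}    # path -> content: the live state, read directly
        self.snapshots = []  # [(name, undo record)], oldest first

    def create_snapshot(self, name):
        # Creating a snapshot copies nothing; it only opens an empty undo record.
        self.snapshots.append((name, {}))

    def _remember(self, path):
        # Save the pre-change value once, in the newest snapshot's undo record.
        if self.snapshots:
            undo = self.snapshots[-1][1]
            undo.setdefault(path, self.current.get(path, _ABSENT))

    def write(self, path, content):
        self._remember(path)
        self.current[path] = content

    def delete(self, path):
        self._remember(path)
        self.current.pop(path, None)

    def read_snapshot(self, name):
        # Start from the current data and undo modifications, newest first,
        # until the requested snapshot's own undo record has been applied.
        view = dict(self.current)
        for snap_name, undo in reversed(self.snapshots):
            for path, old in undo.items():
                if old is _ABSENT:
                    view.pop(path, None)
                else:
                    view[path] = old
            if snap_name == name:
                return view
        raise KeyError(name)


if __name__ == "__main__":
    d = ToySnapshotDir()
    d.create_snapshot("snapshot1")
    d.write("1.txt", "12345\n")
    d.create_snapshot("snapshot2")
    d.write("1.txt", "12345\n54321\n")
    print(d.read_snapshot("snapshot1"))
    print(d.read_snapshot("snapshot2"))
```

In this model, reads and writes of the current data never consult the snapshot records, which is why regular operations are not slowed down by the presence of snapshots.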
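II. Usage and verification
1. Enabling snapshots
HDFS can take a snapshot of the entire file system or of a single directory, but snapshots must first be enabled on the directory concerned. Creating a snapshot of a directory on which snapshots have not been enabled fails with an error.
# Enable snapshots:
hdfs dfsadmin -allowSnapshot /testsnapshot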
[alanchan@server1 ~]$ hadoop fs -ls /testsnapshot
[alanchan@server1 ~]$ hadoop fs -createSnapshot /testsnapshot test_snapshot
createSnapshot: Directory is not a snapshottable directory: /testsnapshot
[alanchan@server1 ~]$ hdfs dfsadmin -allowSnapshot /testsnapshot
Allowing snapshot on /testsnapshot succeeded
[alanchan@server1 ~]$ hadoop fs -createSnapshot /testsnapshot test_snapshot
Created snapshot /testsnapshot/.snapshot/test_snapshot
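The session above can be turned into a small helper that tries to create the snapshot and enables snapshots on the directory only when that is the reason for the failure. This is a sketch under assumptions: the function names are hypothetical, the error is detected by matching the message text shown above, the `hdfs` command is assumed to exit with a non-zero code on failure, and `-allowSnapshot` needs superuser privileges.

```python
import subprocess


def run_cli(argv):
    """Run a command and return (exit_code, combined stdout and stderr)."""
    done = subprocess.run(argv, capture_output=True, text=True)
    return done.returncode, done.stdout + done.stderr


def create_snapshot(directory, name, run=run_cli):
    """Create a snapshot, enabling snapshots on the directory first if required."""
    create = ["hdfs", "dfs", "-createSnapshot", directory, name]
    code, output = run(create)
    if code != 0 and "not a snapshottable directory" in output:
        # Same fix as in the session above: allow snapshots, then retry once.
        allow_code, allow_output = run(["hdfs", "dfsadmin", "-allowSnapshot", directory])
        if allow_code != 0:
            raise RuntimeError(allow_output.strip())
        code, output = run(create)
    if code != 0:
        raise RuntimeError(output.strip())
    return output.strip()
```

The `run` parameter exists so the logic can be exercised without a cluster by passing in a fake runner; on a real client, `create_snapshot("/testsnapshot", "test_snapshot")` uses the default runner.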
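2. Disabling snapshots
Snapshots can be disabled on a directory where they are currently enabled, provided that all snapshots of that directory have been deleted first.
# Disable snapshots:
hdfs dfsadmin -disallowSnapshot /testsnapshot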
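Because every snapshot has to be deleted before snapshots can be disabled, the order of operations matters. The sketch below, with hypothetical function names, extracts the snapshot names from the output of `hadoop fs -ls <dir>/.snapshot` and builds the commands in the required order. It assumes the listing format of a standard `-ls` call, where each snapshot appears as a directory line.

```python
def snapshot_names_from_ls(ls_output):
    """Extract snapshot names from the output of `hadoop fs -ls <dir>/.snapshot`."""
    marker = "/.snapshot/"
    names = []
    for line in ls_output.splitlines():
        # Snapshots are listed as directories; skip "Found N items" and log lines.
        if line.startswith("d") and marker in line:
            names.append(line.split(marker, 1)[1])
    return names


def disable_snapshots_plan(directory, snapshot_names):
    """Return the argv lists to run, in order, to turn snapshots off for a directory."""
    plan = [["hdfs", "dfs", "-deleteSnapshot", directory, name] for name in snapshot_names]
    # disallowSnapshot only succeeds once no snapshots remain, so it goes last.
    plan.append(["hdfs", "dfsadmin", "-disallowSnapshot", directory])
    return plan


if __name__ == "__main__":
    listing = (
        "Found 1 items\n"
        "drwxr-xr-x - alanchan supergroup 0 2022-09-13 09:38 "
        "/testsnapshot/.snapshot/rename_test_snapshot\n"
    )
    for argv in disable_snapshots_plan("/testsnapshot", snapshot_names_from_ls(listing)):
        print(" ".join(argv))
```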
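3. Snapshot commands
createSnapshot: create a snapshot
deleteSnapshot: delete a snapshot
renameSnapshot: rename a snapshot
lsSnapshottableDir: list the snapshottable directories
snapshotDiff: report the differences between two snapshots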
[alanchan@server1 ~]$ hadoop dfs
WARNING: Use of this script to execute dfs is deprecated.
WARNING: Attempting to execute replacement "hdfs dfs" instead.
Usage: hadoop fs [generic options]
...
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
...
command [genericOptions] [commandOptions]
[alanchan@server1 ~]$ hdfs lsSnapshottableDir
[alanchan@server1 ~]$ hdfs snapshotDiff
4. Using HDFS snapshots
1) Enable snapshots on a directory
[alanchan@server1 ~]$ hdfs dfsadmin -allowSnapshot /testsnapshot
Allowing snapshot on /testsnapshot succeeded
2) Create a snapshot
# Let the system generate the snapshot name
hdfs dfs -createSnapshot /testsnapshot
# Create a snapshot with a given name
hdfs dfs -createSnapshot /testsnapshot test_snapshot
[alanchan@server1 ~]$ hadoop fs -createSnapshot /testsnapshot test_snapshot
Created snapshot /testsnapshot/.snapshot/test_snapshot
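When no name is given, HDFS derives the snapshot name from a timestamp using the pattern `'s'yyyyMMdd-HHmmss.SSS`. The following sketch reproduces that pattern for a given moment, which can help when matching or sorting auto-generated names. The function name is hypothetical, and the real name is produced by the cluster, so the clock and time zone used there are not modeled here.

```python
from datetime import datetime


def default_snapshot_name(moment):
    """Format a timestamp following the pattern 's'yyyyMMdd-HHmmss.SSS."""
    millis = moment.microsecond // 1000
    return "s" + moment.strftime("%Y%m%d-%H%M%S") + f".{millis:03d}"


if __name__ == "__main__":
    print(default_snapshot_name(datetime(2022, 9, 13, 9, 38, 0, 123000)))
```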
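3) Browse snapshots in the web UI
In the NameNode web UI file browser, the snapshots of a directory appear under its .snapshot path, for example /testsnapshot/.snapshot/test_snapshot.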
4) Rename a snapshot
# Rename a snapshot
hdfs dfs -renameSnapshot /testsnapshot test_snapshot rename_test_snapshot
[alanchan@server1 ~]$ hdfs dfs -renameSnapshot /testsnapshot test_snapshot rename_test_snapshot
[alanchan@server1 ~]$ hadoop fs -ls /testsnapshot/.snapshot
Found 1 items
drwxr-xr-x - alanchan supergroup 0 2022-09-13 09:38 /testsnapshot/.snapshot/rename_test_snapshot
5) List all snapshottable directories in the HDFS cluster
# List all directories in the HDFS cluster on which snapshots are enabled
hdfs lsSnapshottableDir
[alanchan@server1 ~]$ hdfs lsSnapshottableDir
drwxr-xr-x 0 alanchan supergroup 0 2022-09-13 09:38 1 65536 /testsnapshot
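The output line above packs several fields together. The sketch below splits it into named fields; the interpretation of the two numbers before the path as the current snapshot count (1) and the snapshot quota (65536) is my reading of the output, and the type and function names are hypothetical.

```python
from typing import NamedTuple


class SnapshottableDir(NamedTuple):
    permission: str
    owner: str
    group: str
    modified: str
    snapshot_count: int
    snapshot_quota: int
    path: str


def parse_snapshottable_dirs(output):
    """Parse the lines printed by `hdfs lsSnapshottableDir`."""
    rows = []
    for line in output.splitlines():
        # Ten columns; the path comes last, so it may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) < 10:
            continue
        perm, _repl, owner, group, _size, date, time, count, quota, path = fields
        rows.append(SnapshottableDir(
            perm, owner, group, f"{date} {time}", int(count), int(quota), path))
    return rows


if __name__ == "__main__":
    sample = "drwxr-xr-x 0 alanchan supergroup 0 2022-09-13 09:38 1 65536 /testsnapshot"
    for row in parse_snapshottable_dirs(sample):
        print(row.path, row.snapshot_count, row.snapshot_quota)
```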
6) Comparing snapshots
Prepare the snapshots:
hdfs dfs -createSnapshot /testsnapshot snapshot1
# 1. Create the snapshot snapshot1 of the directory
[alanchan@server1 ~]$ hdfs dfs -createSnapshot /testsnapshot snapshot1
Created snapshot /testsnapshot/.snapshot/snapshot1
# 2. Upload a file to the directory, then create the snapshot snapshot2
[alanchan@server1 ~]$ cd /usr/local/bigdata/hadoop-3.1.4
[alanchan@server1 hadoop-3.1.4]$ echo 12345 >> 1.txt
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -put 1.txt /testsnapshot
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -ls /testsnapshot
Found 1 items
-rw-r--r-- 3 alanchan supergroup 6 2022-09-13 10:05 /testsnapshot/1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs dfs -createSnapshot /testsnapshot snapshot2
Created snapshot /testsnapshot/.snapshot/snapshot2
# 3. Append to the file, then create the snapshot snapshot3
[alanchan@server1 hadoop-3.1.4]$ echo 54321 >> 2.txt
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -appendToFile 2.txt /testsnapshot/1.txt
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -cat /testsnapshot/1.txt
12345
54321
[alanchan@server1 hadoop-3.1.4]$ hdfs dfs -createSnapshot /testsnapshot snapshot3
Created snapshot /testsnapshot/.snapshot/snapshot3
# 4. Upload another file, then create the snapshot snapshot4
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -put README.txt /testsnapshot
[alanchan@server1 hadoop-3.1.4]$ hdfs dfs -createSnapshot /testsnapshot snapshot4
Created snapshot /testsnapshot/.snapshot/snapshot4
# 5. Delete a file, then create the snapshot snapshot5
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -rm /testsnapshot/README.txt
2022-09-13 10:09:31,863 INFO fs.TrashPolicyDefault: Moved: 'hdfs://HadoopHAcluster/testsnapshot/README.txt' to trash at: hdfs://HadoopHAcluster/user/alanchan/.Trash/Current/testsnapshot/README.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs dfs -createSnapshot /testsnapshot snapshot5
Created snapshot /testsnapshot/.snapshot/snapshot5
[alanchan@server1 hadoop-3.1.4]$
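7) Compare the snapshot differences
Each entry in the difference report is prefixed with one of the following symbols: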
+ The file/directory has been created.
- The file/directory has been deleted.
M The file/directory has been modified.
R The file/directory has been renamed.
# Compare two snapshots of the given directory
hdfs snapshotDiff /testsnapshot snapshot1 snapshot2
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot1 snapshot2
Difference between snapshot snapshot1 and snapshot snapshot2 under directory /testsnapshot:
M .
+ ./1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot1 snapshot3
Difference between snapshot snapshot1 and snapshot snapshot3 under directory /testsnapshot:
M .
+ ./1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot1 snapshot4
Difference between snapshot snapshot1 and snapshot snapshot4 under directory /testsnapshot:
M .
+ ./1.txt
+ ./README.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot1 snapshot5
Difference between snapshot snapshot1 and snapshot snapshot5 under directory /testsnapshot:
M .
+ ./1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot2 snapshot3
Difference between snapshot snapshot2 and snapshot snapshot3 under directory /testsnapshot:
M ./1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot2 snapshot4
Difference between snapshot snapshot2 and snapshot snapshot4 under directory /testsnapshot:
M .
+ ./README.txt
M ./1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot2 snapshot5
Difference between snapshot snapshot2 and snapshot snapshot5 under directory /testsnapshot:
M ./1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot3 snapshot4
Difference between snapshot snapshot3 and snapshot snapshot4 under directory /testsnapshot:
M .
+ ./README.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot3 snapshot5
Difference between snapshot snapshot3 and snapshot snapshot5 under directory /testsnapshot:
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot4 snapshot5
Difference between snapshot snapshot4 and snapshot snapshot5 under directory /testsnapshot:
M .
- ./README.txt
[alanchan@server1 hadoop-3.1.4]$
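For scripting, the reports above can be grouped by their change symbol. The following is a minimal sketch with a hypothetical function name; it relies only on the four symbols listed earlier (`+`, `-`, `M`, `R`) and keeps a renamed entry as the raw text that follows the symbol.

```python
def parse_snapshot_diff(report):
    """Group the paths of a snapshotDiff report by their change symbol."""
    labels = {"+": "created", "-": "deleted", "M": "modified", "R": "renamed"}
    changes = {label: [] for label in labels.values()}
    for line in report.splitlines():
        # Entry lines look like "<symbol> <path>"; the header line and any
        # shell prompt lines start with something else and are skipped.
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in labels:
            changes[labels[parts[0]]].append(parts[1].strip())
    return changes


if __name__ == "__main__":
    report = (
        "Difference between snapshot snapshot2 and snapshot snapshot4 "
        "under directory /testsnapshot:\n"
        "M .\n"
        "+ ./README.txt\n"
        "M ./1.txt\n"
    )
    print(parse_snapshot_diff(report))
```

An empty report, such as the one between snapshot3 and snapshot5, yields four empty lists, because README.txt was both added and removed between those two snapshots.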
8) Delete a snapshot
hdfs dfs -deleteSnapshot /testsnapshot rename_test_snapshot
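9) Delete a directory on which snapshots are enabled
hadoop fs -rm -r /testsnapshot
A directory that still has snapshots cannot be deleted, which to some extent also protects the files in it.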
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -rm -r /testsnapshot
rm: Failed to move to trash: hdfs://HadoopHAcluster/testsnapshot: The directory /testsnapshot cannot be deleted since /testsnapshot is snapshottable and already has snapshots