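Hadoop series: table of contents
1. A brief introduction to Hadoop 3.1.4, deployment and simple verification
2. HDFS operations: the shell client
3. Using HDFS from Java (read/write, upload, download, traversal, file search, copying whole directories, copying files only, listing files in a directory, deleting files and directories, getting file and directory attributes, etc.)
4. HDFSUtil, a Java helper class for HDFS, with JUnit tests (common HDFS operations and HA configuration)
5. The RESTful HDFS API: WebHDFS
6. HttpFS, the HDFS proxy service
7. Common file storage formats in big data and the compression algorithms supported by Hadoop
8. HDFS in-memory storage policy support and "hot/warm/cold" storage
9. Deploying a highly available (HA) Hadoop cluster and three ways to verify it
10. A solution for small files in HDFS: Archive
11. Reading, writing and merging SequenceFiles in Hadoop
12. HDFS Trash: introduction and examples
13. HDFS Snapshot
14. HDFS transparent encryption with KMS
15. MapReduce introduction and wordcount
16. Basic MapReduce examples: custom serialization, sorting, partitioning, grouping and top N
17. Partitioning (Partition) in MapReduce
18. MapReduce counters and reading/writing databases with MapReduce
19. Joins: map-side join and reduce-side join
20. MapReduce workflows
21. Reading and writing SequenceFile, MapFile, ORCFile and ParquetFile with MapReduce
22. Writing and reading files with Gzip, Snappy and Lzo compression in MapReduce
23. Memory and CPU allocation, scheduling calculations and tuning for MapReduce on YARN in a Hadoop cluster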
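I. Introduction
1. What is a snapshot
A snapshot is a record of the state of stored data at a particular moment. A backup, by contrast, is a copy of the stored data taken at a particular moment.
An HDFS snapshot is an image of the entire file system, or of a single directory, at a given point in time.
The image is not updated when the source directory changes afterwards.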
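2. What snapshots are for
- Data recovery
  Take snapshots of important directories; if a user deletes or damages data by mistake, it can be recovered from a snapshot.
- Data backup
  Use snapshots to back up the whole cluster or selected directories and files.
  An administrator can use the snapshot taken at a given moment as the starting point of a backup, then perform incremental backups by comparing the differences between successive snapshots.
- Data testing
  Create a temporary snapshot of the data a user is about to work on, and let the user run experiments and tests against that snapshot (snapshots themselves are read-only).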
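For the data-recovery case, restoring usually means copying a file out of the read-only snapshot directory `<dir>/.snapshot/<snapshotName>` back into the live directory. The sketch below wraps that in a shell function; `restore_from_snapshot` is a hypothetical helper name, not an HDFS command, and it assumes the `hdfs` client is on the PATH.

```shell
# Hypothetical helper (not an HDFS command): restore one file from a snapshot.
# Snapshots are read-only, so "restoring" means copying the snapshot's version
# of the file back into the live directory.
#   $1 = snapshottable directory, $2 = snapshot name, $3 = path relative to $1
restore_from_snapshot() {
  local dir="$1" snap="$2" rel="$3"
  # -f overwrites the current (possibly damaged) file at the destination.
  hdfs dfs -cp -f "${dir}/.snapshot/${snap}/${rel}" "${dir}/${rel}"
}
```

For example, `restore_from_snapshot /testsnapshot snapshot2 1.txt` would copy back the version of `1.txt` captured in `snapshot2` in the walkthrough later in this article.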
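3. How HDFS snapshots work
- An HDFS snapshot is not a plain copy of the data; it only records differences.
- For data that mostly does not change, what you see is simply the content at the current physical path; only inode data that has changed is additionally copied by the snapshot. This is the "differential copy".
- An inode (index node) stores the basic metadata of a file or directory, such as its name, owner, group and timestamps.
- HDFS snapshots do not copy blocks on the DataNodes; they only record the block list and the file size.
- HDFS snapshots do not adversely affect regular HDFS operations. Modifications are recorded in reverse chronological order, so the current data can be accessed directly; snapshot data is computed by subtracting the modifications from the current data.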
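II. Usage and verification
1. Enabling snapshots
In HDFS you can take snapshots of the whole file system or of a single directory, but snapshots must first be enabled on that directory. Trying to create a snapshot of a directory without snapshots enabled fails with an error, as the first createSnapshot attempt in the transcript shows.
# Enable snapshots: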
hdfs dfsadmin -allowSnapshot /testsnapshot
[alanchan@server1 ~]$ hadoop fs -ls /testsnapshot
[alanchan@server1 ~]$ hadoop fs -createSnapshot  /testsnapshot test_snapshot
createSnapshot: Directory is not a snapshottable directory: /testsnapshot
[alanchan@server1 ~]$ hdfs dfsadmin -allowSnapshot /testsnapshot
Allowing snapshot on /testsnapshot succeeded
[alanchan@server1 ~]$ hadoop fs -createSnapshot  /testsnapshot test_snapshot
Created snapshot /testsnapshot/.snapshot/test_snapshot
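The failed `createSnapshot` above can be avoided by first checking whether the directory is already snapshottable. A minimal sketch, assuming the directory path is the last column of `hdfs lsSnapshottableDir` output (as in the listing in section 5 below); `is_snapshottable` is a hypothetical helper name. `lsSnapshottableDir` only lists directories the current user may snapshot, so for a non-superuser the check can give a false negative.

```shell
# Hypothetical helper (not an HDFS command): succeed (status 0) if $1 appears
# in `hdfs lsSnapshottableDir`, fail (status 1) otherwise.
is_snapshottable() {
  local dir="$1"
  # The exit status of awk becomes the function's return value.
  hdfs lsSnapshottableDir |
    awk -v d="$dir" '$NF == d { found = 1 } END { exit(found ? 0 : 1) }'
}
```

Typical use: `is_snapshottable /testsnapshot || hdfs dfsadmin -allowSnapshot /testsnapshot` (allowSnapshot requires HDFS superuser privileges).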
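2. Disabling snapshots
Snapshots can be disabled on a directory that currently has them enabled, but only after all of that directory's snapshots have been deleted.
# Disable snapshots: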
hdfs dfsadmin -disallowSnapshot  /testsnapshot
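Because `-disallowSnapshot` only succeeds once every snapshot of the directory is gone, the two steps can be combined. The sketch below is a hypothetical helper (`disable_snapshots` is not an HDFS command); it takes the snapshot names from the last column of `hdfs dfs -ls <dir>/.snapshot` and assumes they contain no whitespace. Deleting a snapshot cannot be undone, so use it with care.

```shell
# Hypothetical helper (not an HDFS command): delete every snapshot of a
# directory, then disable snapshots on it. Deleted snapshots are gone for good.
disable_snapshots() {
  local dir="$1"
  # Entry lines of `hdfs dfs -ls` have 8 columns, the last being the full path;
  # the "Found N items" header has fewer columns and is skipped.
  hdfs dfs -ls "${dir}/.snapshot" | awk 'NF >= 8 { print $NF }' |
    while read -r path; do
      # The snapshot name is the last component of .../.snapshot/<name>.
      hdfs dfs -deleteSnapshot "$dir" "${path##*/}"
    done
  hdfs dfsadmin -disallowSnapshot "$dir"
}
```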
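3. Snapshot-related commands
createSnapshot: create a snapshot
deleteSnapshot: delete a snapshot
renameSnapshot: rename a snapshot
lsSnapshottableDir: list the directories on which snapshots are enabled
snapshotDiff: report the differences between snapshots
The first three are subcommands of hdfs dfs (hadoop fs); lsSnapshottableDir and snapshotDiff are run directly as hdfs subcommands.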
[alanchan@server1 ~]$ hadoop dfs
WARNING: Use of this script to execute dfs is deprecated.
WARNING: Attempting to execute replacement "hdfs dfs" instead.
Usage: hadoop fs [generic options]
...
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
...
command [genericOptions] [commandOptions]
[alanchan@server1 ~]# hdfs lsSnapshottableDir
[alanchan@server1 ~]# hdfs snapshotDiff     
4. Using HDFS snapshots
1) Enable snapshots on a directory
[alanchan@server1 ~]$ hdfs dfsadmin -allowSnapshot /testsnapshot
Allowing snapshot on /testsnapshot succeeded
2) Create a snapshot
# Let HDFS generate the snapshot name
hdfs dfs -createSnapshot /testsnapshot
# Create a snapshot with a given name
hdfs dfs -createSnapshot /testsnapshot test_snapshot
[alanchan@server1 ~]$ hadoop fs -createSnapshot  /testsnapshot test_snapshot
Created snapshot /testsnapshot/.snapshot/test_snapshot
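When no name is given, HDFS generates one from the current timestamp and reports it in the `Created snapshot ...` message. The hypothetical `create_snapshot` function below (not part of HDFS) passes a name through when one is supplied and, either way, prints the final snapshot name by stripping everything up to the last `/` of that message. It assumes the message format shown above.

```shell
# Hypothetical helper (not an HDFS command): create a snapshot of $1, named $2
# if given, and print the snapshot name that HDFS reports back.
create_snapshot() {
  local dir="$1" name="${2-}" msg
  # ${name:+"$name"} expands to nothing when no name was supplied.
  msg=$(hdfs dfs -createSnapshot "$dir" ${name:+"$name"}) || return
  # "Created snapshot /dir/.snapshot/<name>" -> "<name>"
  echo "${msg##*/}"
}
```

For example, `snap=$(create_snapshot /testsnapshot)` stores the generated name for a later `snapshotDiff` or `deleteSnapshot`.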

3) Browsing snapshots in the web UI

4) Rename a snapshot
# Rename a snapshot
hdfs dfs -renameSnapshot /testsnapshot test_snapshot rename_test_snapshot 
[alanchan@server1 ~]$ hdfs dfs -renameSnapshot /testsnapshot test_snapshot rename_test_snapshot 
[alanchan@server1 ~]$ hadoop fs -ls  /testsnapshot/.snapshot
Found 1 items
drwxr-xr-x   - alanchan supergroup          0 2022-09-13 09:38 /testsnapshot/.snapshot/rename_test_snapshot
5) List all snapshottable directories in the HDFS cluster
# List all directories in the HDFS cluster on which snapshots are enabled
hdfs lsSnapshottableDir
[alanchan@server1 ~]$ hdfs lsSnapshottableDir
drwxr-xr-x 0 alanchan supergroup 0 2022-09-13 09:38 1 65536 /testsnapshot
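Reading the listing above left to right: permission, replication, owner, group, length, modification date and time, then the number of snapshots the directory currently has (1 here), the snapshot quota (65536, the default maximum number of snapshots per directory), and finally the path. A small sketch that extracts the last three of those, assuming this column layout; `list_snapshottable` is a hypothetical name:

```shell
# Hypothetical helper (not an HDFS command): print each snapshottable directory
# with its snapshot count and snapshot quota.
# Assumed columns: perm, replication, owner, group, length, date, time,
# snapshot count, snapshot quota, path.
list_snapshottable() {
  hdfs lsSnapshottableDir |
    awk 'NF >= 10 { printf "%s\t%s snapshot(s), quota %s\n", $NF, $(NF-2), $(NF-1) }'
}
```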
6) Differences between snapshots
Prepare the snapshots:
hdfs dfs -createSnapshot /testsnapshot snapshot1
# 1. Create snapshot snapshot1 of the directory
[alanchan@server1 ~]$ hdfs dfs -createSnapshot /testsnapshot snapshot1
Created snapshot /testsnapshot/.snapshot/snapshot1
# 2. Upload a file into the directory, then create snapshot snapshot2
[alanchan@server1 ~]$ cd /usr/local/bigdata/hadoop-3.1.4
[alanchan@server1 hadoop-3.1.4]$ echo 12345 >> 1.txt
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -put 1.txt /testsnapshot
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -ls /testsnapshot
Found 1 items
-rw-r--r--   3 alanchan supergroup          6 2022-09-13 10:05 /testsnapshot/1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs dfs -createSnapshot /testsnapshot snapshot2
Created snapshot /testsnapshot/.snapshot/snapshot2
# 3. Modify the file (append to it), then create snapshot snapshot3
[alanchan@server1 hadoop-3.1.4]$ echo 54321 >> 2.txt
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -appendToFile 2.txt /testsnapshot/1.txt
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -cat /testsnapshot/1.txt
12345
54321
[alanchan@server1 hadoop-3.1.4]$ hdfs dfs -createSnapshot /testsnapshot snapshot3
Created snapshot /testsnapshot/.snapshot/snapshot3
# 4. Upload another file, then create snapshot snapshot4
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -put README.txt /testsnapshot
[alanchan@server1 hadoop-3.1.4]$ hdfs dfs -createSnapshot /testsnapshot snapshot4
Created snapshot /testsnapshot/.snapshot/snapshot4
# 5. Delete a file, then create snapshot snapshot5
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -rm /testsnapshot/README.txt
2022-09-13 10:09:31,863 INFO fs.TrashPolicyDefault: Moved: 'hdfs://HadoopHAcluster/testsnapshot/README.txt' to trash at: hdfs://HadoopHAcluster/user/alanchan/.Trash/Current/testsnapshot/README.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs dfs -createSnapshot /testsnapshot snapshot5
Created snapshot /testsnapshot/.snapshot/snapshot5
[alanchan@server1 hadoop-3.1.4]$ 

7) Compare snapshot differences
Each entry in a snapshotDiff report starts with one of these symbols:
+ The file/directory has been created.
-  The file/directory has been deleted.
M  The file/directory has been modified.
R  The file/directory has been renamed.
# Compare two snapshots of the specified directory
hdfs snapshotDiff /testsnapshot snapshot1 snapshot2
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot1 snapshot2
Difference between snapshot snapshot1 and snapshot snapshot2 under directory /testsnapshot:
M       .
+       ./1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot1 snapshot3
Difference between snapshot snapshot1 and snapshot snapshot3 under directory /testsnapshot:
M       .
+       ./1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot1 snapshot4
Difference between snapshot snapshot1 and snapshot snapshot4 under directory /testsnapshot:
M       .
+       ./1.txt
+       ./README.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot1 snapshot5
Difference between snapshot snapshot1 and snapshot snapshot5 under directory /testsnapshot:
M       .
+       ./1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot2 snapshot3
Difference between snapshot snapshot2 and snapshot snapshot3 under directory /testsnapshot:
M       ./1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot2 snapshot4
Difference between snapshot snapshot2 and snapshot snapshot4 under directory /testsnapshot:
M       .
+       ./README.txt
M       ./1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot2 snapshot5
Difference between snapshot snapshot2 and snapshot snapshot5 under directory /testsnapshot:
M       ./1.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot3 snapshot4
Difference between snapshot snapshot3 and snapshot snapshot4 under directory /testsnapshot:
M       .
+       ./README.txt
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot3 snapshot5
Difference between snapshot snapshot3 and snapshot snapshot5 under directory /testsnapshot:
[alanchan@server1 hadoop-3.1.4]$ hdfs snapshotDiff /testsnapshot snapshot4 snapshot5
Difference between snapshot snapshot4 and snapshot snapshot5 under directory /testsnapshot:
M       .
-       ./README.txt
[alanchan@server1 hadoop-3.1.4]$ 
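Two details in these reports are worth noting. A file that was created and then modified between the two snapshots is reported only as created (`+ ./1.txt` for snapshot1 vs snapshot5), and a file created and then deleted in between does not appear at all (`README.txt` for snapshot2 vs snapshot5, and the empty report for snapshot3 vs snapshot5). The HDFS documentation also allows `.` in place of a snapshot name to compare against the current state of the directory. For scripted checks the report can be summarised per change type; the sketch below is a hypothetical helper (`diff_summary`) that assumes each entry line starts with one of the `+`, `-`, `M`, `R` symbols listed above.

```shell
# Hypothetical helper (not an HDFS command): count snapshotDiff entries by type.
#   $1 = snapshottable directory, $2 = from snapshot, $3 = to snapshot (or ".")
diff_summary() {
  hdfs snapshotDiff "$1" "$2" "$3" |
    # Entry lines begin with a single +, -, M or R; the header line is skipped.
    awk '$1 ~ /^[-+MR]$/ { n[$1]++ }
         END { printf "created=%d deleted=%d modified=%d renamed=%d\n", n["+"], n["-"], n["M"], n["R"] }'
}
```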
8) Delete a snapshot
hdfs dfs -deleteSnapshot /testsnapshot rename_test_snapshot
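With scheduled snapshots, old ones usually have to be pruned, one `deleteSnapshot` call at a time. The sketch below keeps the newest N snapshots of a directory and deletes the rest. It is a hypothetical helper (`rotate_snapshots`), and it assumes snapshot names sort chronologically as plain strings, which holds for timestamp-based names such as HDFS's auto-generated ones but not for arbitrary names such as `rename_test_snapshot`.

```shell
# Hypothetical helper (not an HDFS command): keep only the newest $2 snapshots
# of directory $1. ASSUMPTION: names sort chronologically as plain strings.
rotate_snapshots() {
  local dir="$1" keep="$2"
  # Extract snapshot names (last path component of each `ls` entry line).
  hdfs dfs -ls "${dir}/.snapshot" |
    awk 'NF >= 8 { n = split($NF, p, "/"); print p[n] }' |
    sort |
    # After sorting oldest first, emit all but the last $keep names.
    awk -v keep="$keep" '{ names[NR] = $0 } END { for (i = 1; i <= NR - keep; i++) print names[i] }' |
    while read -r name; do
      hdfs dfs -deleteSnapshot "$dir" "$name"
    done
}
```

For example, a daily cron job could run `create_snapshot`-style creation followed by `rotate_snapshots /testsnapshot 7` to keep one week of snapshots.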
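9) Deleting a directory that has snapshots enabled
hadoop fs -rm -r /testsnapshot
A snapshottable directory cannot be deleted while it still has snapshots; to some extent this also protects the data in it.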
[alanchan@server1 hadoop-3.1.4]$ hadoop fs -rm -r /testsnapshot
rm: Failed to move to trash: hdfs://HadoopHAcluster/testsnapshot: The directory /testsnapshot cannot be deleted since /testsnapshot is snapshottable and already has snapshots