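Hadoop series table of contents
1. Introduction to Hadoop 3.1.4, deployment, and basic verification
2. HDFS operations: shell client
3. Using HDFS from Java (read/write, upload, download, traversal, finding files, copying a whole directory, copying files only, listing the files in a folder, deleting files and directories, getting file and folder attributes, etc.)
4. The HDFS Java utility class HDFSUtil and its JUnit tests (common HDFS operations and HA configuration)
5. The RESTful HDFS API: WebHDFS
6. HttpFS, the HDFS proxy service
7. Common file storage formats in big data and the compression algorithms supported by Hadoop
8. HDFS memory storage policy support and hot/warm/cold storage
9. Deploying a Hadoop high-availability (HA) cluster and three ways to verify it
10. An HDFS solution for small files: Archive
11. Reading, writing, and merging Sequence Files in a Hadoop environment
12. HDFS Trash: introduction and examples
13. HDFS Snapshot
14. HDFS transparent encryption with KMS
15. Introduction to MapReduce and wordcount
16. Basic MapReduce usage examples: custom serialization, sorting, partitioning, grouping, and top N
17. Introduction to MapReduce partitioning (Partition)
18. MapReduce counters, and reading from and writing to a database with MapReduce
19. Join operations: map side join and reduce side join
20. Introduction to MapReduce workflows
21. Reading and writing SequenceFile, MapFile, ORCFile, and ParquetFile with MapReduce
22. Writing and reading files with Gzip, Snappy, and Lzo compression in MapReduce
23. Memory and CPU allocation, scheduling, and tuning for MapReduce on YARN in a Hadoop cluster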
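This article covers HDFS shell operations. It assumes that HDFS is already up and running.
It has two parts: the command syntax and concrete examples.
I. Syntax
HDFS is a distributed file system for storing and retrieving data, so operating on HDFS means performing ordinary file system operations: creating, modifying, and deleting files, changing permissions, and creating, deleting, and renaming directories. The commands resemble the Linux shell commands for files, such as ls, mkdir, and rm.
The HDFS shell CLI can operate on several file systems, including the local file system (file:///) and the distributed file system (hdfs://nn:8020). Which file system is used depends on the scheme prefix of the URI. If no prefix is given, the value of the fs.defaultFS property in the Hadoop configuration (core-site.xml) is used as the default file system.
- hdfs dfs -ls file:/// # operates on the local file system
- hdfs dfs -ls hdfs://server1:8020/ # operates on the HDFS distributed file system
- hdfs dfs -ls / # root directory with no scheme given; the value of fs.defaultFS is used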
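The prefix rule above can be mimicked in a few lines of shell. This is only an illustrative sketch: `resolve_fs_uri` is a hypothetical helper, not part of Hadoop, and the default file system is passed in by hand (the value `hdfs://server1:8020` is taken from the example above). On a real cluster the configured value can be printed with `hdfs getconf -confKey fs.defaultFS`.

```shell
# Hypothetical helper: show which file system a path argument would address.
# $1 = path as typed on the command line
# $2 = value of fs.defaultFS (assumed here; read it from your own configuration)
resolve_fs_uri() {
  path="$1"
  default_fs="$2"
  case "$path" in
    *://*) printf '%s\n' "$path" ;;                      # explicit scheme wins
    /*)    printf '%s%s\n' "${default_fs%/}" "$path" ;;  # no scheme: use the default
    *)     echo "relative paths are not handled in this sketch" >&2; return 1 ;;
  esac
}

resolve_fs_uri file:///tmp hdfs://server1:8020   # prints file:///tmp
resolve_fs_uri /dir1 hdfs://server1:8020         # prints hdfs://server1:8020/dir1
```

The helper only illustrates the decision; the real resolution is done inside the Hadoop client.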
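Differences between hadoop dfs, hdfs dfs, and hadoop fs
- hadoop dfs: works only with HDFS (including transfers to and from the local FS); deprecated
- hdfs dfs: works only with HDFS (including transfers to and from the local FS); commonly used
- hadoop fs: works with any supported file system, not only HDFS, so it has the widest scope
As of the current version, hadoop fs is the form the official documentation ultimately recommends, although hdfs dfs is also widely used in practice.
Command usage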
[root@server1 ~]# hdfs
Usage: hdfs [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
OPTIONS is none or any of:
--buildpaths attempt to add class files from build tree
--config dir Hadoop config directory
--daemon (start|status|stop) operate on a daemon
--debug turn on shell script debug mode
--help usage information
--hostnames list[,of,host,names] hosts to use in worker mode
--hosts filename list of hosts to use in worker mode
--loglevel level set the log4j level for this command
--workers turn on worker mode
SUBCOMMAND is one of:
Admin Commands:
cacheadmin configure the HDFS cache
crypto configure HDFS encryption zones
debug run a Debug Admin to execute HDFS debug commands
dfsadmin run a DFS admin client
dfsrouteradmin manage Router-based federation
ec run a HDFS ErasureCoding CLI
fsck run a DFS filesystem checking utility
haadmin run a DFS HA admin client
jmxget get JMX exported values from NameNode or DataNode.
oev apply the offline edits viewer to an edits file
oiv apply the offline fsimage viewer to an fsimage
oiv_legacy apply the offline fsimage viewer to a legacy fsimage
storagepolicies list/get/set block storage policies
Client Commands:
classpath prints the class path needed to get the hadoop jar and the required libraries
dfs run a filesystem command on the file system
envvars display computed Hadoop environment variables
fetchdt fetch a delegation token from the NameNode
getconf get config values from configuration
groups get the groups which users belong to
lsSnapshottableDir list all snapshottable dirs owned by the current user
snapshotDiff diff two snapshots of a directory or diff the current directory contents with a snapshot
version print the version
Daemon Commands:
balancer run a cluster balancing utility
datanode run a DFS datanode
dfsrouter run the DFS router
diskbalancer Distributes data evenly among disks on a given node
httpfs run HttpFS server, the HDFS HTTP Gateway
journalnode run the DFS journalnode
mover run a utility to move block replicas across storage types
namenode run the DFS namenode
nfs3 run an NFS version 3 gateway
portmap run a portmap service
secondarynamenode run the DFS secondary namenode
zkfc run the ZK Failover Controller daemon
SUBCOMMAND may print help when invoked w/o parameters or with -h.
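# List the files and folders under /parent/child in HDFS
hdfs dfs -ls /parent/child
# All HDFS commands can be run through the bin/hdfs script.
# List the files in a directory, naming the file system explicitly
hdfs dfs -ls hdfs://namenodehost:port/parent/child
# The same listing when fs.defaultFS is configured in core-site.xml
hdfs dfs -ls /parent/child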
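II. Command examples
1. mkdir command
Syntax: hdfs dfs -mkdir [-p] <paths>
Purpose: takes the URIs in <paths> as arguments and creates the directories. With -p, missing parent directories are created as well.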
hdfs dfs -mkdir /dir1
hdfs dfs -mkdir /dir2
hdfs dfs -mkdir -p /aaa/bbb/ccc
[alanchan@server1 ~]$ hdfs dfs -mkdir /dir1
[alanchan@server1 ~]$ hdfs dfs -mkdir /dir2
[alanchan@server1 ~]$ hdfs dfs -mkdir -p /aaa/bbb/ccc
[alanchan@server1 ~]$ hadoop fs -ls /
Found 3 items
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:09 /dir1
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:09 /dir2
[alanchan@server1 ~]$ hadoop fs -ls /
Found 4 items
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:17 /aaa
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:09 /dir1
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:09 /dir2
[alanchan@server1 ~]$ hadoop fs -ls -R /
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:17 /aaa
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:17 /aaa/bbb
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:17 /aaa/bbb/ccc
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:09 /dir1
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:09 /dir2
2. ls command
Syntax: hdfs dfs -ls [-R] URI
Purpose: lists files, similar to the Linux ls command
hdfs dfs -ls -R /
-R: lists the contents of the directory recursively
[alanchan@server1 ~]$ hadoop fs -ls /
Found 4 items
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:17 /aaa
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:09 /dir1
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:09 /dir2
[alanchan@server1 ~]$ hadoop fs -ls -R /
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:17 /aaa
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:17 /aaa/bbb
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:17 /aaa/bbb/ccc
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:09 /dir1
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:09 /dir2
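3. put command
-put copies a single source file, or several source files, from the local file system to the destination file system (at the given path). It can also read from standard input and write to the destination file system.
Syntax: hadoop fs -put [-f] [-p] <localsrc> … <dst>
-f overwrites the destination if it already exists
-p preserves access and modification times, ownership, and permissions
localsrc: the local file system (the machine where the client runs)
dst: the destination file system (HDFS)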
[alanchan@server1 sbin]$ hdfs dfs -put /usr/local/bigdata/hadoop-3.1.4/README.txt /dir1
[alanchan@server1 sbin]$ hdfs dfs -ls -R /dir1
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:43 /dir1/README.txt
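4. rm command
Deletes the files and directories given as arguments; several may be given. Deleting a directory requires -r. When the trash is enabled, running this command from the HDFS shell moves the files to the trash for a while. With -skipTrash, the trash is bypassed and the files are deleted immediately.
hdfs dfs -rm [-r] [-skipTrash] URI [URI …]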
[alanchan@server1 sbin]$ hdfs dfs -ls -R /
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:17 /aaa
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:17 /aaa/bbb
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:17 /aaa/bbb/ccc
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:43 /dir1
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:43 /dir1/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:09 /dir2
drwxr-xr-x - alanchan supergroup 0 2022-08-26 12:34 /testhadoopcreate
[alanchan@server1 sbin]$ hdfs dfs -rm /aaa
rm: `/aaa': Is a directory
[alanchan@server1 sbin]$ hdfs dfs -rm -r /aaa
Deleted /aaa
[alanchan@server1 sbin]$ hdfs dfs -ls -R /
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:43 /dir1
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:43 /dir1/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:09 /dir2
drwxr-xr-x - alanchan supergroup 0 2022-08-26 12:34 /testhadoopcreate
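Because -skipTrash deletes data immediately, scripts sometimes wrap the command so that the cautious behaviour is the default. The following is a minimal sketch under that assumption; `hdfs_rm_safe` and its `--skip-trash` argument are hypothetical names invented for this example, not Hadoop options.

```shell
# Hypothetical wrapper around "hdfs dfs -rm -r" (a sketch, not a Hadoop command).
# It refuses an empty path or the root directory, and only bypasses the
# trash when the caller passes "--skip-trash" as the second argument.
hdfs_rm_safe() {
  target="$1"
  mode="${2:-}"
  case "$target" in
    ""|"/") echo "refusing to delete '$target'" >&2; return 1 ;;
  esac
  if [ "$mode" = "--skip-trash" ]; then
    hdfs dfs -rm -r -skipTrash "$target"
  else
    hdfs dfs -rm -r "$target"
  fi
}
```

Called as `hdfs_rm_safe /aaa`, it behaves like the `hdfs dfs -rm -r /aaa` shown above; whether the data actually lands in the trash still depends on the cluster's trash configuration.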
5. moveFromLocal command
Similar to put, except that the source file localsrc is deleted after it has been copied.
Syntax: hdfs dfs -moveFromLocal <localsrc> <dst>
[alanchan@server1 sbin]$ hdfs dfs -moveFromLocal /usr/local/bigdata/hadoop-3.1.4/README.txt /dir2
[alanchan@server1 sbin]$ hdfs dfs -ls -R /
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:43 /dir1
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:43 /dir1/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:52 /dir2
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:52 /dir2/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 12:34 /testhadoopcreate
[alanchan@server1 sbin]$ ls /usr/local/bigdata/hadoop-3.1.4
bin dfs etc include lib libexec LICENSE.txt logs NOTICE.txt sbin share
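6. get command
Copies files to the local file system. Files that fail the CRC check can be copied with the -ignorecrc option. The -crc option retrieves the files together with their CRC checksum files.
Syntax:
hadoop fs -get [-f] [-p] <src> … <localdst>
Downloads files to the given location on the local file system; when several files are downloaded, localdst must be a directory
-f overwrites the destination if it already exists
-p preserves access and modification times, ownership, and permissions
hadoop fs -getmerge [-nl] [-skip-empty-file] <src> <localdst>
Downloads several files and merges them into a single file on the local file system.
-nl adds a newline character at the end of each file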
[alanchan@server1 sbin]$ cd /usr/local/bigdata
[alanchan@server1 bigdata]$ ll
drwxr-xr-x 11 alanchan root 4096 8月 26 13:52 hadoop-3.1.4
-rw-r--r-- 1 alanchan root 303134111 8月 23 16:49 hadoop-3.1.4-bin-snappy-CentOS7.tar.gz
[alanchan@server1 bigdata]$ hadoop fs -get /dir1/README.txt /usr/local/bigdata
[alanchan@server1 bigdata]$ ll
总用量 325876
drwxr-xr-x 11 alanchan root 4096 8月 26 13:52 hadoop-3.1.4
-rw-r--r-- 1 alanchan root 303134111 8月 23 16:49 hadoop-3.1.4-bin-snappy-CentOS7.tar.gz
-rw-r--r-- 1 alanchan root 1366 8月 26 14:21 README.txt
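The transcript above only exercises -get. What -getmerge with -nl produces can be pictured with a purely local analogy: concatenate the inputs and add a newline after each one. `merge_with_newlines` below is a hypothetical helper that touches only local files; it is a sketch of the effect, not of how Hadoop implements it.

```shell
# Local analogy for "hadoop fs -getmerge -nl": concatenate the given files
# into one output file and add a newline after each of them.
# Hypothetical helper; it works on the local file system only.
merge_with_newlines() {
  out="$1"
  shift
  : > "$out"                  # start with an empty output file
  for f in "$@"; do
    cat "$f" >> "$out"
    printf '\n' >> "$out"     # the effect of -nl
  done
}
```

For example, `merge_with_newlines merged.txt part-0 part-1` (file names assumed) yields one local file in which each part is followed by a line break, which helps when the parts do not end with a newline themselves.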
7. cat command
Prints the contents of the given files to the console. Note: be careful when reading large files this way.
Syntax:
hdfs dfs -cat URI [URI …]
[alanchan@server1 sbin]$ hdfs dfs -cat /dir1/README.txt
For the latest information about Hadoop, please visit our website at:
http://hadoop.apache.org/core/
and our wiki, at:
http://wiki.apache.org/hadoop/
This distribution includes cryptographic software. The country in
which you currently reside may have restrictions on the import,
possession, use, and/or re-export to another country, of
encryption software. BEFORE using any encryption software, please
check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to
see if this is permitted. See <http://www.wassenaar.org/> for more
information.
The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity
Control Number (ECCN) 5D002.C.1, which includes information security
software using or performing cryptographic functions with asymmetric
algorithms. The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception
ENC Technology Software Unrestricted (TSU) exception (see the BIS
Export Administration Regulations, Section 740.13) for both object
code and source code.
The following provides more details on the included cryptographic
software:
Hadoop Core uses the SSL libraries from the Jetty project written
by mortbay.org.
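For large files, one common way to follow the caution above is to pipe -cat through the standard head utility so that only the first lines reach the console. A minimal sketch, where `hdfs_preview` is a hypothetical helper name:

```shell
# Hypothetical helper: print only the first N lines of an HDFS file.
# $1 = HDFS path, $2 = number of lines (defaults to 10)
# "head" exits after N lines, which closes the pipe, so the rest of the
# file is normally not streamed to the terminal.
hdfs_preview() {
  file="$1"
  lines="${2:-10}"
  hdfs dfs -cat "$file" | head -n "$lines"
}
```

With the file used above, `hdfs_preview /dir1/README.txt 3` would show just its first three lines.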
8. head command
Displays the first 1 KB of the file.
Syntax:
hdfs dfs -head URI
[alanchan@server1 sbin]$ hdfs dfs -head /dir1/README.txt
For the latest information about Hadoop, please visit our website at:
http://hadoop.apache.org/core/
and our wiki, at:
http://wiki.apache.org/hadoop/
This distribution includes cryptographic software. The country in
which you currently reside may have restrictions on the import,
possession, use, and/or re-export to another country, of
encryption software. BEFORE using any encryption software, please
check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to
see if this is permitted. See <http://www.wassenaar.org/> for more
information.
The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity
Control Number (ECCN) 5D002.C.1, which includes information security
software using or performing cryptographic functions with asymmetric
algorithms. The form and manner of this Apache Software Foundation
distribution makes it eligible for export under
9. tail command
Displays the last 1 KB of the file.
Syntax:
hdfs dfs -tail [-f] URI
# As on Linux, -f keeps printing appended data to the console as the file grows.
[alanchan@server1 sbin]$ hdfs dfs -tail /dir1/README.txt
try, of
encryption software. BEFORE using any encryption software, please
check your country's laws, regulations and policies concerning the
import, possession, or use, and re-export of encryption software, to
see if this is permitted. See <http://www.wassenaar.org/> for more
information.
The U.S. Government Department of Commerce, Bureau of Industry and
Security (BIS), has classified this software as Export Commodity
Control Number (ECCN) 5D002.C.1, which includes information security
software using or performing cryptographic functions with asymmetric
algorithms. The form and manner of this Apache Software Foundation
distribution makes it eligible for export under the License Exception
ENC Technology Software Unrestricted (TSU) exception (see the BIS
Export Administration Regulations, Section 740.13) for both object
code and source code.
The following provides more details on the included cryptographic
software:
Hadoop Core uses the SSL libraries from the Jetty project written
by mortbay.org.
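10. cp command
Copies files to the destination path. If <dest> is a directory, several files can be copied into it.
Syntax:
hdfs dfs -cp [-f] [-p] URI [URI …] <dest>
Options:
-f overwrites the destination if it already exists
-p preserves file attributes (timestamps, ownership, permission, ACL, XAttr)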
[alanchan@server1 sbin]$ hdfs dfs -ls -R /
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:43 /dir1
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:43 /dir1/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:52 /dir2
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:52 /dir2/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 12:34 /testhadoopcreate
[alanchan@server1 sbin]$ hdfs dfs -rm /dir2/README.txt
Deleted /dir2/README.txt
[alanchan@server1 sbin]$ hdfs dfs -cp /dir1/README.txt /dir2
[alanchan@server1 sbin]$ hdfs dfs -ls -R /
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:43 /dir1
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:43 /dir1/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:17 /dir2
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 14:17 /dir2/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 12:34 /testhadoopcreate
[alanchan@server1 sbin]$ hdfs dfs -cp /dir1/README.txt /dir2/README.txt /testhadoopcreate
cp: `/testhadoopcreate/README.txt': File exists
[alanchan@server1 sbin]$ hdfs dfs -ls -R /
drwxr-xr-x - alanchan supergroup 0 2022-08-26 13:43 /dir1
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:43 /dir1/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:17 /dir2
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 14:17 /dir2/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:18 /testhadoopcreate
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 14:18 /testhadoopcreate/README.txt
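11. appendToFile command
Appends one or more local files to the given file in HDFS. It can also read its input from standard input.
Syntax:
hadoop fs -appendToFile <localsrc> … <dst>
Appends the contents of all given local files to the dst file.
If dst does not exist, it is created.
If <localsrc> is -, the input is read from standard input.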
[alanchan@server1 bigdata]$ echo 1 >>1.txt
[alanchan@server1 bigdata]$ ll
-rw-r--r-- 1 alanchan root 2 8月 26 14:24 1.txt
drwxr-xr-x 11 alanchan root 4096 8月 26 13:52 hadoop-3.1.4
-rw-r--r-- 1 alanchan root 303134111 8月 23 16:49 hadoop-3.1.4-bin-snappy-CentOS7.tar.gz
[alanchan@server1 bigdata]$ hdfs dfs -put 1.txt /dir1
[alanchan@server1 bigdata]$ hdfs dfs -ls -R /
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:26 /dir1
-rw-r--r-- 3 alanchan supergroup 2 2022-08-26 14:26 /dir1/1.txt
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:43 /dir1/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:17 /dir2
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 14:17 /dir2/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:18 /testhadoopcreate
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 14:18 /testhadoopcreate/README.txt
[alanchan@server1 bigdata]$ hadoop fs -appendToFile 1.txt /dir/1.txt
[alanchan@server1 bigdata]$ hdfs dfs -ls -R /
-rw-r--r-- 3 alanchan supergroup 2 2022-08-26 14:28 /dir/1.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:26 /dir1
-rw-r--r-- 3 alanchan supergroup 2 2022-08-26 14:26 /dir1/1.txt
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:43 /dir1/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:17 /dir2
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 14:17 /dir2/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:18 /testhadoopcreate
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 14:18 /testhadoopcreate/README.txt
[alanchan@server1 bigdata]$ hadoop fs -appendToFile 1.txt /dir1/1.txt
[alanchan@server1 bigdata]$ hadoop fs -cat /dir1/1.txt
1
1
[alanchan@server1 bigdata]$ cat 2.txt
[alanchan@server1 bigdata]$ echo 2 >>2.txt
[alanchan@server1 bigdata]$ cat 2.txt
2
[alanchan@server1 bigdata]$ hadoop fs -appendToFile 1.txt 2.txt /dir1/1.txt
[alanchan@server1 bigdata]$ hadoop fs -cat /dir1/1.txt
1
1
2
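The transcript above appends local files only. The standard-input form (`-` as the source) can be sketched as follows; `hdfs_append_line` is a hypothetical helper written for this illustration.

```shell
# Hypothetical helper: append one line of text to an HDFS file.
# $1 = text to append, $2 = destination file in HDFS
# The "-" source tells appendToFile to read from standard input.
hdfs_append_line() {
  text="$1"
  dst="$2"
  printf '%s\n' "$text" | hdfs dfs -appendToFile - "$dst"
}
```

For instance, `hdfs_append_line 3 /dir1/1.txt` would add a line containing 3 to the file used in the example, without creating a temporary local file first.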
12. df command
The df command shows the capacity, used space, and free space of HDFS.
hdfs dfs -df [-h] URI [URI …]
[alanchan@server1 bigdata]$ hdfs dfs -df /
Filesystem Size Used Available Use%
hdfs://server1:8020 940170657792 233472 785872048128 0%
[alanchan@server1 bigdata]$ hdfs dfs -df -h /
Filesystem Size Used Available Use%
hdfs://server1:8020 875.6 G 228 K 731.9 G 0%
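13. du command
Shows the sizes of the files and directories contained in the given directory; when a single file is given, shows the size of that file.
Syntax:
hdfs dfs -du [-s] [-h] [-v] [-x] URI [URI …]
Options:
-s: shows an aggregate summary of file lengths rather than the individual files
-h: formats file sizes in a human-readable way
-v: displays the column names as a header line
-x: excludes snapshots from the result calculation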
[alanchan@server1 bigdata]$ hdfs dfs -du -h /
2 6 /dir
1.3 K 4.0 K /dir1
1.3 K 4.0 K /dir2
10 30 /testhadoopcreate
[alanchan@server1 bigdata]$ hdfs dfs -du -h -v /
SIZE DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS FULL_PATH_NAME
2 6 /dir
1.3 K 4.0 K /dir1
1.3 K 4.0 K /dir2
10 30 /testhadoopcreate
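The two numeric columns above are related by the replication factor: the second column is the space consumed by all replicas. With the replication factor of 3 visible in the earlier listings, that is the file size times 3. The arithmetic can be checked with a small sketch (`disk_consumed` is a hypothetical helper, and the rule assumes every file under the path has the same replication factor):

```shell
# Disk space consumed = file size x replication factor.
# Hypothetical helper using plain shell arithmetic.
# $1 = size in bytes, $2 = replication factor
disk_consumed() {
  size_bytes="$1"
  replication="$2"
  echo $(( size_bytes * replication ))
}

disk_consumed 2 3     # /dir above: 2 bytes, prints 6
disk_consumed 10 3    # /testhadoopcreate above: 10 bytes, prints 30
```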
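14. mv command
Moves files in HDFS from the source path to the destination path (after the move, the file no longer exists at the source). The command cannot move files across file systems.
hadoop fs -mv <src> … <dst>
Moves files into the given folder
It can be used to move data or to rename a file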
[alanchan@server1 bigdata]$ hdfs dfs -ls -R /
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:28 /dir
-rw-r--r-- 3 alanchan supergroup 2 2022-08-26 14:28 /dir/1.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:26 /dir1
-rw-r--r-- 3 alanchan supergroup 10 2022-08-26 14:30 /dir1/1.txt
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:43 /dir1/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:17 /dir2
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 14:17 /dir2/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:43 /testhadoopcreate
[alanchan@server1 bigdata]$ hadoop fs -mv /dir1/1.txt /testhadoopcreate
[alanchan@server1 bigdata]$ hdfs dfs -ls -R /
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:28 /dir
-rw-r--r-- 3 alanchan supergroup 2 2022-08-26 14:28 /dir/1.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:46 /dir1
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:43 /dir1/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:17 /dir2
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 14:17 /dir2/README.txt
drwxr-xr-x - alanchan supergroup 0 2022-08-26 14:46 /testhadoopcreate
-rw-r--r-- 3 alanchan supergroup 10 2022-08-26 14:30 /testhadoopcreate/1.txt
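15. setrep command
Changes the replication factor of a file. If path is a directory, the command recursively changes the replication factor of all files under the directory tree rooted at path.
hadoop fs -setrep [-R] [-w] <numReplicas> <path>
Changes the number of replicas of the given file.
-R: accepted for backward compatibility and has no effect; directories are always processed recursively
-w: the client waits until the replication change has completed, which can take a long time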
[alanchan@server1 bigdata]$ hadoop fs -ls -R /dir1
-rw-r--r-- 3 alanchan supergroup 1366 2022-08-26 13:43 /dir1/README.txt
[alanchan@server1 bigdata]$ hadoop fs -setrep -w 2 /dir1/README.txt
Replication 2 set: /dir1/README.txt
Waiting for /dir1/README.txt ...
WARNING: the waiting time may be long for DECREASING the number of replications.
. done
[alanchan@server1 bigdata]$ hadoop fs -ls -R /dir1
-rw-r--r-- 2 alanchan supergroup 1366 2022-08-26 13:43 /dir1/README.txt
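In the listings above, the replication factor is the second column of each file line (3 before the change, 2 after it). A small filter can pull that column out of `-ls` output; `replication_column` is a hypothetical helper, and the sketch assumes paths without spaces.

```shell
# Hypothetical helper: read "hadoop fs -ls" output on standard input and
# print "<replication> <path>" for every file.
# Directory lines (which show "-" instead of a number) and the
# "Found N items" header are skipped.
replication_column() {
  awk '$1 !~ /^d/ && NF >= 8 { print $2, $NF }'
}
```

Used as `hadoop fs -ls -R /dir1 | replication_column`, it gives a quick way to confirm the effect of setrep on many files at once.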
16. checksum command
Returns the checksum information of a file.
[root@server1 ~]# hdfs dfs -checksum /source/comment_log/test.csv
/source/comment_log/test.csv MD5-of-0MD5-of-512CRC32C 000002000000000000000000d79e10d1da54356351f7c9776849c3bb
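17. copyFromLocal command
Similar to the put command: it copies local files to HDFS, but the source is restricted to local files, whereas put can also read from standard input (-).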
[root@server1 ~]# hdfs dfs -copyFromLocal test.csv /tmp
[root@server1 ~]# hdfs dfs -ls /tmp
Found 2 items
-rw-r--r-- 3 root supergroup 2821683 2022-10-16 09:24 /tmp/test_new.csv
-rw-r--r-- 3 root supergroup 24 2022-10-16 09:21 /tmp/test
18. copyToLocal command
Similar to the get command, except that the destination is restricted to a local file reference.
[root@server1 ~]# hdfs dfs -copyToLocal /tmp/test
[root@server1 ~]# ll
total 34260
-rw-r--r-- 1 root root 32252088 Oct 15 23:11 test.csv
-rw-r--r-- 1 root root 2821683 Oct 15 23:29 server_new.csv
-rw-r--r-- 1 root root 24 Oct 16 09:28 test
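19. count command
Counts the directories, files, and bytes under the paths that match the given file pattern, and can report quotas and usage. The output columns of -count are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME; with -q, the quota columns are added in front of them.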
[root@server1 ~]# hdfs dfs -count -q -v -h /source
QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
none inf none inf 6 1 33.4 M /source
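20. find command
Finds all files that match the given expression and applies the selected actions to them. If no path is given, the current working directory is used. If no expression is given, -print is used.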
[root@server1 ~]# hdfs dfs -find / -name "test*" -print
/source/comment_log/test.csv