Unix cut command
Published: 2019-05-10



 

The external cut command displays selected columns or fields from each line of a file. It is the UNIX equivalent of the relational algebra projection operation. If the capabilities of cut are not enough, then the alternatives are AWK and Perl.

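Note that cut does not use the shell variable IFS (Input Field Separators) to determine where to split fields. IFS controls how the shell itself splits words. You can check its value with set | grep IFS. In bash, you can set it to its usual value of space, tab and newline with

 IFS=$' \t\n'

Changing IFS has no effect on cut. cut splits on TAB unless another delimiter is given with -d.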

The most typical usage is cutting one or several columns from a file to create a new file. For example:

cut -d ' ' -f 2-7

retrieves the second through seventh fields, assuming that the fields are separated by a single (note: single) blank. Option -d specifies a single-character delimiter (in the example above, a blank) that serves as the field separator. Option -f specifies the range of fields included in the output (here, fields two through seven). Note that option -d presupposes the use of option -f.
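Here is a self-contained check of the command above. It uses printf to supply a hypothetical input line instead of a file. It also shows why the "single blank" caveat matters: two adjacent blanks delimit an empty field, which shifts the numbering of every later field.

```shell
#!/bin/sh
# Hypothetical input: eight words separated by single blanks.
printf 'f1 f2 f3 f4 f5 f6 f7 f8\n' | cut -d ' ' -f 2-7   # prints: f2 f3 f4 f5 f6 f7

# Two blanks after "a" create an empty field 2, so "b" becomes field 3.
printf 'a  b c\n' | cut -d ' ' -f 2   # prints an empty line
printf 'a  b c\n' | cut -d ' ' -f 3   # prints: b
```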

Cut can work in two modes:

  • column (character-position) selection: each column starts at a fixed character offset, specified as a from-to range
  • separator-delimited selection: the column separator is a single character such as a blank, comma or colon. In this mode cut uses the delimiter given with the -d option (as in the example above). Without -d, cut uses TAB as the delimiter. It does not take the delimiter from the shell variable IFS.

Cut is essentially a simple text-parsing tool. Unless the task at hand is also simple, you will be better off using other, more flexible text-parsing tools. On modern computers the difference between invoking cut and invoking awk is negligible. You can also use Perl for the same task. If option -a (autosplit mode) is specified together with -n or -p, Perl splits each input line into the array @F. A Perl emulation of cut therefore consists of a simple print statement that outputs the necessary fields. The advantage of Perl is that fields can be counted from the end of the line (using negative indexes).
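For comparison, here is a sketch of the same kind of selection in awk (any POSIX awk is assumed). Unlike cut, awk's default field splitting treats runs of blanks and tabs as a single separator and ignores leading blanks. Also, $(NF-1) addresses fields from the end of the line, much like Perl's negative indexes. The input line is hypothetical.

```shell
#!/bin/sh
# Hypothetical input with irregular spacing.
line='  alpha   beta gamma    delta'

# Like cut -f 2-3, but tolerant of runs of blanks.
printf '%s\n' "$line" | awk '{ print $2, $3 }'          # prints: beta gamma

# First field and the second field from the end (compare Perl's $F[0] and $F[-2]).
printf '%s\n' "$line" | awk '{ print $1 ":" $(NF-1) }'  # prints: alpha:gamma
```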


Pages with collections of Perl command-line one-liners contain many interesting examples that can probably be adapted to your particular situation.

Here is an example of how to print the first column and the second column from the end (-l already adds a newline to each print):

 perl -lane 'print "$F[0]:$F[-2]"'

Here's a simple one-line script that will print out the fourth word of every line, but also skip any line beginning with a # because it's a comment line.

perl -lane 'next if /^#/; print $F[3]'

The most popular modern usage of cut is probably processing HTTP and proxy logs.

Column selection mode

A column is one character position. In this mode cut acts as a generalized substr function for files. Classic Unix cut cannot count characters from the end of the line the way the Perl substr function can. This type of selection is specified with the -c option. List entries can be open (from the beginning, as in -5, or to the end, as in 6-) or closed (as in 6-9).

cut -c 4,5,20 foo # cuts foo at columns 4, 5, and 20.

cut -c 1-5 a.dat | more  # print the first 5 characters of every line in the file a.dat

cut -c -5 a.dat | more  #  same as above but using open range
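The same kinds of selection can be applied to a hypothetical line containing the alphabet, so each character's column number is its position in the alphabet. ASCII input is assumed, because some implementations count bytes rather than characters with -c.

```shell
#!/bin/sh
s='abcdefghijklmnopqrstuvwxyz'          # hypothetical input line
printf '%s\n' "$s" | cut -c 4,5,20      # prints: det
printf '%s\n' "$s" | cut -c 1-5         # prints: abcde
printf '%s\n' "$s" | cut -c -5          # prints: abcde (open range from the start)
printf '%s\n' "$s" | cut -c 6-9         # prints: fghi  (closed range)
printf '%s\n' "$s" | cut -c 24-         # prints: xyz   (open range to the end)
```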

Field selection mode

In this mode cut selects fields rather than characters. The fields are delimited by a single-character delimiter specified with the -d option. The list of fields is specified with the -f option (-f list).

cut -d ":" -f1,7 /etc/passwd  # cuts fields 1 and 7 from /etc/passwd
cut -d ":" -f 1,6- /etc/passwd # cuts field 1 and fields 6 to the end from /etc/passwd

The default delimiter is TAB. If space is used as a delimiter, be sure to put it in quotes (-d " ").
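Both the default delimiter and the quoting rule can be checked directly with hypothetical input. The last command runs in a subshell so that changing the shell variable IFS does not leak into the rest of the script. It shows that IFS has no influence on cut, which still splits on TAB.

```shell
#!/bin/sh
# TAB is the default delimiter, so tab-separated data needs no -d option.
printf 'one\ttwo\tthree\n' | cut -f 2          # prints: two

# A blank delimiter must be quoted so that the shell passes it to cut.
printf 'one two three\n' | cut -d ' ' -f 2     # prints: two

# Setting IFS (in a subshell, so the change does not leak) does not
# change cut's delimiter: the line is still split on the TAB.
( IFS=' '; printf 'a b\tc d\n' | cut -f 2 )   # prints: c d
```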

Note: another way to specify a blank (or another shell-sensitive character) is to escape it with a backslash. The following example prints the second blank-separated field of every line in the file /etc/passwd (lines without a blank are printed unchanged):

cut -f2 -d\  /etc/passwd | more 

Line suppression option

In field selection mode, the -s option makes cut suppress lines that contain no delimiter (the character defined with -d). Unless this option is specified, lines with no delimiters are passed to the output untouched.
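Here is a small illustration of -s with a hypothetical three-line input, where the middle line contains no colon.

```shell
#!/bin/sh
input='root:x:0
no delimiter here
daemon:x:1'

# Without -s the line lacking ':' is passed through whole. Output:
# root
# no delimiter here
# daemon
printf '%s\n' "$input" | cut -d: -f1

# With -s that line is suppressed. Output:
# root
# daemon
printf '%s\n' "$input" | cut -d: -f1 -s
```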

Complement selection (GNU cut only)

This is a GNU cut option only. Option --complement converts the set of selected bytes, characters or fields to its complement, so it inverts the selection made with -b, -c or -f. Instead of the list of fields or character columns to be retained, you specify those that need to be excluded. In some cases this simplifies writing the selection range.

For example instead of the example listed above:

    cut -d ":" -f 1,6- /etc/passwd # cuts fields 1 and 6 to the end on the line from /etc/passwd

you can specify:

    cut -d ":" -f 2-5 --complement  /etc/passwd # cuts fields 1 and 6 to the end on the line from /etc/passwd
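Both forms can be compared on a single hypothetical passwd-style record. This assumes GNU cut, since --complement is not part of POSIX.

```shell
#!/bin/sh
# Hypothetical passwd-style record with 7 colon-separated fields.
rec='user:x:1000:1000:Some User:/home/user:/bin/bash'

printf '%s\n' "$rec" | cut -d: -f 1,6-                 # keep 1 and 6..end
printf '%s\n' "$rec" | cut -d: -f 2-5 --complement     # drop 2..5 (GNU only)
# both print: user:/home/user:/bin/bash
```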

By using pipes and output shell redirection operators you can create new files with a subset of columns or fields contained in the first file.

Usage in Shell

Sometimes cut is used in shell programming to select certain substrings from a variable, for example:

echo "Argument 1 = [$1]"
c=`echo "$1" | cut -c6-8`
echo "Characters 6 to 8 = [$c]"

Output (for the argument 1234567890):

Argument 1 = [1234567890]
Characters 6 to 8 = [678]

This is one of many ways to perform such a selection. In all but the simplest cases, more capable tools such as Perl are better suited to the job. If you are selecting fields of a shell variable, you should probably use the set command to load the fields into the positional parameters, then echo the desired positional parameter into a pipe.
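Here is a minimal sketch of the set approach. It assumes a POSIX shell and a hypothetical variable var whose value contains no wildcard characters, because unquoted $var undergoes both word splitting on IFS and filename expansion. Note that set -- replaces the script's own positional parameters.

```shell
#!/bin/sh
var='alpha beta gamma'   # hypothetical value, no wildcard characters

# Load the words of $var into $1, $2, ... (split according to IFS).
set -- $var
echo "$2"                # prints: beta
```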

For complex cases Perl is definitely the preferable tool. Moreover, several Perl re-implementations of cut exist.

By the way, the Perl implementations are more flexible and less capricious than the original Unix cut command written in C.

As I mentioned before, there are two variants of cut: the first is character-column cut and the second is delimiter-based (parsing) cut. In both cases an option can be separated from its value by a space, for example

-d ' '

In other words, the POSIX and GNU implementations of cut use "almost" standard lexical parsing of arguments, although most examples in books use the "old style" with arguments "glued" to options. The "glued" style of specifying arguments is generally an anachronism. Still, specifying a delimiter is not always straightforward. cut does not interpret the escape sequence \t, so to use a tab as the delimiter you must pass a literal tab character (for example, typed as Ctrl-V Tab, or written as $'\t' in bash). Because TAB is the default delimiter, you rarely need to specify it at all. You generally need to experiment with your particular implementation.
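Here is a quick check of both argument styles, plus one portable way to pass a real tab to -d. The variable name tab is only for illustration. $(printf '\t') is used because $'\t' quoting is not available in every POSIX shell.

```shell
#!/bin/sh
# "Glued" and separated option arguments are equivalent.
printf 'a:b:c\n' | cut -d: -f2          # prints: b
printf 'a:b:c\n' | cut -d ':' -f 2      # prints: b

# cut does not translate the two characters \t into a tab, so a real
# tab character is passed; printf produces one portably.
tab=$(printf '\t')
printf 'x\ty\tz\n' | cut -d "$tab" -f 3  # prints: z
```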

1. Character column cut

cut -c list [ file_list ]

Option:

-c list Display (cut) the columns specified in list from the input data. Columns are counted from one, not from zero, so the first column is column 1. The list can be separated from the option by space(s), but no spaces are allowed within the list. Multiple values must be comma (,) separated. The list defines the exact columns to display. For example, the -c 1,4,7 notation cuts columns 1, 4, and 7 of the input. The -c -10,50- notation selects columns 1 through 10 and 50 through end-of-line, while -c -10,50 selects columns 1 through 10 and column 50 only (remember that columns are counted from one).
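The difference between 50 and 50- in a list is easy to verify with a hypothetical 60-character line. The line is built from the digits 1234567890 repeated six times, so column 50 is a 0. ASCII input is assumed, since some cut implementations count bytes rather than characters for -c.

```shell
#!/bin/sh
block=1234567890
line="$block$block$block$block$block$block"   # hypothetical 60-column line

printf '%s\n' "$line" | cut -c -10,50    # columns 1-10 and column 50 only
# prints: 12345678900
printf '%s\n' "$line" | cut -c -10,50-   # columns 1-10 and 50 to end-of-line
# prints: 123456789001234567890
```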

2. Delimiter-based  (parsing) cut

cut -f list [ -d char ] [ -s ] [ file_list ]

Options:

-d char The character char is used as the field delimiter. It is usually quoted but can be escaped. The default delimiter is the tab character. To use a character that has special meaning to the shell, you must quote it so the shell does not interpret it. For example, to use a single space as the delimiter, type -d' '.

-f list  Selects (cuts) the fields specified in list from the input data. Fields are counted from one, not from zero. No spaces are allowed within the list. Multiple values must be comma (,) separated. The list defines the exact fields to display. The most practically important ranges are "open" ranges, where either the starting field or the last field is not specified explicitly (omitted). For example:

  •  Selection from the beginning of the line to a certain field is specified as -N, where N is the number of the field. For example
    -f -5
  •  Selection from a certain field to the end of the line (all fields starting from N) is specified as N-. For example -f 5-

A specification can be complex and include both individual fields and ranges. For example, -f 1,4,7 selects fields 1, 4, and 7, and -f2,4-6,8 selects field 2, fields 4 to 6 (a range) and field 8.
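The sketch below exercises these list forms on a hypothetical colon-separated record. Field N is simply named fN, so the output shows directly which fields were selected.

```shell
#!/bin/sh
rec='f1:f2:f3:f4:f5:f6:f7:f8'              # hypothetical record, 8 fields
printf '%s\n' "$rec" | cut -d: -f -3       # prints: f1:f2:f3
printf '%s\n' "$rec" | cut -d: -f 6-       # prints: f6:f7:f8
printf '%s\n' "$rec" | cut -d: -f 1,4,7    # prints: f1:f4:f7
printf '%s\n' "$rec" | cut -d: -f2,4-6,8   # prints: f2:f4:f5:f6:f8
```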

Limitations

Please remember that cut is good only for simple cases. In complex cases AWK and Perl actually save your time. Limitations are many. Among them:

  • Delimiters are single characters; they are not regular expressions. This leads to huge disappointment when you try to parse a blank-delimited file with cut: multiple blanks are counted as multiple field separators.
  • The syntax is irregular and sometimes tricky. For example, one-character delimiters can be quoted, but escaped delimiters cannot be quoted.
  • The semantics are the most basic. Cut is essentially a text parser and as such is suitable mainly for parsing colon-delimited and similar files. Its functionality does not even match the level of the Fortran IV FORMAT statement.
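The first limitation is easy to reproduce with a hypothetical blank-padded line. The usual workaround is to squeeze runs of blanks with tr -s, the same trick used in the free and ls examples on this page. A leading blank would still produce an empty first field.

```shell
#!/bin/sh
line='alpha    beta  gamma'     # hypothetical blank-padded line

# Every single blank is a delimiter, so field 2 is an empty string.
printf '%s\n' "$line" | cut -d ' ' -f 2                 # prints an empty line

# Squeezing runs of blanks first restores the expected field numbering.
printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2     # prints: beta
```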

Examples

  1. [From AIX cut man page] To display several fields of each line of a file, enter: 
    cut -f 1,5 -d : /etc/passwd
    This displays the login name and full user name fields of the system password file. These are the first and fifth fields (-f 1,5) separated by colons (-d :).

    For example, if the /etc/passwd file looks like this:

    su:*:0:0:User with special privileges:/:/usr/bin/sh
    daemon:*:1:1::/etc:
    bin:*:2:2::/usr/bin:
    sys:*:3:3::/usr/src:
    adm:*:4:4:System Administrator:/var/adm:/usr/bin/sh
    pierre:*:200:200:Pierre Harper:/home/pierre:/usr/bin/sh
    joan:*:202:200:Joan Brown:/home/joan:/usr/bin/sh

    The cut command produces:

    su:User with special privileges
    daemon:
    bin:
    sys:
    adm:System Administrator
    pierre:Pierre Harper
    joan:Joan Brown
  2. [From AIX cut man page] To display fields using a blank separated list, enter:
    cut -f "1 2 3" -d : /etc/passwd

    The cut command produces:

    su:*:0
    daemon:*:1
    bin:*:2
    sys:*:3
    adm:*:4
    pierre:*:200
    joan:*:202
  3. [From an ebook by Hamish Whittal] Since we're only interested in fields 2, 3 and 4 of our memory, we can extract these using:
free | tr -s ' ' | sed '/^Mem/!d' | cut -d" " -f2-4 >> mem.stats
 
The cut command has the ability to cut out characters or fields. cut uses delimiters.   
The cut command uses delimiters to determine where to split fields, so the first thing we need to understand about cut is how it determines what its delimiters are. By default, cut's delimiter is the tab character. It is not taken from the shell variable IFS (Input Field Separators), which only controls how the shell itself splits words.
Typing:
set | grep IFS
will show you what the shell's separator characters currently are. Normally IFS contains a space, a tab and a newline, but this setting does not affect cut.
Looking at the output of our free command, we successfully separated every field by a space (remember the tr command!)
Similarly, if our delimiter between fields was a comma, we could set the delimiter within cut to be a comma using the -d switch:
cut -d ","
The cut command lets one cut on the number of characters or on the number of fields. Since we're only interested in fields 2,3 and 4 of our memory, we can extract these using:
free | tr -s ' ' | sed '/^Mem/!d' | cut -d" " -f2-4
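Why do you need to set -d " " even though a space is one of the IFS characters? Because cut ignores IFS: without -d it always splits on tabs.
Detour:
Setting shell variables is easy. In the Bourne shell (sh), bash or ksh, it is a plain assignment. In bash you can write the tab and newline with $'...' quoting:
IFS=$' \t\n'
In the csh, shell variables are set with set name = value and environment variables with setenv NAME value (no equals sign). Either way, none of this changes the delimiter used by cut.
That ends this short detour.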
At this point, it would be nice to save the output to a file. So let's append this to a file called mem.stats:
free | tr -s ' ' | sed '/^Mem/!d' | cut -d" " -f2-4 >> mem.stats
Every time you run this particular command it should append the output to the mem.stats file.
The -f switch allows us to cut based upon fields. If we were wanting to cut based upon characters (e.g. cut character 6-13 and 15, 17) we would use the -c switch.
To apply this to the above example:
free | tr -s ' ' | sed '/^Mem/!d' | cut -c6-13,15,17 >> mem.stats
First Example in stages:
1. For the next example I'd like you to make sure that you've logged on as a user (potentially root) on one of your virtual terminals.
How do you get to a virtual terminal? Ctrl-Alt plus F1 or F2 or F3 etcetera.
It should prompt you for a username and a password. Log in as root, or as yourself or as a different user and once you've logged in, switch back to your X terminal with Alt-F7. If you weren't working on X at the beginning of this session, then the Ctrl + Alt + F1 is not necessary. A simple Alt + F2 would open a new terminal, to return to the first terminal press Alt+F1.
2. Run the who command:
who
This will tell us who is logged on to the system. We could also run the w command:
w
This will not only tell us who is logged on to our system, but what they're doing. Let's use the w command, since we want to save information about what users are doing on our system. We may also want to save information about how long they've been idle and what time they logged on.
3. Find out who is logged on to your system. Pipe the output of the w command into the input of cut. This time however we're not going to use a delimiter to delimit fields but we're going to cut on characters. We could say:
w | cut -c1-8
This tells cut to display the first eight characters of each line. Doing this, you will see that in the header line it cuts off the time just before the last digit of the seconds. So in my case the time is now
09:57:24
and it cuts off to
09:57:2
It also cuts off the user. So if you look at this, you're left with USER and all the users currently logged onto your system. And that's cutting exactly 8 characters.
4. To cut characters 4 to 8?
w | cut -c4-8
This will produce slightly bizarre-looking output.
So cut can not only cut fields, it can also cut exact characters and ranges of characters. We can cut any number of characters in a line.
Second Example in stages:
Often cutting characters in a line is less than optimal, since you never know how long your usernames might be. Really long usernames would be truncated, which clearly would not be acceptable. Cutting on characters is rarely a long-term solution. It may work because your name is Sam, but not if your name is Jabberwocky!
1. Let's do a final example using cut. Using our password file:
cat /etc/passwd
I'd like to know all usernames on the system, and what shell each is using.
The password file has 7 fields separated by a ':'. The first field is the login username. The second is the password, which is an x (because it is kept in the shadow password file). The third field is the user ID, the fourth is the group ID, the fifth field is the comment, the sixth field is the user's home directory and the seventh field is the shell that the user is using. I'm interested in fields 1 and 7.
2. How would we extract the particular fields? Simple:
cat /etc/passwd | cut -d: -f1,7
If we do this, we should end up with just the usernames and their shells. Isn't that a nifty trick?
3. Let's pipe that output to the sort command, to sort the usernames alphabetically:
cat /etc/passwd | cut -d: -f1,7 | sort
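To try this pipeline without touching the real /etc/passwd, here is a sketch that feeds three hypothetical passwd-style lines through the same cut and sort. LC_ALL=C is set only to make the sort order independent of the locale.

```shell
#!/bin/sh
# Hypothetical passwd-style lines standing in for /etc/passwd.
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bob:x:1001:1001:Bob:/home/bob:/bin/sh' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/zsh' |
  cut -d: -f1,7 | LC_ALL=C sort
# Output:
# alice:/bin/zsh
# bob:/bin/sh
# root:/bin/bash
```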
Third example in stages
So this is a fairly simple way to extract information out of files. The cut command doesn't only work with files, it also works with streams. We could do a listing, which would produce a number of fields. If you recall, we used the tr command earlier to squeeze spaces.
ls -al
If you look at this output, you will see lines of fields. Below is a quick summary of these fields and what they refer to.
Field  Indicates
1      permissions of the file
2      number of links to the file
3      owner of the file (user name)
4      group of the file
5      size of the file
6      month the file was modified
7      day the file was modified
8      time the file was modified
9      name of the file
I'm particularly interested in the size and the name of each file.
1. Let's try and use our cut command in the same way that we used it for the password file:
ls -al | cut -d' ' -f5,9
The output is not as expected. cut treats every single space as a field separator, but ls pads its columns with runs of spaces, so the field numbers no longer correspond to the columns in the table above. This presents us with a bit of a problem.
2. You might suspect that the columns are separated by tabs and try \t (tab) as the delimiter instead of a space. However, cut only accepts a single character (and \t is two characters). An alternative way of inserting a special character like tab is to type Ctrl-V and then hit the Tab key:
Ctrl-V Tab
That inserts a literal tab character.
ls -al | cut -d"<Tab>" -f5,9
Here <Tab> stands for the tab character typed this way, which makes the delimiter a tab. But we still don't get what we want, because ls does not separate its columns with tabs at all. So let's try squeezing multiple spaces into a single space in this particular output. Thus:
ls -la | tr -s ' ' | cut -d' ' -f5,9
3. And hopefully that should now produce the output we're after. If it produces the output we're after on your system, then we're ready for lift-off. If it doesn't, then try the command again.
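The squeeze-then-cut pipeline can be tried without depending on the current directory. The sketch feeds it a hypothetical line in the format of ls -l output, where the size is column 5 and the name is column 9 in the table above. File names containing blanks would still be split, which is one of the limitations of cut discussed earlier.

```shell
#!/bin/sh
# Hypothetical line in ls -l format (runs of blanks between columns).
line='-rw-r--r--  1 alice staff   1234 May 10 09:57 notes.txt'
printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 5,9
# prints: 1234 notes.txt
```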
Now what happens if we want to swap the name with the size? I'll leave that as an exercise for you.
Exercises:
Using the tr and the cut commands, perform the following:
Obtain the mount point, the percentage in use and the partition of that mount for your disk drive, to produce the following:
/dev/hdb2 80% /home
Replace the spaces in your output above by colons (:)
Remove the /dev/shm line
Keep all output from the running of this command for later use.
As root, make the following change:
chmod o+r /dev/hda
Now, obtain the Model and Serial Number of your hard disk, using the command hdparm.
Obtain the stats (reads and writes etc.) on your drive using the iostat command, keeping the output as a comma separated value format file for later use

Reprinted from: http://onpvb.baihongyu.com/
