Linux grep command

Character classes and bracket expressions

A bracket expression is a list of characters enclosed by [ and ]. It matches any single character in that list; if the first character of the list is the caret ^, then it matches any character not in the list. For example, the regular expression [0123456789] matches any single digit.

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale’s collating sequence and character set. For example, in the default C locale, [a-d] is equivalent to [abcd]. Many locales sort characters in dictionary order, and in these locales [a-d] is often not equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.

Finally, certain named classes of characters are predefined within bracket expressions. Their names are self-explanatory, and they are [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. For example, [[:alnum:]] means the character class of numbers and letters in the current locale. In the C locale and ASCII character set encoding, this is the same as [0-9A-Za-z]. (Note that the brackets in these class names are part of the symbolic names, and must be included in addition to the brackets delimiting the bracket expression.) Most metacharacters lose their special meaning inside bracket expressions. To include a literal ], place it first in the list. Similarly, to include a literal ^, place it anywhere but first. Finally, to include a literal -, place it last.
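
The rules above can be tried out on a handful of one-character lines. This is a minimal sketch, assuming a POSIX shell and a grep that supports POSIX bracket expressions; the sample input and the variable names are invented for illustration.

```shell
# Invented sample input: one character per line.
sample=$(printf '%s\n' 'a' 'B' '7' '_' '-' ']')

# [0123456789] matches any single digit.
digits=$(printf '%s\n' "$sample" | grep '[0123456789]')

# LC_ALL=C gives the range a-d its traditional meaning, [abcd].
range=$(printf '%s\n' "$sample" | LC_ALL=C grep '[a-d]')

# The inner brackets belong to the class name; the outer pair delimits the list.
alnum=$(printf '%s\n' "$sample" | LC_ALL=C grep '[[:alnum:]]')

# A literal ] goes first in the list and a literal - goes last.
specials=$(printf '%s\n' "$sample" | grep '[]-]')

printf 'digits:   %s\n' "$digits"
printf 'range:    %s\n' "$range"
printf 'alnum:    %s\n' "$(printf '%s' "$alnum" | tr '\n' ' ')"
printf 'specials: %s\n' "$(printf '%s' "$specials" | tr '\n' ' ')"
```

Running the range test without LC_ALL=C may give different results on systems whose default locale collates in dictionary order.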

Example usage

Let’s say you want to quickly locate the phrase "our products" in HTML files on your machine. Let’s start by searching a single file. Here, our PATTERN is "our products" and our FILE is product-listing.html, so the command is grep "our products" product-listing.html.

A single line was found containing our pattern, and grep outputs the entire matching line to the terminal. The line is longer than our terminal width so the text wraps around to the following lines, but this output corresponds to exactly one line in our FILE.
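
As a minimal sketch of this first search, the following builds a small stand-in for product-listing.html in a temporary directory and runs grep against it. The file contents and variable names are invented for illustration and are not the page searched in the article.

```shell
# Create a scratch directory and an invented product-listing.html.
workdir=$(mktemp -d)
cat > "$workdir/product-listing.html" <<'EOF'
<html>
<body>
<h1>Our Products</h1>
<p>You will find that all of our products are impeccably designed.</p>
</body>
</html>
EOF

# PATTERN is "our products"; FILE is product-listing.html.
# The match is case-sensitive, so the <h1> line is not selected.
match=$(grep "our products" "$workdir/product-listing.html")
printf '%s\n' "$match"

rm -rf "$workdir"
```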

Note

The PATTERN is interpreted by grep as a regular expression. In the above example, all the characters we used (letters and a space) are interpreted literally in regular expressions, so only the exact phrase will be matched. Other characters, such as some punctuation marks, have special meanings. For more information, see: Regular expression quick reference.

If we use the --color option, our successful matches will be highlighted for us.

Viewing line numbers of successful matches

It will be even more useful if we know where the matching line appears in our file. If we specify the -n option, grep will prefix each matching line with the line number.

Our matching line is prefixed with "18:" which tells us this corresponds to line 18 in our file.

Performing case-insensitive grep searches

What if "our products" appears at the beginning of a sentence, or appears in all uppercase? We can specify the -i option to perform a case-insensitive match.

Using the -i option, grep finds a match on line 23 as well.
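
The effect of -n and -i can be sketched on a four-line stand-in file. The contents are invented, so the line numbers here (2 and 3) differ from the 18 and 23 reported for the article's file.

```shell
workdir=$(mktemp -d)
file="$workdir/product-listing.html"
printf '%s\n' \
  '<html>' \
  '<h1>Our Products</h1>' \
  '<p>All of our products are built to last.</p>' \
  '</html>' > "$file"

# -n prefixes each matching line with its line number and a colon.
numbered=$(grep -n "our products" "$file")

# Adding -i also selects "Our Products", "OUR PRODUCTS", and so on.
numbered_nocase=$(grep -n -i "our products" "$file")

printf 'case-sensitive:\n%s\n' "$numbered"
printf 'case-insensitive:\n%s\n' "$numbered_nocase"

rm -rf "$workdir"
```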

Searching multiple files using a wildcard

If we have multiple files to search, we can search them all using a wildcard in our FILE name. Instead of specifying product-listing.html, we can use an asterisk ("*") and the .html extension. When the command is executed, the shell expands the asterisk to the name of any file it finds (in the current directory) which ends in ".html".

Notice that each line starts with the specific file where that match occurs.

Recursively searching subdirectories

We can extend our search to subdirectories and any files they contain using the -r option, which tells grep to perform its search recursively. Let’s change our FILE name to an asterisk ("*"), so that it matches any file or directory name, and not only HTML files.

This gives us three additional matches. Notice that the directory name is included for any matching files that are not in the current directory.
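
Both the wildcard search and the recursive search can be reproduced in a scratch directory. This sketch assumes a grep with the -r option (GNU grep, for instance); the directory layout and file contents are hypothetical.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/archive"
echo '<p>See our products page.</p>'      > "$workdir/product-listing.html"
echo '<p>Details about our products.</p>' > "$workdir/product-details.html"
echo 'Notes on our products.'             > "$workdir/archive/notes.txt"

# The shell expands *.html before grep runs. With more than one file,
# grep prefixes each match with the name of the file it came from.
html_hits=$(cd "$workdir" && grep "our products" *.html)

# With -r, grep descends into directories; * now matches every name
# in the current directory, including the archive subdirectory.
all_hits=$(cd "$workdir" && grep -r "our products" *)

printf 'wildcard:\n%s\n' "$html_hits"
printf 'recursive:\n%s\n' "$all_hits"

rm -rf "$workdir"
```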

Using regular expressions to perform more powerful searches

The true power of grep is that it can match regular expressions. (That’s what the "re" in "grep" stands for.) Regular expressions use special characters in the PATTERN string to match a wider array of strings. Let’s look at a simple example.

Let’s say you want to find every occurrence of a phrase similar to "our products" in your HTML files, but the phrase should always start with "our" and end with "products". We can specify this PATTERN instead: "our.*products".

In regular expressions, the period (".") is interpreted as a single-character wildcard. It means "any character that appears in this place will match." The asterisk ("*") means "the preceding character, appearing zero or more times, will match." So the combination ".*" will match any number of any character. For instance, "our amazing products", "ours, the best-ever products", and even "ourproducts" will match. And because we’re specifying the -i option, "OUR PRODUCTS" and "OuRpRoDuCtS" will match as well. Let’s run the command with this regular expression and see what additional matches we can get.

Here, we also got a match from the phrase "our fine products".
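
To see which phrases the pattern accepts, it can be run over a short invented list instead of real HTML files. The phrases and variable names below are illustrative only.

```shell
phrases=$(printf '%s\n' \
  'our products' \
  'our fine products' \
  'ourproducts' \
  'OUR PRODUCTS' \
  'their products')

# . matches any one character and * repeats it zero or more times,
# so .* spans whatever lies between "our" and "products".
matched=$(printf '%s\n' "$phrases" | grep -i 'our.*products')

# -c reports how many lines matched instead of printing them.
count=$(printf '%s\n' "$phrases" | grep -i -c 'our.*products')

printf '%s\n' "$matched"
printf 'matching lines: %s\n' "$count"
```

Only "their products" is rejected, because it never contains the letters "our" in sequence.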

Grep is a powerful tool to help you work with text files, and it gets even more powerful when you become comfortable using regular expressions.

Matching control options

-e PATTERN, --regexp=PATTERN Use PATTERN as the pattern to match. This option can be given more than once to specify multiple search patterns, or used to protect a pattern beginning with a dash (-).
-f FILE, --file=FILE Obtain patterns from FILE, one per line.
-i, --ignore-case Ignore case distinctions in both the PATTERN and the input files.
-v, --invert-match Invert the sense of matching, to select non-matching lines.
-w, --word-regexp Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line or preceded by a non-word-constituent character. Similarly, it must be either at the end of the line or followed by a non-word-constituent character. Word-constituent characters are letters, digits, and the underscore.
-x, --line-regexp Select only those matches that exactly match the whole line.
-y The same as -i.
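
A short sketch of how these options change which lines are selected, run over five invented lines. The variable names are hypothetical; the options themselves are the ones listed above.

```shell
lines=$(printf '%s\n' 'hope' 'hopes' 'no hope left' '-hope-' 'HOPE')

# -w: "hope" must be bounded by non-word characters or the line edges.
whole_word=$(printf '%s\n' "$lines" | grep -w 'hope')

# -x: the pattern must match the entire line.
whole_line=$(printf '%s\n' "$lines" | grep -x 'hope')

# -v: keep only the lines that do not match.
inverted=$(printf '%s\n' "$lines" | grep -v 'hope')

# -e: protects a pattern that starts with a dash, and may be repeated.
dashed=$(printf '%s\n' "$lines" | grep -e '-hope' -e 'HOPE')

printf -- '-w: %s\n' "$(printf '%s' "$whole_word" | tr '\n' ',')"
printf -- '-x: %s\n' "$whole_line"
printf -- '-v: %s\n' "$inverted"
printf -- '-e: %s\n' "$(printf '%s' "$dashed" | tr '\n' ',')"
```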

Bracket expressions and character classes

In addition to matching any character at a given position in our regular expression, we can also match a single character from a specified set of characters by using bracket expressions. With bracket expressions, we can specify a set of characters to be matched (including characters that would otherwise be interpreted as metacharacters). The dirlist*.txt files searched in these examples are assumed to contain directory listings, one file name per line; the -h option suppresses the file-name prefix in the output. In this example, using a two-character set:

grep -h '[bg]zip' dirlist*.txt
bzip2
bzip2recover
gzip

we match any line that contains the string "bzip" or "gzip".

A set may contain any number of characters, and metacharacters lose their special meaning when placed inside brackets. However, there are two cases in which metacharacters used inside a bracket expression take on a different meaning. The first is the caret (^), which indicates negation; the second is the dash (-), which indicates a character range.

Negation

If the first character in a bracket expression is a caret (^), the remaining characters are taken to be a set of characters that must not be present at the given character position. We do this by modifying our previous example:

grep -h '[^bg]zip' dirlist*.txt
bunzip2
gunzip
funzip
gpg-zip
mzip
p7zip
preunzip
prezip
prezip-bin
unzip
unzipsfx

With negation activated, we get a list of files that contain the string "zip" preceded by any character except "b" or "g".

Notice that the file zip was not found. A negated character set still requires a character at the given position, but that character must not be a member of the negated set.

The caret only invokes negation if it is the first character within a bracket expression; otherwise, it loses its special meaning and becomes an ordinary character in the set.
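
Since the dirlist*.txt files may not exist on your system, the same behavior can be checked against a short invented list of names. This is only a sketch; the names and variables are chosen for illustration.

```shell
names=$(printf '%s\n' 'bzip2' 'gzip' 'bunzip2' 'unzip' 'zip')

# [bg]zip: "zip" preceded by either b or g.
set_hits=$(printf '%s\n' "$names" | grep '[bg]zip')

# [^bg]zip: "zip" preceded by some character other than b or g.
# The bare name "zip" has no preceding character, so it is not selected.
negated_hits=$(printf '%s\n' "$names" | grep '[^bg]zip')

# A caret that is not first in the list is an ordinary character.
caret_literal=$(printf '%s\n' 'a^b' 'ab' | grep '[x^]')

printf 'set:     %s\n' "$(printf '%s' "$set_hits" | tr '\n' ' ')"
printf 'negated: %s\n' "$(printf '%s' "$negated_hits" | tr '\n' ' ')"
printf 'literal: %s\n' "$caret_literal"
```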

Traditional character ranges

If we want to construct a regular expression that finds every file in our lists beginning with an uppercase letter, we can do this:

grep -h '^[ABCDEFGHIJKLMNOPQRSTUVWXYZ]' dirlist*.txt
MAKEDEV
GET
HEAD
POST
VBoxClient
X
X11
Xorg
ModemManager
NetworkManager
VBoxControl
VBoxService

The idea is simply to put all 26 uppercase letters in a bracket expression. But typing them all out is not an appealing prospect, so there is another way:

grep -h '^[A-Z]' dirlist*.txt

By using a three-character range, we can abbreviate the 26 letters. Any range of characters can be expressed this way, including several ranges at once, such as this expression, which matches all file names starting with a letter or a digit:

grep -h '^[A-Za-z0-9]' dirlist*.txt

In character ranges, the dash is treated specially, so how do we include a literal dash in a bracket expression? By making it the first character in the expression (placing it last works as well). Consider these two examples:

grep -h '[A-Z]' dirlist*.txt

This will match every file name containing an uppercase letter. Whereas:

grep -h '[-AZ]' dirlist*.txt

will match every file name containing a dash, or an uppercase "A", or an uppercase "Z".
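
The difference between a range and a leading dash can be sketched with a few sample names. The list is invented (p7Zip in particular is a made-up name), and LC_ALL=C is set so that [A-Z] means exactly the 26 ASCII uppercase letters.

```shell
names=$(printf '%s\n' 'MAKEDEV' 'Xorg' 'gpg-zip' 'bzip2' 'p7Zip')

# ^[A-Z]: names that begin with an uppercase letter.
starts_upper=$(printf '%s\n' "$names" | LC_ALL=C grep '^[A-Z]')

# [A-Z]: names that contain an uppercase letter anywhere.
has_upper=$(printf '%s\n' "$names" | LC_ALL=C grep '[A-Z]')

# [-AZ]: the leading dash is literal, so this matches a dash, an A, or a Z.
dash_a_z=$(printf '%s\n' "$names" | LC_ALL=C grep '[-AZ]')

printf 'starts upper: %s\n' "$(printf '%s' "$starts_upper" | tr '\n' ' ')"
printf 'has upper:    %s\n' "$(printf '%s' "$has_upper" | tr '\n' ' ')"
printf 'dash, A or Z: %s\n' "$(printf '%s' "$dash_a_z" | tr '\n' ' ')"
```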

POSIX character classes

You can read more about POSIX on Wikipedia.

POSIX defines its own character classes, which you can use in regular expressions:

Character class Description
[:alnum:] Alphanumeric characters. In ASCII, equivalent to: [A-Za-z0-9]
[:word:] The same as [:alnum:], with the addition of the underscore (_) character. This name is not one of the standard POSIX classes, so grep may reject it in its default modes.
[:alpha:] Alphabetic characters. In ASCII, equivalent to: [A-Za-z]
[:blank:] Includes the space and tab characters.
[:cntrl:] The ASCII control codes. Includes the ASCII characters 0 through 31 and 127.
[:digit:] The digits zero through nine.
[:graph:] The visible characters. In ASCII, this includes characters 33 through 126.
[:lower:] Lowercase letters.
[:punct:] Punctuation characters. In ASCII, equivalent to: []!"#$%&'()*+,./:;<=>?@[\^_`{|}~-]
[:print:] The printable characters. All the characters in [:graph:] plus the space character.
[:space:] Whitespace characters, including space, tab, carriage return, newline, vertical tab, and form feed. In ASCII, equivalent to: [ \t\r\n\v\f]
[:upper:] Uppercase letters.
[:xdigit:] Characters used to express hexadecimal numbers. In ASCII, equivalent to: [0-9A-Fa-f]

In these expressions, the square brackets and colons are part of the character class name, so a class is written inside a bracket expression, as in [[:upper:]].

Caution: depending on the locale settings, letter ranges such as [A-Z] or [a-z] may include letters of your own alphabet, for example Russian. In other words, a range may not match exactly the set of ASCII letters you expect.

Other options

--line-buffered Use line buffering on output. This can cause a performance penalty.
--mmap If possible, use the mmap system call to read input, instead of the default read system call. In some situations, --mmap yields better performance. However, --mmap can cause undefined behavior (including core dumps) if an input file shrinks while grep is operating, or if an I/O error occurs.
-U, --binary Treat the file(s) as binary. By default, under MS-DOS and MS-Windows, grep guesses the file type by looking at the contents of the first 32 KB read from the file. If grep decides the file is a text file, it strips the CR characters from the original file contents (to make regular expressions with ^ and $ work correctly). Specifying -U overrules this guesswork, causing all files to be read and passed to the matching mechanism verbatim; if the file is a text file with CR/LF pairs at the end of each line, this causes some regular expressions to fail. This option has no effect on platforms other than MS-DOS and MS-Windows.
-z, --null-data Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. Like the -Z or --null option, this option can be used with commands like sort -z to process arbitrary file names.
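
As a small illustration of -z, the sketch below feeds grep three NUL-terminated records and converts the surviving records back to newline-terminated text for display. It assumes GNU grep, where -z is available; the record contents are invented file names.

```shell
# Three NUL-terminated records; with -z, $ anchors at the end of a record.
txt_records=$(printf 'alpha.txt\0beta.log\0gamma.txt\0' \
  | grep -z '\.txt$' \
  | tr '\0' '\n')

printf '%s\n' "$txt_records"
```

The tr step is only there to make the result readable; in a real pipeline the NUL-terminated output would normally be passed straight to another NUL-aware tool.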

Context line control

-A NUM, --after-context=NUM Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
-B NUM, --before-context=NUM Print NUM lines of leading context before matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
-C NUM, -NUM, --context=NUM Print NUM lines of output context. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
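
The three context options are easiest to compare side by side. This sketch runs them over a nine-line invented log with two matches far enough apart that grep prints the -- separator between the groups; it assumes a grep that supports -A, -B, and -C, such as GNU grep.

```shell
log=$(printf '%s\n' \
  'line 1' 'line 2' 'ERROR first' 'line 4' 'line 5' \
  'line 6' 'line 7' 'ERROR second' 'line 9')

# One line of trailing context after each match.
after=$(printf '%s\n' "$log" | grep -A 1 'ERROR')

# One line of leading context before each match.
before=$(printf '%s\n' "$log" | grep -B 1 'ERROR')

# One line of context on both sides of each match.
around=$(printf '%s\n' "$log" | grep -C 1 'ERROR')

printf '== -A 1 ==\n%s\n' "$after"
printf '== -B 1 ==\n%s\n' "$before"
printf '== -C 1 ==\n%s\n' "$around"
```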

Examples

Tip

If you haven’t already seen our Example usage section, we suggest reviewing that section first.

grep chope /etc/passwd

Search /etc/passwd for user chope.

grep "May 31 03" /etc/httpd/logs/error_log

Search the Apache error_log file for any error entries that happened on May 31st at 3 A.M. Quoting the string lets the search pattern contain spaces.

grep -r "computerhope" /www/

Recursively search the directory /www/, and all subdirectories, for any lines of any files which contain the string "computerhope".

grep -w "hope" myfile.txt

Search the file myfile.txt for lines containing the word "hope". Only lines containing the distinct word "hope" are matched. Lines where "hope" is part of a word (e.g., "hopes") are not matched.

grep -cw "hope" myfile.txt

Same as previous command, but displays a count of how many lines were matched, rather than the matching lines themselves.

grep -cvw "hope" myfile.txt

Inverse of previous command: displays a count of the lines in myfile.txt which do not contain the word "hope".

grep -l "hope" /www/*

Display the file names (but not the matching lines themselves) of any files in /www/ (but not its subdirectories) whose contents include the string "hope".

Regular expressions

A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, using various operators to combine smaller expressions.

grep understands three different versions of regular expression syntax: "basic" (BRE), "extended" (ERE), and "perl" (PCRE). In GNU grep, there is no difference in available functionality between the basic and extended syntaxes. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards. Perl regular expressions give additional functionality, and are documented in pcrepattern(3), but may not be available on every system.

The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning may be quoted by preceding it with a backslash.

The period . matches any single character. It is unspecified whether it matches an encoding error.
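
The difference between an unquoted period and a backslash-quoted one can be seen on three invented lines. The variable names are hypothetical.

```shell
lines=$(printf '%s\n' 'a.c' 'abc' 'ac')

# An unquoted period stands for any single character.
any_char=$(printf '%s\n' "$lines" | grep 'a.c')

# A backslash removes the special meaning, leaving a literal period.
literal_dot=$(printf '%s\n' "$lines" | grep 'a\.c')

printf 'a.c  selects: %s\n' "$(printf '%s' "$any_char" | tr '\n' ' ')"
printf 'a\\.c selects: %s\n' "$literal_dot"
```

The line "ac" is never selected, because the period requires exactly one character between the a and the c.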


Repetition

?
The preceding item is optional and matched at most once.
*
The preceding item will be matched zero or more times.
+
The preceding item will be matched one or more times.
{n}
The preceding item is matched exactly n times.
{n,}
The preceding item is matched n or more times.
{,m}
The preceding item is matched at most m times. This is a GNU extension.
{n,m}
The preceding item is matched at least n times, but not more than m times.
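
Each repetition operator can be exercised with grep -E on a short word list. The words are invented, and -x is added so that a pattern has to account for the whole line, which makes the counts easy to verify by eye.

```shell
words=$(printf '%s\n' 'color' 'colour' 'colouur' 'ct' 'cat' 'caat' 'caaat')

optional=$(printf '%s\n' "$words" | grep -E -x 'colou?r')   # u at most once
star=$(printf '%s\n' "$words" | grep -E -x 'ca*t')          # zero or more a
plus=$(printf '%s\n' "$words" | grep -E -x 'ca+t')          # one or more a
exact=$(printf '%s\n' "$words" | grep -E -x 'ca{2}t')       # exactly two a
at_least=$(printf '%s\n' "$words" | grep -E -x 'ca{2,}t')   # two or more a
between=$(printf '%s\n' "$words" | grep -E -x 'ca{1,2}t')   # one or two a

for result in "$optional" "$star" "$plus" "$exact" "$at_least" "$between"; do
  printf '%s\n' "$(printf '%s' "$result" | tr '\n' ' ')"
done
```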

Basic vs. extended regular expressions

In basic regular expressions the metacharacters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

Traditional egrep did not support the { metacharacter, and some egrep implementations support \{ instead, so portable scripts should avoid { in grep -E patterns and should use [{] to match a literal {.

GNU grep -E attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification. For example, the command grep -E '{1' searches for the two-character string {1 instead of reporting a syntax error in the regular expression. POSIX allows this behavior as an extension, but portable scripts should avoid it.
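
The following sketch contrasts the two syntaxes on the + metacharacter and shows the portable way to match a literal brace. It assumes GNU grep, since the backslashed \+ form in basic syntax is not guaranteed by every implementation; the sample lines are invented.

```shell
input=$(printf '%s\n' 'ab' 'aab' 'a+b')

# Basic syntax: a plain + is an ordinary character.
bre_plain=$(printf '%s\n' "$input" | grep 'a+b')

# Basic syntax with a backslash: \+ means "one or more" (GNU grep).
bre_escaped=$(printf '%s\n' "$input" | grep 'a\+b')

# Extended syntax: a plain + already means "one or more".
ere_plain=$(printf '%s\n' "$input" | grep -E 'a+b')

# [{] matches a literal brace without relying on how { is interpreted.
brace=$(printf '%s\n' 'x{1' 'x1' | grep -E 'x[{]1')

printf 'BRE a+b:   %s\n' "$bre_plain"
printf 'BRE a\\+b:  %s\n' "$(printf '%s' "$bre_escaped" | tr '\n' ' ')"
printf 'ERE a+b:   %s\n' "$(printf '%s' "$ere_plain" | tr '\n' ' ')"
printf 'ERE x[{]1: %s\n' "$brace"
```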

General output control

-c, --count Instead of the normal output, print a count of matching lines for each input file. With the -v, --invert-match option (see below), count non-matching lines.
--color[=WHEN], --colour[=WHEN] Surround the matched (non-empty) strings, matching lines, context lines, file names, line numbers, byte offsets, and separators (for fields and groups of context lines) with escape sequences to display them in color on the terminal. The colors are defined by the environment variable GREP_COLORS. The older environment variable GREP_COLOR is still supported, but its setting does not have priority. WHEN is never, always, or auto.
-L, --files-without-match Instead of the normal output, print the name of each input file from which no output would normally be printed. The scanning stops on the first match.
-l, --files-with-matches Instead of the normal output, print the name of each input file from which output would normally be printed. The scanning stops on the first match.
-m NUM, --max-count=NUM Stop reading a file after NUM matching lines. If the input is standard input from a regular file, and NUM matching lines are output, grep ensures that the standard input is positioned after the last matching line before exiting, regardless of the presence of trailing context lines. This enables a calling process to resume a search. When grep stops after NUM matching lines, it outputs any trailing context lines. When the -c or --count option is also used, grep does not output a count greater than NUM. When the -v or --invert-match option is also used, grep stops after outputting NUM non-matching lines.
-o, --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
-q, --quiet, --silent Quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if an error was detected. Also see the -s or --no-messages option.
-s, --no-messages Suppress error messages about nonexistent or unreadable files.
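
Several of these output options can be compared on two small files. This is a sketch under the assumption that GNU grep is installed (for -o, -m, and -L); the file names a.txt and b.txt and their contents are invented.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'hope springs' 'no match here' 'hope again' > "$workdir/a.txt"
printf '%s\n' 'nothing relevant' > "$workdir/b.txt"

count=$(grep -c 'hope' "$workdir/a.txt")      # number of matching lines
only=$(grep -o 'hope' "$workdir/a.txt")       # just the matched text
first=$(grep -m 1 'hope' "$workdir/a.txt")    # stop after one matching line

# -l lists files that contain a match, -L lists files that do not.
with_match=$(cd "$workdir" && grep -l 'hope' a.txt b.txt)
without_match=$(cd "$workdir" && grep -L 'hope' a.txt b.txt)

# -q prints nothing; only the exit status says whether a match exists.
if grep -q 'hope' "$workdir/a.txt"; then found=yes; else found=no; fi

printf 'count=%s first=%s found=%s\n' "$count" "$first" "$found"
printf 'with=%s without=%s\n' "$with_match" "$without_match"

rm -rf "$workdir"
```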

File and directory selection

-a, --text Process a binary file as if it were text; this is equivalent to the --binary-files=text option.
--binary-files=TYPE If the first few bytes of a file indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is binary, and grep normally outputs either a one-line message saying that a binary file matches, or no message if there is no match. If TYPE is without-match, grep assumes that a binary file does not match; this is equivalent to the -I option. If TYPE is text, grep processes a binary file as if it were text; this is equivalent to the -a option. Warning: grep --binary-files=text might output binary garbage, which can have nasty side effects if the output is a terminal and if the terminal driver interprets some of it as commands.
-D ACTION, --devices=ACTION If an input file is a device, FIFO or socket, use ACTION to process it. By default, ACTION is read, which means that devices are read as if they were ordinary files. If ACTION is skip, devices are silently skipped.
-d ACTION, --directories=ACTION If an input file is a directory, use ACTION to process it. By default, ACTION is read, i.e., read directories as if they were ordinary files. If ACTION is skip, silently skip directories. If ACTION is recurse, read all files under each directory, recursively, following symbolic links only if they are on the command line. This is equivalent to the -r option.
--exclude=GLOB Skip files whose base name matches GLOB (using wildcard matching). A file-name glob can use *, ?, and [...] as wildcards, and \ to quote a wildcard or backslash character literally.
--exclude-from=FILE Skip files whose base name matches any of the file-name globs read from FILE (using wildcard matching as described under --exclude).
--exclude-dir=DIR Exclude directories matching the pattern DIR from recursive searches.
-I Process a binary file as if it did not contain matching data; this is equivalent to the --binary-files=without-match option.
--include=GLOB Search only files whose base name matches GLOB (using wildcard matching as described under --exclude).
-r, --recursive Read all files under each directory, recursively, following symbolic links only if they are on the command line. This is equivalent to the -d recurse option.
-R, --dereference-recursive Read all files under each directory, recursively. Follow all symbolic links, unlike -r.
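
A recursive search is often narrowed with --include or --exclude-dir. The sketch below assumes GNU grep and uses a hypothetical site/ tree created in a temporary directory; the output is sorted only to make the listing order predictable.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/site/cache"
echo 'hope' > "$workdir/site/index.html"
echo 'hope' > "$workdir/site/notes.txt"
echo 'hope' > "$workdir/site/cache/old.html"

# --include: search only files whose base name matches the glob.
html_only=$(cd "$workdir" && grep -r -l --include='*.html' 'hope' site | LC_ALL=C sort)

# --exclude-dir: skip any directory named cache while recursing.
no_cache=$(cd "$workdir" && grep -r -l --exclude-dir=cache 'hope' site | LC_ALL=C sort)

printf 'html only:\n%s\n' "$html_only"
printf 'without cache:\n%s\n' "$no_cache"

rm -rf "$workdir"
```

The globs are quoted so that the shell passes them to grep unchanged instead of expanding them against the current directory.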


