( ) - beginning and end of a group, for example (text). A group denotes a sequence. Groups are used to apply quantifiers to several characters instead of one, and to reuse the matched sequence later.
[ ] - beginning and end of a character class, for example [a-z]. A character class matches one character from the set. Quantifiers can change this.
{ } - beginning and end of a quantifier, e.g. {3,8}
\ - escape character; makes a metacharacter be treated as an ordinary character, e.g. \\, \., \[, \], \{, \}, \*
^ - start of the text (or start of each line in multiline texts with the (?m) flag), e.g. ^text text$
$ - end of the text (or end of each line in multiline texts with the (?m) flag), e.g. ^text text$
. - any character except the line break @LF (default). With the (?s) flag - any character
| - the "or" character, usually within a group, e.g. (10|20)
? - the previous character is optional (present or absent); likewise for groups. After a quantifier it switches off greediness, e.g. (.*?)
* - repeat the previous character or group 0 or more times
+ - repeat the previous character or group 1 or more times
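As a rough illustration of the metacharacters above, here is a minimal sketch in Python, whose re module uses the same Perl-style syntax for these basic constructs. The sample strings are invented for the example, and the engine described in this reference may differ in details not shown here.

```python
import re

# Group + quantifier: {2} applies to the whole group "ab", not to one character.
grouped = re.fullmatch(r"(ab){2}", "abab")

# Character class + quantifier: 3 to 8 lowercase letters.
word = re.fullmatch(r"[a-z]{3,8}", "regex")

# Alternation inside a group.
alt = re.findall(r"(10|20)", "10 15 20")

# An escaped dot is a literal ".", an unescaped dot is any character.
literal_dot = re.findall(r"a\.c", "a.c abc")
any_char = re.findall(r"a.c", "a.c abc")

# "?" after a quantifier makes it match as little as possible.
greedy = re.search(r"<(.*)>", "<a><b>").group(1)
lazy = re.search(r"<(.*?)>", "<a><b>").group(1)

print(grouped.group(0), word.group(0), alt, literal_dot, any_char, greedy, lazy)
```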
Metacharacters inside square brackets
The part of the pattern enclosed in square brackets is called a character class. Inside the brackets, metacharacters lose their special meaning, except for the metacharacters that belong to the class itself. Only 4 characters need to be escaped: \ - ] [. If the "-" character is at the end of the list, it does not need escaping. Metacharacters that stand for a range of characters (such as \d or \w) can be used inside a class, but boundary metacharacters such as \A, \B, \Z, \z cannot, and \b inside a class means the backspace character. Note that ranges such as [a-z] follow UTF-8 character order, not ASCII.
\ - escape character
^ - negation character, but only when it comes first, for example [^3] - any character except 3
- - range character, for example [a-z], i.e. all characters from a to z
[ ] - beginning and end of the character class, for example [a-z]
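The rules for square brackets can be checked with a short sketch. It uses Python's re module as a stand-in, on the assumption that it treats these class rules the same way; the test strings are made up.

```python
import re

# "^" first in the class negates it: every character except "3".
not_three = re.findall(r"[^3]", "1323")

# "-" between two characters defines a range.
lower = re.findall(r"[a-z]", "aB-c")

# "-" at the end of the class is a literal hyphen and needs no escape.
with_hyphen = re.findall(r"[a-z-]", "aB-c")

# Other metacharacters are literal inside a class: ".", "*" and "+" here.
literal_meta = re.findall(r"[.*+]", "1+1=2.0*x")

# "]" and "\" must be escaped inside the class.
brackets = re.findall(r"[\]\\]", r"a]b\c")

print(not_three, lower, with_hyphen, literal_meta, brackets)
```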
Wildcard metacharacters
\1 - \9 - reference to a captured group, usable in the pattern itself and in the replacement pattern. Groups are counted from the left by their opening parenthesis "("
$1 - $9 - reference to a captured group in the replacement pattern
$0 or \0 - the entire matched text (9 is not the upper limit for group numbers)
\a - Chr(7) - bell character BEL (hex 07), decimal ASCII code 7. Plays a beep on output
\cn - the control character generated by pressing Ctrl+n, where n is a character, e.g. \cD corresponds to Ctrl+D. \cA = \001, \cZ = \032, \cM = \r = \015
\e - Chr(27) - escape character (hex 1B)
\f - Chr(12) - page break (hex 0C)
\h - [ \t] - any horizontal whitespace character: Chr(9) (tab), Chr(32) (space), Chr(160)
\H - [^\h] - any character that is not a space or tab
\K - discards everything matched to the left of \K from the result, i.e. text1\Ktext2 finds text2 preceded by text1
\n - @LF, Chr(10) - newline character (hex 0A)
\N - [^\n] - any character that is not a newline character (not @LF). Does not work in version 3.3.6.1
\Q ... \E - any metacharacters between \Q and \E are treated as plain text. Errors are still possible: in \QD:\Edit\1.txt\E the \E inside the path ends the quoted section early
\r - @CR, Chr(13) - carriage return character (hex 0D)
\R - [\n\f\r\v] - any of the line break characters: Chr(10), Chr(11), Chr(12), Chr(13)
\t - @TAB, Chr(9) - tab character (hex 09)
\v - any vertical whitespace character: Chr(10), Chr(11), Chr(12), Chr(13) (@LF, vertical tab, page break, @CR)
\V - [^\v] - any character that is not Chr(10), Chr(11), Chr(12), Chr(13)
\x** - where * is any hexadecimal digit, e.g. \x41 matches the Latin letter 'A', and \x50\x65\x72\x6C is the word "Perl"
\x{**..} - where * is any hexadecimal digit, for example \x{50}\x{65}\x{72}\x{6C} is the word "Perl". Try \x{01} to \x{7F}, which in decimal means characters from 1 to 127. In Unicode, \x{044F} is the character "я"
\*** - where * is any octal digit. For example, the sequence \120\145\162\154 is the word "Perl" (\120 is the octal code of the letter P, \145 of the letter e, \162 of the letter r, \154 of the letter l). Space is \040. Try \001 to \177, which in decimal means characters from 1 to 127
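The hexadecimal, octal and group-reference escapes above can be tried out with a small sketch. It uses Python's re module as an assumed stand-in: it accepts \xHH, three-digit octal escapes and \1, but it has no \K, \h, \R or \Q...\E, and in replacement strings it accepts only the \1 form, not $1.

```python
import re

# Hexadecimal escapes: \x50\x65\x72\x6C spells "Perl".
hex_match = re.fullmatch(r"\x50\x65\x72\x6C", "Perl")

# Octal escapes: \120\145\162\154 spells the same word.
oct_match = re.fullmatch(r"\120\145\162\154", "Perl")

# \t, \n and \r stand for tab, line feed and carriage return.
controls = re.findall(r"[\t\n\r]", "a\tb\r\nc")

# \1 inside the pattern refers back to the text captured by group 1.
doubled = re.search(r"(\w+) \1", "it is is ok").group(0)

# \1 and \2 in the replacement insert the captured groups.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "key=value")

print(hex_match.group(0), oct_match.group(0), controls, doubled, swapped)
```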
Metacharacters for specifying character groups
\d - [0-9] - any decimal digit
\D - [^0-9] - any non-digit
\s - [\f\n\r\t ] - any whitespace character: Chr(9), Chr(10), Chr(12), Chr(13), Chr(32) (tab, line feed, page break, carriage return, and space)
\S - [^\f\n\r\t ] - any non-whitespace character
\w - [0-9a-zA-Z_] - any alphanumeric character or underscore (Latin characters only)
\W - [^0-9a-zA-Z_] - any non-word character
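A short sketch of these shorthand classes, again using Python's re module as a stand-in. The re.ASCII flag is passed so that \d, \w and \s are limited to ASCII, which is closer to the "Latin characters only" behaviour described above; the exact membership of \s (for example Chr(11)) can differ between engines. The sample text is invented.

```python
import re

sample = "Order_7:\t42 items\n"

digits = re.findall(r"\d+", sample, re.ASCII)        # runs of [0-9]
non_digits = re.findall(r"\D+", "a1b22c", re.ASCII)  # runs of [^0-9]
words = re.findall(r"\w+", sample, re.ASCII)         # runs of [0-9a-zA-Z_]
spaces = re.findall(r"\s", sample, re.ASCII)         # single whitespace characters
non_word = re.findall(r"\W", "a-b c", re.ASCII)      # anything outside \w

print(digits, non_digits, words, spaces, non_word)
```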
Boundary metacharacters
\A - the beginning of the text; does not depend on the (?m) flag and therefore can match only once.
\G - similar to \A, but can match several times if the matches follow one another consecutively from the start.
\z - the absolute end of the text; does not depend on the (?m) flag and therefore can match only once.
\b - the beginning or end of a word, i.e. the boundary between two characters, one of which satisfies \W and the other \w (only in English texts)
\B - the middle of a word, i.e. the boundary between two characters that both satisfy \W or both satisfy \w
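The boundary metacharacters can be illustrated with a minimal sketch in Python's re module, used here as an assumed stand-in. Only \b, \B and \A are shown, because Python spells the absolute end of text differently; the sample strings are invented.

```python
import re

text = "cat concat cat"

# \b on both sides: only "cat" as a whole word (start offsets are collected).
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text, re.ASCII)]

# \B before "cat": only where "cat" continues a word, as in "concat".
inside_word = [m.start() for m in re.finditer(r"\Bcat", text, re.ASCII)]

# \A does not depend on (?m), while ^ matches at the start of every line.
lines = "one\ntwo\nthree"
caret_hits = re.findall(r"(?m)^\w+", lines)
a_hits = re.findall(r"(?m)\A\w+", lines)

print(whole_words, inside_word, caret_hits, a_hits)
```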
Flag modifiers
Placed at the beginning of a regular expression or group.
Modifiers are off by default, so each one must be enabled before it takes effect.
Usage example: (?i)(Text) or ((?-i)Text). Modifiers may be combined: (?is)(Text) or ((?imsx)Text)
(?i) - case-insensitive matching. This works only for Latin characters.
(?-i) - cancels a previously enabled (?i)
(?m) - in multiline text, the symbols ^ and $ mean the beginning and end of each line, for example ^(Text)\r$; otherwise they mean the beginning and end of the text. The LF character is the line separator
(?-m) - cancels a previously enabled (?m)
(?s) - the dot character (.) additionally matches the line break LF ("single line" mode)
(?-s) - cancels a previously enabled (?s)
(?x) - ignores spaces and tabs in the regular expression, except those inside square brackets. The spaces make the regular expression easier to read. Also allows adding a comment after the # symbol at the end of the regular expression
(?-x) - cancels a previously enabled (?x)
(?J) - allows duplicate group names
(?U) - inverts the greediness of quantifiers
(?-U) - cancels a previously enabled (?U)
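The effect of the i, m, s and x modifiers can be seen in a small sketch. It is written for Python's re module, which is assumed to treat these four inline flags the same way when they are placed at the very start of the pattern; (?J) and (?U) are not available there and are left out. The sample text is invented.

```python
import re

text = "First line\nsecond LINE\n"

# (?i): case-insensitive matching.
ci = re.findall(r"(?i)line", text)

# (?m): ^ matches at the start of every line, not only at the start of the text.
line_starts = re.findall(r"(?m)^\w+", text)

# (?s): without it the dot stops at the line break, with it the dot crosses it.
without_s = re.search(r"First.*LINE", text)
with_s = re.search(r"(?s)First.*LINE", text)

# (?x): spaces and "#" comments inside the pattern are ignored.
date = re.search(r"""(?x)
    (\d{4})  # year
    -
    (\d{2})  # month
""", "released 2024-05")

print(ci, line_starts, without_s, with_s.group(0), date.groups())
```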
Group Flags
Flags may be combined, for example (?im-sx:Text): the i and m flags are on, the s and x flags are off
(?i:...) - case-insensitive group, for example (?i:Text). This works only for Latin characters.
(?-i:...) - case-sensitive group, for example (?-i:Text)
(?:...) - non-capturing group: excluded from the returned groups, for example (?:Text)
(?>...) - atomic group: not included in the returned groups, and behaves like a super-greedy quantifier, for example (?>Text)(Text)
The following 4 groups are not included in the match; they only check the text next to the current position. The two groups that look to the left must have a fixed length: you cannot use *, +, {n,m} in them.
(?=...) - checks that the pattern matches on the right, for example (Text)(?=Text)
(?!...) - checks that the pattern does not match on the right, for example (Text)(?!Text)
(?<=...) - checks that the pattern matches on the left, for example (?<=Text)(Text), see also \K
(?<!...) - checks that the pattern does not match on the left, for example (?<!Text)(Text)
(?<name>...) - named group. Referring to it with \k<name> is the same as using \1 or $1
(?#...) - a group containing a comment, for example (?# is a comment ). Completely ignored by the interpreter
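The group types above can be compared in a minimal sketch. It uses Python's re module as an assumed stand-in; named groups are left out because Python spells them differently, and the sample strings are made up.

```python
import re

# (?:...) groups without capturing, so only the name is returned.
captured = re.findall(r"(?:Mr|Ms)\. (\w+)", "Mr. Smith and Ms. Jones")

# (?i:...) limits case insensitivity to the group; the final "A" stays case-sensitive.
scoped = re.findall(r"(?i:text)A", "TEXTA texta TextA")

# (?=...) checks the text on the right without including it in the match.
px = re.findall(r"\d+(?=px)", "10px 20em 30px")

# (?!...) requires that the text on the right does not match.
not_bar = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# (?<=...) checks the text on the left without including it.
prices = re.findall(r"(?<=\$)\d+", "cost $15, qty 3")

# (?<!...) requires that the text on the left does not match.
plain = re.findall(r"(?<!\$)\b\d+", "cost $15, qty 3")

print(captured, scoped, px, not_bar, prices, plain)
```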
Repeating the previous element (quantifiers), applies to characters and groups
{n} - repeat the previous character n times
{n,} - repeat the previous character n or more times ({n,}? - prefers the shortest match)
{n,m} - repeat the previous character n to m times ({n,m}? - prefers the shortest match)
* - repeat the previous character 0 or more times. Same as {0,}. Takes the longest match that still lets the rest of the pattern match.
+ - repeat the previous character 1 or more times. Same as {1,}. Takes the longest match that still lets the rest of the pattern match.
? - the previous character is either present or absent. Same as {0,1}. The second meaning of the symbol ? is after a quantifier, as in .*? - it controls greediness, see below
*? - repeat the previous character 0 or more times. Takes the shortest match that lets the rest of the pattern match.
+? - repeat the previous character 1 or more times. Takes the shortest match that lets the rest of the pattern match.
?? - prefers the shortest match, e.g. ([a-z]??)g for 'gg' returns two empty strings
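The difference between greedy and non-greedy repetition is easiest to see side by side. The following is a minimal sketch in Python's re module, assumed here to behave the same way for these quantifiers; the sample strings are invented, except for the 'gg' case taken from the list above.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* takes as much as possible, so both tags end up in one match.
greedy = re.findall(r"<b>.*</b>", html)

# Non-greedy: .*? stops at the first place where the rest of the pattern fits.
lazy = re.findall(r"<b>.*?</b>", html)

numbers = "7 42 123 4567"
exact = re.findall(r"\b\d{3}\b", numbers)      # exactly 3 digits
at_least = re.findall(r"\b\d{2,}\b", numbers)  # 2 or more digits
between = re.findall(r"\b\d{2,3}\b", numbers)  # 2 to 3 digits

# ? makes the previous character optional.
optional = re.findall(r"colou?r", "color colour")

# ?? prefers the empty match, so the group is empty both times.
lazy_optional = re.findall(r"([a-z]??)g", "gg")

print(greedy, lazy, exact, at_least, between, optional, lazy_optional)
```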
Possessive (super-greedy) quantifiers
Matching without returning to previous search steps. A possessive quantifier takes everything that matches the previous character, without regard to whether the rest of the pattern can then match. The character or character range that follows a possessive quantifier must not be matchable by the quantified element itself, otherwise the pattern can never match and is meaningless. The sole purpose of possessive quantifiers is to speed up matching.
*+ - repeat the previous character 0 or more times.
++ - repeat the previous character 1 or more times.
{n,}+ - repeat the previous character n or more times.
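The "never gives anything back" behaviour can be demonstrated with a sketch. Python's re module accepts the *+ and ++ syntax only from version 3.11, so to stay runnable on older versions this example does not use that syntax; instead it emulates a++ with the lookahead-plus-backreference idiom (?=(a+))\1. That substitution is an assumption of the sketch, not the notation from the list above.

```python
import re

text = "aaaa"

# Ordinary greedy a+ gives one "a" back so that the final "a" can match.
greedy = re.search(r"a+a", text)

# Emulated a++a: the lookahead captures the longest run of "a", \1 consumes
# exactly that run, and the engine does not retry a completed lookahead
# with a shorter run, so nothing is left for the final "a".
possessive_like = re.search(r"(?=(a+))\1a", text)

# When the next element cannot be matched by the quantified one,
# the emulated possessive form matches just like the greedy form.
possessive_ok = re.search(r"(?=(a+))\1b", "aaab")

print(greedy.group(0), possessive_like, possessive_ok.group(0))
```

The second pattern illustrates the warning above: the element after the possessive quantifier is consumed by the quantifier itself, so the pattern can never match.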
POSIX character classes
Example: [[:upper:]]{2} matches two consecutive uppercase letters. Negate a class like this: [[:^digit:]]
[:alnum:] - letters and digits [0-9A-Za-z] (like \w, but without "_")
[:alpha:] - letters [A-Za-z] (without "_")
[:ascii:] - characters from Chr(0) to Chr(127)
[:blank:] - space and tab characters Chr(9) and Chr(32), same as [\t ]
[:cntrl:] - control characters from Chr(0) to Chr(31) and Chr(127)
[:digit:] - decimal digits, same as \d, [0-9]
[:graph:] - the same as the printable characters [:print:], but without the space (from Chr(33) to Chr(126))
[:lower:] - lowercase letters [a-z]
[:print:] - printable characters, including the space (Chr(32) to Chr(126))
[:punct:] - printable characters excluding letters and digits, Chr(33-47, 58-64, 91-96, 123-126), i.e. those that are neither in [:alnum:] nor in [:cntrl:]
[:space:] - whitespace characters from Chr(9) to Chr(13) and Chr(32) (like \s, but including the VT character Chr(11)). Same as [\f\n\r\t\v ]
[:upper:] - uppercase letters [A-Z]
[:word:] - word characters, same as \w
[:xdigit:] - hexadecimal digits [0-9A-Fa-f]
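Engines without POSIX classes can use the explicit ranges from the list instead. The sketch below is written for Python's re module, which does not understand [:name:]; the POSIX_CLASSES table and the expand_posix helper are hypothetical names invented for this example, cover only a subset of the list, and do not handle the negated [:^name:] form.

```python
import re

# ASCII expansions copied from the list above (illustrative subset only).
POSIX_CLASSES = {
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}


def expand_posix(pattern):
    """Replace each [:name:] with its ASCII range; unknown names raise KeyError."""
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)


expanded = expand_posix("[[:upper:]]{2}")
two_upper = re.compile(expanded)
found = two_upper.search("abCDe").group(0)
combined = expand_posix("[[:alpha:][:digit:]]+")

print(expanded, found, combined)
```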
Conditional subpatterns
(?(if)then) - i.e. (?(condition)pattern_on_success), e.g. (?(?=[a-z])\d)
(?(if)then|else) - i.e. (?(condition)pattern_on_success|pattern_on_failure), e.g. (?(?<=\d)a|b) or (?:(?>(?=[^a-z]*[a-z])())?(?:(?=\1)aa|(?!\1)1))
(?=[\w]+)|(?R) - (?R) is a recursive call
These flags do not work in AutoIt3
\p - any punctuation character
\l - the next character of the regular expression is converted to lowercase.
\u - the next character of the regular expression is converted to uppercase.
\L...\E - all characters of the regular expression between \L and \E are converted to lowercase.
\U...\E - all characters of the regular expression between \U and \E are converted to uppercase.
\x - any hexadecimal character
\< - start of a word, i.e. the boundary between a character satisfying \W and a character satisfying \w
\> - end of a word, i.e. the boundary between a character satisfying \w and a character satisfying \W
{,n} - repeat the previous character 0 to n times
Examples of constructs
.* - repetition of any character, which means the whole text
[ ... ] - a single character from the set, for example [aeiou] - any of the lowercase vowels
[^ ... ] - any character not in the set, for example [^aeiou] - none of the lowercase vowels
[0-9A-Fa-f]{6} - hexadecimal number, for example FF0000
[А-яЁё] - range for Russian letters. Or like this: [А-Яа-яЁё]
(\r\n|\r|\n){2,} , replace with \1 - delete empty lines
(?<![А-яЁё])([А-яЁё]+) \1 , replace with \1 - remove repeated words
[A-ZА-ЯЁ]{2,}?[a-zа-яё]+ - detects files that contain errors like "FInd", i.e. unintentional repetition of capital letters
(.{35,}?[ ])(.*?) , replace with '$0' & @CRLF , where @CRLF is a line break. Check the "Calculate" flag. Breaks the line at the first whitespace after every 35 characters.
(?si)(?:.*?)?(https?:\/\/[\w.:]+\/?(?:[\w\/?&=.~;\-+!*_ #%])*) - find links
[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4} - find mailboxes
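A few of the constructs above can be tried out with a short sketch. It uses Python's re module as an assumed stand-in for the engine described here; the sample strings and addresses are invented, and \b is added around the hexadecimal pattern only so that longer strings in the sample text are not matched in part.

```python
import re

# [0-9A-Fa-f]{6}: a six-digit hexadecimal number (\b added for this demo).
hex_numbers = re.findall(r"\b[0-9A-Fa-f]{6}\b", "FF0000 00ff7f GG0000")

# (\r\n|\r|\n){2,} replaced with \1: runs of line breaks collapse to one.
no_blank = re.sub(r"(\r\n|\r|\n){2,}", r"\1", "a\n\n\nb\r\n\r\nc")

# Repeated Russian word replaced with a single copy.
dedup = re.sub(r"(?<![А-яЁё])([А-яЁё]+) \1", r"\1", "это это тест")

# Mailbox pattern from the list, without the stray closing parenthesis.
emails = re.findall(
    r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}",
    "info@example.com, bob@mail.example.org",
)

print(hex_numbers, repr(no_blank), dedup, emails)
```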