Definition
- Regular expressions (regex) are a powerful tool for matching desired patterns in strings/text; in Python they are provided by the built-in re module.
Features
- Regular expressions (regex) allow us to search, match, and manipulate strings based on patterns.
- We can test our regex patterns using tools like regex101.com.
‘re’ Module Functions
- Python provides the standard-library ‘re’ (regular expression) module, which contains several functions for working with regular expressions.
- To use regular expressions in Python, import the re module as below:
import re
- Some common functions of the re module are as follows:
- re.search(pattern, string) : Searches for a pattern anywhere in a string. Returns a match object if found, or None if not.
- re.match(pattern, string) : Searches for a pattern only at the beginning of the string. Returns a match object if found, or None if not.
- re.findall(pattern, string) : Returns all non-overlapping matches of the pattern as a list of strings. If the pattern contains groups, the list holds the captured groups instead (tuples when there is more than one group).
- re.finditer(pattern, string) : Returns an iterator yielding match objects for all matches.
- re.sub(pattern, repl, string) : Replaces every occurrence of the pattern in the string with the replacement string repl and returns the new string.
- re.split(pattern, string) : Splits the string at each point where the pattern matches and returns the pieces as a list.
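The six functions above can be compared side by side on one sample string. This is a minimal sketch; the sample text and variable names (text, first, numbers, positions, masked, parts) are made up for illustration.

```python
import re

text = "Order 66 shipped on 2024-05-01; order 7 is pending."

# re.search: the first match anywhere in the string (a match object, or None)
first = re.search(r"\d+", text)
print(first.group(), first.span())       # 66 (6, 8)

# re.match: only tries at the very beginning of the string
print(re.match(r"\d+", text))            # None: the text starts with "Order", not a digit
print(re.match(r"Order", text).group())  # Order

# re.findall: every non-overlapping match, returned as a list of strings
numbers = re.findall(r"\d+", text)
print(numbers)                           # ['66', '2024', '05', '01', '7']

# re.finditer: an iterator of match objects, which also carry positions
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
print(positions)

# re.sub(pattern, repl, string): replace every match with repl
masked = re.sub(r"\d", "#", "Call 555-1234")
print(masked)                            # Call ###-####

# re.split(pattern, string): split wherever the pattern matches
parts = re.split(r"[;,]\s*", "a, b;c,  d")
print(parts)                             # ['a', 'b', 'c', 'd']
```

Note that re.match returns None here even though the text contains digits, because it only looks at the start of the string; re.search is the one to use for "anywhere".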
Terms Used in Regular Expression
- Pattern : A sequence of characters that defines what to search for.
- Metacharacters : Characters with special meanings in regex (e.g., . * + ? [] {} ()).
- Quantifiers : Specify how many times a character or group must occur (e.g., * + ? {n}).
- Character Classes : Define sets of characters to match (e.g., [a-z], [0-9], \d, \w).
- Anchors : Match positions in the string rather than characters (e.g., ^ for start, $ for end).
- Groups : Capture parts of the matched text using parentheses ().
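Most of these terms can appear in a single pattern. The sketch below uses a hypothetical YYYY-MM-DD date check, chosen only for illustration: ^ and $ are anchors, \d is a character class, {4} and {2} are quantifiers, and the parentheses form groups. The last lines show how an escaped metacharacter (\.) matches a literal dot.

```python
import re

# ^ and $ are anchors, \d is a character class, {4} and {2} are quantifiers,
# and each pair of parentheses is a capturing group.
date_pattern = r"^(\d{4})-(\d{2})-(\d{2})$"

m = re.match(date_pattern, "2024-05-01")
if m:
    year, month, day = m.groups()      # one captured string per group
    print(year, month, day)            # 2024 05 01

# Because of the anchors, extra text before or after makes the match fail
print(re.match(date_pattern, "on 2024-05-01"))   # None
print(re.match(date_pattern, "2024-05-01!"))     # None

# To match a metacharacter literally, escape it with a backslash
decimals = re.findall(r"\d+\.\d+", "pi is 3.14, e is 2.72")
print(decimals)                        # ['3.14', '2.72']
```

This pattern only checks the shape of the text; it would also accept an impossible date such as 2024-13-45.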
Flags in Regex
- Flags modify regex behavior.
- Some common flags used in regular expressions are as follows:
re.IGNORECASE (re.I) → Case-insensitive matching
re.MULTILINE (re.M) → ^ and $ match start and end of each line
re.DOTALL (re.S) → . matches newlines too
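The effect of each flag is easiest to see on a small multi-line string. This is a minimal sketch; the sample text is made up, and flags are passed as the optional third argument of functions such as re.findall and re.search.

```python
import re

text = "First line\nsecond Line\nthird LINE"

# re.IGNORECASE (re.I): letter case is ignored
case_insensitive = re.findall(r"line", text, re.IGNORECASE)
print(case_insensitive)               # ['line', 'Line', 'LINE']

# re.MULTILINE (re.M): ^ matches at the start of every line, not only the string
first_words = re.findall(r"^\w+", text)
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(first_words)                    # ['First']
print(line_starts)                    # ['First', 'second', 'third']

# re.DOTALL (re.S): . also matches the newline character
no_dotall = re.search(r"First.*", text).group()
with_dotall = re.search(r"First.*", text, re.DOTALL).group()
print(repr(no_dotall))                # 'First line'
print(with_dotall == text)            # True

# Flags can be combined with the | operator
combined = re.findall(r"^s\w+", text, re.I | re.M)
print(combined)                       # ['second']
```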
Common Regex Patterns:
| Regex Symbols | Descriptions |
| --- | --- |
| `r"..."` | Raw string prefix (Python string syntax, not a regex symbol): backslashes are kept as written, so `r"\d"` reaches the regex engine unchanged. |
| `\d` | Matches any digit (0-9). |
| `\D` | Matches any non-digit character. |
| `\w` | Matches any word character (letters, digits and underscore, e.g. a-z, A-Z, 0-9, _) but not spaces or special symbols such as + - @ # $. |
| `\W` | Matches any non-word character. |
| `\s` | Matches any whitespace character (space, tab, newline). |
| `\S` | Matches any non-whitespace character. |
| `.` | Matches any character except a newline (unless re.DOTALL is set). |
| `*` | Matches 0 or more occurrences of the preceding element. |
| `+` | Matches 1 or more occurrences of the preceding element. |
| `?` | Matches 0 or 1 occurrence of the preceding element. |
| `{n}` | Matches exactly n occurrences of the preceding element. |
| `{n,}` | Matches n or more occurrences of the preceding element. |
| `{n,m}` | Matches between n and m occurrences (inclusive) of the preceding element. |
| `^` | Matches the start of the string (or of each line with re.MULTILINE). |
| `$` | Matches the end of the string (or of each line with re.MULTILINE). |
| `[...]` | Matches any single character in the brackets (e.g., [a-z] matches one lowercase letter). |
| `[^...]` | Matches any single character NOT in the brackets. |
| `(...)` | Groups a pattern and captures the matched text. |
| `\|` | Acts as an OR operator, e.g. `cat\|dog` matches "cat" or "dog". |
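Several symbols from the table can be seen working together on a small two-line string. This is a minimal sketch; the "log" text and its field names (user, id, status) are invented purely for illustration.

```python
import re

log = "user=alice id=042 status=OK\nuser=bob id=7 status=FAIL"

# \w+ : one or more word characters; the group captures just the name
users = re.findall(r"user=(\w+)", log)              # ['alice', 'bob']

# \d{1,3} : between one and three digits
ids = re.findall(r"id=(\d{1,3})", log)              # ['042', '7']

# | : alternation, either OK or FAIL
statuses = re.findall(r"status=(OK|FAIL)", log)     # ['OK', 'FAIL']

# [^=]+ : a run of characters that are not "="; with re.M, ^ anchors each line
keys = re.findall(r"^[^=]+", log, re.M)             # ['user', 'user']

# \s+ : one or more whitespace characters (spaces and the newline)
fields = re.split(r"\s+", log)

# . : any character except a newline, so "a\nc" is not matched
dots = re.findall(r"a.c", "abc a-c a\nc")           # ['abc', 'a-c']

# ? : the preceding "u" is optional
spellings = re.findall(r"colou?r", "color colour colouur")  # ['color', 'colour']

# A raw string keeps the backslash, so r"\d" is the same text as "\\d"
print(r"\d" == "\\d")                               # True
print(users, ids, statuses, keys, fields, dots, spellings)
```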