Definition
- Regular expressions (regex) are a powerful tool in Python for matching specific patterns in strings and text.
Features
- Regex lets us search, match, and manipulate strings based on patterns.
- We can test our regex patterns using tools like regex101.com.
‘re’ Module Functions
- Python's standard library provides the ‘re’ (regular expression) module, which contains several functions for working with regular expressions.
- To use regular expressions in Python, import the re module as shown below:
import re
- Some common functions of the re module are as follows:
- re.search(pattern, string) : Scans the whole string for the pattern. Returns a match object for the first match, or None if there is no match.
- re.match(pattern, string) : Checks for the pattern only at the beginning of the string. Returns a match object if found, or None if not.
- re.findall(pattern, string) : Returns all non-overlapping matches of the pattern in the string as a list. If the pattern contains capturing groups, the list holds the captured groups instead of the full matches.
- re.finditer(pattern, string) : Returns an iterator yielding a match object for each non-overlapping match.
- re.sub(pattern, repl, string) : Replaces every match of the pattern in the string with repl and returns the resulting string.
- re.split(pattern, string) : Splits the string at each match of the pattern and returns the pieces as a list.
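To see how these six functions differ, here is a minimal sketch that runs each of them on the same made-up sample string (the text and variable names are purely illustrative). The pattern \d+ means "one or more digits".

```python
import re

# Hypothetical sample text, used only for illustration
text = "Cats: 3, Dogs: 12, Birds: 7"

# re.search scans the whole string and returns the FIRST match (or None)
first = re.search(r"\d+", text)
print(first.group())  # 3

# re.match only tries position 0; "C" is not a digit, so this is None
at_start = re.match(r"\d+", text)
print(at_start)  # None

# re.findall returns every non-overlapping match as a list of strings
all_numbers = re.findall(r"\d+", text)
print(all_numbers)  # ['3', '12', '7']

# re.finditer yields match objects, which also record where each match sits
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]
print(spans)  # [('3', 6, 7), ('12', 15, 17), ('7', 26, 27)]

# re.sub(pattern, repl, string) replaces each match with repl
replaced = re.sub(r"\d+", "#", text)
print(replaced)  # Cats: #, Dogs: #, Birds: #

# re.split cuts the string at each match: here a comma plus any spaces
parts = re.split(r",\s*", text)
print(parts)  # ['Cats: 3', 'Dogs: 12', 'Birds: 7']
```

Use re.findall when only the matched text is needed, and re.finditer when the positions (or groups) of each match also matter.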
Terms Used in Regular Expression
- Pattern : A sequence of characters that describes the text to search for.
- Metacharacters : Special characters with specific meanings in regex (e.g., ., *, +, ?, [], {}, ()).
- Quantifiers : Specify how many times a character or group should occur (e.g., *, +, ?, {n}).
- Character Classes : Define sets of characters to match (e.g., [a-z], [0-9], \d, \w).
- Anchors : Match positions in the string rather than characters (e.g., ^ for the start, $ for the end).
- Groups : Capture parts of the matched text using parentheses ().
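The terms above usually appear together in a single pattern. The sketch below parses a hypothetical log line (the line and the date layout are assumptions for illustration) and labels which part of the pattern is an anchor, a character class, a quantifier, or a group.

```python
import re

# Hypothetical log line, used only for illustration
line = "2024-03-15 ERROR disk full"

# ^ is an anchor, \d and \w are character classes,
# {4}, {2} and + are quantifiers, and each (...) is a capturing group
pattern = r"^(\d{4})-(\d{2})-(\d{2}) (\w+)"

m = re.search(pattern, line)
year, month, day, level = m.groups()
print(year, month, day, level)  # 2024 03 15 ERROR
print(m.group(0))               # 2024-03-15 ERROR  (the whole match)
print(m.group(4))               # ERROR  (groups are numbered from 1)

# Because of the ^ anchor, the same pattern fails once the date is not at the start
shifted = re.search(pattern, "Logged: " + line)
print(shifted)  # None
```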
Regex Symbols
- . = Matches any character except a newline.
- ^ = Matches the beginning of the string.
- $ = Matches the end of the string.
- [] = Matches any single character inside the brackets (e.g., [a-z]).
- \d = Matches any digit (equivalent to [0-9] for ASCII text).
- \w = Matches any word character (letters, digits, and underscores).
- + = Matches one or more of the preceding element.
- * = Matches zero or more of the preceding element.
- ? = Matches zero or one of the preceding element.
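A quick way to get a feel for these symbols is to try each one on a short string. This is a small sketch with made-up sample strings; each line exercises one symbol from the list above.

```python
import re

# . matches any character except a newline
dots = re.findall(r"h.t", "hat hot h\nt")
print(dots)  # ['hat', 'hot']

# ^ and $ pin the pattern to the whole string; ? makes the "u" optional
words = ["color", "colour", "colouur"]
spelled = [w for w in words if re.search(r"^colou?r$", w)]
print(spelled)  # ['color', 'colour']

# + needs at least one "b"; * also accepts zero
plus_matches = re.findall(r"ab+", "a ab abb")
star_matches = re.findall(r"ab*", "a ab abb")
print(plus_matches)  # ['ab', 'abb']
print(star_matches)  # ['a', 'ab', 'abb']

# [] matches one character from the set; \d and \w are shorthand classes
vowels = re.findall(r"[aeiou]", "regex")
digits = re.findall(r"\d", "v2.10")
word_runs = re.findall(r"\w+", "snake_case, 42!")
print(vowels)     # ['e', 'e']
print(digits)     # ['2', '1', '0']
print(word_runs)  # ['snake_case', '42']
```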
Flags in Regex
- Flags modify regex behavior.
- Some common flags used in regular expressions are as follows:
- re.IGNORECASE (re.I) → Case-insensitive matching
- re.MULTILINE (re.M) → ^ and $ match at the start and end of each line
- re.DOTALL (re.S) → . also matches newlines
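Flags are passed as the flags argument of functions such as re.findall and re.search. The sketch below uses a made-up two-line string to compare each pattern with and without a flag, and also shows that several flags can be combined with the | operator.

```python
import re

text = "First line\nsecond LINE"

# Case-sensitive by default, so only the lowercase "line" matches
plain = re.findall(r"line", text)
ignore_case = re.findall(r"line", text, flags=re.I)
print(plain)        # ['line']
print(ignore_case)  # ['line', 'LINE']

# Without flags, ^ anchors only at the very start of the string;
# re.MULTILINE makes it anchor at the start of every line
single = re.findall(r"^\w+", text)
multi = re.findall(r"^\w+", text, flags=re.M)
print(single)  # ['First']
print(multi)   # ['First', 'second']

# By default . does not match "\n"; re.DOTALL lets it cross the line break
no_dotall = re.search(r"line.second", text)
with_dotall = re.search(r"line.second", text, flags=re.S)
print(no_dotall)                 # None
print(repr(with_dotall.group())) # 'line\nsecond'

# Flags can be combined with |
combined = re.findall(r"^\w+ line$", text, flags=re.I | re.M)
print(combined)  # ['First line', 'second LINE']
```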
Common Regex Patterns:
| Pattern | Description |
|---|---|
| r"..." | Raw string prefix: Python keeps backslashes as-is, so regex escapes like \d reach the regex engine unchanged. |
| \d | Matches any digit (0-9). |
| \D | Matches any non-digit character. |
| \w | Matches any word character (a-z, A-Z, 0-9, _). |
| \W | Matches any non-word character. |
| \s | Matches any whitespace character (space, tab, newline). |
| \S | Matches any non-whitespace character. |
| . | Matches any character except a newline. |
| * | Matches 0 or more occurrences of the preceding pattern. |
| + | Matches 1 or more occurrences of the preceding pattern. |
| ? | Matches 0 or 1 occurrence of the preceding pattern. |
| {n} | Matches exactly n occurrences of the preceding pattern. |
| {n,} | Matches n or more occurrences of the preceding pattern. |
| {n,m} | Matches between n and m occurrences of the preceding pattern. |
| ^ | Matches the start of a string. |
| $ | Matches the end of a string. |
| [...] | Matches any single character in the brackets (e.g., [a-z] for lowercase). |
| [^...] | Matches any single character NOT in the brackets. |
| (...) | Groups patterns and captures the matched text. |
| \| | Acts as an OR operator (e.g., cat\|dog). |
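Several rows of the table can be tried out directly. The sketch below uses made-up sample strings (the phone-number layout in particular is only an assumption for illustration) to exercise the brace quantifiers, \D, [^...], \s, alternation with a group, and the raw string prefix.

```python
import re

# {n} and {n,m}: three digits, a hyphen, then four or five digits
phones = re.findall(r"\d{3}-\d{4,5}", "Call 555-1234 or 555-98765 today")
print(phones)  # ['555-1234', '555-98765']

# \D matches any non-digit, so replacing it with "" keeps only digits
digits_only = re.sub(r"\D", "", "(555) 123-4567")
print(digits_only)  # 5551234567

# [^...] matches any character NOT listed, here anything but a comma
colors = re.findall(r"[^,]+", "red,green,blue")
print(colors)  # ['red', 'green', 'blue']

# \s+ matches runs of whitespace (spaces, tabs, newlines)
tokens = re.split(r"\s+", "a  b\tc\nd")
print(tokens)  # ['a', 'b', 'c', 'd']

# | is OR; with one (...) group, findall returns just the captured text
pets = re.findall(r"(cat|dog)s?", "cats and dogs and a cat")
print(pets)  # ['cat', 'dog', 'cat']

# A raw string keeps the backslash, so r"\d" is two characters, like "\\d"
raw_len = len(r"\d")
print(raw_len, r"\d" == "\\d")  # 2 True
```

The last line is why regex patterns are usually written as raw strings: without the r prefix, sequences such as "\b" would be turned into special characters by Python before the regex engine ever sees them.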