Definition

  • A regular expression (regex) is a sequence of characters that describes a search pattern. In Python, regular expressions are used to find and match patterns in strings/text.

Features

  • Regular expressions let you search, match, and manipulate strings based on patterns.
  • We can test our regex patterns using tools like regex101.com.

‘re’ Module Functions

  • Python provides the ‘re’ (regular expression) module in its standard library, which contains several functions for working with regular expressions.
  • To use regular expressions in Python, import the re module as below:

import re

  • Some commonly used functions of the re module are as follows:
    • re.search(pattern, string) : Scans the string and looks for the first place where the pattern matches. Returns a match object if found, or None if not.
    • re.match(pattern, string) : Checks for the pattern only at the beginning of the string. Returns a match object if found, or None if not.
    • re.findall(pattern, string) : Returns all non-overlapping matches of the pattern as a list of strings (a list of tuples if the pattern contains more than one group).
    • re.finditer(pattern, string) : Returns an iterator yielding match objects for all non-overlapping matches.
    • re.sub(pattern, repl, string) : Returns a new string in which every non-overlapping match of the pattern is replaced with the replacement repl.
    • re.split(pattern, string) : Splits the string at each point where the pattern matches and returns the pieces as a list.
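
The functions listed above can be tried side by side on one string. The following is a minimal sketch; the sample text and the variable names are made up for illustration. Note that re.sub takes the replacement as its second argument.

```python
import re

# Made-up sample text used only for this illustration.
text = "Order 66 shipped on 2024-05-17; order 7 ships on 2024-06-01."
date_pattern = r"\d{4}-\d{2}-\d{2}"

# re.search scans the whole string and returns the first match.
first_number = re.search(r"\d+", text)
print(first_number.group())            # 66

# re.match only succeeds when the pattern matches at the start.
starts_with_digits = re.match(r"\d+", text)
starts_with_order = re.match(r"Order", text)
print(starts_with_digits)              # None
print(starts_with_order.group())       # Order

# re.findall returns the matched substrings as a list.
dates = re.findall(date_pattern, text)
print(dates)                           # ['2024-05-17', '2024-06-01']

# re.finditer yields match objects, which also carry positions.
spans = [(m.group(), m.start()) for m in re.finditer(date_pattern, text)]

# re.sub(pattern, repl, string) replaces every match with repl.
masked = re.sub(date_pattern, "<date>", text)
print(masked)   # Order 66 shipped on <date>; order 7 ships on <date>.

# re.split cuts the string wherever the pattern matches.
parts = re.split(r";\s*", text)
print(parts)
```

A practical difference worth noticing: re.search and re.match return a single match object or None, so the result should be checked before calling .group() on it.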

    Terms Used in Regular Expression

    • Pattern : A sequence of characters that defines what to search for.
    • Metacharacters : Characters that have a special meaning in regex (e.g., ., *, +, ?, [], {}, ()).
    • Quantifiers : Specify how many times the preceding character or group must occur (e.g., *, +, ?, {n}).
    • Character Classes : Define sets of characters (e.g., [a-z], [0-9], \d, \w).
    • Anchors : Match positions in the string rather than characters (e.g., ^ for start, $ for end).
    • Groups : Parentheses () group part of a pattern and capture the text that part matched.
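
To see how these terms fit together, here is a small sketch that uses anchors, character classes, quantifiers, and groups in a single pattern. The pattern and the "invoice-2024" style input are assumptions chosen for illustration only.

```python
import re

# ^ and $        : anchors, pin the pattern to the whole string
# [a-z] and \d   : character classes, say which characters are allowed
# + and {4}      : quantifiers, say how many of them are expected
# ( ... )        : groups, capture the parts to pull out afterwards
pattern = r"^([a-z]+)-(\d{4})$"

match = re.search(pattern, "invoice-2024")
if match:
    prefix, year = match.groups()
    print(prefix, year)   # invoice 2024

# The $ anchor rejects input with extra text after the four digits.
rejected = re.search(pattern, "invoice-20245")
print(rejected)           # None
```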

    Regex Symbols 

    • . = Matches any character except a newline.
    • ^ = Matches the beginning of the string.
    • $ = Matches the end of the string (or just before a newline at the very end of the string).
    • [] = Matches any single character inside the brackets (e.g., [a-z]).
    • \d = Matches any digit ([0-9]; for text patterns, other Unicode digits are matched too unless the re.ASCII flag is used).
    • \w = Matches any word character (letters, digits, and underscores).
    • + = Matches one or more of the preceding element.
    • * = Matches zero or more of the preceding element.
    • ? = Matches zero or one of the preceding element.
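
Each symbol above can be checked with a one-line experiment. The sketch below uses made-up input strings; the variable names are only for illustration.

```python
import re

# .  matches any character except a newline, so "c\nt" is skipped.
dot = re.findall(r"c.t", "cat cot c\nt cut")
print(dot)            # ['cat', 'cot', 'cut']

# ^ and $ anchor the match to the start and end of the string.
starts = bool(re.search(r"^Hello", "Hello world"))
ends = bool(re.search(r"world$", "Hello world"))
print(starts, ends)   # True True

# [] matches one character out of the set.
vowels = re.findall(r"[aeiou]", "regex")
print(vowels)         # ['e', 'e']

# \d matches a single digit, \w+ a run of word characters.
digits = re.findall(r"\d", "a1b22")
words = re.findall(r"\w+", "snake_case, 42!")
print(digits)         # ['1', '2', '2']
print(words)          # ['snake_case', '42']

# +, * and ? control how often the preceding element may repeat.
one_or_more = re.findall(r"ab+", "a ab abb")
zero_or_more = re.findall(r"ab*", "a ab abb")
optional = re.findall(r"colou?r", "color colour")
print(one_or_more)    # ['ab', 'abb']
print(zero_or_more)   # ['a', 'ab', 'abb']
print(optional)       # ['color', 'colour']
```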

    Flags in Regex

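    • Flags modify how a pattern is matched. They are passed through the optional flags argument of the re functions and can be combined with |.
    • Some common flags used in regular expressions are as follows:
      • re.IGNORECASE (re.I) → Case-insensitive matching
      • re.MULTILINE (re.M) → ^ and $ match at the start and end of each line
      • re.DOTALL (re.S) → . matches newlines too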
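
The effect of each flag is easiest to see by running the same pattern with and without it. This is a minimal sketch on a made-up three-line string; the variable names are only for illustration.

```python
import re

# Made-up multi-line sample text.
text = "First line\nsecond LINE\nthird line"

# re.IGNORECASE: "LINE" is only found when case is ignored.
plain = re.findall(r"line", text)
ignore_case = re.findall(r"line", text, re.IGNORECASE)
print(plain)         # ['line', 'line']
print(ignore_case)   # ['line', 'LINE', 'line']

# re.MULTILINE: ^ matches at the start of every line, not just the string.
first_word = re.findall(r"^\w+", text)
first_words = re.findall(r"^\w+", text, re.MULTILINE)
print(first_word)    # ['First']
print(first_words)   # ['First', 'second', 'third']

# re.DOTALL: . also matches "\n", so the match can span lines.
no_dotall = re.search(r"First.*third", text)
with_dotall = re.search(r"First.*third", text, re.DOTALL)
print(no_dotall)                    # None
print(with_dotall is not None)      # True

# Flags can be combined with the | operator.
combined = re.findall(r"^\w+ line$", text, re.I | re.M)
print(combined)      # ['First line', 'second LINE', 'third line']
```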

    Common Regex Patterns:

    Pattern   Description
    r"..."    Raw string prefix: Python leaves backslashes in the pattern untouched (e.g., r"\d").
    \d        Matches any digit (0-9; other Unicode digits too unless re.ASCII is used).
    \D        Matches any non-digit character.
    \w        Matches any word character (a-z, A-Z, 0-9, _; other Unicode letters and digits too unless re.ASCII is used).
    \W        Matches any non-word character.
    \s        Matches any whitespace character (space, tab, newline).
    \S        Matches any non-whitespace character.
    .         Matches any character except a newline.
    *         Matches 0 or more occurrences of the preceding pattern.
    +         Matches 1 or more occurrences of the preceding pattern.
    ?         Matches 0 or 1 occurrence of the preceding pattern.
    {n}       Matches exactly n occurrences of the preceding pattern.
    {n,}      Matches n or more occurrences of the preceding pattern.
    {n,m}     Matches between n and m occurrences of the preceding pattern.
    ^         Matches the start of a string.
    $         Matches the end of a string.
    [...]     Matches any single character in the brackets (e.g., [a-z] for lowercase).
    [^...]    Matches any single character NOT in the brackets.
    (...)     Groups patterns and captures the matched text.
    |         Acts as an OR operator.
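
A few of the entries above are easier to understand when run. The sketch below covers the raw string prefix, counted repetition, negated classes, whitespace, alternation with |, and capturing groups. All input strings, including the phone number and e-mail addresses, are made up for illustration.

```python
import re

# The r prefix keeps the backslash: "\n" is one character, r"\n" is two.
print(len("\n"), len(r"\n"))    # 1 2

# {n,m} and {n,} bound the number of repetitions.
two_or_three = re.findall(r"\d{2,3}", "7 42 123 4567")
three_plus = re.findall(r"\d{3,}", "7 42 123 4567")
print(two_or_three)   # ['42', '123', '456']
print(three_plus)     # ['123', '4567']

# \D is the opposite of \d: drop everything that is not a digit.
digits_only = re.sub(r"\D", "", "+1 (555) 010-9999")
print(digits_only)    # 15550109999

# \s+ matches runs of whitespace, \S+ runs of non-whitespace.
fields = re.split(r"\s+", "a  b\tc\nd")
tokens = re.findall(r"\S+", " two  words ")
print(fields)         # ['a', 'b', 'c', 'd']
print(tokens)         # ['two', 'words']

# [^...] matches any character NOT listed in the brackets.
consonants = re.findall(r"[^aeiou\s]", "hi you")
print(consonants)     # ['h', 'y']

# | matches either alternative.
pets = re.findall(r"cat|dog", "cat, dog, bird")
print(pets)           # ['cat', 'dog']

# With several groups, findall returns one tuple per match.
pairs = re.findall(r"(\w+)@(\w+)\.com", "ann@example.com, bob@test.com")
print(pairs)          # [('ann', 'example'), ('bob', 'test')]
```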
