Definition

  • A regular expression (regex) is a sequence of characters that defines a search pattern. In Python, regular expressions are used to match desired patterns in strings/text.

Features

  • Regular expressions (regex) allow us to search, match, and manipulate strings based on patterns. 
  • We can test our regex patterns using tools like regex101.com.

‘re’ Package/Module Functions 

  • Python provides the ‘re’ (regular expression) module, which contains several functions for working with regular expressions.
  • To use regular expressions in Python, import the re module as shown below:

import re

  • Some common functions of the re module are as follows:
    • re.search(pattern, string) : Searches for the pattern anywhere in the string. Returns a match object for the first match, or None if there is none.
    • re.match(pattern, string) : Checks for the pattern only at the beginning of the string. Returns a match object if found, or None if not.
    • re.findall(pattern, string) : Returns all non-overlapping matches of the pattern as a list of strings (or of captured groups, if the pattern contains capturing groups).
    • re.finditer(pattern, string) : Returns an iterator yielding a match object for each non-overlapping match.
    • re.sub(pattern, repl, string) : Replaces every part of the string that matches the pattern with the replacement repl and returns the new string.
    • re.split(pattern, string) : Splits the string at each point where the pattern matches and returns the pieces as a list.
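The functions listed above can be tried side by side on one string. This is a minimal sketch: the sample text and the variable names are made up for illustration and are not part of the re module.

```python
import re

# Sample text invented for this example.
text = "Order 12 shipped on 2024-01-15, order 7 is pending."

# re.search: first match anywhere in the string.
first_number = re.search(r"\d+", text).group()

# re.match: succeeds only if the pattern matches at the very beginning.
starts_with_digits = re.match(r"\d+", text)        # None, the text starts with "Order"
starts_with_word = re.match(r"Order", text).group()

# re.findall: all non-overlapping matches as a list of strings.
all_numbers = re.findall(r"\d+", text)

# re.finditer: match objects, handy when the position of each match is needed.
positions = [m.start() for m in re.finditer(r"\d+", text)]

# re.sub: note the argument order (pattern, replacement, string).
masked = re.sub(r"\d+", "#", text)

# re.split: cut the string wherever the pattern matches.
parts = re.split(r",\s*", text)

print(first_number)        # 12
print(starts_with_digits)  # None
print(all_numbers)         # ['12', '2024', '01', '15', '7']
print(masked)              # Order # shipped on #-#-#, order # is pending.
print(parts)               # ['Order 12 shipped on 2024-01-15', 'order 7 is pending.']
```

Since re.search and re.match return None when nothing matches, it is usually safer to check the result before calling .group() on it.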

Terms Used in Regular Expressions

  • Pattern : A sequence of characters that defines what to search for.
  • Metacharacters : Characters that have a special meaning in regex (e.g., ., *, +, ?, [], {}, ()).
  • Quantifiers : Specify how many times the preceding character or group should occur (e.g., *, +, ?, {n}).
  • Character Classes : Define sets of characters to match (e.g., [a-z], [0-9], \d, \w).
  • Anchors : Match positions in the string rather than characters (e.g., ^ for start, $ for end).
  • Groups : Capture parts of the matched text using parentheses ().
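To see how these terms fit together, here is a small sketch of a single pattern that uses anchors, a character class, quantifiers, and groups. The date format and the names date_pattern, year, month, and day are assumptions chosen for illustration.

```python
import re

# ^ and $ are anchors, \d is a character class, {4} and {2} are quantifiers,
# and each pair of parentheses is a capturing group.
date_pattern = r"^(\d{4})-(\d{2})-(\d{2})$"

match = re.search(date_pattern, "2024-01-15")
year, month, day = match.groups()   # one string per capturing group
print(year, month, day)             # 2024 01 15
print(match.group(0))               # 2024-01-15 (group 0 is the whole match)

# Because of the ^ anchor, extra text before the date prevents a match.
no_match = re.search(date_pattern, "on 2024-01-15")
print(no_match)                     # None
```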

Flags in Regex

  • Flags modify regex behavior.
  • Some common flags used in regular expressions are as follows:
    • re.IGNORECASE (re.I) → Case-insensitive matching.
    • re.MULTILINE (re.M) → ^ and $ match at the start and end of each line, not only of the whole string.
    • re.DOTALL (re.S) → . matches newline characters too.
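The effect of each flag is easiest to see by running the same pattern with and without it. The following is a minimal sketch; the three-line sample text and the variable names are made up for illustration.

```python
import re

# Three-line sample text invented for this example.
text = "first line\nSecond LINE\nthird line"

# re.IGNORECASE: "line" also matches "LINE".
case_insensitive = re.findall(r"line", text, flags=re.IGNORECASE)

# re.MULTILINE: ^ matches at the start of every line instead of only the string start.
first_word_only = re.findall(r"^\w+", text)
first_word_each_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

# re.DOTALL: . also matches "\n", so .* can span several lines.
without_dotall = re.search(r"first.*third", text)
with_dotall = re.search(r"first.*third", text, flags=re.DOTALL)

# Flags can be combined with the | operator.
combined = re.findall(r"^second", text, flags=re.I | re.M)

print(case_insensitive)      # ['line', 'LINE', 'line']
print(first_word_only)       # ['first']
print(first_word_each_line)  # ['first', 'Second', 'third']
print(without_dotall)        # None
print(combined)              # ['Second']
```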

Common Regex Patterns:

  Symbol   Description
  r"..."   Raw string prefix (Python syntax, not a regex symbol). Backslashes are kept as-is, so r"\d" reaches the regex engine unchanged.
  \d       Matches any digit (0-9; for str patterns, also other Unicode digits unless re.ASCII is set).
  \D       Matches any non-digit character.
  \w       Matches any word character (a-z, A-Z, 0-9, _) but not spaces or special symbols like + - @ # $ etc.
  \W       Matches any non-word character.
  \s       Matches any whitespace character (space, tab, newline).
  \S       Matches any non-whitespace character.
  .        Matches any character except a newline (unless re.DOTALL is set).
  *        Matches 0 or more occurrences of the preceding pattern.
  +        Matches 1 or more occurrences of the preceding pattern.
  ?        Matches 0 or 1 occurrence of the preceding pattern.
  {n}      Matches exactly n occurrences of the preceding pattern.
  {n,}     Matches n or more occurrences of the preceding pattern.
  {n,m}    Matches between n and m occurrences of the preceding pattern.
  ^        Matches the start of the string (and of each line with re.MULTILINE).
  $        Matches the end of the string, or just before a trailing newline (and the end of each line with re.MULTILINE).
  [...]    Matches any single character in the brackets (e.g., [a-z] for lowercase).
  [^...]   Matches any single character NOT in the brackets.
  (...)    Groups a pattern and captures the matched text.
  |        Acts as an OR operator: a|b matches either a or b.
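A few of the symbols above can be checked quickly in an interpreter. This is a short sketch with made-up sample strings; the variable names are chosen only for illustration.

```python
import re

# Raw strings keep the backslash: r"\n" is two characters, "\n" is one.
raw_length = len(r"\n")
plain_length = len("\n")

# | is the OR operator.
pets = re.findall(r"cat|dog", "cat, dog, bird")

# ? makes the preceding character optional.
spellings = re.findall(r"colou?r", "color colour colouur")

# {n,m} bounds the number of repetitions (matching is greedy).
digit_runs = re.findall(r"\d{2,3}", "1 22 333 4444")

# [...] and [^...] match one character inside or outside the set.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")

# \W+ matches runs of non-word characters, \s+ runs of whitespace.
cleaned = re.sub(r"\W+", " ", "a+b@c")
words = re.split(r"\s+", "a  b\tc")

print(raw_length, plain_length)  # 2 1
print(pets)                      # ['cat', 'dog']
print(spellings)                 # ['color', 'colour']
print(digit_runs)                # ['22', '333', '444']
print(vowels, non_vowels)        # ['e', 'e'] ['r', 'g', 'x']
print(cleaned)                   # a b c
print(words)                     # ['a', 'b', 'c']
```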
