Python Compile Regex Pattern using re.compile()

Posted in /  

Python Compile Regex Pattern using re.compile()

Vinay Khatri
Last updated on September 29, 2022

    With the help of the Python re.compile() method, we can compile a string pattern to a regex pattern object. Later we can apply the methods like match() , search() , find() , findall() , and finditer() on that regex pattern object to find in the targeted string pattern.  If we define the re.compile() method in simple terms, we can say that the re.compile() method converts a string into a regex pattern object that supports methods to find whether a specific string pattern is present in thee the targeted string or not.

    In this Python tutorial, we will discuss the re.compile() method in detail with the help of some examples. So let’s get started with How to use the Python re.compile() method

    How to use the Python re.compile() method

    The compile() method comes under Python’s re module so to use it, we first need to import the re module in our Python script.

    The syntax of the re.compile() Method

    import re
    
    re.compile(pattern, flag=0)

    Arguments The re.compile() method can accept two arguments

    • Pattern: Pattern is the string data value that we want to convert into the regex object.
    • Flag: Flag is the regex flag that defines the behavior of the regex object. It is an optional argument whose default value is 0 means no flag is raised on the object. The flag value can be a regex flag such as re.I mean ignore the case. So whenever we find a match pattern on the regex object, which has a flag of re.I it will ignore the case and match the pattern.

    How to compile a regular expression in Python

    Step 1: import the re module

    The first step is to import the re module in the script

    import re

    Step 2: Write the regular expression pattern in raw string

    To write the regular expression pattern in raw string, we use the r prefix. Example

    pattern =re.compile(r"\+91[0-9]{10}")

    Step 3: Compile the string pattern using re.compile method.

    The re.compile() method compiles the string pattern to a regular expression object and returns it.

    regex_obj = re.compile(pattern) 

    Step 4: Match the regular expression object with the targeted string.

    target_string = "My Mobile No +917364728264"
    result = regex_obj.match(target_string)

    Example: How to compile the regular expression

    Now let’s write a Python script that demonstrates the typical example case of re.compile() method. Let’s say we have a string that contains many mobile numbers of users from various countries. And we only need to match and extract those Indian mobile numbers whose country code starts from +91.

    So for this, we will use the regular expression pattern

    r"\+91[0-9]{10}"

    • In the pattern, \+ means escape the + symbol and treat it as a character, not a special symbol.
    • 91 means the match string must contain 91.
    • And [0-9]{10} means the number can be between 0 to 9(inclusive), and it can occur precisely ten times.

    Example

    import re
    
    #regular expression pattern for mobile number starts with +91
    pattern = r"\+91[0-9]{10}"
    
    #targeted string
    string = "+918748374837, +998473847374, +894848374373, +918374759273, +928473658294"
    
    #create a regex object of the pattern
    regex_obj = re.compile(pattern)
    
    #find all the patterns that matched in the targeted string
    result = regex_obj.findall(string)
    print(result)

    Output

    ['+918748374837', '+918374759273']

    Behind the code

    In the above example, we define the regular string expression pattern , which can match all the 10-digit mobile numbers starting with +91 . Then we also define an identifier targeted string string from which we want to match the pattern and extract the information.

    The re.compile(pattern) statement creates a regular expression object out of the specified pattern. And the findall() method returns a list of all the patterns from the string that match the specified pattern. The above program is equivalent to

    import re
    
    #regular expression pattern for mobile number starts with +91
    pattern = r"\+91[0-9]{10}"
    
    #targeted string
    string = "+918748374837, +998473847374, +894848374373, +918374759273, +928473658294"
    
    #find all the patterns that matched in the targeted string
    result = re.findall(pattern, string)
    
    print(result)

    Output

    ['+918748374837', '+918374759273']

    If we get the same result without even compiling the pattern to a regex object, why use the compile() method? Let’s discuss it.

    Why and when to re.compile() method

    By compiling a regular expression pattern, we get an object, and it acts like an identifier data that we can use anywhere in the program when we want to match the same patterns. The compiled object comes in very handy, and it works efficiently when we have to use the same pattern matching multiple times.

    Save Time and increase the readability of the code

    With compile() method we can initialize a regular expression object of a pattern, and if we need to match the same pattern multiple times in a program, we can do that with ease.

    Example

    import re
    
    #regular expression pattern for mobile number starts with +91
    pattern = r"\+91[0-9]{10}"
    
    #targeted strings
    student1 = "Rohan: +917363536475"
    student2 = "Rahul: +917468890483, +919846274959"
    student3 = "Aman: +917483659048, +918754627894"
    
    #create a regex object of pattern
    find_mobile = re.compile(pattern) 
    
    #find all the patterns that matched in the targeted string
    print("Rohan Mobile Numbers:", find_mobile.findall(student1))
    print("Rahul Mobile Numbers:", find_mobile.findall(student2))
    print("Aman Mobile Numbers:", find_mobile.findall(student3))

    Output

    Rohan Mobile Numbers: ['+917363536475']
    Rahul Mobile Numbers: ['+917468890483', '+919846274959']
    Aman Mobile Numbers: ['+917483659048', '+918754627894']

    Conclusion

    The most prominent reason why we use the re.compile() method is to initialize a regular expression object pattern that we are going to use multiple times in our program. By compiling the pattern once we reduce the possibility of typos or mistakes, we can commit while writing the program. Just by compiling a pattern once, we do not need to worry about writing the pattern again and again and checking it multiple times. With the appropriate object name, we do not have to worry about what this pattern object is supposed to match.

    People are also reading:

    Leave a Comment on this Post

    0 Comments