Python Regex Replace Pattern in a string using re.sub()

Vinay Khatri
Last updated on September 17, 2024

    In this Python article, you will learn how to search and replace text in a Python string using regular expressions. Python's re module can match patterns that plain string methods such as str.replace() cannot express, which makes it the better tool when the text to replace follows a pattern rather than a fixed value.

    This tutorial discusses the sub() and subn() functions of the re module, which search a string for a pattern and replace its matches. With these two functions we can replace one, several, or all occurrences of a pattern in a string. By the end of this tutorial, you will have a solid understanding of the following calls.

    Method    Description
    re.sub(pattern, replacement, string)    Replaces all occurrences of pattern in string with replacement.
    re.sub(pattern, replacement, string, count=n)    Replaces only the first n occurrences of pattern in string with replacement.
    re.subn(pattern, replacement, string)    Replaces all occurrences of pattern in string with replacement, just like sub(), but returns a tuple instead of a string.

    Python re.sub() method

    With the re.sub() method we can replace every match of a regular expression pattern in a string.

    How to use re.sub() method?

    To use re.sub() we first need to understand its syntax and the value it returns.

    re.sub() syntax:

    import re
    
    re.sub(pattern, repl, string, count = 0, flags=0)

    Arguments

    The sub() method accepts five arguments. The first three are required and the last two are optional.

    • pattern: The regular expression pattern that we want to find in the target string.
    • repl: The replacement for each match. It can be a string or a function.
      • If repl is a string, the sub() method replaces every match of the pattern with that string.
      • If repl is a function, the sub() method calls it with the match object of each match and replaces the match with the string the function returns.
    • string: The target string in which we want to replace the pattern.
    • count: An integer that sets the maximum number of pattern occurrences the sub() method replaces. It is optional, and its default value of 0 means that all occurrences are replaced.
    • flags: Modifies how the pattern is matched. It is optional, and its default value of 0 means that no flag is set. Useful flags include re.I for case-insensitive matching and re.A for ASCII-only matching.
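
    As a minimal sketch of the optional arguments, the snippet below passes flags by keyword so that re.I makes the pattern case-insensitive. The sample text and the \b word boundaries are assumptions chosen for illustration. Passing count and flags by keyword also avoids a common slip where a flag ends up in the count position.

```python
import re

# Sample text is an assumption for illustration
text = "The UK flag, the uk anthem and the Uk team"

# flags is passed by keyword; re.I makes the match case-insensitive
result = re.sub(r"\buk\b", "United Kingdom", text, flags=re.I)

print(result)
```

    All three spellings of "uk" are replaced because the flag ignores letter case.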

    Return value

    The sub() method returns a new string in which the matches of the pattern have been replaced, up to the number given by count. If the pattern is not found in the target string, the string is returned unchanged.
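
    The no-match case is easy to verify. In this small sketch (the sample text is an assumption), the pattern \d+ finds no digits, so sub() hands back a string equal to the input.

```python
import re

text = "Python regex tutorial"

# There are no digits in the text, so nothing is replaced
result = re.sub(r"\d+", "#", text)

print(result)          # Python regex tutorial
print(result == text)  # True
```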

    Example 1

    Let’s say we are given a string and we need to replace every occurrence of UK, England, and Britain with United Kingdom.

    import re
    
    string = '''UK is an island country located on the northwestern coast of Europe. English is the main Language of England. The Capital of Britain is London'''
    
    #regular expression pattern
    pattern = 'UK|Britain|England'
    
    #string to replace
    
    repl = 'United Kingdom'
    
    #replace UK, England and Britain with United Kingdom
    replaced_string = re.sub(pattern, repl, string)
    
    print(replaced_string)

    Output

    United Kingdom is an island country located on the northwestern coast of Europe. English is the main Language of United Kingdom. The Capital of United Kingdom is London

    Example 2

    The sub() method also handles escape sequences such as \n. In this example the pattern includes \n, so each name is matched together with the newline that follows it.

    import re
    
    string = '''UK
     is an island country located on northwestern coast Europe. English is the main Language of England
    . The Capital of Britain
     is London'''
    
    #regular expression pattern
    pattern = 'UK\n|Britain\n|England\n'
    
    #string to replace
    repl = 'United Kingdom'
    
    #replace UK, England and Britain with United Kingdom
    replaced_string = re.sub(pattern, repl, string)
    
    print(replaced_string)

    Output

    United Kingdom is an island country located on northwestern coast Europe. English is the main Language of United Kingdom. The Capital of United Kingdom is London

    In the above example, the string has a newline after UK, England, and Britain. The pattern UK\n|Britain\n|England\n matches each name together with that newline, so both are replaced with United Kingdom and the line breaks disappear from the output.
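
    Escape sequences are processed in the replacement string as well. The following sketch, which uses made-up sample strings, shows sub() turning \n in repl into a real newline and using \1 and \2 to refer back to captured groups. Raw strings (r'...') keep the backslashes intact until sub() interprets them.

```python
import re

colors = "red, green, blue"

# r"\n" reaches sub() as a backslash plus "n"; sub() converts it to a newline
one_per_line = re.sub(r",\s*", r"\n", colors)
print(one_per_line)

# \1 and \2 in repl stand for the text captured by the first and second group
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "Hello World")
print(swapped)  # World Hello
```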

    Python re.sub() examples

    Let’s discuss some more examples of the re.sub() method.

    Example 1: Replace all the whitespaces with underscores

    Suppose we have a string and we need to replace every whitespace character with an underscore _. In a regular expression, the \s character class matches any whitespace character, including spaces, tabs, and newlines.

    import re
    
    string = '  Hello World Welcome to TechgeekBuzz  '
    
    #regular expression pattern
    pattern = r'\s'
    
    #string to replace
    repl = '_'
    
    #replace all whitespace by _
    replaced_string = re.sub(pattern, repl, string)
    
    print(replaced_string)

    Output

    __Hello_World_Welcome_to_TechgeekBuzz__

    Example 2: Remove all the whitespaces from a string

    To remove all the whitespace we can set the pattern to r'\s' and the repl value to ''. To remove only specific whitespace, we can use one of the following patterns.

    1. \s+ pattern for matching a run of one or more whitespace characters.
    2. ^\s+ pattern for removing leading spaces.
    3. \s+$ pattern for removing trailing spaces.
    4. ^\s+|\s+$ pattern for removing leading and trailing spaces.

    1. Remove all the spaces from the string.

    import re
    
    string = '  Hello World Welcome to TechgeekBuzz  .'
    
    #regular expression pattern
    pattern = r'\s'
    
    #string to replace
    repl = ''
    
    #replace all whitespace
    replaced_string = re.sub(pattern, repl, string)
    
    print(replaced_string)

    Output

    HelloWorldWelcometoTechgeekBuzz.

    2. Remove the leading whitespaces from a string in Python

    import re
    
    string = '  Hello World Welcome to TechgeekBuzz  .'
    
    #regular expression pattern
    pattern = r'^\s+'
    
    #string to replace
    repl = ''
    
    #replace leading whitespace
    replaced_string = re.sub(pattern, repl, string)
    
    print(replaced_string)

    Output

    Hello World Welcome to TechgeekBuzz  .

    3. Remove all the trailing whitespaces from a string in Python

    import re
    
    string = '  Hello World Welcome to TechgeekBuzz   '
    
    #regular expression pattern
    pattern = r'\s+$'
    
    #string to replace
    repl = ''
    
    #replace trailing  whitespace
    replaced_string = re.sub(pattern, repl, string)
    
    print(f"'{replaced_string}'")

    Output

    '  Hello World Welcome to TechgeekBuzz'

    4. Remove the leading and trailing whitespaces from a string in Python

    import re
    
    string = '  Hello World Welcome to TechgeekBuzz   '
    
    #regular expression pattern
    pattern = r'^\s+|\s+$'
    
    #string to replace
    repl = ''
    
    #replace leading and trailing  whitespace
    replaced_string = re.sub(pattern, repl, string)
    
    print(f"'{replaced_string}'")

    Output

    'Hello World Welcome to TechgeekBuzz'

    5. Replace multiple whitespaces with single whitespace using regex

    import re
    
    string = '  Hello World   Welcome   to    TechgeekBuzz   '
    
    #regular expression pattern
    pattern = r'\s+'
    
    #string to replace
    repl = ' '
    
    #replace multiple whitespaces with single
    replaced_string = re.sub(pattern, repl, string)
    
    print(f"'{replaced_string}'")

    Output

    ' Hello World Welcome to TechgeekBuzz '
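
    The patterns above can be combined. The helper below is a hypothetical function written for this illustration (it is not part of the re module): it first collapses every run of whitespace to a single space and then trims both ends.

```python
import re

def normalize_spaces(text):
    """Hypothetical helper: collapse whitespace runs, then trim both ends."""
    collapsed = re.sub(r"\s+", " ", text)       # many spaces -> one space
    return re.sub(r"^\s+|\s+$", "", collapsed)  # drop leading/trailing space

string = '  Hello World   Welcome   to    TechgeekBuzz   '
cleaned = normalize_spaces(string)

print(f"'{cleaned}'")  # 'Hello World Welcome to TechgeekBuzz'
```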

    How to limit the maximum number of pattern occurrences to be replaced

    The sub() method also accepts an optional count argument that limits the number of replacements: sub(pattern, repl, string, count=0). By default count is 0, which means every occurrence of the pattern in the string is replaced. Setting it to a positive integer n limits the replacement to the first n occurrences.

    Example

    Replace the first 3 whitespaces with underscores in a string.

    import re
    
    string = 'Hello World Welcome to TechgeekBuzz.'
    
    #regular expression pattern
    pattern = r'\s+'
    
    #string to replace
    repl = '_'
    
    #replace first 3 whitespaces by _
    replaced_string = re.sub(pattern, repl, string, count =3)
    
    print(f"'{replaced_string}'")

    Output

    'Hello_World_Welcome_to TechgeekBuzz.'

    Regex Replacement function

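    So far we have only used a string for the repl argument. The repl argument can also be a function. The sub() method calls that function once for every match, passes it the match object, and uses the string it returns as the replacement. Let’s see how to use a function as the repl argument.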

    Example

    import re

    def digit_to_word(match_obj):
        # the pattern matches one digit at a time, so every key is a single digit
        digi_words = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three', '4': 'four',
                      '5': 'five', '6': 'six', '7': 'seven', '8': 'eight', '9': 'nine'}
        digit = match_obj.group()

        return digi_words[digit]
    
    string = 'There are 3 red balls, 2 green balls and 5 black balls in the bag'
    
    # regular expression pattern for digits
    pattern = r'[0-9]'
    
    # function to call for replacement
    repl = digit_to_word
    
    replaced_string = re.sub(pattern, repl, string, count=3)
    
    print(f"'{replaced_string}'")

    Output

    'There are three red balls, two green balls and five black balls in the bag'
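
    A replacement function can also compute its result instead of looking it up. In this sketch, double_number is a hypothetical function and the sample sentence is made up. The pattern \d+ matches whole numbers, including multi-digit ones, and the function must return a string.

```python
import re

def double_number(match_obj):
    # group() returns the matched text; convert it, double it, return a string
    return str(int(match_obj.group()) * 2)

text = "3 red balls, 12 green balls and 5 black balls"

result = re.sub(r"\d+", double_number, text)

print(result)  # 6 red balls, 24 green balls and 10 black balls
```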

    Python re.subn() method

    The re.subn() method works like the sub() method: it replaces the matches of a regex pattern in a string with a replacement string or with the value returned by a replacement function. The only difference is that subn() returns a tuple of two values.

    1. The first value is the new string with the replacements applied.
    2. The second value is the number of replacements that were made.

    Example

    Let’s see an example where we convert the capitalized names in a string to uppercase using the subn() method and check the number of replacements made.

    import re
    
    def cap_to_upper(match_obj):
        name = match_obj.group()
        return name.upper()
    
    string = 'class 10 has 3 toppers Rahul, Jay and Raj '
    
    # regular expression for capitalized words
    pattern = r'[A-Z]+[a-z]*'
    
    # function to call for replacement
    repl = cap_to_upper
    
    result = re.subn(pattern, repl, string)
    
    new_string = result[0]
    
    changes  = result[1]
    
    print("Replaced String: ", new_string)
    
    print('Number of changes: ',changes)

    Output

    Replaced String:  class 10 has 3 toppers RAHUL, JAY and RAJ 
    Number of changes:  3
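
    Because subn() returns a tuple, the two values can be unpacked directly instead of indexing the result. The sketch below uses a made-up sample string and also shows that count limits subn() the same way it limits sub().

```python
import re

text = "a1b22c333"

# Unpack the (new_string, number_of_replacements) tuple in one step
new_text, changes = re.subn(r"\d+", "#", text)
print(new_text, changes)  # a#b#c# 3

# With count=2 only the first two runs of digits are replaced
limited_text, limited_changes = re.subn(r"\d+", "#", text, count=2)
print(limited_text, limited_changes)  # a#b#c333 2
```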

    Conclusion

    In this Python regular expression tutorial, you learned how to replace the matches of a pattern in a string with a replacement value. To do this, we used two regex methods, sub() and subn(). The sub() method accepts a regular expression pattern, replaces the matches of that pattern with the replacement string or the result of a replacement function, and returns the new string. The subn() method works the same way, but it returns a tuple containing two items: the new string and the number of replacements made.
