[Answer]-Python string cleanup: How can I remove excess newlines from a string in Python?

1👍

✅

A regexp is one way. Using your updated sample input:

>>> a = "This is my sample text.\r\n\r\n\r\n\r\n\r\n Here start another sample text"
>>> import re
>>> re.sub(r'(\r\n){2,}','\r\n', a)
'This is my sample text.\r\n Here start another sample text'

r'(\r\n)+' would work too; I just like using a lower bound of 2 to avoid needlessly replacing single \r\n substrings with the same substring.
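A quick way to check that both patterns give the same result while doing different amounts of work is `re.subn`, which also returns the number of replacements made. The sample string `s` below is made up for illustration and mixes a single `\r\n` with a run of three:

```python
import re

s = "one\r\ntwo\r\n\r\n\r\nthree"

# re.subn returns (new_string, number_of_replacements_made).
with_lower_bound, n_lower = re.subn(r'(\r\n){2,}', '\r\n', s)
with_plus, n_plus = re.subn(r'(\r\n)+', '\r\n', s)

print(with_lower_bound == with_plus)  # True
print(n_lower, n_plus)  # 1 2: '+' also rewrites the lone "\r\n" after "one"
```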

Or you can use the splitlines method on the string and rejoin after filtering:

>>> '\r\n'.join(line for line in a.splitlines() if line)
'This is my sample text.\r\n Here start another sample text'
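The splitlines approach behaves a little differently from the regex at the edges. Here is a small sketch wrapping it in a function (`collapse_blank_lines` is just an illustrative name) to show that it also drops leading and trailing line breaks, but keeps lines that contain only spaces, because those are not empty strings:

```python
def collapse_blank_lines(text, sep='\r\n'):
    # splitlines() splits on \r\n, \n, \r (among other boundaries), so
    # consecutive line breaks show up as empty strings, which are filtered out.
    return sep.join(line for line in text.splitlines() if line)

a = "This is my sample text.\r\n\r\n\r\n\r\n\r\n Here start another sample text"
print(repr(collapse_blank_lines(a)))
# 'This is my sample text.\r\n Here start another sample text'

# Leading and trailing line breaks disappear entirely:
print(repr(collapse_blank_lines("\r\n\r\nabc\r\n")))  # 'abc'
# A line made only of spaces is not empty, so it is kept:
print(repr(collapse_blank_lines("abc\r\n  \r\ndef")))  # 'abc\r\n  \r\ndef'
```

If whitespace-only lines should count as blank too, filtering on `line.strip()` instead of `line` is one option.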

0👍

As an example, if you know what you want to replace:

>>> a = 'string with \n a \n\n few too many\n\n\n lines'
>>> a.replace('\n'*2, '\n') # Replaces \n\n with just \n
'string with \n a \n few too many\n\n lines'
>>> a.replace('\n'*3, '') # Replaces \n\n\n with nothing...
'string with \n a \n\n few too many lines'
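Note that a single `replace('\n'*2, '\n')` pass only shortens each run (three newlines became two above), because `str.replace` works on non-overlapping matches. A minimal sketch that keeps replacing until no double newline is left (`collapse_by_replace` is a hypothetical helper name):

```python
def collapse_by_replace(text):
    # Each pass turns every "\n\n" into "\n", which shortens longer runs
    # without fully collapsing them, so repeat until none remain.
    while '\n\n' in text:
        text = text.replace('\n\n', '\n')
    return text

a = 'string with \n a \n\n few too many\n\n\n lines'
print(repr(collapse_by_replace(a)))  # 'string with \n a \n few too many\n lines'
```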

Or, use a regular expression to find what you want:

>>> import re
>>> re.findall(r'.*([\n]+).*', a)
['\n', '\n\n', '\n\n\n']
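If you also want to know where each run is, `re.finditer` yields match objects with positions. The surrounding `.*` is not needed for this; a plain `r'\n+'` finds the same runs:

```python
import re

a = 'string with \n a \n\n few too many\n\n\n lines'

# Each match object knows where its run starts and how long it is.
runs = [(m.start(), len(m.group())) for m in re.finditer(r'\n+', a)]
print(runs)  # [(12, 1), (16, 2), (31, 3)]
```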
👤jakebrinkmann

0👍

import re 
a = 'string with \n a \n\n few too many\n\n\n lines'
re.sub('\n+', '\n', a)
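One caveat for the question's updated sample, which uses Windows-style `\r\n` line endings: `'\n+'` cannot match across the `\r` characters, so each `\n` is its own run and nothing collapses. A sketch of one possible fix, assuming it is acceptable to normalize every line break to a bare `\n`:

```python
import re

windows_text = "This is my sample text.\r\n\r\n\r\n Here start another sample text"

# Each '\n' is separated from the next by '\r', so '\n+' only ever matches
# a single '\n' and replaces it with itself.
unchanged = re.sub('\n+', '\n', windows_text)
print(unchanged == windows_text)  # True

# Assumption: treat '\r\n' or a bare '\n' as one line break and turn every
# run of them into a single '\n'.
normalized = re.sub(r'(?:\r?\n)+', '\n', windows_text)
print(repr(normalized))  # 'This is my sample text.\n Here start another sample text'
```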
👤Pran

0👍

To use a regex to replace multiple occurrences of newline with a single one (or something else you prefer such as a period, tab or whatever), try:

import re
testme = 'Some text.\nSome more text.\n\nEven more text.\n\n\n\n\nThe End'
print(re.sub('\n+', '\n', testme))

Note that ‘\n’ is a single character (a newline), not two characters (a literal backslash and ‘n’).
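To illustrate that note: the raw string `r'\n'` is the two-character version, and the regex engine itself reads it as a newline, so both patterns match the same text. The sketch also shows the replacement being something other than a newline (a space here):

```python
import re

testme = 'Some text.\nSome more text.\n\nEven more text.\n\n\n\n\nThe End'

# '\n' is one character; r'\n' is two characters (backslash, 'n') that the
# regex engine interprets as a newline, so both patterns behave the same.
print(len('\n'), len(r'\n'))  # 1 2
print(re.sub('\n+', '\n', testme) == re.sub(r'\n+', '\n', testme))  # True

# Any replacement string works; here each run of newlines becomes a space.
spaced = re.sub(r'\n+', ' ', testme)
print(spaced)  # Some text. Some more text. Even more text. The End
```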

You can of course compile the regex in advance if you intend to re-use it:

pattern = re.compile('\n+')
print(pattern.sub('\n', testme))
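A common way to reuse the compiled pattern is to keep it at module level and wrap it in a small function (`NEWLINE_RUN` and `collapse_newlines` are illustrative names). The `re` module also caches recently used patterns, so this is often as much about readability as speed:

```python
import re

# Compiled once, reused on every call.
NEWLINE_RUN = re.compile(r'\n+')

def collapse_newlines(text):
    return NEWLINE_RUN.sub('\n', text)

samples = ['a\n\nb', 'c\n\n\n\nd', 'no breaks']
print([collapse_newlines(s) for s in samples])  # ['a\nb', 'c\nd', 'no breaks']
```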
👤langton

0👍

Best I could do, but Peter DeGlopper’s was better.

import re

s = '\n' * 9 + 'abc' + '\n' * 10
# s == '\n\n\n\n\n\n\n\n\nabc\n\n\n\n\n\n\n\n\n\n'
newline_runs = re.compile('\n+')
excess_lines = newline_runs.findall(s)
# excess_lines == ['\n' * 9, '\n' * 10]
# I feel as though there is a better way, but this works

# Order the runs in descending order by length so that longer runs of
# newlines are replaced first; a shorter run would otherwise match inside
# a longer one and leave part of it behind.
# (list.sort(cmp=...) no longer exists in Python 3, so use key/reverse.)
excess_lines.sort(key=len, reverse=True)
# excess_lines == ['\n' * 10, '\n' * 9]
for run in excess_lines:
    s = s.replace(run, '\n')

# s == '\nabc\n'

This solution feels dirty and inelegant, but it works. You need to sort by string length because if you have a string like ‘\n\n\n aaaaaaa \n\n\n\n’ and replace \n\n\n first, the \n\n\n\n run becomes \n\n, which is not caught later on.
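To see why the order matters, here is a short sketch running both orders on that example string:

```python
import re

s = '\n\n\n aaaaaaa \n\n\n\n'
runs = re.findall(r'\n+', s)  # ['\n\n\n', '\n\n\n\n']

# Shortest run first: '\n\n\n' also matches inside '\n\n\n\n', leaving '\n\n'.
wrong = s
for run in sorted(runs, key=len):
    wrong = wrong.replace(run, '\n')
print(repr(wrong))  # '\n aaaaaaa \n\n'

# Longest run first: every run ends up as a single '\n'.
right = s
for run in sorted(runs, key=len, reverse=True):
    right = right.replace(run, '\n')
print(repr(right))  # '\n aaaaaaa \n'
```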

👤limasxgoesto0
