2015-05-04

Read file pattern (Python)

Here's a pattern that I've been using a lot to read files in Python 2.7:
import codecs
badrow_count = 0
goodrow_count = 0
rowcount = 0

filename = "somefile.txt"
encoding = "utf-8"
# see https://docs.python.org/2/library/codecs.html#standard-encodings
# for encodings

decode_error_handler = 'strict'
# this is the default
# see https://docs.python.org/2/library/codecs.html#codec-base-classes 
# for decoding error callbacks

f = codecs.open(filename=filename, mode='rU', encoding=encoding, 
    errors=decode_error_handler)
eof = False
while not eof:
    row = u''
    try:
        row = f.next()
    except UnicodeDecodeError as e:
        badrow_count += 1
        # the row could not be decoded (row is still u''); handle that here
    except StopIteration:
        eof = True
    #except Exception as e:
        # handle other issues    
    else:
        goodrow_count += 1
        # do other stuff with row
    finally:
        if not eof:
            rowcount += 1
        else:
            break
f.close()
I prefer this to:
for row in f:
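primarily in order to catch Unicode decode errors and keep reading. In a plain for loop, the UnicodeDecodeError is raised by the for statement itself while it fetches the next row, so it can't be handled inside the loop body and the iteration stops.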
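
A related sketch, not the pattern above: codecs readers decode the file in buffered chunks, so a decode error is not guaranteed to line up with exactly one row. One way around that, assuming an encoding such as UTF-8 in which a newline is always the single byte 0x0A, is to read raw lines in binary mode and decode each one separately. The names count_rows and demo and the sample data are made up for illustration; the code should run on both Python 2.7 and Python 3.

```python
import io
import os
import tempfile


def count_rows(path, encoding="utf-8"):
    """Return (good, bad) row counts, decoding each raw line separately."""
    good = bad = 0
    # Binary mode: lines are split on b"\n" before any decoding happens,
    # so one undecodable row cannot affect its neighbours.
    with io.open(path, "rb") as f:
        for raw in f:
            try:
                row = raw.decode(encoding)  # errors='strict' is the default
            except UnicodeDecodeError:
                bad += 1
                # handle the bad row here (log raw, skip it, ...)
            else:
                good += 1
                # do other stuff with row
    return good, bad


def demo():
    # Hypothetical sample data: the second row contains an invalid UTF-8 byte.
    data = b"first row\nbad \xff row\nthird row\n"
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        return count_rows(path)
    finally:
        os.remove(path)


print(demo())  # (2, 1)
```

Rows keep their original line endings (\n or \r\n), since binary mode does no newline translation. This would not work as-is for encodings like UTF-16, where a newline is more than one byte.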