fundamentals.files.fileChunker

Iterate through large line-based files in batches of lines

Author

David Young

Module Contents

Classes

fileChunker

The fileChunker iterator - iterate over large line-based files to reduce memory footprint

API

class fundamentals.files.fileChunker.fileChunker(filepath, batchSize)

Bases: object

The fileChunker iterator - iterate over large line-based files to reduce memory footprint

Key Arguments

  • filepath – path to the large file to iterate over

  • batchSize – number of lines to return in each chunk

Usage

To set up your logger, settings and database connections, please use the fundamentals package (see the tutorial at https://fundamentals.readthedocs.io/en/master/initialisation.html).

To initiate a fileChunker iterator and then process the file in batches of 100000 lines, use the following:

from fundamentals.files import fileChunker
fc = fileChunker(
    filepath="/path/to/large/file.csv",
    batchSize=100000
)
for i in fc:
    print(len(i))
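
For readers without the package to hand, the batching pattern used above can be approximated with the standard library alone. This is a minimal sketch, not the fundamentals implementation: `iter_line_batches` is a hypothetical helper, and it assumes each batch is a list of raw lines, with the final batch possibly shorter than `batch_size`.

```python
import itertools
import os
import tempfile


def iter_line_batches(filepath, batch_size):
    """Yield lists of up to batch_size lines (hypothetical helper, not part of fundamentals)."""
    with open(filepath, encoding="utf-8") as handle:
        while True:
            # islice pulls at most batch_size lines without reading the whole file
            batch = list(itertools.islice(handle, batch_size))
            if not batch:
                break
            yield batch


# Demonstrate on a small temporary file of 10 lines, read 4 lines at a time
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, encoding="utf-8") as tmp:
    tmp.write("".join(f"row{n}\n" for n in range(10)))

batch_sizes = [len(batch) for batch in iter_line_batches(tmp.name, 4)]
os.remove(tmp.name)
print(batch_sizes)  # [4, 4, 2]
```

Only one batch is held in memory at a time, which is the point of chunked iteration over large files.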

Initialization

__iter__()
__next__()
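
The two methods listed here are Python's standard iterator protocol. As an illustration of how a class can batch lines through that protocol, here is a minimal sketch. `LineBatcher` is a hypothetical stand-in, not the library's code, and it reads from an in-memory stream rather than a file path so that it runs anywhere.

```python
import io


class LineBatcher:
    """Hypothetical stand-in illustrating the iterator protocol; not the library's code."""

    def __init__(self, stream, batch_size):
        self.stream = stream
        self.batch_size = batch_size

    def __iter__(self):
        # An iterator returns itself so it can be used directly in a for loop
        return self

    def __next__(self):
        batch = []
        for line in self.stream:
            batch.append(line)
            if len(batch) == self.batch_size:
                return batch
        if not batch:
            # Nothing left to read: signal the end of iteration
            raise StopIteration
        # Final, possibly shorter, batch
        return batch


batches = list(LineBatcher(io.StringIO("a\nb\nc\nd\ne\n"), 2))
print(batches)  # [['a\n', 'b\n'], ['c\n', 'd\n'], ['e\n']]
```

Because `__iter__` returns the object itself, an instance like this can be consumed only once; a second pass needs a new instance.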