`fundamentals.files.fileChunker`

Iterate through large line-based files in batches of lines

Author: David Young

Module Contents

Classes

fileChunker

The fileChunker iterator - iterate over large line-based files to reduce memory footprint

API

class fundamentals.files.fileChunker.fileChunker(filepath, batchSize)

Bases: object

The fileChunker iterator - iterate over large line-based files to reduce memory footprint

Key Arguments

filepath – path to the large file to iterate over
batchSize – number of lines to return in each chunk

Usage

To set up your logger, settings and database connections, please use the fundamentals package (see the tutorial at https://fundamentals.readthedocs.io/en/master/initialisation.html).

To initiate a fileChunker iterator and then process the file in batches of 100000 lines, use the following:

from fundamentals.files import fileChunker
fc = fileChunker(
    filepath="/path/to/large/file.csv",
    batchSize=100000
)
for i in fc:
    print(len(i))


__iter__()

__next__()
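The two methods above are what make the class usable in a for loop. As an illustration only, here is a minimal sketch of how an iterator of this kind could be written with the standard library. The class name LineBatcher and its internals are hypothetical and are not taken from the fundamentals source; the sketch also assumes that each batch is a list of lines and that the final batch may be shorter than batchSize.

```python
from itertools import islice


class LineBatcher:
    """Hypothetical stand-in for fileChunker (not the library's actual code)."""

    def __init__(self, filepath, batchSize):
        self.filepath = filepath
        self.batchSize = batchSize
        self._handle = None

    def __iter__(self):
        # (Re)open the file so that every new loop starts at the first line
        if self._handle is not None:
            self._handle.close()
        self._handle = open(self.filepath, "r")
        return self

    def __next__(self):
        if self._handle is None:
            raise StopIteration
        # islice pulls at most batchSize lines, so only one batch is in memory
        batch = list(islice(self._handle, self.batchSize))
        if not batch:
            # End of file reached: release the handle and stop the iteration
            self._handle.close()
            self._handle = None
            raise StopIteration
        return batch
```

Because only one batch of lines is held in memory at a time, peak memory use in this sketch depends on batchSize rather than on the size of the file.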
