fileChunker (class)

class fileChunker(filepath, batchSize)

Bases: object

The fileChunker iterator: iterate over large line-based files in batches of lines to reduce the memory footprint.

Key Arguments

  • filepath – path to the large file to iterate over

  • batchSize – number of lines in each chunk returned

Usage

To set up your logger, settings and database connections, please use the fundamentals package (see tutorial here).

To initiate a fileChunker iterator and then process the file in batches of 100000 lines, use the following:

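from fundamentals.files import fileChunker

# Iterate over the file in batches of 100000 lines
fc = fileChunker(
    filepath="/path/to/large/file.csv",
    batchSize=100000
)
for i in fc:
    # Each iteration gives one batch; report its size
    print(len(i))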
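
To illustrate the general idea of batched line iteration, here is a minimal sketch of how such an iterator could be written with the standard library. It is not the fundamentals implementation. The function name iter_line_batches and its batch_size parameter are hypothetical, and it assumes each batch is a list of lines, which the docs above do not state explicitly.

```python
from itertools import islice


def iter_line_batches(filepath, batch_size):
    """Yield successive lists of at most batch_size lines from filepath.

    Hypothetical stand-in for illustration only; the real fileChunker
    class may behave differently (e.g. in how it treats line endings).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    with open(filepath, "r") as handle:
        while True:
            # islice pulls at most batch_size lines from the open file,
            # so only one batch is held in memory at a time
            batch = list(islice(handle, batch_size))
            if not batch:
                # An empty batch means the end of the file was reached
                break
            yield batch
```

Under these assumptions, looping over iter_line_batches("/path/to/large/file.csv", 100000) would give lists of up to 100000 lines each, with a shorter final list if the line count is not an exact multiple of the batch size.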

Methods