fundamentals.files.fileChunker
Iterate through large line-based files in batches of lines
- Author: David Young
Module Contents
Classes
- fileChunker: The fileChunker iterator, which iterates over large line-based files to reduce memory footprint
API
- class fundamentals.files.fileChunker.fileChunker(filepath, batchSize)
Bases: object
The fileChunker iterator: iterate over large line-based files in batches to reduce memory footprint.
Key Arguments
- filepath – path to the large file to iterate over
- batchSize – size of the chunks to return, in lines
Usage
To set up your logger, settings and database connections, please use the fundamentals package (see the tutorial here: https://fundamentals.readthedocs.io/en/master/initialisation.html).

To initiate a fileChunker iterator and then process the file in batches of 100000 lines, use the following:
    from fundamentals.files import fileChunker

    fc = fileChunker(
        filepath="/path/to/large/file.csv",
        batchSize=100000
    )
    for i in fc:
        # each i is one batch of lines; report how many lines it holds
        print(len(i))
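To illustrate the general idea of reading a line-based file in fixed-size batches, here is a minimal standard-library sketch. It is not the fundamentals implementation: the helper name iter_line_batches and its batch_size parameter are hypothetical, and the sketch assumes that a batch is simply a list of raw lines, with the final batch allowed to be shorter than the rest.

```python
import itertools
import os
import tempfile


def iter_line_batches(filepath, batch_size):
    """Yield lists of at most batch_size lines (hypothetical helper, not part of fundamentals)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    with open(filepath, "r") as handle:
        while True:
            # islice pulls at most batch_size lines from the open file,
            # so only one batch is held in memory at a time
            batch = list(itertools.islice(handle, batch_size))
            if not batch:
                return
            yield batch


# Demonstration on a small temporary file of 10 lines
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.writelines("row %d\n" % n for n in range(10))
    demo_path = tmp.name

batch_sizes = [len(batch) for batch in iter_line_batches(demo_path, 4)]
os.remove(demo_path)
print(batch_sizes)
```

Because the file object is consumed lazily, memory use in this sketch scales with the batch size rather than with the size of the file.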
Initialization