Chunking, processing & merging a dataset in Pandas/Python


I have a large dataset containing strings. I want to open it via read_fwf using widths, like this:

widths = [3, 7, ..., 9, 7]
tp = pandas.read_fwf(file, widths=widths, header=None)

It lets me mark the data, but the system crashes (it works with nrows=20000). So I decided to process the file in chunks (e.g. 20000 rows each), like this:

ch = 20000
for chunk in pd.read_fwf(file, widths=widths, header=None, chunksize=ch):
    <some code using chunk>

My question is: should I merge (concatenate?) the chunks into a .csv file inside the loop, after processing each chunk (marking rows, dropping or modifying columns)? Or is there a better way?
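For reference, a minimal sketch of the in-memory alternative (process each chunk, keep it, and concatenate at the end) is below. The path, widths, 'mark' column and column indices are hypothetical stand-ins for the real processing, and this approach only works if the processed result fits in memory:

import pandas as pd

file = 'data.txt'            # hypothetical path to the fixed-width file
widths = [3, 7, 9, 7]        # placeholder widths; the real list is elided above
ch = 20000

processed = []
for chunk in pd.read_fwf(file, widths=widths, header=None, chunksize=ch):
    # hypothetical processing: mark rows whose first column starts with 'A',
    # then drop column 3 (stand-ins for the marking/dropping mentioned above)
    chunk['mark'] = chunk[0].str.startswith('A')
    chunk = chunk.drop(columns=[3])
    processed.append(chunk)

# concatenate all processed chunks into one DataFrame
result = pd.concat(processed, ignore_index=True)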

I'm going to assume that since reading the entire file with

tp = pandas.read_fwf(file, widths=widths, header=None)

fails but reading in chunks works, the file is too big to read at once and you encountered a MemoryError.

In that case, if you can process the data in chunks and only need to concatenate the results into a CSV, use chunk.to_csv to write the CSV in chunks:

filename = ...
for chunk in pd.read_fwf(file, widths=widths, header=None, chunksize=ch):
    # process chunk
    chunk.to_csv(filename, mode='a')

Note that mode='a' opens the file in append mode, so the output of each chunk.to_csv call is appended to the same file.
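One small refinement, sketched below under the assumption that you do not want the default header and index in the output: with header=None the columns are just the integers 0, 1, 2, ..., and to_csv writes that header row for every appended chunk by default, so you may want to write it only once (or suppress it) and skip the row index. Overwriting on the first chunk also keeps re-runs from appending to stale data.

import pandas as pd

filename = 'output.csv'      # hypothetical output path
# file, widths and ch as defined in the question above

for i, chunk in enumerate(pd.read_fwf(file, widths=widths, header=None, chunksize=ch)):
    # process chunk ...
    chunk.to_csv(filename,
                 mode='w' if i == 0 else 'a',  # overwrite on the first chunk, append afterwards
                 header=(i == 0),              # write the (integer) header row only once
                 index=False)                  # do not write the row index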