java - Proper handling of gigantic input and processing strings
I've been looking for days at how to solve some key problems I'm running into, and have not found an answer to my problem yet.
I'm embarking on an academic (learning) project that involves reading 3-50 MB plain-text files on a regular basis, across millions of records (my current set is ~800,000 records).
Assuming the file can't be split() into chunks, what's the best way to pass this chunk between functions? Pass-by-value leads me to think (and, I believe, see) that passing a 50 MB file to a function, and returning a 20-30 MB result set, means I have used and wasted over 100 MB of memory passing a file that's waiting to be reclaimed at GC. (Technically, the file can be split(), but the split()s are each 10 MB large at times, and each must be held while processing.)
I've made significant changes to the overall project recently, and want to design the processing portion better this time. My previous method read and processed the data in the driver itself, without a data container. When I attempted to use a data container, I ended up with similar results. Here's the first method I used:
- Read the entire 3-50 MB+ file into a String
- Regex/split it into 4-15 chunks (determined by XML-like tags in the file)
- Pass 1-3 chunks to function A (looking for data)
- Pass 4-5 more chunks to function B (looking for different data, which won't exist in function A's chunks)
- Collect the results in the driver function
- Stitch the result set together and write it to disk (I know I should create-and-append instead)
I can split as I read; however, the splits can be 5 MB in size each (or more), and I need to keep all of them in memory until the file is done processing (in case step 3 changes how step 4 works). Worse, the input readLine()s might be 1-2 MB long (before a \n).
So, what kind of design strategy is best for handling these huge input files and huge strings?
"Pass-by-value leads me to think (and, I believe, see) that passing a 50 MB file to a function, and returning a 20-30 MB result set, means I have used and wasted over 100 MB of memory passing a file that's waiting to be reclaimed at GC."
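Incorrect. Java passes references by value, not the entire String, so calling a function with a 50 MB String copies only the reference, not the characters. Pass the (reference to the) String along with the start and end indices of the section of the String you want to process.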
void read() {
    String input = /*your code here*/;
    process(input, 37, 17576);
}

void process(String input, int startIndex, int endIndex) {
    /*your code here, e.g.*/
    for (int i = startIndex; i < endIndex; i++) {
        // do stuff
    }
}
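To make the point about references concrete, here is a minimal runnable sketch. The class name `ReferencePassingDemo`, the field `seenByCallee`, and the method `countInRange` are hypothetical names chosen for illustration, not part of the original answer. The callee records the reference it received so the caller can check with `==` that it is the very same object, and it scans only the requested index range without creating a substring.

```java
// Hypothetical demo class: shows that a String argument is not copied.
class ReferencePassingDemo {

    // Holds the reference the callee last received, for the identity check.
    static String seenByCallee;

    // Counts occurrences of target in input[startIndex, endIndex)
    // by index, without building any substring.
    static int countInRange(String input, int startIndex, int endIndex, char target) {
        seenByCallee = input; // stores the reference only, no character data
        int count = 0;
        for (int i = startIndex; i < endIndex; i++) {
            if (input.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "<a>alpha</a><b>beta</b>";
        int count = countInRange(input, 3, 8, 'a'); // range covers "alpha"
        System.out.println("count = " + count);
        // == compares object identity here, not contents.
        System.out.println("same object = " + (input == seenByCallee));
    }
}
```

Because only a reference crosses the call boundary, the cost of the call is the same for a 50-byte String as for a 50 MB one.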
Also, if read and process are in the same class, you can make the String a class field:
String input;

void read() {
    input = /*your code here*/;
    process(37, 17576);
}

void process(int startIndex, int endIndex) {
    /*your code here, e.g.*/
    for (int i = startIndex; i < endIndex; i++) {
        // do stuff
    }
}
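Applied to the XML-like chunks mentioned in the question, the same idea might look like the sketch below: instead of split(), locate each section with `String.indexOf` and hand the index pair to the processing function. The tag names (`head`, `body`), the class `TagRangeDemo`, and both helper methods are assumptions made up for this example; the real file format is not shown in the question, and this simple search does not handle nested or attribute-bearing tags.

```java
// Hypothetical sketch: find sections by index instead of splitting.
class TagRangeDemo {

    // Returns {start, end} of the text between <tag> and </tag>,
    // searching from index `from`; returns null if the pair is not found.
    static int[] findSection(String input, String tag, int from) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int openAt = input.indexOf(open, from);
        if (openAt < 0) {
            return null;
        }
        int start = openAt + open.length();
        int end = input.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return new int[] { start, end };
    }

    // Example consumer: counts a character inside input[startIndex, endIndex).
    static int countChar(String input, int startIndex, int endIndex, char target) {
        int count = 0;
        for (int i = startIndex; i < endIndex; i++) {
            if (input.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "<head>x</head><body>a\nb\nc</body>";
        int[] body = findSection(input, "body", 0);
        if (body != null) {
            // Only two ints are passed around; the String itself is shared.
            System.out.println(countChar(input, body[0], body[1], '\n'));
        }
    }
}
```

Each section then costs two ints rather than a multi-megabyte copy, and all sections stay available until the file is done because they are just ranges over the one String already in memory.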