java - Proper handling of gigantic input and processing strings


I've been looking for days for a solution to some key problems I'm running into, and have not found an answer to my problem yet.

I'm embarking on an academic (learning) project that involves reading 3-50 MB plain-text files on a regular basis, across millions of records (my current set is ~800,000 records).

Assuming the file can't be split() into chunks, what's the best way to pass this chunk between functions? Pass-by-value leads me to think (and, I believe, see) that passing a 50 MB file to a function, and returning a 20-30 MB result set, means I have used or wasted around 100 MB of memory passing around a file that's just waiting to be reclaimed at GC. (Technically, the file can be split(), but the split()s are each 10 MB large at times, and each must be held while processing.)

I've made significant changes to the overall project recently, and I want to design the processing portion better this time. My previous method read and processed the data in the driver itself, without a data container. When I attempted to use a data container, I ended up with similar results. Here's the first method I used:

  1. Read the entire 3-50 MB+ file into a string
  2. Regex/split it into 4-15 chunks (determined by XML-like tags in the file)
  3. Pass 1-3 chunks to function A (looking for data)
  4. Pass 4-5 more chunks to function B (looking for different data, which won't exist in function A's chunks)
  5. Collect the results in the driver function
  6. Stitch the result set together and write it to disk (I know I should create-and-append instead)

I can split as I read; however, the splits can be 5 MB in size each (or more), and I need to keep all of them in memory until the file is done processing (in case step 3 changes how step 4 works). Worse, the input readLine()s might be 1-2 MB long (before the \n).

So, what kind of design strategy is best for handling these huge input files and huge strings?

"Pass-by-value leads me to think (and, I believe, see) that passing a 50 MB file to a function, and returning a 20-30 MB result set, means I have used or wasted around 100 MB of memory passing around a file that's just waiting to be reclaimed at GC."

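Incorrect. Java passes references by value, not the entire string, so calling a function with a 50 MB string copies only the reference. Pass the (reference to the) string along with the start and end indices of the section of the string you want to process.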

    void read() {
        String input = /* your code here */;
        process(input, 37, 17576);
    }

    void process(String input, int startIndex, int endIndex) {
        /* your code here, e.g.
        for (int i = startIndex; i < endIndex; i++) {
            // do stuff
        }
        */
    }
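To see for yourself that no copy is made, here is a minimal sketch. The class name, the helper methods, and the `<rec>` sample data are all made up for illustration and are not from the question. It checks that the callee receives the very same String object the caller holds, then scans an index range of it in place.

```java
class ReferenceDemo {

    // Hypothetical helper: returns the identity hash of the object the callee received.
    static int identityInCallee(String received) {
        return System.identityHashCode(received);
    }

    // Counts occurrences of target in [startIndex, endIndex) by reading the
    // original string in place; no substring is created.
    static int countInRange(String input, int startIndex, int endIndex, char target) {
        int count = 0;
        for (int i = startIndex; i < endIndex; i++) {
            if (input.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    // Builds made-up sample data: one "<rec>N</rec>" line per record.
    static String buildSample(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("<rec>").append(i).append("</rec>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = buildSample(1000);

        // Same identity hash inside and outside the call: only the reference was passed.
        System.out.println(identityInCallee(input) == System.identityHashCode(input));

        // Count the line breaks in the first half of the data without copying it.
        System.out.println(countInRange(input, 0, input.length() / 2, '\n'));
    }
}
```

One caveat: methods such as substring() and split() do allocate new strings on current JVMs, which is why passing indices is cheaper than passing pre-cut pieces.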

Also, if you read and process in the same class, you can make the string a class field:

    String input;

    void read() {
        input = /* your code here */;
        process(37, 17576);
    }

    void process(int startIndex, int endIndex) {
        /* your code here, e.g.
        for (int i = startIndex; i < endIndex; i++) {
            // do stuff
        }
        */
    }
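Since the question mentions regex-based chunking, the same index-range idea can be applied to regular expressions. The sketch below is only an illustration: the `<id>` tag, the pattern, and the class and method names are assumptions, not taken from the original files. It uses `Matcher.region(start, end)` to restrict a search to one section of the big string without cutting that section out first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionSearchDemo {

    // Hypothetical tag format; the real files use their own XML-like tags.
    private static final Pattern ID = Pattern.compile("<id>(\\d+)</id>");

    // Returns the first id found inside [start, end) of input, or -1 if none.
    // The matcher reads the original string in place; only the small
    // captured group is materialized as a new string.
    static long firstIdInRange(String input, int start, int end) {
        Matcher m = ID.matcher(input);
        m.region(start, end);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        String input = "<a><id>11</id></a><b><id>22</id></b>";
        int split = input.indexOf("<b>");

        System.out.println(firstIdInRange(input, 0, split));              // prints 11
        System.out.println(firstIdInRange(input, split, input.length())); // prints 22
    }
}
```

With this approach, function A and function B could each receive the same string plus their own index ranges, so the only large object held in memory is the original input.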
