C# - Loading a large TIF file into a string with .NET in a memory-efficient way


I have existing code that has been used for years to upload an XML and TIF file pair via an HttpWebRequest POST request. The problem is that on large TIF files it chews through memory like a flock of beavers attacking a forest. I started digging into the code today in an attempt to make it more memory-efficient.

The existing code loads the XML and TIF content into a string object, which is then converted to a byte array and fed to the HTTP request. Many string concatenations are involved throughout. The TIF file is loaded and converted to a string object like this, where br2 is a BinaryReader object:

System.Text.Encoding.Default.GetString(br2.ReadBytes(tifbytecount))

I know using Encoding.Default is not wise, but changing it would require working with the client to change their decoding of file submissions, so that is for another time. I plan to change to Base64 encoding when I make that change. Anyway...

The first thing I changed was all of the string concatenations, because I figured they were bogging things down, especially when working with the TIF string object. I'm now using a StringBuilder object and appending everything.

I searched for "byte array to string conversion" and tried several of the different results I found, including this one and this one, but both used a different encoding than the existing code.

I then used a System.Text.Encoding.Default decoder object to decode the entire TIF file into a char[] array at one time. That didn't improve memory use at all, but it did at least use the same encoding.

The file I've been testing with today is a 185 MB TIF file. While testing on my dev machine, Windows physical memory usage would start at 2 GB used, climb to 5+ GB, max out at 5.99 GB, and promptly lock up until the debugger killed itself. As far as I can tell, I am loading only a single instance of the TIF file into memory, so I couldn't understand why 185 MB was using 4 GB of memory.

Anyway, next I tried loading in the TIF file in smaller chunks, 1000 bytes at a time. This looked promising initially. It was using 2 GB of memory when it reached the last <1000 bytes of the file. On the last chunk of bytes though (in this case 928 bytes), the line charcount = dc.GetCharCount(ba2, x, (int)filestream2.Length - x) caused memory to momentarily spike by 1 GB, the following line chars2 = new char[(int)filestream2.Length - x] increased memory by 700 MB, and the line after that, charsdecodedcount = dc.GetChars(ba2, x, (int)filestream2.Length - x, chars2, 0), pushed memory to the max and locked up the system.

The code below shows the last approach I tried, the one described in the previous paragraph.

BinaryReader br2 = new BinaryReader(filestream2);
byte[] ba2 = br2.ReadBytes((int)filestream2.Length);
char[] chars2 = null;

if ((int)filestream2.Length > 1000)
{
    for (int x = 0; x < (int)filestream2.Length; x += 1000)
    {
        if (x + 1000 > (int)filestream2.Length)
        {
            // Last, partial chunk
            charcount = dc.GetCharCount(ba2, x, (int)filestream2.Length - x);
            chars2 = new char[(int)filestream2.Length - x];
            charsdecodedcount = dc.GetChars(ba2, x, (int)filestream2.Length - x, chars2, 0);
        }
        else
        {
            // Full 1000-byte chunk
            charcount = dc.GetCharCount(ba2, x, 1000);
            chars2 = new char[charcount];
            charsdecodedcount = dc.GetChars(ba2, x, 1000, chars2, 0);
        }

        sbrequest.Append(chars2);
        chars2 = null;
    }
}
else
{
    charcount = dc.GetCharCount(ba2, 0, ba2.Length);
    chars2 = new char[charcount];
    charsdecodedcount = dc.GetChars(ba2, 0, ba2.Length, chars2, 0);
    sbrequest.Append(chars2);
}

I have a feeling I'm missing something obvious. I'd appreciate any advice on resolving this. I'd like to be able to load in a 185 MB TIF file without using 4 GB of memory!

There are a few major issues in your current code:

byte[] ba2 = br2.ReadBytes((int)filestream2.Length);

This reads the entire file into memory.

dc.GetCharCount(...)
dc.GetChars(...)

These methods use internal buffers, so they increase memory usage further, as you described.

You're not "loading in the TIF file in smaller chunks, 1000 bytes at a time". You're loading the entire file into memory and then decoding the bytes 1000 bytes at a time.

If you want to make the method use as little memory as possible, I suggest you work with streams. Here's an example:
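For contrast, here is a minimal sketch of what reading in chunks actually looks like: the buffer is the only part of the file held in memory, and each chunk goes straight to the destination stream as raw bytes. The names ChunkedCopy and CopyInChunks are hypothetical and do not come from the code in the question.

```csharp
using System;
using System.IO;

public static class ChunkedCopy
{
    // Copies source to destination through one reusable buffer.
    // Only chunkSize bytes of the source are held in memory at any moment.
    public static long CopyInChunks(Stream source, Stream destination, int chunkSize)
    {
        var buffer = new byte[chunkSize];
        long total = 0;
        int read;

        // Read returns 0 at end of stream; it may also return fewer bytes than requested.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }

        return total;
    }
}
```

Called with the file stream as the source and the request stream as the destination, this never allocates the full byte[] or any char[]. It also skips the decode step entirely: each .NET char takes two bytes, so with a single-byte encoding the decoded text of a 185 MB file needs roughly 370 MB for the characters alone, before any copies made by StringBuilder or the final conversion back to bytes.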

// Requires: using System.IO; using System.Net; using System.Text;
using (var fs = new FileStream("tif file", FileMode.Open))
{
    var request = (HttpWebRequest)WebRequest.Create("address");
    request.Method = "POST";
    // Must equal the total number of bytes written to the request stream;
    // add the length of any other content you write below.
    request.ContentLength = fs.Length;

    using (Stream poststream = request.GetRequestStream())
    {
        // Write any other contents you wanted to write here
        // ...

        // CopyTo moves the file through a fixed-size buffer (4096 bytes here),
        // so only that much of the file is read into memory at a time.
        fs.CopyTo(poststream, 4096);
    } // the using block closes poststream; no explicit Close() is needed

    using (var response = (HttpWebResponse)request.GetResponse())
    using (Stream responsestream = response.GetResponseStream())
    using (var streamreader = new StreamReader(responsestream, Encoding.UTF8))
    {
        // ReadToEnd already returns a decoded string
        string responsetext = streamreader.ReadToEnd();
        // Do stuff with the response
    }
}
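If the XML has to travel in the same request body, the same idea extends to both files. The sketch below is an assumption about how that might look, not the original upload format: UploadSketch, TotalLength, WriteBody, and Upload are hypothetical names, and the body is simply the XML bytes followed by the TIF bytes with no separators. It also sets AllowWriteStreamBuffering to false, because HttpWebRequest otherwise buffers the written data in memory so that it can be resent on redirects or authentication requests, which works against streaming a large file.

```csharp
using System;
using System.IO;
using System.Net;

public static class UploadSketch
{
    // ContentLength must match the exact number of bytes written to the request stream.
    public static long TotalLength(params Stream[] parts)
    {
        long total = 0;
        foreach (Stream part in parts)
        {
            total += part.Length;
        }
        return total;
    }

    // Streams each part into the output in order, without building a string.
    public static void WriteBody(Stream output, params Stream[] parts)
    {
        foreach (Stream part in parts)
        {
            part.CopyTo(output, 4096);
        }
    }

    // Hypothetical end-to-end upload; address and both paths are placeholders.
    public static void Upload(string address, string xmlPath, string tifPath)
    {
        using (var xml = new FileStream(xmlPath, FileMode.Open, FileAccess.Read))
        using (var tif = new FileStream(tifPath, FileMode.Open, FileAccess.Read))
        {
            var request = (HttpWebRequest)WebRequest.Create(address);
            request.Method = "POST";
            request.ContentLength = TotalLength(xml, tif);
            request.AllowWriteStreamBuffering = false; // do not keep a copy of the body in memory

            using (Stream poststream = request.GetRequestStream())
            {
                WriteBody(poststream, xml, tif);
            }

            using (var response = (HttpWebResponse)request.GetResponse())
            {
                // Inspect response.StatusCode or read the response stream here.
            }
        }
    }
}
```

With buffering turned off, the request cannot be replayed automatically, so this fits best when the server accepts the POST without a redirect or an authentication challenge.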

You may get better performance for a large read from a FileStream by specifying FileOptions.SequentialScan in the constructor. This flag "indicates that the file is to be accessed sequentially from beginning to end. The system can use this as a hint to optimize file caching." [1] You can find further details on this flag here.

using (var fs = new FileStream("tif file", FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.SequentialScan))
