
I'm trying to load 2.5 million rows of data from a CSV file as strings. This is x64 (not 32-bit) in Visual Studio (Debug = Any CPU, Release = Any CPU), and Windows is also 64-bit.

I've preallocated the empty array I'm populating:

public static string[,] allNames = new string[2500000, 12];

I then proceed to fill it. With my previous 2 million rows it worked fine, but after adding more rows I get the out-of-memory error.
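
For illustration, the filling step boils down to something like this (the file name, delimiter, and parsing here are placeholders rather than my actual code):

using System.IO;

public static class Loader
{
    public static string[,] allNames = new string[2500000, 12];

    public static void Load()
    {
        int row = 0;
        foreach (string line in File.ReadLines("data.csv"))   // placeholder file name
        {
            string[] fields = line.Split(',');                 // real parsing is more involved
            for (int col = 0; col < 12 && col < fields.Length; col++)
                allNames[row, col] = fields[col];
            row++;
        }
    }
}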

Looking at Task Manager, I could clearly see that 4 GB of my 16 GB was still free, but I went and bought 16 GB more RAM anyway, as I couldn't figure out what the problem could be.

Lo and behold, the error still comes up. I tried splitting the data into two separate arrays, and it still gives the error at the same spot. Removing one or two columns makes it get a bit farther, or all the way to the end if one of them is a column with longer strings (the longest strings are almost 1,000 characters).

This is Visual Studio 2019. Any help would be great!

asked by roushrsh, edited by Charlieface

  • It likely has nothing to do with Visual Studio; it's your app that's having problems. Please remove the Visual Studio tag. There's a project setting that says "prefer 32-bit". Make sure it's not checked. In all likelihood, you don't need all that data in memory. Can you describe your situation better? Can you include enough of your code that we can tell what you are doing, preferably as a [mcve]? – Flydog57 Dec 13 '21 at 05:26
  • What possible reason could you have for loading all that into a single array in memory? What are you doing with that array? – Jeremy Lakeman Dec 13 '21 at 05:37
  • Without seeing your code we can't really help – Charlieface Dec 13 '21 at 17:33
  • @Flydog57 you're a life saver. It was the prefer 32-bit tag. Thank you so so much. I talked to multiple people about this and no one mentioned that could be the cause. – roushrsh Dec 13 '21 at 17:49
  • @Jeremy Lakeman, I'm predicting some molecular interactions (unique for each of the 2.5 million+ molecules) in real time and using those for some predictions. Theoretically, I could calculate them on the fly, but I would need a few thousand features generated every 10 ms or so on a very slow CPU. Having them already stored in memory was the faster way. – roushrsh Dec 13 '21 at 17:51
  • @Flydog57 Please post your comment as an answer and I'll accept it, thanks! – roushrsh Dec 13 '21 at 18:00
  • Do you need each row as a string array? Or would it be better as a struct? Just because the caller is using `.allNames[1,2]` doesn't mean it has to actually be a single array (https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/indexers/using-indexers). – Jeremy Lakeman Dec 13 '21 at 23:53
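
To illustrate the indexer idea from the last comment: callers can keep writing `names[row, col]` while the storage behind the indexer is broken into smaller chunks instead of one 30-million-element array. A minimal sketch; the class name and chunk size are arbitrary choices, not anything from the thread:

public class ChunkedNames
{
    private const int ChunkSize = 100_000;          // arbitrary chunk size
    private readonly string[][,] _chunks;

    public ChunkedNames(int rows, int columns)
    {
        int chunkCount = (rows + ChunkSize - 1) / ChunkSize;
        _chunks = new string[chunkCount][,];
        for (int i = 0; i < chunkCount; i++)
            _chunks[i] = new string[ChunkSize, columns];   // many small allocations instead of one huge one
    }

    public string this[int row, int col]
    {
        get => _chunks[row / ChunkSize][row % ChunkSize, col];
        set => _chunks[row / ChunkSize][row % ChunkSize, col] = value;
    }
}

Usage stays the same shape as the original array: `var names = new ChunkedNames(2_500_000, 12); names[1, 2] = "...";`.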

2 Answers


There's a project setting that says "prefer 32-bit". Make sure it's not checked.
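
You can also sanity-check at runtime which mode the process actually ended up in. These are standard .NET APIs; the wrapper class here is just for illustration:

using System;

internal static class BitnessCheck
{
    public static void Print()
    {
        // A process forced to 32-bit reports False and a 4-byte pointer size;
        // a genuine 64-bit process reports True and 8 bytes.
        Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
        Console.WriteLine($"Pointer size:   {IntPtr.Size} bytes");
    }
}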

You should really look at refactoring your application so that you don't read all of your data into memory at once. None of us here can help you any further with this question unless you include more information.
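
For example, if only some rows are needed at any given moment, something like the following keeps one line in memory at a time instead of the whole file. A sketch only, assuming a plain comma-separated file; the file name and the filter key are placeholders:

using System;
using System.IO;
using System.Linq;

internal static class StreamingExample
{
    public static void Process(string key)
    {
        // Lazily enumerate the file; only matching rows are ever split into
        // field arrays, and nothing is retained once the loop moves on.
        var matching = File.ReadLines("data.csv")
                           .Select(line => line.Split(','))
                           .Where(fields => fields[0] == key);

        foreach (var fields in matching)
        {
            Console.WriteLine(fields[1]);   // stand-in for the real work
        }
    }
}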

answered by Flydog57
  • Thanks again for the help! You're right, someone mentioned to me it would make more sense to stream in the parts I require in real time, since they're determinable. My calculation overhead is just so high for the time frame in which I need everything that any extra processing step on my 4 slow cores is ideally avoided, but I will think more on it. – roushrsh Dec 13 '21 at 19:25

Solution 1:
You can split your CSV file into 3 files, each containing at most 1 million records, and then load them one by one.

If the problem still exists, keep 500,000 records per file.

That will work around the problem, at least temporarily.
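
A rough sketch of that splitting step, assuming a plain CSV with no multi-line quoted fields; the file names and batch size are placeholders:

using System.IO;

internal static class CsvSplitter
{
    // Writes every batch of `batchSize` lines from `sourcePath` into its own
    // part file: data_part0.csv, data_part1.csv, ...
    public static void Split(string sourcePath, int batchSize = 1_000_000)
    {
        int part = 0, count = 0;
        var writer = new StreamWriter($"data_part{part}.csv");

        foreach (string line in File.ReadLines(sourcePath))
        {
            if (count == batchSize)
            {
                writer.Dispose();
                part++;
                count = 0;
                writer = new StreamWriter($"data_part{part}.csv");
            }
            writer.WriteLine(line);
            count++;
        }
        writer.Dispose();
    }
}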

You can see the maximum size of a .NET string here: What is the maximum possible length of a .NET string?

Solution 2:
Instead of string objects you can use a StringBuilder.
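
This only helps where strings are built up by repeated concatenation, for example joining fields back into one row. A small illustration:

using System.Text;

internal static class RowBuilder
{
    // Each Append writes into the builder's internal buffer instead of
    // allocating a new intermediate string per concatenation.
    public static string Join(string[] fields)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < fields.Length; i++)
        {
            if (i > 0) sb.Append(',');
            sb.Append(fields[i]);
        }
        return sb.ToString();   // single final allocation
    }
}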

answered by Sandeep Jadhav