Questions tagged [large-data-volumes]

302 questions
1
vote
0 answers

Memory issues with N large matrices

I have to create N large matrices, of size M x M, with M = 100'000, on a cluster. I can create them one by one. Usually I would first define a tensor mat_all = torch.zeros((N,M,M)) and then fill mat_all as follows: for i in range(N): …
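
A rough back-of-the-envelope check of why this allocation runs out of memory, as a minimal sketch. It assumes 4-byte elements (float32 is the default dtype of torch.zeros); the value of N is not given in the question, so N = 8 below is purely hypothetical.

```python
def matrix_bytes(m: int, bytes_per_element: int = 4) -> int:
    """Bytes needed for one dense m x m matrix (4 bytes per element = float32)."""
    return m * m * bytes_per_element


def stack_bytes(n: int, m: int, bytes_per_element: int = 4) -> int:
    """Bytes needed for a dense (n, m, m) tensor holding n such matrices."""
    return n * matrix_bytes(m, bytes_per_element)


M = 100_000          # from the question
N_HYPOTHETICAL = 8   # assumption: the question does not state N

print(f"one matrix:  {matrix_bytes(M) / 1e9:.0f} GB")
print(f"full tensor: {stack_bytes(N_HYPOTHETICAL, M) / 1e9:.0f} GB")
```

Under these assumptions a single matrix already needs about 40 GB, so a dense (N, M, M) tensor only fits if the node has roughly N times that much memory available.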
1
vote
1 answer

Error tokenizing data when I use a high number of files

I am trying to create a dataframe for each file from a list of >3,000 files. When I use a small number of files my code works fine, but when I try bigger numbers (>300 files) I keep getting the same error: ParserError: Error tokenizing data. C…
1
vote
2 answers

Editing large text files on Linux (5-10 GB)

Basically, I need a file of a specified format and large size (around 10 GB). To get this, I am copying the contents of my original file into the same file multiple times to increase its size. I don't care about the contents of the file as long as…
Chander Shivdasani
1
vote
1 answer

Splitting a large data file in Python

I output a large array to a text file in Python. I then read it into Excel to plot the data. Currently, the file I am writing is too large to read in Excel. I use the file open and close functions and write the data in the array (please refer to the…
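
One way to keep each output file small enough for Excel is to roll over to a new file after a fixed number of rows. The sketch below is only an illustration under stated assumptions: the helper name write_in_chunks, the part_NNNN.txt naming, and the one-value-per-line layout are all hypothetical, since the question's own writing code is not shown. The default limit is Excel's worksheet maximum of 1,048,576 rows.

```python
from pathlib import Path
from typing import Iterable, List

EXCEL_MAX_ROWS = 1_048_576  # row limit of a single Excel worksheet


def write_in_chunks(rows: Iterable[str], out_dir: Path, stem: str = "part",
                    rows_per_file: int = EXCEL_MAX_ROWS) -> List[Path]:
    """Write rows to numbered text files, starting a new file every rows_per_file rows."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    handle = None
    count = 0
    try:
        for row in rows:
            # Open the first file, or roll over once the current one is full.
            if handle is None or count == rows_per_file:
                if handle is not None:
                    handle.close()
                path = out_dir / f"{stem}_{len(paths):04d}.txt"
                handle = path.open("w", encoding="utf-8")
                paths.append(path)
                count = 0
            handle.write(row + "\n")
            count += 1
    finally:
        if handle is not None:
            handle.close()
    return paths
```

For example, write_in_chunks((str(x) for x in data), Path("out")) would produce out/part_0000.txt, out/part_0001.txt, and so on, each of which can be opened separately.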
1
vote
1 answer

Running large data set on EC2, worried about storage

I am running some sequence analysis on EC2. I expect my output files to be over 2 TB. Before I run my command I want to make sure I have enough room. I changed my instance type to one for data processing, d2.4xlarge. My question: If I am running my…
morinajc
1
vote
0 answers

How to convert -v flag to Dockerfile VOLUME argument

I'm attempting to create a MongoDB Docker container using a data directory on my host machine. This works just fine if I run docker from the command line: docker run \ --detach \ --name db_backup \ --publish 27017:27017 \ --volume…
Pardoner
1
vote
1 answer

Tips for working with a large number of .txt files (and large overall size) in Python?

I'm working on a script to parse txt files and store them in a pandas dataframe that I can export to a CSV. My script worked fine when I was using fewer than 100 of my files, but now, when trying to run it on the full sample, I'm running into a lot of…
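
If the end goal is only the CSV, one option is to skip the in-memory dataframe and stream each parsed record straight to the output file. This is a minimal standard-library sketch, not the asker's script: parse_file is a hypothetical stand-in that splits each non-empty line on whitespace, and the real parsing rules would replace it.

```python
import csv
from pathlib import Path
from typing import Iterator, List


def parse_file(path: Path) -> Iterator[List[str]]:
    """Hypothetical parser: one record per non-empty line, fields split on whitespace."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            fields = line.split()
            if fields:
                # Prefix each record with the source file name for traceability.
                yield [path.name, *fields]


def txt_files_to_csv(src_dir: Path, out_csv: Path) -> int:
    """Stream every *.txt file in src_dir into one CSV; return the number of rows written."""
    written = 0
    with out_csv.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in sorted(src_dir.glob("*.txt")):
            for record in parse_file(path):
                writer.writerow(record)
                written += 1
    return written
```

Because only one line is held in memory at a time, memory use stays flat no matter how many files are processed; the resulting CSV can still be loaded into pandas afterwards if needed.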
1
vote
2 answers

How can I create a large file with random but sensible English words?

I want to test my word-count software, based on the MapReduce framework, with a very large file (over 1 GB), but I don't know how to generate it. Are there any tools to create a large file with random but sensible English sentences? Thanks
Antonio1996
1
vote
4 answers

What method of data validation is most appropriate for large data sets

I have a large database and want to implement a feature that would allow a user to do a bulk update of information. The user downloads an Excel file, makes the changes, and the system accepts the Excel file. The user uses a web interface (ASP.NET)…
Llyle
1
vote
0 answers

How to Configure Flocker on CentOS 7.2?

I used https://media-glass.es/flocker-on-centos-7-17a8bfaa6509 as a reference for installing, but the yum repo is no longer available. If you know of any alternatives to Flocker, please share them. I am looking for an open-source container data volume…
1
vote
1 answer

Powershell - Out of Memory Error When Running Script Against Large Directory

So I have a script that gets the filename of songs contained in a CSV list and checks a directory to see if the file exists, then exports the missing information if there is any. The CSV file looks something like this: Now, my script seems to work…
1
vote
3 answers

Reindexing a large SQL Server database to Lucene

We have a web service method which accepts some data and puts it in a Lucene index. We use it to index new and updated entries from our ASP.NET web app. These entries are stored in a large SQL Server table (20M rows and growing), and I need a way to…
Andrey
1
vote
3 answers

How to Throttle DataStage

I work on a project where a number of DataStage sequences can be run in parallel. One in particular performs poorly and takes a lot of resources, impacting the shared environment. A performance tuning initiative is in progress but will…
MattH
1
vote
2 answers

Load Dataset to SQL script (in PowerShell)

I would like to extract a large dataset from a SQL Server database on one server and then load that dataset into another database on a different server. As a linked server is not an option, I have tried a PowerShell script, and would like something…
Ole Lynge
1
vote
0 answers

Include loader to jQuery data-table exports

I am using the jQuery DataTables plugin to show the table fields. I am also using the Copy button to copy the whole data table. Because my data table holds a large set of data, copying takes a little while. I want to add a loader while the copy is…
sujit tiwari