1

I know this is a pretty common question but I wasn't able to find an answer useful for my problem. If there is something similar I will delete this post.

I'm working with Octave on the movies.csv from the Kaggle's 5000 Movies Database and I would delete all the lines with zeros within the budget or revenue cell. I had some issues reading the columns through the file, so I've copied and pasted the revenue column close to the budget one - surely I would like to know why Octave identify the part of the text as an autonomous column, but now it's not my most urgent trouble.

Update: The matrix contains numeric and strings values, and I would keep all the data of the lines with budget/revenue greater than zero. Here there's a sample of it, hoping it's understandable. I'm working on a file already without the header, but I left it for a better comprehension.

budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count                                                                                                                                                                                                                                                                                                                                                                            
237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""name"": ""Fantasy""}, {""id"": 878, ""name"": ""Science Fiction""}]",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"": 2964, ""name"": ""future""}, {""id"": 3386, ""name"": ""space war""}, {""id"": 3388, ""name"": ""space colony""}, {""id"": 3679, ""name"": ""society""}, {""id"": 3801, ""name"": ""space travel""}, {""id"": 9685, ""name"": ""futuristic""}, {""id"": 9840, ""name"": ""romance""}, {""id"": 9882, ""name"": ""space""}, {""id"": 9951, ""name"": ""alien""}, {""id"": 10148, ""name"": ""tribe""}, {""id"": 10158, ""name"": ""alien planet""}, {""id"": 10987, ""name"": ""cgi""}, {""id"": 11399, ""name"": ""marine""}, {""id"": 13065, ""name"": ""soldier""}, {""id"": 14643, ""name"": ""battle""}, {""id"": 14720, ""name"": ""love affair""}, {""id"": 165431, ""name"": ""anti war""}, {""id"": 193554, ""name"": ""power relations""}, {""id"": 206690, ""name"": ""mind and soul""}, {""id"": 209714, ""name"": ""3d""}]",en,Avatar,"In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289}, {""name"": ""Twentieth Century Fox Film Corporation"", ""id"": 306}, {""name"": ""Dune Entertainment"", ""id"": 444}, {""name"": ""Lightstorm Entertainment"", ""id"": 574}]","[{""iso_3166_1"": ""US"", ""name"": ""United States of America""}, {""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""}]",2009-12-10,2787965087,162,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso_639_1"": ""es"", ""name"": ""Espa\u00f1ol""}]",Released,Enter the World of Pandora.,Avatar,7.2,11800                                                                                                                                                                                                                                                                                                                                                                          
300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""name"": ""Fantasy""}, {""id"": 28, ""name"": ""Action""}]",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""name"": ""drug abuse""}, {""id"": 911, ""name"": ""exotic island""}, {""id"": 1319, ""name"": ""east india trading company""}, {""id"": 2038, ""name"": ""love of one's life""}, {""id"": 2052, ""name"": ""traitor""}, {""id"": 2580, ""name"": ""shipwreck""}, {""id"": 2660, ""name"": ""strong woman""}, {""id"": 3799, ""name"": ""ship""}, {""id"": 5740, ""name"": ""alliance""}, {""id"": 5941, ""name"": ""calypso""}, {""id"": 6155, ""name"": ""afterlife""}, {""id"": 6211, ""name"": ""fighter""}, {""id"": 12988, ""name"": ""pirate""}, {""id"": 157186, ""name"": ""swashbuckler""}, {""id"": 179430, ""name"": ""aftercreditsstinger""}]",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, has come back to life and is headed to the edge of the Earth with Will Turner and Elizabeth Swann. But nothing is quite as it seems.",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""name"": ""Jerry Bruckheimer Films"", ""id"": 130}, {""name"": ""Second Mate Productions"", ""id"": 19936}]","[{""iso_3166_1"": ""US"", ""name"": ""United States of America""}]",2007-05-19,961000000,169,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500                                                                                                                                                                                                                                                                                                                                                                         
245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""name"": ""Adventure""}, {""id"": 80, ""name"": ""Crime""}]",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name"": ""based on novel""}, {""id"": 4289, ""name"": ""secret agent""}, {""id"": 9663, ""name"": ""sequel""}, {""id"": 14555, ""name"": ""mi6""}, {""id"": 156095, ""name"": ""british secret service""}, {""id"": 158431, ""name"": ""united kingdom""}]",en,Spectre,"A cryptic message from Bond’s past sends him on a trail to uncover a sinister organization. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE.",107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""name"": ""Danjaq"", ""id"": 10761}, {""name"": ""B24"", ""id"": 69434}]","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""}, {""iso_3166_1"": ""US"", ""name"": ""United States of America""}]",2015-10-26,880674609,148,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""}, {""iso_639_1"": ""en"", ""name"": ""English""}, {""iso_639_1"": ""es"", ""name"": ""Espa\u00f1ol""}, {""iso_639_1"": ""it"", ""name"": ""Italiano""}, {""iso_639_1"": ""de"", ""name"": ""Deutsch""}]",Released,A Plan No One Escapes,Spectre,6.3,4466                                                                                                                                                                                                                                                                                                                                                                           
250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""name"": ""Crime""}, {""id"": 18, ""name"": ""Drama""}, {""id"": 53, ""name"": ""Thriller""}]",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853, ""name"": ""crime fighter""}, {""id"": 949, ""name"": ""terrorist""}, {""id"": 1308, ""name"": ""secret identity""}, {""id"": 1437, ""name"": ""burglar""}, {""id"": 3051, ""name"": ""hostage drama""}, {""id"": 3562, ""name"": ""time bomb""}, {""id"": 6969, ""name"": ""gotham city""}, {""id"": 7002, ""name"": ""vigilante""}, {""id"": 9665, ""name"": ""cover-up""}, {""id"": 9715, ""name"": ""superhero""}, {""id"": 9990, ""name"": ""villainess""}, {""id"": 10044, ""name"": ""tragic hero""}, {""id"": 13015, ""name"": ""terrorism""}, {""id"": 14796, ""name"": ""destruction""}, {""id"": 18933, ""name"": ""catwoman""}, {""id"": 156082, ""name"": ""cat burglar""}, {""id"": 156395, ""name"": ""imax""}, {""id"": 173272, ""name"": ""flood""}, {""id"": 179093, ""name"": ""criminal underworld""}, {""id"": 230775, ""name"": ""batman""}]",en,The Dark Knight Rises,"Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect a city that has branded him an enemy.",112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""name"": ""Warner Bros."", ""id"": 6194}, {""name"": ""DC Entertainment"", ""id"": 9993}, {""name"": ""Syncopy"", ""id"": 9996}]","[{""iso_3166_1"": ""US"", ""name"": ""United States of America""}]",2012-07-16,1084939099,165,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106                                                                                                                                                                                                                                                                                                                                                                            
260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""name"": ""Adventure""}, {""id"": 878, ""name"": ""Science Fiction""}]",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"": 839, ""name"": ""mars""}, {""id"": 1456, ""name"": ""medallion""}, {""id"": 3801, ""name"": ""space travel""}, {""id"": 7376, ""name"": ""princess""}, {""id"": 9951, ""name"": ""alien""}, {""id"": 10028, ""name"": ""steampunk""}, {""id"": 10539, ""name"": ""martian""}, {""id"": 10685, ""name"": ""escape""}, {""id"": 161511, ""name"": ""edgar rice burroughs""}, {""id"": 163252, ""name"": ""alien race""}, {""id"": 179102, ""name"": ""superhuman strength""}, {""id"": 190320, ""name"": ""mars civilization""}, {""id"": 195446, ""name"": ""sword and planet""}, {""id"": 207928, ""name"": ""19th century""}, {""id"": 209714, ""name"": ""3d""}]",en,John Carter,"John Carter is a war-weary, former military captain who's inexplicably transported to the mysterious and exotic planet of Barsoom (Mars) and reluctantly becomes embroiled in an epic conflict. It's a world on the brink of collapse, and Carter rediscovers his humanity when he realizes the survival of Barsoom and its people rests in his hands.",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States of America""}]",2012-03-07,284139100,132,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124                                                                                                                                                                                                                                                                                                                                                                            
258000000,"[{""id"": 14, ""name"": ""Fantasy""}, {""id"": 28, ""name"": ""Action""}, {""id"": 12, ""name"": ""Adventure""}]",http://www.sonypictures.com/movies/spider-man3/,559,"[{""id"": 851, ""name"": ""dual identity""}, {""id"": 1453, ""name"": ""amnesia""}, {""id"": 1965, ""name"": ""sandstorm""}, {""id"": 2038, ""name"": ""love of one's life""}, {""id"": 3446, ""name"": ""forgiveness""}, {""id"": 3986, ""name"": ""spider""}, {""id"": 4391, ""name"": ""wretch""}, {""id"": 4959, ""name"": ""death of a friend""}, {""id"": 5776, ""name"": ""egomania""}, {""id"": 5789, ""name"": ""sand""}, {""id"": 5857, ""name"": ""narcism""}, {""id"": 6062, ""name"": ""hostility""}, {""id"": 8828, ""name"": ""marvel comic""}, {""id"": 9663, ""name"": ""sequel""}, {""id"": 9715, ""name"": ""superhero""}, {""id"": 9748, ""name"": ""revenge""}]",en,Spider-Man 3,"The seemingly invincible Spider-Man goes up against an all-new crop of villain – including the shape-shifting Sandman. While Spider-Man’s superpowers are altered by an alien organism, his alter ego, Peter Parker, deals with nemesis Eddie Brock and also gets caught up in a love triangle.",115.699814,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""name"": ""Laura Ziskin Productions"", ""id"": 326}, {""name"": ""Marvel Enterprises"", ""id"": 19551}]","[{""iso_3166_1"": ""US"", ""name"": ""United States of America""}]",2007-05-01,890871626,139,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""}]",Released,The battle within.,Spider-Man 3,5.9,3576                                                                                                                                                                                                                                                                                                                                                                            
260000000,"[{""id"": 16, ""name"": ""Animation""}, {""id"": 10751, ""name"": ""Family""}]",http://disney.go.com/disneypictures/tangled/,38757,"[{""id"": 1562, ""name"": ""hostage""}, {""id"": 2343, ""name"": ""magic""}, {""id"": 2673, ""name"": ""horse""}, {""id"": 3205, ""name"": ""fairy tale""}, {""id"": 4344, ""name"": ""musical""}, {""id"": 7376, ""name"": ""princess""}, {""id"": 10336, ""name"": ""animation""}, {""id"": 33787, ""name"": ""tower""}, {""id"": 155658, ""name"": ""blonde woman""}, {""id"": 162219, ""name"": ""selfishness""}, {""id"": 163545, ""name"": ""healing power""}, {""id"": 179411, ""name"": ""based on fairy tale""}, {""id"": 179431, ""name"": ""duringcreditsstinger""}, {""id"": 215258, ""name"": ""healing gift""}, {""id"": 234183, ""name"": ""animal sidekick""}]",en,Tangled,"When the kingdom's most wanted-and most charming-bandit Flynn Rider hides out in a mysterious tower, he's taken hostage by Rapunzel, a beautiful and feisty tower-bound teen with 70 feet of magical, golden hair. Flynn's curious captor, who's looking for her ticket out of the tower where she's been locked away for years, strikes a deal with the handsome thief and the unlikely duo sets off on an action-packed escapade, complete with a super-cop horse, an over-protective chameleon and a gruff gang of pub thugs.",48.681969,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""name"": ""Walt Disney Animation Studios"", ""id"": 6125}]","[{""iso_3166_1"": ""US"", ""name"": ""United States of America""}]",2010-11-24,591794936,100,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,They're taking adventure to new lengths.,Tangled,7.4,3330                                                                                                                                                                                                                                                                                                                                                                         
280000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""name"": ""Adventure""}, {""id"": 878, ""name"": ""Science Fiction""}]",http://marvel.com/movies/movie/193/avengers_age_of_ultron,99861,"[{""id"": 8828, ""name"": ""marvel comic""}, {""id"": 9663, ""name"": ""sequel""}, {""id"": 9715, ""name"": ""superhero""}, {""id"": 9717, ""name"": ""based on comic book""}, {""id"": 10629, ""name"": ""vision""}, {""id"": 155030, ""name"": ""superhero team""}, {""id"": 179431, ""name"": ""duringcreditsstinger""}, {""id"": 180547, ""name"": ""marvel cinematic universe""}, {""id"": 209714, ""name"": ""3d""}]",en,Avengers: Age of Ultron,"When Tony Stark tries to jumpstart a dormant peacekeeping program, things go awry and Earth’s Mightiest Heroes are put to the ultimate test as the fate of the planet hangs in the balance. As the villainous Ultron emerges, it is up to The Avengers to stop him from enacting his terrible plans, and soon uneasy alliances and unexpected action pave the way for an epic and unique global adventure.",134.279229,"[{""name"": ""Marvel Studios"", ""id"": 420}, {""name"": ""Prime Focus"", ""id"": 15357}, {""name"": ""Revolution Sun Studios"", ""id"": 76043}]","[{""iso_3166_1"": ""US"", ""name"": ""United States of America""}]",2015-04-22,1405403694,141,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,A New Age Has Come.,Avengers: Age of Ultron,7.3,6767                                                                                                                                                                                                                                                                                                                                                                          
250000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""name"": ""Fantasy""}, {""id"": 10751, ""name"": ""Family""}]",http://harrypotter.warnerbros.com/harrypotterandthehalf-bloodprince/dvd/index.html,767,"[{""id"": 616, ""name"": ""witch""}, {""id"": 2343, ""name"": ""magic""}, {""id"": 3872, ""name"": ""broom""}, {""id"": 3884, ""name"": ""school of witchcraft""}, {""id"": 6333, ""name"": ""wizardry""}, {""id"": 10164, ""name"": ""apparition""}, {""id"": 10791, ""name"": ""teenage crush""}, {""id"": 12564, ""name"": ""werewolf""}]",en,Harry Potter and the Half-Blood Prince,"As Harry begins his sixth year at Hogwarts, he discovers an old book marked as 'Property of the Half-Blood Prince', and begins to learn more about Lord Voldemort's dark past.",98.885637,"[{""name"": ""Warner Bros."", ""id"": 6194}, {""name"": ""Heyday Films"", ""id"": 7364}]","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""}, {""iso_3166_1"": ""US"", ""name"": ""United States of America""}]",2009-07-07,933959197,153,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Dark Secrets Revealed,Harry Potter and the Half-Blood Prince,7.4,5293                                                                                                                                                                                                                                                                                                                                                                           

So this is my code, which prints a new file with many zeros without an understandable pattern (or so it seems to me).

fid = fopen('original_filepath', 'r');
fout = fopen('new_filepath', 'w+');

tline = fgetl(fid);
while ~feof(fid)    #here at first I used ischar but
                    #it returned an invalid stream number for the ending tline
     dollars = strread(tline, '%f', 'delimiter', ',');
     budget = dollars(1);
     revenue = dollars(2);
     if budget = 0 || revenue = 0
         fprintf(fout, '%s\n', tline);
     end
  tline = fgetl(fid);
end

fclose(fid);
fclose(fout);

I know that strread is not recommended, but textscan results more problematic for the content parting. Or maybe I am too obsessed having something like dollars(k), that I find really handy.

Prn
  • 13
  • 5
  • 1
    Since someone needs to register to download the CSV it's very unlikely that someone will help here. Why aren't you using csvread or dlmread? There ale also many import functions in [IO package](https://octave.sourceforge.io/io/overview.html). If you want someone to actively help you, you should create a small input CSV with a few lines which shows your problem. See also [MCVE] – Andy Oct 07 '17 at 18:03
  • Thank for your answer. It's possible to see a preview of the database, so I thought it was simpler looking on the Kaggle website. I will update my question. I did not know (and I did not found anything about) dlmread, I will see the documentation. – Prn Oct 08 '17 at 14:22
  • A screenshot of the CSV is not the best option because it doesn't allow to write code to load it. I see that this CSV contains JSON data so you're also interested to decode these using jsonlab or rapidjson-octave? – Andy Oct 08 '17 at 17:05
  • Thanks, in my experience with Octave I didn't know it had a decoder of JSON data. Indeed before this work I only used csv files without JSON data and my initial intent concerned only the cleaning of the matrix, but yes, I would be very interested to try new tools. However after my attempts trying to work on this database and the answers to this post I'm thinking that my main need is to deepen my theoretical knowledge of Octave, because this quite simple real-case has been more problematic than I imagined. – Prn Oct 08 '17 at 18:02
  • The main problem is, that you don't upload the CSV (the first 10 lines would be sufficient). So anyone who might help you looks at this question and thinks "What? I have to register on some site to help someone on SO? No way" – Andy Oct 08 '17 at 18:10
  • Hi, I've updated the question, adding a sample of the matrix. I wish the formatting is appropriate. – Prn Oct 09 '17 at 12:10
  • It looks like you've removed the delimiter. From your code example it looks like the CSV delimiter is "," but you pasted snipped comes withcout delimiter (likely a copy&paste problem form Excel or something? I can only guess that the "," delimiter, which is also in the JSON data causes the troubles you see – Andy Oct 09 '17 at 12:36
  • You're right, I've formatted the sample like in Excel. Is it better? – Prn Oct 09 '17 at 13:33

2 Answers2

1

There were multiple things wrong in your code. Please try this (untested) code and step into the line

fprintf(fout, '%s\n', [num2str(budget), ',', num2str(revenue)]);

to see whether the correct values are read from source and written to destination. Please update your question, if you keep having problems.

fid = fopen('original_filepath', 'r');
fout = fopen('new_filepath', 'w+');


while ~feof(fid)    #here at first I used ischar but
                    #it returned an invalid stream number for the ending tline

    tline = fgetl(fid); % read current line
    dollars = strread(tline, '%f', 'delimiter', ',');
    budget = dollars(1);
    revenue = dollars(2);
    if ~(budget*revenue == 0)  % ensure neither budget nor revenue are zero
        fprintf(fout, '%s\n', [num2str(budget), ',', num2str(revenue)]);
    end


    tline = fgetl(fid);
end

fclose(fid);
fclose(fout);

Please also consider using csvread() instead of strread as your data seems to be in csv format.

Marcus
  • 450
  • 4
  • 14
  • Thank you. I tried to run the code with your edit. It keeps only budget/revenue values different from zero, but messing the sequence. Besides looking to the data I think it trims a little too much lines. I thought about csvread but my file has also not-numeric values. However I get the importance of the preview, as Andy pointed out, and I will edit my post. I would be grateful if you can show me (even briefly) which are my other errors. – Prn Oct 08 '17 at 14:58
0

Using octave-forge io package:

pkg load io
c = csv2cell ("prn.csv");

header = c(1, :);
c(1, :) = []; % strip header

% get column ids
budget_col = find (strcmp (header, "budget"))
revenue_col = find (strcmp (header, "revenue"))

budget = cell2mat (c(:, budget_col));
revenue = cell2mat (c(:, revenue_col));

% remove line where budget or revenue are zero
c (budget == 0 | revenue == 0, :) = [];

% show remaining homepage
c (:, 3)

gives

ans = 
{
  [1,1] = http://www.avatarmovie.com/
  [2,1] = http://disney.go.com/disneypictures/pirates/
  [3,1] = http://www.sonypictures.com/movies/spectre/
  [4,1] = http://www.thedarkknightrises.com/
  [5,1] = http://movies.disney.com/john-carter
  [6,1] = http://www.sonypictures.com/movies/spider-man3/
  [7,1] = http://disney.go.com/disneypictures/tangled/
  [8,1] = http://marvel.com/movies/movie/193/avengers_age_of_ultron
  [9,1] = http://harrypotter.warnerbros.com/harrypotterandthehalf-bloodprince/dvd/index.html
}

Georg W.
  • 1,292
  • 10
  • 27
Andy
  • 7,931
  • 4
  • 25
  • 45