I have a task that requires me to read a CSV file in Java. I have managed to read it, but I don't think I am storing the data in a way that lets me access it in later tasks, such as analyzing some of the data or building a graph. The CSV file has several variables in the header; some are numeric and some are text, which means I need to store them as Integer or String.

Note that I did not use a library such as openCSV to read the file, as I am a beginner and am trying to get familiar with basic Java.

Below is the nycflights13 class, which reads and stores the data. The instructions are to exclude any row that contains the value "NA".

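`    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class nycflights13 {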

        public static void main(String[] args) {
            // TODO Auto-generated method stub
            List<Flights> NYC13 = readFileFromCSV("flights.csv");

            for(Flights a: NYC13) {
                System.out.println(a);
            }
        }

        public static List<Flights> readFileFromCSV (String fileName){
            List<Flights> flightData = new ArrayList <> (); 
            Path pathToFile = Paths.get(fileName);

            try(BufferedReader br = Files.newBufferedReader(pathToFile,
                    StandardCharsets.US_ASCII)){
                br.readLine();
                String line = br.readLine();

                while (line != null) {
                    String [] variable = line.split(",");

                    //convert string array to list
                    List<String> list = Arrays.asList(variable);
                    if(list.contains("NA")) { //Do not take in rows containing "NA"
                        break;
                    } else {
                        Flights dataset = createFlights(variable);
                        flightData.add(dataset);
                    }
                    line = br.readLine();
                }
            }catch (IOException ioe) {
                ioe.printStackTrace();
            }

            return flightData;
        }



        private static Flights createFlights (String [] metadata) {
            int year = Integer.parseInt(metadata[1]); //convert string into int
            int month = Integer.parseInt(metadata[2]); //convert string into int
            int day = Integer.parseInt(metadata[3]); //convert string into int
            int dep_time = Integer.parseInt(metadata[4]); //convert string into int
            String carrier = metadata[10];
            String flight = metadata[11];
            String origin = metadata[13];
            String dest = metadata[14]; 

            return new Flights(year, month, day, dep_time,carrier, flight, origin, dest);
        }

    }`

Below is my class Flights (I have way more variables than what I showed here):

`    class Flights {
        private int year; 
        private int month; 
        private int day; 
        private int dep_time;
        private String carrier; 
        private String flight; 
        private String origin;
        private String dest; 

        public Flights(int year, int month, int day, int dep_time, String carrier, String flight, String origin, String dest) {
            this.year = year; 
            this.month = month; 
            this.day = day; 
            this.dep_time = dep_time;
            this.carrier = carrier; 
            this.flight = flight; 
            this.origin = origin; 
            this.dest = dest; 
        }

        public int getYear() {return year;}
        public void setYear(int year) {this.year = year;}

        public int getMonth() {return month;}
        public void setMonth(int month) {this.month = month; }

        public int getDay() {return day;}
        public void setDay(int day) {this.day = day; }

        public int getdep_time() {return dep_time;}
        public void setdep_time(int dep_time) {this.dep_time = dep_time; }

        ............
        .............
        ...........


        @Override
        public String toString() {
            return "Flights [year=" + year + ", month=" + month + ", day=" + day
                + ", dep_time=" + dep_time + ", carrier=" + carrier + ", flight=" + flight
                + ", origin=" + origin + ", dest=" + dest
                + ", air_time=" + air_time + ", distance=" + distance
                + ", hour=" + hour + ", minute=" + minute
                + ", time_hour=" + time_hour + "]";
        }
    }`

The above code gives me results like the following:

Flights [year=2013, month=1, day=1, dep_time=926, sched_dep_time=929, dep_delay=-3, arr_time=1404, sched_arr_time=1421, arr_delay=-17, carrier="B6", flight=215, tailnum="N775JB", origin="EWR", dest="SJU", air_time=191, distance=1608, hour=9, minute=29, time_hour=2013-01-01 09:00:00]

Flights [year=2013, month=1, day=1, dep_time=926, sched_dep_time=922, dep_delay=4, arr_time=1221, sched_arr_time=1219, arr_delay=2, carrier="B6", flight=57, tailnum="N534JB", origin="JFK", dest="PBI", air_time=151, distance=1028, hour=9, minute=22, time_hour=2013-01-01 09:00:00]

Flights [year=2013, month=1, day=1, dep_time=926, sched_dep_time=928, dep_delay=-2, arr_time=1233, sched_arr_time=1220, arr_delay=13, carrier="UA", flight=1597, tailnum="N27733", origin="EWR", dest="EGE", air_time=287, distance=1726, hour=9, minute=28, time_hour=2013-01-01 09:00:00]

Flights [year=2013, month=1, day=1, dep_time=927, sched_dep_time=930, dep_delay=-3, arr_time=1231, sched_arr_time=1257, arr_delay=-26, carrier="DL", flight=1335, tailnum="N951DL", origin="LGA", dest="RSW", air_time=166, distance=1080, hour=9, minute=30, time_hour=2013-01-01 09:00:00]

I have a few questions:

  1. My CSV data actually contains more than 300k rows, but with the code above I only manage to print about 280 lines. Is something wrong with the code, or is there an upper limit on the number of lines Eclipse prints?

  2. How can I access a particular variable from the List<Flights>, such as carrier or month, to calculate the total count per carrier or the frequency of each month?

  3. What is the correct way to store data with multiple variables so that I can access it from another class? Alternatively, how can I improve my current code?

I appreciate your feedback and time. Thanks a million.

boonboon93
  • How do you count the number of lines? – Thorbjørn Ravn Andersen Apr 12 '20 at 14:11
  • @ThorbjørnRavnAndersen You mean how I know I only printed 280 lines? I copied the results from the console and pasted them in excel to count the lines. – boonboon93 Apr 12 '20 at 14:15
  • By default the console has a limited size, so it may not hold the complete output. Add a line counter to the output instead. – Thorbjørn Ravn Andersen Apr 12 '20 at 14:27
  • @ThorbjørnRavnAndersen I tried printing `System.out.print(NYC13.size())` and only got 471, which is far off from the 300k rows of data I should have. I also used count++ to count the lines read and got 471 as well. Could you advise which part went wrong? Thanks a lot! – boonboon93 Apr 12 '20 at 14:48

1 Answer

Answering your queries:

  1. If you did not get any error or exception while executing the code, you need not worry. Eclipse has a default console buffer size, which is limited. See https://javarevisited.blogspot.com/2013/03/how-to-increase-console-buffer-size-in.html

  2. Now that you have read the data, you could go ahead and save it in a database. Once the data is in a database, you can run all sorts of queries to get the records that satisfy your conditions.

  3. I did not understand what you mean by 'ways to store data with multiple variables'. Could you clarify?
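
On query 1, one more thing may be worth checking, since the comments report that `NYC13.size()` is 471. In the posted `readFileFromCSV`, the "NA" branch calls `break`, which leaves the `while` loop at the first row containing "NA" instead of skipping only that row. The following is a minimal sketch of skipping such rows, not a drop-in replacement: the class name `SkipNaRows`, the method `readRows`, and the sample data are made up for illustration, and it reads from a `StringReader` rather than `flights.csv`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkipNaRows {

    // Discards the header line, then keeps every row that has no "NA" field.
    static List<String[]> readRows(BufferedReader br) {
        List<String[]> rows = new ArrayList<>();
        try {
            br.readLine(); // discard the header line
            String line;
            while ((line = br.readLine()) != null) {
                String[] fields = line.split(",");
                if (Arrays.asList(fields).contains("NA")) {
                    continue; // skip this row only; 'break' would end the whole loop
                }
                rows.add(fields);
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for flights.csv
        String csv = "id,year,month,carrier\n"
                   + "1,2013,1,UA\n"
                   + "2,2013,1,NA\n"
                   + "3,2013,2,B6\n";
        List<String[]> rows = readRows(new BufferedReader(new StringReader(csv)));
        System.out.println(rows.size()); // prints 2
    }
}
```

Note that the read happens in the loop condition here. In the posted loop, simply swapping `break` for `continue` would skip the `line = br.readLine()` at the bottom and process the same row forever; leaving the `if` branch empty, or inverting the condition, avoids that.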

  • Thanks for your feedback. Question #3 is actually related to #2, in that I want to access certain variables later. I would need some advice on how to save them in a database, or could you kindly point me to some reading material on this? I have no idea how to start. – boonboon93 Apr 12 '20 at 14:38
  • @boonboon93 Try https://www.codejava.net/java-se/jdbc/connect-to-mysql-database-via-jdbc You will find several articles; just try googling 'connect standalone java to mysql eclipse' – Trishul Singh Choudhary Apr 12 '20 at 14:47
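
For counts like those in question 2, the list already held in memory may be enough before reaching for a database. Below is a rough sketch, assuming the `Flights` class exposes getters such as `getMonth()` (shown in the question) and `getCarrier()` (assumed to be among the elided ones). The `Flight` class is a cut-down stand-in, and `FlightCounts`, its method names, and the sample rows are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FlightCounts {

    // Cut-down stand-in for the Flights class, with only the two fields used here
    static class Flight {
        private final int month;
        private final String carrier;

        Flight(int month, String carrier) {
            this.month = month;
            this.carrier = carrier;
        }

        int getMonth() { return month; }
        String getCarrier() { return carrier; }
    }

    // Number of flights per carrier, sorted by carrier code
    static Map<String, Integer> countByCarrier(List<Flight> flights) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Flight f : flights) {
            counts.merge(f.getCarrier(), 1, Integer::sum); // start at 1, otherwise add 1
        }
        return counts;
    }

    // Number of flights per month, sorted by month number
    static Map<Integer, Integer> countByMonth(List<Flight> flights) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Flight f : flights) {
            counts.merge(f.getMonth(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up rows; in the question these would come from readFileFromCSV
        List<Flight> flights = Arrays.asList(
            new Flight(1, "B6"), new Flight(1, "UA"),
            new Flight(2, "B6"), new Flight(1, "DL"));

        System.out.println(countByCarrier(flights)); // prints {B6=2, DL=1, UA=1}
        System.out.println(countByMonth(flights));   // prints {1=3, 2=1}
    }
}
```

Because the list is passed around as an ordinary `List`, any other class can receive it as a method argument and call the getters in the same way.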