1

I am a little stuck here and need some advice for how best to solve my problem: I have a CSV file with a lot of sales data in it(402,000+ lines). When there are no sales for a certain product in a store, there is no data.

Therefore I want to find those periods where there are no sales for a product in a store > 3 days and add them to a watchlist(file/list).

The unique identifier is the SKU + storename combined. So for example the SKU+storename would be 801Whangarei PAK n SAVE. And its missing sales from the 5/03/2019 to 7/03/2019 and I would want that added to a list.

The data looks as below: Sales data file image

public static void main(String[] args) throws IOException {

String line;
String[] elements;
String storename;
String sku;
String date;

    BufferedReader order_lines = new BufferedReader(new FileReader("order_lines_file.csv"));
    int counter = 0;
    while ((line = order_lines.readLine()) != null){
        counter++;
        elements = line.split(",");
            date = elements[5].trim();
            storename = elements[2].trim();
            sku = elements[1].trim();
            String key = storename + sku;
            System.out.println(key);
    }
    System.out.println("End Of File Reached");  
}

}

I have really only got as far as reading the file as I know I will just end up wasting a long time trying to figure this out.

Pseudocode would look something like this:

FOREACH (Unique( sku + storename) IN Sales data){ 
if((current date - next date) > 3 days){
 list.add(dates between current date & next date ) 
}
else{ next date = current date 
}

Thanks in advance for your help! Let me know if you need any more information.

  • Are you open/able to use a database here? – Tim Biegeleisen Mar 18 '19 at 01:51
  • Not at the moment - However, I am looking to store this sales data file in an SQL database sometime in the near future. – WillEyedowin Mar 18 '19 at 01:52
  • Well you are asking for an operation which is fairly well suited to a database. So, that would be my preferred way of answering your question. – Tim Biegeleisen Mar 18 '19 at 01:58
  • Are you able to explain briefly how this could be solved with a database? Might help me make the transition if its rather trivial. – WillEyedowin Mar 18 '19 at 02:00
  • One general database strategy would be left joining a calendar table to a table having your data. This would easily reveal the missing dates. – Tim Biegeleisen Mar 18 '19 at 02:07
  • @WillEyedowin, can you please elaborate the conditions you want ? – Sachith Dickwella Mar 18 '19 at 02:50
  • @SachithDickwella I want a list that shows where there is sales data missing for a SKU in a Region on certain dates. In the sales data file, there is no data(a gap) for when a SKU in a Region does not sell anything. So, in my head you could do it a few ways: Filter through the CSV while looking at one SKU + Region and finding what the next date that product sells. If that date is 3 days or more different, then add those dates to a list and move on. Or you create a unique list of sales for each SKU+ region and sort it by date then calculate differences between each row. – WillEyedowin Mar 18 '19 at 03:17
  • @SachithDickwella in the photo I linked - there is one example of this(Theres likely thousands). The bottom 2 lines show that this SKU in this Region has no sales data between the 4/03/2019 to 8/03/2019. I would add: 801Whangarei PAKnSAVE 5/03/2019, 801Whangarei PAKnSAVE 6/03/2019,801Whangarei PAKnSAVE 7/03/2019 all to a list as they have no sales data. Hope that helps? – WillEyedowin Mar 18 '19 at 03:20
  • Heres some pseudo code: FOREACH (CONCAT(SKU,Region) IN Sales data){ If((current date - next date) > 3 days){ list.add(dates between current date & next date ) else{ next date = current date } – WillEyedowin Mar 18 '19 at 03:21

1 Answers1

0

I solved this by just sorting the data by SKU + storename then by date - this means the next line will hold the next available sales data for that SKU+storename. If the next date is more than 3 days from the current line date then add that to a seperate file.

Here is the question I asked which helped me: JAVA Sort CSV file by columns and then by a date column correctly