
After loading our initial facts into the cube, we then load a second file that adds measures to the existing facts (so no new facts are created by the second file). We use a Handler to do this.

When the second file is removed from the filesystem, we would like to remove just the relevant measures from the facts.

Is there a way for us to plug into the Directory/File Watcher mechanism to accomplish this?

obrienk
  • Could you add more information about your loading architecture? Do you use relational stores or the ActivePivot store? It seems that your initial facts are separated from your measures. Are your measures contained in a separate store linked to the store containing the initial facts? Is there any reason why you need to separate the initial facts from your measures? – David May 09 '13 at 16:10
  • Hi David, both files are loaded into a single relational (ActivePivot) store. We receive the data spread across multiple files, so we load the first file to create the facts and then add the extra measures to the existing facts using the second file. Does that answer your question? – obrienk May 13 '13 at 10:51

2 Answers


If we understand correctly, and to simplify the use case, your dataset has two measures, A and B. For the same records, one file brings measure 'A' and another file brings measure 'B', and you want to update or delete the data for measure A or B independently.

There are several ways you can achieve this.

First, you could decouple the measures: instead of records that carry both the A and B fields, you would have two records, each with a generic "value" field and a "measure type" field that distinguishes the two measure types. This design is flexible because you can introduce a new measure 'C' later, itself fed from another file.
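To make the decoupled layout concrete, here is a minimal sketch in plain Java that uses no ActivePivot classes. The `MeasureFact` record, its `measureType` and `sourceFile` fields and the in-memory `DecoupledStore` are hypothetical stand-ins for your schema and store. Because each file only contributes its own records, dropping everything that came from the 'B' file leaves measure 'A' untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoupled fact: one row per (key, measure type)
// instead of one row carrying both A and B.
record MeasureFact(String key, String measureType, double value, String sourceFile) {}

class DecoupledStore {
    // In-memory list standing in for the ActivePivot store (assumption for illustration).
    private final List<MeasureFact> facts = new ArrayList<>();

    void add(MeasureFact fact) {
        facts.add(fact);
    }

    // Removes every fact contributed by the given file; facts from other files are kept.
    int removeBySourceFile(String sourceFile) {
        int before = facts.size();
        facts.removeIf(f -> f.sourceFile().equals(sourceFile));
        return before - facts.size();
    }

    // Sums the values of one measure type across all keys.
    double total(String measureType) {
        return facts.stream()
                .filter(f -> f.measureType().equals(measureType))
                .mapToDouble(MeasureFact::value)
                .sum();
    }

    int size() {
        return facts.size();
    }
}
```

Introducing measure 'C' later only means loading records with a new `measureType` value; the record shape stays the same.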

The most elegant option is probably to use the ActivePivot Distributed Architecture with Polymorphic Distribution. You would set up two independent cubes, one holding only measure 'A' and another holding measure 'B'. You then join the cubes with polymorphic distribution: ActivePivot merges them on the fly and presents both measures as if they belonged to the same (virtual) cube.

Finally, the quick and dirty solution: configure your measures as 'nullable' fields in ActivePivot. This way, when you want to erase measure 'A', you actually write 'null' to the 'A' fields of your records.
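A minimal sketch of the nullable approach, again in plain Java rather than ActivePivot's API: `NullableFact` and `NullableStore` are hypothetical. The measures are boxed `Double` fields so they can hold null, and erasing measure 'A' writes null into that field while measure 'B' keeps its value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical fact with both measures as nullable fields (Double, not double).
final class NullableFact {
    final String key;
    Double a; // measure 'A', null when absent
    Double b; // measure 'B', null when absent

    NullableFact(String key, Double a, Double b) {
        this.key = key;
        this.a = a;
        this.b = b;
    }
}

class NullableStore {
    // Map keyed by fact key, standing in for the ActivePivot store (assumption).
    final Map<String, NullableFact> facts = new HashMap<>();

    void put(NullableFact fact) {
        facts.put(fact.key, fact);
    }

    // "Erases" measure A by writing null into the A field of each listed record;
    // unknown keys are ignored.
    void clearA(Iterable<String> keys) {
        for (String key : keys) {
            NullableFact fact = facts.get(key);
            if (fact != null) {
                fact.a = null;
            }
        }
    }
}
```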

Antoine CHAMBILLE
  • Thanks Antoine, but our problem isn't how to store the empty values, it's how to detect when the file is deleted. Since the measures are added inside a Handler, they are not automatically removed when the file is deleted. We're extending DefaultTransactionHandler, but we don't get any call to the submitStore() method on file deletion. Perhaps we're missing something in our configuration. (To be clear, we do get a call to submitStore() when we add a file.) – obrienk May 14 '13 at 15:24
  • You cannot really trigger the deletion of data in ActivePivot when you notice that a file has been deleted: at that point the file no longer exists, so you cannot read it to locate the keys of the records you want to delete. ActivePivot users generally copy the file into a 'deleted' directory, where it is detected by another file watcher and pushed by the CSV Source into a special transaction handler that performs the deletions. – Antoine CHAMBILLE May 14 '13 at 15:27
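The 'deleted' directory convention can be sketched with java.nio.file alone. The sketch moves the file, which amounts to copying it into the 'deleted' directory and removing the original. `DeletedDirectoryMover`, `retire` and the directory layout are hypothetical, and the second file watcher plus the deleting transaction handler still have to be configured on the ActivePivot side.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class DeletedDirectoryMover {
    // Instead of deleting a data file, move it into a 'deleted' directory that a
    // second file watcher monitors. The directory name is an assumption.
    static Path retire(Path dataFile, Path deletedDir) throws IOException {
        Files.createDirectories(deletedDir);
        Path target = deletedDir.resolve(dataFile.getFileName());
        // REPLACE_EXISTING lets the same file name be retired more than once.
        return Files.move(dataFile, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Because the file still exists in the 'deleted' directory, the deletion handler can read it and find the keys whose measures must be removed.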

You could extend

.CSVSource.onFileAction(IFileWatcher watcher, Collection<String> added, Collection<String> modified, Collection<String> deleted)

In your override, call super.onFileAction(...) so that the added and modified files are processed as usual, then add your own logic to handle the deleted files.
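The override pattern can be sketched as below. To keep it self-contained it does not compile against ActivePivot: `CSVSourceStandIn` and `FileWatcherStandIn` are hypothetical stand-ins that only mirror the `onFileAction` signature quoted above, and `onFileDeleted` is a hypothetical hook where your deletion logic would go.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for IFileWatcher; NOT the real ActivePivot type.
interface FileWatcherStandIn {}

// Hypothetical stand-in for CSVSource: mimics the base behaviour described in
// the answer by "processing" added and modified files and ignoring deleted ones.
class CSVSourceStandIn {
    final List<String> processed = new ArrayList<>();

    public void onFileAction(FileWatcherStandIn watcher, Collection<String> added,
                             Collection<String> modified, Collection<String> deleted) {
        processed.addAll(added);
        processed.addAll(modified);
    }
}

class DeletionAwareSource extends CSVSourceStandIn {
    final List<String> deletedHandled = new ArrayList<>();

    @Override
    public void onFileAction(FileWatcherStandIn watcher, Collection<String> added,
                             Collection<String> modified, Collection<String> deleted) {
        // Let the base class process added and modified files as usual.
        super.onFileAction(watcher, added, modified, deleted);
        // Extra logic for deleted files.
        for (String path : deleted) {
            onFileDeleted(path);
        }
    }

    // Hypothetical hook: a real implementation would update the facts loaded from this path.
    void onFileDeleted(String path) {
        deletedHandled.add(path);
    }
}
```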

To handle the deleted files, you can update the facts that were contributed by a deleted file, identifying them through a field that records their source file path. Such a field can be filled automatically by adding the FILEPATH metadata to your LoadInstructions.csv file:

Format,FilePattern,FilePath,MetaData
FormatName,formatRegex.csv,someFolder,FILEPATH=N/A

and having a field like:

<field name="FILEPATH" type="string" indexation="dictionary" nullable="true" defaultValue="N/A" />
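Once each fact carries its FILEPATH, the deleted-file logic reduces to matching paths. A minimal sketch, assuming a hypothetical `TrackedFact` type with a nullable measure and the FILEPATH field declared above: facts whose path matches a deleted file get their measure cleared and FILEPATH reset to its 'N/A' default, while facts from other files are left alone.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical fact carrying the FILEPATH field declared above (default "N/A").
final class TrackedFact {
    final String key;
    Double measureB; // nullable measure contributed by the second file
    String filePath; // filled from the FILEPATH metadata when the file is loaded

    TrackedFact(String key, Double measureB, String filePath) {
        this.key = key;
        this.measureB = measureB;
        this.filePath = filePath;
    }
}

class FilePathCleanup {
    static final String DEFAULT_PATH = "N/A";

    // Clears the measure on every fact whose FILEPATH matches a deleted file,
    // then resets FILEPATH to its default. Returns the number of facts updated.
    static int clearDeleted(List<TrackedFact> facts, Collection<String> deletedPaths) {
        Set<String> deleted = new HashSet<>(deletedPaths);
        int updated = 0;
        for (TrackedFact fact : facts) {
            if (deleted.contains(fact.filePath)) {
                fact.measureB = null;
                fact.filePath = DEFAULT_PATH;
                updated++;
            }
        }
        return updated;
    }
}
```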
blacelle