The Flume Spooling Directory Source cannot delete ignored files. Its deletePolicy setting (immediate or never) applies only to files it has actually processed.
There are three ways to solve this problem.
First, you can fix the problem explicitly, outside of Flume, with a shell script or any other small program that finds the files matching the ignored pattern and deletes them. In my opinion this is not a good way to do it.
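A minimal sketch of such a small program in Java follows. The class name IgnoredFileCleaner and its command-line arguments are made up for illustration, and it assumes the pattern has to match the whole file name, which is how I understand the source's own ignore check to work.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Flume: removes files whose names match the ignore pattern. */
final class IgnoredFileCleaner {

    /** Deletes matching regular files directly inside dir and returns how many were deleted. */
    static int deleteIgnored(Path dir, Pattern ignored) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                // Assumption: the whole file name must match, so matches() is used rather than find()
                if (Files.isRegularFile(entry) && ignored.matcher(name).matches()) {
                    // deleteIfExists tolerates a file that disappeared in the meantime
                    if (Files.deleteIfExists(entry)) {
                        deleted++;
                    }
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = spool directory, args[1] = the same regex you give to the source
        Path dir = Paths.get(args[0]);
        Pattern ignored = Pattern.compile(args[1]);
        System.out.println("Deleted " + deleteIgnored(dir, ignored) + " ignored file(s)");
    }
}
```

You could run something like this from cron against the spool directory. It only looks at the top level of the directory and never touches files that do not match the pattern.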
Second, you can write your own custom spooling directory source by implementing the Flume Source interface. That takes a lot of effort and is a hard challenge for such a small problem.
Third, as a workaround that arguably abuses the feature, you can use a Morphline Interceptor. The Morphline Interceptor is described in the Flume User Guide. You may also want to take a look at the Morphline Reference.
As you know, interceptors receive each event from the source, do some processing, and finally forward it to the channel.
If you choose the third solution, you need the Kite SDK. Add Cloudera's Kite Morphlines Core dependency to your FLUME_CLASSPATH in flume-env.sh, or simply copy the jar into $APACHE_FLUME_HOME/lib.
With this solution, an example Flume configuration looks like this:
a1.channels = ch-1
a1.sources = src-1
a1.sinks = k1
a1.sources.src-1.interceptors = morph
a1.sources.src-1.type = spooldir
a1.sources.src-1.channels = ch-1
a1.sources.src-1.spoolDir = /spool/dir
a1.sources.src-1.fileHeader = true
a1.sources.src-1.ignorePattern = whatever
a1.sources.src-1.interceptors.morph.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
a1.sources.src-1.interceptors.morph.morphlineFile = /etc/flume-ng/conf/morphline.conf
a1.sources.src-1.interceptors.morph.morphlineId = morphline1
a1.sinks.k1.type = file_roll
a1.sinks.k1.channel = ch-1
a1.sinks.k1.sink.directory = /roll/dir
Then create the morphline configuration file, for example $APACHE_FLUME_HOME/conf/morphline.conf. Its path must match the morphlineFile property in the Flume configuration.
In this file you can do whatever processing you want. Just make sure the record object is passed on to the child command at the end.
This is not a good solution either, but it lets you run your own Java code inside Flume's transactions. On each event you can check the directory and delete any file you do not need. (Make sure the user running the Java process has the required permissions on this directory.)
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**"]
    commands : [
      {
        readJson { }
      }
      {
        java {
          imports : """
            import java.io.File;
            import java.io.IOException;
          """
          code : """
            try {
              // This code is from my Flume agent; you may want to use it, but it is not necessary
              // JsonNode rootNode = (JsonNode) record.getFirstValue(Fields.ATTACHMENT_BODY);
              // You can traverse the relevant directory here,
              // find the files matching the ignored pattern manually,
              // and then delete them with Java code
              // Second part of my code
              // String rootNodeStr = rootNode.toString();
              // record.put("rootNodeStr", rootNodeStr.getBytes(StandardCharsets.UTF_8));
            } catch (Exception e) {
              // Exception rather than IOException, so this compiles while the body is still commented out
              logger.error("So sad", e);
            }
            // Always hand the record on to the next command
            return child.process(record);
          """
        }
      }
      {
        setValues {
          # rootNodeStr exists only if the commented-out code above is enabled
          _attachment_body : "@{rootNodeStr}"
        }
      }
    ]
  }
]
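The "traverse the directory and delete" step described in the comments of the java command could look roughly like the sketch below. It is written as a standalone class so it can be tried outside Flume. SpoolDirSweeper and sweep are hypothetical names, and it assumes that, with fileHeader = true, the absolute path of the file being read is available to you as a string (under the default header key file), so the spool directory can be taken from its parent.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: per-event sweep of the directory that contains the current file. */
final class SpoolDirSweeper {

    /**
     * fileHeader: absolute path of the file the event came from (assumed to come from the "file" header).
     * Returns the number of sibling files deleted because their names match the ignored pattern.
     */
    static int sweep(String fileHeader, Pattern ignored) throws IOException {
        if (fileHeader == null || fileHeader.isEmpty()) {
            return 0; // no header, nothing to do
        }
        Path spoolDir = Paths.get(fileHeader).toAbsolutePath().getParent();
        if (spoolDir == null || !Files.isDirectory(spoolDir)) {
            return 0;
        }
        List<Path> victims;
        // Collect first, then delete, so the directory listing is closed before files are removed
        try (Stream<Path> entries = Files.list(spoolDir)) {
            victims = entries
                .filter(Files::isRegularFile)
                .filter(p -> ignored.matcher(p.getFileName().toString()).matches())
                .collect(Collectors.toList());
        }
        int deleted = 0;
        for (Path victim : victims) {
            if (Files.deleteIfExists(victim)) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

Keep in mind that this lists the directory on every event, which gets expensive for large spool directories, so you may want to run it only occasionally.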
I hope this helps.