File — provides access to file systems
The File component provides access to file systems, allowing files to be processed by any other Apache Camel components or messages from other components to be saved to disk.
The URI format for a file endpoint is one of:
file:directoryName[?options]
file://directoryName[?options]
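To make the URI shape concrete, here is a minimal sketch in plain Java that assembles a file endpoint URI from a directory name and a set of options. The class `FileUriSketch` and its `fileUri` method are hypothetical helpers written for this illustration, not part of Apache Camel, and the directory names are made up. Joining multiple options with `&` follows the usual Camel URI query convention.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not a Camel API) that builds a file endpoint URI string. */
final class FileUriSketch {

    /** Returns file:directoryName, followed by ?name=value&... when options are given. */
    static String fileUri(String directoryName, Map<String, String> options) {
        StringBuilder uri = new StringBuilder("file:").append(directoryName);
        if (!options.isEmpty()) {
            // Options start with '?' and are separated by '&'.
            StringJoiner query = new StringJoiner("&", "?", "");
            options.forEach((name, value) -> query.add(name + "=" + value));
            uri.append(query);
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("recursive", "true");
        options.put("delete", "true");
        System.out.println(fileUri("data/inbox", options));
        System.out.println(fileUri("data/outbox", new LinkedHashMap<>()));
    }
}
```

The resulting strings could be passed wherever a Camel endpoint URI is expected; the option names used here are taken from the tables in this section.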
Table 7, “Common file options” lists the options that can be set on any file endpoint.
Table 7. Common file options
Name | Default Value | Description |
---|---|---|
autoCreate
| true
| Automatically creates missing directories in the file's pathname. For the file consumer, that means creating the starting directory. For the file producer, it means creating the directory to which files are written. |
bufferSize
| 128kb
| Write buffer size in bytes. |
fileName
| null
| Use an expression, such as one written in the file language, to set the filename dynamically. For consumers, it is used as a filename filter. For producers, it is evaluated to determine the filename to write. If an expression is set, it takes precedence over the CamelFileName header. (Note: The header itself can also be an expression.) The expression options support both String and Expression types. A String is always evaluated using the file language. An Expression is used as specified, which allows you, for instance, to use OGNL expressions. For the consumer, you can use it to filter filenames; for instance, you can consume today's file using the file language syntax: mydata-${date:now:yyyyMMdd}.txt. |
flatten
| false
| Flattens the file name path by stripping any leading paths, so that only the file name remains. This allows you to consume recursively from sub-directories while, for example, writing the files to a single directory. Setting this to true on the producer ensures that any file name received in the CamelFileName header is stripped of leading paths. |
charset
| null
| Specifies the encoding of the file. Apache Camel sets the Exchange.CHARSET_NAME property to this value. |
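The fileName example above, mydata-${date:now:yyyyMMdd}.txt, resolves to a different name each day. The following sketch shows in plain Java which name that expression would correspond to for a given date. It does not call Camel's file language; it assumes the yyyyMMdd pattern letters follow the usual Java date-format conventions, and the class `TodayFileNameSketch` is a hypothetical name used only for this illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Illustration only: mimics the name produced by mydata-${date:now:yyyyMMdd}.txt. */
final class TodayFileNameSketch {

    // Same pattern letters as in the fileName example (assumed Java date-format semantics).
    private static final DateTimeFormatter PATTERN = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Builds the file name a consumer with that fileName option would look for on the given date. */
    static String expectedName(LocalDate date) {
        return "mydata-" + date.format(PATTERN) + ".txt";
    }

    public static void main(String[] args) {
        System.out.println(expectedName(LocalDate.now()));
        System.out.println(expectedName(LocalDate.of(2024, 3, 5))); // mydata-20240305.txt
    }
}
```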
Table 8, “File consumer options” lists the options that can be set on a file consuming endpoint.
Table 8. File consumer options
Name | Default Value | Description |
---|---|---|
initialDelay
| 1000
| Milliseconds before polling the file/directory starts. |
delay
| 500
| Milliseconds before the next poll of the file/directory. |
useFixedDelay
| false
| Set to true to use a fixed delay between polls; otherwise a fixed rate is used. See ScheduledExecutorService in the JDK for details. |
recursive
| false
| If the endpoint is a directory, also looks for files in all of its sub-directories. |
delete
| false
| If true, the file is deleted after it is processed. |
noop
| false
| If true, the file is not moved or deleted in any way. This option is good for read-only data or for ETL-type requirements. If noop=true, Apache Camel sets idempotent=true as well, to avoid consuming the same files over and over again. |
preMove
| null
| Use an expression to dynamically set the filename when moving it before processing. For example, to move in-progress files into the order directory, set this value to order. |
move
| .camel
| Use an expression to dynamically set the filename when moving it after processing. To move files into a .done subdirectory, just enter .done. |
moveFailed
| null
| Use an expression to dynamically set the filename when moving failed files after processing. To move files into an error subdirectory, just enter error. Note: When a failed file is moved to another location, Apache Camel treats the error as handled and does not pick up the file again. |
include
| null
| Includes only files whose filename matches the regex pattern. |
exclude
| null
| Excludes files whose filename matches the regex pattern. |
antInclude
| null
| Ant-style filter inclusion. For example, antInclude=**/*.txt. You can use a comma-delimited format to specify multiple inclusions. This option is also available in the FTP component. |
antExclude
| null
| Ant-style filter exclusion. For example, antExclude=**/*.txt. antExclude takes precedence over antInclude when both are used. You can use a comma-delimited format to specify multiple exclusions. This option is also available in the FTP component. |
idempotent
| false
| Option to use the Idempotent Consumer EIP pattern
to let Apache Camel skip already processed files. Will by default use a memory based LRUCache
that holds 1000 entries. If noop=true then idempotent will be enabled
as well to avoid consuming the same files over and over again. |
idempotentRepository
| null
| Pluggable repository as an org.apache.camel.processor.idempotent.MessageIdRepository class. By default, MemoryMessageIdRepository is used if none is specified and idempotent is true. |
inProgressRepository
| memory
| Pluggable in-progress repository as an org.apache.camel.processor.idempotent.MessageIdRepository class. The in-progress repository is used to track the files that are currently being consumed. By default, a memory-based repository is used. |
filter
| null
| Pluggable filter as an org.apache.camel.component.file.GenericFileFilter class. As of Apache Camel 2.10, you can also filter directories. This option is also available in the FTP component. |
sorter
| null
| Pluggable sorter as a java.util.Comparator<org.apache.camel.component.file.GenericFile> class. |
sortBy
| null
| Built-in sort using the File Language. Supports nested sorts, so you can have a sort by file name and as a 2nd group sort by modified date. See sorting section below for details. |
readLock
| markerFile
| Used by the consumer to poll a file only if it has an exclusive read-lock on the file (that is, the file is not in progress or being written). Apache Camel waits until the file lock is granted. Strategies include markerFile (the default), fileLock, changed, and rename. |
readLockTimeout
| 0
| Optional timeout in milliseconds for the read-lock, if supported by the read-lock. If the read-lock cannot be granted before the timeout is triggered, Apache Camel skips the file. At the next poll, Apache Camel tries the file again, and this time the read-lock might be granted. Currently, fileLock, changed, and rename support the timeout. |
exclusiveReadLockStrategy
| null
| Pluggable read-lock as an implementation of the
GenericFileExclusiveReadLockStrategy interface. |
maxMessagesPerPoll
| 0
| An integer that defines the maximum number of messages to gather per poll. By default (0), no maximum is set. Can be used to set a limit of, for example, 1000 to avoid having the server read thousands of files as it starts up. To disable this option, set it to 0 or a negative integer. |
eagerMaxMessagesPerPoll
| true
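| Specifies whether the limit defined by maxMessagesPerPoll is applied eagerly, that is, during the scanning of files rather than after all files have been scanned and sorted. This option is also available in the FTP component. |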
processStrategy
| null
| A pluggable GenericFileProcessStrategy that allows you to implement your own readLock option or similar. Can also be used when special conditions must be met before a file can be consumed, such as the existence of a special ready file. If this option is set, the readLock option does not apply. |
consumer.bridgeErrorHandler
| false
| Enables the consumer to bridge over to the Camel error handler, so that exceptions that occur while the consumer attempts to pick up files are processed as messages and handled by the route's error handler. The default is false. This option is also available in the FTP component. |
scheduledExecutorService
| null
| Enables you to configure a custom thread pool that multiple file consumers can share, reducing the overall number of threads in a JVM. By default, each consumer has its own single-threaded thread pool. This option is also available in the FTP component. |
startingDirectoryMustExist
| false
| Whether the starting directory must exist. Note that the autoCreate option is enabled by default, which means the starting directory is normally auto-created if it doesn't exist. You can disable autoCreate and enable this option to ensure that the starting directory must exist. An exception is thrown if the directory doesn't exist. |
directoryMustExist
| false
| Similar to startingDirectoryMustExist, but this applies while polling sub-directories recursively. |
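The idempotent option above is described as using a memory-based LRU cache that holds 1000 entries. As a rough illustration of that idea, the following plain Java sketch remembers recently seen file names in a bounded, least-recently-used map. It is not Camel's LRUCache or MessageIdRepository implementation; the class `SeenFilesSketch` and its `firstTime` method are hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: a bounded "already processed" memory, not Camel's implementation. */
final class SeenFilesSketch {

    private final Map<String, Boolean> seen;

    SeenFilesSketch(final int capacity) {
        // accessOrder=true keeps the least recently used key first, so it is evicted first.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true the first time a name is offered, false while it is still remembered. */
    boolean firstTime(String fileName) {
        return seen.put(fileName, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        SeenFilesSketch memory = new SeenFilesSketch(1000); // 1000 entries, as in the table
        System.out.println(memory.firstTime("orders/a.xml")); // true
        System.out.println(memory.firstTime("orders/a.xml")); // false
    }
}
```

Because the memory is bounded, a file name that has been evicted would be treated as new again, which is the trade-off of any fixed-size in-memory repository.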
Table 9, “File producer options” lists the options that can be set on a file producing endpoint.
Table 9. File producer options
Name | Default Value | Description |
---|---|---|
fileExist
| Override
| Specifies what to do if a file with the same name already exists. The default, Override, replaces the existing file; other values include Append, Fail, and Ignore. |
tempPrefix
| null
| This option is used to write the file using a temporary name and then, after the write is complete, rename it to the real name. Can be used to identify files being written and to prevent consumers (not using exclusive read locks) from reading in-progress files. It is often used by FTP when uploading big files. |
tempFileName
| null
| The same as the tempPrefix option, but offering finer-grained control over the naming of the temporary file, because it uses the File Language. |
keepLastModified
| false
| Specifies whether the file keeps the last modified time stamp from the source file (if any). The Exchange.FILE_LAST_MODIFIED header is used to store the time stamp. If the time stamp exists and the option is enabled, the time stamp from that header is set on the written file. |
eagerDeleteTargetFile
| true
| Specifies whether or not to eagerly delete any existing target file. This option only applies when you use fileExist=Override and the tempFileName option. |
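The tempPrefix option above describes writing under a temporary name and renaming once the write is complete. The following plain Java sketch shows that general write-then-rename technique using java.nio.file. It is not how the Camel producer is implemented; the class `TempPrefixWriteSketch`, the method `writeViaTempName`, and the prefix and file names are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustration only: write under a temporary name, then rename to the real name. */
final class TempPrefixWriteSketch {

    static Path writeViaTempName(Path directory, String tempPrefix, String fileName, String body)
            throws IOException {
        Files.createDirectories(directory);
        Path temp = directory.resolve(tempPrefix + fileName);
        Path target = directory.resolve(fileName);
        // Readers that ignore names starting with the prefix never see a half-written file.
        Files.write(temp, body.getBytes(StandardCharsets.UTF_8));
        // Replacing an existing target is comparable in spirit to fileExist=Override.
        return Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("outbox");
        Path written = writeViaTempName(dir, "inprogress-", "report.txt", "hello");
        System.out.println(written.getFileName());
    }
}
```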
The following headers are supported by this component:
Table 10. File producer headers
Header | Description |
---|---|
CamelFileName
| Specifies the name of the file to write (relative to the endpoint directory). The name can be a String; a String with a File Language or Simple expression; or an Expression object. If it is null, Apache Camel auto-generates a filename based on the unique message ID. |
CamelFileNameProduced
| The actual absolute filepath (path + name) for the output file that was written. This header is set by Camel and its purpose is providing end-users with the name of the file that was written. |
Table 11. File consumer headers
Header | Description |
---|---|
CamelFileName
| Name of the consumed file as a relative file path with offset from the starting directory configured on the endpoint. |
CamelFileNameOnly
| Only the file name (the name with no leading paths). |
CamelFileAbsolute
| A boolean value specifying whether the consumed file denotes an absolute path. Should normally be false for relative paths. Absolute paths should normally not be used, but support was added to the move option to allow moving files to absolute paths. It can be used elsewhere as well. |
CamelFileAbsolutePath
| The absolute path to the file. For relative files this path holds the relative path instead. |
CamelFilePath
| The file path. For relative files this is the starting directory + the relative filename. For absolute files this is the absolute path. |
CamelFileRelativePath
| The relative path. |
CamelFileParent
| The parent path. |
CamelFileLength
| A long value containing the file size. |
CamelFileLastModified
| A Date value containing the last modified timestamp of the file. |
Because the file consumer is a BatchConsumer, it supports batching the files it polls. Batching means that Apache Camel adds some properties to the exchange, so that you know the number of files polled and the index of the current file within that batch.
Table 12. Exchange properties used by a file consumer
Property | Description |
---|---|
CamelBatchSize
| The total number of files that were polled in this batch. |
CamelBatchIndex
| The current index of the batch. Starts from 0. |
CamelBatchComplete
| A boolean value indicating the last exchange
in the batch. Is only true for the last entry. |
This allows you, for instance, to know how many files exist in this batch and to let the Aggregator aggregate that number of files.
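To show how the three batch properties relate to each other, the following plain Java sketch computes the values each exchange would carry for a list of polled files. It only models the relationship described in the table above and does not use the Camel API; the class `BatchPropertiesSketch`, the method `describeBatch`, and the file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustration only: models the batch properties for one poll, without the Camel API. */
final class BatchPropertiesSketch {

    /** Returns one property map per polled file, in polling order. */
    static List<Map<String, Object>> describeBatch(List<String> polledFiles) {
        List<Map<String, Object>> exchanges = new ArrayList<>();
        int size = polledFiles.size();
        for (int index = 0; index < size; index++) {
            Map<String, Object> properties = new LinkedHashMap<>();
            properties.put("CamelBatchSize", size);                   // same for every exchange
            properties.put("CamelBatchIndex", index);                 // starts from 0
            properties.put("CamelBatchComplete", index == size - 1);  // true only for the last one
            exchanges.add(properties);
        }
        return exchanges;
    }

    public static void main(String[] args) {
        for (Map<String, Object> properties : describeBatch(Arrays.asList("a.txt", "b.txt", "c.txt"))) {
            System.out.println(properties);
        }
    }
}
```

An aggregation step could, for example, treat CamelBatchSize as the number of exchanges to wait for, or stop when CamelBatchComplete is true.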