
Sources

AWSSource - Complete extractor module from AWS S3

This AWS extractor retrieves your document content from a list of sources. Several options (suffix, prefix, etc.) let you precisely target the documents you want to take into account.

Mandatory settings

- AWS connection provider (AWSConnectionProvider): must have the AmazonS3FullAccess permission
- Source buckets (String list): buckets where the folders are stored

Optional settings

- Accept quotes in values (Boolean): if enabled, quotes are accepted in values
- AWS start-after key (String): absolute path of the S3 object to start after
- ARN key for KMS encryption (String)
- New column names to set (String list): if empty, populated from the first line
- Replace empty titles (Boolean): if enabled, any empty title in the CSV file is replaced by the default value. If several titles are missing, the default title is suffixed with an incremental index.
- AWS suffix (String): an S3 object is extracted only if its key ends with this suffix
- Number of lines to skip (Integer): skipped lines are not processed. By default, only the 1st line is skipped, since it is assumed to be the header row. Ex/ in a file of 10 lines, entering '3' skips the 1st, 2nd and 3rd lines. Default: 1
- Default column title (String): default value used for untitled columns, suffixed with a number if there are several. Only used when the 'Replace empty titles' option is enabled. Default: Untitled
- Continue processing CSV on fail (Boolean): if enabled, the following errors will not trigger an exception: the CSV file does not exist, the CSV file is empty (no line), or the CSV file has only headers and no document lines. Note that if you provide 5 CSV paths and the 3rd one fails, only the Fast2 logs will tell you which CSV file failed.
- Source folders (String list): folders in the S3 bucket(s) containing the files to migrate
- AWS prefix (String): an S3 object is extracted only if its key starts with this prefix
- Stop at first error in CSV (Boolean): Fast2 stops automatically at the first error encountered in the CSV. Default: false
- Column headers in first CSV file only (Boolean): only read column definitions from the first parsed CSV file. Default: false
- Documents per punnet from CSV (Integer): number of documents each punnet will carry when processing a CSV file. Ex/ setting this value to 2 makes each created punnet contain 2 documents. Default: 1
- CSV separator (String): separator between values. Ignored if 'Process files as list of punnets' is disabled. Default: ,
- Process files as list of punnets (Boolean): the expected format is a CSV file (1 header row, then 1 row per punnet), but the .csv extension is not mandatory. Only single-document punnets are created (ex/ not suitable for multi-version documents). Multivalue data is concatenated into one String value. The first line of the file is treated as the CSV header line.
- extraColumns (String list)
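
The prefix and suffix options act as a plain key filter on S3 object keys. A minimal sketch of that filtering logic, with illustrative names (this is not Fast2's actual code):

```python
def select_keys(keys, prefix="", suffix=""):
    """Keep only keys matching both the optional prefix and suffix."""
    return [k for k in keys if k.startswith(prefix) and k.endswith(suffix)]

keys = [
    "invoices/2023/doc-001.pdf",
    "invoices/2023/doc-002.tif",
    "archive/2022/doc-003.pdf",
]
# prefix 'invoices/' combined with suffix '.pdf' keeps a single object
print(select_keys(keys, prefix="invoices/", suffix=".pdf"))
```

Leaving both options empty keeps every key, which is why they can be set independently of one another.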

AlfrescoRestSource - Alfresco extractor using Alfresco REST protocol

This task relies on the Alfresco public REST API (with v1.0.4 of the Alfresco REST client) to retrieve documents and metadata from a given Alfresco instance.

Mandatory settings

- CMIS query or AFTS query (String): query used to retrieve the objects from Alfresco. Ex/ SELECT * FROM cmis:document WHERE cmis:name LIKE 'test%' or cm:title:'test%'
- Alfresco connection provider (AlfrescoRESTConnectionProvider)

Optional settings

- Max item to return per call (Integer): sets the paging max-items threshold, i.e. the number of Alfresco objects to retrieve per call. Default: 100
- Fields to extract (String): the fewer, the better! Only the 'id' is necessary to start the migration workflow. Separate the values with a comma, no space. Use properties from the com.alfresco.client.api.common.constant.PublicAPIConstant library. Ex/ id,name. Default: id

AlfrescoSource - Alfresco extractor using CMIS technology

Through an SQL query, this Alfresco extractor uses the CMIS technology to fetch the content, metadata and annotations of your documents from a given Alfresco repository.

Mandatory settings

- SQL query to extract documents (String): Fast2 retrieves all documents, folders, references, items and metadata matching this query. If the query exhaustively specifies the data to extract, uncheck 'Extract document properties'. The cmis:objectId data is mandatory. Ex/ SELECT * FROM cmis:document
- Alfresco connection provider (AlfrescoCMISConnectionProvider): the CMIS version must be 1.1

Optional settings

- Property Helper (PropertyHelper)
- Number of items per result page (Integer): maximum number of results provided. Default: 1
- Number of documents per punnet (Integer). Default: 1
- Extract document properties (Boolean). Default: true
- Keep folder structure within document (Boolean): requires 'Extract document properties' to be true. Default: true
- Extract document content (Boolean): does not work asynchronously. Default: false

BlankSource - Empty punnet generator

This source builds a punnet list containing one or more empty documents. Each document will only contain its identifier: documentId. These punnets can then be enriched by other steps in the processing chain.

Mandatory settings

- Document IDs (DocumentIdList): source list of documents to extract, identified by their IDs

Optional settings

- Document per punnet (Integer): number of documents each punnet must carry. Ex/ the input file includes 10 lines, i.e. 10 document identifiers to extract; setting this value to 2 makes Fast2 create 5 punnets, each containing 2 documents. Default: 1
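
The chunking described in the example above can be sketched as follows; names are illustrative, not Fast2's implementation:

```python
def to_punnets(document_ids, per_punnet=1):
    """Split a flat list of document IDs into punnets of a fixed size."""
    return [document_ids[i:i + per_punnet]
            for i in range(0, len(document_ids), per_punnet)]

ids = [f"doc-{n}" for n in range(1, 11)]  # 10 document identifiers
punnets = to_punnets(ids, per_punnet=2)
print(len(punnets))  # 5 punnets of 2 documents each
```

Note that when the total is not a multiple of the punnet size, the last punnet simply carries the remainder.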

CMODSource - Complete extraction module from a CMOD environment

This task is used to extract documents from the Content Manager OnDemand ECM. One CMOD document is equivalent to 1 punnet of 1 document. Indexes, optional content and annotations are also extracted. A WAL request is made to find the corresponding documentId in Image Services, then the metadata extraction is carried out. The relative data is stored in each document of the punnet being processed.

Note: all Image Services properties are exported systematically.

This task is not a real source task: the documents to be extracted are identified by a BlankSource task generating a set of empty punnets, i.e. punnets containing only documents, each bearing a document number (documentId) to extract.

This task relies on the 'libCMOD.dll' library, which must be in a directory of the Windows PATH. In the wrapper.conf or hmi-wrapper.conf file, activate the use of this library: wrapper.java.library.path.<increment> = ../libCMOD/dll32. For the moment, only 32-bit libraries are configured.

Mandatory settings

- CMOD connection provider (CMODConnectionProvider)
- Folders to extract (String list): list of CMOD folders to scan. Additional filter level(s) can be applied with the SQL query below.

Optional settings

- SQL query to extract documents (String): the WHERE clause used to filter documents. Since this request is made on the indexes of CMOD documents, the property used to filter them must be indexed in CMOD prior to any extraction. Ex/ WHERE Date = '2012-11-14'
- Extract document annotations (Boolean): the document annotations are extracted during the process. Default: false
- Number of documents per punnet (Integer). Default: 1
- Extract document content (Boolean): the document content is extracted during the process. Default: false
- Maximum results count (Integer). Default: 2000

CMSource - Complete extractor from Content Manager solution

Mandatory settings

- CM connection provider (CMConnectionProvider)
- SQL query (String): select precisely the documents you want to extract through a classic SQL query

Optional settings

- Extract standard system properties (Boolean). Default: false
- Extract advanced system properties from DKDDO object (Boolean). Default: false
- Maximum results returned by the query (Integer): set to 0 to disable limiting the number of results. Default: 0
- Number of documents per Punnet (Integer): number of documents each punnet will hold. Default: 1
- Extract custom properties (Boolean). Default: false
- Query type (Integer): see com.ibm.mm.beans.CMBBaseConstant for further details. Default: 7 (XPath)

CSVSource - CSV file parser

This task can be used to start a migration from a CSV file. By default, the first line of the file is treated as the column headers. The CSVSource task processes column values whether or not they are surrounded with double quotes ("). If you need to force the document ID for the whole process, use the documentId metadata.

Mandatory settings

- CSV paths (String list): list of paths to CSV files to be parsed. Check the following examples for allowed formats.
  Ex/
  C:/samples/myDocument.csv
  C:\\samples\\myDocument.csv
  C:\\\\samples\\\\myDocument.csv
  \"C:\\samples\\myDocument.csv\"
  C:/samples/${map}.csv

Optional settings

- Accept quotes in values (Boolean): if enabled, quotes are accepted in values
- CSV file path metadata (String): punnet property name containing the CSV file path. Set to empty or null to disable.
- File name for error CSV file (String): useful when you need a specific file name in which to register the lines in error of your CSV file. The name can reference workflow properties surrounded with ${...} (ex/ campaign, punnetId, etc.) or be hard-written. Warning: this value can be overwritten by the 'Associate CSV-errors file with original CSV filename' option. Default: lines_in_error.csv
- New column names to set (String list): if empty, populated from the first line
- Replace empty titles (Boolean): if enabled, any empty title in the CSV file is replaced by the default value. If several titles are missing, the default title is suffixed with an incremental index.
- Folder path for error CSV file (String): where the error file is stored on your system. The path can reference workflow properties (${...}) or be hard-written. Default: ./csv_errors/
- Number of lines to skip (Integer): skipped lines are not processed. By default, only the 1st line is skipped, since it is assumed to be the header row. Ex/ in a file of 10 lines, entering '3' skips the 1st, 2nd and 3rd lines. Default: 1
- Default column title (String): default value used for untitled columns, suffixed with a number if there are several. Only used when the 'Replace empty titles' option is enabled. Default: Untitled
- Generate hash of CSV content (Boolean): the hash of the content is generated and stored in the punnet in a property named hashData. Default: false
- Continue processing CSV on fail (Boolean): if enabled, the following errors will not trigger an exception: the CSV file does not exist, the CSV file is empty (no line), or the CSV file has only headers and no document lines. Note that if you provide 5 CSV paths and the 3rd one fails, only the Fast2 logs will tell you which CSV file failed.
- File encoding (String): CSV encoding character set. Default: UTF-8
- Associate CSV-errors file with original CSV filename (Boolean): matches the error file with your original CSV file by suffixing the original name with '_KO'. That way, if you use multiple files, all the lines in error are grouped by file name. This option overwrites 'File name for error CSV file' but can still be used together with 'Folder path for error CSV file'. Default: false
- Stop at first error in CSV (Boolean): Fast2 stops automatically at the first error encountered in the CSV. Default: false
- File scanner (Deprecated) (FileScanner): this option is deprecated; consider using 'CSV paths' instead.
- Column of document ID (String): column header of the metadata to set as the document ID. Default: documentId
- Document property name containing CSV file path (String): set to empty or null to disable
- Move to path when finished (String): consider using the ${variable} syntax
- Column headers in first CSV file only (Boolean): only read column definitions from the first parsed CSV file. Default: false
- Documents per punnet from CSV (Integer): number of documents each punnet will carry when processing a CSV file. Ex/ setting this value to 2 makes each created punnet contain 2 documents. Default: 1
- CSV separator (String): separator between values. Ignored if 'Process files as list of punnets' is disabled. Default: ,
- Extra columns (String list): list of the form target=function:arg1:arg2:...
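
The interplay between the skip count, the header row and the default column title can be sketched as below. This is a hedged illustration with invented names, not Fast2's parser; it assumes the skip count includes the header line, as the example in the table describes:

```python
import csv
import io

def parse(csv_text, skip=1, default_title="Untitled", separator=","):
    """Parse CSV text: read headers, fill empty titles, skip leading lines."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=separator))
    headers = rows[0]
    # Replace empty titles with the default, suffixed by an incremental index
    count = 0
    for i, h in enumerate(headers):
        if not h:
            count += 1
            headers[i] = default_title + (str(count) if count > 1 else "")
    # 'skip' counts from the top of the file, so the header row is line 1
    data = rows[skip:]
    return [dict(zip(headers, r)) for r in data]

text = "id,,title\n1,a,Alpha\n2,b,Beta\n"
print(parse(text))
```

With skip=2, the first data line ("1,a,Alpha") would also be dropped, matching the documented behaviour of skipping the 1st and 2nd lines.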

DctmSource - Complete extractor from Documentum

This connector will extract basic information from the source Documentum repository. Since Documentum architecture involves particular port and access management, a worker should be started on the same server where Documentum is running.

Make sure to check the basic requirements at the setup for Documentum on the official Fast2 documentation.

Mandatory settings

- Connection information to Documentum Repository (DctmConnectionProvider)
- The DQL Query to run to fetch documents (String): the fewer attributes you fetch, the faster the query executes on the Documentum side. Ex/ SELECT r_object_id FROM dm_document WHERE ...
- Connection information to Documentum server machine (DctmSshProvider)

Optional settings

- Batch size (Integer): if the size is <1, it is defined by the Documentum server side. Default: 50
- SSH client (DctmSshClient): SSH client used to establish the connection with the Documentum server

EmbeddedDbSourceRest - Perform requests on Fast2 database without any size restriction

This task is used to retrieve punnets from a previously executed campaign.

Mandatory settings

- Embedded db port (Integer). Default: 1790
- Embedded db hostname (String). Default: localhost
- Embedded db scheme (String). Default: http
- Campaign name (String): name of the campaign whose data you would like to retrieve. Ex/ myMap_Run1
- Step Id (String): returns the punnets of this task (UUID of the step)

FileNet35Source - Complete extractor from FileNet 3.5

The FileNet35Source retrieves existing documents from the FileNet P8 3.5 ECM through a query. Each punnet will contain the metadata of the recovered document, its content and its annotations.

Mandatory settings

- FileNet 3.5 connection provider (FileNet35ConnectionProvider): connection parameters to the FileNet instance
- SQL query (String): SQL query corresponding to the list of documents to extract

Optional settings

- Attribute used for Document IDs (String): name of the FileNet P8 3.5 attribute corresponding to the values retrieved in the 'Document IDs' list. Default: Id
- Empty punnet when no result (Boolean): an empty punnet is created even if the result of the query is null. Default: false
- Documents per punnet (Integer): number of documents each punnet must carry. Ex/ setting this value to 2 makes each created punnet contain 2 documents. Default: 1
- Document IDs (DocumentIdList): source list of documents to extract, identified by their IDs

FileNetSource - Complete extractor from FileNet P8

The FileNetSource source retrieves existing documents from the FileNet P8 5.x ECM through an SQL query. Each punnet will contain the metadata of the recovered document, security information and parent folders.

Mandatory settings

- Object store name (String list): name of the repository to extract from
- SQL query (String): SQL query corresponding to the list of documents to extract
- FileNet connection provider (FileNetConnectionProvider): connection parameters to the FileNet instance

Optional settings

- Number of entries per result page (Integer): number of results returned per page by the FileNet P8 query. Default: 1000
- Documents per punnet (Integer): number of documents each punnet must carry. Ex/ setting this value to 2 makes each created punnet contain 2 documents. Default: 1
- Extract object type properties (Boolean): the FileNet P8 metadata of the document which are Object-typed are saved at the punnet level. Default: false
- Extract FileNet system properties (Boolean): system metadata is saved at the punnet level during extraction. Default: false
- Properties to extract (String list): exhaustive list of FileNet metadata to extract. If empty, all properties are extracted.
- Extract FileNet security (Boolean): the security of the document is saved at the punnet level. Default: false
- Extract documents instance informations (Boolean): the fetchInstance method makes a round trip to the server to retrieve the property values of the ObjectStore object. Default: false
- Extract folders absolute path (Boolean): the absolute path of the folder inside the FileNet instance is extracted during the process. Default: false
- Throw error if no result (Boolean): throw an exception when the SQL query finds no result.

FlowerSource - Flower extractor

Allows component extraction from Flower using a JSON-formatted Flower request. Components can be documents, folders, virtual folders or tasks.

Mandatory settings

- FlowerDocs connection provider (FlowerDocsConnectionProvider)
- Flower component category (String): choose among DOCUMENT, TASK, FOLDER or VIRTUAL_FOLDER
- JSON Flower Search Request (String): patterns can be used too

LocalSource - A generic broker for wildcarded punnet lists

This task searches for local files under a defined path and analyzes them.

Mandatory settings

- Files paths (String list): list of paths to files to be parsed. ${...} patterns are not supported. The threshold can be maxed out; exclusions are not supported.
  Ex/
  C:/samples/myDocument.txt -> retrieves only one document
  C:\\samples\\myDocument.txt
  C:\\\\samples\\\\myDocument.txt
  \"C:\\samples\\myDocument.txt\"
  C:/samples/*.* -> retrieves all files directly at the root of the samples/ folder, no matter their extension
  C:/samples/** -> retrieves all files directly at the root of the samples/ folder, as well as files inside subfolders
  C:/samples/**/*.yes -> retrieves all files directly at the root of the samples/ folder, as well as files inside subfolders, whose extension is .yes

Optional settings

- File scanner (Deprecated) (FileScanner): this option is deprecated; consider using 'Files paths' instead.
- Fallback XML/Json parsing (Boolean): if true, the file is added as document content in the punnet when XML parsing fails. Consider adding this file as a regular file (not an XML). Default: false
- Skip parse exceptions (Boolean): the task does not throw an error when XML parsing fails; parsing does not stop and resumes with the next candidate. Default: false
- XSL Stylesheet path (String): the XSL stylesheet file to use when parsing XML files
- Number of files per punnet (Integer): if the files are not in XML format, the punnet will contain as many documents as defined in this option. Default: 1
- Allow any kind of file (Boolean): all types of files can be added. Otherwise, only XML-based punnet descriptions are allowed. Default: true
- Skip XML parsing (Boolean): the XML file is not parsed before being added to the punnet. Not recommended in most cases. Default: false
- Maximum number of files scanned (Integer): if this field is completed, the number of files scanned will not exceed the value filled in. Leave empty to retrieve all files matching the input pattern filter.
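
The documented wildcard forms map closely onto standard glob semantics. A small sketch against a throwaway directory tree (illustrative only, using Python's pathlib rather than Fast2's scanner):

```python
import tempfile
from pathlib import Path

# Build a tiny tree: one file at the root, one inside a subfolder
root = Path(tempfile.mkdtemp())
(root / "sub").mkdir()
(root / "a.txt").write_text("top-level")
(root / "sub" / "b.yes").write_text("nested")

# like C:/samples/*.* : root-level files, any extension
top = sorted(p.name for p in root.glob("*.*"))
# like C:/samples/** : root-level files plus files inside subfolders
all_files = sorted(p.name for p in root.glob("**/*") if p.is_file())
# like C:/samples/**/*.yes : same recursion, restricted to the .yes extension
yes_only = sorted(p.name for p in root.glob("**/*.yes"))
print(top, all_files, yes_only)
```

The key distinction is that `*.*` stays at one level while `**` recurses, exactly as the three documented examples describe.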

MailSource - Complete extractor from mail box

The MailSource task extracts messages from an e-mail box. Each extracted message will correspond to a punnet, one document per punnet

Mandatory settings

- MailBox connection provider (MailBoxProvider)

Optional settings

- Search in Headers (String): a pair of header and pattern to search, separated by a colon. Ex/ cc:copy
- Header names (String list): list of header names (case-sensitive) to retrieve from the mail. Message-Id, Subject, From, To, Cc and Date are added by default.
- Start Id (Integer): index from which the first message should be extracted. Default: 1
- Update document with mail root folder name (String): name of the metadata to add to the document. If filled, the full name of the source folder is indexed in this metadata. Set to null or empty to disable updating.
- Folders to scan (String list): list of folders to scan in the mailbox. If filled, overrides the root folder name from the MailBox connection provider configuration.
- AND condition for search (Boolean): checking this option only retrieves messages matching all the search conditions set (unread messages, text in header, body or subject). If unchecked, the 'OR' operand is applied.
- Forbidden characters (String): list of characters to remove from the Message-Id when building the DocumentId. Default: `<>:"/\
- Search in Subject (String)
- Search in Body (String)
- Only unread messages (Boolean)
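
The 'Forbidden characters' option amounts to stripping a character set from the Message-Id before it becomes a DocumentId. A minimal sketch of that sanitization, with invented names (not Fast2's code), using the default character list from the table above:

```python
# Default forbidden set from the documentation: ` < > : " / \
FORBIDDEN = '`<>:"/\\'

def to_document_id(message_id, forbidden=FORBIDDEN):
    """Remove every forbidden character from the Message-Id."""
    return "".join(c for c in message_id if c not in forbidden)

print(to_document_id("<abc:123@mail.example.com>"))
```

Removing these characters matters mostly because Message-Ids are angle-bracketed and may contain characters that are illegal in file names or identifiers downstream.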

OpenTextSource - OpenText extractor using OpenText REST protocol

Mandatory settings

- OpenText credentials (OpenTextCredentials)
- OpenText client (OpenTextRestClient)
- Node Id (Integer)

Optional settings

- Order by named column (String): the format can be 'name', 'asc_name' or 'desc_name'. If the asc or desc prefix is not used, asc is assumed. Ex/ asc_name
- Ticket period (Integer): time in seconds between two ticket creations. Default: 60

RandomSource - Random punnet generator

Randomly produces punnets containing documents, metadata, content...

Mandatory settings

- Number of punnet to generate (Integer): if 'Minimum punnet number' is set, this value is treated as the upper threshold. Default: 1000

Optional settings

- Maximum document number (Integer): excluded. Default: 1
- Minimum metadata number (Integer): included. Default: 1
- Minimum punnet number (Integer): if not set, the number of generated punnets is exactly the value set in 'Number of punnet to generate'
- Maximum number of metadata values (Integer): included. Default: 6000
- Minimum number of metadata values (Integer): included. Default: 0
- Maximum metadata number (Integer): excluded. Default: 10
- Minimum document number (Integer): included. Default: 1
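
The minimum bounds above are documented as included and some maximums as excluded, which is the same half-open convention as Python's randrange. A sketch of drawing a count under that convention (illustrative, not Fast2's generator):

```python
import random

def pick_count(minimum=1, maximum=10):
    """Return a value in the half-open range: minimum included, maximum excluded."""
    return random.randrange(minimum, maximum)

# 1000 draws with minimum=1 (included) and maximum=4 (excluded)
counts = {pick_count(1, 4) for _ in range(1000)}
print(sorted(counts))  # 4 is never produced, since the maximum is excluded
```

Keeping the maximum exclusive avoids off-by-one surprises when the minimum and maximum are set to the same value plus one.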

SQLSource - Complete extractor from SQL database

Extracts the specified properties and maps them to the punnet or document layout.

Mandatory settings

- SQL connection provider (SQLQueryGenericCaller)
- SQL query (String): select precisely the documents you want to extract through a classic SQL query

Optional settings

- Property name to group by document (String): column used to group lines by document. If used, set an 'ORDER BY' clause in your SQL query
- SQL mapping for punnet (String/String map): mapping of SQL properties to punnet metadata. Use 'punnetId' for the punnet Id
- Allow duplicates data (Boolean)
- Property name to group by punnet (String): column used to group lines by punnet. If used, set an 'ORDER BY' clause in your SQL query
- SQL mapping for document (String/String map): mapping of SQL properties to document metadata. Use 'documentId' for the document Id; otherwise the first column is used as documentId
- Push remaining, non-mapped columns as document properties (Boolean). Default: true
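
The reason the group-by options ask for an 'ORDER BY' clause is that grouping by consecutive rows only works when rows sharing the group column arrive together. A hedged sketch of that behaviour with an invented result set (not Fast2's implementation):

```python
from itertools import groupby
from operator import itemgetter

# Pretend result set, already sorted via ORDER BY doc_id in the SQL query
rows = [
    {"doc_id": "D1", "page": 1},
    {"doc_id": "D1", "page": 2},
    {"doc_id": "D2", "page": 1},
]

# Consecutive rows sharing 'doc_id' are merged into a single document
documents = [
    {"documentId": key, "pages": [r["page"] for r in group]}
    for key, group in groupby(rows, key=itemgetter("doc_id"))
]
print(documents)
```

If the rows were not sorted, groupby would emit "D1" twice as two separate documents, which is exactly the failure mode the ORDER BY requirement guards against.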

ZipSource -