Content sources

AWSContentSource - Extract content from AWS S3 bucket

Mandatory settings
| Key | Type | Description |
| AWS access credentials | AWSConnectionProvider | Credentials of the user (must have been granted the AmazonS3FullAccess permission). |

Optional settings
| Key | Type | Description | Default value |
| ARN key for getAwsPrefixKMS encryption | String | | |
| Bucket name | String | Name of the S3 bucket where the content is stored. | ${bucket} |
| Content path (S3 object key) | String | Path to the S3 object holding the content you intend to extract from the bucket. To use this option, you must enable the content extraction option. Ex/ ${contentPath} | |
| Extract contents | Boolean | All existing contents of documents will be replaced by the newly found contents retrieved from the S3 bucket. If the S3 objects are parsed as punnets, the contents will be attached based on the 'Content path' input field. | |
| Process s3 objects as punnets | Boolean | | |

AlfrescoContentExtractor - Alfresco content extractor using CMIS technology

This Alfresco extractor uses CMIS to fetch document content from a given Alfresco repository.

Mandatory settings
| Key | Type | Description |
| Alfresco connection provider | AlfrescoCMISConnectionProvider | CMIS version must be 1.1 |

Optional settings
| Key | Type | Description | Default value |
| Property Helper | PropertyHelper | | |
| Extract document content | Boolean | | true |

AlfrescoRestContentExtractor - Alfresco content extractor using Alfresco REST protocol

This task relies on the Alfresco public REST API (with v1.0.4 of the Alfresco REST client) to retrieve documents and metadata from a given Alfresco instance.

Mandatory settings
| Key | Type | Description |
| Alfresco connection provider | AlfrescoRESTConnectionProvider | |

Optional settings
| Key | Type | Description | Default value |
| Date format | String | | E MMM dd HH:mm:ss Z YYYY |
| CMIS query | String | CMIS SQL query (pattern-resolvable) used to fetch documents based on alternative data. Using this feature creates new documents in the punnet, carrying the IDs of the matching documents. Consider following this task with a second AlfrescoRestContentExtractor task to extract data and contents. | |
| Extract content | Boolean | | |
| Extract all versions | Boolean | Extract the superseded versions of the documents matching the query | |
| Extract parent site | Boolean | If the document is not stored in an Alfresco site, nothing will happen. Otherwise, the site details will be attached to the punnet dataset. | |
| Map permissions | Boolean | Map permissions to either the document, folder or site. | |
| Map parent folder | Boolean | Map direct parent folder info onto the related document. | |
| Extract folders as tree | Boolean | Extract folders as tree, with all parent folders. This option must be selected if you wish to map permissions of parent folders. | |
| Extract users as email addresses | Boolean | | |

AlfrescoRestSiteExtractor - Alfresco Site extractor using Alfresco REST protocol

This task relies on the Alfresco public REST API (with v1.0.4 of the Alfresco REST client) to retrieve sites from a given Alfresco instance.

Mandatory settings
| Key | Type | Description | Default value |
| Alfresco connection provider | AlfrescoRESTConnectionProvider | | |
| AFTS query | String | Query used to retrieve all sites from Alfresco | TYPE:"st:site" |

CMContentExtractor - Basic content extractor from Content Manager

This class is dedicated to extracting content from the Content Manager solution. You can also extract annotations, custom properties and even logs.

Mandatory settings
| Key | Type | Description |
| CM connection provider | CMConnectionProvider | |

Optional settings
| Key | Type | Description | Default value |
| Extract history logs | Boolean | | true |
| Extract standard system properties | Boolean | | true |
| Extract advanced system properties from DKDDO object | Boolean | | true |
| Extract document annotation | Boolean | | false |
| Extract note logs | Boolean | | false |
| Extract custom properties | Boolean | | true |
| Extract note logs as annotations | Boolean | | false |
| Extract document content | Boolean | | true |

CMODContentExtractor - Basic CMOD content extractor

Mandatory settings
| Key | Type | Description |
| CMOD Connection Settings | CMODConnectionProvider | |

Optional settings
| Key | Type | Description | Default value |
| Pattern to store resource files | String | | ${resourceId} |
| Export attached CMOD resources | Boolean | | true |

DctmContentExtractor - Extract document-related details from Documentum

This Documentum connector is designed to extract document versions, metadata, folders and content (only the first content of a document) from a Documentum repository. The versions of a multiversion document are retrieved through their shared 'i_chronicle_id'. Since the Documentum architecture involves specific port and access management, a worker should be started on the same server where Documentum is running.

Make sure to check the basic requirements at the setup for Documentum on the official Fast2 documentation.
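
To illustrate how versions sharing an 'i_chronicle_id' belong to one multiversion document, here is a minimal Python sketch. The record layout, the `label` field, the sample identifiers and the `group_versions` helper are all hypothetical and are not part of Fast2; only the attribute names `i_chronicle_id` and `r_object_id` come from Documentum.

```python
from collections import defaultdict

def group_versions(records):
    """Group version records by their shared i_chronicle_id."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["i_chronicle_id"]].append(record)
    return dict(grouped)

# Made-up sample rows: two versions of one document, plus a single-version document.
records = [
    {"r_object_id": "0900000180000101", "i_chronicle_id": "0900000180000101", "label": "1.0"},
    {"r_object_id": "0900000180000102", "i_chronicle_id": "0900000180000101", "label": "1.1"},
    {"r_object_id": "0900000180000201", "i_chronicle_id": "0900000180000201", "label": "1.0"},
]

documents = group_versions(records)
for chronicle_id, versions in documents.items():
    print(chronicle_id, [v["label"] for v in versions])
```

Each key of `documents` stands for one logical document, and the list under it holds that document's versions in the order they were read.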

Optional settings
| Key | Type | Description | Default value |
| Connection information to Documentum Repository | DctmConnectionProvider | | |
| Extract folders | Boolean | | true |
| Map empty or unset properties | Boolean | Attach Documentum metadata onto document dataset even if the value is missing or unset. | |
| Extract renditions | Boolean | Check this option to extract renditions of each document. They will be attached as side-contents in the document, with properties populated from the original rendition properties. | |
| Whitelist for metadata to extract | String | Values must be separated by commas (,). | |
| Extract metadata | Boolean | | true |
| Continue on fail | Boolean | If true, any error which occurs during extraction of either metadata, content or folders will trigger an exception. Otherwise, the error will be found in the logs. | |
| Extract content | Boolean | | true |
| Extract all versions | Boolean | | |

FileNet35ContentSource - Extract content from FileNet 3.5

Use this task to retrieve the content of documents to extract from a given FileNet instance. This task needs to be preceded by a FileNet35Source task.

Mandatory settings
| Key | Type | Description |
| FileNet 3.5 connection provider | FileNet35ConnectionProvider | Connection parameters to the FileNet instance |

Optional settings
| Key | Type | Description | Default value |
| Ignore documents with zero-sized content | Boolean | Document without any content will not be processed | false |

FileNetContentExtractor - Extract document content from FileNet P8

This task is not a real source task. The documents to be extracted are identified by a BlankSource task generating a set of 'empty' punnets, i.e. punnets containing only documents that each bear the document number (documentId) to extract.

Mandatory settings
| Key | Type | Description |
| FileNet connection provider | FileNetConnectionProvider | Connection parameters to the FileNet instance |

Optional settings
| Key | Type | Description | Default value |
| Property Helper to use | PropertyHelper | | |
| Extract object type properties | Boolean | The FileNet P8 metadata of the document which are Object type will be saved at the punnet level | false |
| Compound parent data for children references | String | Name of the parent document property under which the children properties will be stored. | |
| Object store name | String | Name of the repository to extract from | |
| Compound children data to record | String | Name of the child property to store in the parent. Consider setting parent data name as well. | |
| Extract FileNet system properties | Boolean | Save the FileNet system properties as document metadata | false |
| Default mimetype | String | Default mimetype to set if the one from FileNet is empty | |
| Skip annotation exceptions | Boolean | Extract documents even if related annotations are in exception like null content | false |
| Extract FileNet security | Boolean | The security of the document will be saved at the punnet level | false |
| SQL fetch query | String | Use this SQL to fetch documents based on your criteria. Ex/ SELECT [Id],[DocumentTitle] FROM Document WHERE [Property] = '${myCriterion}' | |
| Extract folders absolute path | Boolean | The absolute path of the folder inside the FileNet instance will be extracted during the process | false |
| Extract content | Boolean | The document content will be extracted during the process | true |
| Extract all versions | Boolean | Extract the superseded versions of the documents matching the query | |
| Extract annotations | Boolean | All annotations owned by the document will be extracted | true |
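
The 'SQL fetch query' example above contains a `${myCriterion}` placeholder. The sketch below only illustrates the general idea of substituting such a placeholder before the query runs. It uses Python's `string.Template`, which is an assumption chosen for illustration and not Fast2's actual pattern resolver; the `resolve_query` helper and the sample value `Invoice` are hypothetical.

```python
from string import Template

def resolve_query(pattern, values):
    """Replace ${name} placeholders in the query pattern with the given values."""
    # substitute() raises KeyError if a placeholder has no matching value.
    return Template(pattern).substitute(values)

pattern = "SELECT [Id],[DocumentTitle] FROM Document WHERE [Property] = '${myCriterion}'"
query = resolve_query(pattern, {"myCriterion": "Invoice"})
print(query)
```

Note that this plain substitution performs no escaping, so it is only suitable for values you control.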

FlowerContentExtractor

Mandatory settings
| Key | Type | Description |
| Flower component category (DOCUMENT, TASK, FOLDER or VIRTUAL_FOLDER) | String | |

Optional settings
| Key | Type | Description | Default value |
| Extract document annotations | Boolean | | false |
| Extract component facts | Boolean | | false |
| | FlowerDocsConnectionProvider | | |
| Extract document file content | Boolean | | false |

IDMISContentExtractor - ImageServices WAL JNI-bridged Extractor

This task extracts documents from the Panagon Image Services ECM (indexes, optional content and annotations). Each ECM document yields one punnet containing one document. However, it is not a real source task. The documents to be extracted are identified by a BlankSource task generating a set of empty punnets, i.e. punnets containing only documents that each bear the document number (documentId) to extract.

Mandatory settings
| Key | Type | Description |
| Password | String | Password associated with the Username setting |
| Connection organization | String | Organization name for the connection |
| Connection domain | String | Domain name of the connection |
| Username | String | Login with scope to access the docbase with proper rights |

Optional settings
| Key | Type | Description | Default value |
| Annotations in ARender format | Boolean | Convert annotations to ARender format | false |
| Annotation converter | ParseISAnnotation | Specific converter from IS format. Allows resizing the extracted annotations | |
| Annotations in raw format | Boolean | Save annotation contents in raw format inside the punnet | false |
| Version of libIDMIS | String | This task is based on the WAL library and on the specific Fast2 library 'libIDMIS.dll'. This library must be in a directory of the Windows PATH. In the wrapper.conf or hmi-wrapper.conf file, activate the use of this library: wrapper.java.library.path.increment = ../libIDMIS/w32. For the moment, only 32-bit libraries are configured | libIDMIS-1.0.15 |
| Test scenarios | Boolean | Empty testing stub instead of libIDMIS | false |
| Connection terminal | String | Terminal name for the connection | |
| Use opacity for annotations | Boolean | | false |
| Unrecognized annotation file path | String | Path of the alternative annotation XML file for unrecognized annotations. If not specified, the punnet will go in exception | |
| Extract document content | Boolean | The document will be extracted with its content | true |
| Extract document annotation | Boolean | The associated annotations will be extracted | true |

MDOParserExternalContent - Parse FWTF (Fixed Width Text File) with external content to a punnet description

An MDO file is a flat file in which each line corresponds to a document and contains the information about that document. The extraction of information from each line is based on a CSV configuration file, which provides the name of each metadata to be inserted into the punnet document, as well as its characteristics.

The configuration file consists of the following columns, separated by a comma:

  • Field: name of the metadata to add
  • Length: length of the metadata. If the value is longer than this length, it will be truncated. If the value is shorter, it will be padded with spaces on the right
  • Offset: position in the MDO file
  • Mandatory: Y / N
  • Occurs: number of occurrences allowed for the field. The successive values of the field will then be added to the values of the metadata (respecting the Length parameter for each one)
  • Type: type of metadata to add to the punnet document

The MDOParserExternalContent task is used to retrieve external content for each document. To do this, the name of the column defining the content path is specified in the task settings.
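
The column definitions above can be sketched as a small fixed-width parser. This is only an illustration under several assumptions that the documentation does not confirm: the configuration file has a header row with the column names, Offset is 0-based, repeated occurrences are stored back to back, and the Type column is ignored. The `load_spec` and `parse_line` helpers and the sample fields are hypothetical.

```python
import csv
import io

def load_spec(csv_text):
    """Read the format specification: one row per field."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def parse_line(line, spec):
    """Slice one fixed-width line into a dict of field name -> list of values."""
    document = {}
    for field in spec:
        length = int(field["Length"])
        offset = int(field["Offset"])  # assumed to be 0-based
        occurs = int(field["Occurs"])
        values = []
        for i in range(occurs):
            start = offset + i * length
            raw = line[start:start + length]
            value = raw.rstrip(" ")  # drop the right-hand space padding
            if value:
                values.append(value)
        if field["Mandatory"] == "Y" and not values:
            raise ValueError(f"Missing mandatory field: {field['Field']}")
        document[field["Field"]] = values
    return document

# Hypothetical specification with three fields; "tag" may occur twice.
SPEC = """Field,Length,Offset,Mandatory,Occurs,Type
docId,6,0,Y,1,String
title,10,6,Y,1,String
tag,4,16,N,2,String
"""

spec = load_spec(SPEC)
# Build a sample line by padding each value to its declared length.
line = "000042" + "Invoice".ljust(10) + "red".ljust(4) + "blue".ljust(4)
document = parse_line(line, spec)
print(document)
```

Keeping every value in a list makes single and repeated fields uniform, at the cost of one extra indexing step for fields that occur once.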

Mandatory settings
| Key | Type | Description |
| MDO format specification file path | String | Absolute path of the CSV configuration file containing the MDO format specification |

Optional settings
| Key | Type | Description | Default value |
| File scanner | FileScanner | Recovers your files | |
| Date format | String | Date format used in the MDO file. Must be the same for each line of the document | yyyy-MM-dd |
| Property name containing path content | String | Name of the field in the configuration file that contains the path to the content. If not filled, the content will not be saved in the punnet | |
| Create one punnet for each document of FWTF | Boolean | If true, a punnet with one document will be created for each entry in the MDO file. Otherwise, one punnet will be created containing as many documents as there are entries in the MDO file | false |
| Dataline property name | String | Name of the metadata that will contain the MDO line read. If not specified, the line read will not be saved in the punnet | |
| contentLocationAbsolute | Boolean | | |
| Last punnet property name | String | Name of the data indicating which punnet is the last one. If null, this data is not added to the punnet. Applies to the multi-punnet case only | |

MDOParserInternalContent - FWTF (Fixed Width Text File) parser with internal content

Like the MDOParserExternalContent task, the MDOParserInternalContent source parses each line of the MDO file into a punnet. The difference between these two tasks is that here the content is stored inside the MDO file itself. The start and end of the content are defined by a tag specified in the task settings.

Mandatory settings
| Key | Type | Description |
| MDO format specification file path | String | Absolute path of the CSV configuration file containing the MDO format specification |

Optional settings
| Key | Type | Description | Default value |
| File scanner | FileScanner | Recovers your files | |
| Date format | String | Date format used in the MDO file. Must be the same for each line of the document | yyyy-MM-dd |
| End tag | String | End tag property name signifying the end of the content | |
| Create one punnet for each document of FWTF | Boolean | If true, a punnet with one document will be created for each entry in the MDO file. Otherwise, one punnet will be created containing as many documents as there are entries in the MDO file | false |
| Dataline property name | String | Name of the metadata that will contain the MDO line read. If not specified, the line read will not be saved in the punnet | |
| Last punnet property name | String | Name of the data indicating which punnet is the last one. If null, this data is not added to the punnet. Applies to the multi-punnet case only | |
| Original text content property name | String | Name of the data containing the original text content. If null, this data is not added to the punnet | |

OpenTextContentSource - OpenText content extractor using OpenText REST protocol

Mandatory settings
| Key | Type | Description |
| OpenText credentials | OpenTextCredentials | |
| OpenText client | OpenTextRestClient | |

Optional settings
| Key | Type | Description | Default value |
| Extract all versions | Boolean | Extract the superseded versions of the documents matching the query | |
| Extract document metadata | Boolean | Save metadata as document metadata | false |
| Extract document categories | Boolean | Save categories as document metadata | false |
| Extract content | Boolean | The document content will be extracted during the process | true |
| Ticket period | Integer | Time in seconds between two ticket creations | 60 |
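
The 'Ticket period' setting suggests that an authentication ticket is renewed at a fixed interval. Below is a minimal sketch of that idea, assuming a hypothetical `TicketCache` class and `fetch_ticket` callback. Neither is part of Fast2 or of the OpenText REST API, and the real task may handle tickets differently.

```python
import time

class TicketCache:
    """Reuse a ticket until it is older than period_seconds, then request a new one."""

    def __init__(self, fetch_ticket, period_seconds=60, clock=time.monotonic):
        self._fetch_ticket = fetch_ticket
        self._period = period_seconds
        self._clock = clock
        self._ticket = None
        self._created_at = None

    def get(self):
        now = self._clock()
        if self._ticket is None or now - self._created_at >= self._period:
            self._ticket = self._fetch_ticket()
            self._created_at = now
        return self._ticket

# Demonstration with a fake clock and a counter instead of a real server call.
fake_now = [0.0]
issued = []

def fetch_ticket():
    issued.append(f"ticket-{len(issued) + 1}")
    return issued[-1]

cache = TicketCache(fetch_ticket, period_seconds=60, clock=lambda: fake_now[0])
first = cache.get()    # no ticket yet, so one is requested
fake_now[0] = 30.0
second = cache.get()   # 30 s elapsed: the same ticket is reused
fake_now[0] = 61.0
third = cache.get()    # 61 s elapsed: a new ticket is requested
print(first, second, third)
```

Injecting the clock keeps the renewal logic testable without waiting for real time to pass.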

SQLContentExtractor - Extract document content from SQL

Extracts CLOB and BLOB object types. Classic types such as VARCHAR are extracted as well.

Mandatory settings
| Key | Type | Description |
| SQL connection provider | SQLQueryGenericCaller | |
| SQL query | Pattern | Select precisely documents you want to extract through a classic SQL query |

Optional settings
| Key | Type | Description |
| SQL mapping for content | String/String map | Mapping of SQL properties to document content. |
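
One possible reading of the 'SQL mapping for content' setting is a map from a result column to the name of the content it feeds. The sketch below illustrates that reading with Python's built-in `sqlite3` module and an in-memory database. The `extract_contents` helper, the `docs` table, the column names and the direction of the mapping (column name to content name) are all assumptions made for illustration, not Fast2 behaviour.

```python
import sqlite3

def extract_contents(connection, query, content_mapping):
    """Run the query and collect the mapped columns as named contents, one dict per row."""
    connection.row_factory = sqlite3.Row  # allows access to columns by name
    results = []
    for row in connection.execute(query):
        contents = {}
        for column, content_name in content_mapping.items():
            value = row[column]
            # Text (CLOB-like) values are encoded so that every content is bytes.
            contents[content_name] = value.encode("utf-8") if isinstance(value, str) else value
        results.append(contents)
    return results

# Hypothetical table holding one binary and one text column.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE docs (id INTEGER, body BLOB, notes TEXT)")
connection.execute("INSERT INTO docs VALUES (?, ?, ?)", (1, b"%PDF-1.4", "hello"))

contents = extract_contents(
    connection,
    "SELECT id, body, notes FROM docs",
    {"body": "main.pdf", "notes": "notes.txt"},
)
print(contents)
```

Columns left out of the mapping, such as `id` here, are simply not turned into contents.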