OIC has for some time provided an FTP adapter and more recently added a full FTP server capability. Both, however, have limits on file size and capacity. These constraints (1GB per file and 500GB for the FTP server) shouldn't be an issue for day-to-day activities. But OIC is often used to support SaaS Financials and other cloud solutions that have monthly processing cycles capable of generating significant data volumes, for example, payroll data. The question is how to handle such data within these constraints.
Data Integration?
There is a school of thought that, when handling such large data volumes, we should consider using Data Integration rather than a more event-centric integration tool. Personally, I think there is a lot of validity in that argument, and anyone dealing with such bulky data movements should review it and question whether it is the better answer.
That said, there are cases where the integration-centric approach does stand up. For example:
- An organization may be transitioning to a more event-driven, or at least micro-batch, model. Trying to line up changes everywhere at once can be problematic, so you have to start somewhere. Building the integration first gives you the event model; in the interim, you need to take the bulk mechanism and convert it into a stream of small events.
- You may be working with a bulk data extract but only need a small subset of the data provided; it doesn't help if the data is also represented in a verbose notation such as XML.
Other Approaches
So how do we overcome the constraint? Oracle databases aren't so constrained, and SQL*Loader provides an easy means of ingesting the data into a staging table (a minimal control file sketch follows the list below). The benefits of this are:
- If you only need a subset of the data, you can pull just those columns from the table.
- The overhead of self-describing XML can be shed, because the database schema is prescriptive about the structure of the data.
- SQL scripts can handle the checksum records, removing that data and overhead from the integration process and leaving you to concentrate on the business process.
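As a rough illustration, a SQL*Loader control file for such a staging load might look like the sketch below. The table, column, and file names are illustrative assumptions rather than anything mandated by OIC; the real layout depends on the extract being loaded.

```
LOAD DATA
INFILE 'payroll_extract.csv'
BADFILE 'payroll_extract.bad'
APPEND
INTO TABLE payroll_staging
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
  employee_id,
  pay_period,
  pay_amount,
  processed_flag CONSTANT 'N'  -- seed the flag used later when processing in chunks
)
```

It would be run with something like `sqlldr userid=stage_user control=payroll.ctl log=payroll.log`, and for very large files direct-path loading (`direct=true`) is worth considering.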
If the data is still substantial once in the database, there are a number of strategies for consuming it in more manageable chunks, such as:
- Running a SQL script or stored procedure on the database that takes each row and calls OIC through a RESTful API endpoint (a PL/SQL sketch of this follows the list). This approach is potentially very interesting: if you're moving towards an event-based process in the future, the API endpoint represents the future state, and the database stored procedure mimics the future client behaviour.
- Using polling strategies and result-set limits to control how much data is processed in a single execution of the integration (see the second sketch below). This approach does mean the integration needs to flag which records have been processed to avoid re-reading them.
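To give a feel for the first option, here is a minimal PL/SQL sketch that loops over the staging table and POSTs each row to an OIC REST endpoint. The endpoint URL, payload shape, and table and column names are assumptions for illustration; in practice the database also needs a network ACL (and a wallet, for HTTPS) configured to allow the outbound calls.

```sql
DECLARE
  l_req  UTL_HTTP.req;
  l_resp UTL_HTTP.resp;
  l_body VARCHAR2(4000);
BEGIN
  FOR r IN (SELECT employee_id, pay_amount
            FROM   payroll_staging
            WHERE  processed_flag = 'N') LOOP
    -- Build a small JSON payload for one record
    l_body := '{"employeeId":"' || r.employee_id ||
              '","payAmount":'  || r.pay_amount || '}';

    -- POST the record to the integration's REST trigger (placeholder URL)
    l_req := UTL_HTTP.begin_request(
               url          => 'https://myoic.example.com/ic/api/integration/v1/flows/rest/PAYROLL_EVENT/1.0/records',
               method       => 'POST',
               http_version => 'HTTP/1.1');
    UTL_HTTP.set_header(l_req, 'Content-Type', 'application/json');
    UTL_HTTP.set_header(l_req, 'Content-Length', TO_CHAR(LENGTHB(l_body)));
    UTL_HTTP.write_text(l_req, l_body);

    l_resp := UTL_HTTP.get_response(l_req);
    UTL_HTTP.end_response(l_resp);
  END LOOP;
END;
/
```

Each row then arrives at OIC as an individual event, which is exactly the shape a future event-driven client would produce.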
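For the polling option, the essence is a bounded read plus a flag update so the same rows are not re-read. The OIC database adapter can manage this kind of logical-delete polling for you; the sketch below shows the equivalent SQL, again with illustrative table and column names.

```sql
-- Claim a bounded batch by marking it in-flight, so a subsequent run
-- will not pick up the same rows.
UPDATE payroll_staging
SET    processed_flag = 'P'          -- P = picked up / in flight
WHERE  processed_flag = 'N'
AND    ROWNUM <= 500;

-- Hand the claimed rows to the integration.
SELECT employee_id, pay_period, pay_amount
FROM   payroll_staging
WHERE  processed_flag = 'P';

-- Once the integration confirms the batch, mark it done.
UPDATE payroll_staging
SET    processed_flag = 'Y'
WHERE  processed_flag = 'P';

COMMIT;
```

The batch size (500 here) is the lever for keeping each integration execution within comfortable payload and timeout limits.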