heroku · Java · REST · Salesforce

Archive of Salesforce data

Salesforce has many platform limits, and one of them is data storage. As a CRM ages it accumulates large amounts of information, and many enterprises eventually run into their data storage limits. Purchasing additional Salesforce storage is not economical once you factor in the operational burden. To overcome this challenge, a comprehensive archiving mechanism is recommended; it is an essential part of any growing business.

What is Archiving?

Archiving is the process of moving data that is no longer in active use to a less frequently accessed storage medium. Archiving enables organizations to create and retain more data than ever before.

Why should Salesforce data be archived?

An archive holds historical, rarely used Salesforce data that can be removed entirely from your Salesforce environments. Once archived, the data is moved to long-term retention and remains available for future reference. Data archiving:

  • lets you stay within the storage allocation that Salesforce provides.
  • reduces storage costs and frees up storage for relevant data.
  • keeps sensitive customer data safe.
  • helps maintain consistent application performance.

Does Salesforce provide an archival mechanism?

Yes, Salesforce has a built-in archiving capability, but it is limited to activities (tasks and events) older than a year, and this archived data doesn't count against your company's storage allocation. It does not cover data in other large objects that may need to be archived depending on your business processes. Another option Salesforce offers for archiving is Big Objects, which have their own advantages.

Are there any third-party vendors that provide archival functionality for Salesforce?

Yes. One such vendor is DataArchiva, which offers a range of storage options such as AWS, Azure, and Heroku. The drawback is that it is a paid product.

Challenges of archiving data in Salesforce

| Approach | Challenge |
| --- | --- |
| Building the application on the Force.com platform using the Zippex library. | Works only when the total file size is less than 3 MB. |
| Building the application on the Force.com platform using JavaScript libraries in a Visualforce page. | Cannot run in the background as a scheduled job. |
| Hosting that Visualforce application on a Site page and invoking the Site page from batch Apex. | The page's JavaScript is not executed when the page is fetched from Apex, so the archival logic never runs. |

Now let's look at an economical solution for archiving Salesforce data that is free of these limitations.

APIs exist to extend a platform, and this solution uses them to work around the platform limits. This post describes how to archive large files using the Salesforce REST API and a Java application hosted on Heroku.

Solution Architecture

The solution has three entities:

  1. Salesforce
  2. Java Application
  3. Online file storage web service (here AWS Glacier)

Salesforce data is pushed to AWS Glacier through the Java application. To archive a specific record, the approach is:

  1. Save an HTML view of the record as a content document.
  2. The Java application connects to Salesforce, retrieves the content versions, converts the record's HTML page to PDF/A, zips it together with the other files, and uploads the zip to Glacier. In short, we wrap a PDF of the record and its related files together and put them in Glacier.
  3. Glacier returns an archive ID for every zip file you upload. Store this ID in Salesforce or in another database.
  4. Use this ID to retrieve the data whenever you need it, for example as evidence.
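Step 3 stores the Glacier archive ID back in Salesforce. A minimal sketch of that write-back with the JDK's java.net.http client follows. It assumes a hypothetical custom field Archive_Id__c on the archived object and REST API version v59.0; neither comes from the original solution. It only builds the request.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: Archive_Id__c and v59.0 are hypothetical choices, not part of the original post.
final class ArchiveIdWriter {

    // Minimal JSON string escaping (quotes and backslashes); a JSON library is more robust.
    static String jsonBody(String fieldName, String archiveId) {
        String escaped = archiveId.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"" + fieldName + "\":\"" + escaped + "\"}";
    }

    // The Salesforce REST API updates an existing record with PATCH on /sobjects/{Object}/{Id}.
    static HttpRequest updateRequest(String instanceUrl, String accessToken,
                                     String sObject, String recordId, String archiveId) {
        URI uri = URI.create(instanceUrl + "/services/data/v59.0/sobjects/"
                + sObject + "/" + recordId);
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/json")
                // HttpRequest.Builder has no PATCH() shortcut, so use method(...)
                .method("PATCH", HttpRequest.BodyPublishers.ofString(
                        jsonBody("Archive_Id__c", archiveId)))
                .build();
    }
}
```

Sending it is one call, HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); Salesforce answers a successful update with 204 No Content.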

Note: to get the HTML view of the record, create a custom Visualforce page that shows all of the record's data along with its related lists and Chatter, and render that page to HTML dynamically.

Because of the Force.com platform limits, we move the file-processing logic to an external, open-source platform such as Java, perform the necessary operations there, and write the results back to Salesforce through the API. The Java application can be hosted on any PaaS such as Heroku, and it can be scheduled so that archiving runs as an automated job.

Steps to connect to Salesforce from the Java application

  1. Create a basic connected app in Salesforce and select the OAuth scope "Access and manage your data (API)".
  2. Once the connected app is created, note its consumer key and consumer secret.
  3. Configure OAuth 2.0 authentication in the Java application using these credentials.
  4. Once authentication succeeds, use the OAuth access token to communicate with Salesforce and run the queries needed to retrieve the files.
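Steps 3 and 4 can be sketched in plain Java as building the token request and reading the token from the response. The post does not say which OAuth 2.0 flow it uses; this sketch assumes the client credentials flow, which has to be enabled on the connected app and is sent to the org's My Domain URL. The JWT bearer flow is a common alternative. The JSON parsing here is deliberately naive.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch assuming the OAuth 2.0 client credentials flow (an assumption, not the post's stated flow).
final class SalesforceAuth {

    // Encode parameters as application/x-www-form-urlencoded.
    static String formBody(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Consumer key and secret come from the connected app (step 2).
    static Map<String, String> clientCredentialsParams(String consumerKey, String consumerSecret) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("grant_type", "client_credentials");
        p.put("client_id", consumerKey);
        p.put("client_secret", consumerSecret);
        return p;
    }

    // POST to the token endpoint of the org's My Domain URL.
    static HttpRequest tokenRequest(String myDomainUrl, String consumerKey, String consumerSecret) {
        return HttpRequest.newBuilder(URI.create(myDomainUrl + "/services/oauth2/token"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(
                        formBody(clientCredentialsParams(consumerKey, consumerSecret))))
                .build();
    }

    // Naive extraction of a string field from the token response; use a JSON library in practice.
    static String jsonField(String json, String key) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(key) + "\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        return m.find() ? m.group(1) : null;
    }
}
```

The access_token and instance_url fields of the response are what the later REST calls need: the token goes into an Authorization: Bearer header, and the instance URL is the base of every request.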

Note: instead of making multiple callouts from the external application to find the ContentVersion IDs (file IDs), populate the ContentVersion IDs in a field on the record using SOQL in Salesforce.
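With the IDs stored on the record, the Java side needs one query for the record and one download per file. The sketch below builds those REST URLs: the query resource for the SOQL, and the VersionData resource of ContentVersion for each file's binary content. It assumes the IDs are kept as a comma-separated list in a hypothetical field (called Content_Version_Ids__c here) and uses API version v59.0; both are illustrative choices.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: assumes a hypothetical comma-separated Content_Version_Ids__c field and API v59.0.
final class ContentVersionUris {
    static final String API = "/services/data/v59.0";

    // REST query resource: GET /query?q=<URL-encoded SOQL>
    static URI queryUri(String instanceUrl, String soql) {
        return URI.create(instanceUrl + API + "/query?q="
                + URLEncoder.encode(soql, StandardCharsets.UTF_8));
    }

    // Split the stored field value into trimmed, non-empty IDs.
    static List<String> parseIds(String fieldValue) {
        if (fieldValue == null) {
            return List.of();
        }
        return Arrays.stream(fieldValue.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // The binary content of each file version is served from its VersionData field.
    static List<URI> versionDataUris(String instanceUrl, String fieldValue) {
        return parseIds(fieldValue).stream()
                .map(id -> URI.create(instanceUrl + API
                        + "/sobjects/ContentVersion/" + id + "/VersionData"))
                .collect(Collectors.toList());
    }
}
```

Each VersionData URI is fetched with the same Bearer token as the query, and the response body is the raw file, which can be written straight into the record's folder.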

Criteria for Archiving a record

The criteria can be defined at the org level, the object level, or the record level. For the org or object level, create a custom setting with a field that stores the archival period: for the org level, create a single record with the archival period; for the object level, create one record per object specifying its archival period. For the record level, create a field on every object that stores the archival period. Only data that satisfies the archival condition should be queried from Java.
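The org- and object-level variants can be sketched as follows. The sketch assumes the custom setting records have already been read into a map from object name to archival period in days, that an org-level default applies when an object has no record of its own, and that age is measured on LastModifiedDate. All three are assumptions; the record-level variant would need a different query and is not shown.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Map;

// Sketch: the fallback rule and the use of LastModifiedDate are assumptions.
final class ArchivalCriteria {

    // An object-level period (in days) takes precedence over the org-level default.
    static int periodDays(Map<String, Integer> objectPeriods, String sObject, int orgDefaultDays) {
        return objectPeriods.getOrDefault(sObject, orgDefaultDays);
    }

    // Records last modified before this instant satisfy the archival condition.
    static Instant cutoff(Instant now, int periodDays) {
        return now.minus(Duration.ofDays(periodDays)).truncatedTo(ChronoUnit.SECONDS);
    }

    // SOQL datetime literals are unquoted ISO-8601 values such as 2023-06-02T00:00:00Z.
    static String soql(String sObject, Instant cutoff) {
        return "SELECT Id FROM " + sObject + " WHERE LastModifiedDate < " + cutoff;
    }
}
```

For example, with an object-level period of 365 days and a run on 2024-06-01T00:00:00Z, the cutoff is 2023-06-02T00:00:00Z, because the preceding year contains 29 February 2024.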

Zipping in Java

Once we have queried all the files under a record, including the HTML file, keep them in a folder named after the record. Before placing the files there, convert the record's HTML page to PDF/A. The folder can then be zipped using any of the zip libraries available in Java. For reference, use this link.
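The JDK's own java.util.zip package is enough for the zipping step, so no extra library is strictly required. A minimal sketch, assuming the record's files (including the PDF/A rendition, whose conversion is not shown) are already in a folder named after the record, and that the zip file is created outside that folder:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class RecordZipper {

    // Zip every regular file under 'folder'. Entry names start with the folder
    // (record) name and always use '/' separators, as the zip format expects.
    static void zipFolder(Path folder, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(folder)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                String relative = folder.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(folder.getFileName() + "/" + relative));
                Files.copy(file, zip); // stream the file's bytes into the current entry
                zip.closeEntry();
            }
        }
    }
}
```

Keeping the record name as the top-level directory inside the zip means the archive still identifies its record after it is restored from Glacier.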

Sending it to cloud storage (here, AWS Glacier)

Here we can take advantage of Java: whichever cloud storage you choose, there is material available online with Java code for connecting to it. For AWS Glacier, refer to this link.

That example uses the AWS SDK to connect to Glacier, and the SDK makes the integration fairly simple.
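Behind the SDK call, a Glacier archive upload carries a SHA-256 tree hash of the payload as its checksum. The data is split into 1 MiB chunks, each chunk is hashed, and neighboring hashes are hashed together pairwise until a single root remains. Depending on the SDK version you may have to compute this value yourself, so here is a stdlib-only sketch of the algorithm; treating empty input as the hash of an empty array is this sketch's own choice.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GlacierTreeHash {
    static final int CHUNK = 1024 * 1024; // 1 MiB leaf size

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide SHA-256
        }
    }

    static String treeHashHex(byte[] data) {
        // Leaf level: one SHA-256 per 1 MiB chunk (the last chunk may be shorter).
        List<byte[]> level = new ArrayList<>();
        for (int off = 0; off < data.length; off += CHUNK) {
            level.add(sha256(Arrays.copyOfRange(data, off, Math.min(off + CHUNK, data.length))));
        }
        if (level.isEmpty()) {
            level.add(sha256(new byte[0]));
        }
        // Hash neighboring pairs until one root remains; an odd hash is carried up unchanged.
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                if (i + 1 < level.size()) {
                    byte[] pair = new byte[64];
                    System.arraycopy(level.get(i), 0, pair, 0, 32);
                    System.arraycopy(level.get(i + 1), 0, pair, 32, 32);
                    next.add(sha256(pair));
                } else {
                    next.add(level.get(i));
                }
            }
            level = next;
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : level.get(0)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

For payloads up to 1 MiB the tree hash is simply the SHA-256 of the data. In the AWS SDK for Java 2.x, the hex string would be supplied as the checksum of the upload request, and the archive ID in the response is the value step 3 stores; check the SDK reference for the exact builder methods.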

Now that we have a fully working application, the question is where to host it. If you choose Heroku, follow the steps below.

  1. You will need a dyno to host the application and run it on a schedule.
  2. For scheduling, instead of using Heroku Scheduler, use a custom clock process in Java; refer to this link.
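A custom clock process can be as small as a ScheduledExecutorService that fires once a day. The sketch below assumes a daily run at a fixed UTC time (02:00 in the test is an arbitrary choice). The job passed to start stands in for the query, zip, and upload steps described above and is a placeholder, not part of the original code.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class ArchivalClock {

    // Time from 'now' until the next occurrence of 'runAt' in now's time zone.
    static Duration delayUntilNext(ZonedDateTime now, LocalTime runAt) {
        ZonedDateTime next = now.with(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // today's slot has already passed
        }
        return Duration.between(now, next);
    }

    // Run 'job' once a day at 'runAt' UTC. Wrap the job in try/catch: an uncaught
    // exception would cancel all of its future runs. The executor's non-daemon
    // thread keeps the JVM (and so the clock process) alive.
    static ScheduledExecutorService start(Runnable job, LocalTime runAt) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelay = delayUntilNext(ZonedDateTime.now(ZoneOffset.UTC), runAt).getSeconds();
        scheduler.scheduleAtFixedRate(job, initialDelay, TimeUnit.DAYS.toSeconds(1), TimeUnit.SECONDS);
        return scheduler;
    }
}
```

On Heroku, a main method would call start(...) and run as its own process type declared in the Procfile, scaled to a single dyno so that each archival run happens only once.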

Want to implement an economical archiving solution for your org? Get in touch with us.

