ServiceNow Data Mart Loader (a.k.a. ServiceNow DataPump)

Concepts

Script Mode

In Script mode the application reads, parses and executes a set of jobs from a script file. Script mode is initiated by specifying the following command line option:

-f filename

Script mode has the following advantages:

  • There are no requirements to install any Update Sets or make any changes to your ServiceNow instance.
  • Since the application only reads (and does not update) your ServiceNow instance, you may (if you choose) use a SOAP account which lacks write access to the instance.

For details on script syntax refer to this page.

GUI Mode

In GUI mode the application reads and executes a set of jobs from the ServiceNow instance. GUI mode is initiated by specifying the following command line option:

-js jobsetname

GUI mode requires you to install an Update Set in your ServiceNow instance. The Update Set creates tables in your instance which are used to configure the Datapump. These same tables are also updated as the Datapump runs.

GUI mode has the following advantages:

  • The Datapump is easier to configure. You do not need to worry about syntax errors in your script file.
  • Progress metrics can be viewed in ServiceNow while the Datapump is running.
  • More configuration options are supported. For example, in GUI mode you can configure a job to only process records that have changed since the last time the job ran.
  • If a long-running job abends due to a communications error or other issue, it can be restarted from the point where it failed. This is an important feature since ServiceNow is remotely hosted and the internet is not a highly reliable transport.

For information on GUI mode refer to this page.

Job

A Job is a set of instructions for replicating data from a ServiceNow table to a target table in an SQL database. There are various types of jobs as described below. Each job replicates a single table. The set of records to be replicated may be restricted using a condition (i.e. an encoded query) and/or a date interval which is based on sys_updated_on or sys_created_on.

Target tables in the SQL database will be automatically created if they do not already exist when the job starts.

JobSet

A Job Set is a collection of jobs that are run together. If operating in script mode then a set of jobs will be defined in a script file. If operating in GUI mode then a Job Set is used to identify a group of jobs to be run together. In both cases the jobs will be run sequentially, one at a time. In the case of a script the sequence is controlled by the order of the lines in the script file. In the case of a Job Set the sequence is controlled the the Order field on the Job. Once all jobs in the script or job set complete the application will end unless there are Poll jobs. If there are Poll jobs in the script or job set then the application will sleep for a while, then awake and run the Poll jobs again, and then go back to sleep.

Load Job

A Load job is a job which does the same thing every time it runs. You can use specify a Condition field to restrict which records are loaded. You can also specify a date interval based on sys_created_on or sys_updated_on. The date interval will not be updated automatically by the application. In other words, if the job is run multiple times, the date interval will be the same each time the job is run.

Use a Load job for the following cases:

  • You are performing a one-time initialization of a table. Once the job completes it will not be run again.
  • You have a table which is periodically reloaded in its entirety.
  • You periodically load a set of records into a target table using a filter condition, but the filter condition does not change with each run.

When you define a Load job you have the option of specifying a Truncate option. If Truncate is specified then the target table will be truncated before the load begins.

Load Method

A Load Job can specify a Load Method. If Truncate is false then you can choose one of three load methods.

  • Update Insert - This is the default. Records in the target table will be updated using the primary key (sys_id). If they do not exist they will be inserted.
  • Insert Only - Records will be inserted in the target table. If a record already exists then an SQL Primary Key Violation will be thrown and the application will abend. This method should only be specified if you have reason to expect that the target records do not exist.
  • Compare Timestamp - The timestamp (sys_updated_on) from ServiceNow will be compared with the timestamp (sys_updated_on) of the target table. If the ServiceNow record is more recent then the target table will be updated or inserted. This method may be selected if, for any reason, you wish to avoid updates to unchanged records.

If Truncate is true then the load method is automatically Insert Only.

Refresh Job

A Refresh job loads records which have been inserted or updated since the job ran last. When using GUI mode, the application will advance the job interval each time the "Refresh" job is run. The ending datetime from the last run becomes the starting datetime for the current run. The ending datetime for the current run becomes the current datetime.

When using Script mode, a Refresh job provides no more additional functionality than a Load job, unless polling is activated.

Note that the date interval is half-inclusive. The starting datetime is included in the query. The ending datetime is excluded.

Refresh jobs may adversely affect the performance of your system if sys_updated_on is not an indexed field in the specified table. Contact ServiceNow Customer Support to add indices to your instance.

Prune Job

A Prune job reads the sys_audit_delete file to identify any records which have been deleted in ServiceNow since the last run. These records will be deleted in the target table.

Prune jobs may adversely affect the performance of your system if tablename and sys_created_on are not indexed fields in the sys_audit_delete table. Contact ServiceNow Customer Support to add indices to your instance.

Polling Option

A polling option is available for Refresh and Prune jobs. In Script mode the polling option is activated by adding an every clause to the refresh or prune command. In GUI mode the polling option is activated by setting Poll Continuously to true. The polling option causes the application to sleep, awake and repeat the job at a periodic frequency. If a polling option is specified then the application will continue running until it is cancelled or killed.

It is possible to have multiple jobs in a single script with different polling frequencies. Since the application is single threaded and only runs one job at a time, a job may need to wait longer than its configured frequency before running.

Polling jobs may adversely affect the performance of your system if the proper fields are not indexed in your ServiceNow instance. Refer to the notes above regarding indices for Refresh and Prune jobs.

Backloading Historical Data

When using ServiceNow Direct Web Services, there are two ways to retrieve a large number or records from a table.

  • getKeys — Use the getKeys method to retrieve the keys of desired records, then fetch the records in chunks using an encoded query of the form sys_idINsys_id,sys_id,...

  • windowing — Use the __first_row and __last_row parameters to fetch the records in chunks.

GUI Mode supports both methods. Script Mode currently supports only the first method (getKeys) since our testing indicates that the performance is better. There is an up-front hit on the getKeys call, but all subsequent SOAP requests will run faster. However, this method can be problematic if the result set is too large, as the initial getKeys call may never return.

The solution is to break up the request into a number of smaller requests by date (typically using sys_created_on). A number of smaller jobs are created. Based on the transaction volume, these may be by month or by week. If the transaction volume is especially high then it may be necessary to create a job for each day. If the volume is low then one job per quarter may be sufficient. We typically try to keep the size of a job to under 50,000 records.

When using the DataPump in Script Mode, use your favorite scripting tool or text editor to create a script file containing a bunch of load jobs. If using the DataPump in GUI Mode, use a UI action to clone the a load job. Create one load job for the first month, then clone it 23 times to load two years worth of data. Kick off the jobs on a Saturday afternoon when system activity is at an ebb, and within a few hours the data will be loaded. While this approach may seem a bit tedious, it is important to remember that it is a one-time process. Once the history has been loaded into the data mart, it never needs to be loaded again. Going forward we load recent activity at regular intervals, and the transaction volumes are much more managable.