Deciding on your policy for whether a job works or not is fundamental to how you run your batch. But do you know all the different ways Workload Automation for Z can help you?

What does the job return code mean?

You see a job that ends in error with error code 0004, but what does that mean?

Well, it depends on your configuration: you need to look at the RETCODE keyword of the EWTROPTS statement in your tracker. The options are:

  • HIGHEST – This will return the highest return code encountered anywhere in the job, so any step could cause your job to be flagged as a failure.
  • LAST – This will return the return code of the last step to actually run.
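The difference between the two options can be sketched in a few lines of Python. This is only a simplified model of the behaviour described above, not the tracker's actual logic; the function name and the step list are made up for illustration.

```python
def job_return_code(step_rcs, retcode="HIGHEST"):
    """Model the RETCODE policy for a job.

    step_rcs: list of (procstepname, rc) for the steps that actually ran,
              in execution order (hypothetical input format).
    """
    if not step_rcs:
        raise ValueError("no steps ran")
    rcs = [rc for _, rc in step_rcs]
    if retcode == "HIGHEST":
        return max(rcs)   # any step can push the job return code up
    if retcode == "LAST":
        return rcs[-1]    # only the last step to run counts
    raise ValueError(f"unknown RETCODE value: {retcode}")


steps = [("STEP1", 0), ("STEP2", 8), ("STEP3", 4)]
print(job_return_code(steps, "HIGHEST"))  # 8
print(job_return_code(steps, "LAST"))     # 4
```

With LAST, the RC=8 in STEP2 is invisible unless the JCL itself reacts to it.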

In some ways LAST is the most flexible, as you can design your jobs carefully using COND or IF statements to decide very selectively in your JCL whether the job worked or not. In other ways it risks jobs failing and NOT being spotted: if a step ends with a return code you did not expect, but your COND and IF statements ignore it, the job may have failed while your JCL reports it as working.

A slightly better alternative is to follow each key step with a step that ABENDs, conditioned to run only when the key step issues one of the return codes you expect to indicate a failure. In this scenario the default highest return code is set to 4095, as only jobs that abend are considered errors. This makes a failure clearer, but still runs the risk of missing unexpected problems.

The more common, and recommended, choice is HIGHEST. You then make exceptions in the workload definitions to explicitly exclude the error conditions that you know about; any that you don't know about will cause the job to fail. When you investigate the failure, if the failing condition is OK, you can add it to the workload definition to prevent that circumstance being flagged as an error again.

Having decided upon whether you want LAST or HIGHEST you then need to look at the ways you can tell the product which return codes are errors and which are not.

Highest acceptable return code

By default a job is considered successful only if every step ends with a return code less than or equal to 4. You can alter that default with the HIGHRC keyword of the JTOPTS statement in the controller. If it is not specified the default is 4, but to be safe most people override this to 0.

Then, in your application definition, you can set the highest acceptable return code on a job-by-job basis by selecting the operation from the OPER list and selecting “Automatic Options” from the menu (or S.4 directly from the OPER list).

For example, a job whose highest acceptable return code is set to 4 there will be considered a successful run if it ends with a return code up to and including 4.

For most jobs in most workloads, that is enough.

The NOERROR statements

But what if RC=4 is OK in STEP1 but not in STEP2? If you set the highest RC to 4 and the job has RC=4 in STEP2, it would be deemed OK even though it failed. Equally, if you set the highest RC to 0 and STEP1 ends with RC=4 but STEP2 ends with RC=0, it would be flagged as a failure even though it is OK. This is where NOERROR statements come in.

For each step and return code that matches a NOERROR statement, that step is not counted when calculating the highest return code.  You do not need to set highest return code to allow for any return code that is excluded from consideration by a NOERROR statement.
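As a rough illustration of that order of processing, the Python sketch below first drops every step result covered by a NOERROR entry and only then compares what is left against the highest acceptable return code. It is a simplified model under stated assumptions, not the controller's real algorithm: entries are EQ-only tuples, wildcards are matched with Python's fnmatch, and an empty stepname stands in for a step that is coded directly in the job JCL.

```python
from fnmatch import fnmatchcase


def job_is_successful(jobname, step_results, noerror, highest_rc=0):
    """Decide success after NOERROR exclusions (simplified model).

    step_results: list of (stepname, procstepname, rc), one per step run.
    noerror:      list of (jobname, stepname, procstepname, rc) patterns,
                  EQ operator only (assumption made to keep this short).
    """
    counted = []
    for stepname, procstepname, rc in step_results:
        excluded = any(
            fnmatchcase(jobname, job_pat)
            and fnmatchcase(stepname, step_pat)
            and fnmatchcase(procstepname, proc_pat)
            and rc == code
            for job_pat, step_pat, proc_pat, code in noerror
        )
        if not excluded:
            counted.append(rc)  # only non-excluded steps are counted
    return max(counted, default=0) <= highest_rc


noerror = [("SIMPLJOB", "*", "STEP1", 4)]

# RC=4 in STEP1 is excluded, so highest return code 0 still passes.
print(job_is_successful("SIMPLJOB", [("", "STEP1", 4), ("", "STEP2", 0)], noerror))
# RC=4 in STEP2 is not covered by any entry, so the job is in error.
print(job_is_successful("SIMPLJOB", [("", "STEP1", 0), ("", "STEP2", 4)], noerror))
```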

Before engaging with NOERROR, though, it is worth noting that NOERROR statements are NOT contained in the database and, once active, are processed for every job when it completes, BEFORE the highest return code processing. They are also listed in the EQQMLOG every time the controller starts and every time a NOERROR list is reloaded. So consider their use with that in mind, keeping the lists as small as possible.

You can define NOERROR statements in one of two ways:

  • Adding the NOERROR keyword to the JTOPTS statement, or NOERROR LIST items, in your controller parms. Though possible, this is not recommended, because to reload any NOERROR statements you have to reload the entire table every time.
  • Adding an INCLUDE statement to the controller parms. This can load one or more members of NOERROR statements at startup, and each member can be individually reloaded with a Modify command without bouncing the controller. This is the recommended method, and it is also recommended to divide your NOERROR entries into separate members if you have some logical way to do this. More members means fewer messages repeated on the EQQMLOG each time a single member is reloaded.

By default, NOERROR processing also performs consistency checks as the entries are loaded, which lets you see if you have duplicates or overlaps in your tables. However, the default behaviour will cause your controller to terminate if you have inconsistencies, so it is recommended that you add NOERRCONCHECK(MSG) to your OPCOPTS statement in the controller. This will still do the consistency checks but will NOT fail the controller.

Coding a good NOERROR statement

The syntax of a NOERROR entry, as used in the examples throughout this article, is:

jobname.stepname.procstepname.errorcode.operator

The operator can be:

  • EQ Equal to (this is the default).
  • GE Greater than or equal to.
  • GT Greater than.
  • LE Less than or equal to.
  • LT Less than.
  • NE Not equal to.
  • TO Indicates a range between two values. Use it by specifying an expression with the form TO.errorcode2 where the extreme values are inclusive.

Because EQ is the default, the operator argument can be omitted for EQ.
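The operators listed above behave like ordinary comparisons of the step's return code against the error code in the entry. A minimal Python sketch, with hypothetical function and parameter names, and assuming numeric return codes only:

```python
import operator as op

# Operator keywords from the list above mapped to plain comparisons.
_COMPARE = {
    "EQ": op.eq,
    "GE": op.ge,
    "GT": op.gt,
    "LE": op.le,
    "LT": op.lt,
    "NE": op.ne,
}


def rc_matches(rc, errorcode, oper="EQ", errorcode2=None):
    """Return True if return code `rc` is covered by the entry."""
    if oper == "TO":
        if errorcode2 is None:
            raise ValueError("TO needs a second error code")
        return errorcode <= rc <= errorcode2  # both ends are inclusive
    return _COMPARE[oper](rc, errorcode)


print(rc_matches(4, 4))             # True, EQ is the default
print(rc_matches(7, 8, "LT"))       # True
print(rc_matches(8, 8, "LT"))       # False
print(rc_matches(9, 8, "TO", 9))    # True, upper bound is inclusive
```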

One very important thing to note about stepname and procstepname. As far as Workload Automation for Z is concerned these terms mean the following:

  • stepname – The name of a step that invokes an in-stream or cataloged procedure. This is always the name of an EXEC PROC statement. The maximum length is 8 characters.
  • procstepname – The name of a step that invokes a program. This is always the name of an EXEC PGM statement. The maximum length is 8 characters.

This is always the case in z/OS for any step that executes a proc, but steps that are directly coded in the job JCL that execute a program are considered differently in other z/OS products.

Take a simple job, SIMPLJOB, whose JCL directly codes a step called STEP1 that executes a program. Using the JS line command in SDSF, STEP1 is shown as the stepname of this job.

But to code a NOERROR statement to make STEP1 allow RC=4 you would need to code this:

SIMPLJOB.*.STEP1.0004

So, despite STEP1 being the STEPNAME in SDSF, in Workload Automation it is the PROCSTEPNAME since it is the one that executes the program.  It is vital that you remember that distinction when coding any NOERROR statements.

As a general rule you should not code a NOERROR statement like SIMPLJOB.*.*.0004, as in MOST cases this would be equivalent to setting the highest return code for job SIMPLJOB to 4.

The only time this would not be equivalent is if any step in the job was capable of returning RC=1, RC=2 or RC=3 AND those return codes should be considered errors. Since most z/OS jobs tend to issue only return codes such as 4, 8, 12 or 16, and 4 is considered a warning only, chances are this would be better handled in the application definition. This allows you to keep your NOERROR list as small as possible, which is always a good thing.

I would recommend searching for “.*.*.0004” across your NOERROR entries to see if you have candidates that would be better suited as Highest Return code settings. In a similar vein all “.*.*.0001”, “.*.*.0001.EQ” and “.*.*.0001.LE” can definitely be changed to a highest return code of 1 in the application definition.

Also, any entry with both step names being * and LE or LT as the operator, e.g.

SIMPLJOB.*.*.0008.LE

can be replaced with highest return code 8, and similarly with LT e.g.

SIMPLJOB.*.*.0008.LT

could be replaced with highest return code 7.  As can any range that starts at 0001 for all steps.

Note that all of these recommendations for replacing a NOERROR entry with a highest return code apply only to entries where the jobname is absolute. An entry with MX* as the jobname, for example, may not be a good candidate unless you set the highest return code for all current MX jobs and remember to do it for all future MX jobs.
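If you keep your NOERROR entries in a form you can scan, the definite cases above can be picked out mechanically. The Python sketch below is one hypothetical way to do it; the function name is invented, only * is treated as a wildcard, and the plain “.*.*.0004” case is deliberately not flagged because it needs a human to confirm that RC=1 to RC=3 cannot occur.

```python
def highest_rc_candidate(entry):
    """Return the highest return code that could replace `entry`,
    or None if the entry is not a definite candidate."""
    parts = entry.split(".")
    if len(parts) < 4:
        return None
    jobname, stepname, procstepname, code = parts[:4]
    oper = parts[4] if len(parts) > 4 else "EQ"  # EQ is the default

    # Only absolute job names with both step names generic qualify.
    if "*" in jobname or stepname != "*" or procstepname != "*":
        return None
    if not code.isdigit():
        return None  # not a numeric return code
    rc = int(code)

    if oper == "LE":
        return rc
    if oper == "LT":
        return rc - 1 if rc > 0 else None
    if oper == "EQ" and rc == 1:
        return 1
    if oper == "TO" and rc == 1 and len(parts) > 5 and parts[5].isdigit():
        return int(parts[5])  # a range starting at 0001
    return None


for e in ["SIMPLJOB.*.*.0008.LE", "SIMPLJOB.*.*.0008.LT", "MX*.*.*.0008.LE"]:
    print(e, "->", highest_rc_candidate(e))
```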

One less obvious area for improvement is where you have NOERROR statements that make RC=4 acceptable for every step in the job.

SIMPLJOB.*.STEP1.0004

SIMPLJOB.*.STEP2.0004

If SIMPLJOB only has STEP1 and STEP2 then this is identical to coding SIMPLJOB.*.*.0004 and could therefore be a candidate for replacement with highest return code 4 in the application.

Another possible reduction in entries is to look for consecutive values for EQ which could be replaced with TO ranges.

SIMPLJOB.*.STEP1.0008.EQ,

SIMPLJOB.*.STEP1.0009.EQ,

Could be replaced with

SIMPLJOB.*.STEP1.0008.TO.0009

To replace 2 consecutive return codes the jobname, stepname and procstepname must be the same in each statement.
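That consolidation can also be sketched in Python. This is an illustrative helper, not a product utility: it assumes entries are plain strings in the form used in this article, groups EQ entries by jobname, stepname and procstepname, and collapses runs of consecutive numeric codes into TO ranges. Entries with any other operator are passed through unchanged.

```python
from collections import defaultdict


def merge_consecutive_eq(entries):
    """Collapse consecutive EQ return codes into TO ranges."""
    groups = defaultdict(list)
    passthrough = []
    for entry in entries:
        parts = entry.split(".")
        is_eq = len(parts) == 4 or (len(parts) == 5 and parts[4] == "EQ")
        if is_eq and parts[3].isdigit():
            # Same jobname, stepname and procstepname form one group.
            groups[tuple(parts[:3])].append(int(parts[3]))
        else:
            passthrough.append(entry)

    merged = []
    for key, codes in groups.items():
        prefix = ".".join(key)
        codes = sorted(set(codes))
        start = prev = codes[0]
        for code in codes[1:] + [None]:  # None flushes the final run
            if code is not None and code == prev + 1:
                prev = code
                continue
            if start == prev:
                merged.append(f"{prefix}.{start:04d}")
            else:
                merged.append(f"{prefix}.{start:04d}.TO.{prev:04d}")
            start = prev = code
    return merged + passthrough


print(merge_consecutive_eq([
    "SIMPLJOB.*.STEP1.0008.EQ",
    "SIMPLJOB.*.STEP1.0009.EQ",
]))
# ['SIMPLJOB.*.STEP1.0008.TO.0009']
```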

Consistency checks

As mentioned earlier, the controller will perform consistency checks as it loads your NOERROR statements. This is a bit like a teacher marking your work as “must try harder”.

Again, to keep the NOERROR list small and lean, it is recommended that you review the EQQMLOG at controller startup to remove any duplicate statements or overlaps.

There are a few things it might tell you about.

EQQN068I Overlap

This is issued when, for example, one entry allows 3333 for all steps and another makes the same return code acceptable for STEP1. The more specific statement for STEP1 is not needed and can be removed, as it is already covered by the generic one. Be aware though that some overlaps are not necessarily redundant.

Take one entry that makes RC=4 acceptable on any step called STEPXYZ1 and another that makes RC=4 acceptable in any step in SIMPLJOB. There is an overlap, BUT you cannot remove either of them, as one covers one step in every job whereas the other covers every step in one job. You could, though, potentially replace the job-specific entry with a highest return code in the application.

EQQN069W not consistent

This is issued when, for example, one entry sets 4 as the highest return code for jobs beginning with SI while another sets 16 for the absolute job name SIMPLJOB. Strictly speaking the intent is clear, but the consistency checker sees two rules that cover SIMPLJOB with different values and objects.

The specific rule could easily be replaced by setting the highest acceptable return code to 16 for SIMPLJOB and the conflicting entry removed.

EQQN096I Already present

This one is fairly self-explanatory: you have exactly the same entry more than once and you should remove one of them. Bear in mind that they may not be in the same member as each other.
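Because duplicates can hide in different members, it can help to check all members together before you load them. The Python sketch below is a hypothetical offline check, not a replacement for the controller's own consistency checks. The member names are invented, and treating an entry with an explicit EQ as the same as one without it is an assumption based on EQ being the default.

```python
from collections import defaultdict


def find_duplicates(members):
    """members: dict of member name -> list of NOERROR entry strings.

    Returns a dict of entry -> list of members it appears in,
    for entries that appear more than once.
    """
    seen = defaultdict(list)
    for member, entries in members.items():
        for entry in entries:
            parts = entry.strip().rstrip(",").split(".")
            if len(parts) == 5 and parts[4] == "EQ":
                parts = parts[:4]  # assumption: explicit EQ equals the default
            seen[".".join(parts)].append(member)
    return {entry: found for entry, found in seen.items() if len(found) > 1}


members = {
    "NOERRA": ["SIMPLJOB.*.STEP1.0004"],                          # hypothetical member
    "NOERRB": ["SIMPLJOB.*.STEP1.0004.EQ", "OTHERJOB.*.*.0008"],  # hypothetical member
}
print(find_duplicates(members))
# {'SIMPLJOB.*.STEP1.0004': ['NOERRA', 'NOERRB']}
```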

Hopefully now you have enough information to design your workload’s error handling in a simple and efficient manner.