If you live and work in the United States of America, or to a degree the United Kingdom, you may be blissfully unaware of what an EBCDIC Code Page is and what it does. Even in other countries you may have avoided many problems with them, or perhaps spent too much time fighting with them and surrendered by using Code Page IBM 037.
The only really noticeable difference for English-speaking nations using z/OS is that JES commands begin with the Dollar Sign in the US and the Pound Sterling character in the UK. This is not to be confused with the US Pound, known as Hash in the rest of the world, which is generally the same EBCDIC character across code pages (but not always).
The problem with this particular set of characters is so bad that, to publish this blog, I have had to be careful not to use them in the text itself, as they can also cause issues with web browsers and web page editors. In the text you will see words such as BANG, HASH and AT instead, with the characters themselves appearing only in images.

Today’s Tales from Across the Pond will expand a little on what a code page is, what this means for using Workload Automation and what you can do to get around it.
What is an EBCDIC code page?
Let us start with the million dollar question: what is EBCDIC? It stands for Extended Binary Coded Decimal Interchange Code, and it is the native character set for mainframes. You may already be familiar with ASCII (American Standard Code for Information Interchange), the character set used on Windows and Unix platforms; EBCDIC is pretty much the same thing, but for mainframes.
Early computing pretty much settled on the rule 1 character = 1 byte. As 1 byte is 8 bits, the most different characters you can ever have is 2^8 = 256. This is particularly pertinent on the mainframe, with its tendency towards fixed-length records, a problem Windows and Unix don’t have to the same degree.
For a single English-based language, 256 is more than enough to cope with upper and lower case letters (52), numerical digits (10), punctuation, mathematical symbols and a few control characters. But move a tiny bit East, just as far as Europe, and many European languages have accents, which add many more variants of letters, and even extra punctuation characters. To further complicate matters, each European language uses different accents on different letters.
This is why code pages exist: to allow each country to use the “spare capacity” in the 256 characters to represent characters specific to that country.
Now, you would think that the characters common to many languages would have the same character code in every code page, but that would be way too easy.
In fact, EBCDIC code pages have a set of Invariant characters that are the same across all code pages, and Variant characters which are not. Whilst most variant characters are specific to a language, surprisingly some very common characters sit in the Variant list.
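If you have Python to hand, one quick way to see variant positions for yourself is to decode every byte value under several EBCDIC code pages and keep the bytes where they disagree. This is a minimal sketch, assuming only the EBCDIC codecs that happen to ship with Python (cp037, cp500, cp273 and cp1026); that is nowhere near every IBM code page, so the result is a lower bound on the variant set. Characters are printed by Unicode name rather than as glyphs, for the same reason the text spells them out as words.

```python
import unicodedata


def variant_bytes(home: str, others: list[str]) -> list[int]:
    """Byte values that decode to a different character in any of the other code pages."""
    variant = []
    for b in range(256):
        raw = bytes([b])
        # errors="replace" guards against a codec that leaves a byte undefined.
        home_char = raw.decode(home, errors="replace")
        if any(raw.decode(page, errors="replace") != home_char for page in others):
            variant.append(b)
    return variant


def char_name(ch: str) -> str:
    """Unicode name of ch, falling back to the code point for unnamed control characters."""
    return unicodedata.name(ch, f"U+{ord(ch):04X}")


if __name__ == "__main__":
    for b in variant_bytes("cp037", ["cp500", "cp273", "cp1026"]):
        print(f"{b:02X}  IBM 037: {char_name(bytes([b]).decode('cp037'))}")
```

Adding more code pages to the list can only grow the result, which is exactly why the Variant list is longer than most people expect.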
In the diagram below, the light coloured cells are invariant characters and the darker cells are the variant characters in the IBM 037 code page, which is the one used in the USA.

So, it really is the million dollar question as the Dollar symbol will not necessarily be the same character code in every code page.
In the US the dollar is 5B, but in the UK code page IBM 285 the character at 5B is the Pound Sterling character and the dollar is represented by 4A.
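To make the point concrete, here is a small Python sketch that names the character each of those two bytes shows on either side of the Pond. Python ships a cp037 codec but, as far as its standard library goes, no cp285 codec, so the UK side is a hand-typed fragment (`CP285_FRAGMENT`, a hypothetical name) holding only the two cells quoted above; it is not a full code page.

```python
import unicodedata

# IBM 037 ships with Python. IBM 285 (UK) does not, so this fragment is typed
# in by hand from the two cells quoted above; it is not a full code page.
CP285_FRAGMENT = {0x5B: "\N{POUND SIGN}", 0x4A: "\N{DOLLAR SIGN}"}


def describe(byte: int) -> str:
    """Name the character a byte shows in the US (037) and UK (285) code pages."""
    us = bytes([byte]).decode("cp037")
    uk = CP285_FRAGMENT[byte]
    return f"{byte:02X}: US {unicodedata.name(us)}, UK {unicodedata.name(uk)}"


if __name__ == "__main__":
    for b in (0x5B, 0x4A):
        print(describe(b))
    # 5B: US DOLLAR SIGN, UK POUND SIGN
    # 4A: US CENT SIGN, UK DOLLAR SIGN
```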
You might notice some quite surprising characters that are in the variant list.

So, I’m sure you can see the start of the problem. But it gets worse: even some of the characters listed as Invariant here can “sometimes” still move in rare cases. For example, in the Turkish code page 1026 the double quote, listed as the Invariant character 7F in the table above, is FC.
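You can check where the double quote lives in each code page with Python's built-in EBCDIC codecs. This minimal sketch (the helper name `quote_byte` is hypothetical) prints the byte of the double quote in cp037, cp500 and cp1026, and what byte 7F turns into in each.

```python
import unicodedata

QUOTE = "\N{QUOTATION MARK}"


def quote_byte(page: str) -> int:
    """Byte value of the double quote in the given EBCDIC code page."""
    return QUOTE.encode(page)[0]


if __name__ == "__main__":
    for page in ("cp037", "cp500", "cp1026"):
        at_7f = bytes([0x7F]).decode(page)
        print(f"{page}: double quote is {quote_byte(page):02X}, "
              f"byte 7F is {unicodedata.name(at_7f)}")
```

In the Turkish page, byte 7F is taken by a national letter, so a program that expects a double quote at 7F will see something else entirely.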
This problem has been around for many, many years and affects both EBCDIC and ASCII, as both have the same 1 byte restriction. The ASCII world has been able to alleviate some of the problems by using Unicode, whose variable-length encodings such as UTF-8 suit the less column-structured records of the distributed world. The mainframe, however, is largely focused on record layouts with fixed lengths and column positions, which is great for speed, but not so much for character sets.
For larger Eastern character sets, Double Byte Character Sets (DBCS) were developed, but with two bytes per character they effectively halve the number of characters that fit in many records and fields.
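The effect on fixed-length fields is easy to show with a little arithmetic. This is a minimal Python sketch with hypothetical helper names: it compares a Turkish word in single-byte EBCDIC 1026 against UTF-8, and works out how many double-byte characters fit in a field, assuming that mixed EBCDIC data brackets a DBCS run with one shift-out (0x0E) and one shift-in (0x0F) byte.

```python
def fits(text: str, encoding: str, field_len: int) -> bool:
    """True if text, encoded as given, fits in a fixed-length field of field_len bytes."""
    return len(text.encode(encoding)) <= field_len


def dbcs_capacity(field_len: int, mixed: bool = False) -> int:
    """How many double-byte characters fit in a field of field_len bytes.

    In mixed SBCS/DBCS EBCDIC data a DBCS run is bracketed by shift-out (0x0E)
    and shift-in (0x0F), which cost one byte each.
    """
    usable = field_len - 2 if mixed else field_len
    return max(usable, 0) // 2


WORD = "GÜNEŞ"  # Turkish for "sun": 5 characters

if __name__ == "__main__":
    print(len(WORD.encode("cp1026")))  # 5: one byte per character in EBCDIC 1026
    print(len(WORD.encode("utf-8")))   # 7: U WITH DIAERESIS and S WITH CEDILLA take two bytes each
    print(fits(WORD, "utf-8", 5))      # False
    print(dbcs_capacity(20), dbcs_capacity(20, mixed=True))  # 10 9
```

A 5-byte column that happily holds the word in a single-byte code page is too short for the same word once any multi-byte encoding is involved.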
What does this mean to WAPL?
WAPL is an interpreted language, written on top of another interpreted language (REXX), and it was written and compiled on a system using the IBM 037 code page.
On the whole, most of the characters you need when writing WAPL code live within the safe zone of Invariant characters, so A-Z, a-z, 0-9 and various mathematical symbols are all fine.
But imagine seeing this bit of code in a WAPL presentation or manual:

This looks fairly harmless until you type it into a system using a code page other than IBM 037; then things might get a little confusing.
Most of that code is fine, but BANG and AT are variant characters, and sometimes so is the double quote.
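Before pasting sample code onto another system, it can help to list which of its characters are variant. This is a minimal Python sketch of such a check, not a WAPL feature: `risky_chars` and the sample text are hypothetical, and the "other" code pages are only the EBCDIC codecs bundled with Python, so a clean result does not guarantee safety on every system.

```python
import unicodedata

# A sample of "other" code pages: the EBCDIC codecs that ship with Python.
# Real systems may use pages that are not in this list.
OTHER_PAGES = ("cp500", "cp273", "cp1026")


def risky_chars(source: str, home: str = "cp037") -> list[str]:
    """Characters whose home-page byte decodes to something else in another page."""
    risky = set()
    for ch in set(source):
        raw = ch.encode(home)  # UnicodeEncodeError if ch is not in the home page
        if any(raw.decode(page, errors="replace") != ch for page in OTHER_PAGES):
            risky.add(ch)
    return sorted(risky)


SAMPLE = "Job \N{EXCLAMATION MARK}JOBNAME ended, mail ops\N{COMMERCIAL AT}example.com"

if __name__ == "__main__":
    for ch in risky_chars(SAMPLE):
        print(f"{unicodedata.name(ch)} (IBM 037 byte {ch.encode('cp037')[0]:02X})")
```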
To understand what is going on inside WAPL and other IBM Z programs: when you write simple code to react to a specific character, the program is not looking for a character that looks like BANG, it is looking for a character that has the byte value 5Ax (5A in hex, or 90 in decimal).
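The difference between matching a glyph and matching a byte can be shown with a small Python analogy. This is not WAPL's REXX code; `find_markers` is a hypothetical helper that scans an EBCDIC record for byte 5A, the way a program written under IBM 037 would. The same line is encoded as it would be stored on a US system and on a Turkish one.

```python
BANG_037 = 0x5A  # the byte a program written under IBM 037 compares against


def find_markers(record: bytes, marker: int = BANG_037) -> list[int]:
    """Offsets of the marker byte in an EBCDIC record (hypothetical helper)."""
    return [i for i, b in enumerate(record) if b == marker]


LINE = "SET \N{EXCLAMATION MARK}NAME"

if __name__ == "__main__":
    print(find_markers(LINE.encode("cp037")))   # [4]
    print(find_markers(LINE.encode("cp1026")))  # []: the same glyph is stored as 4F
```

Both lines look identical on their own terminals, but only the first contains the byte the program is actually searching for.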
The problem comes when you either type, or cut and paste, that sample code onto a system that uses a different code page. For example, the BANG character typed into JCL on a system using the Turkish code page 1026 is entered as character 4Fx, which means nothing to WAPL. In fact, it strenuously objects:
How can we get around this?
The problem is that WAPL is currently unaware of all the other code pages in the world and, more importantly, of which one you are using.
However, there are some options built into WAPL that at least attempt to help you get around some of these issues. For example, suppose you type the following in your code:
WAPL will then use YOUR local version of the exclamation mark for the rest of that job.
Equally, you can do the same for some of the other important characters using the OPTIONS statement:
This will again override the internal character code of those important characters with your local version. Note that there is no option to handle Double Quotes, as it has only recently been discovered that the double quote is not truly an Invariant character across all code pages.
Unfortunately, this will not always work: if your local version of BANG happens to have the same character code as some other important character, WAPL will not let you change it using either method.
For example, the BANG character in the Turkish code page 1026 is 4Fx. If you attempt the explicit VARSUB SCAN with BANG in that code page it will fail, because 4Fx is already reserved: it is the vertical bar in code page IBM 037, which is used in expressions for Boolean “OR” or character concatenation. So that substitution cannot happen.
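The rule described above can be modelled in a few lines of Python. This is a hypothetical sketch, not WAPL's implementation: `choose_marker` and `RESERVED` are invented names, and the reserved table holds only the one collision named in the text, whereas WAPL's real list of reserved characters is certainly longer.

```python
from typing import Optional

DEFAULT_MARKER = 0x5A  # BANG as IBM 037 stores it
# Only the collision named in the text; WAPL's real reserved list is not shown here.
RESERVED = {0x4F: "IBM 037 vertical bar (OR / concatenation)"}


def choose_marker(local_char: Optional[str] = None, local_page: str = "cp037") -> int:
    """Pick the byte to treat as the substitution marker (hypothetical model)."""
    if local_char is None:
        return DEFAULT_MARKER
    byte = local_char.encode(local_page)[0]
    if byte in RESERVED:
        raise ValueError(f"{byte:02X} is already the {RESERVED[byte]}")
    return byte


if __name__ == "__main__":
    bang = "\N{EXCLAMATION MARK}"
    print(f"{choose_marker(bang, 'cp037'):02X}")  # 5A
    try:
        choose_marker(bang, "cp1026")
    except ValueError as err:
        print("refused:", err)
```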
For now, the best approach I can offer is to enter, or cut and paste, your code from the example written using code page IBM 037 and perform the following set of edit commands in the ISPF editor, or at least the ones for characters that actually appear in your code:

This will then result in something that looks like this, which will now run without character problems.

If your code page does not have a problem with the Double Quote or Single Quote, you could write an ISPF Edit Macro to speed this along; it might look something like this.

Of course, you only need to worry about code pages if you are using one of these variant characters, of which BANG and AT are the most common, as you need them for variables and email respectively.
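At the byte level, those edit commands amount to a substitution: wherever the local code page stored BANG, AT or the double quote, put back the byte that IBM 037, and therefore WAPL, expects. Here is a minimal Python sketch of that idea outside ISPF, assuming Python has a codec for your local page and covering only those three characters; `to_037_bytes` is a hypothetical name.

```python
# The characters singled out above: BANG, AT and the double quote.
SPECIALS = ("\N{EXCLAMATION MARK}", "\N{COMMERCIAL AT}", "\N{QUOTATION MARK}")


def to_037_bytes(data: bytes, local_page: str) -> bytes:
    """Replace the local byte of each special character with its IBM 037 byte.

    bytes.translate makes a single pass, so a byte that has just been rewritten
    is never picked up again by another entry in the table.
    """
    local = bytes(ch.encode(local_page)[0] for ch in SPECIALS)
    ibm037 = bytes(ch.encode("cp037")[0] for ch in SPECIALS)
    return data.translate(bytes.maketrans(local, ibm037))


if __name__ == "__main__":
    typed = ("SET A \N{QUOTATION MARK}X\N{QUOTATION MARK} "
             "\N{EXCLAMATION MARK}B \N{COMMERCIAL AT}C")
    fixed = to_037_bytes(typed.encode("cp1026"), "cp1026")
    print(fixed.hex(" ").upper())
    print(fixed == typed.encode("cp037"))  # True
```

The single-pass translate is a deliberate choice: applying the substitutions one after another can go wrong if the target byte of one change is the source byte of a later one.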
Longer term we are investigating a solution to make this easier, but until then this is how you can get your jobs working.
For additional information about code pages and their contents, see IBM's Host Code Page Reference documentation.

