Memory Size

The memory size specifies the number of words in memory to be used when sorting. If memory size is not specified, SORT assumes a default value of 12,000 words. SORT calculates the number of records to be kept simultaneously in memory, with a limit of 65,535 records. SORT also calculates the number of buffers to be used for each string, with a limit of 256 buffers per string.

When the SORT utility is invoked from a COBOL85, FORTRAN77, or C program, a memory size of at least 1524 words is required. If the memory requested is less than 1524 words, the required memory size is estimated, based on the size and expected number of records. The maximum memory size estimate provided by the software in such a case is 200,000 words.

The memory size estimate determines stringing and merging vector sizes, which, in turn, control string length and merging. Producing a small number of long strings is desirable for sorting because fewer merge passes are then required. Increasing the memory available to SORT is the most effective way to increase sorting efficiency. However, if the memory size is increased beyond the optimum, SORT might run slower rather than faster. The optimum memory size varies according to file size, which includes the record size, the blocking factor, and the number of records. Optimum memory size is also affected by the number of other jobs in the mix at the time of the sort operation.

SORT attempts to be processor-bound in both the stringing and merging phases (as opposed to processor-bound during the stringing phase and I/O-bound during the merging phase). Unless the memory size specified is relatively small for a given sort, SORT achieves the goal of being processor-bound. Once this condition is achieved, speed improvements can be realized only by methods that reduce processor time. Decreasing the amount of processor time improves system throughput as well as reducing sort timings.

In general, memory allocation proceeds through the following steps:

  1. Memory size provided by the program is reduced by 1,500 words. The reduced size is used for all subsequent calculations. The reduction is a generous estimate of the amount of space required for working storage and the space required for various SORT procedures.

  2. A buffer size is selected for the internal disk or tape files or both. SORT tries to select buffer sizes so that it does not become I/O-bound. For disk sorting, SORT normally allocates two buffers for each string. For tape sorting with n tapes, SORT allocates 1/nth of memory as buffers for each tape.

  3. During execution of the stringing phase, two output buffers are normally allocated; thus, the remainder of memory is left for the sort vector. During execution of the merge phase, virtually all available memory is used for buffers.

In general, Unisys recommends a memory estimate of 4,000 or more words. Memory estimates of 20,000 or more are recommended only on large systems. Information on how to determine memory size for different sorting modes is given later in this section.

When memory allocation is completed, SORT initializes its internal files with the proper computed attributes. SORT takes into account any change made to these file attributes through file equation. However, these file attributes override the memory size specification; therefore, the actual memory size can be less than or greater than the memory size specified to SORT. Refer to “SORT Files” in this section for more information on file equation of SORT internal files.

Determining Memory Size for Disk Sorting

A SORT that specifies disk requires memory to be used for the actual sorting. If memory size is not specified, a default of 12,000 words is assumed.

Perform the following procedure to determine memory size for disk sorting:

  1. Convert the sort record size to the number of words required to contain a single record. For example, a 1-character record requires 1 word, and fifteen 6-bit characters require 2 words, while fifteen 8-bit characters require 3 words.

  2. When the SORT utility is invoked from a COBOL85, FORTRAN77, or C program, determine the number of words required to contain all of the sort keys in a single record. Add this value to the record size obtained in step 1.

  3. Add 3 additional words to the record size—to be used only by SORT.

  4. Multiply the number obtained in step 3 by the desired number of records according to the following preferences:

    • For fast sorting, memory size should provide enough space to contain at least 2,000 records.

    • For reasonably fast sorting, memory size should provide enough space to contain at least 1,200 records.

    • For adequate sorting, memory size should provide enough space for 600 records, as a general rule.

    • For faster sorting when a large amount of memory is available, memory size should provide enough space to contain 1/16th of the number of records in the file, or 400,000 words, whichever is larger.

  5. After memory has been computed for the number of records times the record size, add 1,500 words to provide for sort working space.
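The disk-sorting steps above can be sketched as a small calculator. This is a Python illustration, not part of SORT; it assumes the 48-bit word implied by the conversion examples in step 1, so a word holds eight 6-bit or six 8-bit characters.

```python
import math

def disk_sort_memory_words(record_chars, bits_per_char=8, key_words=0,
                           records_in_memory=2000):
    """Estimate the memory size (in words) for a disk sort.

    key_words is the space needed for the sort keys in one record; it
    applies only when SORT is invoked from a COBOL85, FORTRAN77, or C
    program. records_in_memory follows the preferences in step 4
    (2,000 for fast, 1,200 for reasonably fast, 600 for adequate).
    """
    chars_per_word = 48 // bits_per_char                     # 8 or 6 characters per word
    record_words = math.ceil(record_chars / chars_per_word)  # step 1: record size in words
    record_words += key_words                                # step 2: add key words
    record_words += 3                                        # step 3: words used only by SORT
    return record_words * records_in_memory + 1500           # steps 4 and 5

# Fast sort of 120-byte records: (120/6 + 3) * 2000 + 1500
print(disk_sort_memory_words(120))  # 47500
```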

Determining Memory Size for Tape Sorting

Use the following to determine memory size for tape sorting:

  1. Convert the record size to words.

  2. When the SORT utility is invoked from a COBOL85, FORTRAN77, or C program, add the number of words required to contain the sort keys in a single record to the record size obtained in step 1.

  3. Add 3 additional words—to be used only by SORT.

  4. Multiply the number obtained in step 3 by the number of tapes specified in the SORT statement.

  5. Multiply the number obtained in step 4 by one of the following:

    • 300 for fast sorting

    • 200 for reasonably fast sorting

    • 100 for adequate sorting

  6. Add 1,500 words to provide for sort working space.
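The tape-sorting steps can be sketched the same way. Again, this is an illustrative Python helper, not part of SORT; the 48-bit word size is inferred from the conversion examples in the disk-sorting procedure.

```python
import math

SPEED_FACTORS = {"fast": 300, "reasonably fast": 200, "adequate": 100}

def tape_sort_memory_words(record_chars, tapes, speed="fast",
                           bits_per_char=8, key_words=0):
    """Estimate the memory size (in words) for a tape sort."""
    chars_per_word = 48 // bits_per_char
    record_words = math.ceil(record_chars / chars_per_word)  # step 1: record size in words
    record_words += key_words                                # step 2: add key words
    record_words += 3                                        # step 3: words used only by SORT
    per_tape = record_words * tapes                          # step 4: multiply by tape count
    return per_tape * SPEED_FACTORS[speed] + 1500            # steps 5 and 6

# Fast sort of 120-byte records on 4 tapes: (120/6 + 3) * 4 * 300 + 1500
print(tape_sort_memory_words(120, tapes=4))  # 29100
```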

Tape sorts are similar to disk sorts in that providing more memory generally yields faster sorts. However, the point of diminishing returns is more data-dependent for tape sorting. In general, using more sort work tapes rather than providing additional memory is more efficient. Providing more memory and more tapes is ideal when speed is the most important factor.

Determining Memory Size for Memory Sorting

Use the following for memory-only sorting:

  1. Convert the record size to words.

  2. When the SORT utility is invoked from a COBOL85, FORTRAN77, or C program, add the number of words required to contain the sort keys in a single record to the record size obtained in step 1.

  3. Add three additional words—to be used only by SORT.

  4. Multiply the number obtained in step 3 by the number of records to be sorted. Note that the maximum number of records that can be sorted in a memory-only sort is 65,535. If the number of records exceeds that value at execution time, a disk sort is performed.

If you specify a value less than 1524 words, the system calculates a memory size estimate for you by multiplying the record size, as computed in steps 1 through 3 above, by the number of records to be sorted. The memory size estimate provided by the system cannot exceed 200,000 words.
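The memory-only steps, including the 65,535-record limit, can be sketched as follows (a Python illustration under the same 48-bit word assumption as the earlier procedures):

```python
import math

MAX_CORE_RECORDS = 65535  # above this limit, SORT performs a disk sort instead

def core_sort_memory_words(record_chars, record_count,
                           bits_per_char=8, key_words=0):
    """Estimate the memory size (in words) for a memory-only sort.

    Returns None when the record count exceeds the memory-only limit,
    mirroring SORT's fallback to a disk sort.
    """
    if record_count > MAX_CORE_RECORDS:
        return None
    chars_per_word = 48 // bits_per_char
    record_words = math.ceil(record_chars / chars_per_word)  # step 1
    record_words += key_words + 3                            # steps 2 and 3
    return record_words * record_count                       # step 4

# 120-byte records, 50,000 of them: (120/6 + 3) * 50000
print(core_sort_memory_words(120, 50000))  # 1150000
```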

Disk Sorts Involving Files with Exceptionally Large Record Counts

Memory-size-estimate calculations for a fast disk sort provide acceptable performance levels across a wide range of sorting environments. Sorting disk files that contain an especially large number of records requires additional attention. The appropriate method for dealing with exceptionally large record counts includes providing a sort with larger amounts of memory than normal.

Before using larger amounts of memory, evaluate your particular SORT requirement with regard to the following alternatives:

  • SORT Attributes

    Ensure that the memory value of the current sort is at least the amount recommended for a fast sort (see “Determining Memory Size for Disk Sorting”). Verify that input and output file blocking is consistent with the specified guidelines (see “Input and Output Options”).

  • Application Attributes

    Determine if a FLAT file has grown to a point where it should be considered for inclusion in a database (such as a DMS dataset with a set that provides the required sort order) so that it can be processed with a FIND FIRST ... FIND NEXT mechanism. If the desired data is already a DMS dataset, consider the addition of a set that provides the desired sort order. Also, consider the increase in disk storage that will be required.

    Determine if it is realistic to alter the application to create a number of smaller files that can be sorted individually (possibly asynchronously), then merged to create a large file. If record counts can be kept under 65K, you can sort them quickly in core.

SORT performance depends on many environmental factors. If memory is the primary performance restriction, giving more of it to an extremely large sort can improve performance by enabling SORT to support larger block sizes for its work file (DISKF), as well as larger internal tables. Larger block sizes enable more work to be done before going to disk, but significant amounts of memory are generally needed to trigger a block size increase. Doubling an estimate for a fast sort is likely to have minimal or no effect.

Internal computations used by the sort to derive optimal block sizes are complex and do not lend themselves to a simple formula or table. The most straightforward way to determine a suitable memory estimate is simply to pick a large initial amount and test it with the special-case sort. If additional performance is needed, increase the estimate and continue tuning until you either reach an acceptable performance level or see diminishing returns (zero, negative, or marginal improvement). A general guideline for the large initial amount is enough memory (in words) to hold 65K records. That value approximates the maximum size of a string vector (a collection of sequenced records).

The following example illustrates the differences in memory amounts for a fast SORT and for an exceptionally large one:

Example

File record size: 120 bytes
Memory estimate computed for a fast sort:
((120)/6 + 3) * 2000 + 1500 = 47,500 words
Memory estimate computed using this technique:
((120)/6) * 65000 = 1,300,000 words
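Both figures in the example follow directly from the formulas. The arithmetic is shown here in Python for illustration; the divisor 6 reflects six 8-bit characters per 48-bit word.

```python
# Fast-sort estimate: (words per record + 3 overhead words) * 2,000 records + 1,500
fast = (120 // 6 + 3) * 2000 + 1500
print(fast)   # 47500

# Large-sort estimate: words per record * enough records to fill a string vector
large = (120 // 6) * 65000
print(large)  # 1300000
```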

The MCP has extensive checks and balances for SORT parameters. The following are possible risks:

  • Any memory that you provide to the sort becomes what is called "SAVE" memory (memory that cannot be overlaid). This technique is based on providing significantly more memory, possibly for longer periods. Misuse of this technique might cause overall system performance to become unpredictable in an environment where available memory is scarce.

  • Increased work file block sizes can result in larger AREA sizes and cause a "MORE SECTORS" RSVP message. As long as the RSVP is unanswered, "SAVE" memory is tied up. Larger block sizes might result in unused space and could exhaust disk space.