MANAGING AND DELETING PERSISTENT DATA SETS WITHIN IBM INFOSPHERE DATASTAGE

Data sets can take up a lot of disk space. This post describes how to obtain information about data sets and how to delete them.
Data sets can be managed using the Data Set Management tool, invoked from the Tools > Data Set Management menu option within DataStage Designer.
Alternatively, the ‘orchadmin’ command line program can be used to perform the same tasks.
The files which store the actual data persist in the locations identified as resource disks in the configuration files. These files are named according to the pattern below:
descriptor.user.host.ssss.pppp.nnnn.pid.time.index.random
Name        Significance
descriptor  Name of the data set descriptor file
user        Your user name
host        Host name from which you invoked the job that created the data set
ssss        4-digit segment identifier (0000-9999)
pppp        4-digit partition identifier (0000-9999)
nnnn        4-digit file identifier (0000-9999) within the partition
pid         Process ID of the job on the host from which you invoked the job that created the data set
time        8-digit hexadecimal time stamp in seconds
index       4-digit number incremented for each file
random      8 hexadecimal digits containing a random number to ensure unique file names

For example, suppose that your configuration file contains the following node definitions:

{
    node node0
    {
        fastname "host1"
        pools ""
        resource disk "/opt/IBM" {pools ""}
        resource scratchdisk "/opt/scratch" {pools ""}
    }
    node node1
    {
        fastname "host1"
        pools ""
        resource disk "/opt/IBM" {pools ""}
        resource scratchdisk "/opt/scratch" {pools ""}
    }
}

A data set named dataset1.ds created by a job using this configuration file will contain data in two partitions, one for each processing node declared in the configuration file. Because each processing node contains only a single disk specification, each partition of data would be stored in a single file on each processing node. Following the naming convention shown above, the data file for partition 0 would be located on the host1 machine, in the /opt/IBM filesystem, and the file would be named:

/opt/IBM/dataset1.ds.user1.host1.0000.0000.0000.1fa98.b61345a4.0000.88dc5aef

The data file for partition 1 data would be similarly named:

/opt/IBM/dataset1.ds.user1.host1.0000.0001.0000.1fa98.b61345a4.0001.8b3cb144
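
To make the naming convention concrete, here is a minimal sketch that splits a segment file name into its fields. The function name parse_segment_name is made up for this post. It assumes the last nine dot-separated fields are user.host.ssss.pppp.nnnn.pid.time.index.random, which only holds when the host name itself contains no dots; everything before those nine fields is treated as the descriptor name.

```shell
#!/bin/sh
# parse_segment_name (hypothetical helper): print the fields of a segment
# file name, one "field=value" pair per line.
# Assumption: the host name contains no dots, so the last nine
# dot-separated fields are user.host.ssss.pppp.nnnn.pid.time.index.random.
parse_segment_name() {
    basename "$1" | awk -F. '
        NF < 10 {
            # Too few fields to be a segment file name.
            print "not a segment file name: " $0 | "cat 1>&2"
            exit 1
        }
        {
            n = NF - 9                    # fields that make up the descriptor name
            descriptor = $1
            for (i = 2; i <= n; i++) descriptor = descriptor "." $i
            print "descriptor=" descriptor
            print "user="       $(n + 1)
            print "host="       $(n + 2)
            print "segment="    $(n + 3)
            print "partition="  $(n + 4)
            print "file="       $(n + 5)
            print "pid="        $(n + 6)
            print "time="       $(n + 7)
            print "index="      $(n + 8)
            print "random="     $(n + 9)
        }'
}

# Usage: split the partition 1 file from the example.
parse_segment_name /opt/IBM/dataset1.ds.user1.host1.0000.0001.0000.1fa98.b61345a4.0001.8b3cb144
```

For the partition 1 file this reports descriptor=dataset1.ds and partition=0001. Counting fields from the right is a deliberate choice, because descriptor names such as dataset1.ds contain dots themselves.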

It is important to understand that the file referenced in the job, called dataset1.ds in our example, does not contain any actual data. It is a data set descriptor file, and it contains information about how the data set is constructed. In order for DataStage jobs to access the data, both the descriptor and the actual segment files must exist.
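
Since the segment files live in the resource disk locations, it can help to list those locations straight from a configuration file. The sketch below is one way to do that. The function name list_resource_disks is made up for this post, and it assumes each entry sits on its own line in the form resource disk "path" {pools ""}, with straight double quotes and no spaces in the path.

```shell
#!/bin/sh
# list_resource_disks (hypothetical helper): print each distinct
# "resource disk" directory named in a configuration file.
# Assumption: one entry per line, straight double quotes, no spaces in paths.
# "resource scratchdisk" lines are skipped because their second word differs.
list_resource_disks() {
    awk '$1 == "resource" && $2 == "disk" {
             gsub(/"/, "", $3)   # strip the surrounding quotes from the path
             print $3
         }' "$1" | sort -u
}

# Usage, assuming APT_CONFIG_FILE points at your configuration file:
#   list_resource_disks "$APT_CONFIG_FILE"
```

Run against the two-node configuration shown earlier, this would print /opt/IBM once, because both nodes share the same resource disk.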

Cleaning up Data Sets:

A good approach to managing data sets is to identify the data sets that are no longer required and to use the Data Set Management tool to delete them. If you have the jobs that reference the data sets, you can open each of the data set descriptor files using the Data Set Management tool and then view and delete the data set. If you do not have the jobs, another possible method is to look in the resource disk locations for segment files with very old modification dates. Once you have identified the segment files, you can determine the name of the data set descriptor file from the segment file name.
/opt/IBM/dataset1.ds.user1.host1.0000.0000.0000.1fa98.b61345a4.0000.88dc5aef

In the example segment file shown above, the leading “dataset1.ds” is the file name of the data set descriptor. You can then locate this file on your system with the find command.

find /my_projects/datasets/ -name "dataset1.ds" -print

Once you have located the descriptor file, you can then use the Data Set Management tool to view and delete the data set. If someone has already deleted the descriptor file, then the segments have been orphaned. There is no utility or function to recreate the descriptor file. In this situation, you can safely delete all the segment files whose names begin with “dataset1.ds”.
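
The search for old segment files and their descriptors can be sketched in shell as follows. Both function names are made up for this post, and the age threshold and search directory are values you supply. The sketch only reports; it deletes nothing. It assumes host names contain no dots, so the descriptor name is everything before the last nine dot-separated fields. It also assumes descriptor names contain no shell wildcard characters, because find -name treats its argument as a pattern.

```shell
#!/bin/sh
# old_descriptor_names (hypothetical helper): list the descriptor names
# embedded in segment files older than a given age.
#   $1 = resource disk directory, $2 = minimum age in days
old_descriptor_names() {
    find "$1" -type f -mtime +"$2" 2>/dev/null |
    while IFS= read -r path; do
        basename "$path"
    done |
    awk -F. 'NF >= 10 {
        # Everything before the last nine fields is the descriptor name.
        name = $1
        for (i = 2; i <= NF - 9; i++) name = name "." $i
        print name
    }' | sort -u
}

# report_orphan_candidates (hypothetical helper): report the names from
# above that have no descriptor file under a search root.
#   $3 = directory tree where descriptor files are kept
report_orphan_candidates() {
    old_descriptor_names "$1" "$2" |
    while IFS= read -r name; do
        if [ -z "$(find "$3" -type f -name "$name" -print 2>/dev/null)" ]; then
            echo "no descriptor found for: $name"
        fi
    done
}

# Usage, with the directories from the examples and a 180-day threshold:
#   report_orphan_candidates /opt/IBM 180 /my_projects/datasets
```

A name reported here is only a candidate. The descriptor may live outside the directory you searched, so confirm that before removing any segment files.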

Cleaning up Data Sets from the command line:

It is also possible to use the orchadmin executable program to delete data sets. This program is located in $APT_ORCHHOME/bin.

To delete a data set using orchadmin, the environment has to be set up properly, and the descriptor file has to exist. Follow these steps to delete a data set, replacing the placeholders in angle brackets with your own paths.

$ cd $DSHOME
$ . ./dsenv
$ APT_ORCHHOME=$DSHOME/../PXEngine; export APT_ORCHHOME
$ LD_LIBRARY_PATH=$APT_ORCHHOME/lib:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH
$ APT_CONFIG_FILE=<path to configuration file>; export APT_CONFIG_FILE
$ PATH=$APT_ORCHHOME/bin:$PATH; export PATH
$ $DSHOME/../PXEngine/bin/orchadmin delete <path to data set descriptor file>

NOTE: Adjust the steps accordingly for your platform, for example use LIBPATH instead of LD_LIBRARY_PATH on the AIX platform.
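
If you delete data sets from the command line often, a small wrapper can catch the two most common mistakes before orchadmin runs: a missing configuration file and a missing descriptor. This is only a sketch. The function name delete_data_set and the ORCHADMIN override variable are made up for this post and are not part of DataStage. It assumes the environment has already been set up as in the steps above.

```shell
#!/bin/sh
# delete_data_set (hypothetical helper): run "orchadmin delete" on one
# descriptor file after two sanity checks.
# ORCHADMIN is an invention of this sketch. Set ORCHADMIN=echo for a dry
# run that prints the command arguments instead of deleting anything.
delete_data_set() {
    descriptor=$1
    if [ -z "${APT_CONFIG_FILE:-}" ] || [ ! -r "$APT_CONFIG_FILE" ]; then
        echo "APT_CONFIG_FILE is not set or not readable" >&2
        return 1
    fi
    if [ ! -f "$descriptor" ]; then
        echo "descriptor file not found: $descriptor" >&2
        return 1
    fi
    "${ORCHADMIN:-$APT_ORCHHOME/bin/orchadmin}" delete "$descriptor"
}

# Usage (dry run first, then for real):
#   ORCHADMIN=echo delete_data_set /my_projects/datasets/dataset1.ds
#   delete_data_set /my_projects/datasets/dataset1.ds
```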
Reference for this article: IBM Technote
