We know we can give hard coded sheet name or parameterized sheet name while reading excel data from unstructured data stage.
But the problem comes when you want to read data from excel in specific sheet number such as first sheet or second sheet and so on. If you know sheet name, you can directly hard code sheet name there. But if sheet name is changing everytime, there is no direct option to read data from specific sheet.
I will show you how to read specific sheet name using a parallel job and you can pass this input to next parallel job to read it.
Below parallel job will demonstratie how to read sheet name.
Of course there is no other option to read excel sheet unless we use unstructured data stage. I used this stage only to read excel workbook.
Step 1: Click on ‘Configure’ in unstructured data stage
Step 2: Select ‘Sheet name’ from the Import properties as shown below.
Step 3: Do not select any other columns in the Import, we are not reading entire excel sheet here. Keep all the options exactly like below. Specify the range according to your excel sheet metadata.
Step 4: ‘SheetName’ column will be there in Columns tab, it comes automatically.
Step 5: Use constraint in transformer stage as like below.
@INROWNUM=1
If you are using more than one node to run this job, please run the job in sequential mode. Otherwise you may not see accurate result for @INROWNUM system variable.
That’s it!
Just write this FirstSheetName into one flat file and read that using execute command stage in sequence. Pass that value to your parallel job as a parameter.
You can parameterize the sheet name in ‘Range expression’ field in unstructured data stage.
Thanks for reading! Let me now your thoughts on this by commenting below.