Ordered Rows in Kettle

A question came up the other day on the Pentaho forums about how to get Kettle to process “all the rows” at one step before execution begins on the following steps. Sven suggested using “execute once for every row”, which I think is probably, overall, a cleaner way to accomplish a multistep process. However, it is possible to do this in Kettle now.

The solution is to add a “Blocking Step” at each point in your transformation where all rows must have finished the previous step before processing continues.

Consider the following example:

[Image 200806251534: screenshot of the example transformation]

The step “block1” does not pass rows to Step2 until all rows have finished at Step1. This ensures that every record has completed processing at Step1 before Step2 processes any of them. The example transformation writes each row to the debug log, and the log shows that the rows come out in the correct order:

2008/06/25 15:25:04 - step1.0 - Step1:1
2008/06/25 15:25:04 - step1.0 - Step1:2
2008/06/25 15:25:04 - step1.0 - Step1:3
2008/06/25 15:25:04 - step1.0 - Step1:4
2008/06/25 15:25:04 - step1.0 - Step1:5
...
2008/06/25 15:25:05 - step1.0 - Step1:499
2008/06/25 15:25:05 - step1.0 - Step1:500
...
2008/06/25 15:25:05 - step2.0 - Step2:1
2008/06/25 15:25:05 - step2.0 - Step2:2
2008/06/25 15:25:05 - step2.0 - Step2:3
2008/06/25 15:25:05 - step2.0 - Step2:4
2008/06/25 15:25:05 - step2.0 - Step2:5
...
2008/06/25 15:25:05 - step2.0 - Step2:499
2008/06/25 15:25:05 - step2.0 - Step2:500
...
2008/06/25 15:25:05 - step3.0 - Step3:1
2008/06/25 15:25:05 - step3.0 - Step3:2
2008/06/25 15:25:05 - step3.0 - Step3:3
2008/06/25 15:25:05 - step3.0 - Step3:4
2008/06/25 15:25:05 - step3.0 - Step3:5
2008/06/25 15:25:05 - step3.0 - Step3:6
2008/06/25 15:25:05 - step3.0 - Step3:7
2008/06/25 15:25:05 - step4.0 - Step4:1
2008/06/25 15:25:05 - step3.0 - Step3:8
2008/06/25 15:25:05 - step4.0 - Step4:2
2008/06/25 15:25:05 - step3.0 - Step3:9
2008/06/25 15:25:05 - step4.0 - Step4:3
2008/06/25 15:25:05 - step4.0 - Step4:4
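
To make the ordering in the log above easier to reason about, here is a small toy model in Python. It is not Kettle code and uses no Kettle API: the names `logging_step`, `blocking_step` and `run` are made up for illustration, and single-threaded generators stand in for Kettle's steps. It only models the case seen in the log, where every row is forwarded once the block releases.

```python
from typing import Iterable, Iterator, List


def logging_step(name: str, rows: Iterable[int], log: List[str]) -> Iterator[int]:
    """Hypothetical stand-in for a step that logs each row and passes it on."""
    for row in rows:
        log.append(f"{name}:{row}")
        yield row


def blocking_step(rows: Iterable[int]) -> Iterator[int]:
    """Hold every incoming row until the upstream step is finished."""
    buffered = list(rows)  # drains the upstream step completely first
    yield from buffered    # only now do rows reach the next step


def run(use_block: bool, n: int = 3) -> List[str]:
    """Run Step1 -> (optional block) -> Step2 and return the log lines."""
    log: List[str] = []
    stream = logging_step("Step1", range(1, n + 1), log)
    if use_block:
        stream = blocking_step(stream)
    stream = logging_step("Step2", stream, log)
    for _ in stream:  # pull all rows through the pipeline
        pass
    return log


if __name__ == "__main__":
    print(run(use_block=False))  # Step1 and Step2 lines interleave
    print(run(use_block=True))   # all Step1 lines come before any Step2 line
```

Without the block, each row flows straight through, so Step1 and Step2 log lines alternate. With the block in between, Step2 sees nothing until Step1 has handled its last row, which is the same pattern as Step1 and Step2 in the log.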

Example here: ordered_rows_example.ktr
