gpu: avoid mapping final non-permutable bands to the device
If the outer node of the schedule tree is a sequence and the final
children of this sequence do not have any permutable bands,
then there is no point in including these final children
in the part that is mapped to the device.
These final children can be run on the CPU instead.
This extends earlier support for separating out independent and
initial non-permutable bands in ppcg-0.03-191-g6fa73710
(gpu: avoid mapping independent non-permutable bands to the device,
Thu Oct 24 13:15:10 2013 +0200) and ppcg-0.04-49-g201e18aa
(gpu: avoid mapping initial non-permutable bands to the device,
Tue Dec 15 12:13:30 2015 +0100).
Signed-off-by: Sven Verdoolaege <sven.verdoolaege@gmail.com>