Modern neural networks are growing not only in size and complexity but also in inference time. One of the most effective compression techniques, channel pruning, combats this trend by removing channels from convolutional weights to reduce resource consumption. However, removing channels is non-trivial for multi-branch segments of a model, where it can introduce extra memory copies at inference time. These copies increase latency, so much so that the pruned model is even slower than the original, unpruned model. As a workaround, existing pruning works constrain certain channels to be pruned together. This fully eliminates inference-time memory copies, but as we show, these constraints significantly impair accuracy. To solve both challenges, our insight is to enable unconstrained pruning by reordering channels to minimize memory copies. Using this insight, we design a generic algorithm, UCPE, to prune models with any pruning pattern. Critically, by removing constraints from existing pruning heuristics, we improve ImageNet top-1 accuracy for post-training pruning by 2.1 points on average, benefiting pruned DenseNet (+16.9), EfficientNetV2 (+7.9), and ResNet (+6.2). Furthermore, our UCPE algorithm reduces latency by up to 52.8% compared to naive unconstrained pruning, nearly fully eliminating memory copies at inference time.
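The reordering idea can be illustrated with a toy sketch. It is not the UCPE algorithm itself, only a two-consumer special case under assumptions of our own: a consumer whose kept channels occupy one contiguous range of the producer's output can read them as a zero-copy slice, while any other pattern requires a gather (a memory copy). The function names and the channel sets below are hypothetical.

```python
def is_contiguous(positions):
    """True if sorted positions form one unbroken run, e.g. [2, 3, 4]."""
    return all(b - a == 1 for a, b in zip(positions, positions[1:]))


def reorder_for_two_consumers(keep_a, keep_b):
    """Toy layout of the producer's output channels for two consumers.

    Order: channels only A keeps, then shared channels, then channels
    only B keeps. Channels that neither consumer keeps are dropped.
    """
    a, b = set(keep_a), set(keep_b)
    return sorted(a - b) + sorted(a & b) + sorted(b - a)


def positions_in(order, keep):
    """Sorted positions of a consumer's kept channels in a given layout."""
    return sorted(order.index(c) for c in keep)


def count_copies(order, consumers):
    """Assumed cost model: one copy per consumer with a non-contiguous read."""
    return sum(not is_contiguous(positions_in(order, k)) for k in consumers)


# Hypothetical pruning result: a 6-channel tensor feeds two branches,
# and each branch independently decides which input channels to keep.
keep_a = [0, 3, 4]
keep_b = [1, 3, 5]

# Naive export: drop unused channels (here, channel 2) but keep the order.
naive_order = sorted(set(keep_a) | set(keep_b))
naive_copies = count_copies(naive_order, [keep_a, keep_b])

# Reordered export: permute the producer's output channels first.
order = reorder_for_two_consumers(keep_a, keep_b)
reordered_copies = count_copies(order, [keep_a, keep_b])

print("naive layout:    ", naive_order, "copies:", naive_copies)
print("reordered layout:", order, "copies:", reordered_copies)
```

For the result to stay numerically unchanged, each consumer's input channels would have to be permuted to match the new layout. With three or more consumers a layout that makes every read contiguous may not exist, so a general method can only minimize copies rather than always remove them.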