Unconstrained Channel Pruning – Apple Machine Learning Research

Modern neural networks are growing not only in size and complexity but also in inference time. One of the most effective compression methods, channel pruning, combats this trend by removing channels from convolutional weights to reduce resource consumption. However, removing channels is non-trivial for multi-branch segments of a model, as it can introduce extra memory copies at inference time. These copies increase latency, so much so that the pruned model can even be slower than the original, unpruned model. As a workaround, existing pruning works constrain certain channels to be pruned together. This fully eliminates inference-time memory copies, but as we show, these constraints significantly impair accuracy. To solve both challenges, our insight is to enable unconstrained pruning by reordering channels to minimize memory copies. Using this insight, we design a generic algorithm, UCPE, to prune models with any pruning pattern. Critically, by removing constraints from existing pruning heuristics, we improve ImageNet top-1 accuracy for post-training pruning by 2.1 points on average, benefiting pruned DenseNet (+16.9), EfficientNetV2 (+7.9), and ResNet (+6.2). Furthermore, our UCPE algorithm reduces latency by up to 52.8% compared to naive unconstrained pruning, nearly fully eliminating memory copies at inference time.
