rocmPackages.composable_kernel: make more parts big-parallel

All parts except pool build too slowly without the higher core limit available on big-parallel builders.

Infra channel discussion: https://matrix.to/#/!RROtHmAaQIkiJzJZZE:nixos.org/

Luna Nova 934f488c 0b4a36dc

+8
pkgs/development/rocm-modules/6/composable_kernel/default.nix
@@ -36,6 +36,7 @@
         "device_grouped_conv2d_fwd_instance"
         "device_grouped_conv2d_fwd_dynamic_op_instance"
       ];
+      requiredSystemFeatures = [ "big-parallel" ];
     };
     grouped_conv_bwd_3d = {
       targets = [
@@ -46,6 +47,7 @@
         "device_grouped_conv3d_bwd_weight_bilinear_instance"
         "device_grouped_conv3d_bwd_weight_scale_instance"
       ];
+      requiredSystemFeatures = [ "big-parallel" ];
     };
     grouped_conv_fwd_3d = {
       targets = [
@@ -60,6 +62,7 @@
         "device_grouped_conv3d_fwd_scaleadd_ab_instance"
         "device_grouped_conv3d_fwd_scaleadd_scaleadd_relu_instance"
       ];
+      requiredSystemFeatures = [ "big-parallel" ];
     };
     batched_gemm = {
       targets = [
@@ -77,6 +80,7 @@
         "device_grouped_gemm_fixed_nk_multi_abd_instance"
         "device_grouped_gemm_tile_loop_instance"
       ];
+      requiredSystemFeatures = [ "big-parallel" ];
     };
     gemm_universal = {
       targets = [
@@ -108,6 +112,7 @@
         "device_gemm_splitk_instance"
         "device_gemm_streamk_instance"
       ];
+      requiredSystemFeatures = [ "big-parallel" ];
     };
     conv = {
       targets = [
@@ -118,6 +123,7 @@
         "device_conv2d_fwd_bias_relu_add_instance"
         "device_conv3d_bwd_data_instance"
       ];
+      requiredSystemFeatures = [ "big-parallel" ];
     };
     pool = {
       targets = [
@@ -139,6 +145,7 @@
         "device_normalization_bwd_gamma_beta_instance"
         "device_normalization_fwd_instance"
       ];
+      requiredSystemFeatures = [ "big-parallel" ];
     };
     other2 = {
       targets = [
@@ -150,6 +157,7 @@
         "device_softmax_instance"
         "device_transpose_instance"
       ];
+      requiredSystemFeatures = [ "big-parallel" ];
     };
   };
   tensorOpBuilder =
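
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of how a per-part `requiredSystemFeatures` attribute like the one added above might be read, with parts that omit it (such as `pool`) falling back to an empty list. How `tensorOpBuilder` actually consumes the attribute is not shown in this diff, so the `featuresFor` helper, the empty-list default, and the `example_pool_target` name are assumptions made purely for illustration.

```nix
# Hypothetical sketch, not the actual composable_kernel/default.nix code.
let
  parts = {
    conv = {
      targets = [ "device_conv3d_bwd_data_instance" ];
      requiredSystemFeatures = [ "big-parallel" ];
    };
    pool = {
      # Placeholder target name; the real pool targets are not shown in the diff.
      targets = [ "example_pool_target" ];
      # No requiredSystemFeatures here: pool stays buildable on any builder.
    };
  };

  # Assumed helper: read the attribute if present, otherwise require nothing.
  featuresFor = part: part.requiredSystemFeatures or [ ];
in
# Map every part name to the system features its derivation would request.
builtins.mapAttrs (_name: part: featuresFor part) parts
```

Saved to a file, this can be evaluated with `nix-instantiate --eval --strict` on that file. In a real derivation the resulting list would be passed as the `requiredSystemFeatures` derivation attribute, which restricts the build to machines that advertise those features.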