Deploy021 CUDA/Triton: Write a LayerNorm/RMSNorm
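Before writing the kernel, it can help to pin down the per-row math that a CUDA or Triton implementation has to reproduce. The following is a minimal plain-Python sketch of that reference, not a GPU kernel. The function names `layer_norm` and `rms_norm`, the parameter names `gamma` and `beta`, and the default `eps=1e-5` are assumptions made for illustration. It assumes normalization over the last dimension (one row at a time) and the biased variance (divide by N).

```python
import math


def layer_norm(row, gamma, beta, eps=1e-5):
    """Reference LayerNorm over one row: (x - mean) / sqrt(var + eps) * gamma + beta."""
    n = len(row)
    mean = sum(row) / n
    # Biased variance (divide by n), assumed here as the normalization statistic.
    var = sum((x - mean) ** 2 for x in row) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std * g + b for x, g, b in zip(row, gamma, beta)]


def rms_norm(row, gamma, eps=1e-5):
    """Reference RMSNorm over one row: x / sqrt(mean(x^2) + eps) * gamma."""
    n = len(row)
    # No mean subtraction and no bias term: only the root mean square is used.
    mean_sq = sum(x * x for x in row) / n
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms * g for x, g in zip(row, gamma)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    ones = [1.0] * len(x)
    zeros = [0.0] * len(x)
    print(layer_norm(x, ones, zeros))
    print(rms_norm(x, ones))
```

In a GPU kernel, each row would typically map to one block (CUDA) or one program instance (Triton), with the two sums above becoming parallel reductions. LayerNorm needs two statistics (mean and variance), while RMSNorm needs only the mean of squares, which is why it is cheaper. A reference like this is mainly useful for checking kernel output within a floating-point tolerance.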