Huawei-led team claims it post-trained DeepSeek's 1.6-trillion-parameter model — 1,000 Ascend 910C chips used in training
… Post-training is essentially the “tuning” stage that follows the much larger pre-training phase. Pre-training builds a model's core capabilities by working through enormous text corpora, and DeepSeek's documentation puts V4-Pro's pre-training corpus at more than 32 trillion tokens. …