• @[email protected]
    24 hours ago

    1.2T param, 78B active, hybrid MoE

    That’s enormous, very much not local, heh.

Here’s the actual article translation (which seems right compared to other translations):

    Translation

    DeepSeek R2: Unit Cost Drops 97.3%, Imminent Release + Core Specifications

    Author: Chasing Trends Observer
    Veteran Crypto Investor Watching from Afar
    2025-04-25 12:06:16 Sichuan

    Three Core Technological Breakthroughs of DeepSeek R2:

1. Architectural Innovation
  Adopts a proprietary Hybrid MoE 3.0 architecture with 1.2 trillion total parameters and dynamic activation (actual computation uses about 78 billion parameters per token).
  Validated by Alibaba Cloud tests:
• 97.3% reduction in per-token cost compared to GPT-4 Turbo for long-text inference tasks
  (Data source: IDC Computing Power Economic Model)
2. Data Engineering
  Constructed a 5.2PB high-quality corpus covering finance, law, patents, and other vertical domains.
  Multi-stage semantic distillation boosts instruction-compliance accuracy to 89.7%
  (Benchmark: C-Eval 2.0 test set)

3. Hardware Optimization
      Proprietary distributed training framework achieves:

    • 82% utilization rate on Ascend 910B chip clusters
    • 512 PetaFLOPS actual computing power at FP16 precision
    • 91% efficiency of equivalent-scale A100 clusters
      (Validated by Huawei Labs)
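    For anyone unfamiliar with MoE, the 1.2T-total / 78B-active split is possible because a router sends each token through only a few "expert" sub-networks, so per-token compute scales with the active experts rather than the full model. A minimal toy sketch of top-k routing (illustrative only; DeepSeek’s “Hybrid MoE 3.0” internals are unpublished, and all sizes here are made-up small values):

    ```python
    # Toy Mixture-of-Experts layer: only top-k experts run per token,
    # so "active" parameters are a small fraction of total parameters.
    # This is a generic sketch, NOT DeepSeek's actual architecture.
    import numpy as np

    rng = np.random.default_rng(0)

    d_model, n_experts, top_k = 64, 16, 2
    # Each expert is a small feed-forward weight matrix.
    experts = [rng.standard_normal((d_model, d_model)) * 0.02
               for _ in range(n_experts)]
    router = rng.standard_normal((d_model, n_experts)) * 0.02

    def moe_forward(x):
        """Route token x to its top-k experts; only those weights are used."""
        logits = x @ router
        top = np.argsort(logits)[-top_k:]                  # chosen expert indices
        gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over top-k
        return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

    x = rng.standard_normal(d_model)
    y = moe_forward(x)

    total_params = n_experts * d_model * d_model
    active_params = top_k * d_model * d_model   # top-2 of 16 experts → 12.5% active
    print(f"total {total_params}, active per token {active_params}")
    ```

    In the rumored R2 numbers the ratio is even sparser: 78B active out of 1.2T total is roughly 6.5% of parameters touched per token, which is how per-token cost can drop so far below a dense model of the same total size.
    
    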

    Application Layer Advancements - Three Multimodal Breakthroughs:

    1. Vision Understanding
      ViT-Transformer hybrid architecture achieves:
    • 92.4 mAP on COCO dataset object segmentation
    • 11.6% improvement over CLIP models
2. Industrial Inspection
      Adaptive feature fusion algorithm reduces false detection rate to 7.2E-6 in photovoltaic EL defect detection
      (Field data from LONGi Green Energy production lines)

3. Medical Diagnostics
      Knowledge graph-enhanced chest X-ray multi-disease recognition:

    • 98.1% accuracy vs. 96.3% average of senior radiologist panels
      (Blind test results from Peking Union Medical College Hospital)

    Key Highlight:
    8-bit quantization compression achieves:

    • 83% model size reduction
    • <2% accuracy loss
      (Enables edge device deployment - Technical White Paper Chapter 4.2)
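    As a sanity check on that size-reduction figure: plain symmetric int8 quantization gives a 4× reduction from fp32 (~75%) or 2× from fp16 (~50%), so the claimed 83% would need a lower bit width or extra compression on top. A minimal sketch of the standard per-tensor int8 scheme (generic technique, not DeepSeek’s actual method):

    ```python
    # Symmetric per-tensor int8 post-training quantization of a weight
    # matrix: store int8 values plus one fp32 scale, dequantize on use.
    # Generic sketch; DeepSeek's actual compression scheme is not public.
    import numpy as np

    rng = np.random.default_rng(1)
    w = rng.standard_normal((256, 256)).astype(np.float32)  # toy weight tensor

    scale = np.abs(w).max() / 127.0          # map max |weight| to int8 range
    q = np.round(w / scale).astype(np.int8)  # quantize
    w_hat = q.astype(np.float32) * scale     # dequantize

    err = np.abs(w - w_hat).max()            # worst-case error ≤ scale / 2
    size_fp32 = w.nbytes                     # 4 bytes per parameter
    size_int8 = q.nbytes + 4                 # 1 byte per parameter + the scale
    print(f"size reduction vs fp32: {1 - size_int8 / size_fp32:.0%}")
    ```

    Measured against fp32 this lands at ~75%, which is why the “sub-8-bit” reading of the original text (mentioned below) is plausible for an 83% figure.
    
    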

    Others translate it as ‘sub-8-bit’ quantization, which is interesting too.

    • @[email protected]
  18 hours ago

      I’m sad to see how many mentions of “proprietary” there are in there. I didn’t think that was DeepSeek’s way of doing things.

      • @[email protected]
    11 hours ago

        The rumor is probably total BS, heh.

        That being said, it’s not surprising if DeepSeek goes more commercial, since it’s basically China’s ChatGPT now.