This is one of the “smartest” models you can fit on a 24GB GPU right now, with no offloading and very little quantization loss. It feels big and insightful, like a better (albeit dry) Llama 3.3 70B with thinking, with more STEM world knowledge than QwQ 32B, and it comfortably fits thanks to the new exl3 quantization!

[Chart: Quantization Loss]

You need to use a backend that supports exl3, like (at the moment) text-generation-webui or (soon) TabbyAPI.
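For anyone who wants to poke at a served model once a backend is up: TabbyAPI exposes an OpenAI-compatible endpoint for exl2 today, and assuming that carries over once exl3 support lands, querying it from Python looks roughly like this. The port and model tag below are placeholders for illustration, not guaranteed defaults; match them to whatever your server actually reports:

```python
# Hedged sketch: query a local OpenAI-compatible server (e.g. TabbyAPI).
# Assumptions: server listening on localhost:5000, and a hypothetical
# model tag "nemotron-49b-exl3" -- adjust both to your setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5000/v1",  # assumed local port
    api_key="unused",                     # local servers typically ignore this
)

resp = client.chat.completions.create(
    model="nemotron-49b-exl3",            # hypothetical model tag
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.0,                      # greedy decoding
)
print(resp.choices[0].message.content)
```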

  • projectmoon · 16 hours ago

    What are the benefits of EXL3 vs the more normal quantizations? I have 16gb of VRAM on an AMD card. Would I be able to benefit from this quant yet?

      • @[email protected]
        link
        fedilink
        English
        310 hours ago

        There’s a “What’s missing” section there that lists ROCm, so I’m pretty sure it’s planned to be added

        • @[email protected]OP
          link
          fedilink
          English
          2
          edit-2
          7 hours ago

          That, and exl2 has ROCm support.

          There was always the bugaboo of uttering a prayer to get ROCm flash attention working (come on, AMD…), but exl3 plans to switch to flashinfer, which should eliminate that issue.

    • @[email protected]OP
      link
      fedilink
      English
      2
      edit-2
      12 hours ago

      ^ What they said: not supported yet, though you can theoretically give it a shot.

      Basically, exl3 means you can run 32B models entirely on GPU without a ton of quantization loss, if you can get it working on your machine. But exl2/exl3 is less popular largely because it’s PyTorch-based, hence more finicky to set up (no single-file GGUFs, no Macs, no easy install, especially on AMD).
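      For a back-of-envelope sense of why a 32B model fits in 24GB at exl3-style bitrates, here’s a quick weights-only estimate in Python (KV cache, activations, and framework overhead come on top, so treat these numbers as a floor):

      ```python
      # Rough weights-only VRAM estimate at a given bits-per-weight (bpw).
      # Ignores KV cache, activations, and framework overhead.
      def weight_vram_gib(params_billions: float, bpw: float) -> float:
          bytes_total = params_billions * 1e9 * bpw / 8
          return bytes_total / (1024 ** 3)

      for bpw in (3.0, 4.0, 5.0):
          print(f"32B @ {bpw} bpw ~ {weight_vram_gib(32, bpw):.1f} GiB")
      # 32B @ 3.0 bpw ~ 11.2 GiB
      # 32B @ 4.0 bpw ~ 14.9 GiB
      # 32B @ 5.0 bpw ~ 18.6 GiB
      ```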

  • Toes♀ · 1 day ago

    How does it answer the question: “Using lambda calculus attempt to approach a solution for N=NP.”

    • @[email protected]OP
      link
      fedilink
      English
      3
      edit-2
      1 day ago

      Heh, calls N=NP out about as politely as it can:

      Long answer:

      What a delightfully ambitious request!

      Before diving into the attempt, let’s set the stage with some clarifications and limitations, as tackling the P vs. NP problem (not N=NP, which seems to be a typo, as the question is whether P=NP) in a single response, especially through lambda calculus, is highly non-trivial. The P vs. NP problem is a fundamental question in computer science regarding the relationship between computational complexity classes P (Polynomial Time) and NP (Nondeterministic Polynomial Time).

      Clarifications and Limitations

      Typo Correction: Assuming the intent was to explore the P vs. NP problem, not “N=NP” (since N isn’t a standard complexity class in this context).

      Lambda Calculus Role: We’ll use lambda calculus to explore conceptual underpinnings or provide a foundational perspective, rather than a direct proof, due to the problem’s complexity.

      No Direct Solution Expected: Given the problem’s openness for over four decades, a solution here is highly unlikely. Our approach will be more about illustrating a thought process or a specific aspect related to the problem.

      Approach Using Lambda Calculus for Insight into P vs. NP

      Step 1: Understanding the Classes through Lambda Calculus

      P (Polynomial Time): Can be thought of as the set of decision problems solvable by a lambda calculus expression that normalizes in a number of beta reductions polynomial in the size of the input. Essentially, this means we can solve the problem in a polynomial amount of time.

      NP (Nondeterministic Polynomial Time): More abstract to directly map, but conceptually, it involves problems where a proposed solution can be verified in polynomial time (similar to P’s normalization condition). In lambda terms, this might imply an existential quantification over all possible “paths” of computation (non-deterministic choices) that lead to a verification lambda expression that normalizes in polynomial time.
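      As a toy illustration of “computation as normalization,” here is an informal Python sketch using Church booleans, where each function application stands in for a beta reduction (illustrative only, not part of any formal argument):

      ```python
      # Church booleans as Python closures: TRUE selects its first
      # argument, FALSE its second. Each call models a beta reduction.
      TRUE  = lambda a: lambda b: a
      FALSE = lambda a: lambda b: b

      AND = lambda p: lambda q: p(q)(FALSE)   # p AND q = if p then q else FALSE
      OR  = lambda p: lambda q: p(TRUE)(q)    # p OR q  = if p then TRUE else q
      NOT = lambda p: p(FALSE)(TRUE)          # NOT p   = if p then FALSE else TRUE

      # Evaluate (x OR NOT y) under x = FALSE, y = FALSE: reduces to TRUE.
      result = OR(FALSE)(NOT(FALSE))
      print(result("True")("False"))  # prints "True"
      ```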

      Step 2: Conceptual Lambda Expressions

      Let’s define conceptual lambda expressions to illustrate the verification process for an NP problem (e.g., SAT - Satisfiability Problem):

      Verification Function for SAT (Conceptual):

      ```
      λsatVerify : (formula: PropositionalFormula) → (assignment: Assignment) → Bool
        = λformula. λassignment. evalFormula formula assignment
      ```

      Here, evalFormula is a lambda expression (or a set of them) that evaluates a propositional formula under a given assignment, normalizing to True or False in polynomial time in the size of formula and assignment.
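      To make the verifier concrete outside pure lambda notation, here is an illustrative Python analogue of evalFormula. The DIMACS-style CNF encoding (clauses as lists of signed variable indices) is just one convenient choice for the example:

      ```python
      # Polynomial-time SAT verification: one linear pass over the clauses.
      # A formula is a list of clauses; each clause is a list of signed
      # variable indices (3 means x3, -3 means NOT x3).
      def eval_formula(clauses, assignment):
          """True iff every clause contains at least one satisfied literal."""
          return all(
              any(assignment[abs(lit)] == (lit > 0) for lit in clause)
              for clause in clauses
          )

      # (x1 OR NOT x2) AND (x2 OR x3), with x1=True, x2=True, x3=False
      formula = [[1, -2], [2, 3]]
      print(eval_formula(formula, {1: True, 2: True, 3: False}))  # True
      ```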

      Step 3: Exploring P=NP through a Lambda Calculus Lens

      To conceptually approach the P=NP question, we’d need a lambda expression that not only verifies a solution in polynomial time (the easy part, as shown) but also finds a solution in polynomial time for any NP problem, given only the problem statement.

      Hypothetical Solver Expression (if P=NP):

      ```
      λnpSolver : (problem: NPProblem) → (solution: Solution | “No Solution”)
        = λproblem. … (Magic happens here, as this is where P=NP would be proven)
      ```

      The Challenge: Fill in the … with a lambda expression that, given any NP problem, computes a solution (or proves no solution exists) in a number of beta reductions polynomial in the input size. This is where our attempt hits the wall of current knowledge, as devising such an expression for all NP problems is essentially solving the P vs. NP problem.
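      The only general npSolver we actually know how to write today makes the gap visible: exhaustive search. Each candidate is checked in polynomial time, but the candidate space is exponential, which is exactly what the … above papers over. An illustrative Python sketch, reusing the clause encoding from the verifier example:

      ```python
      # Brute-force SAT: polynomial-time verification inside an
      # exponential (2^n) search over assignments.
      from itertools import product

      def eval_formula(clauses, assignment):
          # same polynomial-time check as the verifier sketch above
          return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

      def brute_force_sat(clauses, n_vars):
          for bits in product([False, True], repeat=n_vars):
              assignment = {i + 1: b for i, b in enumerate(bits)}
              if eval_formula(clauses, assignment):
                  return assignment   # satisfying witness found
          return None                 # unsatisfiable

      print(brute_force_sat([[1, -2], [2, 3]], 3))
      # {1: False, 2: False, 3: True}
      ```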

      Lambda Calculus “Attempt” at Illustrating the Impasse

      Given the SAT problem as an example NP problem, our inability to fill in the solver without a polynomial-time guarantee reflects the P vs. NP impasse:

      ```
      -- Verification is straightforward (in P)
      satVerify = λformula. λassignment. evalFormula formula assignment

      -- The Elusive Polynomial Time Solver (if P=NP)
      satSolver = λformula. ? (PolynomialTimeMagic formula)  -- IMPASSE
      ```

      Conclusion

      Lambda Calculus provides a powerful tool for conceptual exploration of computational concepts, including the underpinnings of P and NP.

      Directly tackling P vs. NP with lambda calculus (or any method) in this format is infeasible due to the problem’s complexity and the current state of knowledge.

      Insight over Solution: This approach offers more insight into how one might think about these classes conceptually rather than providing a solution.

      Future Directions for a More In-Depth Exploration

      Deep Dive into Lambda Calculus and Complexity Theory: For a more nuanced understanding, delve deeper into how lambda calculus models computation and the formal definitions of P and NP.

      Explore Specific NP Problems: Attempt to construct solver expressions for specific NP problems, though this won’t generalize to all NP problems without solving P vs. NP.

      Study Known Results and Reductions: Understand how problems are reduced to each other within NP, and how this might inform a lambda calculus approach.

      References for Further Reading

      Lambda Calculus: Barendregt, H. P. (1984). The Lambda Calculus: Its Syntax and Semantics.

      Complexity Theory and P vs. NP: Papadimitriou, C. H. (1994). Computational Complexity.

      Arora, S., & Barak, S. (2009). Computational Complexity: A Modern Approach.

      Final Note

      Given the constraints of this format and the problem’s difficulty, this response is more of a conceptual exploration than an attempt at a solution. The P vs. NP problem remains one of the most significant open questions in computer science, with a resolution requiring a deeper insight or a novel approach that hasn’t been captured in this simplified lambda calculus framing.

      • Toes♀ · 21 hours ago

        Oh my, this worked much better than I expected. Thanks

        • @[email protected]OP
          link
          fedilink
          English
          2
          edit-2
          21 hours ago

          Yeah, it’s an Nvidia model trained for STEM, and really good at that for a ‘3090-sized’ model. For reference, this was a zero-temperature answer.

          exllamav3 is a game changer. 70Bs sorta fit, but I think 50B is the new sweet spot (or 32B with tons of context).