• @[email protected]
    61 month ago

    It’s probably DeepSeek R1, which is a “reasoning” model, so basically it has sub-models doing things like running computation while the “supervisor” part of the model “talks to them” and relays back the approach. The idea is to imitate the way humans think. That being said, models are getting “agentic”, meaning they have the ability to run software tools against what you send them. While that’s obviously being super hyped up by all the tech bro accelerationists, it is likely where LLMs and the like are headed, for better or for worse.

    • @[email protected]
      11 month ago

      Still, this does not quite address the issue of tokenization making it difficult for most models to accurately distinguish between the hexadecimals here.

      Having the model write code to solve a problem and then asking it to execute that code is an established technique to circumvent this issue, but all of the model interfaces I know of with this capability are very explicit about when they are making use of this tool.
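
      For illustration, here is a minimal sketch of the kind of code a model could write and run for a task like this. The function name `decode_hex` and the sample input are made up for the example; nothing here is taken from the screenshot or from any particular model interface.

```python
def decode_hex(hex_text: str) -> str:
    """Decode a hex string (whitespace between bytes is allowed) into text."""
    # bytes.fromhex ignores ASCII whitespace and raises ValueError on bad input.
    raw = bytes.fromhex(hex_text)
    # Assumption: the payload is UTF-8; undecodable bytes are replaced, not fatal.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_hex("48 65 6c 6c 6f"))  # prints "Hello"
```

      Because the decoding happens in ordinary code, the result does not depend on how the hex string was split into tokens.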

      • @[email protected]
        11 month ago

        Not really a concern. It’s basically translation, which language models excel at. It just needs a mapping from hex to bytes.

            • @[email protected]
              11 month ago

              It’s not out of the question that we get emergent behaviour where the model can connect non-optimally mapped tokens and still translate them correctly, yeah.

              • @[email protected]
                01 month ago

                I’m confused, is the concern when the model doesn’t properly identify when it is using software to identify something like a hex pattern?

                • @[email protected]
                  01 month ago

                  The concern is that the model doesn’t actually see the world in terms of distinct hexadecimals, but instead as tokens of variable size - you can see this using the tiktokenizer-webapp: enter some text and it will split it into the series of tokens the model actually will process.

                  It’s not impossible for the model to work it out anyway, but it is a reason for this type of task to be a bit harder on LLMs.
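
                  A toy sketch of the mismatch, assuming a made-up vocabulary and a simple greedy longest-match rule. Real tokenizers learn their vocabularies from data and will split differently; `TOY_VOCAB`, `toy_tokenize` and `hex_pairs` are hypothetical names for this example only.

```python
# Hypothetical vocabulary: a few multi-character pieces plus every hex digit.
TOY_VOCAB = {"48", "656", "c6c", "6f"} | set("0123456789abcdef")


def toy_tokenize(text, vocab=TOY_VOCAB, max_len=3):
    """Greedy longest-match split of text into pieces found in vocab."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no token matches at position {i}")
    return tokens


def hex_pairs(text):
    """Split text into two-character hex bytes, the units a human reads."""
    return [text[i:i + 2] for i in range(0, len(text), 2)]


if __name__ == "__main__":
    s = "48656c6c6f"
    print(toy_tokenize(s))  # ['48', '656', 'c6c', '6f']
    print(hex_pairs(s))     # ['48', '65', '6c', '6c', '6f']
```

                  The two splits cover the same characters, but the token boundaries cut through the middle of bytes, so the model has to recover the byte structure on its own.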

                  • @[email protected]
                    01 month ago

                    I understand how base models tokenize language. What I’m curious about is that you’re basing your response on a horrendously screenshotted meme image of someone interacting with DeepSeek. Is your concern that DeepSeek isn’t showing the code used to approach a hex string? Because that’s certainly a valid concern, though you can ask the model to output the code it is running. That’s definitely an ethics improvement that should be made in the UI, but it’s very clear what the model is doing under the hood.