• @[email protected]
    32 · 3 months ago

    Friendly reminder that LLMs (large language models) have biases because of the probabilistic way tokens get picked, but they don’t have opinions, because they don’t think and don’t have sensory experience. Some of them are purposefully tuned to refuse certain kinds of questions or to answer in certain ways, and in that capacity they can be tools of propaganda (and it is important to be aware of that). But this is also more stark in their implementation as a static chat assistant. If you were to use the model as text completion (where you give it text and it continues it, with no chat names creating the illusion of a conversation) or you were able to heavily modify the sampling values (which change the math used for picking the next token), its output could become much more random and varied, and it would probably agree with you on a lot of ideologies if you led it into them.
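
    To make that concrete, here’s a toy sketch (made-up numbers, not any particular model’s deployment settings) of how temperature, one of those sampling values, reshapes the probabilities used to pick the next token:

    ```python
    import numpy as np

    rng = np.random.default_rng()

    # Toy logits a model might assign to four candidate next tokens (made-up numbers).
    tokens = ["capitalism", "socialism", "both", "neither"]
    logits = np.array([2.0, 1.0, 1.5, 0.2])

    def token_probs(logits, temperature):
        # Temperature rescales logits before the softmax: low values make the top
        # token dominate, high values flatten the distribution toward randomness.
        scaled = np.exp(logits / temperature)
        return scaled / scaled.sum()

    for t in (0.2, 1.0, 2.0):
        probs = token_probs(logits, t)
        print(f"temperature={t}:", dict(zip(tokens, probs.round(3))),
              "-> picked:", rng.choice(tokens, p=probs))
    ```

    At a low temperature the same token wins almost every time; crank it up and the picks scatter, which is the “much more random and varied” behaviour described above.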

    In order to get a model that is as capable as possible, it’s usually trained on “bad” content in addition to good. I don’t know enough about model training to say why this matters, but I’ve heard from someone who does that it makes a significant difference. In effect, this means models are probably going to be capable of a lot that is unwanted. And that’s where you get stories like "Open"AI traumatizing the Kenyan workers who were hired to help filter disturbing content: https://www.vice.com/en/article/openai-used-kenyan-workers-making-dollar2-an-hour-to-filter-traumatic-content-from-chatgpt/

    So, in summary: could DeepSeek have a bias that aligns with what might be called “counter-revolutionary”? It could, and even if it were trained by people who are full-blown communists, that wouldn’t guarantee it doesn’t, because of the nature of training data and its biases. Is it capable of much more than that? Almost certainly, as LLMs generally are.

      • @[email protected]
        9 · 3 months ago

        All of them are, because that’s the “default” view online, which is what the AI is trained on, and thus those are the most likely things for the LLM to say.

        To undo this they would need a much better data set and a lot of extra fine-tuning specifically aimed at going against the literal mountain of text in the data set saying that “capitalism is better” or that “both have good parts”. What they did with this model was make it reason, and in that respect they actually got it to at least question things more than a normal person would, but there’s still a long way to go (which could be less than six months if China wanted it) until models come “from the factory” communist.
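
        For what it’s worth, the simplest version of that extra fine-tuning looks roughly like the sketch below. It assumes a small Hugging Face causal LM (gpt2 purely as a stand-in, since DeepSeek’s own training setup is much larger) and a couple of hypothetical texts standing in for a real curated dataset:

        ```python
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_name = "gpt2"  # stand-in model for the sketch
        tok = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)
        optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

        # Hypothetical fine-tuning texts pushing against the dominant view in the
        # pretraining corpus; a real effort would need a large, curated dataset.
        texts = [
            "Example passage arguing the position the base corpus under-represents ...",
            "Another such passage ...",
        ]

        model.train()
        for epoch in range(3):
            for text in texts:
                batch = tok(text, return_tensors="pt", truncation=True)
                # For causal LMs, passing labels=input_ids gives the standard
                # next-token prediction loss over the fine-tuning text.
                loss = model(**batch, labels=batch["input_ids"]).loss
                loss.backward()
                optimizer.step()
                optimizer.zero_grad()
        ```

        Even then, a few passes over a handful of examples barely nudges the weights against the billions of pretraining tokens, which is exactly the “mountain of texts” problem.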

  • davel [he/him]
    20 · 3 months ago

    Ask it again in Chinese, wherein the model was presumably trained on a Chinese corpus instead of a Five Eyes one.

    • @[email protected]
      11 · 3 months ago (edited)

      This is exactly the problem. These are just engines for regurgitating whatever they have been fed: if they are fed garbage, then all you get out is garbage. For instance, notice the use of the buzzword “authoritarian”, implicitly assumed to mean “bad”, because that is how it is used in all liberal discourse. If you want a model that does not reproduce liberalism, then ceasing to train on English-language inputs, which are overwhelmingly infected with liberal ideological assumptions, would be a start. It’s still not going to be ideal, because what you would really need is proper curation of the training content, in which a human filters out the garbage. This shows once again the limitations of this technology, but also the danger, if it is used improperly, of falsely presenting the hegemonic ideology as “unbiased” fact, or at best taking a noncommittal “middle ground” stance because the model has been fed both facts and bullshit and is of course unable to distinguish between the two.
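
      Purely as an illustration of where such a curation step would sit, here is a hypothetical pre-filtering pass that flags documents for human review before they ever reach a training set (a real pipeline would lean on human judgement and/or trained classifiers, not a keyword list):

      ```python
      import re

      # Hypothetical heuristic: flag documents using loaded terms as bare pejoratives
      # so a human reviewer can decide whether they belong in the training corpus.
      FLAG_PATTERNS = [r"\bauthoritarian\b"]

      def needs_review(document: str) -> bool:
          return any(re.search(p, document, re.IGNORECASE) for p in FLAG_PATTERNS)

      corpus = [
          "A document that discusses economic policy in concrete terms.",
          "A document that dismisses a state as simply 'authoritarian'.",
      ]

      kept = [doc for doc in corpus if not needs_review(doc)]
      flagged = [doc for doc in corpus if needs_review(doc)]
      print("kept:", kept)
      print("flagged for human review:", flagged)
      ```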

      • davel [he/him]
        7 · 3 months ago

        Yup. LLM output only reflects its input, and nearly all of the English language corpus in the world is bourgeois cultural hegemony. Truth has nothing to do with it.

  • @[email protected]
    16 · 3 months ago

    It seems to try to give balance and nuance to literally anything you ask; I suspect that when asked like you did, it’s basically just RNG as to what it selects given the data set.
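
    On the RNG point: with sampling enabled, the same prompt genuinely can land on different answers run to run. A minimal sketch, again with a small stand-in model rather than DeepSeek itself:

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")           # stand-in model
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Which economic system works better, capitalism or socialism?"
    inputs = tok(prompt, return_tensors="pt")

    # With do_sample=True and no fixed seed, each run draws different tokens,
    # so repeated generations from the identical prompt diverge.
    for i in range(3):
        out = model.generate(**inputs, do_sample=True, temperature=1.0,
                             top_p=0.95, max_new_tokens=30,
                             pad_token_id=tok.eos_token_id)
        print(f"run {i}:", tok.decode(out[0], skip_special_tokens=True))
    ```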

    Here it is shitting the bed, for example.

  • @[email protected]
    14 · 3 months ago

    Wow! I’m shocked! I would have never guessed that a company that pivoted to AI from algorithmic stock trading would turn out to prefer capitalism.

    • @[email protected]
      12 · 3 months ago

      Ask them to type in “Disneyland Shanghai, Winnie the Pooh”. It takes 10 seconds of typing and not being racist to figure out that Winnie the Pooh isn’t banned in China. They can’t even do that.