I just started using this myself, seems pretty great so far!

Clearly it doesn’t stop all AI crawlers, but it does stop a significant chunk of them.

  • @[email protected]
    7 · 1 month ago

    It’s a clever solution, but I did see one recently that IMO was more elegant for noscript users. I can’t remember the name, but it creates a dummy link that human users won’t touch while web crawlers will naturally follow it, and that link leads into an infinitely deep tree of super-basic HTML, forcing bots to endlessly trawl a cheap-to-serve portion of your webserver instead of anything heavier. It might even have integrated with fail2ban to pick out obvious bots and keep them off your network for good.
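
    For anyone curious, the general shape of that kind of tarpit is simple enough to sketch. Below is a minimal illustration in Go, not the code of any particular tool: every page under a dummy path links to a few child pages whose names are derived deterministically from the current path, so the tree is effectively infinite but each page costs almost nothing to serve. The /maze/ prefix, the link count, and the hashing scheme are placeholder choices.

    ```go
    package main

    import (
        "crypto/sha256"
        "fmt"
        "net/http"
    )

    // mazeHandler serves an "infinitely deep" tree of cheap HTML under /maze/.
    // Each page links to a few child pages whose names are hashes of the current
    // path, so nothing is stored and every page is trivial to generate.
    func mazeHandler(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "text/html")
        fmt.Fprint(w, "<html><body>")
        for i := 0; i < 3; i++ {
            child := sha256.Sum256([]byte(fmt.Sprintf("%s/%d", r.URL.Path, i)))
            fmt.Fprintf(w, `<a href="%s%x/">%x</a><br>`, r.URL.Path, child[:4], child[:4])
        }
        fmt.Fprint(w, "</body></html>")
    }

    func main() {
        http.HandleFunc("/maze/", mazeHandler)
        http.ListenAndServe(":8080", nil)
    }
    ```

    The fail2ban integration would then just be a matter of watching the access log for clients that wander deep into /maze/ and banning those IPs.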

    • @[email protected]
      4 · 1 month ago

      That’s a tarpit you’re describing, like Iocaine or Nepenthes. Those are meant to feed the crawler junk data to try to make its eventual output bad.

      Anubis tries to not let the AI crawlers in at all.

    • @[email protected]
      2 · 1 month ago

      generates an infinitely deep tree

      Wouldn’t the bot simply limit the depth of its search?

      • Cethin
        3 · 1 month ago

        It could be infinitely wide too if they desired; that shouldn’t be hard to do, I wouldn’t think. I would suspect they limit the time a chain can use, though, so they eventually escape out, but this still protects the data because it obfuscates the legitimate data the crawler wants. The goal isn’t to trap them forever; it’s to keep them from getting anything useful.

      • nickwitha_k (he/him)
        1 · 1 month ago

        That would be reasonable. The people running these things aren’t reasonable. They ignore every established mechanism to communicate a lack of consent to their activity because they don’t respect others’ agency and want everything.

  • @[email protected]
    3 · 1 month ago

    Meaning it wastes time and power such that it gets expensive on a large scale? Or does it mine crypto?

  • @[email protected]
    3 · 1 month ago

    I think the maze approach is better; this seems like it hurts valid users of the web more than it would a company.

      • @[email protected]
        1 · 1 month ago

        This looks like it can actually fuck up some models, but the unnecessary CPU load it will generate means most websites won’t use it, unfortunately.

  • randomblock1
    3 · 1 month ago

    Why SHA-256? Literally every processor has a crypto accelerator and will pass it easily, and datacenter servers have beefy CPUs. This is only effective against no-JS scrapers. It seems to me that anything that uses a Chromium driver to scrape will be completely unaffected… which is increasingly most of them.
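
    For reference, the proof of work being discussed here is the classic hashcash shape: the server hands out a random challenge and the client has to find a nonce such that SHA-256(challenge + nonce) starts with a given number of zeroes before it gets a pass. A rough sketch of that search in Go follows; the exact encoding and the default difficulty in Anubis may differ, the values here are only for illustration.

    ```go
    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "strings"
    )

    // solve searches for a nonce such that SHA-256(challenge + nonce) starts with
    // `difficulty` leading hex zeroes. This is generic hashcash-style proof of
    // work, not necessarily Anubis's exact construction.
    func solve(challenge string, difficulty int) (int, string) {
        prefix := strings.Repeat("0", difficulty)
        for nonce := 0; ; nonce++ {
            sum := sha256.Sum256([]byte(fmt.Sprintf("%s%d", challenge, nonce)))
            digest := hex.EncodeToString(sum[:])
            if strings.HasPrefix(digest, prefix) {
                return nonce, digest
            }
        }
    }

    func main() {
        // The difficulty value is an arbitrary choice for this sketch.
        nonce, digest := solve("example-challenge", 4)
        fmt.Println(nonce, digest)
    }
    ```

    Which is the point being made above: a CPU with hardware SHA extensions, or a headless Chromium that simply runs the real challenge script, clears this almost for free.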

    • poVoq
      3 · 1 month ago

      It requires a bunch of browser features that non-user browsers don’t have, and the proof-of-work part is really the least relevant piece of it; it only gets invoked once a week or so to generate a unique cookie.

      I sometimes have the feeling that as soon as anything cryptocurrency-related is mentioned, people shut off part of their brain, either because they hate cryptocurrencies or because cryptocurrency scammers have trained them to look only at technical implementation details and miss the larger picture of how they are being scammed.

      • @[email protected]
        0 · 1 month ago

        So if you try to access a website using this technology via terminal, what happens? The connection fails?

        • Drew
          0 · 1 month ago

          If your browser doesn’t send a Mozilla user agent (i.e. one like Chrome’s or Firefox’s), it will pass through directly. Most AI crawlers use these user agents to pretend to be human users.
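
          In other words, the gate is just a user-agent match before anything else happens. A minimal sketch of that idea in Go; Anubis’s actual matching rules are configurable, this only shows the concept.

          ```go
          package main

          import (
              "fmt"
              "net/http"
              "strings"
          )

          // shouldChallenge mirrors the behaviour described above: only requests whose
          // User-Agent claims to be Mozilla-compatible (what Chrome, Firefox, and bots
          // impersonating them send) get the interstitial; everything else passes.
          func shouldChallenge(r *http.Request) bool {
              return strings.Contains(r.Header.Get("User-Agent"), "Mozilla")
          }

          func main() {
              http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
                  if shouldChallenge(r) {
                      fmt.Fprintln(w, "challenge page would be served here")
                      return
                  }
                  fmt.Fprintln(w, "passed straight through to the backend")
              })
              http.ListenAndServe(":8080", nil)
          }
          ```

          So a plain curl from the terminal, whose user agent is something like curl/8.x rather than a Mozilla string, never sees the challenge at all.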

  • Daniel Quinn
    3 · 1 month ago

    It’s a rather brilliant idea really, but when you consider the environmental implications of forcing web requests to perform proof of work in order to function, this effectively burns more coal for every site that implements it.

    • @[email protected]
      2 · 1 month ago

      I don’t think AI companies care, and I wholeheartedly support any and all FOSS projects using PoW when serving their websites. I’d rather have that than have them go down

  • @[email protected]
    1 · 1 month ago

    I did not find any instructions on the source page on how to actually deploy this. That would be a nice touch, imho.

    • @[email protected]
      1 · 1 month ago

      There are some detailed instructions on the docs site, tho I agree it’d be nice to have in the readme, too.

      Sounds like the dev was not expecting this much interest in the project out of nowhere, so there will def be gaps.

  • drkt
    -1 · 1 month ago

    Anubis is provided to the public for free in order to help advance the common good. In return, we ask (but not demand, these are words on the internet, not word of law) that you not remove the Anubis character from your deployment.
    If you want to run an unbranded or white-label version of Anubis, please contact Xe to arrange a contract.

    This is icky to me. Cool idea, but this is weird.

    • LiveLM
      2 · 1 month ago

      …Why? It’s just telling companies they can get support + white-labeling for a fee, and asking, in a tongue-in-cheek manner, that you keep their silly little character.
      Just like they say, you can modify the code and remove it for free if you really want; they’re not forbidding you from doing so or anything.

      • @[email protected]
        2 · 1 month ago

        Just like they say, you can modify the code and remove for free if you really want, they’re not forbidding you from doing so or anything

        True, but I think you are discounting the risk that the actual god Anubis will take displeasure at such an act, potentially dooming one’s real life soul.

      • @[email protected]
        0 · 1 month ago

        Yeah, it seems entirely optional. It’s not like manually removing the Anubis character will revoke your access to the code. However, I still do find it a bit weird that they’re asking for that.

        I just can’t imagine most companies implementing Anubis and keeping the character or paying for the service, given that it’s open source. It just seems unprofessional for the first impression of a company’s website to be the Anubis devs’ manga OC…

        • @[email protected]
          1 · 1 month ago

          It is very different from the usual flat corporate style, yes, but this is just their branding. Their blog is full of anime characters like that.

          And it’s not like you’re looking at a literal ad for their company, or something with their name on it. In that sense it is subtle, though a bit unusual.

  • @[email protected]
    -1 · 1 month ago

    Giant middle finger from me – and probably everyone else who uses NoScript – for trying to enshittify what’s left of the good parts of the web.

    Seriously, FUCK THAT.

    • LiveLM
      2 · 1 month ago

      You should blame the big tech giants and their callous disregard for everyone else for the Enshittification, not the folks just trying to keep their servers up.

  • Possibly linux
    -2 · 1 month ago

    It is not great on many levels.

    • It only runs against the Firefox user agent. This is not great, as the user agent can easily be changed. It may work now, but tomorrow that could all change.

    • It doesn’t measure load, so even if your website has only a few people accessing it, they will still have to do the proof of work.

    • The PoW algorithm is not well designed and requires a lot of compute on the server, which means it could be used as a denial-of-service attack vector. It also uses SHA-256, which isn’t optimized for proof-of-work-type calculations and can be brute-forced pretty easily with hardware.

    • I don’t really care for the anime cat girl thing. This is more of a personal thing, but I don’t think it is appropriate.

    In summary, the Tor implementation is a lot better. I would love to see someone port it to the clearnet. I think this project was created by someone lacking experience, which I find a bit concerning.

      • Bilb!
        1 · 1 month ago

        Catgirls, jackalgirls, all embarrassing. Go full-on furry.

    • @[email protected]
      1 · 1 month ago

      …you do realize that brute-forcing it is the work you use to prove yourself, right? That’s the whole point of PoW.

      • Possibly linux
        0 · 1 month ago

        True, I should have phrased that better.

        The issue is that SHA-256 is fairly easy to do at scale. Modern high-performance hardware is well optimized for it, so you could still mount an attack with a bunch of GPUs. AI scrapers tend to have a lot of those.
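
        To put rough numbers on that, here is a back-of-the-envelope sketch in Go; the difficulty and the hash rates are assumptions for illustration, not measurements of Anubis or of any particular hardware.

        ```go
        package main

        import "fmt"

        func main() {
            // Assumed difficulty: four leading hex zeroes, i.e. ~16^4 = 65,536
            // SHA-256 hashes per pass on average. Hash rates are rough guesses.
            const hashesPerPass = 65536.0
            const phoneRate = 5e6 // ~5 MH/s, assumed, one phone CPU core
            const gpuRate = 1e9   // ~1 GH/s, assumed, one GPU

            fmt.Printf("visitor's phone: ~%.0f ms per pass\n", 1000*hashesPerPass/phoneRate)
            fmt.Printf("one GPU: ~%.0f passes per second\n", gpuRate/hashesPerPass)
        }
        ```

        Under those assumptions a real visitor barely notices the delay, while a scraper farm with GPUs could mint thousands of passes per second, which is exactly the asymmetry being pointed out here.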