Yacy is, by its own definition, a distributed Web Search Engine based on a peer-to-peer network. I tried to run a peer (again). Here's a résumé of my general thoughts.
What speaks for yacy:
It tries to solve a prevalent and urgent problem: the hegemonic¹ ownership of the web's indexes, with all the devastating consequences of capitalist possession, interpretive authority (Deutungshoheit) and destructiveness to privacy and autonomy.
It appeals to a nerdy sentiment. With all its feedback on statistics and topologies it gives the impression that important stuff is continuously happening, and it lures you into tweaking the omnifarious options.
What speaks against yacy:
There's a contradiction in the basic assumption that distributed index data can be collected, evaluated and presented to a user in real time. At least one thing has to give: either the data is incomplete and insufficient (that is, the user doesn't get what possibly exists in the system), or the data doesn't arrive in a timely manner (that is, in a manner an interactive user would still consider responsive).
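To make that tradeoff concrete, here's a toy sketch in plain Python. This is not YaCy's code; the peer names, latencies and documents are all invented. A federated query has to pick a deadline, and whatever it picks, it loses on one axis:

```python
# Hypothetical peers: name -> (answer latency in seconds, partial results).
PEERS = {
    "peer-a": (0.2, {"doc1", "doc2"}),
    "peer-b": (0.9, {"doc3"}),
    "peer-c": (4.5, {"doc4", "doc5"}),  # a slow or distant peer
}

def federated_search(deadline: float) -> set:
    """Union of results from peers that answer within `deadline` seconds.

    Anything slower is simply dropped, so the result is either
    incomplete (short deadline) or late (long deadline).
    """
    hits = set()
    for latency, results in PEERS.values():
        if latency <= deadline:
            hits |= results
    return hits

print(federated_search(1.0))  # interactive deadline: doc4/doc5 never show up
print(federated_search(5.0))  # complete, but nobody waits 4.5 seconds for a search
```

Real systems stream in late results or cache aggressively, but that only softens the dilemma, it doesn't remove it.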
There's a hard problem that remains unsolved: maintaining a reliable index in a system of coming and going DHT peers.
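A toy illustration of the churn problem (again invented, not how YaCy actually places index fragments): with naive hash placement, a departing peer takes its fragments with it, and the remaining keys get reshuffled so lookups go to peers that never stored them.

```python
def fingerprint(key: str) -> int:
    # Deterministic toy hash so the example is easy to follow by hand.
    return sum(map(ord, key))

def owner(key: str, peers: list) -> str:
    # Naive placement: pick a live peer by hash modulo peer count.
    return sorted(peers)[fingerprint(key) % len(peers)]

peers = ["peer1", "peer2", "peer3"]
keys = [f"word{i}" for i in range(10)]
before = {k: owner(k, peers) for k in keys}

peers.remove("peer2")  # one peer goes offline without any handoff
after = {k: owner(k, peers) for k in keys}

# Fragments stored on the departed peer are unreachable...
lost = [k for k in keys if before[k] == "peer2"]
# ...and the reshuffle also reassigns most surviving keys, so lookups
# now land on peers that never held the data:
moved = [k for k in keys if before[k] != after[k]]
```

Consistent hashing and replication reduce the reshuffling and the loss, but under constant churn some fraction of the index is always stale, missing, or misplaced until it gets re-crawled or re-replicated.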
The common index is highly vulnerable to being flooded with spam, adverts (think e.g. CNAME trackers) and exploits of all kinds. It's hard to imagine how independent peers could even try to maintain a meaningful common index. That's part of the general problem of what the internet has become and is going to be.²
The implementation fails to KISS. Yacy tries to be everything:
- a blog
- a blog generator
- a browser
- a caching proxy
- a content aggregator (snippets)
- a content API
- a content management system
- a content validator
- a cron service
- a database
- a database client
- a database converter
- a database import and export tool
- a database server
- a desktop application
- a DHT peer
- a filter for inappropriate content
- a GIS
- a graphical configuration interface
- a graphical user interface
- an HTTP proxy
- a log aggregator
- a messenger
- a network service
- a presenter for statistical data
- a ranking algorithm
- a semantic language parser (it doesn't even know it, but tries anyway)
- a standalone intranet index
- a system daemon (sorta)
- a visualizer for topologies
- a webcrawler
- a website
- a website generator
- a wiki
- a wiki generator
- an access, permission and ACL manager
- an aggregator for statistical data
- an indexer
- an input parser
- a URL proxy
But it still lacks:
- a blockchain
- a media player
- a voice assistant
- an artificial intelligence
- and a seal of approval from Von-Leiter-Institut für verteiltes Echtzeit-Java
How yacy addresses a need and ultimately screws up can be seen in several reports all over the net, where the authors collectively come to a similar conclusion: "I tried. I wanted it to work. But it didn't work for me. Still, I keep an instance running, maybe it will help somehow."
Putting all the criticism aside, I have learned one new thing: a search engine shouldn't separate semantic parsing from ranked output. But that's a topic for another post.
Update. A reader comments:
As someone who just tried to use YaCy, I unfortunately agree with your review. IMO the even bigger issue preventing people from using it is spam. Even if setup and configuration were incredibly error-prone and difficult, maybe some people would still be able to set up servers and share them with other users, and YaCy use could spread. But if the top results on every search are full of spam, then the core feature of the program is broken and no one will want to use it.
I started indexing some websites on my local instance, and search seemed to be OK. Then I connected it to the rest of the internet and pulled in search results from the YaCy network, and instantly the exact same search terms started pulling up spam.
It's a disappointment, but I think spam is an inevitable conclusion when you ask random machines to show you links to websites.