Cweb

From ProgClub
Revision as of 09:12, 8 August 2011 by John

Cweb is a Blackbrick project hosted at ProgClub. It will be licensed under the GPL. "Cweb" stands for "Collaborative Web"; the software is essentially a distributed search engine implemented on a 64-bit LAMP platform.

The site will be implemented by a distributed set of providers. To become a provider, a user will need to register their system with ProgClub/Blackbrick. They will get a host entry in the cweb.blackbrick.com DNS zone, so for example my cweb provider site would be jj5.cweb.blackbrick.com. I will then need to set up my 64-bit LAMP server to host /blackbrick-cweb, and maybe set up appropriate NAT on my home router to forward to my LAMP box. I'm not sure yet how I'm going to manage HTTPS and certificates. HTTPS would be nice, but maybe we'll make that a v2 feature. Ideally Blackbrick would be able to issue certificates for hosts in the cweb.blackbrick.com zone, but I'm not sure what would be involved in becoming a CA like that.

There will be a front-end for cweb on all provider sites in /blackbrick-cweb/. From this front-end the user will be able to submit queries, and also to submit URLs they find useful, or not useful, for particular queries. Users will be able to go to their own cweb site or to other people's. Users will need to be logged in to their cweb site in order to submit useful/useless URLs. Votes for useful/useless URLs will be counted per cweb site, not per user.
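As a rough illustration of "per cweb site, not per user" voting, here is a minimal Python sketch. The class name `VoteTally`, the +1/-1 scoring, and the rule that a later vote from the same site replaces its earlier one are all assumptions made for the example; none of them is specified above.

```python
from collections import defaultdict


class VoteTally:
    """Hypothetical tally where each cweb site holds one vote per (query, URL)."""

    def __init__(self):
        # (query, url) -> {cweb_site: +1 for useful, -1 for useless}
        self._votes = defaultdict(dict)

    def vote(self, cweb_site, query, url, useful):
        # Keyed by site, so many users on one site still count once.
        # Assumption: a later vote from the same site overwrites the earlier one.
        self._votes[(query, url)][cweb_site] = 1 if useful else -1

    def score(self, query, url):
        # Net number of sites that consider the URL useful for the query.
        return sum(self._votes.get((query, url), {}).values())
```

With this shape, two users on jj5.cweb.blackbrick.com voting the same URL useful contribute a score of 1, not 2.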

Initially we won't be indexing the entire web. We'll start with HTML only, and have a list of domains that we support. As we grow we can enable the indexing of more domains. We'll start with domains like en.wikipedia.org and other useful sites like that. Also, initially we will only be supporting English. That's because I don't know anything about other languages. As far as I can, I will design the system so that other languages can be incorporated as the project matures.
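The "HTML only, supported domains only" rule could be checked along these lines. This is a sketch under assumptions: the function name `is_indexable`, the `SUPPORTED_DOMAINS` set, and the use of the HTTP Content-Type value to detect HTML are illustrative choices, not part of the design above.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the design only names en.wikipedia.org as an example.
SUPPORTED_DOMAINS = {"en.wikipedia.org"}


def is_indexable(url, content_type):
    """Return True if the URL is on a supported domain and serves HTML."""
    host = urlsplit(url).hostname  # lower-cased host, or None if absent
    if host is None:
        return False
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return host in SUPPORTED_DOMAINS and media_type == "text/html"
```

Enabling a new domain is then just a matter of adding it to the set.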

There will be a 'master' cweb site, available at master.cweb.blackbrick.com. I might speak to ProgSoc about getting them to provide a virtual machine on Morpheus for me to use as the cweb master. As the project matures there might be multiple IP addresses behind master.cweb.blackbrick.com. The cweb master is responsible for:

  • Nominating and distributing the blacklist
  • Nominating and distributing Cweb IDs
  • Nominating and distributing Domain IDs
  • Nominating and distributing URL IDs
  • Nominating and distributing Cweb Providers
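
How the master nominates IDs is not decided above. One simple possibility, shown purely as an assumption, is a registry that hands out sequential integers and always returns the same ID for the same name. `IdRegistry`, `nominate` and `snapshot` are hypothetical names.

```python
import itertools


class IdRegistry:
    """Hypothetical master-side registry for Cweb, Domain or URL IDs."""

    def __init__(self):
        self._ids = {}
        self._next = itertools.count(1)  # assumption: IDs are integers from 1

    def nominate(self, name):
        # Idempotent: asking again for a known name returns the existing ID.
        if name not in self._ids:
            self._ids[name] = next(self._next)
        return self._ids[name]

    def snapshot(self):
        # Copy of the full mapping, e.g. for distribution to provider sites.
        return dict(self._ids)
```

One registry instance per ID type (Cweb, Domain, URL) would keep the number spaces separate.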

Cweb will need to be able to function in an untrusted environment, full of liars and spammers, so the design needs to protect data integrity. Essentially, every cweb site will record the Cweb ID of the site that provided it with each piece of data, and if that Cweb ID ever makes it onto the blacklist then all data from that site will be deleted.
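The provenance-and-purge idea can be sketched as follows. `CwebStore` and its methods are hypothetical names, and refusing new data from an already blacklisted site is an extra assumption that the paragraph above does not state.

```python
from dataclasses import dataclass, field


@dataclass
class CwebStore:
    # record key (e.g. a URL) -> Cweb ID of the site that supplied it
    records: dict = field(default_factory=dict)
    blacklist: set = field(default_factory=set)

    def accept(self, key, source_cweb_id):
        """Store a record together with its provenance."""
        # Assumption: data from an already blacklisted site is refused.
        if source_cweb_id in self.blacklist:
            return False
        self.records[key] = source_cweb_id
        return True

    def apply_blacklist(self, blacklisted_ids):
        """Merge a blacklist update and delete all data from listed sites."""
        self.blacklist |= set(blacklisted_ids)
        before = len(self.records)
        self.records = {
            key: src for key, src in self.records.items()
            if src not in self.blacklist
        }
        return before - len(self.records)  # number of records purged
```

Because every record carries the Cweb ID it came from, a blacklist update from the master can be applied with a single pass over local data.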

Cweb will be designed to facilitate anonymous queries. This will work by having cweb sites forward queries at random. When a cweb site receives a query, it will pick a random number between 1 and 10. If the number is less than 5 (i.e. 1, 2, 3 or 4, a 40% chance) then the site will handle the query itself. Otherwise (i.e. 5, 6, 7, 8, 9 or 10, a 60% chance) it will forward the query to another cweb site for handling. This means that when a cweb site receives a query, it is more likely than not that the query has been forwarded rather than sent by its originator. This is intended to prevent anyone on the network from tracking the originator of a query.
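The forwarding rule can be simulated with a short Python sketch. The function names, the peer list, and choosing the next site uniformly at random are assumptions made for illustration; the text above only fixes the 1-to-10 draw and the "less than 5" test.

```python
import random

HANDLE_BELOW = 5  # draws of 1-4 mean "handle here"; 5-10 mean "forward"


def should_handle(rng):
    """Pick a number from 1 to 10 and handle the query if it is less than 5."""
    return rng.randint(1, 10) < HANDLE_BELOW


def simulate_route(origin, peers, rng):
    """Return (handling_site, forwards) for one simulated query."""
    site, forwards = origin, 0
    # Every site that receives the query makes its own independent draw.
    while not should_handle(rng):
        site = rng.choice(peers)  # assumption: next hop chosen uniformly
        forwards += 1
    return site, forwards


def average_forwards(trials, rng):
    """Estimate the mean number of forwards per query by simulation."""
    peers = ["a.cweb.blackbrick.com", "b.cweb.blackbrick.com"]  # made-up hosts
    total = sum(simulate_route("origin", peers, rng)[1] for _ in range(trials))
    return total / trials
```

Since each site handles a query with probability 0.4, the number of forwards follows a geometric distribution with mean 0.6 / 0.4 = 1.5, so a query makes one or two extra hops on average before it is answered.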