Pav, I've been wondering about doing some changes to our distribution mechanism...
the ports one hm?
using some sort of tornado FEC or merkle trees (don't know both yet but they look good from what I've read) to make it possible to do multi-part downloads for ports a la bittorrent, but tailored specifically to our needs, so as to improve performance and reduce strain on MASTER_SITE_{LOCAL,FREEBSD} and possibly other sites as well if we do it correctly
I've been meaning to do something like this for the past 2 years but never got around to it
http://citeseer.ist.psu.edu/39448.html
I think the most needed thing is a master sites randomization
merkle trees idea came from a #bittorrent guy
on by default
say, first site on MASTER_SITE_GNOME got hammered really bad by us lately
well, I want multi-part download with recovery
RANDOMIZATION? you really think that's prio #1? let me cook it
it's there, just turned off by default
ah, okay
because it's not really clever. something geographically aware would be nice
I've seen the flag but does it work well with :labels?
yes
geographically, we would need to include support for geoip or something like that
yes
and I don't know 1) licensing issues, 2) how well it scales
well, we would sort by region of the downloader 1st then randomize the rest of the sites
that would be a simple strategy which would buy us some speed
though, it would still hurt the bigger distribution sites such as US since they're the most active downloaders and most loaded sites
question is, how to obtain the region of the downloader?
geoip
you run a query against the possible sites and against the downloader IP
though, it won't work against NAT
what if the user is behind NAT
unless we have an offsite cgi (or something) only for that
which would become a single point of failure
not to mention excess load
can't do, people would start talking about 'sending personal data'
everything must be done and decided on the user's box
well, just disable it
no can do
then you have to rely on an outside source to find out your ip
if I want to have it enabled by default, it must be as politically correct as possible
with http, the server does not tell you the ip
with ftp, you can try to parse the info but it's dodgy
and 'sending personal data' will happen all the time
ppl already send their ips when they connect to ftp sites
it's just a matter of politically handling it correctly
also, it will be a cry wolf for a few months, then it goes silent
but it's prone to super bikeshedding
that's probably the reason why eik gave up
a view to a bikeshed
well, the best way is to write a private implementation, run it on pointyhat on a -exp run
THEN, after it is all tested, tell -developers to gauge what they think
not to get other implementations
IF we get enough support, we let it out on -ports
which will be the worst part, expect a few months of discussion
only workable plan is to sneak it into the tree before anyone can object
portupgrade could do that
the bikeshed would be unbearable
Pav, not really, we will get shot very hard if we do it covertly
we have been for a lot less in the past
but it's the best approach against bikeshed
anyway, a dedicated sf mirror for freebsd ports would be nice :)
i wonder if anyone could arrange that
anyway, I still want a reliable download with downloader-side recovery
when we get a bad distfile, the whole ports system breaks IF the user doesn't distclean
you're going to add the functionality to fetch(1)?
to say the least
fetch?
hell no, des would kill me
write a lib to write a client 1st, THEN see what happens
though, we could try tapping libbt, which I began porting
it must be in base, you know. otherwise what
but that's for bittorrent compatibility
yeah, I want it in base
now that will be a major bikeshed
first UNIX with a bittorrent client *in base*
oh hell :)
bittorrent would be nice but it requires the sites to run a specific server side program
we need something that's client only
but that would be only multi-part with user chunk recovery
a bittorrent-like cooperative p2p application would be nicer
but then again, we need a server side application, even if we have it on an fbsd.org specific box instead of on all the other servers
which I think is the best approach
we share the burden of download with all servers and the users
best case scenario, we will decrease load on servers
using p2p technology would meet strong denial from security people
worst case scenario, user can't connect to our server to get info and he will go with normal ftp/http, so it's not slower than it is now. Or, he will not find anyone else to share
Pav, security? we will send the chunk checking bits INSIDE the ports tree distinfo
that's a considerable delay if you have to wait for timeout
that's the best security we will ever get, md5 checked and chunk verified
Pav, not really, we gather all info when beginning the fetching process
bt is pretty damn fast
i wonder how well bittorrent trackers scale...
even when the tracker times out
there is a problem with the trackers when there is only one seed and zillions of downloaders
and, how will you handle seeding after download? will the client continue in the background for some amount of time or ...?
if everyone hit-and-runs .. it will not work well
seeding after download? that's why I want a server on our side
and who will be injecting new files into the tracker?
not quite bt, hang on, gimme a sec, I have to get the phone
you can't expect every committer to put the files on the tracker by hand
let me explain my idea, it is not quite bt, it was a bad example
anytime anything is added to the ports tree, some chunk hash/fec is generated so that the file can be chunk verified and, therefore, safely multi-part downloaded
do you mean replace fetch with bittorrent or something similar to bittorrent?
any time we try to download, we check a remote offsite server (fbsd.org probably) to see if anyone else can cooperate to download the same dist
ahze, hang on
IF we can't cooperate, we do a standard multi-part download from all available mirror sites (or part of them, to relieve load) with randomization or geoip (which is better)
with the chunk verification, it becomes easy to recover and do it as fast as possible
why the offsite server?
because it helps with 0day releases such as a KDE/GNOME/XORG/OPENOFFICE release
this starts sounding like warez :)
which has zillions of downloaders at the same time
what's the offsite server, again?
think something like a tracker but simpler, cause we do not upload the torrents
it will feed on (safe) info generated from the ports tree
what should this server do? just track if there is someone downloading something you also want to download
what will happen when someone else is downloading?
so you can cooperate and share bandwidth like bt in this case
what about something like the way prozilla worked
something to use all the master sites or X number of MASTER_SITES at once?
ahze, multi-part download?
that would be the standard way when there isn't a 0day release
since we know ppl will just download and run
I just want to add a top layer where we can tap user resources to increase delivery speed and decrease server load when we have huge releases
I know for sure that ppl stay HOURS downloading KDE when it releases
it's all about finding a good mirror
not just that, I want to share the burden
ok fine, I support the idea, I'm just unsure about the implementation
if I got it right, you want to write a stripped-down torrent system
me too, it's one of those things which will be designed as we go
yeah, i think the idea is great. but convincing fbsd.org developers (committers) about adding an onsite "tracker" will be dreadful
hehe
especially SECoff as you mentioned
it's not developers, it's admins
where do we get a cvs for this?
there's a projects module in cvs, I think you could use it
admins?
we can as well add something under .firepipe if need be
cvsweb is there for proof of concept
then we need to move it to an official site
I have always wanted to learn automated partial recovery algorithms hehe
someone want to share the development burden :)
I know everyone's time is scarce so sharing would be nice hehe
you could make it non-FreeBSD specific
I bet a lot of other, say, Linux users download files :)
could be, with a C++ or ANSI C implementation
make it pure C, C++ scares people
but it would need to be integrated into each system
so we would make hooks for ports, let them worry about dpkg or whatever else
drop-in replacement for fetch, wget, curl
write some simple wrappers
though, I know how ppl bikeshed between distributions so I'm sure most wouldn't use ours heheh
Pav, yeah, replacing FETCH_CMD possibly
gentoo people could use it, they use stock wget
having a common pool of clients downloading the same tarballs on gentoo and freebsd machines would be good
I just want to write the check layer and cooperative layer, we could tap libfetch for the ftp/http handling
Pav, hell yeah if we could get them to use the cooperative server
lioux: i don't see a problem, we have few ports having their distfile mirrors as download sites :)
we need a proof of concept before submitting
and add it as a port
well, write one :)
I
heh
I !
I'll watch your progress from the position of an uninvolved stranger
it's just I'll probably begin a new job next week (i'm currently in-between jobs, that's why I'm around so much) so time will become a rare commodity again
let write a proposal paper
s/let/let me/
C++... humm, it would be nice to write something in C++ after this many years
I know it scares the hell out of ppl though
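A rough sketch of the "sort by the downloader's region first, then randomize the rest" idea discussed above. It assumes the legacy libGeoIP C API (the net/GeoIP port) and takes the downloader's own country code as an argument, sidestepping the NAT problem raised earlier; the site list, threshold-free partitioning and names are illustrative only, not a proposed interface.

/*
 * Sketch: move master sites whose country matches the downloader to the
 * front, then shuffle the remainder (Fisher-Yates) to spread the load.
 * Assumes the legacy libGeoIP C API; compile with -lGeoIP.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

#include <GeoIP.h>

static void
shuffle(const char **sites, int lo, int hi)
{
	/* Fisher-Yates shuffle of sites[lo..hi-1]. */
	for (int i = hi - 1; i > lo; i--) {
		int j = lo + arc4random_uniform(i - lo + 1);
		const char *tmp = sites[i];
		sites[i] = sites[j];
		sites[j] = tmp;
	}
}

int
main(int argc, char **argv)
{
	/* Illustrative list; in practice this would come from MASTER_SITES. */
	const char *sites[] = {
		"ftp.freebsd.org",
		"ftp.br.freebsd.org",
		"ftp2.de.freebsd.org",
		"distfiles.gentoo.org",
	};
	int nsites = sizeof(sites) / sizeof(sites[0]);
	int front = 0;
	const char *mycc = argc > 1 ? argv[1] : "BR";	/* downloader's country */

	GeoIP *gi = GeoIP_new(GEOIP_STANDARD);
	if (gi != NULL) {
		/* Stable partition: matching-country sites move to the front. */
		for (int i = 0; i < nsites; i++) {
			const char *cc = GeoIP_country_code_by_name(gi, sites[i]);
			if (cc != NULL && strcasecmp(cc, mycc) == 0) {
				const char *tmp = sites[i];
				memmove(&sites[front + 1], &sites[front],
				    (i - front) * sizeof(sites[0]));
				sites[front++] = tmp;
			}
		}
		GeoIP_delete(gi);
	}
	shuffle(sites, front, nsites);	/* randomize only the rest */

	for (int i = 0; i < nsites; i++)
		printf("%s\n", sites[i]);
	return (0);
}

How the downloader's own country code gets determined is exactly the part the conversation above leaves unsolved; the sketch deliberately takes it as input.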
--------------------
is getright really a worthwhile program to use?
i think most MASTER_SITES can easily saturate users' downstreams
how much better is it if you use more than one connection? 10%?
I'm not aiming for speed, but off loading
I'll mention that l8r
multi-part does not guarantee any speed increase for the downloader since speed varies across multiple download sites
randomizing sites can hurt, if the user happens to pick a slow master site
yup
good point, I think that was one of the reasons we discussed for not making it default
it may be possible to adjust for that but there's too many variables that can affect download rate and server load :-/
also, the committer already sorts MASTER_SITES for availability and speed
yeah, the geographical thing is a very dodgy heuristic, I'm just talking about it cause it's part of the subject
yes, but note that their sort is only truly valid at the time of the sort and from their location
the more i think about it, the less i like any of the master site sorting because of the huge variety of issues people may encounter with them
yup, also that it dismisses DNS round-robin
hell, i hadn't even thought of that
it can be close to an np-complete problem with all the variables. I'll focus only on off loading servers :)
which should lessen the # of heuristics
I'll write the proposal then get some comments from a few ppl, you, pav, ahze, knu and portmgr
why do you think it is not np-complete...
cause I haven't tried to prove it :)
heh
the more I think about it, the worst part will be the multi-part download, I don't think I've ever seen a good implementation for that in unix
wca
hence my taking notes hehe, for accurate quoting
i don't think you can do a good implementation of it
why not?
because you can always end up with a slow server for some part of the file
that's not a problem with what I am proposing: "off loading"
and I'll let users disable it
which will not mean much cause ppl do not tweak their port trees, just us power users
sure it is, if you randomize there's always that possibility
I don't mind hitting a slow link site :) I just want to ease the load on the hardest hit ones such as sourceforge and kde
and since I'll let ppl disable it, it's fair game
ok
maybe make it per-port enabled?
humm, maybe, I hadn't thought about it, what do you propose?
i dunno
almost every port tries to list as many master sites as possible to reduce the possibility of inaccessibility of a distfile
but many of those may be "slow", and sometimes only for a relatively small number of users
do you want to try a low watermark? if getting less than so many bytes, try switching servers?
record and keep the fastest ones? heh
even if you record, you still get inaccuracies, i.e. server overload and slow sites, due to changing circumstances
there are some heuristic problems with multi-part: how many chunks? how many sites to part across? do we change servers given some criteria? if we change, can we go back? how do we guarantee we are being fair to the servers and not spamming them?
surely
but then again, speed is not critical, off loading is :)
well, I get your point
and we really need ppl to pay attention to fenner's MASTER_SITES checking scripts if we do something like this
we can't be spending time trying to part with non-working sites
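The "low watermark" heuristic mentioned above could be as small as this: a minimal sketch that counts consecutive slow sampling intervals and suggests switching mirrors only after a grace period, so a briefly overloaded server is not punished and the whole site list does not get hammered. The thresholds and names are made up for illustration.

/*
 * Sketch: decide whether to switch mirrors when throughput stays below
 * a low watermark for too long.  Thresholds are illustrative only.
 */
#include <stdbool.h>
#include <stddef.h>

#define LOW_WATERMARK_BPS	(8 * 1024)	/* 8 KB/s, made-up threshold */
#define GRACE_INTERVALS		3		/* tolerate short dips */

struct xfer_state {
	size_t	bytes_this_interval;	/* reset once per sampling interval */
	int	slow_intervals;		/* consecutive intervals below watermark */
};

/* Call once per sampling interval (say, every second). */
static bool
should_switch_site(struct xfer_state *st, unsigned interval_seconds)
{
	size_t bps = st->bytes_this_interval /
	    (interval_seconds ? interval_seconds : 1);

	if (bps < LOW_WATERMARK_BPS)
		st->slow_intervals++;
	else
		st->slow_intervals = 0;

	st->bytes_this_interval = 0;
	return (st->slow_intervals >= GRACE_INTERVALS);
}

int
main(void)
{
	struct xfer_state st = { 0, 0 };

	/* Pretend we measured 2 KB in each of four 1-second intervals. */
	for (int i = 0; i < 4; i++) {
		st.bytes_this_interval = 2 * 1024;
		if (should_switch_site(&st, 1))
			return (1);	/* a real client would switch mirrors here */
	}
	return (0);
}

As the conversation notes, any recorded measurement goes stale; this only reacts to the current transfer, it does not try to remember which mirrors were fast last week.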
---------------------
i think this is a wonderful idea, I'm totally up for anything to make downloads faster. this is kind of on the same lines, check out ftp/prozilla, it can handle many download sites at once. It's pretty outdated and I think it's FORBIDDEN but the source might hold some good ideas
how is the server-end going to work? is each of the MASTER_SITES going to have to run something new?
nope, we don't touch anyone but the server we need for cooperation and the client we will write
the servers use standard ftp/http
cool
we need that for overall use
I didn't see this, but I feel we need the ability for users, if they wish not to upload, to have that choice.
I've just emailed kuryiama, he seems to be involved with pdtp
yeah, but optional, and we will not let them cooperate-tap too much cause it's a bit unfair
like bittorrent, everyone gets a share of the pie, the more you contribute, the bigger the slice
yeah =) this sounds great. I'm very interested
the paper means nothing, code talks. before we have code, it's just an idea
so how do you plan to handle the hash? when you do 'make fetch' is it going to have to download a torrent file (or similar) before it can download the actual file?
the torrent should already be in the ports tree for safety/security
---------------------
I'm just trying for a hybrid distribution approach to ease load on distribution sites such as sourceforge
and if your stuff is good, I'll use that :)
it's just a proposal draft, nothing technical, just some ideas
When you say ports, you mean like the ports tree?
yup
sorry, I should have been clearer
If so, I've seen proposals/ideas about integrating BitTorrent into Debian's apt too.
interesting, let me google for that
merkle trees are actually from Bram who got the idea from someone else. It's slated for the BitTorrent protocol 2.
yeah, someone in the channel said that
is bittorrent markovian distributed?
bear in mind, I'm not knowledgeable on these algos/techniques. I'm trying to educate myself quickly (only 2 days). I need book suggestions to go faster :)
* lioux has saved a book budget for this month
http://sianka.free.fr/
Hmmm I'm not sure about how it's really distributed. I saw a paper from uiuc at one point that had a flow analysis of BitTorrent. But from other sources, they said that it had rather major flaws in their assumptions.
humm, I gather that when there is only one seed and a huge swarm, the protocol almost crawls
pdtp is thinking on how to do a markovian distribution
* lioux really needs books on these theories
Hmmm I asked about pdtp today in #p2p-hackers and they had the same impression I did: rather dull and not that p2p. It doesn't seem like it will scale well at all (but I couldn't confirm it because their docs online are preliminary and only discuss the protocol and not the behavior as much).
The BitTorrent swarm crawls with only one seed?
that's what I've been reading but I didn't do any research/testing
Hmm I don't think that's the case. The seeder is only needed to get the initial data out there. As soon as other peers have the data, they can exchange with each other. You don't need the seeder when there is a distributed copy (i.e. you can get all the pieces from your peers).
with a swarm storm, which is not usually the case (hundreds of leechers with little data)
There is an initial ramp-up period with seeders because when you first start a torrent, no one has pieces so there isn't inter-peer trading, only downloading from the seeder.
but it should adjust as data is distributed
though, some sort of balancing/distribution algo should be used.
bram does not suggest a specific one in his paper, just some heuristics
well, let's see how v2 of the protocol goes
apt-get bittorrent is not what I want, I want to make it possible to cooperate WHILE multi-part downloading as well
and not forcing users to have a daemon on their system, that's hard to push
I mean, politically speaking
we always have to worry about politics in huge projs :(
hmm I think BT2 will be dull if he doesn't add more features.
also, the torrent should already be in the ports tree for safety/security
What do you mean by cooperate while multi-part downloading?
tyranix, everyone multi-part downloads and uses some sort of checking mechanism for verifying the chunks (fec/hash)
they also connect to a server (tracker/whatever) saying what they're doing
they cooperate with others connected to the server WHILE they multi-part download
thus increasing the likelihood that more chunks are available, since there are no seeders, just leechers, i.e., ppl disconnect as soon as they finish downloading
Okay so the multi-part download is like the current BT: You have a list of checksums for pieces and you can download from peers because you can check that hash. What sort of other cooperation do you imply? I guess I'm missing that part.
k, we multi-part from existing ftp/http sites like getright (windows) does
very standard, no cooperation
we also connect to a "tracker" to let it know that we're downloading
is anyone there doing the same? we p2p with them and also use them as an additional multi-part source
very simple, nothing fancy
though, I'm not sure there is good multi-part ftp/http opensource code out there
the tracker is modified to hold information on ALL freebsd ports
and this same info is contained within the ports tree
Not in those terms. I've heard of websites that are supposed to be "BT enabled" by allowing you to swarm the content with other people while they are there. I haven't seen a proposal like that because I think people are trying to concentrate on eliminating bandwidth costs. So instead of doing the multi-part, you could force users to use BitTorrent and save on your bw bill.
so it's all registered and sorta safe
well, it's for project wide distribution. It's not likely I'll be able to force sourceforge, berlios and our mirrors all around the world to run an additional program to fill in as seeders
the idea is just to add an additional cooperative layer that requires no server side component other than a server of our own
okay, I see what you mean. :)
Hmm I've thought a little about that, and also the other way. This is a simple extension, but there could be times when you get say 80% of the BitTorrent file downloaded and then all the peers dry up or the tracker goes down etc. Now you're left with this incomplete file that has random segments so you can't use wget/curl to continue it. You could have a way to ask for parts of the file using standard HTTP/FTP commands to request regions, using an algorithm to minimize overlapping but at the same time not requesting each individual piece because that's probably too many requests.
yeah but that's the idea of writing our own fetching application
ok
but you get the idea
yeah
what do you think? what else should I add?
it's not p2p per se but it uses it
Hmmm this is actually close to my interest area :( I have a couple of motivations and one is seeing something like the internet archive be more BT friendly. There you have terabytes of data and few requests for the same data so swarming doesn't help.
The second is to alleviate Debian's servers and use peers, something like what you mentioned above. So are you coding this from the ground up or are you using parts of BT's code/PDTP's?
no coding yet but I hope to use already established libraries
no reason to reinvent the wheel just for the sake of it
also, it has to be in C or C++ (I prefer the latter but it will encounter some obstacles) so that it can be distributed in the base system
we do not ship interpreters other than csh/sh in the base distribution :)
so that rules BitTorrent out (python) but not pdtp (C)
libbt is a c lib
I'm probably going with bittorrent, more interoperability
users could use a bittorrent client to get a file if they wish, as long as we maintain compatibility
"LibBT is currently targeted at the Linux Operating System."
http://libtorrent.sourceforge.net/
http://libtorrent.rakshasa.no/
http://libbt.sourceforge.net/
hmmm I hope there aren't non-portable hacks in it.
hah, it's already ported to us
ahh good
http://www.freebsd.org/cgi/cvsweb.cgi/ports/net/libbt/
flz did it :)
now, I just need to add the 2 different projects named libtorrent and see how they go
and I need a good multi-part http/ftp download implementation
the http/ftp protocol is easy to handle with the FreeBSD-specific libfetch
and it should be fairly easy for others to use since it's standard C
I want portability as much as possible so that others may use this
---------------------
Proposal for a ports distributed FreeBSD fetching mechanism

1) Current state

The FreeBSD ports system currently fetches distfiles from one site at a time, serially trying to find a site that works. It verifies the accuracy of downloaded files by checking size prior to downloading (when that information is available) and against a previously known md5 hash after the download is complete.

A FETCH_REGET feature was introduced to enable our standard download tool fetch(1) to retry downloading distfiles in the event of download failures. This FETCH_REGET feature is ONLY available during the initial fetch stage; it does not work afterwards.

A RANDOMIZE_MASTER_SITES feature was introduced as a server off-loading technique to decrease the odds that some sites would be more actively used than others, by picking master sites at random.

A MASTER_SORT_REGEX feature was introduced to enable sorting master sites by any criteria. This makes it possible to sort sites by country, thus allowing users to pick sites that are geographically closer first.

1.1) How fetching works (finite non-deterministic automaton)

a) Check if a distfile with the same name exists
   - continue if it does not
   - go to (e) if it does
b) Pick a download site
   - go to (g) if the list is empty
   - continue if we have one left in the list; remove it from the list
c) Check size prior to downloading, if the server supports this feature
   - continue if the size matches or the server does not support the feature
   - go to (b) if it does not
d) Try downloading the file. The fetch finishes either by success or by failure
e) If a distfile with the same name exists, check its hash against a known md5 hash
   - if the file does not exist, go to (b)
   - continue if the hash matches
   - if the hash does not match:
     - if FETCH_REGET is enabled, add the current download site back at the head of the download site list; go to (b); repeat as many times as FETCH_REGET requires
     - if FETCH_REGET is not enabled, go to (g)
f) Download is successful, go to (i)
g) Download is not successful, go to (i)
h) Check the hash against a known md5 hash
   - go to (f) if it matches
   - go to (g) if it does not match
i) End
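To make the automaton above easier to follow, here is a deliberately simplified model of it as a plain C loop. It is not the bsd.port.mk logic; the helpers are stubs standing in for the real checks, and the letters in the comments refer to the states above.

/*
 * Simplified model of the fetch automaton in 1.1; not the real
 * bsd.port.mk behaviour.  The helpers are stubs.
 */
#include <stdbool.h>
#include <stdio.h>

static bool distfile_exists(void)       { return false; }
static bool md5_matches(void)           { return true; }
static bool remote_size_matches(void)   { return true; }
static bool try_download(const char *s) { printf("fetching from %s\n", s); return true; }

int
main(void)
{
	const char *sites[] = { "site1", "site2", "site3" };
	int nsites = 3;
	int fetch_reget = 1;	/* retries allowed on hash mismatch */

	/* (a)/(e): an existing distfile that already verifies is a success. */
	if (distfile_exists() && md5_matches())
		return (0);			/* (f) */

	/* (b)-(h): walk the site list until a download verifies. */
	for (int i = 0; i < nsites; i++) {
		if (!remote_size_matches())	/* (c) */
			continue;		/* back to (b) */
		if (!try_download(sites[i]))	/* (d) */
			continue;
		if (md5_matches())		/* (h) */
			return (0);		/* (f) success */
		if (fetch_reget-- > 0)		/* (e): retry this site */
			i--;
	}
	return (1);				/* (g) failure, then (i) end */
}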
2) Shortcomings

2.1) Corrupt download

If the file size matches prior to download BUT the download is interrupted, sometimes a garbled file is left behind. Although fetch(1) normally removes downloads that did not complete without errors, the FETCH_REGET feature changes that scenario. Now we can be left with a garbled file that does not match the known md5 hash, which means a port will correctly refuse to build. Moreover, even if the download is not interrupted, it is possible to download a file with the correct size but with a hash that does not match the known md5 hash. Furthermore, the ports system will refuse to retry fetching the garbled distfile after the fetch target has finished. The user is required to remove the broken distfile manually or remove all downloaded distfiles (both good and bad) with the distclean make target.

2.2) Corruption recovery

It is, therefore, not possible to automatically recover a corrupt download. If the size is wrong and smaller than expected, we could try downloading the rest of it, but that does not guarantee that the previous parts are not corrupt. The situation is similar if we have a file that is larger than expected or even exactly the expected size. In all the aforementioned cases, the known md5 hash will not match. We cannot gather any information other than that the file is wrong. We cannot find the corrupt parts of the file to either correct or replace them with good parts using ftp/http range downloading.

2.3) Resuming download

Since we cannot guarantee that previous parts of a download are not corrupt, download resuming is not guaranteed. However, it is possible with an appropriate download client (e.g., wget(1)) to try resuming downloads during the fetch phase. Nevertheless, it stands that previously corrupt or incomplete downloads cannot be resumed.

2.4) Load balancing

2.4.1) Multi-part download

It is currently not possible to load balance downloads. There is no support for multi-part download from multiple sites at the same time (as done by, e.g., the getright windows application) using ftp/http range downloading. However, it can be argued that although multi-part download reduces strain on a single server by reducing the time a client stays connected to it, it spreads the strain over multiple servers at the same time by using connection slots on all of them. Moreover, the situation is much worse if multi-part download is applied against a single site. This seems weird, but users can get increased speed by doing so at the expense of using up multiple connection slots on download sites. This technique should be avoided at all costs.
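Sections 2.2 and 2.4.1 both hinge on ftp/http range downloading. As a point of reference, FreeBSD's libfetch can already express a range request through the offset and length fields of struct url; the sketch below re-fetches a single bad chunk in place. The URL, local path and chunk geometry are hypothetical, and this assumes chunk-level verification (section 3) has already told us which chunk is bad.

/*
 * Sketch: re-fetch only a corrupt chunk of a distfile using an HTTP/FTP
 * range request via libfetch (link with -lfetch).  Names, URL and chunk
 * size are illustrative only.
 */
#include <sys/param.h>

#include <stdio.h>
#include <stdlib.h>

#include <fetch.h>

/* Re-download bytes [offset, offset+length) and write them into the
 * already existing local file at the same offset. */
static int
refetch_chunk(const char *url, const char *local, off_t offset, size_t length)
{
	struct url *u;
	FILE *in, *out;
	char buf[8192];
	size_t n;

	if ((u = fetchParseURL(url)) == NULL)
		return (-1);
	u->offset = offset;		/* libfetch issues a range request */
	u->length = length;

	if ((in = fetchGet(u, "")) == NULL) {
		fetchFreeURL(u);
		return (-1);
	}
	if ((out = fopen(local, "r+")) == NULL) {
		fclose(in);
		fetchFreeURL(u);
		return (-1);
	}
	if (fseeko(out, offset, SEEK_SET) != 0) {
		fclose(out);
		fclose(in);
		fetchFreeURL(u);
		return (-1);
	}
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
		fwrite(buf, 1, n, out);

	fclose(out);
	fclose(in);
	fetchFreeURL(u);
	return (0);
}

int
main(void)
{
	/* Hypothetical example: chunk 3 of a 256 KB-chunked distfile is bad. */
	const size_t chunk = 256 * 1024;

	return (refetch_chunk(
	    "http://ftp.FreeBSD.org/pub/FreeBSD/ports/distfiles/foo-1.0.tar.gz",
	    "/usr/ports/distfiles/foo-1.0.tar.gz",
	    (off_t)3 * chunk, chunk) == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
}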
2.4.2) Randomizing

Randomizing is not enabled by default and it does not guarantee off-loading; it just decreases the odds that the same site will always be used by all users for a given distfile. This is a start. Furthermore, the file still has to be completely downloaded from this random site, so a single site still carries the burden of handling a complete download. Moreover, the list is randomized only once per fetch attempt; i.e., after the list has been randomized, it is shared by ALL the distfiles of a given port. That means that even though the same sites will not be used by all users, the same (randomized) site list order will be used for all the distfiles of a given user.

2.4.3) Geographical

MASTER_SORT_REGEX geographical sorting only works IF the domain name matches the desired geographical location. Furthermore, there are no longer any examples in the FreeBSD documentation on how to use MASTER_SORT_REGEX for geographical sorting. Moreover, it is disabled by default. Geographical load balancing should happen based on the source IP address rather than relying on misleading domain naming. However, there is no guarantee that using geographically closer sites will produce better results; neither net proximity nor speed has a direct relation to geographical location. Nonetheless, this can decrease strain on international transmission lines. Besides, it may produce good results if the destination site and the downloader are connected to the same carrier or are close by.

3) Proposal (rough draft, needs to be polished)

The main concept is off-loading: reducing the toll that FreeBSD ports take on the distribution servers.

1) Use multi-part download: break the files into chunks and grab them from different MASTER_SITES, not to increase speed but to allow each site to carry as small a load as possible.

2) Use chunk hashing (merkle hash tree/tornado erasure forward-error correcting code) so that we can detect partial corruption and, thus, partially recover files. This is essential for correct multi-part download. This would go inside distinfo. (See the sketch after this list.)

3) Use a collaborative download system: any peer downloading should share their downloads with other peers. How does this work?

3.1) People connect to a server somewhere and declare their intent to download a file (e.g., a modified Squall/modified BitTorrent tracker/other implementation). If the server does not respond, proceed to (3.3)
3.2) If there is someone connected with parts available for the desired file, begin cooperative download
3.3) Start download using multi-part download from MASTER_SITES
3.4) Repeat (3.1, 3.2, 3.3) to gather more peers and help spread the download burden until the download ends

a) What does this mean? Worst case scenario, no one cooperates in downloads and multi-part download happens as usual. This will probably be the common case, given that it is unlikely that people will download the same files at the same time. It is possible, though, with huge files, given the global (geographical distribution, time zones) scope of the FreeBSD project.

b) Best case scenario: 0 day releases. Whenever there is a huge project release, e.g., KDE, GNOME, OpenOffice, xorg, people stay hours, maybe days, downloading distfiles. In these specific cases, cooperative download will SURELY and EFFECTIVELY off-load the distribution servers, not to mention increase download speeds in some cases.

4) Geographical distribution: should be optional, using some sort of GeoIP system. People could try downloading from geographically closer sites (their country) first. I am a bit skeptical that this is a good heuristic. However, it is often requested, so we should have optional support for it.

5) Better load balancing: given that we will contact a remote server for cooperative download, we COULD use that information to gauge the load on MASTER_SITES and tell peers to try different servers. I am not sure about this heuristic either.

6) Randomization: we should allow for optional randomization of MASTER_SITES. I am not sure about this heuristic either.
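A minimal sketch of the per-chunk verification in item 2, using FreeBSD's libmd MD5 routines (link with -lmd). The chunk size, the distinfo-style line shown in the comment, and the function names are all hypothetical; whether the real format is a plain chunk list, a merkle tree or FEC data is exactly what the proposal leaves open.

/*
 * Sketch: verify a distfile chunk by chunk against known MD5 digests and
 * report which chunks need re-fetching.  A hypothetical distinfo-style
 * entry might look like:
 *
 *   CHUNKMD5 (foo-1.0.tar.gz) 0 = 9e107d9d372bb6826bd81d3542a419d6
 *
 * Uses FreeBSD's libmd (-lmd); chunk size and names are illustrative.
 */
#include <sys/types.h>

#include <md5.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE	(256 * 1024)	/* made-up chunk size */

/*
 * known[] holds one 32-character hex MD5 digest per chunk, in order.
 * Returns the number of bad chunks (or -1 on error) and prints their
 * indices and offsets so a range request can re-fetch just those parts.
 */
static int
verify_chunks(const char *path, const char *known[], int nchunks)
{
	FILE *fp;
	unsigned char *buf;
	char digest[33];
	MD5_CTX ctx;
	size_t n;
	int i, bad = 0;

	if ((buf = malloc(CHUNK_SIZE)) == NULL)
		return (-1);
	if ((fp = fopen(path, "rb")) == NULL) {
		free(buf);
		return (-1);
	}
	for (i = 0; i < nchunks; i++) {
		n = fread(buf, 1, CHUNK_SIZE, fp);
		MD5Init(&ctx);
		MD5Update(&ctx, buf, n);
		MD5End(&ctx, digest);
		if (n == 0 || strcmp(digest, known[i]) != 0) {
			printf("chunk %d (offset %jd) is bad, re-fetch it\n",
			    i, (intmax_t)i * CHUNK_SIZE);
			bad++;
		}
	}
	fclose(fp);
	free(buf);
	return (bad);
}

int
main(int argc, char **argv)
{
	/* Placeholder digests; real ones would come from distinfo. */
	const char *known[] = {
		"00000000000000000000000000000000",
		"00000000000000000000000000000000",
	};

	if (argc < 2)
		return (1);
	return (verify_chunks(argv[1], known, 2) == 0 ? 0 : 1);
}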
I am not finished with the proposal yet, but you get the idea. There are other items that could be added to the proposal. I am open to suggestions. Please do not share this proposal with others just yet, unless you think they will contribute good ideas.

What does this all mean? Distribution sites would not need to be changed, so we have a shot at getting this used. Worst case, we are as slow as we are today, maybe a bit slower, but we off-load the distribution sites. Best case, 0day will be awesome and we WILL off-load the distribution sites. :) BEST CASE, we get other projects such as NetBSD/gentoo/debian/whatever_distro to use such an idea and get more chances of it working.

I am currently considering using a modified version of either BitTorrent or the Peer Distributed Transfer Protocol (PDTP, which I just found out about, like 10 minutes ago). There are libraries already written, so the implementation should be easier; import under src/contrib. We should use already established protocols instead of rolling our own, for transparency and ease of client development/accuracy/testing. I am leaning towards BitTorrent, as it seems that the Peer Distributed Transfer Protocol (PDTP) is not ready yet.

I would like to have a crude prototype for testing in a few months. It could be done in less than a month, but we all have paying jobs which are not related to FreeBSD. :) I welcome any help with this project.

If we could, we would obviously have to recheck the downloaded parts and try them again, or skip download sites that repeatedly serve bad data.

References

[1] tyranix - irc.freenode.net #bittorrent #p2p-hackers