Date: Tue, 29 Nov 2005 17:20:06 -0600
From: John Goerzen
To: gopher@complete.org
Subject: [gopher] Re: Bot update

On Wed, Nov 16, 2005 at 10:04:17PM -0600, Jeff wrote:
> On Sun, 30 Oct 2005 21:48:51 -0600, John Goerzen
> wrote:
>
> > Here's an update on the gopher bot:
> >
> > There is currently 28G of data archived representing 386,315
> > documents. 1.3 million documents remain to be visited, from
> > approximately 20 very large Gopher servers. I believe, then, that the
> > majority of gopher servers have been cached by this point. 3,987
> > different servers are presently represented in the archive.
>
> Any news?

Not really.
The bot hit a point where its algorithm for storing page information had become too slow, and the database layer I'm using was also segfaulting. When I get some time, I will write a new layer.

In the meantime, I'd like to talk about how to get this data to others who might be willing to host it, as well as how to store it out there for the public. Any ideas?