From h0qd6e102 at sneakemail.com Fri Dec 2 10:01:44 2005
From: h0qd6e102 at sneakemail.com (Jon B)
Date: Fri Dec 2 23:32:23 2005
Subject: [Slogger] Re: Sigh
In-Reply-To: <438FB66E.90300@18-30chat.net>
References: <3690-67168@sneakemail.com> <438FB66E.90300@18-30chat.net>
Message-ID: <30557-22444@sneakemail.com>

> >be done to improve this? I wonder if it is the log file or the data
> >files that is slowing it down.
> >
> That is odd, I do not experience this due to slogger, but my session
> saver sure slows it down.
> 25+ pages for an average load, twice or better when doing work

It definitely got a lot faster at the beginning of the month (my
Slogger is configured to create a new directory every month. I'm sure
it varies by slogger configuration and the amount of time spent on the
web). And it speeds up if I disable slogger. Keep in mind I have it
set to log the HTML of *every* page I visit, and to keep a separate
HTML log file with a list of each page. The log file was something
like 6 MB at the end of the month, which could be the problem by
itself. (Does it have to open the entire file every time it appends a
page?) I'm going to change it to creating a new directory every day
or week instead, for now.

> I use NTFS compression and no issues with any software so far aside
> from Virtual PC's virtual HD
> files.

The Slogger data files are very redundant, since it saves the same
exact pages with only slight differences over and over. When I tried
a zip file it compressed to 20% of the original size or so, but NTFS
compression only got it to 80% or so. I'm sure it slows down the
computer to save in compressed form, too, as it has to calculate the
compressed data first. I think a script that regularly compresses the
log files while allowing them to be searched would be better.

> >4. It keeps giving me this interrupting modal window error "The link
> >could not be saved. The web page might have been removed or had its
> >
> I get this too.
> I've found nothing to solve it except to block logging of
> the "offending" sites.

Yeah. In Firefox 1.5, they've apparently replaced such modal error
dialogs with error pages, which will be greatly appreciated by me if
true. (I'm waiting around til I know my favorite extensions will work
after upgrade.)

> >5. Also, I would like to be able to filter more powerfully. For
> >
> I wish this also. Do you get an odd repeating of URLs in the filter
> list? I sometimes have to manually
> edit the list and remove the duplicates.

Yes! I wonder why that is.

> What I would really like to use
> is a logging proxy with a directory
> structure that mimics the websites (with a few options). I've got one
> partly worked on but gave up on it
> when I found slogger.

Hmmm...

From lists at thimk.org Sun Dec 4 08:53:57 2005
From: lists at thimk.org (Tom Hoover)
Date: Sun Dec 4 17:27:42 2005
Subject: [Slogger] Sigh
In-Reply-To:
References: <3690-67168@sneakemail.com>
Message-ID: <20051204145357.GM25364@hisword.net>

On Tue, Nov 29, 2005 at 04:12:54PM +0000, Rory McCann wrote:
> On Mon, 28 Nov 2005, Jon B wrote:
> > 2. It saves a lot of redundant data. I wish there were a way to
> > compress the files that it saves. I might try using NTFS's file
> > compression, but I fear that would just make it even slower.
>
> I've noticed this too. I'm going to write a bash shell script that will
> compare new files to all existing files in the data/ directory and delete
> it if it's a duplicate. It'll then gzip all files. I will have to write a
> plugin for MacOSX's Spotlight so that it can search inside .gz files, I
> haven't found an existing one. This would mean I would have the advantages
> of searching through all the pages I've visited and minimizing the needed
> space.

Well, you prompted me to go ahead and do what I've been meaning to do
for some time, since my slogger directory has grown to 22G over the
past year.
The following one-line bash script requires fdupes, found in most
Linux distributions (or you can google for it):

fdupes -r . | grep -v " " | gawk 'BEGIN {RS=""} { for(i=1;(i+1)<=NF;i++) print "ln -f "$i " " $(i+1) }' > runme

explanation:
fdupes # uses fdupes to find all duplicate files
grep   # removes any filenames that include spaces (I was too lazy to do any fancy parsing)
gawk   # builds another bash script that hard links the duplicate files
runme  # chmod +x on this file, and then execute it (or, just "bash ./runme")

I don't delete the files, but hard link them. The disk space goes way
down, but all the links will continue to work. No guarantees, but it
works for me. Decreased the size of my slogger directory from 22G to
6.4G. I set up a cron job to run this anytime the partition containing
my slogger directory gets more than 90% full, so it should be
virtually maintenance free.

From kschutte at MIT.EDU Mon Dec 5 08:37:06 2005
From: kschutte at MIT.EDU (Ken Schutte)
Date: Mon Dec 5 08:38:06 2005
Subject: [Slogger] Re: Sigh
In-Reply-To: <30557-22444@sneakemail.com>
References: <3690-67168@sneakemail.com> <438FB66E.90300@18-30chat.net> <30557-22444@sneakemail.com>
Message-ID: <43944282.3040702@mit.edu>

Jon B wrote:
>>>be done to improve this? I wonder if it is the log file or the data
>>>files that is slowing it down.
>>>
>>
>>That is odd, I do not experience this due to slogger, but my session
>>saver sure slows it down.
>>25+ pages for an average load, twice or better when doing work
>
>
> It definitely got a lot faster at the beginning of the month (my
> Slogger is configured to create a new directory every month. I'm sure
> it varies by slogger configuration and the amount of time spent on the
> web). And it speeds up if I disable slogger. Keep in mind I have it
> set to log the HTML of *every* page I visit, and to keep a separate
> HTML log file with a list of each page.
> The log file was something
> like 6 MB at the end of the month, which could be the problem by
> itself. (Does it have to open the entire file every time it appends a
> page?) I'm going to change it to creating a new directory every day
> or week instead, for now.

Ahh that's got to be it - it does read in the entire log file each
time and writes a new one. Of course it would be much faster to just
append to the log each time, but I changed it not to do this to allow
some amount of "footer" in the log file, so it could be valid html,
xml, etc. It reads in the whole file and writes out everything but the
last N bytes (where N is the length of the footer), then writes the
new stuff, then the footer. Maybe there's a way to append to a file
while removing the last N bytes, or at least have it just append when
there is no footer... I have it start a new log file each day, so I
never really noticed it.

Ken

From h0qd6e102 at sneakemail.com Mon Dec 5 09:48:37 2005
From: h0qd6e102 at sneakemail.com (Jon B)
Date: Mon Dec 5 09:50:38 2005
Subject: [Slogger] Re: Sigh
In-Reply-To: <43944282.3040702@mit.edu>
References: <3690-67168@sneakemail.com> <438FB66E.90300@18-30chat.net> <30557-22444@sneakemail.com> <43944282.3040702@mit.edu>
Message-ID: <8808-21359@sneakemail.com>

> Maybe there's a way to append to a file while removing the last
> N bytes, or at least have it just append when there is no footer...

Hmm...

> I
> have it start a new log file each day, so never really noticed it.

Yeah, I just did that the other day and it sped up noticeably. Then I
upgraded to Firefox 1.5. Oops. :-)

From mozdev.org at access-research.org Mon Dec 5 16:08:56 2005
From: mozdev.org at access-research.org (Greg Lowney)
Date: Mon Dec 5 19:09:33 2005
Subject: [Slogger] Slogger log files don't correct invalid characters in filenames
Message-ID: <002e01c5f9f9$4048e860$6800a8c0@lucky13>

Hi!

Thanks for producing Slogger, which has been immensely helpful to me.
I was wondering about one problem I'm encountering running Slogger
1.5.11 on Firefox 1.0.7 on Windows XP SP1: Most of the Local links in
the Slogger Log files don't work, because they use the filename before
illegal characters were converted to underscores and single-quotes.

For example, Slogger saved a Web page whose Title includes
double-quotes, which are illegal in Windows filenames. Somewhere along
the line, probably by Firefox, the double-quotes are changed to
single-quotes before the file is saved to disk with the following name:

2005-11-29_21.34.43 seattletimes.nwsource.com The Seattle Times Timmy of 'South Park' challenges viewers' attitudes about people with disabilities.html

However, clicking the Local link associated with this file in the
Slogger Log page gives the following error:

"The file /D:/websites/Slogger/data/2005-11-29/2005-11-29_21.34.43 seattletimes.nwsource.com The Seattle Times: Timmy of "South Park" challenges viewers' attitudes about people with disabilities.html cannot be found. Please check the location and try again."

As you can see, Slogger's log file still lists the filename with
double-quotes.

Is there any easy way to fix or work around this? If Firefox won't
tell you the actual file name it uses, you could probably perform the
simple translation yourself, either automatically or as an option.

Thanks,
Greg
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mozdev.org/pipermail/slogger/attachments/20051205/7bfd5b0f/attachment.htm

From slogger at graves.com Tue Dec 6 20:54:01 2005
From: slogger at graves.com (David Graves)
Date: Tue Dec 6 21:03:02 2005
Subject: [Slogger] latest slogger for firefox 1.5
Message-ID: <00a101c5fad1$19051700$67fea8c0@graves.com>

Slogger: LOVE IT!

The latest upgrade broke mine however :-( I upgraded to ff1.5, and
then upgraded slogger. Environment: winme.
It continues to create the xml listings, and will even create the
directories, but does not save pages to those directories when I
attempt to save the complete web page.

Just an FYI / call for help.

-dave
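
Note on Ken's question earlier in the thread (appending to a log file while removing the last N bytes): one possible approach is to cut the footer off in place and then append the new entry plus the footer, so the existing contents are never read back in or rewritten. The sketch below is only an illustration in shell, not Slogger's actual code (Slogger is a Firefox extension). The function name append_before_footer is hypothetical, and it assumes a `truncate` utility such as the one in GNU coreutils, plus a footer string that does not end in a newline.

```shell
# Hypothetical helper, not part of Slogger: append ENTRY to LOG while keeping
# FOOTER as the last bytes of the file, without rewriting the existing contents.
# Assumes `truncate` (e.g. GNU coreutils) is installed and that FOOTER does not
# end in a newline (command substitution would strip it and break the check).
append_before_footer() {
    local log="$1"
    local entry="$2"
    local footer="$3"
    local n

    # Byte length of the footer (wc -c counts bytes, not characters).
    n=$(printf '%s' "$footer" | wc -c)
    n=$((n))   # normalise any padding some wc implementations add

    # Cut the footer off only if the file really ends with it.
    if [ -f "$log" ] && [ "$(tail -c "$n" "$log")" = "$footer" ]; then
        truncate -s "-$n" "$log"   # shrink the file by n bytes in place
    fi

    # Append the new entry, then put the footer back.
    printf '%s%s' "$entry" "$footer" >> "$log"
}
```

For example, calling `append_before_footer log.html '<li>page</li>' '</ul>'` on a file that ends in `</ul>` leaves the file ending in `<li>page</li></ul>`. If the file does not end with the footer (or does not exist yet), the sketch simply appends the entry followed by the footer, which covers the "just append" case Ken mentions only loosely; a real implementation would need to decide how to handle a missing or damaged footer.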