From jeejoo at gmail.com Sun May 3 11:05:15 2009
From: jeejoo at gmail.com (Jee Joo)
Date: Sun, 3 May 2009 14:05:15 -0400
Subject: [Maf] Converting archive of "Save Page As.." html files to MAF
Message-ID:

I have saved years worth of webpages as Complete and HTML only, as I
have recently learned about MAFF it is definitely a superior method.
I don't have to worry about using rsync and robocopy to guard against
losing the Creation Dates on them when needing to move my archive of
pages to another disk and it neatly hides all the folders that get
created when I save pages as Complete that may have images I want to
preserve.

My problem now is how can I efficiently convert these thousands of
folder/html pairs that got created when saving as Complete as well as
the HTML only html files? Are there any tools for automating this
process? And that ideally would pull in the create date on the main
html file into the metadata?

Thank you for any assistance.

From paolo.01.prg at amadzone.org Mon May 4 09:59:12 2009
From: paolo.01.prg at amadzone.org (Paolo Amadini)
Date: Mon, 04 May 2009 18:59:12 +0200
Subject: [Maf] Converting archive of "Save Page As.." html files to MAF
In-Reply-To:
References:
Message-ID: <49FF1EE0.2040002@amadzone.org>

Jee Joo wrote:
> I have saved years worth of webpages as Complete and HTML only, as I
> have recently learned about MAFF it is definitely a superior method.
> I don't have to worry about using rsync and robocopy to guard against
> losing the Creation Dates on them when needing to move my archive of
> pages to another disk and it neatly hides all the folders that get
> created when I save pages as Complete that may have images I want to
> preserve.
>
> My problem now is how can I efficiently convert these thousands of
> folder/html pairs that got created when saving as Complete as well as
> the HTML only html files? Are there any tools for automating this
> process?
> And that ideally would pull in the create date on the main
> html file into the metadata?

I don't know of any external tools that can convert already saved HTML
files to MAFF automatically at present, even though writing one should
not be very difficult, since MAFF archives are just standard ZIP files
containing an additional XML text file for each page's metadata.

However, since conversion of already saved pages may be quite useful,
I think I'll integrate this ability into the MAF extension itself.

A full implementation of batch conversion will require some time, but
I can already implement a basic "resave" feature. This would allow you
to archive a page you opened from your local file system, while
preserving the meaningful metadata. A command may be available to
perform this operation on multiple tabs at once.

The original location would be determined by reading the special
comments previously saved in the file, if present, otherwise the
local file:// URL would be used. As for the original archive time,
it could be extracted from the main file's "Last Modified Time" or
"Creation Time".

I suppose most browsers set the "Last Modified Time" to the date and
time the page was downloaded. Is this the case for your files too,
or does the "Creation Time" represent the download time in your case,
with the "Last Modified Time" being the file's original time on the
server?

When resaving of single pages is ready, the same procedure could then
be automated to handle hundreds of files at once.

Please let me know if this is what you need, or if you have any other
suggestion.

Regards,
Paolo

From jeejoo at gmail.com Mon May 4 20:35:46 2009
From: jeejoo at gmail.com (Jee Joo)
Date: Mon, 4 May 2009 23:35:46 -0400
Subject: [Maf] Converting archive of "Save Page As.."
 html files to MAF
In-Reply-To: <49FF1EE0.2040002@amadzone.org>
References: <49FF1EE0.2040002@amadzone.org>
Message-ID: <422560B4-1984-4648-98C5-2F93DFC92AD6@gmail.com>

On May 4, 2009, at 12:59 PM, Paolo Amadini wrote:
> Jee Joo wrote:
>> I have saved years worth of webpages as Complete and HTML only, as I
>> have recently learned about MAFF it is definitely a superior method.
>> I don't have to worry about using rsync and robocopy to guard against
>> losing the Creation Dates on them when needing to move my archive of
>> pages to another disk and it neatly hides all the folders that get
>> created when I save pages as Complete that may have images I want to
>> preserve.
>>
>> My problem now is how can I efficiently convert these thousands of
>> folder/html pairs that got created when saving as Complete as well as
>> the HTML only html files? Are there any tools for automating this
>> process? And that ideally would pull in the create date on the main
>> html file into the metadata?
>
> I don't know of any external tools that can convert already saved HTML
> files to MAFF automatically at present, even though writing one should
> not be very difficult, since MAFF archives are just standard ZIP files
> containing an additional XML text file for each page's metadata.
>
> However, since conversion of already saved pages may be quite useful,
> I think I'll integrate this ability into the MAF extension itself.
>
> A full implementation of batch conversion will require some time, but
> I can already implement a basic "resave" feature. This would allow you
> to archive a page you opened from your local file system, while
> preserving the meaningful metadata. A command may be available to
> perform this operation on multiple tabs at once.
>
> The original location would be determined by reading the special
> comments previously saved in the file, if present, otherwise the
> local file:// URL would be used.
> As for the original archive time,
> it could be extracted from the main file's "Last Modified Time" or
> "Creation Time".
>
> I suppose most browsers set the "Last Modified Time" to the date and
> time the page was downloaded. Is this the case for your files too,
> or does the "Creation Time" represent the download time in your case,
> with the "Last Modified Time" being the file's original time on the
> server?
>
> When resaving of single pages is ready, the same procedure could then
> be automated to handle hundreds of files at once.
>
> Please let me know if this is what you need, or if you have any other
> suggestion.
>
> Regards,
> Paolo
>

Paolo,

Thank you very much for understanding this need. It would be awesome
if this feature was integrated into the MAFF addon itself, even if it
will come together in a few stages as you explained.

Most of my archive has been accrued over the years from a Windows
workstation using Firefox to Save As.. and I have made the effort to
ensure the Creation Timestamp has stayed valid as the actual date and
time that I saved the page. Which was not always easy because on
Windows when you copy or move around files both the Creation Timestamp
and the Last Modified Timestamp get reset to the current time.

Now that I have a Mac it seems to set both the Date Created Timestamp
and the Date Modified Timestamp to the time the page was saved and
preserves these on move and copy. Which is really nice.

After doing a quick test to confirm, it looks like Windows sets both
Creation Timestamp and the Modified Timestamp as well when saving a
page.

So considering that I have only been able to definitely preserve the
actual original save date in the Creation Timestamp field with my large
archive of saved pages I would prefer you use that field for original
archive time metadata in the resulting MAFF.

Thanks again, and I will gladly assist with testing or in any way to
get it all the way to the full batch conversion stage.
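
The concern raised in the message above, that only the creation
timestamp reliably holds the original save date, can be explored with a
short script. What follows is a minimal sketch in Python. It is not
part of MAF and was not posted to the list. The function names
creation_time, copy_creation_to_mtime and sync_tree are hypothetical.
It assumes the platform reports a creation time at all (st_birthtime on
macOS and on Windows with Python 3.12 or later, st_ctime on older
Windows Pythons) and copies that value into the last modification time.
It defaults to a dry run so nothing is changed by accident.

```python
import os
import sys
from pathlib import Path


def creation_time(path):
    """Return the creation time as seconds since the epoch, or None if unknown."""
    st = os.stat(path)
    # macOS, the BSDs, and Windows on Python 3.12+ expose st_birthtime.
    birth = getattr(st, "st_birthtime", None)
    if birth is not None:
        return birth
    # On older Windows Pythons, st_ctime holds the creation time.
    if sys.platform == "win32":
        return st.st_ctime
    # Elsewhere (for example Linux), st_ctime is the inode change time.
    return None


def copy_creation_to_mtime(path, dry_run=True):
    """Set the modification time to the creation time.

    Returns the creation time that was (or would be) applied, or None
    when the platform does not report one.
    """
    created = creation_time(path)
    if created is not None and not dry_run:
        # Keep the access time; only the modification time changes.
        os.utime(path, (os.stat(path).st_atime, created))
    return created


def sync_tree(root, dry_run=True):
    """Apply copy_creation_to_mtime to every .htm/.html file under root."""
    results = {}
    for page in sorted(Path(root).rglob("*")):
        if page.is_file() and page.suffix.lower() in {".htm", ".html"}:
            results[page] = copy_creation_to_mtime(page, dry_run=dry_run)
    return results
```

Calling sync_tree on a copy of the archive with dry_run=True first, and
inspecting the returned mapping, seems the safer order of operations;
entries whose value is None are files for which no creation time could
be read.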
From paolo.01.prg at amadzone.org Wed May 6 04:03:57 2009
From: paolo.01.prg at amadzone.org (Paolo Amadini)
Date: Wed, 06 May 2009 13:03:57 +0200
Subject: [Maf] Converting archive of "Save Page As.." html files to MAF
In-Reply-To: <422560B4-1984-4648-98C5-2F93DFC92AD6@gmail.com>
References: <49FF1EE0.2040002@amadzone.org> <422560B4-1984-4648-98C5-2F93DFC92AD6@gmail.com>
Message-ID: <4A016E9D.307@amadzone.org>

Jee Joo wrote:
> Most of my archive has been accrued over the years from a Windows
> workstation using Firefox to Save As.. and I have made the effort to
> ensure the Creation Timestamp has stayed valid as the actual date and
> time that I saved the page. Which was not always easy because on
> Windows when you copy or move around files both the Creation Timestamp
> and the Last Modified Timestamp get reset to the current time.
>
> Now that I have a Mac it seems to set both the Date Created Timestamp
> and the Date Modified Timestamp to the time the page was saved and
> preserves these on move and copy. Which is really nice.
>
> After doing a quick test to confirm, it looks like Windows sets both
> Creation Timestamp and the Modified Timestamp as well when saving a page.

Thanks for testing this! It seems to confirm that the last modification
time is generally appropriate for use in the archive's metadata, which
is good news, since today I searched for a way to read the creation
timestamp from MAF, but unfortunately it seems that this information is
not available to extensions that try to access the filesystem from
inside a Mozilla application.

> So considering that I have only been able to definitely preserve the
> actual original save date in the Creation Timestamp field with my large
> archive of saved pages I would prefer you use that field for original
> archive time metadata in the resulting MAFF.
Since the creation timestamp is not available to MAF, my suggestion for
a workaround is to use a tool that sets the last modified time equal to
the creation time. After that's done, you could use the new feature of
MAF to convert the pages. I'll begin working on the feature as soon as
the next experimental version is released.

> Thanks again, and I will gladly assist with testing or in any way to get
> it all the way to the full batch conversion stage.

Thanks to you in advance for your help! It's really appreciated.

Regards,
Paolo

From paolo.01.prg at amadzone.org Wed May 6 09:41:25 2009
From: paolo.01.prg at amadzone.org (Paolo Amadini)
Date: Wed, 06 May 2009 18:41:25 +0200
Subject: [Maf] [ANN] MAF 0.12.0 (experimental) released
Message-ID: <4A01BDB5.3030800@amadzone.org>

Hello,

Mozilla Archive Format version 0.12.0 is now officially available for
download at .

This version is experimental (not included in automatic updates) and
provides some optimizations and bugfixes over MAF 0.11.2.

MAF 0.12.0 has the following user-visible improvements over MAF 0.11.2:

  * Improved internationalization and optimized metadata handling.
  * Bugfixes:
      o When creating MHTML archives, MAF sometimes failed to encode
        files larger than 10 KiB properly.
      o Attempting to overwrite an existing MHTML archive would not
        achieve the expected result.
      o The file selection dialog allowed MAFF archives to be saved
        with an MHTML file extension.
      o Under unusual circumstances, MAF could fail to open any MAFF
        or MHTML archive.

The following issues are known to exist in MAF 0.12.0:

  * If the first entry of an archive is not a web page, but for
    instance an image, it cannot be displayed when opening the archive
    normally.
  * If an archive contains multiple pages, refreshing the tab
    displaying the first one may cause the other pages to be opened
    again in new tabs.
  * When saving a page in an archive fails, retrying the download from
    the Downloads window does not achieve the expected result.

Regards,
Paolo Amadini

From grifwood at glinx.com Wed May 6 20:51:48 2009
From: grifwood at glinx.com (DrPhilGA)
Date: Thu, 07 May 2009 00:51:48 -0300
Subject: [Maf] [ANN] MAF 0.12.0 (experimental) released
In-Reply-To:
References:
Message-ID:

I had a chance to test 0.12.0 on my usual four platforms with my usual
five sites and so far have found no bugs.

Phil

From paolo.01.prg at amadzone.org Fri May 8 06:46:27 2009
From: paolo.01.prg at amadzone.org (Paolo Amadini)
Date: Fri, 08 May 2009 15:46:27 +0200
Subject: [Maf] [ANN] MAF 0.12.0 (experimental) released
In-Reply-To:
References:
Message-ID: <4A0437B3.2090403@amadzone.org>

DrPhilGA wrote:
> I had a chance to test 0.12.0 on my usual four platforms with my usual
> five sites and so far have found no bugs.

Thanks Phil, as always I appreciate your prompt feedback!

Stay tuned for the new features coming in the 0.12 series!

Paolo

From paolo.01.prg at amadzone.org Sun May 10 12:48:31 2009
From: paolo.01.prg at amadzone.org (Paolo Amadini)
Date: Sun, 10 May 2009 21:48:31 +0200
Subject: [Maf] [ANN] MAF 0.12.1 (experimental) released
Message-ID: <4A072F8F.60204@amadzone.org>

Hello,

Mozilla Archive Format experimental version 0.12.1 is now officially
available for download at .

Changes from 0.12.0 to 0.12.1:

  * New: Pages that were saved previously as normal HTML files can be
    converted to a web archive format (MAFF or MHTML) without losing
    the information on the original save time and location.
  * New: The original save time and location are preserved when a page
    from an existing archive is resaved.
  * New: The multiple tab selection dialog now allows sorting by title
    or location.

Regards,
Paolo Amadini

From paolo.01.prg at amadzone.org Sun May 10 12:53:20 2009
From: paolo.01.prg at amadzone.org (Paolo Amadini)
Date: Sun, 10 May 2009 21:53:20 +0200
Subject: [Maf] Converting archive of "Save Page As.."
 html files to MAF
In-Reply-To:
References:
Message-ID: <4A0730B0.2080709@amadzone.org>

Jee Joo wrote:
> I have saved years worth of webpages as Complete and HTML only, as I
> have recently learned about MAFF it is definitely a superior method.
> I don't have to worry about using rsync and robocopy to guard against
> losing the Creation Dates on them when needing to move my archive of
> pages to another disk and it neatly hides all the folders that get
> created when I save pages as Complete that may have images I want to
> preserve.
>
> My problem now is how can I efficiently convert these thousands of
> folder/html pairs that got created when saving as Complete as well as
> the HTML only html files? Are there any tools for automating this
> process? And that ideally would pull in the create date on the main
> html file into the metadata?

Hello!

I added the conversion feature for single pages to Mozilla Archive
Format 0.12.1. The archive time is always determined from the file's
last modification time, since unfortunately there is no way to access
the creation time, as I explained before. The original URL is read from
the special comments embedded by the browser in the page, if present;
you can find the details in the documentation.

Let me know if this first step works for you!

Thanks,
Paolo

From jeejoo at gmail.com Sun May 10 18:59:27 2009
From: jeejoo at gmail.com (Jee Joo)
Date: Sun, 10 May 2009 21:59:27 -0400
Subject: [Maf] Converting archive of "Save Page As.." html files to MAF
In-Reply-To: <4A0730B0.2080709@amadzone.org>
References: <4A0730B0.2080709@amadzone.org>
Message-ID:

On May 10, 2009, at 3:53 PM, Paolo Amadini wrote:
> Jee Joo wrote:
>> I have saved years worth of webpages as Complete and HTML only, as I
>> have recently learned about MAFF it is definitely a superior method.
>> I don't have to worry about using rsync and robocopy to guard against
>> losing the Creation Dates on them when needing to move my archive of
>> pages to another disk and it neatly hides all the folders that get
>> created when I save pages as Complete that may have images I want to
>> preserve.
>>
>> My problem now is how can I efficiently convert these thousands of
>> folder/html pairs that got created when saving as Complete as well as
>> the HTML only html files? Are there any tools for automating this
>> process? And that ideally would pull in the create date on the main
>> html file into the metadata?
>
> Hello!
>
> I added the conversion feature for single pages to Mozilla Archive
> Format 0.12.1. The archive time is always determined from the file's
> last modification time, since unfortunately there is no way to access
> the creation time, as I explained before. The original URL is read
> from the special comments embedded by the browser in the page, if
> present; you can find the details in the documentation.
>
> Let me know if this first step works for you!
>
> Thanks,
> Paolo
>

Thank you Paolo!

I will do as you suggest and find a utility to set the last
modification timestamp of all my older saved pages to match the
creation timestamp where I can.

I am using Firefox Beta 3.5b4 and it says the current version of MAF
is not compatible, so I will test it out as soon as I can downgrade my
Firefox. I think the MAF version I am running now, 0.11.2.0, worked in
the beta version of Firefox before this last update?

From paolo.01.prg at amadzone.org Mon May 11 06:39:33 2009
From: paolo.01.prg at amadzone.org (Paolo Amadini)
Date: Mon, 11 May 2009 15:39:33 +0200
Subject: [Maf] Converting archive of "Save Page As.."
 html files to MAF
In-Reply-To:
References: <4A0730B0.2080709@amadzone.org>
Message-ID: <4A082A95.7050300@amadzone.org>

Jee Joo wrote:
> I am using Firefox Beta 3.5b4 and
> it says the current version of MAF is not compatible, so I will test it
> out as soon as I can downgrade my Firefox. I think the MAF version I am
> running now 0.11.2.0 worked in the beta version of Firefox before this
> last update?

All the recent versions of Mozilla Archive Format I released on the MAF
site declare they're compatible up to Firefox 3.1, but recently the
latest Firefox version number was changed to 3.5, thus those MAF
versions cannot be installed on the latest beta anymore.

However, you may override the compatibility check using the Nightly
Tester Tools addon:

https://addons.mozilla.org/firefox/addon/6543

I don't expect any problem with MAF on the latest beta; in case you
find some, let me know.

Thank you very much for testing!

Paolo

From paolo.01.prg at amadzone.org Thu May 14 09:39:02 2009
From: paolo.01.prg at amadzone.org (Paolo Amadini)
Date: Thu, 14 May 2009 18:39:02 +0200
Subject: [Maf] [ANN] MAF 0.12.2 released
Message-ID: <4A0C4926.3070903@amadzone.org>

Hello,

Mozilla Archive Format version 0.12.2 is now officially available for
download at .

This version introduces a significant speed improvement in MHTML
archive handling, compared to the latest experimental version and all
the previous stable versions of Mozilla Archive Format.

MAF 0.12.2 has the following user-visible improvements over MAF 0.12.1:

  * Change: Quoted-Printable content encoding is now 200 times faster
    under normal conditions.
  * Change: Base64 content decoding is now 30 times faster under
    normal conditions.
  * Change: Now creating MHTML files is faster when saving pages
    containing many images or other content.
  * Change: The byte order mark is not removed anymore from UTF-8 data
    decoded from Quoted-Printable.

Regards,
Paolo Amadini

From johnw at thewatermanchronicles.co.nz Thu May 14 12:11:58 2009
From: johnw at thewatermanchronicles.co.nz (John Waterman)
Date: Fri, 15 May 2009 07:11:58 +1200
Subject: [Maf] [ANN] MAF 0.12.2 released
Message-ID: <4A0C6CFE.4040201@thewatermanchronicles.co.nz>

An HTML attachment was scrubbed...
URL:

From paolo.01.prg at amadzone.org Sun May 17 10:32:16 2009
From: paolo.01.prg at amadzone.org (Paolo Amadini)
Date: Sun, 17 May 2009 19:32:16 +0200
Subject: [Maf] [ANN] MAF 0.12.3 released
Message-ID: <4A104A20.7040101@amadzone.org>

Hello,

Mozilla Archive Format version 0.12.3 is now officially available for
download at .

MAF 0.12.3 has the following user-visible improvements over MAF 0.12.2:

  * Fix: Now the integrated Save Complete component correctly reports
    an error if the main page cannot be downloaded.

Regards,
Paolo Amadini

From paolo.01.prg at amadzone.org Sun May 17 10:50:52 2009
From: paolo.01.prg at amadzone.org (Paolo Amadini)
Date: Sun, 17 May 2009 19:50:52 +0200
Subject: [Maf] Converting archive of "Save Page As.." html files to MAF
In-Reply-To:
References: <4A0730B0.2080709@amadzone.org>
Message-ID: <4A104E7C.5050003@amadzone.org>

Jee Joo wrote:
> I am using Firefox Beta 3.5b4 and
> it says the current version of MAF is not compatible, so I will test it
> out as soon as I can downgrade my Firefox.

I just wanted to tell you that MAF 0.12.2 and 0.12.3 can now be
installed on Firefox 3.5 without the need for the Nightly Tester
Tools extension, so you can try the new features without the need
to downgrade your version of Firefox.

Paolo

From paolo.01.prg at amadzone.org Tue May 19 11:58:48 2009
From: paolo.01.prg at amadzone.org (Paolo Amadini)
Date: Tue, 19 May 2009 20:58:48 +0200
Subject: [Maf] Problem with maf
In-Reply-To:
References:
Message-ID: <4A130168.4070000@amadzone.org>

El^Quia wrote:
> Hi, attached a multiple page maff and a screen capture with error.
>
> Maff version: 0.12.3 (it happened with 0.12.2 also)
>
> Any Ideas?

Hello,

thank you for your report; the information you provided allowed me to
understand the problem. Your original MAFF attachment has been filtered
by the mailing list for the convenience of other users, since it was
2 MB in size, but I received it and was able to examine the web page
that caused the problem according to the attached screenshot.

The problem is caused by the fact that some of the content of the web
page you saved is generated dynamically, using JavaScript. The
reference to the missing file is generated programmatically when the
page is displayed, and whenever the page is not opened from the
original site, Firefox complains that it cannot find the referenced
file. This is independent of the fact that the page is saved in a MAFF
archive; saving normally would achieve the same result.

Probably, on that particular page you could try one of these:

  * Save only the main content page without other elements (right
    click on the page and select "This Frame" -> "Save Frame As...",
    then select MAFF as the file type).
  * Use the "browser standard save system", which may cause some more
    dynamically linked content to be actually downloaded, even though
    I think the problem with the missing file would occur anyway,
    unless you disable JavaScript when you display the page offline.

Unfortunately, since a programming language is involved, there is no
general solution I can implement that can cover all the possible cases
while preserving the dynamic features of the page. Some sites will
look weird or won't work when viewed offline.

Hope this helps,
Paolo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Capture.JPG
Type: image/jpeg
Size: 33985 bytes
Desc: not available
URL:

From jeejoo at gmail.com Tue May 19 18:36:06 2009
From: jeejoo at gmail.com (Jee Joo)
Date: Tue, 19 May 2009 20:36:06 -0500
Subject: [Maf] Converting archive of "Save Page As.."
 html files to MAF
In-Reply-To: <4A104E7C.5050003@amadzone.org>
References: <4A0730B0.2080709@amadzone.org> <4A104E7C.5050003@amadzone.org>
Message-ID: <798592FA-DD7D-4EA9-BE8F-E62E05735A2D@gmail.com>

On May 17, 2009, at 12:50 PM, Paolo Amadini wrote:
> Jee Joo wrote:
>> I am using Firefox Beta 3.5b4 and
>> it says the current version of MAF is not compatible, so I will
>> test it out as soon as I can downgrade my Firefox.
>
> I just wanted to tell you that MAF 0.12.2 and 0.12.3 can now be
> installed on Firefox 3.5 without the need for the Nightly Tester
> Tools extension, so you can try the new features without the need
> to downgrade your version of Firefox.
>
> Paolo
>

Awesome! Thank you Paolo. I have just finished moving to a new state
so I have been a bit out of the loop and unable to test this very well
as of yet. I hope to have some more time available in the coming weeks.
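
Earlier in this thread Paolo remarked that an external converter
"should not be very difficult" to write, because MAFF archives are
standard ZIP files with an additional XML metadata file per page. The
following is a rough sketch of that idea in Python, offered under
several assumptions rather than as how MAF itself works: the function
name html_to_maff and the folder name "page" are hypothetical; the
index.rdf layout and namespace are an assumption that should be
checked against the MAFF specification before relying on it; and the
"saved from url" comment pattern is assumed to be the kind of special
comment the thread refers to. As described in the thread, the archive
time is taken from the last modification time, and the file:// URL is
the fallback for the original location.

```python
import re
import zipfile
from email.utils import formatdate
from pathlib import Path
from xml.sax.saxutils import quoteattr

# Assumed comment style: <!-- saved from url=(0019)http://example.com/ -->
SAVED_FROM = re.compile(r"<!--\s*saved from url=\(\d+\)(\S+)\s*-->", re.I)

# Assumed metadata layout; verify against the MAFF specification.
RDF_TEMPLATE = """<?xml version="1.0"?>
<RDF:RDF xmlns:MAF="http://maf.mozdev.org/metadata/rdf#"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <RDF:Description RDF:about="urn:root">
    <MAF:originalurl RDF:resource={url}/>
    <MAF:archivetime RDF:resource={time}/>
    <MAF:indexfilename RDF:resource={index}/>
  </RDF:Description>
</RDF:RDF>
"""


def html_to_maff(html_path, maff_path, folder="page"):
    """Pack one saved page and its "<name>_files" folder into a ZIP archive.

    Returns the (original URL, archive time) pair written to the metadata.
    """
    html_path = Path(html_path).resolve()
    raw = html_path.read_bytes()

    # Look for the embedded comment near the top; fall back to file://.
    match = SAVED_FROM.search(raw[:4096].decode("latin-1"))
    url = match.group(1) if match else html_path.as_uri()

    # Archive time comes from the last modification time, as in the thread.
    archive_time = formatdate(html_path.stat().st_mtime, localtime=True)

    rdf = RDF_TEMPLATE.format(
        url=quoteattr(url),
        time=quoteattr(archive_time),
        index=quoteattr("index.html"),
    )

    with zipfile.ZipFile(maff_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{folder}/index.rdf", rdf)
        zf.writestr(f"{folder}/index.html", raw)
        # "Save Page As, Complete" stores resources in "<name>_files".
        files_dir = html_path.with_name(html_path.stem + "_files")
        if files_dir.is_dir():
            for item in sorted(files_dir.rglob("*")):
                if item.is_file():
                    rel = item.relative_to(html_path.parent).as_posix()
                    zf.write(item, f"{folder}/{rel}")
    return url, archive_time
```

The resource folder keeps its original name inside the archive so that
relative links in the saved HTML continue to resolve. Whether MAF opens
an archive built this way has not been verified here, so any output
should be tested on a few pages before converting a whole collection.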