Webarchive is the format used by the Safari browser to package the HTML, CSS, JavaScript and image resources of a web page when saved as a complete offline archive.
Questions tagged [webarchive]
71 questions
1
vote
1 answer
Display WebArchive from Mail.app and Notes.app
Cocoa's WebView can display .webarchive files. The ones I try to display come from the pasteboard, e.g. when copying parts of a web page in Safari or Mail.app.
The issue I am having is that webarchives from Mail and Notes won't display in a WebView,…

Thomas Tempelmann
1
vote
1 answer
Archived web content without going to the website
I want to fetch web data without going to the actual website.
http://archive.org/web/web.php is one example: it keeps snapshots of websites. The problem is that its data is quite old (5-6 months).
Do we have any other archive storage where…

instanceOfObject
0
votes
1 answer
Programmatically Edit a .webarchive File
I'm building an AIR app using ActionScript and I want to programmatically insert a piece of text into a .webarchive file. The problem is that every time I insert the text, the file somehow gets corrupted. The code I'm using looks like this:
var…

dshipper
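The excerpt above does not show the failing code, so the cause of the corruption is unknown. One relevant fact is that Safari writes .webarchive files as binary property lists, so inserting text at a byte offset can invalidate the file. A minimal sketch in Python (not the asker's ActionScript), assuming the page HTML lives under `WebMainResource` → `WebResourceData`; the function name `insert_text_into_webarchive` and its `marker` parameter are hypothetical:

```python
import plistlib


def insert_text_into_webarchive(archive_bytes: bytes, marker: str, text: str) -> bytes:
    """Return new .webarchive bytes with `text` inserted before the first `marker`.

    Hypothetical helper: it parses the property list, edits the main HTML
    resource, and re-serializes, instead of patching raw bytes.
    """
    archive = plistlib.loads(archive_bytes)  # accepts binary and XML plists
    main = archive["WebMainResource"]
    # Assumption: fall back to UTF-8 when no encoding name is recorded.
    encoding = main.get("WebResourceTextEncodingName", "utf-8")
    html = main["WebResourceData"].decode(encoding)
    if marker not in html:
        raise ValueError(f"marker {marker!r} not found in main resource")
    main["WebResourceData"] = html.replace(marker, text + marker, 1).encode(encoding)
    # Write a binary plist again so the result has the same container format.
    return plistlib.dumps(archive, fmt=plistlib.FMT_BINARY)
```

Usage would be along the lines of reading the file in binary mode, calling `insert_text_into_webarchive(data, "</body>", "<p>note</p>")`, and writing the returned bytes to a new file.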
0
votes
1 answer
How to save a .webarchive in UIWebView?
I am following this tutorial and was wondering how I can save the current webpage as a .webarchive with the code I am using from the tutorial?
Help is very much appreciated - thanks!

pixelbitlabs
0
votes
0 answers
Archive.org web scraper bot is causing high server loads - how to slow it down?
Archive.org (the Wayback Machine) is crawling several sites on my Apache server and crashing them with very high burst traffic.
I don't want to block their crawler, but I want to rate limit them. I have contacted their support and their answer was…

SolaceBeforeDawn
0
votes
0 answers
Configure Firefox/Brave to store webpages in Safari webarchive format?
I want to configure Firefox/Brave to store webpages in the Safari webarchive format.
There used to be a Firefox plugin for this, but it is no longer available.
Is there a way to do so?

user1424739
0
votes
0 answers
Cookies expire quickly while crawling a website with Heritrix
I'm trying to crawl a WordPress website with Heritrix, and I have provided cookies so it logs in to the website automatically. It works fine at first, but after about 20MB of downloaded data (approx. 10 minutes), the website logs out and the…
0
votes
0 answers
How do I bulk download the earliest archived snapshots of a list of URLs from web archives?
I have an xls list of URLs that I want to bulk download into a folder, but some of the URLs are so outdated that I need to retrieve their snapshots from web archives.
Is there any way to do this in bulk with a programming language?
For now I know…

l yeo
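One possible approach to the question above, sketched in Python: the Internet Archive exposes an availability endpoint that returns the snapshot closest to a given timestamp. Asking for a very early timestamp is used here as a stand-in for "earliest", which is an assumption and not a guarantee. The function names are hypothetical, and reading the URLs out of the xls file is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp="19960101"):
    """Build the availability query for `url` near `timestamp` (YYYYMMDD)."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url, "timestamp": timestamp})


def closest_snapshot_url(response_text):
    """Extract the snapshot URL from the JSON response, or None if absent."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def early_snapshot(url):
    """Network call: look up the snapshot closest to the early timestamp."""
    with urlopen(availability_query(url), timeout=30) as resp:
        return closest_snapshot_url(resp.read().decode("utf-8"))
```

For a bulk run, one would loop over the URL list, call `early_snapshot` for each entry with a pause between requests, and download whatever snapshot URL comes back.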
0
votes
0 answers
Does QLThumbnailImageCreate still support webarchive?
In the past, QLThumbnailImageCreate would return an NSImage from a URL pointing to a webarchive. It's not doing so now. Is this a bug, or do I not understand something?

Carl Carlson
0
votes
0 answers
How to remove every "/" except the first one in nginx
FYI: I am trying to replicate the web archive.
Right now, all the URLs I am crawling are being sent to the path "D:\website\dateoftoday". My code removes every "/" from the URLs, because you can't save a file with a slash in its name. I've created a…
0
votes
2 answers
Converting warc.gz to .warc
My attempt to extract a warc.gz file, using gzip, resulted in a WARC, but it won't load in http://replayweb.page.
Extracting it using The Unarchiver gave me all the expanded html and other files.
What is the latest recommended method for converting…

Jack P
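The excerpt above does not say why the gzip output failed to load, so the following is only a baseline sketch of the conversion in Python. WARC files are often compressed as a series of concatenated gzip members (one per record); Python's `gzip` module reads such multi-member files as a single stream. The function name `gunzip_warc` is hypothetical.

```python
import gzip
import shutil


def gunzip_warc(src_path, dst_path):
    """Decompress a .warc.gz file to a plain .warc file.

    gzip.open reads through all concatenated gzip members, so a
    record-at-a-time compressed WARC comes out as one continuous file.
    """
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, not all in memory
```

A call such as `gunzip_warc("crawl.warc.gz", "crawl.warc")` (file names are placeholders) leaves the compressed original untouched.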
0
votes
0 answers
Java library to convert "X-Document-Type: Workbook" to Excel
We have some legacy data in .xls (HSSF) format that we are converting to .xlsx (XSSF) format using the Apache POI library. It was all working very well until we started seeing many org.apache.poi.poifs.filesystem.NotOLE2FileException errors. Upon closer…

Sandeep
0
votes
0 answers
How do I prove that an HTTP resource existed on a server at a specific time? (proof of existence / POE)
I'm working on a web archiving technology that simply saves a web page in WARC and MHTML formats. Protected/private content that needs authentication is archived on the client side, which is susceptible to tampering, and that makes the archives unusable for…

knobiDev
0
votes
0 answers
How do I fix a timezone import error in Bash for a web scraper?
I'm trying to use the wayback-machine-scraper, a command-line utility, to pull data from archived sites. The scraper needs to be run in Bash, but requires timezone, which I can only find for Python 3.X. If I switch over to Python, then I get a…

Andrew
0
votes
2 answers
Modify a .webarchive from within Cocoa and write it out again
I have access to a .webarchive file. I have so far managed to create a webarchive (using PyObjC) from the file. I wish to modify some elements in the DOM tree and write the modified data out.
I guess I need access to some root DOM tree (the…

Jaycee