For some pages, and for specific reasons, I would like, if possible, to generate fully static pages that don't rely on the payload.js data. One of the reasons is that payload.js seems like the most convenient mechanism ever built for web scrapers: it lets them steal all of a website's data from a single source, with zero effort.
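Something along these lines is what I have in mind. This is a rough, untested sketch: `generate:page` is a real Nuxt 2 hook that receives each generated page's HTML before it is written to disk, but the regexes are my guesses at the markup, and stripping the state means the client bundle can no longer hydrate the page:

```js
// nuxt.config.js — rough sketch: post-process each generated page to drop
// the inlined __NUXT__ state and the payload.js preload hints.
// Caveat: without that state, client-side hydration breaks.
export default {
  target: 'static',
  hooks: {
    'generate:page'(page) {
      page.html = page.html
        // Remove the inlined window.__NUXT__ state script.
        .replace(/<script>window\.__NUXT__=[\s\S]*?<\/script>/, '')
        // Remove <link rel="preload"> hints pointing at payload.js files.
        .replace(/<link[^>]*payload\.js[^>]*>/g, '')
    },
  },
}
```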
- Scraping pure HTML is probably just as easy. If you want to protect against bots, you'll need to implement a challenge (some CPU-intensive task) or some other kind of trap for them. And on the web, most content is public anyway, so you can't really hide it. Obfuscation might help somewhat, but it will plummet the performance, accessibility, etc. of your whole website. – kissu Dec 09 '21 at 07:51
- Also, I don't think that Nuxt can deliver HTML-only files as of today; probably soon, but not yet. – kissu Dec 09 '21 at 07:52
- It's not true that scraping pure HTML is as easy as hitting a RESTful JSON endpoint. HTML scrapers are actually very fragile and break on any layout change in the markup, while RESTful JSON stays the same. Now, Nuxt goes one step further and exposes ALL the API data at once in a single file (payload.js) — this is a scraper's wet dream (see the sketch after these comments). – Kos-Mos Dec 09 '21 at 16:09
- You can scrape with less specific selectors, or parse the HTML with a regex to get a cleaner result; it's still entirely doable. Also, JSON can have its structure changed too. Those payload files are probably generated for performance reasons, so it may just be unfortunate that they're easier to scrape. Meanwhile, as always: hiding something on the web takes more work than simply accessing it. If you want a simple way to hide your data, put it behind a layer of auth. Otherwise, dive deep into complex ways of annoying people who run bots. – kissu Dec 09 '21 at 16:15
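To make the trade-off discussed in these comments concrete, here is a minimal sketch of the two scraping styles side by side (Node 18+ for the global `fetch`; the URLs, the CSS class, and the `/_nuxt/static/<build-id>/` path are all hypothetical examples, not taken from any real site):

```js
// Minimal sketch contrasting the two scraping styles from the comments.
// Every URL, class name, and route here is a made-up example.
async function scrape() {
  // 1. Classic HTML scraping: tied to the markup, breaks on any layout change.
  const html = await (await fetch('https://example.com/products')).text()
  const prices = [...html.matchAll(/<span class="price">([^<]*)<\/span>/g)]
    .map((m) => m[1])

  // 2. Payload scraping: Nuxt full static serves each route's data as one
  //    structured blob — a single JSONP-style call of the form
  //    __NUXT_JSONP__("/products", {...}) — so no selectors are needed at all.
  const payload = await (
    await fetch('https://example.com/_nuxt/static/1639000000/products/payload.js')
  ).text()

  return { prices, payload }
}

scrape().then(console.log).catch(console.error)
```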