WordPress Recovery With Google Cache and Bash

By John


Whoops, where's the WordPress? Accidental WP deletion? Here's a guide to getting WordPress content back from Google's page cache.

This WP DB recovery has worked for me, but the published version is a work in progress. It's up in the hope that it saves someone recovery time and/or heartache.

I've yet to try reimporting anything saved with my WordPress content recovery script. I did manage to pull down quite a few of the cached pages before it started timing out. I'd recommend upping the 20-second wait (in the last step).

I had looked for a ready-made solution, but the Python-based solutions failed, some even before the run-the-script part. Later I will link to a tutorial I think will make reimporting much easier.

Normally I would pay for a third-party WordPress backup, or I would have a backup on the RAID 6 array that was planned, part of which is on my floor and part held hostage, but I won't go into it… Psychotic rage? Maybe, but that's not the point of the story. Comatose from hunger, probably. I digress.

Maybe some of it was an error on my part, but it's also pretty well known that Google plays around, adding and removing features and changing terms on many of their services quite frequently. Also, it would probably cut into their cloud storage business if it were simple to use the cache as a backup. So I doubt it's the fault of the people who tried before me. The solution for getting all of the needed URLs appeared to be a Google Sheets feature that has long since been retired. Eventually I reasoned that because I had Search Console registered, I had a list of URLs anyway; I just needed to prepend the cache URL syntax in front of each one.

IIRC what you are about to see is something I used and then started modifying, and as I'm virtually abducted with police turning a blind eye… and as I'm currently hungry once again, I never finished.

What I recall of what I did: I grabbed the URL list manually via Google Search Console. A lot of the methods I found attempted to pull it in some automated way and failed from the start because of this.

I wrote another shell script to prepend what was needed to a flat file with the URLs. I'll post that shortly, but here's the gist of it.

Here is what I came up with on my own.

Prerequisites:

  • Windows, Mac, or desktop *nix to reach Google Search Console
  • Notepad++ (if on Windows)
  • bash shell
  • curl
  • sed
  • Search Console having been set up before you went oops

I might make this more newcomer-friendly as time goes on. At the moment it assumes you've used command-line Linux (a text-based shell of some flavor) before. If not, screw your site, buy a Raspberry Pi, and dive in with the Lite version of Raspberry Pi OS, without the GUI/desktop. 😀

Kidding… kinda… Linux via text is amazing when you realize how easy it is to do things like install software, manipulate text, update… automate the automation of automation, or just repeat a command with up arrow, Enter. I digress.

The Process:

Head to Google Search Console

In this step we create a flat .txt file with one URL per line.

Hopefully you have Search Console set up already. If you had the Google Site Kit plugin installed and activated, there is a good chance you did, whether or not you have played with it since.

https://search.google.com/

There's probably more than one way to get this list from Search Console, but this uses the new interface (not the legacy stuff), so hopefully it stays for a while.

I will refer to the example file below as siteurls.txt. Note: if you are on a Windows PC, depending on how you output the file, you might want to make sure the line endings are set to Unix before copying and pasting, or however you do your upload. Notepad++ is ideal for this.
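If the file has already landed on the Linux box with Windows line endings, it can be fixed there too. A minimal sketch, assuming the URL list is plain text with nothing in it that needs a carriage return; the function name and the input file name in the example are made up for illustration:

```shell
#!/bin/bash
# Strip Windows carriage returns so every line ends in a plain Unix newline.
# $1 = file to read, $2 = cleaned copy to write (both names are up to you).
to_unix_endings() {
    tr -d '\r' < "$1" > "$2"
}

# Example (hypothetical file names):
# to_unix_endings siteurls-windows.txt siteurls.txt
```

Writing to a second file rather than editing in place keeps the original around in case something goes wrong.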

https://www.vengeancetech.biz/techblog/comps-servers-net/lga-2011-server-build/
https://www.vengeancetech.biz/fabrication/linear-actuator-wired/
https://www.vengeancetech.biz/electronics-formulas/
https://www.vengeancetech.biz/working-with-plexiglas/
Home
https://www.vengeancetech.biz/contact/
https://www.vengeancetech.biz/techblog/general/steps-to-a-form-a-llc-in-mn/
https://www.vengeancetech.biz/techblog/networking-posts/routed-static-block-with-pppoe-opnsense/
https://www.vengeancetech.biz/vtblogs/what-i-wish-i-was-able-to-cook/
https://www.vengeancetech.biz/f-page/its-easy-from-your-430k-house-with-all-comforts-2020-04-25/
https://www.vengeancetech.biz/vtblogs/mold-advice-quickly/
https://www.vengeancetech.biz/vtblogs/2020-24-04/
https://www.vengeancetech.biz/vtblogs/equality-slavery-leapers-oh-my/
https://www.vengeancetech.biz/techblog/comps-servers-net/titanium-125-flux-core-wire-feed-fried-ford-key-transponderrfid/
https://www.vengeancetech.biz/vtblogs/page/7/
https://www.vengeancetech.biz/wishlist/cetus-mk-3-extended/
https://www.vengeancetech.biz/wishlist/
https://www.vengeancetech.biz/vtblogs/andrew/custom-cabinets-a-hidden-compartment/
https://www.vengeancetech.biz/vtblogs/covid-19-and-full-face-respirator-in-walmart/

You should end up with something like this. IIRC I pasted it into Google Sheets to get it down to just the URLs. But if you are reading this post, you know… those URLs are likely half the battle. To make the automated Google cache WP recovery work, we need to prepend the proper URL.
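For the "get it down to just the URLs" part, a spreadsheet isn't strictly needed. A rough sketch, assuming whatever you exported or copied out of Search Console is a text or CSV file with the URLs sitting among other columns; the function name and the export file name are hypothetical, and the real export layout may differ:

```shell
#!/bin/bash
# Pull anything that looks like an http(s) URL out of a text/CSV file,
# one per line, with duplicates removed (note: sort -u also reorders them).
# $1 = the exported file.
extract_urls() {
    grep -oE 'https?://[^[:space:],"]+' "$1" | sort -u
}

# Example (hypothetical export file name):
# extract_urls Pages.csv > siteurls.txt
```

The pattern stops at whitespace, commas, and double quotes, so a URL followed by a click count in the next CSV column still comes out clean.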

Prepend

we need this string:

https://webcache.googleusercontent.com/search?q=cache:

in front of and butted up against every entry in the URL flat file we made above. Alternatively, you could prepend it in the actual crawl script that tries to save the cache. I wrote this as quickly as possible and not in the best mindset.

If you have any background in coding you can probably take it from here. I started to combine two separate scripts and then life pulled me away. It might be a day or two before I run what's below to see if I'm providing an easy-button situation. If you are reading this, it's still rough and not yet reconstructed how it should be.

The idea is to get the list you pulled from Search Console into one URL per line in a flat .txt file. Don't forget Unix line endings; use Notepad++ if you are going between Windows and Linux.

The example below opens a flat file, inserts the proper URL string to access Google's cache in front of each of your (hopefully cached) pages, and outputs another flat file.

Edit: this one probably works as copy and paste. I found another .sh file.

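#!/bin/bash
# Prefix that turns a page URL into its Google cache URL.
GOOG="https://webcache.googleusercontent.com/search?q=cache:"

# One site URL per line, Unix line endings.
input="/home/cuser/DasShit/siteurls.txt"

# Append each prefixed URL to gcurls.txt.
while IFS= read -r line; do
    echo "$GOOG$line" >> /home/cuser/DasShit/gcurls.txt
done < "$input"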

Here's what it should look like if done correctly.
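The same prepend can also be done in one pass with sed, which makes it easy to eyeball the result before writing a file. This is a sketch of an alternative, not the script I actually ran; the function name is made up, and it additionally drops blank lines:

```shell
#!/bin/bash
GOOG="https://webcache.googleusercontent.com/search?q=cache:"

# Print every non-blank line of $1 with the cache prefix glued to the front.
# "|" is used as the sed delimiter because the prefix is full of slashes.
prepend_cache_prefix() {
    sed -e '/^[[:space:]]*$/d' -e "s|^|$GOOG|" "$1"
}

# Example:
# prepend_cache_prefix siteurls.txt > gcurls.txt
```

Either way, a line such as https://www.vengeancetech.biz/contact/ should come out as https://webcache.googleusercontent.com/search?q=cache:https://www.vengeancetech.biz/contact/ with nothing between the prefix and the original URL.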

let’s call th

Creating another flat file wasn't strictly necessary, but I knew I had a lot of pages and I was writing this as quickly as I could. It just seemed easiest.

The Fetch Script

Now we want to take the ready-to-go, personalized Google cache URL strings and crawl them.

The only real complication is file naming. The biggest part of that is getting the forward slashes out. Length and legibility may be an issue depending on how your WordPress was configured. I stripped quite a bit off mine with sed.
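To make the naming step less tied to one domain, here is a sketch that strips a prefix you pass in instead of a fixed number of characters. It uses bash parameter expansion rather than sed, and the function name, the .html extension, and the "index" fallback for the home page are my assumptions, not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper: turn a cache URL into a flat file name.
# $1 = full cache URL
# $2 = prefix to strip (the cache prefix plus your site's base URL)
url_to_filename() {
    local url=$1 prefix=$2
    local name=${url#"$prefix"}   # drop the prefix if the URL starts with it
    name=${name%/}                # drop a single trailing slash
    name=${name//\//--}           # turn the remaining slashes into "--"
    printf '%s.html\n' "${name:-index}"
}

# Example:
# PREFIX="https://webcache.googleusercontent.com/search?q=cache:https://www.vengeancetech.biz/"
# url_to_filename "${PREFIX}techblog/general/steps-to-a-form-a-llc-in-mn/" "$PREFIX"
```

Quoting the prefix inside the expansion matters because the cache URL contains a "?", which bash would otherwise treat as a wildcard.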

As with the above… I've gotta fix this; it's not copy and paste at the moment.

#!/bin/bash
OUTDIR="/home/cuser/DasShit/pages"
FULLGCURLFILE="/home/cuser/DasShit/gcurls.txt"
FILNOM="nope"

# Make sure the output directory exists before curl writes into it.
mkdir -p "$OUTDIR"

while IFS= read -r line; do
    # Strip the first 84 characters (the cache prefix plus
    # "https://www.vengeancetech.biz/"; adjust the count for your domain),
    # then turn the remaining slashes into "--" for a flat file name.
    FILNOM=$(echo "$line" | sed 's/^.\{84\}//' | sed 's/\//--/g')
    echo "----------"
    echo "$FILNOM"
    echo "$line"
    echo "----------"
    # -A sets the User-Agent string sent with the request.
    curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6)" "$line&strip=0&vwsrc=0" --output "$OUTDIR/$FILNOM"
    sleep 20
done < "$FULLGCURLFILE"

Yeah, creative dir names, what can I say, lol. At some point during the download it started returning timeouts; Google caught on. I would up the sleep time considerably. While it will take considerably longer, the ability to start it as a batch and come back later or do something else is still way better than most alternatives.
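One way to cope with the timeouts, besides a longer fixed sleep, is to retry a failed fetch with a wait that doubles each time. This is a sketch I have not run against the cache; the function name and the numbers in the example are placeholders, and $line, $OUTDIR, and $FILNOM are the variables from the fetch script:

```shell
#!/bin/bash
# Run a command until it succeeds, doubling the pause between attempts.
# $1 = maximum number of attempts
# $2 = first pause in seconds
# remaining arguments = the command to run
retry_with_backoff() {
    local max=$1 delay=$2 attempt=1
    shift 2
    until "$@"; do
        if (( attempt >= max )); then
            return 1            # give up after the last allowed attempt
        fi
        sleep "$delay"
        delay=$(( delay * 2 ))
        attempt=$(( attempt + 1 ))
    done
}

# Example inside the fetch loop (up to 4 tries, pausing 60s, 120s, 240s between them):
# retry_with_backoff 4 60 curl --fail --silent --max-time 30 \
#     "$line&strip=0&vwsrc=0" --output "$OUTDIR/$FILNOM"
```

The --fail flag makes curl exit non-zero on an HTTP error, which is what lets the retry loop notice a blocked request instead of saving the error page.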

Had I really been thinking, or properly nourished… I have a /29 block. I could have set up some trickery and had OPNsense rotate the global IP the requests were made from. This might not be realistic for everyone, but it's probably worth mentioning.
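A different way to get a similar effect, if the extra addresses are bound directly to the machine running the script rather than handled on the firewall, is curl's --interface option, which accepts a source IP address. The sketch below only covers the round-robin choice; the addresses are documentation placeholders, the function name is made up, and I have not tested this setup:

```shell
#!/bin/bash
# Placeholder addresses from the 203.0.113.0/24 documentation range;
# substitute addresses that are actually configured on your host.
SOURCE_IPS=("203.0.113.2" "203.0.113.3" "203.0.113.4")

# Print the source address to use for request number $1 (counting from 0),
# cycling through the list in order.
pick_source_ip() {
    echo "${SOURCE_IPS[$(( $1 % ${#SOURCE_IPS[@]} ))]}"
}

# Example inside the fetch loop, with count starting at 0:
# curl --interface "$(pick_source_ip "$count")" "$line&strip=0&vwsrc=0" --output "$OUTDIR/$FILNOM"
# count=$(( count + 1 ))
```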

What Now?

I'll add that soon.

My F#!k Up

I've probably been using Linux shells since '03-'04ish and it was about time I messed up. My oops was with the rm command. I shouldn't have been logged in as root, but I probably would have had to sudo, and it would have worked anyway. The to-do list is long and was longer then.

Still, it's almost like I owned a backup array, but a certain police force is hell-bent on making me human property and I have no rights to mine.

I was trying to get rid of a symlink and stupidly typed rm -rf and tapped the forward slash before my finger pressed Enter.

Friendly reminder: with rm -rf, a trailing slash appended to /path/to/symlinktodelete goes after the directory the link points to, not the link itself.

Whoops, there went the DB.
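For next time, a small guard along these lines would have saved me. It is a sketch, with a made-up function name: it strips any trailing slash, refuses to touch anything that isn't a symlink, and never uses -r or -f:

```shell
#!/bin/bash
# Remove a symlink without touching whatever it points to.
# $1 = path to the link (a trailing slash is tolerated and stripped).
remove_symlink() {
    local link=${1%/}
    if [ -L "$link" ]; then
        rm -- "$link"
    else
        echo "not a symlink: $link" >&2
        return 1
    fi
}

# Example:
# remove_symlink /path/to/symlinktodelete/
```

Plain rm on the link name, with no trailing slash and no -r, removes only the link, so there is nothing for a stray keystroke to recurse into.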

About the author

John administrator
