If you're new here, you may want to subscribe to my RSS feed 
Late last April I read an article on CNET about “Google’s pointers on countering web spam”, and I felt some of its points were right and some were… off. I’m down with Matt Cutts, and he’s something of a rockstar nowadays, but all this “the Google way is the best way” talk is too much fanboyism. After all, that’s how Google got big: by questioning stuff and doing one better.
So, on to the article. The first part was about spammers. Matt Cutts said: “Spammers are human. You have the power to raise their blood pressure. Make them spend more time and effort…If spammer gets frustrated, he’s more likely to look for someone easier.” And he’s almost right. Hackers are people too, and if you piss them off… no, wait, you don’t want to piss hackers off. A decent hacker can break almost anything if motivated enough. In a way it’s like with burglars: a shiny metal door with “burglarproof” written on it is a downright invitation, whereas if you keep cool, you’ll get some spam, but nothing big. It comes down to whether you want to annoy the little guys or attract the attention of the big guys. Think this through before you complicate things.
“Use captcha systems to make sure real people, not bots, are commenting on your site. He uses a simple math puzzle–what’s 2 + 2?–but he also likes KittenAuth, which makes people identify kitten photos.”
This is perhaps one of the things that annoys me most about the internet: captchas. Small, awkward texts that are supposed to be illegible to machines (I doubt anyone actually uses machines to read captchas anyway) and end up pretty much illegible to people. They’re annoying and they really are pointless. StumbleUpon once added captchas, and I almost stopped submitting stuff; that’s how much they annoyed me. As for their reputation of being unbreakable… slim chance. There was an actual case where a porn site asked visitors to solve captchas in order to view pictures, so a bot could pass any captcha it ran into on to those visitors and get past it. So captchas can be broken quite easily. Not to mention math captchas (I think I could rustle up a script that detects the + sign or the key words and then does the math), or KittenAuth, which, although fun (and highly unaesthetic), can be broken by a script that matches the photos against Google searches. All in all, captchas are nothing more than an imperfect way to stop spambots. A much better way would be to challenge people with something actually fun, like a Flash game or a puzzle. Or heck, just trust people and moderate later; you need people to join and comment. Besides, Akismet and Bad Behavior do a neat job of keeping spam comments out… Why complicate things?
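To show how little a math captcha like Cutts’s “what’s 2 + 2?” protects you, here is a minimal Python sketch of the kind of script described above. It assumes the question contains one simple operation written with digits or small number words; the function name `solve_math_captcha` and the word tables are hypothetical, for illustration only.

```python
import operator
import re
from typing import Optional

# Hypothetical lookup tables: spelled-out numbers and operators a simple
# math captcha might use.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
OPERATORS = {
    "+": operator.add, "plus": operator.add,
    "-": operator.sub, "minus": operator.sub,
    "*": operator.mul, "times": operator.mul,
}


def to_number(token: str) -> Optional[int]:
    """Turn '7' or 'seven' into 7; return None for anything else."""
    if token.isdigit():
        return int(token)
    return NUMBER_WORDS.get(token)


def solve_math_captcha(question: str) -> Optional[int]:
    """Find the first 'number operator number' pattern and compute it."""
    tokens = re.findall(r"\d+|[a-z]+|[-+*]", question.lower())
    for i in range(len(tokens) - 2):
        left, op, right = tokens[i:i + 3]
        a, b = to_number(left), to_number(right)
        if a is not None and b is not None and op in OPERATORS:
            return OPERATORS[op](a, b)
    return None


print(solve_math_captcha("What's 2 + 2?"))             # 4
print(solve_math_captcha("What is seven plus three?"))  # 10
```

A real bot would still have to fetch the form and post the answer, but the “puzzle” itself is a few lines of pattern matching.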
“Reconfigure software settings after you’ve installed it. A little modification of various settings will throw bots off the scent. “If you can [go] off the beaten path, away from default software installations, you’ll save yourself a ton of grief,” he said.”
This is vague advice, but what it means is simple stuff, like changing the default WordPress username (admin) to some strange intergalactic hail, like @>FE{@342dasA. Guess that, spammer. Also, tightening what commenters are allowed to do gives you a buffer against spam: require all commenters to be trusted (to have at least one approved comment) and you’ll escape a lot of spam…
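The “at least one approved comment” rule is easy to picture in code. Below is a minimal Python sketch of that moderation check, assuming a simple in-memory list of past comments matched by email address; the `Comment` class and `needs_moderation` function are hypothetical and are not WordPress’s actual implementation (in WordPress you simply enable the corresponding discussion setting).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Comment:
    # Hypothetical minimal comment record.
    author_email: str
    approved: bool


def needs_moderation(author_email: str, history: Iterable[Comment]) -> bool:
    """Hold a new comment unless its author already has an approved comment."""
    email = author_email.strip().lower()
    return not any(
        c.approved and c.author_email.strip().lower() == email for c in history
    )


history = [
    Comment("regular@example.com", approved=True),
    Comment("spammer@example.com", approved=False),
]
print(needs_moderation("regular@example.com", history))  # False: trusted
print(needs_moderation("newbie@example.com", history))   # True: held for review
```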
“Employ systems that rank people by trust and reputation. For example, eBay shows how long a person has been a member and how satisfied others are with transactions with that person.”
This is a way of tracking how much a person has contributed and, by extension, how likely they are to abuse your blog. For example, someone who comments a lot is less likely to spam and lose their status than someone who’s new. Reward people for being there. WordPress usually does this via the number of comments, but feel free to expand on that.
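If you want to expand on a bare comment count, one option is to blend a few signals, eBay style, into a single trust score. The sketch below is hypothetical: the `Member` fields, the caps (one year, 50 comments) and the 30/40/30 weights are arbitrary assumptions to tune for your own blog.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Member:
    # Hypothetical member record.
    joined: date
    approved_comments: int
    rejected_comments: int


def trust_score(member: Member, today: date) -> float:
    """Return a score between 0.0 (unknown) and 1.0 (well established)."""
    # How long they have been around, capped at one year.
    age = min(max((today - member.joined).days, 0), 365) / 365
    # How much they have contributed, capped at 50 approved comments.
    volume = min(member.approved_comments, 50) / 50
    # How often their comments were accepted rather than rejected.
    total = member.approved_comments + member.rejected_comments
    ratio = member.approved_comments / total if total else 0.0
    # Arbitrary weights; adjust to taste.
    return 0.3 * age + 0.4 * volume + 0.3 * ratio


today = date(2008, 6, 1)
newcomer = Member(joined=today, approved_comments=0, rejected_comments=0)
regular = Member(joined=date(2006, 1, 1), approved_comments=120, rejected_comments=2)
print(trust_score(newcomer, today))  # 0.0
print(trust_score(regular, today))
```

You could then, for example, skip moderation above some threshold and hold everything below it.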
“Don’t be afraid of legitimate purveyors of search-engine optimization services. “SEO is not spam. Google does not hate SEO,” Cutts said. “There are plenty of white-hat SEO (companies) who can help you out.”
I’m not really sure what this has to do with spam, but it seems that in some places people view SEO as a bad thing, as if it were cheating Google, which really isn’t true. I know hackers who do white-hat SEO (the good kind) because they feel spamming and the like shouldn’t be used.
Anyway, enjoy this lightweight article, I found it refreshing.
Don’t forget to
subscribe to the feed
or who knows what you’ll miss out on. You can also subscribe by email.
I was searching Google the other day for something and had a hard time locating it. Then I saw a domain name that looked really promising, so I wanted to go there. However, Google sent me to some strange page with no link back to the homepage, which really sucked… Well, for about 5 seconds, until I used uppity (a Firefox add-on that lets you go up a level, like the Windows Explorer up button; pretty useful thing).
But that got me thinking: why doesn’t Google give us the ability to go straight to the homepage? It would only take a small link near the URL to take us to the domain. And there are a lot of cases where people remember an article and search for it, when what they really want is the main page. I know it’s a small thing, but I would find it useful.
Wouldn’t you? Lemme know.
Now, to keep this short: it’s the weekend.
Just a thought I had, so I figured I’d share it with you. Then again, we could ask Gleb to add it to the next SeoQuake plugin.
Uuuh, I smell a new shift in the Force. It seems Google keeps changing stuff…
Or it could just be another bug, like the position six bug earlier this year (where established sites fell to position 6), which was then fixed. The truth of the matter is: some websites are being penalized for stuff they didn’t do, the sandbox effect is being enforced more (new sites show up in the index more slowly), Google’s cache doesn’t seem to get updated anymore, and crawl statistics seem to have stopped fluctuating. It all looks as if the system is pausing in order to change. Like the Matrix rebooting.
As always, señor Matt Cutts, aka Mr. Google to us simpletons (he handles much of the SEO side of Google, at least from a PR point of view), has denied that anything is changing and said he’d look into it, but I smell something fishy.
Thing is, I’ve experienced this stuff myself. My homepage cache hasn’t been updated since March 30th, which is odd, since it usually gets updated at least once a week, but otherwise everything’s normal. The cache for my other pages is doing OK too…
Well, who knows what’s really happening. Anyway, just keep watching the thread and see what happens…
I’ve seen many people asking about this, and even more people generally mystified by the way Google works. Most people don’t understand how and when Google crawls and generally assume it’s a secret.
It’s not really that big a secret, but it is a bit tricky to predict when Google will come. Of course, if you deal with this stuff as often as I do, you get used to the schedules. What is a bit confusing, though, is how the crawl schedule changes like hell depending on a million factors Google considers important.
We’ll start off with a bit of information from the Google Webmaster Center. As always, they give us the usual “follow the guidelines” and “it depends on many things” crap, but a few factors stand out clearly:
- PageRank
- links to a page
- crawling constraints (such as the number of parameters in a URL)
OK, so we know what helps us. PR is the most important, then links, then how crawlable your site is (the number of parameters refers to the fact that Google doesn’t like URLs with lots of PHP parameters, so use mod_rewrite). So we have a starting point. But as always Google is cryptic and doesn’t really help… So we move on.
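To make the mod_rewrite point concrete: visitors and Googlebot see a clean, parameter-free path, while the server internally translates it back into the query string your PHP script expects. The Python sketch below imitates that translation with one regular expression; the URL pattern and the `year`/`monthnum`/`name` parameter names are hypothetical examples, and a real setup would express the rule in Apache’s `RewriteRule` syntax instead.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical rule: /YYYY/MM/post-name/ -> /index.php?year=YYYY&monthnum=MM&name=post-name
CLEAN_PATH = re.compile(r"^/(\d{4})/(\d{2})/([a-z0-9-]+)/?$")


def count_query_params(url: str) -> int:
    """Count distinct query-string parameters in a URL."""
    return len(parse_qs(urlsplit(url).query, keep_blank_values=True))


def rewrite(path: str) -> Optional[str]:
    """Map a clean path to the parameterized URL the script expects, or None."""
    match = CLEAN_PATH.match(path)
    if match is None:
        return None
    year, month, name = match.groups()
    return f"/index.php?year={year}&monthnum={month}&name={name}"


clean = "/2008/06/google-crawl/"
internal = rewrite(clean)
print(internal)                      # /index.php?year=2008&monthnum=06&name=google-crawl
print(count_query_params(clean))     # 0: what the crawler sees
print(count_query_params(internal))  # 3: kept on the server side
```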
As early as 2002 people were asking about the Google crawl schedule, and some were guessing at it. However, the results were strange, and back then high-PR sites were crawled a lot more. Many saw Google’s full crawl around June 1st, while another person had it start in May and still going in June. An interesting piece of info was that for large sites Googlebot came in about every three minutes, indexing about 2-10 pages a second, which I feel was a bit of a slurp but was meant to keep some of the strain off the web server. Their discussion drifts off-topic and back again, but purists can go read it…
Our next source is an excerpt from a For Dummies book, which gives us a bunch of terms related to the crawl. While researching this I was really surprised to find very little info out there. Then again, it’s not such a hot topic for SEO, even though it is somewhat important. They say the deep crawl happens about once a month and that fresh crawls occur randomly. They also treat the index as static between deep crawls, apart from the irregular updates brought in by fresh crawls, a state they call everflux. My opinion comes later.
There’s not much else on the web, except a mention of the Google Dance. I find all these names so amusing, since they don’t really explain the phenomenon and there’s no dancing involved. I guess they got bored of using “crawl” in everything. It’s basically the deep crawl, and we learn that it usually begins at the end of the month, lasts 3-5 days, and usually updates PR. Also, for those of you who know how to monitor server logs, the deep crawl uses the IP range 216.239.46.x, whereas the fresh crawl uses the 64.68.82.x range. At that link you can also find a so-called Google Dance Tool, which could be useful to see which pages Google finds important and crawls, but you could just use Webmaster Tools for that.
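If you want to check your own logs for those two ranges, here is a minimal Python sketch. It assumes Apache-style access logs where the client IP is the first field; the prefixes are the ones quoted above, which date from that era, so Google’s current crawler addresses will almost certainly differ. The function names and sample lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# IP prefixes quoted in the post; treat them as historical, not current.
CRAWL_RANGES = {
    "deep crawl": "216.239.46.",
    "fresh crawl": "64.68.82.",
}
# Client IP as the first whitespace-separated field (Apache common/combined format).
LEADING_IP = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def classify(line: str) -> Optional[str]:
    """Return 'deep crawl', 'fresh crawl', or None for a single log line."""
    match = LEADING_IP.match(line)
    if match is None:
        return None
    ip = match.group(1)
    for label, prefix in CRAWL_RANGES.items():
        if ip.startswith(prefix):
            return label
    return None


def tally(lines: Iterable[str]) -> Counter:
    """Count log lines per crawl type, ignoring everything else."""
    return Counter(label for label in map(classify, lines) if label is not None)


sample = [
    '216.239.46.12 - - [30/May/2008:10:00:00 +0000] "GET / HTTP/1.1" 200 5120',
    '64.68.82.7 - - [30/May/2008:10:00:05 +0000] "GET /feed HTTP/1.1" 200 900',
    '216.239.46.40 - - [30/May/2008:10:03:00 +0000] "GET /about HTTP/1.1" 200 2048',
    '192.0.2.10 - - [30/May/2008:10:04:00 +0000] "GET / HTTP/1.1" 200 5120',
]
print(tally(sample))  # Counter({'deep crawl': 2, 'fresh crawl': 1})
```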
Now for my take on the whole thing. I feel that there are not two, but three kinds of crawls. First, there’s an almost immediate crawl, triggered by pings and links, basically by whichever spider Google uses for Google Alerts. It happens at once and crawls the title and the post, but does not index it; it only notices that it’s there. Then, within a few days to a week, the post is fully indexed and starts showing up in Google results (at a fairly high position at first, then gradually lower if no further activity on that post, or no search activity for that keyword, is detected). The next kind of crawl is a longer-term crawl, which usually includes the homepage and is done every week, every two weeks, or even once a month for less active sites. This updates the cache of your active pages but doesn’t touch the others. The last kind of crawl happens about three or four times a year and reindexes everything. It usually happens in February or March, June, and November, or in some cases in any other month. Google tends to vary this stuff, presumably due to factors on and off the site. So be prepared for crawls this year at the beginning of June and around mid-November, and see if it happens as I’ve predicted.
One more thing: an important factor in crawling is the kind of server you are hosted on. Use GoDaddy or any other established host rather than hosting on your old machine, so Google can download the data properly; the crawl intensity depends a lot on that. Also, Google does not follow the same schedule as Yahoo, for example. Yahoo just performed a deep crawl of my site a few days ago, whereas Google didn’t. So if you’re interested, here’s a pretty graph to ogle at (not much data yet, but still representative):

Green is Yahoo, blue is Google, and that other thing is MSN. And with this I must end this post. Enjoy!
Also, for more info about Google crawling, check out this older post called Google secrets: How to speed up Google Crawl Rate.

