Friday, July 31, 2009

Take Control of Your SharePoint Crawl Schedules, Content Sources and Indexes

All my farms were set up (the old way) with one server running Central Admin, and I generally use that same server as the indexer too. On top of that, I use a certain utility (which I am not at liberty to discuss in detail) that also runs on the Central Admin server(s). This can sometimes put an intense burden on one server (while the others are loafing).

One thing I am going to do is load balance my Central Admin and spread it (and that certain utility) around. My setup now is just plain wrong. But in my defense, at one time it was right, and when we did our sanity checks with other experts, it was still right. Once things started getting heavier (indexing 500 GB, then over 700 GB of content), there wasn't much room for error. Our data center is miles away, but I think I can hear the little fans in those blade servers cranking like mad from here. VVvvvvvv...

Until I redo my Central Admin sites, I have to be cautious about what that over-burdened server is doing. One thing I do is turn off indexing when I know I will need those resources. Indexing takes disk space and CPU, and I've seen big full crawls fill up lots of temporary space, so be sure to factor this in. It's easy to take control. You can stop a crawl, but better yet, just don't let a new one start. Change the schedule to fit your activities. Use something like "Every 20 minutes from 8:00 AM for 810 minutes." That window closes at 9:30 PM, so the last incremental on a 20-minute cadence starts at 9:20 PM, and by 10 PM you should have all the resources of that server for other tasks.
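If you want to sanity-check a schedule like that before you type it in, here is a rough Python sketch that lists the start times it implies. It is only arithmetic, not anything SharePoint ships: the function name `crawl_start_times` is made up for illustration, and it assumes a crawl is allowed to start only while the start time falls inside the repeat window.

```python
from datetime import datetime, timedelta


def crawl_start_times(first_start, every_minutes, for_minutes):
    """List the start times implied by "every X minutes from T for Y minutes".

    Assumption (not taken from SharePoint documentation): a crawl may start
    only while its start time is inside the window
    [first_start, first_start + for_minutes].
    """
    window_end = first_start + timedelta(minutes=for_minutes)
    starts = []
    current = first_start
    while current <= window_end:
        starts.append(current)
        current += timedelta(minutes=every_minutes)
    return starts


# "Every 20 minutes from 8:00 AM for 810 minutes" (the date itself is arbitrary)
first = datetime(2009, 7, 31, 8, 0)
starts = crawl_start_times(first, every_minutes=20, for_minutes=810)
window_end = first + timedelta(minutes=810)

print("Window closes at", window_end.strftime("%I:%M %p"))
print(len(starts), "crawls; last one starts at", starts[-1].strftime("%I:%M %p"))
```

Under that assumption the window closes at 9:30 PM and the last start on the 20-minute grid lands at 9:20 PM. Change the three numbers to try out a different quiet period before you commit it to the content source.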

This is a bit less related, but one of my regrets about my setup is that I didn't separate things into different content sources. The logical thing would be to break it up by web app. I will do this soon too. That way you can prioritize (by starting one crawl first) or set separate schedules for different content.

In a really large site, I suppose you could use separate SSPs (Shared Services Providers) and manipulate where they are and what they are indexing. We started to go down that path, but if you keep everything together, and just watch how you set up content sources and crawl schedules, you can get a whole lot of useful search indexing (out of not much hardware). Under one SSP, you can have a universal, site-wide index, with no additional setup.
