Our New Search Capability

The new homepage launch includes a new search engine, provided by a Google Search Appliance (GSA) that is already installed in our server room. It replaces a free tool that we started using over six years ago and that Google no longer supports.

Current status: in test, not publicly available yet. Send comments to the webmaster.

Advantages of the new search appliance

[Image: Google Search Appliance] The new box - isn't it cute?
  1. The current site is partially broken. You may have noticed that the search bar in the middle of the current search page is broken. It sits in code that Google supplies; we cannot change that code, and we have no contact at Google who could fix it. The search bar at the top of the page, which we supply, still works, but clearly we’d like to get the campus onto the new tool.
  2. More frequent site crawls. The old free service crawled the site once a month, so new pages could take anywhere from 1 to 30 days to show up after going live. The site is now crawled daily and, depending on load, we may run the crawl continuously.
  3. New capability: biasing. Biasing is a back-end function that lets the administrator "push" the Google algorithm to favor search results with particular metadata (for instance, to make sure that searches for umptysquat go to the umptysquat department rather than some other random site that mentions umptysquat).
  4. New capability: filtering. Filtering is a front-end tool that lets the user narrow the results by searching on specific metadata values. Filtering could be set up by domain (faculty sites vs. department sites) or by type of data (people vs. websites). This is NOT in place yet, but we will gradually roll it out for things like searching the Index as well as websites and, we hope, searching the ADMCS directory database of people as well as websites. That, however, is a task for next year.
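
To make the idea of biasing concrete, here is a toy sketch in Python. It is not how the GSA implements biasing (that algorithm is proprietary); it only illustrates the general idea of boosting results whose metadata matches a preferred value. The `Result` class, the `department` field, the `boost` factor, the scores, and the example.edu URLs are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    score: float      # relevance score from the engine (made-up numbers here)
    department: str   # hypothetical metadata field attached to the page


def apply_bias(results, preferred_department, boost=2.0):
    """Return results sorted by score, after multiplying the score of
    pages from the preferred department by `boost`."""
    def biased_score(result):
        if result.department == preferred_department:
            return result.score * boost
        return result.score

    return sorted(results, key=biased_score, reverse=True)


results = [
    Result("https://example.edu/random/page-mentioning-umptysquat", 0.9, "other"),
    Result("https://example.edu/umptysquat/", 0.6, "umptysquat"),
]

# Without bias the random page wins; with bias the department page moves up.
ranked = apply_bias(results, "umptysquat")
print([r.url for r in ranked])
```

In this sketch the department page starts with the lower score but ends up first once its score is boosted, which is the effect biasing is meant to have on department home pages.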

Issues with the new search appliance

We are still learning how to use it. These are some areas we may need help with.

  1. Presence of your site. Our WWU site is extremely decentralized, with hundreds of completely separate sub-sites and dozens of servers, some non-central and a few even off campus, so we cannot count on reaching every site by starting at a single page and crawling links. Google gives us the ability to assign multiple places to start the crawl, and we’re trying to make sure we’ve included all the servers not administered by ITS. (We think we got all of ours, but even that is quite a list!) If you’re not seeing pages from your server in the search results, send us an email with the server name and start URL and I’ll check whether we’ve included it.
  2. Rank of your site. The algorithm on the GSA reflects Google’s newest work, whereas the old search used algorithms that had not been updated in years. Both algorithms are proprietary, so we don’t know exactly what changed, just that a lot did. As a result, Google will put things high on the list that didn’t use to be there, and vice versa. We will bias for department home pages, but probably not for sub-pages. If you think a site homepage is inappropriately low in the list, let me know what search term you used and what site you would expect to show up high in the list.
  3. Name of your page. Fix your own page titles! The title is the <title> tag in your web page, and it is what shows up as the name in the search results. So as not to embarrass anyone (we all have some pages like this), I’ll reveal one of mine. The (bad) title in blue below at least gives an indication of department, but nothing else. There are also lots of “Untitled” pages out there. Please make sure that the page title doesn’t just say, for example, “curriculum” but says “Basketweaving 101: Curriculum” or something of the sort. Be complete so that someone scanning the results will know what the page is about.
[Image: sample missing title tag]
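
If you maintain many pages, a small script can help you find the ones with weak titles. The following is a minimal sketch in Python using only the standard library. The `check_title` function and the `GENERIC_TITLES` list are made up for this example and are not part of the GSA or any campus tool; adjust the list to the generic titles you actually see on your site.

```python
from html.parser import HTMLParser

# Hypothetical list of titles that say too little on their own.
GENERIC_TITLES = {"", "untitled", "untitled document", "home", "curriculum"}


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None  # stays None if the page has no <title> tag

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_title(html):
    """Return 'missing', 'too generic', or 'ok' for a page's title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    if parser.title is None:
        return "missing"
    title = " ".join(parser.title.split())  # collapse stray whitespace
    if title.lower() in GENERIC_TITLES:
        return "too generic"
    return "ok"


print(check_title("<html><head><title>Curriculum</title></head></html>"))
print(check_title(
    "<html><head><title>Basketweaving 101: Curriculum</title></head></html>"
))
```

The check is deliberately simple: it only catches missing titles and titles on the generic list. A title can pass this check and still be unhelpful, so it is worth reading through the results yourself.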