Always open for the right person…
(all disciplines: dev, test, PM)
Data Mining
Data Mining is a hot new area, and we want a talented, highly motivated
individual to join our growing Data Mining group. With all the data from the
World Wide Web, we have endless potential to uncover patterns that help us improve
our search engine, delight our customers, and confound our competition. This is
an opportunity to use all kinds of leading-edge technologies, including
machine learning (neural networks, support vector machines, hidden Markov
models, etc.), natural language software, parallel processing, and very large
databases.
You must be highly customer-focused and have several of the following
qualifications: proven experience with C++; proven experience with
object-oriented design; very solid coding/debugging skills; solid algorithmic
skills; knowledge of SVMs, neural networks, HMMs, decision trees, etc., with
in-depth knowledge of, and practical experience with, at least one or two of
these; demonstrated success at dealing with ambiguous problems; and the ability
to make solid progress when the solution is not well defined. Actual experience
doing data mining is desirable but not required. Basic knowledge of SQL is
required. Master's in CS or equivalent.
Spam Busting
Spam is one of the top killers of relevance for any search engine. Try the
queries valentine, big island, or cialis to see this firsthand. Low-quality,
low-relevance sites use several techniques to spam their way into search results.
Spam adversely affects all areas of search: crawling, data extraction,
link analysis, and results ranking.
We need someone to kill spam dead. As in the data mining area, this is an
opportunity to use all kinds of leading-edge technologies, including
machine learning (SVMs, etc.), parallel processing, and graph theory, to tame
this problem. The ideal candidate will combine strong software engineering skills
with a solid background in one or more of these disciplines. Enthusiasm for
reviewing the latest research, inventing new techniques, and doing rigorous
experimental validation is required.
Results Ranking
The Ranking team develops the components that predict in a fraction of a second
which of our 5 billion web documents will best answer a user's query. It is one
of the highest-impact and most technically challenging projects you will find
anywhere in our industry. In collaboration with Microsoft Research, we explore
cutting-edge techniques from statistics, information retrieval, machine
learning, and computational linguistics to attack this problem. The ideal
candidate will combine strong software engineering skills with a solid
background in one or more of these disciplines. Enthusiasm for reviewing the
latest research, inventing new techniques, and doing rigorous experimental
validation is required. Foreign language skills are also a plus.
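The posting doesn't say how the ranker scores documents. As a point of reference from the information retrieval literature it mentions, here is a minimal sketch of Okapi BM25, a classic term-weighting ranking function. This is a swapped-in textbook technique, not the team's ranker; the parameter values k1=1.2 and b=0.75 are common defaults, and the three toy documents are made up for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25.

    Textbook formulation, assumed here for illustration; k1 controls
    term-frequency saturation and b controls document-length normalization.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many docs contain each term at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF with +1 inside the log so it never goes negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores


# Toy corpus and query (hypothetical data).
docs = [
    "big island hawaii travel guide".split(),
    "valentine gift ideas".split(),
    "hawaii big island volcano tours big island".split(),
]
query = "big island".split()
scores = bm25_scores(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [2, 0, 1]
```

The third document repeats both query terms, so it outranks the first even after the length penalty; the second shares no terms and scores zero.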
Question Answering
Do you want a search engine that can answer your questions instead of returning
a list of documents? This is one of our most technically ambitious projects, and
we need a few exceptional SDEs who can make it a success. In collaboration with
Microsoft Research, we will take a promising prototype and add innovations that
dramatically improve its accuracy, coverage, and language portability. In
addition to strong software engineering skills, the ideal candidate will have
strengths in statistics, machine learning, and/or computational linguistics.
Foreign language skills are also a plus.
Web Structure Analysis
The web has a complex structure that gives us valuable information about the
popularity and authoritativeness of documents in our search index. Success in
web search depends on harvesting as much information as we can from this massive
source of data. This area is filled with fascinating technical challenges from
distributed graph algorithms to pattern recognition, and because people are
constantly trying new ways to manipulate search engine rankings, there are
always new challenges. Candidates for this position should have a strong
software engineering and computer science background that includes graph theory,
distributed computing, performance optimization, probability and statistics, and
machine learning.
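The posting doesn't describe how link structure is turned into popularity or authority signals. A classic starting point is PageRank (named in the multimedia section of this page alongside MSNRank); here is a minimal power-iteration sketch on a toy four-page graph. The damping factor of 0.85, the uniform redistribution of rank from pages with no out-links, and the graph itself are textbook assumptions, not a description of MSN Search's link analysis.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a dict mapping page -> list of out-links."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each page passes its damped rank equally to the pages it links to.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy web: every page links to "c", so it should rank highest.
web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page "d" has no in-links, so it keeps only the baseline teleport share of (1 - 0.85) / 4.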
Enabling Engineering Excellence
We are a growing team with a growing v1 codebase. We are looking for someone
to help us build tools to ensure that this
is the best engineering team at Microsoft. You will be responsible for managing
all aspects of our engineering excellence work -- laying out our future source
management strategy, our build infrastructure, our branching methodology, the
whole works. You will take pride in raising development efficiency across the
team, and in being the enabler of great search technology.
You must have prior experience working in a world-class build environment. You
are a Perl and SD gearhead. You take pride in your scripts and in your ability
to tame complex dependency problems.
Grepping the web
The index serve team is chartered with doing the 'search' in web search: doing
it faster than a grep on a local file, for thousands of queries a second, over
billions of documents spread across thousands of servers. Help us create,
refine, innovate on, and deploy the software that delivers answers to users
quickly and reliably. We are responsible for the infrastructure that makes it
possible to reliably and efficiently manage and process hundreds of terabytes
of information. Along with query serving, this team also provides the platform
that supports relevance and data mining.
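Why can serving beat a grep? grep scans every byte on every query, while a search engine typically precomputes an inverted index that maps each term to a sorted list of document IDs (a posting list), so a query only touches the postings for its own terms. Here is a minimal in-memory sketch with AND semantics, assuming whitespace tokenization; sharding across servers, posting compression, and ranking are all omitted, and nothing here describes the team's actual index format.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to a sorted list of IDs of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(a, b):
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(index, query):
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the shortest posting list to keep intermediate results small.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


# Hypothetical documents for illustration.
docs = [
    "Big Island volcano tours",
    "Valentine gift ideas",
    "big island hotels and big island beaches",
]
index = build_index(docs)
print(search(index, "big island"))  # [0, 2]
```

Building the index is a one-time cost paid offline; each query then costs roughly the length of its shortest posting lists rather than the size of the corpus.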
You have at least a BS in CS or equivalent, several years of software
development experience, and a solid background in software development on
multithreaded, high-scale server systems. You should be comfortable working on
an ambitious, first-generation project with rapid development iterations and
high reliability and performance standards.
Running the supercomputer
The Autopilot team builds the infrastructure for MSN Search and other
distributed applications. The main challenge is turning unreliable hardware and
software into a reliable cluster with 99.9% uptime, only 9-to-5 operations
support, and fewer than one operations person per 1,000+ machines.
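To make the 99.9% target concrete, an availability figure translates directly into a downtime budget. The sketch below does that arithmetic; it assumes a 365-day year and a 30-day month, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_budget_hours(availability, period_hours=HOURS_PER_YEAR):
    """Maximum downtime, in hours, allowed over a period at a given availability."""
    return (1.0 - availability) * period_hours


yearly = downtime_budget_hours(0.999)
monthly_minutes = downtime_budget_hours(0.999, period_hours=30 * 24) * 60
print(f"{yearly:.2f} h/year, {monthly_minutes:.1f} min per 30-day month")
# 8.76 h/year, 43.2 min per 30-day month
```

With operators on hand only during business hours, most of that budget has to be defended by automated detection and repair rather than by people.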
Here are some of the problems you would help solve: early detection of hardware
and software failures; performance monitoring and analysis across large numbers
of computers; distributed application scheduling and load balancing; and
messaging and data transfer protocols. Bottom line: we want to build a system
that lets 10,000 commodity PCs work as a supercomputer.
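One common approach to the first of those problems, early failure detection, is a heartbeat timeout: each machine reports in periodically, and a monitor flags any machine it hasn't heard from recently. The sketch below is a minimal, single-process illustration under that assumption; the class name, machine names, and 30-second timeout are hypothetical, and the posting doesn't say how Autopilot actually detects failures.

```python
import time


class HeartbeatMonitor:
    """Flags machines whose most recent heartbeat is older than a timeout.

    A timeout can't tell a dead machine from a slow network, so a real
    system would treat these as suspects to probe, not confirmed failures.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so tests can use a fake clock
        self.last_seen = {}

    def heartbeat(self, machine):
        """Record that `machine` reported in just now."""
        self.last_seen[machine] = self.clock()

    def suspected_failures(self):
        """Return machines silent for longer than the timeout, sorted by name."""
        now = self.clock()
        return sorted(
            m for m, t in self.last_seen.items() if now - t > self.timeout_s
        )


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_s=30.0, clock=lambda: fake_now[0])
monitor.heartbeat("node-001")
monitor.heartbeat("node-002")
fake_now[0] = 20.0
monitor.heartbeat("node-002")
fake_now[0] = 45.0
print(monitor.suspected_failures())  # ['node-001']
```

Injecting the clock keeps the detector testable; the timeout trades detection speed against false alarms from transient network delays.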
You have at least a BSCS or equivalent (MS/PhD preferred); 3 years of software
development experience using C, C++, or C#; a deep understanding of
object-oriented design; and practical experience dealing with ambiguous problems.
Handcrafted results
When all else fails, and the ranking algorithms do not pass the confidence
threshold, we fall back to delivering handcrafted results. Working on a team of
approximately 132 other handcrafters in 26 worldwide markets, you will receive a
user query, use all the available search engines to quickly scour the web for
results, pick the top 10 results for this query, and send them on to the user.
Successful handcrafters can typically find the top 10 results for a real-time
user’s query in less than 3.8 seconds. This is an opportunity to truly connect with
customers, because the queries that get routed to you are precisely the ones
that the engine cannot answer well. We will have adequate staffing to allow
generous coffee and bathroom breaks.
If you are an expert at using at least 3 different search engines, are well
versed in American English and colloquial usage, and can type at more than 149
words per minute as measured by the Simia-Lico method, come join us and delight
users in real time!
Multimedia search
Large-scale, web-based multimedia search (images, video, audio) has been stagnant
for five years, so there is an opportunity to leapfrog the competition. In this
area, ranking and relevance are very challenging because:
- web page static rank analysis (e.g. PageRank, MSNRank) doesn’t help much
- user input is still a text search box, but the target content is not text
- the UI experience is difficult because of copyright limits on direct linking
  from result pages to multimedia content
- text descriptions of multimedia content on the web are notoriously bad
- there is no good understanding of what users really want
- the list goes on…
If you are capable of working on an ambiguous pre-v1 project that touches all
parts of the search engine, and you want to enable killer multimedia search that
is very cheap to operate, then this is the job for you!
© 2007 Microsoft Corporation. All rights reserved.