Data Mining in the Presence of Quantitatively and Qualitatively Diverse Information
IDM-0415190
Anne M. Denton <anne.denton@ndsu.edu>
North Dakota State University
Homepage: http://www.cs.ndsu.nodak.edu/~adenton/
Abstract
Real data often show a more complex structure than is assumed in much of statistics, machine learning, and data mining. Objects may be characterized by diverse types of information such as numerical quantities, text, and properties of a network neighborhood. The goal of the project is to develop techniques to integrate information components that differ both quantitatively and qualitatively.
Classification algorithms that are based on homogeneous attributes can be evaluated exclusively by their overall classification quality. In the presence of qualitatively and quantitatively diverse information, the space of all relevant combinations of techniques and parameters is too large to be evaluated with any reasonable amount of test data. Three goals are pursued:
- Define intermediate, homogeneous attributes that allow effective use of uniform classification and clustering techniques.
- Develop robust criteria that allow identification of suitable intermediate attributes and do not rely exclusively on overall classification accuracy.
- Develop efficient and effective approaches to generate intermediate attributes from data with network connectivity, time-dependent data, and text, among other types of data.
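As a rough illustration of the first goal, the sketch below maps three qualitatively different kinds of information (a numerical quantity, free text, and a network neighborhood) onto one homogeneous table of binary attributes that a single uniform classification or clustering technique could consume. It is not the project's method: the toy objects, the keyword list, the median cutoff, and the function name `intermediate_attributes` are all assumptions made for this example.

```python
from statistics import median

# Hypothetical toy objects (e.g. genes); all values are invented for illustration.
objects = {
    "g1": {"expression": 2.5, "text": "kinase involved in cell cycle", "neighbors": ["g2", "g3"]},
    "g2": {"expression": 0.4, "text": "membrane transport protein", "neighbors": ["g1", "g4"]},
    "g3": {"expression": 1.1, "text": "cell cycle regulator", "neighbors": ["g1"]},
    "g4": {"expression": 0.9, "text": "unknown function", "neighbors": ["g2"]},
}

# Assumed vocabulary; a real application would have to choose or learn it.
KEYWORDS = ["kinase", "cell cycle", "transport"]


def intermediate_attributes(objects, keywords):
    """Return {object name: {attribute name: 0 or 1}} with identical keys per row."""
    # Numerical information: 1 if the value lies above the median of all objects.
    cutoff = median(o["expression"] for o in objects.values())
    high = {name: o["expression"] > cutoff for name, o in objects.items()}

    table = {}
    for name, o in objects.items():
        row = {"high_expression": int(high[name])}
        # Text information: 1 if the keyword occurs in the description.
        for kw in keywords:
            row["text:" + kw] = int(kw in o["text"].lower())
        # Network information: 1 if any direct neighbor has high expression.
        row["neighbor_high_expression"] = int(any(high[n] for n in o["neighbors"]))
        table[name] = row
    return table


table = intermediate_attributes(objects, KEYWORDS)
for name, row in table.items():
    print(name, row)
```

Every row has the same binary columns, so the choice of derived attributes can be examined separately from the choice of classifier, which is the separation the second goal asks for.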
Starting with a specific classification problem in bioinformatics, the project attempts to find solutions that are applicable to a wide range of data mining problems. The work is ideally suited to teaching students a broad range of research activities, from fundamental concepts to applications, in both thesis and course work. Results will be relevant to a large number of practical applications in bioinformatics and other sciences.