COMPARATIVE STUDY OF LEARNING FROM IMBALANCED DATA
CHAPTER ONE
INTRODUCTION
1.1 Background to the Study
In recent years, information and its transformation into knowledge have become crucial as more and more data is generated in real-world situations. This growth is changing how services are provided, because the emphasis falls on using predictive analytics and other advanced methods to extract value from such data rather than on the size of any particular data set. Machine learning is the scientific discipline that explores the construction and study of algorithms that can learn from data. Such algorithms operate by building a model from example inputs and using it to make predictions or decisions, rather than following strictly static program instructions. Machine learning has become one of the mainstays of information technology and, with that, a rather central, albeit usually hidden, part of our lives. With ever-increasing amounts of data becoming available, there is good reason to believe that smart data analysis will become even more pervasive as a necessary ingredient for technological progress.
With this rapid growth, several difficult real-world machine learning problems have emerged. Many of these problems are characterized by imbalanced learning data, where at least one class is under-represented relative to the others. Examples include (but are not limited to) fraud and intrusion detection, medical diagnosis and monitoring, bioinformatics, and text categorization. The imbalanced learning problem has drawn a significant amount of interest from academia, industry, and government funding agencies. The fundamental issue with the imbalanced learning problem is the ability of imbalanced data to significantly compromise the performance of most standard learning algorithms. Most standard algorithms assume or expect balanced class distributions or equal misclassification costs.
Therefore, when presented with complex imbalanced data sets, these algorithms fail to properly represent the distributive characteristics of the data and, as a result, provide unfavorable accuracies across the classes of the data. When translated to real-world domains, the imbalanced learning problem represents a recurring problem of high importance with wide-ranging implications, warranting increasing exploration.
On this basis, this project seeks to provide a detailed comparative study of the current understanding of the imbalanced learning problem and of the state-of-the-art solutions created to address it. It covers ensembles that address class imbalance and the assessment metrics for imbalanced learning, and it highlights the major opportunities and challenges for learning from imbalanced data.
1.2 Statement of the Problem
In recent years, the problem of imbalanced data has been recognized as a crucial problem in data mining and machine learning. It occurs when there are significantly fewer training instances of one class than of another, and it is often associated with asymmetric costs of misclassifying elements of different classes. Additionally, the distribution of the test data may differ from that of the learning sample, and the true misclassification costs may be unknown at learning time. The problem with class imbalance is that standard learners are often biased towards the majority class, because these classifiers attempt to reduce global quantities such as the error rate without taking the data distribution into consideration. Although much awareness of the issues related to data imbalance has been raised, many of the key problems remain open and are in fact encountered more often, especially when working with massive data sets. In this project, we concentrate on the two-class case.
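As an illustration of this bias, the following minimal Python sketch shows how a learner that only minimizes the global error rate can look accurate while ignoring the minority class entirely. The data, the class labels `neg` and `pos`, and the helper names are hypothetical and are not taken from the project's experiments.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return a classifier that always predicts the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _features: majority_label


def error_rate(y_true, y_pred):
    """Fraction of examples whose predicted label differs from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of truly positive examples that were predicted as positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predictions_for_positives:
        return 0.0
    hits = sum(1 for p in predictions_for_positives if p == positive)
    return hits / len(predictions_for_positives)


# Hypothetical two-class data: 95 majority ("neg") and 5 minority ("pos") examples.
labels = ["neg"] * 95 + ["pos"] * 5
predict = majority_class_baseline(labels)
predictions = [predict(None) for _ in labels]

baseline_error = error_rate(labels, predictions)
minority_recall = recall(labels, predictions, "pos")
print(f"error rate: {baseline_error:.2f}, minority recall: {minority_recall:.2f}")
```

On this toy data the trivial classifier reaches a low error rate even though it never identifies a single minority example, which is why the error rate alone can be misleading under class imbalance.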
1.3 Objectives of the study
In this project, we seek to:
- Provide a survey of the current understanding of the imbalanced learning problem and the state-of-the-art solutions created to address this problem.
- Identify and state crucial real-world problems involving imbalanced data.
- Provide strategies for dealing with data in imbalanced domains.
- Provide a critical review of the innovative research developments targeting the imbalanced learning problem.
- Stimulate future research in this field, highlighting the major opportunities and challenges for learning from imbalanced data.
- Comparatively study and determine the most efficient algorithm for learning from imbalanced data.
- Present suggested methods for comparing and evaluating the performance of different imbalanced learning algorithms.
1.4 Significance of the study
With the constant expansion of data availability in many large-scale, complex, and networked systems, such as surveillance, security, Internet, and finance, it becomes critical to advance the fundamental understanding of knowledge discovery and analysis from raw data to support decision-making processes. Hence a great deal of attention needs to be devoted to the imbalanced learning problem. Given the high activity of advancement in this field, remaining knowledgeable of all current developments can be an overwhelming task. Because the field is relatively young and expanding rapidly, consistent assessments of past and current work, together with projections for future research, are essential for long-term development. In this work, we analyze the imbalanced learning problem, which is concerned with the performance of learning algorithms in the presence of underrepresented data and severe class distribution skews, and we provide a comprehensive review of the development of research in learning from imbalanced data. Our focus is to provide a critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario. Furthermore, in order to stimulate future research in this field, we also highlight the major opportunities and challenges, as well as potentially important research directions for learning from imbalanced data.
1.5 Scope of the study
The study is restricted to the nature of imbalanced data and provides a comparative study of learning schemes for learning from imbalanced data. In broad terms, the scope also covers topics related to learning from imbalanced data, among them:
- Machine learning algorithmic approaches to learning from imbalanced data, such as decision trees (the Naïve Bayes Tree) and artificial neural networks (the Multilayer Perceptron).
- Machine learning performance evaluation measures.
- Performance and monitoring measures used in evaluating imbalanced data learning.
- Creation of the model that would be used for learning from imbalanced data.
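The evaluation measures mentioned above are typically derived from the confusion matrix of a two-class problem. The text does not name the specific measures at this point, so the following Python sketch is an assumption about which ones are meant: it computes accuracy, sensitivity, specificity, precision, F1, and the geometric mean (G-mean) for hypothetical predictions. All names and data are illustrative.

```python
import math


def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def imbalance_aware_metrics(y_true, y_pred, positive):
    """Return common confusion-matrix measures as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)

    def ratio(numerator, denominator):
        # Guard against empty denominators (e.g. no positive predictions).
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall on the positive (minority) class
    specificity = ratio(tn, tn + fp)   # recall on the negative (majority) class
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical test set: 90 majority ("neg") and 10 minority ("pos") examples.
y_true = ["neg"] * 90 + ["pos"] * 10
# Hypothetical predictions: 88 neg correct, 2 neg wrong, 6 pos correct, 4 pos wrong.
y_pred = ["neg"] * 88 + ["pos"] * 2 + ["pos"] * 6 + ["neg"] * 4

metrics = imbalance_aware_metrics(y_true, y_pred, positive="pos")
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the accuracy is high while the sensitivity on the minority class is much lower, which is the kind of gap that measures such as F1 and G-mean are meant to expose.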
1.6 Project Management
The work involved in the development of this project has been broken down into several steps and allocated accordingly. The details are contained in Appendix A.
1.7 Organization of the study
This study consists of the following chapters:
Chapter 1 – Introduction
This chapter introduces the entire report. It presents the historical background of the study, the rationale behind the work, and imbalanced data and learning from such data, and it gives the problem definition and the aims and objectives of the study.
Chapter 2 – Literature Review
This chapter carries out a detailed review of related studies, thereby establishing the theoretical framework upon which this research is built.
Chapter 3 – Research Methodology and Application
This chapter considers a few methodologies used in the analysis of imbalanced data, focusing on imbalanced data learning algorithms. Data sets from the KEEL repository with different imbalance ratios (IRs) are used.
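The imbalance ratio is commonly defined as the number of majority-class examples divided by the number of minority-class examples. The text does not spell out the definition, so this is an assumption. A minimal Python sketch under that assumption, with a hypothetical function name and data:

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return majority-class count divided by minority-class count.

    Assumes the common definition of IR; raises if fewer than two
    classes are present, since the ratio is undefined in that case.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("imbalance ratio needs at least two classes")
    return max(counts.values()) / min(counts.values())


# Hypothetical data set: 900 majority and 100 minority examples.
example_labels = ["neg"] * 900 + ["pos"] * 100
example_ir = imbalance_ratio(example_labels)
print(f"IR = {example_ir:.1f}")
```

A perfectly balanced two-class data set has an IR of 1, and larger values indicate a more severe skew.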
Chapter 4 – Implementation and Evaluation
In this chapter, two machine learning algorithms, the Naïve Bayes Tree and the Multilayer Perceptron, are implemented and evaluated for learning on imbalanced data sets, and evaluation metrics for the imbalanced data classification problem are provided. We present the experimental study carried out on the behavior of the algorithms and examine the use of non-parametric tests for statistical comparison of the classifiers' results. We also analyze the behavior of the best combination of components under different IR levels.
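The text does not say which non-parametric test is used. As one commonly used option for comparing two classifiers over several data sets, the Wilcoxon signed-rank test is sketched below in Python. This is an assumed choice, the scores are hypothetical, and only the rank sums are computed, not a p-value.

```python
def wilcoxon_signed_rank(scores_a, scores_b):
    """Return (R+, R-, W) for paired scores of two classifiers.

    R+ sums the ranks of data sets where A beats B, R- where B beats A,
    and W is the smaller of the two. Zero differences are discarded and
    tied absolute differences receive their average rank.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next absolute difference is tied.
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return r_plus, r_minus, min(r_plus, r_minus)


# Hypothetical per-data-set scores (in percent) for two classifiers.
classifier_a = [82, 75, 90, 68, 77, 85]
classifier_b = [80, 70, 91, 60, 74, 79]

r_plus, r_minus, w_statistic = wilcoxon_signed_rank(classifier_a, classifier_b)
print(f"R+ = {r_plus}, R- = {r_minus}, W = {w_statistic}")
```

The statistic W would then be compared against a critical value for the number of data sets, or passed to a statistics package, to decide whether the difference between the classifiers is significant.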
Chapter 5 – Discussion, Evaluation and Conclusion
This chapter gives a detailed summary of the results, draws conclusions and makes recommendations based on the findings, and offers suggestions for future research by other investigators in this or a related field.
1.8 Operational Definition
1.8.1 Concepts
Algorithm – A finite, step-by-step sequence of well-defined instructions used to solve a problem on a computer; a computational procedure that takes values as input and produces values as output in order to solve a well-defined computational problem.
Data – Numbers, characters, images, or other recorded information in a form that can be assessed by a human or (especially) input into a computer, stored and processed there, or transmitted over a digital channel.
Data Mining – An analytic process designed to explore data (usually large amounts of data, typically business or market related, also known as “big data”) in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data.
Imbalanced Data Set – A data set is imbalanced if its classification categories (classes) are not approximately equally represented.
Learning – The act of acquiring new knowledge, behaviors, skills, values, or preferences, or of modifying and reinforcing existing ones; it may involve synthesizing different types of information.
Machine – An apparatus using mechanical power and having several parts, each with a definite function, that together perform a particular task.
Machine Learning – A scientific discipline that explores the construction and study of algorithms that can learn from data and make decisions on unseen data based on what they have learned from previous data.
Mining – A term for the process of finding a small set of precious patterns in a great deal of raw material (big data).
Comparative – A comparative study is a research methodology that aims to make comparisons across different subjects, in this case the algorithms used in learning from imbalanced data.
Attribute – A piece of information that determines the properties of a field or tag in a database, or of a string of characters in a display.
1.8.2 Technology
Decision Tree – A predictive model that maps observations about an item to conclusions about the item’s target value. It is one of the predictive modelling approaches used in statistics, data mining and machine learning.
Cross Validation – Cross-validation, sometimes called rotation estimation, is a model validation technique for assessing how well the results of a statistical analysis will generalize to an independent data set.
Artificial Neural Network – A family of statistical learning algorithms inspired by biological neural networks (the central nervous systems of animals, in particular the brain) and used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks are generally presented as systems of interconnected “neurons” that can compute values from inputs and are capable of machine learning as well as pattern recognition. What makes them interesting is their adaptive nature.
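To make the cross-validation definition above more concrete, the following Python sketch splits a data set into k folds while keeping the class proportions roughly equal in every fold. This stratified variant is an assumption: it is often preferred for imbalanced data so that each fold retains minority examples, but the text does not state which variant the project uses. The function name and data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_indices, test_indices) for k stratified folds.

    Indices of each class are shuffled and dealt round-robin into the
    folds, so every fold gets a near-equal share of each class.
    """
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for class_indices in indices_by_class.values():
        rng.shuffle(class_indices)
        for position, index in enumerate(class_indices):
            folds[position % k].append(index)

    for test_fold in range(k):
        test = sorted(folds[test_fold])
        train = sorted(
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != test_fold
            for index in fold
        )
        yield train, test


# Hypothetical data set: 45 majority and 5 minority examples, 5 folds.
cv_labels = ["neg"] * 45 + ["pos"] * 5
splits = list(stratified_k_fold(cv_labels, k=5))
for fold_number, (train, test) in enumerate(splits, start=1):
    minority_in_test = sum(1 for i in test if cv_labels[i] == "pos")
    print(f"fold {fold_number}: {len(train)} train, {len(test)} test, "
          f"{minority_in_test} minority in test")
```

A model would be trained on each `train` split and evaluated on the matching `test` split, with the per-fold scores averaged at the end.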
1.8.3 Tools
KEEL (Knowledge Extraction based on Evolutionary Learning) – An open source (GPLv3) Java software tool that enables the user to assess the behavior of evolutionary learning and soft computing based techniques for different kinds of data mining problems: regression, classification, clustering, pattern mining and so on.
Data set – A data set corresponds to the contents of a single database table, or a single statistical data matrix, where every column of the table represents a particular variable and each row corresponds to a given member of the data set in question. The data set lists values for each of the variables, such as the height and weight of an object, for each member of the data set.
WEKA (Waikato Environment for Knowledge Analysis) – WEKA is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a data set or called from your own Java code. WEKA contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
1.9 Conclusion
Machine learning is growing and expanding at a very rapid pace. Its importance and remarkable growth come from combining collaborative activities with sophisticated pattern recognition and with intelligent, self-modifying and self-learning decision making, which has enabled more flexible and powerful computing. This chapter has given an overview and preliminary study of learning from imbalanced data sets and of evaluating the algorithms that help in the learning process.