Topic: Data Mining Proposal - Biomedical and DNA Data Analysis
This paper presents a proposal for applying data mining techniques and procedures to gain deep insight into the biomedical and DNA data of the Cancer Research Center. The SNP data, microarray data and patient records are to be collected from the patients' clinical data and from various demographic databases. Because of the rapid expansion of biomedical research, a huge number of patterns and functions of different genes have to be analyzed and studied. A strategy, schedule and plan will be constructed to satisfy the objectives, and thereby support the development of better cancer treatment by identifying the genes that play a major role in the development of the disease (Tabernero et al 2015).
Cancer is a disease in which cells grow abnormally and the growth cannot be controlled easily. The disease starts when changes in genes cause certain cells to grow and multiply rapidly. The cells grow through normal tissues, a process known as invasion. The disease harms the body when the cells multiply without limit and form masses or lumps of tissue known as tumors. The most dangerous stage is reached when cancerous cells travel through the body via the lymphatic system or the blood and destroy tissues elsewhere, a process known as metastasis. Moreover, such cells continue to grow and divide, and they form new blood vessels to sustain themselves through a process known as angiogenesis (Zolbanin, Delen and Zadeh 2015).
According to a recent survey by the American Cancer Society, cancer is one of the most common causes of death in the US, accounting for about 1 in every 4 deaths. A survey carried out by the World Health Organization (WHO) in 2015 estimated about 14 million new cancer cases and about 8.2 million cancer deaths worldwide. There are about 200 types of cancer, and it is one of the major causes of mortality and morbidity across the world. The cost of treating the various kinds of cancer is high, estimated at about US$263.9 billion. Treatment options depend on the stage of the disease. The main goal of treatment is to kill the cancerous cells while reducing the damage to normal cells. The disease can be treated through surgery, radiation therapy or chemotherapy. Other therapies that can be used in the treatment of cancer include nutrition therapy, oncology rehabilitation, naturopathic medicine and mind-body medicine (Braha 2013).
Cancer stem cells are responsible for tumorigenesis and contribute to resistance against cancer therapy. They have been isolated from several kinds of human tumors, in diseases such as prostate cancer, brain cancer, leukemia and ovarian cancer. Cancer stem cells can differentiate, regenerate and maintain the growth of a tumor, and they therefore play an important role in tumorigenesis and in various kinds of therapeutics. This research proposal highlights the use of relevant data mining techniques to discover important variances in the genetic sequence that can be related to the disease, leading toward a cure (Azad et al 2015).
The data collection programme involves gathering DNA microarray data and SNP data from various DNA samples. The data will be collected from the patient database and from the clinical records of patients' visits to the Cancer Research Center. DNA microarray data measure the expression of genes at a given moment. The genes are located on the genome and are responsible for the synthesis of proteins (Heim 2015).
The collection programme will target a balanced sample of patients. Cancer is detected in young as well as elderly patients. The incidence of cancer increases with age, and it is a particular challenge for elderly patients. As life expectancy increases, many countries have a growing number of older cancer patients, who are often affected by other conditions as well. The incidence of cancer in people above the age of 65 is reported to be 10% higher than in people below the age of 65 (Dawson et al 2013).
Age is considered a major factor in the diagnosis of cancer; among men, young men show a higher tendency toward the disease, especially prostate cancer. The Surveillance, Epidemiology, and End Results (SEER) program was used to identify the patients in the study.
The primary objective of the study is:
The knowledge gathered from this study will benefit biochemical organizations and biochemists, and it will help in the development of treatment for cancer. The study will provide information about the genes involved and will support future research programmes carried out by others. The most important beneficiaries will be the scientific and genetic researchers who are conducting thorough experiments on the detection of this dangerous disease.
The proposal sets out judgement-based and other measurable criteria, which underpin the key performance indicators (KPIs) used to set the qualitative targets. These criteria will also help to develop, maintain and support the primary objectives stated in the document.
The collection of microarray data, SNP data and patient records will provide the data source for the analysis. The data collection programme aims to collect data from 200 samples and is expected to be completed within the scheduled time frame.
Personnel: 1 oncologist (consultant), 2 data miners, a project manager, a research liaison officer and administrators.
The oncologist will be consulted periodically to ensure that the data are analyzed properly and that the data miners do not make invalid assumptions on their own.
The research liaison officer will ensure that all efforts are communicated properly to the team members. The project should be funded by the government, and the project manager will act as a mediator.
The project manager must ensure that the work is carried out properly within the scheduled budget and time frame. The other typical responsibilities associated with the project will also be assigned to the project manager. The office administrator will ensure that the business is carried out properly and according to the business requirements. The software used will be the data mining tools and techniques evaluated in the prototyping phase.
The aim of the data mining project, as a scientific endeavor, is to find a solution for the treatment of cancer. The data collection programme has been stated earlier and requires no further explanation. Testing is done in the laboratory, where the laboratory workers can observe how further research on animals will be carried out. This laboratory work will help in the development of drugs for the treatment of cancer (Ferlay et al 2015).
The project is exploratory, and the main risk to be managed is that it fails to provide proper data. The external risk connected to the project is incorrect use of the data collection programme. This risk is to be managed by properly validating the data using different kinds of samples. It must also be checked that competitors do not deliver their results before the completion of the project, and that inappropriate data are not used in the project, because they will not yield proper results.
The platform that will support the planning and execution of the data mining process for cancer is cBioPortal. This software will help in modeling and formulating the different data mining tasks. The project will be carried out in phases, and proper planning is necessary because each phase works at a different level. The software will also help in exploring a large number of data sets from various cancer patients.
Importance of data mining
The analysis of high-dimensional and complex data, such as DNA microarray data and SNP data, requires sophisticated data modeling and learning algorithms to interpret and decipher the data used in prediction. The clinical parameters of the patients, which are collected in different genomic data sets, must be given more importance. Classifications must be made on the genomic data, which are often used to build the biological database of the patients. The microarray data cover more than a thousand genes, while only a limited number of samples are available in the study.
The major objective of data mining here is to identify the genes that are destroying cells in the human body. The data mining process will also help to collect, store, process and access the data set. This process has helped in addressing the problems of cancer and in carrying out successful research into its treatment (Schlötterer et al 2014). The DNA microarray data will be used to study the variance in the expression of the genes, which will help to identify the genes that play an important role in the disease. A DNA microarray consists of a thin glass slide on which the gene samples are spotted by a printing device. A different data mining approach is used to identify the gene structure, which may lead to the destruction of the germ-carrying cells (Freitas 2013).
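As a minimal sketch of how variance in gene expression could be used to shortlist candidate genes, the Python snippet below ranks genes by the sample variance of their expression values across samples. The gene names, the toy values and the function `rank_genes_by_variance` are hypothetical illustrations and not part of the proposed study; variance ranking is only one possible filter.

```python
from statistics import variance


def rank_genes_by_variance(expression, top_n):
    """Return the top_n (gene, variance) pairs, highest variance first.

    expression maps a gene name to its expression values across samples.
    """
    scored = [(gene, variance(values)) for gene, values in expression.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy data: three genes measured in four samples.
toy_expression = {
    "GENE_A": [2.0, 2.1, 1.9, 2.0],   # almost constant
    "GENE_B": [0.5, 3.5, 0.4, 3.6],   # strongly varying
    "GENE_C": [1.0, 1.5, 1.0, 1.5],   # mildly varying
}

top_genes = rank_genes_by_variance(toy_expression, top_n=2)
for gene, score in top_genes:
    print(gene, round(score, 4))
```

With more than a thousand genes and few samples, a simple filter of this kind is often applied before modeling so that the model is not fitted to every gene at once.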
Data exploration/Prototyping
A prototype is a draft version that demonstrates the thinking behind the development of the project. The team will prototype a wide range of techniques and tools that can be applied to the problem (Rokach and Maimon 2014). The prototyping phase is used to evaluate the tools and techniques used in the model. This phase will also be guided by previous research on similar topics, since other researchers may have used different techniques for collecting data, mainly microarray data. Performance constraints will also be extrapolated from other samples to ensure the feasibility of the project (Murtaza et al 2013).
It is very important to verify the quality of the data. Efforts must be made to check that the data are of good quality, that there are no missing values, and that all the participants are covered in the sample.
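The quality checks described above can be sketched in a few lines of Python. The record layout, the field names (`patient_id`, `age`, `snp_rs1`) and the function `quality_report` are assumptions made for illustration; the actual clinical schema is not specified in this proposal.

```python
def quality_report(records, required_fields, expected_ids):
    """Count missing values per field and list expected patients with no record."""
    missing_counts = {field: 0 for field in required_fields}
    for record in records:
        for field in required_fields:
            # An absent key, None, or an empty string counts as missing.
            if record.get(field) in (None, ""):
                missing_counts[field] += 1
    seen_ids = {record["patient_id"] for record in records}
    uncovered_ids = sorted(set(expected_ids) - seen_ids)
    return {"missing_counts": missing_counts, "uncovered_ids": uncovered_ids}


# Hypothetical toy records; P4 is expected but has no record at all.
toy_records = [
    {"patient_id": "P1", "age": 71, "snp_rs1": "AG"},
    {"patient_id": "P2", "age": None, "snp_rs1": "GG"},
    {"patient_id": "P3", "age": 48, "snp_rs1": ""},
]

report = quality_report(toy_records, ["age", "snp_rs1"], ["P1", "P2", "P3", "P4"])
print(report)
```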
Selection of the data
There are various opportunities to refine the strategy for selecting the attributes of the data. These data are used to build the required model. Improper and redundant data must not be used, as they will not deliver proper results. The most important aim of this study is to find the genes that play a vital role in identifying the particular disease. The data collected from the clinical source will serve the study well (Gao et al 2013).
The unstructured clinical data will be cleaned and flattened. Missing values in the original data will be filled in at this step, which is a main responsibility of the project.
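One simple way to fill missing numeric values is median imputation, sketched below in Python. The proposal does not say which filling strategy will be used, so the choice of the median, the field name `age` and the function `impute_median` are assumptions for illustration only.

```python
from statistics import median


def impute_median(records, field):
    """Return copies of the records with missing `field` values set to the median."""
    observed = [r[field] for r in records if r.get(field) is not None]
    if not observed:
        raise ValueError(f"no observed values for {field!r}")
    fill_value = median(observed)
    filled = []
    for r in records:
        copy = dict(r)  # leave the original record untouched
        if copy.get(field) is None:
            copy[field] = fill_value
        filled.append(copy)
    return filled


# Hypothetical toy records with one missing age.
toy_records = [
    {"patient_id": "P1", "age": 60},
    {"patient_id": "P2", "age": None},
    {"patient_id": "P3", "age": 70},
]

filled_records = impute_median(toy_records, "age")
print(filled_records[1])
```

Returning copies keeps the original records available, so the filled values can always be told apart from the observed ones.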
Data formatting
Tools will be used to transform the data into formats other than the original one, without altering the content of the data sets. The data must not be split into output attributes or binary attributes, but some amount of binning is necessary to transform the data (Holzinger, Dehmer and Jurisica 2014).
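Binning can be illustrated with a small equal-width scheme in Python. The choice of equal-width bins, the age range of 0 to 100, the number of bins and the function `equal_width_bin` are all assumptions for this sketch; the proposal does not fix a binning method.

```python
def equal_width_bin(value, low, high, n_bins):
    """Return the 0-based index of the equal-width bin that contains value."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if not low <= value <= high:
        raise ValueError("value lies outside the binning range")
    width = (high - low) / n_bins
    index = int((value - low) / width)
    # The upper bound itself belongs to the last bin.
    return min(index, n_bins - 1)


# Hypothetical example: patient ages binned into four 25-year bins.
ages = [0, 24, 25, 65, 100]
age_bins = [equal_width_bin(age, low=0, high=100, n_bins=4) for age in ages]
print(age_bins)
```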
Modelling
This is the main step of the data mining process, where the important deliverables are produced. The phase is specific to the techniques applied in the process; no modeling techniques beyond those used for training and validation are required to build the model.
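Training and validation require the samples to be separated before a model is built. The Python sketch below shows a simple hold-out split of 200 sample identifiers; the 25% validation fraction, the fixed seed and the function `train_validation_split` are assumptions for illustration, not choices made in this proposal.

```python
import random


def train_validation_split(sample_ids, validation_fraction=0.25, seed=0):
    """Shuffle the ids reproducibly and return (train_ids, validation_ids)."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)  # local generator, fixed seed
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical ids standing in for the 200 collected samples.
train_ids, validation_ids = train_validation_split(range(200))
print(len(train_ids), len(validation_ids))
```

Keeping the validation samples out of training gives an honest estimate of how the model behaves on unseen patients, which matters when samples are few and genes are many.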
Evaluation
In this phase, an evaluation will be carried out to check that the results produced help to meet the business requirements. It is also important to check that the model fulfills those requirements, and it must be revised if it does not. The process is therefore iterative.
Deployment
In this phase, it is important to generate the formal reports that will help the project to be carried out successfully.
Project Plan
The plan below is a schedule of the phases needed to carry out the data mining project successfully.
| Start date | Phase | Duration | Staff Involved |
| | Understanding the business process | 1 month | Project manager and project owner |
| | Understanding the data | 2 months | Everyone |
| | Preparation of the data | 15 days | Data miners |
| | Modeling | 2 months | Project managers, domain experts and data miners |
| | Evaluation | 15 days | Project managers, domain experts and data miners |
| | Deployment | 1 month | Data miners and project managers |
The entire duration of the project is 7 months.
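The 7-month total can be checked against the phase durations listed in the plan. The short Python check below assumes that 15 days counts as half a month; that conversion is an assumption, not something the plan states.

```python
# Phase durations in months; 15 days is taken as 0.5 month (assumption).
phases = [
    ("Understanding the business process", 1.0),
    ("Understanding the data", 2.0),
    ("Preparation of the data", 0.5),
    ("Modeling", 2.0),
    ("Evaluation", 0.5),
    ("Deployment", 1.0),
]

total_months = sum(duration for _, duration in phases)
print(total_months)
```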
References
Azad, A.A., Volik, S.V., Wyatt, A.W., Haegert, A., Le Bihan, S., Bell, R.H., Anderson, S.A., McConeghy, B., Shukin, R., Bazov, J. and Youngren, J., 2015. Androgen receptor gene aberrations in circulating cell-free DNA: biomarkers of therapeutic resistance in castration-resistant prostate cancer. Clinical cancer research, 21(10), pp.2315-2324.
Braha, D. ed., 2013. Data mining for design and manufacturing: methods and applications (Vol. 3). Springer Science & Business Media.
Dawson, S.J., Tsui, D.W., Murtaza, M., Biggs, H., Rueda, O.M., Chin, S.F., Dunning, M.J., Gale, D., Forshew, T., Mahler-Araujo, B. and Rajan, S., 2013. Analysis of circulating tumor DNA to monitor metastatic breast cancer. New England Journal of Medicine, 368(13), pp.1199-1209.
Ferlay, J., Soerjomataram, I., Dikshit, R., Eser, S., Mathers, C., Rebelo, M., Parkin, D.M., Forman, D. and Bray, F., 2015. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. International journal of cancer, 136(5).
Freitas, A.A., 2013. Data mining and knowledge discovery with evolutionary algorithms. Springer Science & Business Media.
Gao, J., Aksoy, B.A., Dogrusoz, U., Dresdner, G., Gross, B., Sumer, S.O., Sun, Y., Jacobsen, A., Sinha, R., Larsson, E. and Cerami, E., 2013. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Science signaling, 6(269), p.pl1.
Heim, S., 2015. Cancer cytogenetics: chromosomal and molecular genetic aberrations of tumor cells. John Wiley & Sons.
Holzinger, A., Dehmer, M. and Jurisica, I., 2014. Knowledge discovery and interactive data mining in bioinformatics-state-of-the-art, future challenges and research directions. BMC bioinformatics, 15(6), p.I1.
Murtaza, M., Dawson, S.J., Tsui, D.W., Gale, D., Forshew, T., Piskorz, A.M., Parkinson, C., Chin, S.F., Kingsbury, Z., Wong, A.S. and Marass, F., 2013. Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA. Nature, 497(7447), p.108.
Rokach, L. and Maimon, O., 2014. Data mining with decision trees: theory and applications. World scientific.
Romero, C. and Ventura, S., 2013. Data mining in education. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 3(1), pp.12-27.
Schlötterer, C., Tobler, R., Kofler, R. and Nolte, V., 2014. Sequencing pools of individuals–mining genome-wide polymorphism data without big funding. Nature reviews. Genetics, 15(11), p.749.
Tabernero, J., Lenz, H.J., Siena, S., Sobrero, A., Falcone, A., Ychou, M., Humblet, Y., Bouché, O., Mineur, L., Barone, C. and Adenis, A., 2015. Analysis of circulating DNA and protein biomarkers to predict the clinical activity of regorafenib and assess prognosis in patients with metastatic colorectal cancer: a retrospective, exploratory analysis of the CORRECT trial. The Lancet Oncology, 16(8), pp.937-948.
Weinstein, J.N., Collisson, E.A., Mills, G.B., Shaw, K.R.M., Ozenberger, B.A., Ellrott, K., Shmulevich, I., Sander, C., Stuart, J.M. and Cancer Genome Atlas Research Network, 2013. The cancer genome atlas pan-cancer analysis project. Nature genetics, 45(10), pp.1113-1120.
Zolbanin, H.M., Delen, D. and Zadeh, A.H., 2015. Predicting overall survivability in comorbidity of cancers: A data mining approach. Decision Support Systems, 74, pp.150-161.