Highly Advanced Deep Learning Research
Keywords: Deep Learning Theory/Methods, Advanced Machine Learning
Candidates: Students who want to publish papers in ICML/NIPS/ICLR. Students must have a strong programming background, reasonably good mathematical skills, and solid machine learning knowledge to begin with. Students with good knowledge of statistics, linear algebra, optimization, and neuroscience are preferred.
Introduction: Carnegie Mellon is not a place to follow others’ research ideas and make them slightly better. We are responsible for leading the rest of the world in research by bringing out innovative ideas and implementing them well. For example, making CNN/LSTM/GAN work better is only secondary-level research; we are interested in creating the next CNN/LSTM/GAN.
Specific ideas cannot be disclosed in this introduction, but rough directions include:
● Studying how the human brain processes language and trying to develop innovative replacements for LSTM and its variants. (Some may argue that LeCun has said we should not rely too much on brain structure. If you want to follow LeCun instead of replacing him, please consider Project II.)
● Studying advanced optimization methods and trying to replace SGD.
● Other deep learning models derived from advanced linear space models. (The impact of this direction may not be as significant as the previous two, but it is still enough for a decent publication in ICML/NIPS/ICLR.)
Deep Learning: Novel Methods and New Applications
Keywords: Deep Learning, Machine Learning, CNN, GAN
Candidates: Students with good programming skills and basic knowledge of deep learning. Experience with TensorFlow or PyTorch is preferred. This is the most popular project among students with entrepreneurial goals.
Introduction: Deep learning is a huge topic these days, and because of its strong representation power, people have applied it to many different domains. However, there are still many cool ideas to be mined. This project is designed to give students who want to learn about deep learning hands-on experience in using it to solve problems. All kinds of cool problems are welcome. Some specific examples, either already done or waiting to be done, include:
- Multimodal sentiment analysis
- Medical image recognition and segmentation
- Genomic Application (GWAS)
- Video understanding
- Music Generation
- Natural Language Understanding (Word vector/sentence vector)
- Image style transfer
Variable Selection, Confounder Correction, and other problems in High Dimensional Data
Keywords: statistics, variable selection, lasso, confounder correction, linear mixed model, high dimensional data, bioinformatics
Candidates: Students with solid statistics background. Statistics major students are preferred.
Introduction: This big data era has witnessed a rapid increase in the volume of data. Along with the increase in volume, the amount of information available for each data sample increases even faster, resulting in a high-dimensional regime. In the machine learning/statistics field, we typically call a problem high-dimensional when it has more features/variables than samples. Intuitively, one can imagine the difficulty of such problems, since there is not enough information to study all of these features/variables. Statistically, there are no widely accepted solutions to many aspects of these problems, and progress on these aspects may lead to impactful work.
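To make the "not enough information" point concrete, here is a minimal sketch in plain Python. The tiny data set and the names `X`, `y`, `beta_a`, and `beta_b` are made up for illustration: with two samples and three features, two very different coefficient vectors fit the data exactly, so the data alone cannot tell us which features matter.

```python
# Two samples, three features: more unknowns than equations.
# All values here are illustrative, not taken from any real study.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 1.0]


def predict(X, beta):
    """Return the linear predictions X @ beta for a list-of-rows matrix."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


beta_a = [1.0, 1.0, 0.0]  # explains y using features 0 and 1
beta_b = [0.0, 0.0, 1.0]  # explains y using feature 2 only

print(predict(X, beta_a))  # [1.0, 1.0]
print(predict(X, beta_b))  # [1.0, 1.0]
```

Both fits have zero error, which is why high-dimensional methods need an extra assumption, such as sparsity, to pick one solution.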
This project challenges students to study some of these difficult problems. For example, the introduction of Lasso greatly revived interest in the high-dimensional problem. It answered many related questions, but it also raised many more.
There are many new problems waiting to be solved, but they cannot be disclosed here. Some old projects that have been done in our lab are listed here as examples:
- Lasso is known to be inconsistent and unstable, so variable selection often underperforms expectations. A few remedies have been proposed, such as Adaptive Lasso, Elastic Net, and Precision Lasso, but these are not yet satisfactory.
- Another challenge concerns heterogeneous data: modern data sets are rarely collected under a consistent setting, so the data come from differing distributions. This heterogeneity makes reliable variable selection difficult, and some attempted solutions are based on the linear mixed model. Some extensions have been proposed as follow-ups, but there are still many opportunities to extend this work.
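As a rough illustration of Lasso-based variable selection in the high-dimensional setting, the sketch below fits a Lasso by cyclic coordinate descent in plain Python. It is a generic textbook-style implementation under stated assumptions, not the lab's method: the objective scaling (1/(2n)), the simulated data, the penalty `lam = 0.2`, and all function names are choices made for this example.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def objective(X, y, beta, lam):
    """Lasso objective: (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1."""
    n = len(X)
    sq_err = sum(
        (y[i] - sum(x * b for x, b in zip(X[i], beta))) ** 2 for i in range(n)
    )
    return sq_err / (2 * n) + lam * sum(abs(b) for b in beta)


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize the Lasso objective by cyclic coordinate descent from beta = 0."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Simulated data (an assumption for this sketch): 20 samples, 50 features,
# and only the first three features truly affect the response.
random.seed(0)
n, p = 20, 50
true_beta = [3.0, -2.0, 1.5] + [0.0] * (p - 3)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [
    sum(X[i][j] * true_beta[j] for j in range(p)) + random.gauss(0, 0.1)
    for i in range(n)
]

lam = 0.2
beta_hat = lasso_coordinate_descent(X, y, lam)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print("selected features:", selected)
```

Rerunning the fit on different subsamples or with a different `lam` is a simple way to see how much the selected set can change, which is the instability the first item describes.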
Classical Machine Learning Application: Computational Finance & Computational Biology
Keywords: machine learning, computational finance, computational biology
Candidates: Students with basic machine learning knowledge and programming skills. Students must be good at calculus, linear algebra, and statistics. Students of all majors are welcome. Students with CS major, stats major, finance major, or biology major are preferred.
Introduction: Machine learning is going to be one of the most important topics to know in this AI-dominated era. This project offers hands-on research experience for students to develop new machine learning methods for real-world problems. In particular, this project encourages students to solve real-world Computational Finance or Computational Biology problems because of the popularity these two areas have gained recently. The two areas have slightly different focuses:
Computational Finance:
- Focuses on applying existing methods to solve real-world financial problems.
- Requires more programming skills than mathematical skills.
- Only requires a limited amount of finance knowledge.
- Offers an internship opportunity at The Bank of New York Mellon if the students show excellent skills in this topic.
Computational Biology:
- Focuses on developing new methods for new problems.
- Requires more mathematical skills than programming skills.
- Sometimes requires a deep understanding of biology.
- Offers an internal referral to the CMU MSBIC program if the students show excellent skills in this topic.
Visualization for Genomic Study Results
Keywords: HCI, Visualization, Genomic Study, GUI
Candidates: HCI students. Basic programming skills are needed; basic college-level statistics knowledge is preferred.
Introduction: Genomic studies are one of the most important research areas, and a significant number of scientists spend their time on them. Many of these studies end with publications in Nature if presented clearly. However, because the problems themselves are incredibly complicated, it’s not easy for scientists to report their discoveries clearly even after they have made significant ones. We aim to help them.
This project involves two steps:
- We study the existing visualization techniques provided by tools like LocusZoom and implement them conveniently as a standalone Python project.
- We study the problems with these existing visualizations and propose new ones to replace them, with the aim of reducing scientists’ labor in presenting their results and illustrating them clearly.
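As a starting point for the first step, the sketch below prepares the coordinates for a Manhattan plot, one common genome-wide view of association results. It is only an illustration under stated assumptions, not LocusZoom's API: the function name `manhattan_coordinates`, the record layout `(chromosome, position, p_value)`, and the toy data are hypothetical, and the largest observed position stands in for each chromosome's length. The threshold `5e-8` is the commonly used genome-wide significance convention.

```python
import math

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def manhattan_coordinates(records):
    """Turn (chromosome, position, p_value) records into plot points.

    Returns a list of (x, y, chromosome) tuples, where x is a cumulative
    position across chromosomes and y is -log10(p_value).
    P-values must be strictly positive.
    """
    records = sorted(records)
    # Use the largest observed position as a stand-in for chromosome length.
    chrom_span = {}
    for chrom, pos, _ in records:
        chrom_span[chrom] = max(chrom_span.get(chrom, 0), pos)
    # Each chromosome starts where the previous one ended.
    offsets, running = {}, 0
    for chrom in sorted(chrom_span):
        offsets[chrom] = running
        running += chrom_span[chrom]
    return [
        (offsets[chrom] + pos, -math.log10(p), chrom)
        for chrom, pos, p in records
    ]


# Toy association results (made up for illustration).
toy = [(1, 100, 0.5), (1, 900, 1e-8), (2, 50, 0.01), (2, 400, 0.2)]
points = manhattan_coordinates(toy)
significant = [pt for pt in points if pt[1] > -math.log10(GENOME_WIDE_P)]
print(points)
print("genome-wide significant:", significant)
```

The resulting `(x, y)` pairs can be passed to any plotting library as a scatter plot, with points colored by chromosome and a horizontal line drawn at the significance threshold.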
Course Features
- Lectures: 1
- Quizzes: 1
- Duration: 10 weeks
- Skill level: All levels
- Language: English
- Students: 8
- Assessments: Yes