This blog post covers some of the rationale behind the design of SkyMap, a project that makes >400,000 sequencing runs accessible to everyone. It may be informative when you are designing your next Big Data application in bioinformatics. I have listed some of the problems that I faced and … Continue reading Design rationale for SkyMap JupyterHub: How can a Jupyter notebook extract the expression levels or allelic read counts from > 400,000 sequencing runs in seconds?
A lot of people ask me how I went from computer science to bioinformatics. Actually, the two fields aren’t that different. Computers store long-term data on disk as 0’s and 1’s, while cells store long-term data in DNA as A, C, G, and T. Computers store transient data in the cache or RAM, while … Continue reading Computer vs. Human: From a computer nerd who went into bioinformatics
Imagine what would happen if every day during lunch time, you had to consciously coordinate all of the steps in digestion: breaking down the food in your stomach, pushing the food through your intestines, and telling yourself to stop feeling hungry after eating. You would spend the entire day coordinating your digestive system! Eating is … Continue reading Getting results from Big Data without the Big Infrastructure problem: Cloud + Docker + Kubernete
A Ph.D. is very much like a marriage with your advisor, as suggested by this PHD Comics post. After all, the term “Ph.D.” stands for Doctor of Philosophy, where the word “philosophy” is composed of the Greek roots philo- (love) and -sophia (wisdom). So maybe there are some skills transferable from your love life to … Continue reading The PhD versus Online Dating
There have been many articles covering how the entire human race is going to be replaced by AI and machine learning. That I don’t know. However, machine learning is in many ways simply mimicking human learning, and I believe we can apply effective machine learning techniques to improve our own learning and education. Be open … Continue reading 8 ways machine learning techniques can teach us about effective human learning
A recurring question in data-intensive workplaces is which computing infrastructure to use. In the past four years as a bioinformatics Ph.D. student, I have both received and offered advice, solicited and unsolicited, on computing infrastructure, drawing on my prior experience in a high-performance computing lab and my current expertise in data analytics. This blog post … Continue reading Buying computing infrastructure vs adopting the Cloud
GitHub link: https://github.com/brianyiktaktsui/Skymap#quick-start-10min Motivation: Pooling pre-processed data from public studies sucks! It takes time and way too much brain energy. When I first started in bioinformatics a couple of years ago, I spent much of my time doing two things: 1) cleaning -omics data matrices, e.g. mapping between gene IDs (HGNC, Ensembl, UCSC, etc.) for pre-processed … Continue reading Preview on Skymap project: extracting allelic read count and expression profiles of >400,000 sequencing run into simple omic matrices