Ming "Tommy" Tang

@tangming2005

Followers
40K
Following
32K
Media
2K
Statuses
67K

Director of bioinformatics at AstraZeneca. YouTube at chatomics. On my way to helping 1 million people learn bioinformatics. Also talks about leadership.

Boston, MA
Joined December 2011
@tangming2005
Ming "Tommy" Tang
6 months
The guide I wish I had 12 years ago: a step-by-step guide to replicate a genomics paper figure
divingintogeneticsandgenomics.kit.com
8
82
516
@tangming2005
Ming "Tommy" Tang
8 hours
Unlock your bioinformatics potential! 🌟 Discover why mastering both Python and R can elevate your data visualization and analysis skills. Dive into the world of code! 💻 #Bioinformatics
0
2
21
@tangming2005
Ming "Tommy" Tang
9 hours
Enjoyed this tweet? Follow me @tangming2005 and join my newsletter to learn computational biology.
0
0
1
@tangming2005
Ming "Tommy" Tang
20 hours
High-throughput profiling of chemical-induced gene expression across 93,644 perturbations.
2
12
86
@tangming2005
Ming "Tommy" Tang
20 hours
I hope you've found this post helpful. Follow me for more. Subscribe to my FREE newsletter chatomics to learn bioinformatics
divingintogeneticsandgenomics.kit.com
Why Subscribe? ✅ Curated by Tommy Tang, a Director of Bioinformatics with 100K+ followers across LinkedIn, X, and YouTube ✅ No fluff, just deep insights and working code examples ✅ Trusted by grad...
@tangming2005
Ming "Tommy" Tang
20 hours
You can save your organization millions of dollars if everyone follows best practices when filling in a spreadsheet. Do the simple things that make bioinformaticians' lives easier. Read the full article here
0
0
1
@tangming2005
Ming "Tommy" Tang
20 hours
It should be required reading for every scientist. It is probably the simplest change you can make with a high ROI.
1
0
1
@tangming2005
Ming "Tommy" Tang
20 hours
And I know wet lab scientists are willing to do good work; they are just not aware of the right way to do it (I was one of them). That's why I want to educate more. DO read this article: Data Organization in Spreadsheets.
tandfonline.com
Spreadsheets are widely used software tools for data entry, storage, analysis, and visualization. Focusing on the data entry and storage aspects, this article offers practical recommendations for o...
1
0
1
@tangming2005
Ming "Tommy" Tang
21 hours
Bioinformatics data is messy. Here’s the nightmare that almost broke me 🧵👇
0
0
1
@tangming2005
Ming "Tommy" Tang
21 hours
10/ Your turn: what’s the worst metadata bug you’ve faced? Mine was a cancer subtype column where “Basal” appeared as “Basal”, “basal-like”, “Basal_Like”, and “BASAL”. What’s yours?
1
2
2
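One way to collapse spelling variants like the ones in 10/ is to compare labels on a normalized key and map each key to a single canonical spelling. The sketch below is a minimal Python illustration using only the standard library. `label_key`, `ALIASES`, and `canonical` are hypothetical names, and treating "basal-like" as the same category as "Basal" is an assumption that someone who knows the dataset should confirm.

```python
import re

def label_key(label):
    """Case-fold a label and turn runs of '-', '_' or whitespace into one space."""
    return re.sub(r"[-_\s]+", " ", label.strip().lower())

# Hypothetical alias table. Mapping "basal like" to "Basal" is an assumption,
# not something the data can decide for you.
ALIASES = {"basal": "Basal", "basal like": "Basal"}

def canonical(label):
    """Return the canonical spelling if the label is a known variant, else the label unchanged."""
    return ALIASES.get(label_key(label), label)

variants = ["Basal", "basal-like", "Basal_Like", "BASAL"]
cleaned = [canonical(v) for v in variants]
print(cleaned)  # -> ['Basal', 'Basal', 'Basal', 'Basal']
```

Labels that are not in the alias table pass through unchanged, so unexpected values stay visible instead of being silently rewritten.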
@tangming2005
Ming "Tommy" Tang
21 hours
9/ Key takeaways: Metadata is always messier than you think. Standardize early. Inspect everything. Cleaning isn’t busywork; it’s survival.
1
0
1
@tangming2005
Ming "Tommy" Tang
21 hours
8/ Every wasted hour I’ve had in bioinformatics traces back to one thing: ignoring data cleaning. Get this step right, and the rest flows.
1
0
1
@tangming2005
Ming "Tommy" Tang
21 hours
7/ This is detective work. Not glamorous, but essential. Before PCA or UMAP, real expertise starts with knowing your metadata.
1
0
1
@tangming2005
Ming "Tommy" Tang
21 hours
6/ Here’s how I clean them in R: Use janitor::clean_names() to make column names consistent. Replace text placeholders with true NA using dplyr::na_if(). Run unique() or table() to spot strange values. janitor::tabyl() is your friend too.
1
0
2
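The steps in 6/ are R calls. For readers who work in Python, here is a rough standard-library analog of two of them: making column names consistent and tabulating values to spot odd ones. It is a simplified sketch, not a port of janitor. `clean_name` and `clean_names` are hypothetical helpers, they do not reproduce every rule of `janitor::clean_names()`, and the `subtype_values` list is made-up sample data built from the variants mentioned in this thread.

```python
import re
from collections import Counter

def clean_name(name):
    """Lowercase a column name and collapse runs of non-alphanumerics into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

def clean_names(names):
    """Clean every name; suffix repeats with _2, _3, ... so names stay distinct."""
    seen = Counter()
    cleaned = []
    for name in names:
        base = clean_name(name)
        seen[base] += 1
        cleaned.append(base if seen[base] == 1 else f"{base}_{seen[base]}")
    return cleaned

# The two column names from 1/ collapse to the same cleaned name,
# which makes the near-duplicate obvious.
columns = ["paper_expression_subtype", "paper_Expression.Subtype"]
print(clean_names(columns))
# -> ['paper_expression_subtype', 'paper_expression_subtype_2']

# Counting values plays the role of table(): rare spellings stand out.
subtype_values = ["Basal", "basal-like", "Basal_Like", "BASAL", "Basal"]  # made-up sample
counts = Counter(subtype_values)
for value, n in counts.most_common():
    print(value, n)
```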
@tangming2005
Ming "Tommy" Tang
21 hours
5/ If you don’t standardize them, filters fail. Joins drop rows. You’ll think your pipeline is wrong, when in fact your metadata is lying to you.
1
0
1
@tangming2005
Ming "Tommy" Tang
21 hours
4/ I’ve seen the same “missing” value represented as <NA>, [Not Available], and N/A. They look the same. But to R, they’re not. Read more details here.
1
0
1
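The point in 4/ holds in any language: a placeholder string is ordinary text until you convert it to a real missing value. A minimal Python sketch of that conversion follows. `MISSING_TOKENS` and `to_missing` are hypothetical names, the token set contains only the three placeholders listed above, and the `raw` column is made-up sample data.

```python
# Placeholder strings named in the thread; extend this set for your own data (assumption).
MISSING_TOKENS = {"<NA>", "[Not Available]", "N/A"}

def to_missing(value):
    """Return None for a known missing-data placeholder, else the value unchanged."""
    if isinstance(value, str) and value.strip() in MISSING_TOKENS:
        return None
    return value

raw = ["Basal", "<NA>", "[Not Available]", "N/A"]  # made-up sample column
cleaned = [to_missing(v) for v in raw]
n_missing = sum(v is None for v in cleaned)
print(cleaned, n_missing)  # -> ['Basal', None, None, None] 3
```

Before the conversion a missing-value count on `raw` would report zero, because every entry is a non-empty string.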
@tangming2005
Ming "Tommy" Tang
21 hours
3/ That was just the start. Metadata in bioinformatics is full of hidden traps. The most dangerous? Missing values.
1
0
1
@tangming2005
Ming "Tommy" Tang
21 hours
2/ See the problem? One has underscores. The other has a dot. This tiny difference is enough to break your code.
1
0
1
@tangming2005
Ming "Tommy" Tang
21 hours
1/ I was using TCGAbiolinks to analyze TCGA data. The metadata for transcription subtypes had two column names: paper_expression_subtype and paper_Expression.Subtype.
1
0
0
@tangming2005
Ming "Tommy" Tang
2 days
Accurate and scalable multi-disease classification from adaptive immune repertoires.
0
3
29