How to Rewrite Pre-Training Data to Boost Math and Code Performance
Rewriting pre-training data is one of the most effective ways to improve how language models perform on math and programming tasks. The goal is not only to clean the data but to make each example more instructive. By restructuring examples, adding step-by-step logic, fixing formatting issues, and improving clarity, you give…
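As a rough illustration of two of the operations named above (fixing formatting issues and adding step-by-step logic), here is a minimal Python sketch. The function names `normalize_sample` and `format_with_steps` and the "Problem / Step / Answer" layout are hypothetical choices made for this example, not a method taken from the text. A real rewriting pipeline may well rely on a language model rather than fixed rules.

```python
import re

# Hypothetical mapping of typographic quotes to plain ASCII quotes.
QUOTE_MAP = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}


def normalize_sample(text: str) -> str:
    """Apply simple rule-based formatting fixes to one training sample."""
    # Unify line endings so later rules only deal with "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Replace curly quotes, which often break code snippets.
    for src, dst in QUOTE_MAP.items():
        text = text.replace(src, dst)
    # Drop trailing whitespace but keep leading indentation (it matters for code).
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    # Collapse runs of three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Remove blank lines at the start and end of the sample.
    return text.strip("\n")


def format_with_steps(problem: str, steps: list[str], answer: str) -> str:
    """Restructure a problem into an explicit step-by-step layout (assumed format)."""
    numbered = [f"Step {i}: {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join([f"Problem: {problem}", *numbered, f"Answer: {answer}"])


if __name__ == "__main__":
    raw = "Solve 2x = 6 \r\n\r\n\r\n\r\nx = 3  "
    print(normalize_sample(raw))
    print(format_with_steps("Solve 2x = 6", ["Divide both sides by 2."], "x = 3"))
```

The normalizer deliberately leaves spacing inside a line untouched, since collapsing it could corrupt indentation-sensitive code samples.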