As I’m writing this, I’m pausing to look over my laptop at the very pretty Seattle skyline. It’s the afternoon of the first Saturday of May, and spring has taken over the city. My 4th-floor apartment gets over 16 hours of sunlight a day thanks to its floor-to-ceiling windows, and all I can see ahead of me is a mix of grey, blue and green. It’s costing me a bomb to stay here, what with rents in Seattle going sky-high, but it’s worth being able to look at Mt Rainier every day, presenting herself to us…

So you landed a job as a data scientist.

You explore really interesting data, come up with a hypothesis, build a model around it, solve the problem and make millions for your company!

This is misleading. This is not the whole picture. This is a mirage.

Let me walk you through some mistakes I made in the first year of my job as a data scientist so that you don’t make those in yours.

1. Not knowing the “kind” of data science my team needs.

Sometimes, you need to press pause on your personal interests and think about what’s best for the product or team rather than just your own progress.

When I…

With the new Aladdin movie coming out in just a few days, I re-watched the old animated movie Disney made for us in 1992. As someone who pays her own bills, lives in the 21st century and, most importantly, is a woman, I was flustered by the portrayal of Jasmine as a dainty princess with exaggerated feminine features who needs Aladdin to rescue her.

Instead of bashing a movie that probably created countless jobs and gave us great musicals for years, I wrote this post celebrating some of the women I looked up to growing up!

As a true 90s…

Like so many other immigrants who come to the United States of America, I packed my bags on the 4th of September and left India, my home of 23 years, to pursue my graduate studies in Computer Science and Engineering at the University of Washington, Seattle.

I had no clue why I was leaving the comfort of my home, the warmth of my family and a well-paying, steady job only to rack up thousands of dollars in tuition loans. …

As part of building a recommendation system, I recently had to compute text similarity across a large corpus of data. Have a look if you’re interested in what goes on inside one of Spark’s string similarity algorithms.

Structure of this article.

  1. Motivation behind the experiment.
  2. Spark implementation of TF-IDF and why it matters.
  3. Comparison and Conclusion.


This wasn’t my exact problem statement, but for the sake of simplicity, let’s assume we have two columns of words like the ones below and need to match them.

List A      | List B
------------|-------
GOOGLE INC. | Google
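To make the matching problem concrete, here is a minimal plain-Python sketch of TF-IDF weighting followed by cosine similarity. It is not Spark’s implementation: it runs in a single process and only illustrates the idea. The IDF uses the smoothed formula documented for Spark MLlib, log((N + 1) / (df + 1)). The tokenizer, the use of cosine similarity as the score, the helper names (`tokenize`, `fit_idf`, `tfidf_vector`, `cosine`, `best_matches`) and the extra company names are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and keep alphanumeric runs,
    # so "GOOGLE INC." becomes ["google", "inc"].
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    # Smoothed IDF, log((N + 1) / (df + 1)), the form documented for Spark MLlib's IDF.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    return {term: math.log((n + 1) / (count + 1)) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector: raw term frequency times IDF; unseen terms get weight 0.
    tf = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_matches(list_a, list_b):
    # Hypothetical helper: fit IDF on both lists together, then pick, for each
    # entry in list_a, the entry in list_b with the highest cosine similarity.
    idf = fit_idf([tokenize(s) for s in list_a + list_b])
    vectors_b = [tfidf_vector(tokenize(s), idf) for s in list_b]
    matches = {}
    for a in list_a:
        vector_a = tfidf_vector(tokenize(a), idf)
        scores = [cosine(vector_a, vb) for vb in vectors_b]
        best = max(range(len(list_b)), key=scores.__getitem__)
        matches[a] = (list_b[best], scores[best])
    return matches


matches = best_matches(
    ["GOOGLE INC.", "AMAZON.COM INC.", "MICROSOFT CORP"],
    ["Google", "Amazon", "Microsoft"],
)
for name, (match, score) in matches.items():
    print(f"{name!r} -> {match!r} ({score:.2f})")
```

With a corpus this small, a term that appears in every document gets an IDF of log(1) = 0. That is why the example uses a few rows per list: with only GOOGLE INC. and Google, the shared token `google` would get zero weight and the two names would not match at all.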

For the past few weeks, I’ve been exploring Spark’s streaming module at work, and it only seemed fitting to write a post about this fantastic tool.

Let’s assume we have a data stream that we need to clean up, store in a sensible way and compute something from. For the sake of simplicity, let’s take a bunch of messages coming in from a server that you’re listening to on a TCP socket. What’s your go-to game plan?

Write a simple Python program, collect the data, clean it, and then do whatever seems…
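For comparison, that naive plan might look something like the single-process sketch below. It assumes the server sends newline-delimited UTF-8 text; the cleaning rule (lowercase, keep alphanumeric words), the word count, and the helper names `read_messages`, `clean` and `word_counts` are hypothetical stand-ins for whatever cleaning and computation you actually need.

```python
import re
import socket
from collections import Counter


def read_messages(sock, bufsize=4096):
    """Yield newline-delimited messages from a connected socket until it closes."""
    buffer = b""
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # empty read means the peer closed the connection
            break
        buffer += chunk
        # Keep any incomplete trailing message in the buffer for the next read.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            yield line.decode("utf-8", errors="replace")
    if buffer:  # final message that arrived without a trailing newline
        yield buffer.decode("utf-8", errors="replace")


def clean(message):
    # Hypothetical cleaning rule: lowercase and keep only alphanumeric words.
    return re.findall(r"[a-z0-9]+", message.lower())


def word_counts(sock):
    # Hypothetical "compute something": a running word count held in memory.
    counts = Counter()
    for message in read_messages(sock):
        counts.update(clean(message))
    return counts
```

To run it against a live server, pass in a connected socket, for example one returned by `socket.create_connection((host, port))`. Everything happens in one process and all state lives in memory, which is the kind of limitation a streaming engine like Spark is built to address.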

Nimisha Sharath

Data Scientist, Microsoft. Podcast Host, Seattle. Bangalore.
