At Lazada’s Data Science team, I use Spark a fair bit, especially when the data gets big (e.g., online behavioural and transaction data). While PySpark, the Python API for Spark, was available when I started, I decided early on to code in Scala. Perhaps I relished the challenge or just wanted to pick up a new language.
Before the course, my programming skills in Scala were mainly self-taught, through the school of hard knocks and Stack Overflow. Thus, when the course was made available on Coursera, I saw the opportunity to learn the fundamentals of Scala (away from Spark) and its syntax in a structured fashion.
The course is taught by Martin Odersky, designer of the Scala programming language. It follows a structure commonly found in MOOCs—approximately two hours of lectures (more theoretical) and a lab assignment taking three hours (more practical) weekly.
Over the course of six weeks, Martin taught about:
The main challenge wasn’t Scala’s syntax or working with a compiled language. Rather, it was thinking through the logic of solving problems recursively. While I’ve written recursive algorithms before, I haven’t quite grokked recursion yet.
In the course, almost all assignments were solved through tail recursion. At work, I mostly think about data in the form of tables, strings, or graphs, so solving problems recursively doesn’t come up much.
While the course focused on the Scala language and the functional programming paradigm, I gained two other lessons that I value just as much.
There was a lot of emphasis on a key software engineering practice: testing (using ScalaTest). Beginning in week one, the practice of writing unit tests was taught and encouraged. Throughout the course, Martin pointed out edge cases in the code and showed how they can be formalized and checked consistently in a unit test.
The lab assignments progressively taught more sophisticated ScalaTest methods, and how to test more effectively. Improving on the practice of testing will make my code more robust, my work more efficient, and me a better data scientist in the long run.
I also gained practice in breaking problems down and solving them through tail recursion. I’ve come across user-defined functions in Spark that lead to a stack overflow error when executed. Putting in additional thought and rewriting them in a tail-recursive fashion fixed this issue and also led to efficiency improvements. Nonetheless, I probably won’t be actively thinking about recursive solutions at work unless absolutely necessary.
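To make the tail recursion point concrete, here is a minimal sketch of that kind of rewrite. It is not code from the course or from my Spark jobs. The names `sumNaive`, `sumTailRec`, and `loop` are made up for illustration, and summing a list is just a stand-in for whatever the real function computes.

```scala
import scala.annotation.tailrec

object TailRecursionSketch {

  // Naive recursion: the addition happens after the recursive call returns,
  // so each element of the list needs its own stack frame.
  def sumNaive(xs: List[Long]): Long = xs match {
    case Nil          => 0L
    case head :: tail => head + sumNaive(tail)
  }

  // Tail-recursive version: the running total is carried in an accumulator,
  // so the recursive call is the last thing the function does.
  def sumTailRec(xs: List[Long]): Long = {
    // @tailrec makes the compiler reject this method if the call
    // is not actually in tail position.
    @tailrec
    def loop(remaining: List[Long], acc: Long): Long = remaining match {
      case Nil          => acc
      case head :: tail => loop(tail, acc + head)
    }
    loop(xs, 0L)
  }
}
```

With a long input such as `List.fill(1000000)(1L)`, `sumTailRec` runs in constant stack space because the compiler turns `loop` into a plain loop, whereas `sumNaive` is likely to hit a stack overflow at typical default JVM stack sizes.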
The course was excellent for learning about the thinking that went into the design of Scala as a functional language, and how to use Scala more effectively.
At the concluding lecture, Martin recommended additional learning resources. Two are worth highlighting here. First, there’s the Scala School by Twitter that covers the basics, collections, simple build tool (SBT) and more. Martin also recommended the Scala Exercises by 47 Degrees that covers more features of Scala through solving simple exercises in the browser interactively. I find Scala Exercises to be more practical and likely to improve my software engineering skills in Spark more.
I highly recommend this short six-week course if you would like to learn the basics of Scala from the designer of Scala himself. Martin is a fantastic teacher and taught effectively through online videos and lab assignments. The forums were also very helpful. There, you’ll find people who are stuck on the same problem as you are, and teaching assistants providing helpful hints.
Questions? Want to follow my journey? Reach out on Twitter @eugeneyan!