What I Learned During My Summer Internship at Cloudera

This post links to an article I wrote for Cloudera’s blog: What I Learned During My Summer Internship at Cloudera

Reprint:

Catherine Ray, a Summer Intern at Cloudera this year, was kind enough to summarize her experiences for you below. Best of luck in your new field, Catherine!

I’m currently 16 and a rising senior at George Mason University, majoring in Computational Physics. (The full title is Computational and Data Sciences with a concentration in Physics.)

I had a wonderful time working on my project. In short, I worked on an Apache Hadoop-based download tracking system. In this system, raw download logs are ingested via Apache Flume into HDFS, then parsed by a MapReduce job into a Cloudera Impala-friendly format. I had the opportunity to collaborate with one of our teams in New York to pull the whole system together. To make full use of the data in the logs, I created a Java library that finds the organizational information associated with a given IP address. I also helped create dashboards that query the collected data to analyze it and generate sales leads.
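The post doesn't show the log format or the parsing code, so the following is only a hypothetical sketch of what a "parse raw logs into an Impala-friendly format" step might look like. It assumes Apache-style access-log lines and emits tab-delimited rows, since Impala can query text tables declared with tab-separated fields. The class name `DownloadLogParser`, the method `toRow`, the regular expression, and the sample line are all illustrative assumptions; in a real MapReduce job, logic like this would sit inside a mapper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser: assumes Apache-style access-log lines, not the real log format.
public class DownloadLogParser {
    // Captures: IP, [timestamp], "METHOD path PROTOCOL", status code, bytes sent.
    private static final Pattern LOG_LINE = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+) [^\"]*\" (\\d{3}) (\\d+|-)");

    /** Returns "ip \t timestamp \t path \t status \t bytes", or null if the line doesn't match. */
    public static String toRow(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.find()) {
            return null; // skip malformed lines instead of failing the whole job
        }
        // Apache logs write "-" when no bytes were sent; store 0 so the column stays numeric.
        String bytes = m.group(6).equals("-") ? "0" : m.group(6);
        return String.join("\t", m.group(1), m.group(2), m.group(4), m.group(5), bytes);
    }

    public static void main(String[] args) {
        // 203.0.113.0/24 is a documentation-only address range.
        String line = "203.0.113.7 - - [24/Aug/2013:10:15:32 -0700] "
            + "\"GET /downloads/example.tar.gz HTTP/1.1\" 200 1048576";
        System.out.println(toRow(line));
    }
}
```

Returning `null` for unparseable lines lets the caller count or drop them. Later, an IP lookup step could add organization columns to each row.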

As my internship came to an end, I was able to use a skill I developed through one of my many hobbies: making YouTube videos. When faced with the task of creating my intern presentation, I ditched PowerPoint and made a video to explain my project in a more engaging format. The video below describes the system I created this summer.

Before I began my internship, I worried about an obstacle that has repeatedly kept me from fully enjoying past experiences: not being taken seriously because of my age. After I met my fellow interns and talked with my mentor, my fears were quickly assuaged. The Cloudera community was truly accommodating; I was treated just as the other interns were, like a full-time employee.

My experience at Cloudera revealed a perspective previously hidden from me. (I also had amazing discussions with brilliant people over lunch and in the hallways on a regular basis.) Here, it is well known that one can succeed at being both a scientist and an engineer. The best data scientist has both the curiosity and passion of a scientist faced with an unsolved problem and the methodology and efficiency of an engineer tasked with implementing an effective solution. This realization has convinced me to pursue graduate studies in computer science instead of narrowing my future studies to the applications of computer science in physics.

I also learned coding tricks and new ways of thinking in programming languages I thought I knew well. I fell in love with regular expressions, and I learned good practices for documenting code and writing readable source. Outside of computer science, I learned how to ride a RipStick (a skateboard variation), I learned the art of collaboration, and I experienced a strong sense of community. My mentor was happy to answer any questions I had in detail, and our code review sessions completely changed the way I think about object-oriented programming.

I can’t thank my mentor (Aditya Acharya) enough for the time he devoted to answering all of my questions; I learned an incredible amount from him. The members of the teams with which I worked were similarly kind, accommodating, and resourceful. My fellow interns were extremely friendly and helpful; competition did not taint our interactions.

All in all, a very successful summer.

Written on August 24, 2013