How I got selected in GSoC 2019

I am really pleased to announce that I have been selected for Google Summer of Code (GSoC) 2019 and will be working with CDLI (Cuneiform Digital Library Initiative) on building a Multi-Layer Annotation Querying Tool.

About My Organization (CDLI)

The Cuneiform Digital Library Initiative (CDLI) is a joint project of the University of California, Los Angeles, the University of Oxford, and the Max Planck Institute for the History of Science, Berlin. It is an international digital library project that aims to put online the text and images of an estimated 500,000 recovered cuneiform tablets, created between roughly 3350 BC and the end of the pre-Christian era.

A little about myself

I am currently a third-year Computer Science undergraduate at the Indian Institute of Information Technology, Sri City, Chittoor. Last summer, I was a Research Intern at IIIT Hyderabad, working in Natural Language Processing (NLP), where my team and I built a deep-learning-based, language-independent Named Entity Recognition model. This year, I have been selected for GSoC under CDLI, where I will be building a Multi-Layer Annotation Querying tool.

My Inspiration and Motivation

When I joined college, some of my seniors had already cleared GSoC, and before long I realized its value. It is considered one of the most prestigious programmes for students, and the $3000 stipend also fascinated me. Clearing GSoC soon became one of my dreams.

During last year's summer vacation, while interning at IIIT Hyderabad, I was amazed to see a lot of people wearing GSoC T-shirts. In my hostel room, I even found stickers of open source organizations like FOSSASIA and Mozilla on my table, which really thrilled me. Later, I came across statistics showing that IIIT Hyderabad ranked second in the world in the number of students selected for GSoC. That really pumped me up to work on open source, and my desire to go for GSoC kept growing.

How I worked on it

My open source journey is not tied to a single organization; I kept moving from one organization to another. I contributed to AIMA-Code and CLTK, and also wrote a detailed proposal for CDLI.

December 2018, the start of the journey

AIMA-Code (Artificial Intelligence: A Modern Approach)

It was my winter break, and I was looking for open source organizations to start contributing to. Initially, I planned to go with Mozilla, since it is a big organization and takes a lot of students. But I soon got frustrated because I could not even get started: I wasn't able to set up Mozilla's development environment. After this, I talked to one of my seniors, and he recommended that I start with AIMA-Code. Since I had prior experience with Artificial Intelligence and Machine Learning, I was able to get going with this organization quickly.

The organization maintains the codebase and tutorial notebooks for the book Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig. Its development environment was easy to set up, and I found some bugs in a few of the applications they had built. Since the fixes were small, they were approved and merged without much delay, which excited me even more. I started contributing more and more: adding small test cases, improving the tutorials in the Jupyter Notebooks, and many other things. And since there were not many people in the organization at the time, I became its top contributor, which drove me further.

January, the work continues but with a doubt

By this time, my college semester had started, so my contribution frequency fell a bit. Still, I kept adding code and tutorials to the codebase, and I remained the top contributor over the past month. By the end of January, I had approximately 10 Pull Requests (PRs) merged in AIMA-Code.

But I felt that something was wrong. I knew my performance that month was not that good, yet I was still the top contributor. Even at the end of January, the organization didn't have many contributors, while other organizations were flooded with them. Only I and 1–2 other students were working in the organization, which meant it was not very active.

So, I contacted one of the organization's GSoC students from the previous year and expressed my concern. He was the one who approved all the PRs. He told me that the organization was not as active as it had been the year before and might not even get selected for GSoC. He advised me to try other organizations in parallel to be on the safe side. Seniors in my college gave me similar advice.

February, in search of a new Organization

CLTK (Classical Language Toolkit)

Many people told me it was already too late to look for a new organization. Still, I tried a different one so that I would at least have some contributions in hand when the GSoC organizations were announced. Since I had prior experience in NLP, I took a liking to the CLTK project: they were building modules similar to NLTK, but for classical languages rather than English. The organization was also much more active. But with my mid-terms coming up, I wasn't able to work much, and I managed to get only 1 small PR merged.

Finally, on February 26, the GSoC organizations were announced. To my complete surprise, AIMA-Code was selected for GSoC 2019, while CLTK was not. I was completely shocked.

March, back to the previous one

So, I went back to contributing to AIMA-Code. By this time, some new people had joined and had already submitted big PRs. I had more PRs to my name, although mine were small: making the code compatible with an older version of Python, correcting parts of the code, adding a few test cases, and so on. So I decided to go big as well and wrote a few long tutorials, each with a coding application, explaining new concepts.

A few days later, my long PRs were reviewed, and I was shocked to see that the reviewers had requested changes in approximately 20 places in each of the 2 PRs. I couldn't believe how many silly mistakes I had made, or how they had the patience to review PRs with so many errors in the tutorial part :). Still, I was happy, and I updated the PRs and resubmitted them for review.

I then wrote my proposals for AIMA-Code. There, applicants are advised not to include a project timeline, since the mentor decides it after selection; we just have to show our previous contributions and our interest in AI. Some of the previous year's selected proposals were as short as 2 pages, so writing them didn't take me much time. I wrote 2 proposals for 2 AIMA-Code projects: AIMA-Code Python and AIMA-Code Example Notebooks.

But as the days passed, the crowd in AIMA-Code grew rapidly. A friend of mine suggested that I also try research organizations and other organizations that do not require prior contributions and instead select students based only on the quality of their proposal and profile. He also worked in NLP and had been a GSoC student the previous year, so he suggested I look into CDLI, the research organization he had worked with that summer, and contact its mentors.

CDLI (Cuneiform Digital Library Initiative)

After discussing a few projects with the mentors, I decided to submit a proposal to CDLI as well, since I really liked the project. The mentors explained the whole project process to me and cleared up my doubts. I settled on the project “Multi-Layer Annotation Querying”, which involves building a tool that can query multiple layers of linguistic annotations of a sentence using SPARQL.

April, the rush for proposal submission

With the proposal deadline approaching, I had to rush. I drafted a basic structure containing all the steps and details and submitted it to the CDLI mentors for review. They reviewed it in great detail and gave me a long list of improvements. After I addressed those points, another mentor did a second review and asked me to build a small demo application and include it in the proposal, so I built a basic working model of the project and added it. By then, the final deadline was almost here. After completing everything, I submitted the proposal for review once more, the night before the submission deadline, and it was reviewed by yet another mentor.

I am really thankful to all the CDLI mentors for helping me understand the project and for repeatedly reviewing my proposal and helping me improve it.

After Proposal Submission

After so many ups and downs, I was not at all sure I would be selected. I was also in the pre-final year of my degree, and my B.Tech project had fallen far behind, so I had to catch up on that too. Just to be on the safe side, I started looking for an internship.

Yes, I made a mistake by stopping my contributions after submitting my proposals. My reasoning was that my proposals already listed approximately 20 PRs, and any new PR could no longer be added to them anyway.

Results

Finally, the result date arrived: 6th May. I was so scared that I didn't even check the exact time of the announcement. I assumed the results would be announced on 6th May at 12:00 midnight IST, and that a confirmation mail would be sent to all selected students at that very moment.

So I was surprised when a friend congratulated me at 11:30 PM; I thought he was playing a prank on me. I opened my laptop to check the result, and it was unbelievable: I had been selected by CDLI, the organization whose proposal I had submitted last. Even after 12 AM I wasn't sure, because I hadn't received the confirmation mail. When I asked around, I learned that the mail takes a few hours to arrive. I couldn't sleep and spent the whole night waiting for it.

My GSoC Dashboard after getting selected.

Side Effects

Since my entire semester revolved around GSoC, my academic grades suffered :(.

Credits

I am really thankful to all my friends, seniors, professors, and my family, who supported me throughout the journey and constantly motivated me to go for GSoC. I am also thankful to the mentors of the organizations, who made the problem statements easy to understand and interesting.

Finally, I am thankful to the YouTube channel I.O. Stream, whose videos, including a dedicated GSoC playlist, guided me through the various steps of GSoC.