For many years I have been telling myself and everyone else that I want to join Kaggle. I was quite afraid that I wouldn’t know where to start. Ken Jee’s videos motivated me to start now. One of them explains which projects I should start off with. That one was the House Pricing Prediction project, and I got off to a good start by working through it. After that I had the gist of how to participate in Kaggle, and I feel that a first competition is a good way to see how others approach a problem, because you can then compare their work to your own submission.
The first competition is called Riiid! Answer Correctness Prediction, where the task is to predict whether a student will answer a question correctly. It is an interesting topic to me, as I have a keen interest in knowledge tracing. The competition requires predictions on a test dataset for submission, and these are scored by AUC. Long story short, here is a summary in bullet points:
- I found a tool that reports EDA in a nicer and more concise way, called Pandas profiling. It tells you everything from missing data to correlations between variables.
- I used LightGBM to train the model, which is something I came across when I browsed other notebooks.
- Originally I thought random forest might work, but it seems I have stuck with LightGBM for now.
- There isn’t much feature engineering happening in my notebook, which is something I should look into. The EDA I’ve done isn’t sufficient to give me insights.
- It is my first time doing a competition, and submitting to a real competition took me quite some time to figure out. For example, when the computation uses too much RAM or CPU power, the script stops running, and I had never realised that. I also didn’t know that the organiser provides a module for the competition.
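Since the competition is scored by AUC, it helped me to spell out what that number means. Below is a minimal sketch, not the competition's own scoring code: it computes AUC as the fraction of (correct, incorrect) answer pairs where the correct one received the higher predicted probability, with ties counting half. The function name `roc_auc` and the toy labels and probabilities are made up for illustration.

```python
def roc_auc(y_true, y_score):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = answered correctly, 0 = answered incorrectly.
answered_correctly = [1, 0, 1, 1, 0]
predicted_prob = [0.9, 0.4, 0.35, 0.8, 0.2]

auc = roc_auc(answered_correctly, predicted_prob)
print(f"AUC = {auc:.4f}")
```

The pairwise loop is quadratic, so it only suits small examples like this one. On real data I would reach for a library implementation such as scikit-learn's `roc_auc_score` instead.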
So I’ve submitted my first version for the competition. If time permits, I may make some improvements. Without further ado, you can see how I went below.