September 20, 2019

421 words 2 mins read

goru001/code-with-ai

goru001/code-with-ai

Interface for people to use my model which predicts which techniques one should use to solve a competitive programming problem to get an AC

repo name goru001/code-with-ai
repo link https://github.com/goru001/code-with-ai
homepage https://code-with-ai.app.render.com/
language Jupyter Notebook
size (curr.) 18097 kB
stars (curr.) 129
created 2019-01-15
license MIT License

Code with AI

Code with AI is a tool which predicts which techniques one should use to solve a competitive programming problem to get an AC

Tool: https://code-with-ai.app.render.com/

Usage:

  1. Copy paste the problem statement
  2. Hit Enter :)

Alt Text

  • This tool was very well received by competitive programming community. Check out my blog post on codeforces here

  • This tool was also shared by Jeremy Howard on Twitter. Check out his tweet.

  • Some Analytics: 3300+ unique users have used this tool, till June 2019 since its launch.

Dataset

I scraped the dataset from Codechef and Codeforces, the scripts for which along with the dataset you can find in ulmfit-model/competitive-programming-websites-scripts

The dataset on which model has been trained has 92 classes, which are:

'mobius-function', 'queue', 'greedy', 'suffix-array', 'bitmasks', 'digit dp', 'lca', 'gcd', 'probabilities', 'combinatorics', 'graph matchings', 'easy-medium', 'precomputation', 'sprague-grundy', 'math', 'centroid-decomposition', 'link-cut-tree', 'expression parsing', 'constructive algorithms', 'medium', 'schedules', 'euler tour', 'easy', 'challenge', 'implementation', 'binary search', 'matrices', 'two pointers', 'dfs', 'dp+bitmask', 'sets', 'sqrt-decomposition', 'dijkstra', 'line-sweep', 'data structures', 'tree-dp', 'hard', 'mst', 'recursion', 'games', 'suffix-trees', 'kmp', 'stack', 'brute force', 'medium-hard', 'prefix-sum', 'graphs', '2-sat', 'shortest paths', 'heavy-light', 'heaps', '*special problem', 'trees', 'array', 'sliding-window', 'inclusion-exclusion', 'meet-in-the-middle', 'dfs and similar', 'sortings', 'pigeonhole', 'xor', 'gaussian-elimination', 'lucas theorem', 'divide and conquer', 'flows', 'strings', 'matrix-expo', 'number theory', 'bipartite', 'knapsack', 'sieve', 'ternary search', 'modulo', 'backtracking', 'treap', 'trie', 'dp', 'fenwick', 'observation', 'fibonacci', 'convex-hull', 'chinese remainder theorem', 'string suffix structures', 'geometry', 'lazy propagation', 'factorization', 'dsu', 'fft', 'segment-tree', 'hashing', 'bfs', 'prime', 'mo algorithm'

These classes are a union of tags which codeforces and codechef have! You can check out more details about data preparation in ulmfit-model/code-with-ai-data-preparation.ipynb

Model

You can check out the code for model in notebook ulmfit-model/code-with-ai.ipynb

Basically, the classifier is built starting from Wikitext103 Language Model pretrained weights, fine-tuning the language model on current dataset and then building classifier on top of fine-tuned language model.

Results

The classifier has a F1 Score of ~49.

TODOs

  • Try improving the score further by using bidirectional RNN

  • Try improving the score further by using an approach similar to DeViSE paper, i.e instead of training the model to predict O or 1, train the model to go closer towards the embedding vector representation of labels - the intuition behind this is that labels in competitive programming like graph, dfs, bfs etc aren’t disjoint.

comments powered by Disqus