goru001/code-with-ai
Interface for people to use my model which predicts which techniques one should use to solve a competitive programming problem to get an AC
repo name | goru001/code-with-ai |
repo link | https://github.com/goru001/code-with-ai |
homepage | https://code-with-ai.app.render.com/ |
language | Jupyter Notebook |
size (curr.) | 18097 kB |
stars (curr.) | 129 |
created | 2019-01-15 |
license | MIT License |
Code with AI
Code with AI is a tool which predicts which techniques one should use to solve a competitive programming problem to get an AC
Tool: https://code-with-ai.app.render.com/
Usage:
- Copy paste the problem statement
- Hit Enter :)
-
This tool was very well received by competitive programming community. Check out my blog post on codeforces here
-
This tool was also shared by Jeremy Howard on Twitter. Check out his tweet.
-
Some Analytics: 3300+ unique users have used this tool, till June 2019 since its launch.
Dataset
I scraped the dataset from Codechef and Codeforces, the scripts for which along with the dataset you can find in ulmfit-model/competitive-programming-websites-scripts
The dataset on which model has been trained has 92 classes, which are:
'mobius-function', 'queue', 'greedy', 'suffix-array', 'bitmasks', 'digit dp', 'lca', 'gcd', 'probabilities', 'combinatorics', 'graph matchings', 'easy-medium', 'precomputation', 'sprague-grundy', 'math', 'centroid-decomposition', 'link-cut-tree', 'expression parsing', 'constructive algorithms', 'medium', 'schedules', 'euler tour', 'easy', 'challenge', 'implementation', 'binary search', 'matrices', 'two pointers', 'dfs', 'dp+bitmask', 'sets', 'sqrt-decomposition', 'dijkstra', 'line-sweep', 'data structures', 'tree-dp', 'hard', 'mst', 'recursion', 'games', 'suffix-trees', 'kmp', 'stack', 'brute force', 'medium-hard', 'prefix-sum', 'graphs', '2-sat', 'shortest paths', 'heavy-light', 'heaps', '*special problem', 'trees', 'array', 'sliding-window', 'inclusion-exclusion', 'meet-in-the-middle', 'dfs and similar', 'sortings', 'pigeonhole', 'xor', 'gaussian-elimination', 'lucas theorem', 'divide and conquer', 'flows', 'strings', 'matrix-expo', 'number theory', 'bipartite', 'knapsack', 'sieve', 'ternary search', 'modulo', 'backtracking', 'treap', 'trie', 'dp', 'fenwick', 'observation', 'fibonacci', 'convex-hull', 'chinese remainder theorem', 'string suffix structures', 'geometry', 'lazy propagation', 'factorization', 'dsu', 'fft', 'segment-tree', 'hashing', 'bfs', 'prime', 'mo algorithm'
These classes are a union of tags which codeforces and codechef have! You can check out more details about data preparation in ulmfit-model/code-with-ai-data-preparation.ipynb
Model
You can check out the code for model in notebook ulmfit-model/code-with-ai.ipynb
Basically, the classifier is built starting from Wikitext103
Language Model pretrained weights,
fine-tuning the language model on current dataset and then building classifier on
top of fine-tuned language model.
Results
The classifier has a F1 Score of ~49.
TODOs
-
Try improving the score further by using bidirectional RNN
-
Try improving the score further by using an approach similar to DeViSE paper, i.e instead of training the model to predict O or 1, train the model to go closer towards the embedding vector representation of labels - the intuition behind this is that labels in competitive programming like graph, dfs, bfs etc aren’t disjoint.