Color Pie Prediction



Methodology

Technical description for mtg-color-0.1

Overview

For this version of the model we went with a very simple, naive approach, just to get the ball rolling. The high-level overview: we trained a word2vec model on the rules text of every card, treating each card's text as a sentence, and then used those word vectors as the embedding input to a stacked LSTM topped with a simple feed-forward classifier.


Data Source

We used the AllCards dataset available from mtgjson.com.
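
As a rough illustration, loading the dataset looks something like the sketch below. This assumes the older AllCards.json layout (a dict keyed by card name, with "text" and "colors" fields on each card); the exact field names may differ depending on which MTGJSON version you download.

    import json

    # Load AllCards.json downloaded from mtgjson.com.
    # Assumes a dict mapping card name -> card object, where each card
    # has "text" (rules text) and "colors" fields.
    with open("AllCards.json", encoding="utf-8") as f:
        all_cards = json.load(f)

    # Keep only cards that actually have rules text.
    cards = [
        (name, card["text"], card.get("colors", []))
        for name, card in all_cards.items()
        if card.get("text")
    ]
    print(f"{len(cards)} cards with rules text")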


Data Wrangling

We did very little data wrangling for this version of the model, mostly so we could get something deployed and establish a benchmark for future iterations.

The minimal processing we did was to lowercase all the text and then split on punctuation and whitespace (keeping the punctuation) to get our word tokens. A word2vec model was trained on these tokens using gensim. All cards with any rules text were included, including cards from sets that intentionally break the color pie and cards from non-tournament-legal sets (such as Unglued, Unhinged, and Unstable).
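
To make this concrete, here is a sketch of the tokenization and word2vec step, continuing from the card list loaded above. The tokenizer regex and the Word2Vec hyperparameters (vector size, window, minimum count) are illustrative assumptions, not the values actually used for mtg-color-0.1.

    import re
    from gensim.models import Word2Vec

    def tokenize(text):
        # Lowercase, then split into word tokens and single punctuation tokens,
        # so punctuation is kept rather than discarded.
        return re.findall(r"\w+|[^\w\s]", text.lower())

    # Treat each card's rules text as one "sentence".
    sentences = [tokenize(text) for _name, text, _colors in cards]

    # Train word2vec with gensim; hyperparameters here are illustrative defaults.
    # (gensim 4.x calls this parameter vector_size; older 3.x releases call it size.)
    w2v = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)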


Model Training

We did an 80/20 train/test split on the data and trained a stacked LSTM using TensorFlow. The model came out roughly 76% accurate on the test set, which is a solid result for a minimal-effort first pass. If you're interested in more of the nitty-gritty details, please reach out.
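
For reference, a model along these lines can be sketched as follows, continuing from the word2vec model above. The layer sizes, sequence length, and the multi-label sigmoid head (one output per color) are assumptions for illustration; the original model's exact architecture and label encoding may differ.

    import numpy as np
    import tensorflow as tf

    COLORS = ["W", "U", "B", "R", "G"]
    MAX_LEN = 64  # assumed padded sequence length

    # Build an embedding matrix aligned with the word2vec vocabulary (index 0 = padding).
    vocab = {word: i + 1 for i, word in enumerate(w2v.wv.index_to_key)}
    embedding_matrix = np.zeros((len(vocab) + 1, w2v.vector_size))
    for word, word_id in vocab.items():
        embedding_matrix[word_id] = w2v.wv[word]

    def encode(tokens):
        ids = [vocab.get(t, 0) for t in tokens][:MAX_LEN]
        return ids + [0] * (MAX_LEN - len(ids))

    X = np.array([encode(tokenize(text)) for _name, text, _colors in cards])
    # Multi-label target: one bit per color, since a card can be more than one color.
    y = np.array([[1 if c in colors else 0 for c in COLORS] for _name, _text, colors in cards])

    # 80/20 train/test split.
    rng = np.random.default_rng(0)
    perm = rng.permutation(len(X))
    split = int(0.8 * len(X))
    X_train, y_train = X[perm[:split]], y[perm[:split]]
    X_test, y_test = X[perm[split:]], y[perm[split:]]

    model = tf.keras.Sequential([
        # Embedding layer initialized from the pretrained word2vec vectors.
        tf.keras.layers.Embedding(
            input_dim=embedding_matrix.shape[0],
            output_dim=embedding_matrix.shape[1],
            embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
            mask_zero=True,
            trainable=False,
        ),
        # Stacked LSTM: the first layer returns sequences so the second can consume them.
        tf.keras.layers.LSTM(128, return_sequences=True),
        tf.keras.layers.LSTM(64),
        # Simple feed-forward classifier head.
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(len(COLORS), activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["binary_accuracy"])
    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5, batch_size=64)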