Week 1
Programming Languages
This is a short update on what I've done so far. In my plan, the first week was allocated to general planning and research. I looked into machine learning frameworks for the languages I was considering: Python, JavaScript, F# and Haskell.
I've decided against F# and Haskell because they would add unnecessary complexity: I want to keep the number of new concepts I have to learn in this project to the bare essentials. LSTM networks are new to me, as is malware analysis in general, and I also need to learn a deep learning framework. F# would be a new language for me, and I can't write Haskell as intuitively as I can the imperative languages I know.
So I'm now deciding between JavaScript and Python. I'm currently learning TensorFlow.js, which provides an LSTM implementation backed by native code that should be very fast (it can run on the GPU). There is also TensorFlow for Python, but all else being equal I would rather write JS. The TensorFlow website says that the Python version is 1.5-2x faster than JavaScript in the browser [1], but it says nothing about using Node with the native or GPU bindings. Since Node tends to be a lot faster than Python [2], I'd imagine TensorFlow.js in Node, without the browser overhead, is faster, though I haven't measured this yet. (If I have time I'll implement a model in both and benchmark them myself.)
Books
I bought the book Practical Malware Analysis by Michael Sikorski and Andrew Honig. It's interesting and I've learned a few things about the ways malware tries to evade detection, but I'm not yet sure how I'll apply this to the project. It might be best to encode these evasion methods as features and let the model learn which of them correlate with a file containing malware.
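As a rough sketch of what "encoding evasion methods as features" could look like, here is a small Python example (Python only because it needs nothing beyond the standard library; it isn't a decision on the language question above). It turns a file's raw bytes into a short numeric vector: byte entropy, since packed or encrypted payloads tend to have high entropy, plus one flag per indicator string. The indicator list and the function names are my own assumptions for illustration, not something taken from the book, and a real feature set would need far more thought.

```python
import math
from collections import Counter
from typing import List

# Hypothetical indicator strings, chosen only for illustration:
# Windows API names that often show up in anti-debugging or unpacking code.
SUSPICIOUS_STRINGS = (b"IsDebuggerPresent", b"VirtualProtect")


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of the data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(data: bytes) -> List[float]:
    """Encode a file's bytes as a fixed-length feature vector.

    Layout (an assumption of this sketch):
      [0]   entropy scaled to the range 0..1
      [1:]  1.0 if the corresponding indicator string is present, else 0.0
    """
    features = [shannon_entropy(data) / 8.0]
    features.extend(1.0 if s in data else 0.0 for s in SUSPICIOUS_STRINGS)
    return features


if __name__ == "__main__":
    sample = b"MZ\x90\x00 ... IsDebuggerPresent ..."
    print(extract_features(sample))
```

A vector like this could be fed to the model alongside whatever sequence input the LSTM ends up using; whether hand-picked flags like these help at all is something I'd have to test.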
Git Repository
I've created a Git repository on GitHub: https://github.com/ChrisSwinchatt/MalwareLearningMachine