Posts

Showing posts from June, 2018

Week 1

Programming Languages

This is just a short update about what I've done so far. In my plan, the first week was allocated to general planning and research. I looked into machine learning frameworks for the languages I was considering: Python, JavaScript, F# and Haskell. I've decided against F# and Haskell because they would add unnecessary complexity: I want to keep the number of new concepts I have to learn in this project to the bare essentials. LSTM networks are new to me, as is malware analysis in general, and I also need to learn a deep learning framework. F# would be a new language for me, and I can't write Haskell as intuitively as I can the imperative languages I know. So I'm now choosing between JavaScript and Python. I'm currently learning TensorFlow.js, which provides a native-code LSTM implementation that should be very fast (it can run on a GPU). There is also TensorFlow for Python, but I would rather write JS, all else being equal. The TensorFlow website says ...

Plan

Time, Time, Time

With the deadline falling on the 4th of September 2018, there are roughly 12 weeks to complete the project from the time of writing. Treating this as a full-time job gives 40 × 12 = 480 hours. The dissertation is 12,000 words; if it takes an hour to write and edit 100 words, then I will spend 120 hours (3 weeks) writing the dissertation, leaving 360 hours (9 weeks) to design and implement the software. I won't count writing this blog against my 480 hours, as I can do it at the weekends. I'm going to spend the first week planning, reading, learning and prototyping. After that I will have 8 weeks of development time and 3 weeks of QA and dissertation writing.

Project Goal and Breakdown

Goal of the project

Predict whether a given executable file contains malware.

Steps

1. Predict whether a binary contains a valid program
2. Disassemble a binary and produce valid disassembly (this is the minimum viable product)
3. Predict whether the disassembled prog...
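
The time budget in this post is simple arithmetic, and a minimal Python sketch makes it easy to see how the split changes if one of the estimates moves. The function name `time_budget` and its parameter names are made up for illustration; the default numbers are the ones quoted in the post (12 weeks, 40 hours per week, 12,000 words, 100 words per hour).

```python
# Rough time budget, using only the figures stated in the post.
# `time_budget` and its parameter names are illustrative, not part of the project.

def time_budget(total_weeks=12, hours_per_week=40,
                dissertation_words=12_000, words_per_hour=100):
    """Split the available hours between dissertation writing and software work."""
    total_hours = total_weeks * hours_per_week
    writing_hours = dissertation_words // words_per_hour
    software_hours = total_hours - writing_hours
    return {
        "total_hours": total_hours,
        "writing_hours": writing_hours,
        "writing_weeks": writing_hours / hours_per_week,
        "software_hours": software_hours,
        "software_weeks": software_hours / hours_per_week,
    }


budget = time_budget()
for name, value in budget.items():
    print(f"{name}: {value}")
```

With the defaults this reproduces the 480 / 120 / 360 hour split. Halving the assumed writing speed to 50 words per hour, for example, would double the writing time to 240 hours and leave 6 weeks for the software.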

Introduction

Hi, I'm Chris, a postgrad student of artificial intelligence (officially Intelligent and Adaptive Systems) at the University of Sussex. This blog is going to serve as my notes for my dissertation project, which explores the automatic classification of files as malware or non-malware using machine learning models. I'm interested in many computer science topics, including systems programming, security and, of course, AI, so this project is my attempt to kill two (or three) birds with one stone. This post will give an overview of what I plan to do with this project. Just a quick note on terminology: malware is what the average person would call a (computer) virus. Technically, a virus is just one type of malware (one that is capable of spreading by itself), so I will generally use the word "malware" instead, meaning any software designed to cause harm to a computer. If I do use the word "virus", assume it's used interchangeably with "malware...