Posts Tagged ‘Python’

Writing Files From GridFS To An Archive Without Writing To Disk First

I came across a situation where I needed to generate a bz2 compressed archive of a bunch of files extracted from GridFS. This process is going to occur regularly, so I had to take into consideration performance hits against the server. I felt it would be best if I could take the data as it is extracted from GridFS and write it directly to the compressed archive, instead of writing each file to disk first, and then adding it to the archive.

Python has a library for generating tarballs called tarfile. This was actually very useful since you can write files directly to a bz2 compressed archive. The issue I ran into is that in order for the data coming out of GridFS to be written to the archive, it had to be written as if it were a file (with file attributes). Using any sort of file IO would force me to write to disk, and that’s not what I wanted to do.

Luckily there is StringIO. Using this in combination with tarfile’s TarInfo object, this became a very easy task to accomplish:

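import tarfile
import time
from StringIO import StringIO  # Python 2 module, as used in this post

# gridfs_files is assumed to be an iterable of objects pulled from GridFS,
# each exposing .name and .data
tar = tarfile.open("sometarfile.tar.bz2", "w:bz2")
for gridfs_file in gridfs_files:  # avoids shadowing the built-in "file"
    info = tarfile.TarInfo(name="%s" % gridfs_file.name)
    info.mtime = time.time()  # the default is 0, which tar can complain about
    info.size = len(gridfs_file.data)  # addfile() reads exactly this many bytes
    # Wrap the in-memory data in a file-like object; nothing is written to disk
    tar.addfile(info, StringIO(gridfs_file.data))
tar.close()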

The basic idea is that you use TarInfo to specify the filename, size, modified time, and so on. Setting the modified time is important: it defaults to 0, which is the epoch itself, and tar can complain about an implausibly old timestamp when it sees that. You use StringIO to turn your data into a file-like object that tarfile will accept, and you pass the two together to addfile() to add the file to the archive. This works really well, except for one issue that I am still working on. If you bunzip2 the compressed archive and then run tar -t on the result, it hangs and does nothing. It’s possible that GNU tar has a problem, or that tarfile isn’t creating the file correctly, but the archive does decompress and extract properly, which is the important part!
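On Python 3 the StringIO module no longer exists, so here is a rough sketch of the same idea using io.BytesIO instead. The helper names write_archive and list_archive are made up for illustration, and plain (name, bytes) pairs stand in for whatever your GridFS objects expose. Reading the archive back with tarfile is also a quick sanity check that doesn't depend on GNU tar.

```python
import io
import tarfile
import time


def write_archive(archive_path, files):
    """Write (name, data) pairs into a bz2-compressed tarball.

    `files` is a hypothetical stand-in for data pulled out of GridFS:
    an iterable of (str, bytes) tuples. No temporary files are created.
    """
    with tarfile.open(archive_path, "w:bz2") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name=name)
            info.mtime = int(time.time())  # otherwise mtime stays at 0
            info.size = len(data)          # addfile() reads this many bytes
            tar.addfile(info, io.BytesIO(data))


def list_archive(archive_path):
    """Return (name, size) for every member, read back through tarfile."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        return [(member.name, member.size) for member in tar.getmembers()]
```

Something like write_archive("sometarfile.tar.bz2", [("a.txt", b"hello")]) followed by list_archive("sometarfile.tar.bz2") should show the members that were written.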

Presentation at CarolinaCon 7

I had a great time presenting at CarolinaCon 7 this year! We had over 200 people at the conference, so it was great getting to give a talk to that many people on what I consider to be a fun topic.

If you are interested in the presentation, you can find the slides here. If you’d like to see the examples, here’s the list:

CarolinaCon 7

It’s that time of year again! Tonight marks the start of CarolinaCon 7. If you’ve never been, it is a great little tech conference put on by nc2600. It’s grown over the years and is now up to around 200 attendees. I’ve been very lucky to be around for the last 3 (including the one this weekend) and have also been extremely lucky to have the honor of giving a talk at each one.

This year I will be giving a talk on Malware Identification and Classification. Specifically, I will be showing how to do this using Yara and Python. Since malware has become a major problem and is exploding in growth, I thought it would be a great topic to talk about. If you’re in the Raleigh area and want to attend, the conference is extremely cheap to get into, and it gives you access to an entire weekend full of over 15 talks, trivia, lock picking, capture the flag, and more! If you don’t get a chance to attend but are still interested in my talk, I’ll be posting the slides and demo content after I’m done presenting. My talk is at 2pm tomorrow. Also, on Sunday, Gerry Brunelle will be giving a talk on Malware Analysis, which fits in beautifully with mine. Between those two talks, you should have a great intro to the world of malware.

Hope to see you there!

IndentationError: unexpected indent

Tabs and spaces are code killers, visually and syntactically. Dealing with Python, you might see the error in the title often when sharing code with others. Focusing on VIM, people love to set up their rcfile so that they get the most out of what VIM has to offer and so that their code is easy to write and easy to read. Unfortunately, many times it’s only easy to read for them and no one else.

Python is very picky about spacing and indentation, since indentation determines the flow of your script or program. Miss a tab or a series of spaces somewhere and all of a sudden you are executing code that’s not inside the if statement you just wrote, or you’ll get errors like the one above because your code just flat out fails.
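To see this without running a whole script, here is a minimal sketch that hands small snippets to the built-in compile() and reports which indentation error, if any, comes back. It assumes Python 3, and the function name indentation_problem is just something made up for this example.

```python
def indentation_problem(source):
    """Return the name of the indentation error raised for `source`, or None.

    TabError is a subclass of IndentationError, so one except clause
    catches both the "unexpected indent" case and mixed tabs and spaces.
    """
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:
        return type(exc).__name__
    return None


# Consistent four-space indentation
good = "if True:\n    x = 1\n    y = 2\n"

# A line indented for no reason: the error from the title
unexpected = "x = 1\n    y = 2\n"

# One line indented with a tab, the next with eight spaces
mixed = "if True:\n\tx = 1\n        y = 2\n"

for label, snippet in [("good", good), ("unexpected", unexpected), ("mixed", mixed)]:
    print(label, indentation_problem(snippet))
```

On Python 3 the mixed snippet is rejected outright even though it can look perfectly aligned in an editor, which is exactly the situation that makes these errors so confusing.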

When dealing with Git, and having several people committing code to the same script or project, you are bound to run into situations where tabstops and shiftwidths are in conflict. It may look like all of the code lines up visually, but where one person uses spaces, another may have used actual tabs. Then you notice they made over 100 changes to the code you’re working on, and there are tabs and spaces mixed all over the place. Say you have the following in your .vimrc:

set expandtab
set tabstop=4
set shiftwidth=4

It would be great to fix all of the tabs and spaces to be the same. Now what? VIM to the rescue! Bear with me, this is a bit complicated. Once you open the file in VIM, execute the following:

:%retab

You’re done, by the way. This converts all of the tabs in the file according to the settings in your .vimrc, so with expandtab set, every tab becomes the matching number of spaces. Now your entire file is standardized as far as your spacing goes, and you can say goodbye to your unexpected indent errors. If you find yourself having to do this often, you can bind a key to automagically make the changes for you by adding this to your .vimrc (using F2 in this example):

nnoremap <F2> :retab<CR>:wq!<CR>

This will convert all of the tabs to spaces, save the file, and close it for you. Enjoy!
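If you need to do the same cleanup outside of VIM, say across a pile of files in a script, something along these lines might work. It is only a sketch: the function name retab_leading is made up, the default tabstop of 4 mirrors the .vimrc above, and unlike :retab it only touches the indentation at the start of each line and leaves tabs inside the line alone.

```python
def retab_leading(text, tabstop=4):
    """Expand tabs in each line's leading whitespace into spaces.

    Tabs that appear after the first non-whitespace character (for
    example inside a string literal) are deliberately left untouched.
    """
    fixed = []
    for line in text.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[:len(line) - len(body)]
        # expandtabs() pads each tab out to the next multiple of tabstop
        fixed.append(indent.expandtabs(tabstop) + body)
    return "".join(fixed)
```

You would read a file's contents, pass them through retab_leading(), and write the result back, ideally after committing your work so the change is easy to review or undo.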