<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet title="CSS_formatting" type="text/css" href="http://www.interglacial.com/rss/rss.css"?>
<!-- name="generator" content="loathsxome/0.99" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91"><channel>
<link>http://research.xvx.ca</link><title>Research Log</title>
<description>What Adam Wolfe Gordon is Thinking About</description><language>en</language>

<item><title>A Batch Script for Hadoop on VMs</title><link>http://research.xvx.ca/hadoop-batch-script</link><description>&lt;p&gt;
I&apos;ve been benchmarking modifications to Hadoop in virtual machines lately, with
others using the same server to benchmark their code.  Paul suggested I should
write a batch script we can use to run our benchmark jobs with the PBS batch
scheduler, which is already set up on our server.  The trickiness is that
ideally we should not be running VMs when we&apos;re not using them, to avoid memory
and cache contention.  So, I wrote the following script, which starts the VMs,
starts Hadoop, runs Hadoop commands from a file, shuts down Hadoop, and shuts
down the VMs.
&lt;/p&gt;

&lt;pre&gt;
#!/bin/zsh

#PBS -N hadoop_job
#PBS -l nodes=1:ppn=8
#PBS -V

###############################################################################
#                                                                             #
# submit_hadoop.sh                                                            #
# Adam Wolfe Gordon, June 2010                                                #
#                                                                             #
# Usage: ./submit_hadoop.sh                                                   #
#                                                                             #
# Starts hadoop in virtual machines, runs the hadoop commands from a file     #
# called commands in the current (or PBS working) directory, then shuts down  #
# hadoop and the virtual machines.                                            # 
#                                                                             #
# Useful for batch scheduled submission of VM-based hadoop jobs when          #
# benchmarking on a shared system.                                            #
#                                                                             #
# Relies on some zsh-isms, so probably don&apos;t run it with another shell.       #
#                                                                             #
###############################################################################

# Set these to where hadoop lives, and where your VMs live.
# Your VMs must be started by a script called run in $VM_HOME.
HADOOP_HOME=/local/data/awolfe/hadoop_stuff/hadoop-0.20.2/hadoop
VM_HOME=/local/data/awolfe/hadoop_stuff/ubuntu_vms

# If we were submitted with qsub, then go into our work directory.
if [ -n &quot;$PBS_O_WORKDIR&quot; ]; then
    cd &quot;$PBS_O_WORKDIR&quot;;
fi;

# Make sure the commands file exists.
if [ ! -e commands ]; then
    echo &quot;commands file not found. Aborting.&quot;
    exit 1;
fi;

# Start the virtual machines
$VM_HOME/run;
sleep 2;

# Wait for them to come up
for i in $(cat $HADOOP_HOME/conf/slaves); do
    while true; do
        nc -w0 $i 22;
        if [ $? = 0 ]; then break; fi;
    done;
done;

# Format the HDFS, since it will have gone away on the VMs.
yes Y | $HADOOP_HOME/bin/hadoop namenode -format;

# Start hadoop
$HADOOP_HOME/bin/start-all.sh;

# Wait for hadoop to come up
for i in $(cat $HADOOP_HOME/conf/slaves); do
    # HDFS
    while true; do
        nc -w0 $i 50010;
        if [ $? = 0 ]; then break; fi;
    done;
    while true; do
        nc -w0 $i 50075;
        if [ $? = 0 ]; then break; fi;
    done;
    # MR
    while true; do
        nc -w0 $i 50060;
        if [ $? = 0 ]; then break; fi;
    done;
done;

# Now we can run our job
export TIMEFMT=&quot;TIME: %J -- %U user %S system %P cpu %*E total&quot;;
(cat commands |
 while read cmd; do
    d=&quot;/usr/bin/time -f &apos;%C -- %U user %S system %P cpu %e total&apos; $HADOOP_HOME/bin/hadoop $cmd 2&gt;&amp;1&quot;;
    eval &quot;$d&quot;;
 done;
) &gt; output.txt 2&gt;&amp;1;

# Shut down hadoop
$HADOOP_HOME/bin/stop-all.sh

# Shut down the VMs - this requires passwordless ssh and passwordless sudo for /sbin/halt
for i in $(cat $HADOOP_HOME/conf/slaves); do
    ssh $i sudo /sbin/halt;
done;
&lt;/pre&gt;
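
&lt;p&gt;
For illustration, a commands file might look like the following sketch.  Each
line is passed as the arguments to bin/hadoop and timed separately.  The input
file, the HDFS paths, the location of the examples jar, and the choice of the
wordcount example are all hypothetical placeholders, not the actual benchmark
jobs.  Avoid blank lines and comments in the file, since every line is run as a
command.
&lt;/p&gt;

&lt;pre&gt;
fs -mkdir input
fs -put /tmp/sample.txt input
jar $HADOOP_HOME/hadoop-0.20.2-examples.jar wordcount input output
&lt;/pre&gt;

&lt;p&gt;
With that file in the current directory, the job can be submitted with qsub
submit_hadoop.sh, and the timing results end up in output.txt.
&lt;/p&gt;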
</description></item>
<item><title>Organizing Papers on Linux</title><link>http://research.xvx.ca/organizing-papers</link><description>&lt;p&gt;As I read papers, I try to keep an up-to-date bibliography.  This is partly to
keep track of what I&apos;ve read, and partly so that when I&apos;m writing papers or my
thesis later, I already have the bibliography ready to go.  I used to do this by
manually updating a BibTeX file as I went, but this is a bit onerous and not
much fun.&lt;/p&gt;

&lt;p&gt;After searching a bit, I found &lt;a
href=&quot;http://www.mendeley.com&quot;&gt;Mendeley&lt;/a&gt;, which is an online tool and desktop
application for managing the papers you read and generating bibliographies as
needed.  It imports and generates BibTeX, lets you add papers by adding the
PDF and letting it search Google Scholar for the title or by using a bookmarklet
in your browser, and syncs between the website and the desktop application so
you always have your bibliography and a copy of your papers available.&lt;/p&gt;

&lt;p&gt;As an added bonus, it lets you highlight and annotate PDFs, which nothing else
in Linux does very well.  It&apos;s available for Mac, Windows, and Linux, plus the
website works anywhere, so it&apos;s really quite handy.&lt;/p&gt;
</description></item>
<item><title>Hello</title><link>http://research.xvx.ca/hello</link><description>&lt;p&gt;
Hi there.  I&apos;m Adam Wolfe Gordon, a master&apos;s student
in &lt;a href=&quot;http://www.cs.ualberta.ca&quot;&gt;Computing Science at the University of
Alberta&lt;/a&gt;.  My supervisor, &lt;a href=&quot;http://www.cs.ualberta.ca/~paullu&quot;&gt;Paul Lu&lt;/a&gt;, likes
to say that there are three aspects to research: reading, thinking, and doing.
Like (I suspect) many CS students, I&apos;m best at the doing part: writing code,
running experiments, etc.  It&apos;s easy for reading and thinking to fall by the wayside,
or go unrecorded.
&lt;/p&gt;

&lt;p&gt;
That&apos;s what this website is for.  I will try to take some time each week to
update this with what I&apos;ve been reading and thinking about, mostly for my own
benefit.  Enjoy!
&lt;/p&gt;
</description></item>
</channel></rss>

