Introduction to Unix and SGE for beginners

File management and directories

Directories

File and directory paths in UNIX use the forward slash "/" to separate directory names. If you are working on a server, your shell will usually start in your home directory, typically /home/yourUserName/. Also have a look at the conventional directory layout here.

Here are a few examples of the directory structure:

directory     explanation
/             "root" directory
/usr          directory usr (sub-directory of the / "root" directory)
/usr/local    local is a sub-directory of /usr

Creating a directory

The mkdir command creates a new directory. The command below creates a new directory named "newDir" under the current directory.

$ mkdir newDir

This command creates a new directory in the user's home directory.

$ mkdir ~/newDir

The next command creates the target directory and all the non-existing directories in the path. It will create the "samtools" directory, and it will also create the "opt" directory if that does not exist yet. All of this is done in the user's home directory, as indicated by "~/" in the path.

$ mkdir -p ~/opt/samtools
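
To see the difference the -p option makes, here is a minimal sketch that works inside a throwaway directory instead of your real home directory. The variable name demo_dir and the "tools/bwa" path are hypothetical and used only for this illustration.

```shell
# Scratch directory so nothing in your real home directory is touched.
# "demo_dir" is a hypothetical name used only in this sketch.
demo_dir=$(mktemp -d)

# Without -p, mkdir refuses to create a directory whose parent is missing.
mkdir "$demo_dir/tools/bwa" 2>/dev/null || echo "plain mkdir failed: parent directory is missing"

# With -p, the missing parent "opt" is created together with "samtools".
mkdir -p "$demo_dir/opt/samtools"

# Running it again is harmless: -p does not complain about existing directories.
mkdir -p "$demo_dir/opt/samtools"

# Show every directory that now exists below the scratch directory.
find "$demo_dir" -type d
```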

Moving around the file system with cd

The cd command stands for "change directory" and lets you move around the file system. Here are a few examples of the cd and pwd commands.

Type variants of these into your shell to move around your file system.

command + arguments    explanation
pwd                    Print the working directory, i.e. the current directory.
cd                     Change the current directory to your HOME directory.
cd /usr/local          Change the current directory to /usr/local.
cd INIT                Change the current directory to INIT, which is a sub-directory of the current directory.
cd ..                  Change the current directory to the parent of the current directory.
cd ~akalin             Change the current directory to the user akalin's home directory (if you have permission).
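
The commands above can be tried safely with a short sketch like the following. It builds a small directory tree in a scratch location first; the names demo_dir and project are hypothetical, while INIT mirrors the example in the table.

```shell
# Build a small tree to move around in ("demo_dir" and "project" are hypothetical names).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/project/INIT"

# Absolute path: works no matter where you currently are.
cd "$demo_dir/project"
pwd

# Relative path: INIT is looked up inside the current directory.
cd INIT
in_init=$(pwd)
echo "$in_init"

# ".." always means the parent of the current directory.
cd ..
after_parent=$(pwd)
echo "$after_parent"
```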

Listing directory contents

The ls command lists the contents of a directory. It takes many options; some of them are explained below.

command     explanation
ls          List the contents of the current directory.
ls -l       List in detailed (long) format, including file sizes and permissions.
ls -a       List the current directory including hidden files. Hidden files start with ".".
ls -ld *    List all the file and directory names in the current directory using the long format. Without the "d" option, ls would list the contents of any sub-directory of the current directory. With the "d" option, ls just lists them like regular files.
ls -lh      List in detailed format, with file sizes shown in human-readable units instead of bytes.
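
A quick way to compare these options is to run them on a small directory you control. The sketch below assumes a scratch directory with one regular file, one hidden file and one sub-directory; all file names in it are hypothetical.

```shell
# Scratch directory with a regular file, a hidden file and a sub-directory
# (all names here are hypothetical).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/subdir"
echo "ACGT" > "$demo_dir/seq.txt"
touch "$demo_dir/.hidden" "$demo_dir/subdir/inner.txt"
cd "$demo_dir"

ls          # hidden files are left out
ls -a       # also shows ".", ".." and .hidden
ls -lh      # long format with human-readable sizes
ls -l *     # lists the contents of subdir
ls -ld *    # lists subdir itself, like a regular file

# Count the entries each variant reports (tr strips padding some wc versions add).
visible=$(ls | wc -l | tr -d ' ')
all=$(ls -a | wc -l | tr -d ' ')
echo "visible entries: $visible, entries including hidden ones: $all"
```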

Moving, renaming and copying files

The cp command copies files and the mv command moves them. Both are generally used with two main arguments: cp source_file destination_file or mv source_file destination_file.

command                        explanation
cp file1 file2                 Copy file1 as file2.
cp /data/seq_data/file1 ~/     Copy file1 at /data/seq_data to your home directory.
mv file1 newname               Move or rename a file.
mv file1 ~/opt/                Move file1 into the sub-directory opt in your home directory.
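
The difference between copying, renaming and moving is easiest to see by checking which files exist after each step. This is a minimal sketch in a scratch directory; demo_dir is a hypothetical name, and file1, file2, newname and opt follow the examples above.

```shell
# Scratch directory standing in for your home directory ("demo_dir" is hypothetical).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/opt"
echo "gene1 0.5" > "$demo_dir/file1"

# Copy: afterwards both file1 and file2 exist.
cp "$demo_dir/file1" "$demo_dir/file2"

# Rename: file2 disappears and newname takes its place.
mv "$demo_dir/file2" "$demo_dir/newname"

# Move into a directory: the trailing "/" makes it clear that opt is a directory,
# so the file keeps its name and ends up as opt/newname.
mv "$demo_dir/newname" "$demo_dir/opt/"

# Show the final layout.
ls -R "$demo_dir"
```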

Finding files

There are a couple of ways to find files in your file system. We will start with the find command, which uses the following syntax: find directory -name targetfile. It is useful when you have a rough idea of where the file is located.

The following finds all files ending in ".html" under the /home/user directory.

$ find /home/user -name "*.html"

find can do more than just find files. It can also execute commands on the files it finds via the -exec option. The following command finds all files ending in ".txt" in the current directory and its sub-directories, and counts the number of lines in each of them. The '{}' is replaced by the name of each file found and the ';' ends the -exec clause.

$ find . -name "*.txt" -exec wc -l '{}' ';'
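
To check what find actually matches before running a command on the results, you can try it on a small tree first. The sketch below assumes a scratch directory with two ".txt" files (one of them in a sub-directory) and one ".html" file; all names are hypothetical.

```shell
# Small tree to search ("demo_dir" and all file names are hypothetical).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
printf 'a\nb\n' > "$demo_dir/two.txt"
printf 'a\nb\nc\n' > "$demo_dir/sub/three.txt"
echo "<html></html>" > "$demo_dir/page.html"

# find descends into sub-directories, so sub/three.txt is found as well.
find "$demo_dir" -name "*.txt"

# Run wc -l once per file found; page.html is skipped because it does not match.
find "$demo_dir" -name "*.txt" -exec wc -l '{}' ';'

# Count how many files matched (tr strips padding some wc versions add).
n_txt=$(find "$demo_dir" -name "*.txt" | wc -l | tr -d ' ')
echo "matching files: $n_txt"
```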

Another command that can find files is locate. The locate command provides a faster way of locating all files whose names match a particular search string. For example:

$ locate ".txt"

will find all filenames in the filesystem that contain ".txt" anywhere in their full paths.

A disadvantage of locate is that it stores all filenames on the system in an index that is usually updated only once a day. This means locate will not find files that have been created very recently.

Searching the contents of a text file

Often you will need to search a file for certain characters or words. Imagine that you have a text file containing gene ids and some scores, and you would like to get the line(s) that contain your gene id of interest. This is similar to the "find" function in modern word processors such as MS Word. It can be achieved with the grep command. The syntax of the command is: grep options pattern files

command               explanation
grep id1 genes.txt    Searches "genes.txt" and prints the lines matching "id1".
grep id1 *.txt        Searches the files ending with ".txt" and prints the lines matching "id1".
grep -vi id1 *.txt    Similar to above, but the -i option ignores case (Id1, ID1, iD1 and id1 are treated equally) and the -v option prints the lines that do not match the pattern.
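
The effect of the -i and -v options is easiest to see on a tiny file. This is a minimal sketch that assumes a made-up genes.txt with four tab-separated lines; the ids and scores are invented for illustration. It also uses the -c option, which is not covered above and makes grep print the number of matching lines instead of the lines themselves.

```shell
# A made-up genes.txt: gene id, a tab, then a score (all values are invented).
demo_dir=$(mktemp -d)
printf 'id1\t0.91\nid2\t0.15\nID1\t0.40\nid3\t0.77\n' > "$demo_dir/genes.txt"

# Case-sensitive search: only the lowercase id1 line matches.
grep id1 "$demo_dir/genes.txt"

# -i ignores case, so the ID1 line matches as well.
grep -i id1 "$demo_dir/genes.txt"

# -v inverts the match: together with -i this prints the id2 and id3 lines.
grep -vi id1 "$demo_dir/genes.txt"

# -c prints how many lines matched instead of the lines themselves.
n_exact=$(grep -c id1 "$demo_dir/genes.txt")
n_nocase=$(grep -ci id1 "$demo_dir/genes.txt")
n_inverted=$(grep -cvi id1 "$demo_dir/genes.txt")
echo "exact: $n_exact, ignoring case: $n_nocase, not matching: $n_inverted"
```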

Using grep and find together

You can search all files in an entire directory tree for a particular pattern by combining grep and find. The following command prints the lines containing the string "genes" from the files found by find . -name "*.txt" -print.

$ grep genes `find . -name "*.txt" -print`
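
The backquote form above splits the output of find on whitespace, so it can misbehave when a file or directory name contains spaces. One alternative, shown here as a sketch and not as the only way to do it, is to let find call grep itself through -exec. It assumes a grep that supports the -H option (GNU and BSD grep both do), which prints the file name in front of each matching line. The directory and file names are hypothetical.

```shell
# Tree with a space in a directory name ("demo_dir" and the file names are hypothetical).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/my results"
echo "genes on chr1" > "$demo_dir/my results/notes.txt"
echo "nothing to see" > "$demo_dir/other.txt"

# '{}' + hands the found files to grep in batches, each name passed intact.
# -H makes grep print the file name before every matching line.
find "$demo_dir" -name "*.txt" -exec grep -H genes '{}' +

# -l prints only the names of the files that contain a match; count them.
hits=$(find "$demo_dir" -name "*.txt" -exec grep -l genes '{}' + | wc -l | tr -d ' ')
echo "files containing the pattern: $hits"
```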

The search patterns that grep uses are written in a special notation called regular expressions. You can do more complicated searches using regular expressions, but that is more of an advanced application. See [] for more on that.
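
As a small taste of what regular expressions add, the sketch below shows why a plain pattern can match too much and how anchoring fixes it. It assumes a made-up, tab-separated genes.txt; the ids and scores are invented for illustration.

```shell
# Made-up file where "id1" also appears inside other ids (all values are invented).
demo_dir=$(mktemp -d)
printf 'id1\t0.91\nid10\t0.15\nxid1\t0.40\n' > "$demo_dir/genes.txt"

# A plain pattern matches anywhere in the line, so all three lines are printed.
grep id1 "$demo_dir/genes.txt"

# "^" anchors the match to the start of the line and "[[:space:]]" requires
# whitespace (here the tab) right after the id, so only the id1 line is printed.
grep '^id1[[:space:]]' "$demo_dir/genes.txt"

# Count the lines matched by each pattern.
n_plain=$(grep -c id1 "$demo_dir/genes.txt")
n_anchored=$(grep -c '^id1[[:space:]]' "$demo_dir/genes.txt")
echo "plain: $n_plain, anchored: $n_anchored"
```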

See disk usage and free disk space with du and df

Deleting files and directories

Use these with caution.