Scripting and Programming Languages
Jeremy Sanders
October 2011
Here is a brief guide to available scripting and programming
languages. It is not complete (I won't cover IRAF scripting here), and
I'm sure to have missed out some people's favourite languages.
Scripting languages are programming languages designed to make
programming tasks easier. If you write a Perl or Python (or even Tcl
or Ruby) script, you do not have to compile it before you can run
it.
I'll run down the differences between Perl, Python, a shell script,
IDL, C, Fortran 77, Fortran 90/95, C++ and Java. I'll also include
examples of some of the languages in operation.
Shell scripts are useful for programming repetitive Unix tasks and
simple manipulation of files. I would not use them for any complex
task.
Advantages: Fairly popular, quick to write for simple repetitive
tasks.
Disadvantages: Lack facilities required for general programming (e.g.
floating point arithmetic). Syntax can be difficult to get used to
(e.g. quoting, tests). tcsh shell scripting is broken (see shell
scripting notes).
Python is a popular language. Many people agree that it has a simple,
clear syntax. Its most unusual feature is that it uses the indentation
of text to mark blocks of code (this is not too difficult to get used
to). It is also the language used to drive PyRAF (an easier interface
to IRAF).
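As a small illustration of indentation marking blocks, here is a
minimal sketch (the function name count_signs and the sample data are
made up for this example). The body of the loop and of each branch is
simply whatever is indented beneath it; there are no braces or end
statements.

```python
def count_signs(values):
    """Count the positive and non-positive numbers in a list."""
    positive = 0
    other = 0
    for v in values:
        # Everything indented under 'for' is the loop body.
        if v > 0:
            positive += 1   # indented under 'if': runs only when v > 0
        else:
            other += 1      # indented under 'else'
    # Back at the function's indentation level: runs once, after the loop.
    return positive, other


print(count_signs([3, -1, 0, 7]))   # prints (2, 2)
```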
Advantages: Popular, used in PyRAF, simple syntax, fast development,
numerical/scientific library, a large library of code, good
object-oriented features, good for writing larger programs.
Disadvantages: Slower than Fortran/C/C++, especially for loops. Lack
of static typing may lead to bugs if you do not test programs.
See: http://www.python.org/
Note that Python 3 is different to Python 2. The Python 3 language has
improvements, but not all libraries work under Python 3 yet.
Books: Learning Python (Mark Lutz, David Ascher), Programming Python
(Mark Lutz et al.), Python Cookbook (Alex Martelli (Editor), David
Ascher (Editor))
Perl is another popular scripting language, used for everything from
system administration scripts to scripts that produce web pages (and
astronomical scripts). It has fallen out of popularity a bit because
of the slow development of Perl 6. Most people still use Perl 5.
Its philosophy is to provide lots of quick ways to do tasks. It was
originally created for quick and easy text processing tasks, but is
now being used for a wide range of programs.
Advantages: Popular, fast at certain tasks (like text processing),
large library of code to use (CPAN library), fast development, great
for processing text files.
Disadvantages: The syntax looks quite messy (in my opinion). If
written without care, your code will be hard to read in six months'
time. Not fast enough for complex numerical code. Hard to write and
maintain large Perl programs.
See: http://www.perl.com/
Books: Programming Perl (Larry Wall et al.), Learning Perl (Randal L.
Schwartz, Tom Phoenix)
Interactive Data Language (IDL) is a proprietary language for
manipulating data and plotting. Its plotting is powerful and it can do
quite advanced numerical work.
Advantages: Popular in astronomy, powerful facilities (especially
astronomy library for FITS files)
Disadvantages: Quite expensive (to buy for yourself), nasty syntax,
slow to run for many cases, inherits some bad design decisions
(IMHO).
C is a popular standard programming language, used especially in Unix.
It is quite a low level language but gives you tremendous control. The
main difficulty in learning it is understanding pointers. It
has many features not found in Fortran 77 (e.g. memory
management). The language is relatively simple.
Advantages: Popular (outside astronomy, not as popular as F77 inside
astronomy), lots of code available in C (e.g. GSL), standard Unix
language, powerful.
Disadvantages: Pointers (used throughout C) can be hard to learn and
to debug, lacks features available in C++, harder to parallelise
than Fortran.
Books: C Programming (A modern approach) (K.N. King) [recommended
introduction], The C Programming Language (Kernighan and Ritchie)
[rather dry].
Fortran is a programming language designed for scientists. It is
intended for writing numerical code. Fortran 77 is very popular among
scientists, but it lacks features, which makes certain tasks hard
(e.g. allocating variable-sized arrays, text handling).
Advantages: Commonly used in science, good for numerics.
Disadvantages: Not common outside science (e.g. for jobs), lacks
features, slower development than Python/Perl.
Books: Structured Fortran 77 for Engineers and Scientists (D. Etter),
Programming in standard Fortran 77 (Balfour, Balfour, Alexander).
Fortran 90 and 95 are improvements to the Fortran 77 language,
providing modern features (like allocatable arrays). F90/95 is less
commonly used than F77, but is far more powerful. F90/95 is easier to
learn than C if you have used F77, and provides similar functionality
(though I expect scientists would find F90/95 closer to the scientific
mindset).
Advantages: More powerful than F77, increasingly common, easy parallel
programming. There are some good free compilers now, including
gfortran (included in gcc 4.x) and g95.
Disadvantages: Many older supervisors will only have used Fortran 77.
Books: Fortran 90/95 Explained (Metcalf and Reid), Fortran 90/95 for
Scientists and Engineers (Chapman) [expensive!].
C++ was based on the C programming language, providing
object-oriented features. It is widely used outside science
(and increasingly in science, but not very much at the moment). It
provides almost everything C does, but adds powerful features like
classes, templates, strings and references. It is a very large
language (few people understand everything), but you can get things
done with much less knowledge.
Advantages: Powerful language, very common outside astronomy (lots of
jobs!), can link C code easily to C++ code.
Disadvantages: Is a very complex language in full, probably less
popular than Fortran in astronomy.
Books: Accelerated C++ (Practical programming by example) (Koenig and
Moo) [very simple introduction], The C++ Programming Language (Bjarne
Stroustrup) [the bible, complex].
Online Fortran 90 course: http://www.liv.ac.uk/HPC/F90page.html
Java is a modern popular programming language. It is object-oriented,
but is not as complex to learn as C++ (but lacks some of its
features). Java programs are cross-platform, as they run under a
virtual machine. Java might be a bit slower than C/C++/Fortran,
but is not as slow as Python/Perl.
Advantages: Cross platform, modern, widely used outside astronomy
(good for jobs).
Disadvantages: Not so popular in astronomy.
Books: Java in a Nutshell (Flanagan), Learn to Program with Java
(Smiley).
You are likely to want to program some numerical code at some point in
your PhD. This code might be to solve an equation, minimise a function
or integrate it (if you can't solve the problem analytically or
generally). The options for writing numerical code are:
- Use a higher-level environment such as Python with NumPy (its
numerical library), IDL or Mathematica to do it. A good idea for many
programs.
- Write the numerical code yourself from scratch. This is often
difficult and error-prone unless you are an expert (I advise you to
read "What Every Computer Scientist Should Know About Floating-Point
Arithmetic" in my links section).
- Read Numerical Recipes (Press et al.) and write your own
code. Numerical Recipes often has good descriptions for algorithms
which you can implement freely (if you don't copy their code).
- Type code in from Numerical Recipes. This is fine, except that some
of the NR code contains bugs in the printed versions (care!), and you
are not allowed to give the code to someone else under the
licensing terms (e.g. allow your code to be downloaded). NR code is
provided in Fortran, C and C++ (though Fortran was the original
language it was written in).
- Use the GSL library (GNU Scientific Library). GSL is a 'free'
library and is freely distributable. If you write a program using GSL
and give it to people, you need to distribute the source code of
your program too (GPL Licence). GSL is a C library, but you can call
it from Fortran with a C wrapper function.
- Use the NAG library (Numerical Algorithms Group). NAG is not
free, but the IoA has a licence. The IoA has a set of manuals for
NAG. You can't give your code to someone unless they own a copy of
NAG. Unfortunately all the function names are difficult to remember.
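To give a flavour of why hand-written numerical code needs care, here
is a minimal sketch in Python using only its standard math module (the
variable names are illustrative). Adding 0.1 ten times does not give
exactly 1.0, because 0.1 has no exact binary representation, so
results should be compared with a tolerance rather than with ==.

```python
import math

total = 0.0
for _ in range(10):
    total += 0.1          # each addition rounds to the nearest double

print(total == 1.0)                      # False: rounding errors accumulate
print(math.isclose(total, 1.0))          # True: compare with a tolerance
print(math.fsum([0.1] * 10) == 1.0)      # True: fsum tracks the lost digits
```

Library routines (NumPy, GSL, NAG) have usually already dealt with
this kind of problem, which is one reason to prefer them over code
written from scratch.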
If you are using a compiled programming language (unlike Perl or
Python), the steps to get a running program are:
- Write the source code in a text file (e.g. prog.f, prog.c).
- Compile the source code with an appropriate compiler, e.g.
cc -g test.c -o program.out
The C compilers are cc (Sun) or gcc (GNU). The
Fortran compilers are f90 and f77 (both Sun), or
g77/g95 (GNU). The C++ compiler is c++ or g++ (both GNU). If
compiling doesn't succeed, go back and fix the problem (switch on
extra warnings in your compiler if possible); otherwise
continue.
The compiler will take options, including optimisation (usually -O),
making the program work faster at the expense of compiling speed and
ease of debugging. Other options include debugging options, like -g
which puts extra debugging information in the program so you can
look at a core file if it crashes. See the man page of your
compiler to find out more (or info for gcc/g77).
- Run the program.
- If the program crashes, insert print statements every so
often. This will enable you to see where it crashes. If you get a
core file, you can use this to find the line it crashed on using a
debugger (gdb or dbx). Use the commands:
dbx program.out core (Sun) or
gdb program.out core (GNU)
to start a debugger (read man pages to understand them, but the
command backtrace in gdb is the most useful to print out
where the program crashed, or where in dbx).
- Look at its results, check them if possible against what you
would expect for simple cases.
- Go back to the text editor and the source code if there is a bug
or the program needs improvements.
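Checking results against simple cases applies in any language. As a
minimal sketch in Python (the function trapezium and the two test
integrals are chosen purely for illustration), a small integration
routine can be run on problems whose answers are known analytically
before it is trusted on the real problem.

```python
import math


def trapezium(f, a, b, n=1000):
    """Integrate f from a to b using the trapezium rule with n strips."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Simple cases with known answers:
# the integral of x^2 from 0 to 1 is 1/3, and of sin(x) from 0 to pi is 2.
checks = [
    (lambda x: x * x, 0.0, 1.0, 1.0 / 3.0),
    (math.sin, 0.0, math.pi, 2.0),
]
for f, a, b, expected in checks:
    result = trapezium(f, a, b)
    # The last column reports whether the result is within 1e-5 of the
    # expected value.
    print(result, expected, abs(result - expected) < 1e-5)
```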
There are several different compilers available for the compiled
languages C/C++/Fortran. They differ in their ability to optimise your
code, do debugging and give good warnings and diagnostics about your
code.
The compilers include
- GCC (gcc, g++, g95) - standard GNU compiler collection used on
Linux
- Solaris compilers on Linux - the Sun Studio Compiler, which
comes with the SUN Performance Library, a set of optimized,
high-speed mathematical subroutines for solving linear algebra and
other numerically intensive problems. Includes the Sun graphical
debugger.
- Intel compilers for Linux.
You can switch between the compilers easily using the 'module
environment' - see the IoA computing users' guide or my guide 'Useful
Unix Topics'.
If your code takes a long time to run I would recommend that you test
the compilers to see which is the fastest. Remember to switch on
optimisation when using the GNU compiler (-O2).
Good programming style will make it much easier to write your
code. Here are some tips to make your life easier.
- Think how you could get your result without writing another
program (don't write another sort!). Think how you can
reuse the code you (or someone else) already has.
- Plan how you are going to write your program before you start
typing, even if you only plan in your head. Think carefully about
the bits that will be tricky.
- Unless you plan to write in terms of objects, divide your code
into functions or subroutines that have a particular task (i.e.
modular programming).
- Start from the smallest functions, write them, compile and test
them one by one (or together for very simple functions). Do
not write all your code in one go and expect it to work!
- Keep unrelated code in separate source files, and compile them
together (or link .o files). If you have multiple files you
can do things like
f90 -g -c test1.f90 <-- makes test1.o
f90 -g -c test2.f90 <-- makes test2.o
f90 test1.o test2.o -o testprogram <-- makes testprogram
A Makefile is very useful to keep a program up to date. If
you don't want to learn Makefiles, write a little script to compile
your program.
- Assertions can help with tricky code. If you expect
variables to be in a particular state (e.g. positive) at a
particular point in the program, test whether they are and stop the
program if they aren't. If the program stops then you know there is
a bug somewhere else in the code.
- Comment your code. Do not write anything but the simplest
program without something to document your code. However, don't
waste time documenting obvious things:
a=1; /* this sets a to 1 */
Keeping code in small chunks (functions or subroutines), with an
explanation for each, is often enough commenting, except for
particularly non-obvious code.
- Back up your code (pretty obvious!).
- Avoid arbitrary limits in your code (e.g. size of arrays). Work
out how big your arrays should be. If the sizes need to change on
running the program use allocate in Fortran 90,
malloc in C, or new in C++ to allocate them.
- In C and C++ code, use const as often as possible.
Avoid casting unless necessary.
- In Fortran, explicitly declare all variables (implicit
none). Write in standard F90 or F77, not a mixture of both! Don't
use proprietary compiler extensions.
- Avoid global variables. Keep data in subroutines or functions
(or hidden in modules). Doing this improves the robustness of code
and makes it easier to reuse later.
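The tip about assertions can be sketched as follows. This is only an
illustration in Python: the function mean_magnitude and its inputs are
hypothetical, and the same idea works with assert in C or an explicit
test and stop in Fortran. The function states what it expects of its
input and stops immediately if that is not true.

```python
import math


def mean_magnitude(fluxes):
    """Return the mean of -2.5*log10(flux) over a list of fluxes."""
    # State what the rest of the function relies on; stop at once if not.
    assert len(fluxes) > 0, "need at least one flux"
    for flux in fluxes:
        assert flux > 0, "flux must be positive, got %r" % (flux,)
    magnitudes = [-2.5 * math.log10(flux) for flux in fluxes]
    return sum(magnitudes) / len(magnitudes)


print(mean_magnitude([1.0, 100.0]))

try:
    mean_magnitude([1.0, -3.0])      # a bug elsewhere produced a bad value
except AssertionError as err:
    print("assertion failed:", err)
```

Note that Python skips assert statements when it is run with the -O
option, so assertions are for catching bugs, not for checking user
input.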
Jeremy Sanders
2011-10-02