C and assembler on Linux
This is a page for the C and assembler on Linux class, Tuesdays at 5:30 PM in Church.
Here's a write-up that covers the first half of the first C on Linux class, which I gave last Tuesday, 2012-06-19, in the Church classroom from 5:30 to 7 PM.
I hope to write up the balance of last Tuesday's class before the weekend's out.
Note the To: list, please; if you know of anyone
who's missing, please let them and me know.
Complaints, suggestions, sarcasms, all are welcome.
jim 415 823 4590 my cellphone, call anytime
Learning C programming on Linux
- The C programming language is a specification that defines keywords,
operators, and rules of syntax.
This may sound stupidly obvious or useless knowledge, but you may,
if you really get into using C, find that it's a practical concept--useful, intelligently obvious.
- A C compiler is a software program that implements the C specification:
parser, keywords, operators, syntax rules.
The practical purpose of this idea is that there are different C
compilers for different machines and for different purposes. If you're just starting to learn C, this idea will seem pretty nearly as useless as the idea that C is a specification.
The tools you use to write C programs include an editor and a C
compiler at minimum. There are a lot more tools available, such as debuggers and profilers and more.
The process you follow is to use a text editor to write some ASCII
text that complies with the rules of the C language, then use a C compiler to read your ASCII file and create a new file that contains executable machine code.
Look for C compiler-generated error messages. If there are any, even
one, then the compiler does not make an executable file; you have to fix all errors. You may also see warning messages, which indicate that the compiler found one or more things that are not perfect but was able to continue. Warnings alone don't stop the compiler from making the executable file, unless you've asked it to treat warnings as errors (gcc's -Werror option does that).
If you get an executable file, run it and see if it works as you
expect. If it does, you probably won't learn anything more from this exercise. If it doesn't, you get to learn about runtime and logic errors: you wrote a program that is correct according to the C language but incorrect in terms of implementing what you hoped it would do.
The following commands exemplify the process using a bash shell:
$ vi myfile.c
$ gcc myfile.c
$ ls a.out
$ chmod 755 a.out
$ ./a.out
You use a text editor such as vi to create a file of text that
conforms to the rules of the C specification.
You run the C compiler so that it reads what you wrote. The C
compiler sees your program file as an ASCII character stream that it interprets as a token stream.
So, what is a "token"? A token is one or more ASCII characters that
the compiler sees as a meaningful thing. To compare with the English language, think of a token as a word or a word ending or punctuation or some other element that's meaningful.
The C compiler is a software program that conforms to a particular
design: the design for interpreters and compilers. Generally, any compiler or interpreter includes an input stage that parses the incoming ASCII (token) stream and also has a set of keywords and operators that are reserved ASCII character(s) and a set of rules that the compiler applies to the tokens it reads.
When the compiler begins, it sets itself to a neutral state, which
is to say that it will examine the first ASCII characters to verify that it can parse them as a stream of tokens.
When the compiler identifies the first token, it verifies that that
token is of a class that can be a first token and then resets its (the compiler's) state so that the following token must be one of a limited set of tokens. For example: 1+2
The compiler reads the 1 and then the '+' character, at which point
it determines that it has at least one valid token: 1. The compiler continues reading and sees the 2 and determines that it now has two tokens, 1 and '+'. The 1 token is integer data with a value of 1. The '+' token, because it occurs between the 1 and the 2, represents the addition operator. The compiler continues reading to find only whitespace and then is able to identify the ASCII stream as a set of three tokens--a value, an operator, and a value--that together form an expression.
An expression is at least one operand and zero or more operators
that must be resolved to a single value.
The compiler resolves the expression 1+2 to be a single value of 3.
If you write a C program that is exactly 1+2 and nothing
else, it's very likely your compiler will generate an error message (remember, a compiler implements the C programming language specification, and does so in its own way--the C specification is deliberately permissive in some aspects of implementation).
If you get an error message, very likely it will be a complaint that
there's not a complete statement or there's a problem at the end of the file or some such.
The C compiler is designed to read statements. A statement is a set
of valid tokens that follow the rules of the C programming language and end with a statement termination character, which is the ; character.
Try revising your program to read
1+2;
The 1+2 is an expression: the C compiler sees 1 followed by +
followed by 2 and verifies that this is a valid sequence of tokens that makes an expression. It interprets the ; character as a statement terminator, which means the compiler creates the machine code for the expression and resets itself to a neutral state, ready to read the next statement (ASCII character stream of valid tokens).
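You might hope the compiler accepts this with only warning messages and makes a new file named a.out. In practice gcc, like other standard C compilers, accepts a statement only inside a function, so a file that holds nothing but 1+2; still draws an error and no a.out appears. Put the statement inside a function such as main and it compiles, perhaps with a warning that the statement has no effect. Very likely the compiler does the addition as it does the compiling, so the machine code holds 3, if it keeps the unused value at all. You may think that the compiler would leave the 1+2 in the file as data and machine instructions that the CPU runs to create the sum, 3. That the compiler does the arithmetic before the program ever runs is a matter of optimization.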
The C compiler generally runs in four different phases:
1 preprocessor
2 compiler
3 optimizer
4 linker
Consider the program:
1+2;
The preprocessor runs and sees nothing to do.
The compiler runs and translates the ASCII to data and machine code,
which properly is a set of 1 bits and 0 bits that represent integer 1, integer 2, and the operation of addition.
The optimizer recognizes that this expression can be resolved now
without doing any harm to any other parts of the program, so the optimizer replaces the code with the integer value of 3.
The linker runs and does nothing: there is no other code to link to.
Consider the following program:
1+2 3 + 4 ;
How many statements do you see? How many expressions? How many tokens?
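There is a single statement that has two expressions and a total of
seven tokens: 1, +, 2, 3, +, 4, and ; (we're not counting the space characters or the newline characters). Note that gcc would want a ; after each expression, and the statements placed inside a function, before it would actually compile this; the counting exercise works either way.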
Note that the C compiler sees 1+2 and 3 + 4 identically: two
expressions that add two integer values together. Very likely the resulting program will effectively be 3 7 after the optimizer pass does its thing.
Note that the 3 and the 7 are there in the program but the program
does nothing with them.
Now it may be that the optimizer of your compiler detects that there
are no machine operations for the CPU and the optimizer might eliminate the data itself. I doubt it, as it's possible that you may want to make a file that contains only data and link it to one or more other programs that you'll write at some time.
The discussion so far includes the terms ASCII stream, token stream,
values, operands, operators, expressions, statements, and the four compiler passes: preprocessor, compiler, optimizer, and linker.