Depending on the environment or operating system you are using, starting R may be a bit different. Typically you will click on the relevant icon or menu item, or on UNIX-like systems you can run the command R on the shell prompt.
When you start R, it will print some default information, and wait for your commands.
The first thing you need to get used to (if you are not already) is that R is controlled through a command-line interface. After the initial information, you should see the cursor next to the command prompt ‘> ’. R presents this prompt when it is ready to accept commands.
The command-line interface may feel awkward or old-fashioned at first, but once you get used to it, you will see that it is not as scary as it seems, and that it has its advantages in many cases.
The greeting message you see at the startup already gives you a few tips. Now type
> help()
including the parentheses but not the command prompt. In this tutorial we will follow this convention: the commands you should type will be displayed after the command prompt, ‘> ’.
If you type the above command and press enter, R will present the built-in documentation about how to get help. Depending on the R configuration on your system, the help text may appear in the same window or in a separate one. If it appears in the same window, you can scroll up and down using the arrow keys or the page up/down keys on your keyboard. Pressing ‘q’ will quit help and return you to the command prompt. As the greeting message suggests, you could alternatively type
> help.start()
and get the documentation in an external browser. To get help on a particular command, for example pnorm, you can type
> help(pnorm)
but in case you do not remember the exact command, you can search the documentation for a keyword using help.search(). For example, if you were wondering which command performs Student’s t-test, you can try
> help.search('student')
R will list the help topics that match, and you can again use help to read the documentation. Two shortcuts you may appreciate if you use the help facility frequently are ? and ?? which correspond to help() and help.search() respectively. When using ? and ??, you should just type the keyword(s) without parentheses. If the keyword contains white spaces, you need to use double or single quotes around it.
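As a rough illustration, the sketch below puts the long and short forms side by side. Assigning the results to variables (the names topic and matches are made up for this example) only keeps the calls quiet when the lines are run from a script; at the prompt you would normally type the call directly.

```r
# Long forms; the variable names are hypothetical
topic <- help("pnorm")              # same page as help(pnorm)
matches <- help.search("student")   # topics that mention "student"

# Shortcut forms, normally typed at the interactive prompt
?pnorm
??student
??"t test"    # a keyword containing a space needs quotes
```

Note that the quotes used in R code must be plain straight quotes; typographic (curly) quotes copied from a document will cause a syntax error.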
A tip that you may be happy to hear is that R remembers your previous commands. You can return to the previous commands using the up arrow key on your keyboard, navigate between them with up and down arrow keys, and you can modify and re-run them if you wish.
You will collect many small tips and tricks while working with R. One last tip to mention here is that the R command line allows ‘tab completion’. That is, if you type a unique initial segment of a command (or of a variable or file name in the right context) and press the ‘tab’ key on your keyboard, R will try to complete the rest for you. If more than one command matches the initial string you typed, pressing ‘tab’ twice will list all matching commands.
Besides the documentation built into R, the official web site of the R project contains the reference manual [R Core Team, 2014], and many useful documents and pointers to other sources. If all else fails, you can ask your questions on one of the mailing lists (which you can find through the official R site), or on sites like http://stats.stackexchange.com/. Before asking questions on online lists and groups, you should always make an effort to find the answer in the obvious places.
R can be used as a calculator. Try typing a few arithmetic expressions at R’s prompt and check what happens. Listing 1 demonstrates some of the arithmetic operations.
The lines that do not start with a command prompt in Listing 1 are the outputs. In line 5, the multiplication operation takes precedence: the expression is calculated as 6 - 12, not as 3 * 4. In line 7, we used parentheses to make sure that the addition is done before the division. If you are familiar with the usual operator precedence in programming languages, R will not surprise you. However, there is no harm in adding a pair of parentheses to make sure you get the result you want.
Another thing to note in this listing is that R regards any text after the hash sign (#) until the end of the line as a comment, and ignores it. Comments are of little use in interactive sessions, but they come in handy when you save command sequences (R scripts or programs) in files for future reference.
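The precedence rules and the comment syntax can be tried out with a few expressions like the ones below. These are an illustrative sketch chosen for this purpose and are not necessarily the expressions of Listing 1.

```r
6 - 3 * 4      # multiplication first: 6 - 12, giving -6
(6 - 3) * 4    # parentheses force the subtraction first, giving 12
2 + 4 / 2      # division first: 2 + 2, giving 4
(2 + 4) / 2    # addition first: 6 / 2, giving 3
# A line that starts with a hash sign is ignored entirely
```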
Under the hood, R provides a complete general-purpose programming language (in fact, R is an implementation of the S language, as is the commercial S-PLUS), which may be really handy if you have some programming background. In this set of exercises we will not go into programming. However, we will be using variables frequently.
Using variables may save you quite some typing. On exit, R offers to save your workspace (by default it asks whether to do so), so that you can access the same values when you restart R.
To assign a value to a variable you can use the assignment operator ‘=’ (or, equivalently, <- as R experts do). You can then use the variables in calculations, and if you type a variable name and press enter, R will report its value. Listing 2 demonstrates the basic use of variables.
In line 1 we store the value 2010 in the variable now (yes, now is relative). In line 2 we use the alternative assignment operator <-, which is equivalent to =. In this tutorial we use both somewhat randomly, to remind you that you may see R code using either one (see the answer to Exercise 1.3 for one more assignment operator).
In lines 2 and 3 we use a dot ‘.’ instead of a space. R variable names cannot contain space characters, and the dot is the conventional substitute for a space in the R community. There are more rules for variable names. For example, they cannot contain many other special characters (like -, +, /) and they cannot start with a digit.
Line 3 demonstrates the use of character strings. Character strings must be enclosed in matching double (") or single (') quotes. R supports a variety of operations on strings, which may come in quite handy while working with language data (e.g., corpora). Apart from numbers and strings, there are other types that your variables can take. For example, booleans, which take the values TRUE or FALSE, and categorical variables (or factor variables, as R calls them) are interesting for many statistical tasks. We will return to these types later.
Line 4 subtracts the value of birth.year from now and stores the result in a new variable, age. As demonstrated in line 5, if we type the name of a variable, R tells us the value stored in it.
Line 7 may be confusing for non-programmers. This line adds 2 to variable now, and re-assigns the new value to the same variable now. In other words, we increment now by 2.
In line 8, we (re)calculate the age, but beware: case matters in variable names. Age is not the same as age. As a result, we end up with two variables: lowercase age still contains the result of the calculation in line 4, and uppercase Age contains the result of the calculation in line 8. The rest of the lines demonstrate this difference.
You should enter this command sequence in R to check if all works as in the listing.
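The steps described above can be sketched as follows. This is a reconstruction from the description only: the values given to birth.year and first.name are invented for illustration, and the line positions do not necessarily match those of Listing 2.

```r
now = 2010
birth.year <- 1980        # hypothetical value
first.name <- "Ada"       # hypothetical value; a character string
age = now - birth.year
age                       # 30
now = now + 2             # increment now by 2
Age = now - birth.year    # a different variable: names are case-sensitive
age                       # still 30
Age                       # 32
```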
If you’d like to see the user variables, you can use the function ls(), and if you want to get rid of one, for saving space, for keeping your environment clean and tidy or for any other reason, you can use rm().
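A small sketch of both functions, using a throwaway variable whose name (scratch.value) is made up for this example:

```r
scratch.value <- 42                    # hypothetical variable
ls()                                   # the listing now includes "scratch.value"
before <- "scratch.value" %in% ls()    # TRUE
rm(scratch.value)                      # remove the variable
after <- "scratch.value" %in% ls()     # FALSE
```

After rm(), typing the removed name at the prompt results in an "object not found" error.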
In statistics, we are generally interested in a sample, that is, a collection of values. For that purpose, R offers a data structure called a vector. Vectors in R are similar to arrays or lists in other programming languages. The important thing to know is that a vector is a container for a sequence of values of the same type.
For the exercises in this section, we will use the following data. For a class, students are asked to submit a report of 3,500 to 4,000 words. Ten students turned in reports with the following word counts:
3510,3508,3468,3520,3516,3525,3505,3519,3558,3487
To enter this data into a vector variable we type,
> nwords = c(3510,3508,3468,3520,3516,3525,
3505,3519,3558,3487)
> nwords
[1] 3510 3508 3468 3520 3516 3525 3505 3519
[9] 3558 3487
This example demonstrates the primary way of assigning a vector to a variable. The function c() (short for concatenate) puts its arguments together into a vector. As with simple data types, if we type the name of the variable, its value is displayed (in fact, the simple variables we have been working with are vectors containing a single element). Entering large data sets this way is, at best, cumbersome, and R provides other ways of entering data, to which we will return later.
At this point you should type the above assignment command to create the vector nwords. We will use this data set in the next few sections.
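Two of the claims above, that a single value is itself a vector and that all elements of a vector share one type, can be checked with a short sketch. The variables single and mixed are hypothetical names introduced here.

```r
nwords <- c(3510, 3508, 3468, 3520, 3516, 3525,
            3505, 3519, 3558, 3487)
length(nwords)         # 10
class(nwords)          # "numeric"

single <- 42           # hypothetical: a "simple" variable
is.vector(single)      # TRUE
length(single)         # 1

mixed <- c(1, "two")   # hypothetical: a number combined with a string
class(mixed)           # "character": the number was converted to a string
```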
R supports mathematical operations between a vector and a scalar value, as well as between two vectors. Standard R functions that normally take a single value can also take vectors as arguments, in which case the function is applied to every member of the vector.
Elements of a vector can be selected by specifying the position of the element(s) between square brackets after the name of the vector. For example, to refer to the fourth element of the vector nwords, we type nwords[4] (in fact, as we will see later, one can also select ranges of data, possibly discontinuous ones, with this notation).
Listing 3 demonstrates some of these operations.
The first line multiplies a vector by a scalar value. In other words, each member of the vector is multiplied by 2. Line 4, on the other hand, sums two vectors. Finally, in line 6, the function log() is applied to each member of the resulting vector.
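The operations described above can be sketched as follows. The variable names doubled, summed and logged are made up for this example, and the choice to add nwords to itself is an assumption, since any two vectors of the same length would do.

```r
nwords <- c(3510, 3508, 3468, 3520, 3516, 3525,
            3505, 3519, 3558, 3487)

doubled <- nwords * 2        # vector times scalar: every element is doubled
summed <- nwords + nwords    # vector plus vector: added element by element
logged <- log(summed)        # log() is applied to every element

nwords[4]                    # fourth element: 3520
nwords[2:4]                  # elements 2 to 4: 3508 3468 3520
nwords[c(1, 10)]             # first and last elements: 3510 3487
```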
Besides the arithmetic operations and scalar functions applied to vector elements, there is a set of functions that operate on whole vectors. Listing 4 demonstrates some of these functions. Note that the listing already includes a few statistical functions (finally we are getting closer to the point!).
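A few commonly used functions of this kind are sketched below on the word count data. The selection is an assumption made for illustration and is not necessarily the same as in Listing 4.

```r
nwords <- c(3510, 3508, 3468, 3520, 3516, 3525,
            3505, 3519, 3558, 3487)

length(nwords)    # number of elements: 10
sum(nwords)       # 35116
mean(nwords)      # 3511.6
median(nwords)    # 3513
min(nwords)       # 3468
max(nwords)       # 3558
sort(nwords)      # the values in increasing order
sd(nwords)        # sample standard deviation
```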
The standard error of the mean is SE = s / sqrt(n), where s is the (estimated) standard deviation, and n is the size of the sample. Calculate the standard error of the mean for the word count data stored in nwords. You can calculate the standard deviation using the function sd(). It is easy enough to count the elements of nwords by hand, but you can also use the length() function to get the number of elements in a vector.
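To show how these pieces fit together without giving away the exercise, the sketch below computes the standard error of the mean, assuming the usual definition s / sqrt(n), for a small invented data set. The vector x and the names s, n and se are hypothetical.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical data, not the word counts

s <- sd(x)            # estimated standard deviation
n <- length(x)        # sample size
se <- s / sqrt(n)     # standard error of the mean
se
```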
Calculate the z-scores of the values in the vector nwords, and assign them to a new vector variable named znwords. Display the resulting vector, its mean and its standard deviation.
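As a hint, the sketch below standardizes a small invented vector, assuming the usual definition of a z-score: the value minus the sample mean, divided by the sample standard deviation. The names x and zx are hypothetical, and the same pattern should carry over to nwords.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical data, not the word counts

zx <- (x - mean(x)) / sd(x)      # vector arithmetic: applied to every element
zx
mean(zx)                         # 0, up to rounding error
sd(zx)                           # 1
```

Because subtraction and division work element by element, no explicit loop over the values is needed.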