Intro to Linux and the Bash command line, pt III
Published 2019-01-11 on Yaroslav's weblog
New year, new post. In this third, and most probably final, part of this tutorial/guide series I will go over some useful commands and programs that are present in most standard Linux installations. I will focus on programs that manipulate text output from other programs and from files. I will also talk a little bit about regular expressions, a powerful tool for searching inside text.
Filters
Programs that perform operations on input text and then write the result to standard output are commonly known as filters. You may already be familiar with one of them, which is the first one that I'm going to talk about.
cat
This command lets you see the contents of a text file (or files). It stands for concatenate, not for the house pet. The most basic use of this command is to view the contents of a text file, just by typing cat followed by the path of the file we wish to see. However, as its name implies, it can also concatenate the contents of multiple text files. Take, for example, the text file 'sample.txt'
user@host:~/Documents/notes$ cat sample.txt
Pepe cool
Tide Pods lame
Uganda Knuckles cool
Thanos cool
JPEG ok
Despacito lame
Bowsette cool
Harold cool
Sans coolest
Minions lamest
NPC cool
Now say we want to concatenate its contents with those of the file 'sample2.txt' and send the result to standard output
user@host:~/Documents/notes$ cat sample.txt sample2.txt
Pepe cool
Tide Pods lame
Uganda Knuckles cool
Thanos cool
JPEG ok
Despacito lame
Bowsette cool
Harold cool
Sans coolest
Minions lamest
NPC cool
Troll Face old
Can haz chezburger really old
ROFLcopter super old
Dancing baby ancient
As usual, this command accepts different options, for example the -n option to display line numbers
user@host:~/Documents/notes$ cat -n sample.txt
1 Pepe cool
2 Tide Pods lame
3 Uganda Knuckles cool
4 Thanos cool
5 JPEG ok
6 Despacito lame
7 Bowsette cool
8 Harold cool
9 Sans coolest
10 Minions lamest
11 NPC cool
As always, you can check other options by looking up cat and any other command with man, as explained in the previous part.
head
This is a really simple command: it shows the first n lines of a text file or of another program's output. To use it, type head, followed by -n, then the number of lines to show, and then the file. For example, let's say we want to see the first 5 lines of sample.txt
user@host:~/Documents/notes$ head -n 5 sample.txt
Pepe cool
Tide Pods lame
Uganda Knuckles cool
Thanos cool
JPEG ok
If we use the command without passing the number of lines we wish to see, it outputs the first 10 lines by default. For this just type head followed by the path of the file.
tail
Basically the same as head, except it shows the last lines. Let's say we want to see the last three lines
user@host:~/Documents/notes$ tail -n 3 sample.txt
Sans coolest
Minions lamest
NPC cool
As with head, the default is to output 10 lines.
sort
This command is as obvious as it seems. It sorts output. For example
user@host:~/Documents/notes$ sort sample.txt
Bowsette cool
Despacito lame
Harold cool
JPEG ok
Minions lamest
NPC cool
Pepe cool
Sans coolest
Thanos cool
Tide Pods lame
Uganda Knuckles cool
sed
This one is a really powerful utility for transforming and manipulating text. However, to keep this tutorial short, I will only show a couple of its most common uses. sed stands for "stream editor".
The way to use sed is to pass it a kind of script (a sed script) that tells it what to do with the text. One of the most basic uses of sed is to perform the same task as head: getting the first n lines. For example, let's say we want the first 7 lines of sample.txt
user@host:~/Documents/notes$ sed '7q' sample.txt
Pepe cool
Tide Pods lame
Uganda Knuckles cool
Thanos cool
JPEG ok
Despacito lame
Bowsette cool
Of course, what I've just told you is a simplification of what it really does. More accurately, sed prints every line it reads by default, and the script we passed, '7q', tells it to quit once it reaches line seven.
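To check that '7q' and head -n 7 really agree, here is a small sketch. The ten numbered lines are made-up data standing in for sample.txt, and the variable names are only for illustration.

```shell
# Ten numbered lines stand in for sample.txt (made-up test data).
input=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)

# sed prints every line by default; '7q' makes it quit once line 7 is printed.
with_sed=$(printf '%s\n' "$input" | sed '7q')

# head -n 7 asks for the first seven lines directly.
with_head=$(printf '%s\n' "$input" | head -n 7)

if [ "$with_sed" = "$with_head" ]; then
    echo "same output"
fi
```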
Another basic use of sed, and arguably the most common one, is to perform search and replace operations on text. The basic syntax for this operation is 's/pattern/replacement/'. By default it replaces only the first occurrence in each line; however, we can specify which or how many occurrences we want to replace by adding a number and/or letter to the end. For example, if we add a two ('s/pattern/replacement/2'), it replaces only the second occurrence in each line.
But what if we want to replace each and every occurrence in all of the text? For that we would use the letter g at the end. Let's say, for example, that we want to replace all occurrences of "cool" in our sample.txt file with "dank". In this case we would type something like this
user@host:~/Documents/notes$ sed 's/cool/dank/g' sample.txt
Pepe dank
Tide Pods lame
Uganda Knuckles dank
Thanos dank
JPEG ok
Despacito lame
Bowsette dank
Harold dank
Sans dankest
Minions lamest
NPC dank
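The difference between the flags is easier to see on a line where the pattern appears more than once. The following is a minimal sketch with a made-up input line; the variable names are just for the example.

```shell
# Made-up input line with three occurrences of "cool".
line='cool cool cool'

# No flag: only the first occurrence on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/cool/dank/')

# Flag 2: only the second occurrence is replaced.
second=$(printf '%s\n' "$line" | sed 's/cool/dank/2')

# Flag g: every occurrence is replaced.
all=$(printf '%s\n' "$line" | sed 's/cool/dank/g')

echo "$first"    # dank cool cool
echo "$second"   # cool dank cool
echo "$all"      # dank dank dank
```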
One thing to keep in mind is that you should enclose the sed script in single quotes, so that the shell does not try to interpret any of its characters. Of course, these are only some of the most basic uses of this command.
grep
This is the last program to manipulate text output that I want to mention. I will demonstrate its basic use in this section, and I will show you a little bit more about it in the next section, when I write about regular expressions.
Back to grep: it is a program that searches for a pattern that you give it and prints the lines that contain that pattern. For example, let's say that we want only the cool (or dank) memes in our file to be displayed
user@host:~/Documents/notes$ grep 'cool' sample.txt
Pepe cool
Uganda Knuckles cool
Thanos cool
Bowsette cool
Harold cool
Sans coolest
NPC cool
The string of text that we passed to it is actually the most basic form of regular expression, which we will look at in more detail next.
Regular expressions
A regular expression, or regex for short, is a string of text that defines a search pattern for a larger body of text. Regexes are used in many programs, such as text editors and search engines, and they can also be of great use in the terminal.
An intermission
Before going into actual regular expressions in grep, I want to mention a couple of characters that can make your life easier when dealing with files in the terminal. They are called wildcards, and they are the asterisk (*) and the question mark (?). Because the shell treats them specially, it is best to avoid those characters in your files' names.
I'll start by explaining the asterisk. When you use the asterisk, you are asking for all files that have any number of any characters (including none) in the place where you put it. For example, we could be looking at files that start with sa
user@host:~/Documents/notes$ ls sa*
saturday.txt sample.txt sample2.txt sample.png
Or another example, we could be looking for files that just contain sa in their name
user@host:~/Documents/notes$ ls *sa*
asado.png saturday.txt sample.txt sample2.txt sample.png
Now the question mark. The question mark indicates that there should be a character in its place, just any character. Let's say that we want to see all files with name "sample" that have a three character extension
user@host:~/Documents/notes$ ls sample.???
sample.txt sample.png
Wildcards come in really handy when you need to manipulate multiple files with similar names. If the files that you wish to manipulate don't really have similar names, you might want to use curly braces to indicate a list of files to manipulate, separated by commas. For example
user@host:~/Documents/notes$ rm {monday.txt,december1999.txt,saturday.txt}
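The wildcard patterns above can be tried safely in a scratch directory. The sketch below recreates the file names from the examples as empty files and counts how many names each pattern matches. The directory comes from mktemp and the variable names are made up for the example.

```shell
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Recreate the file names used in the examples as empty files.
touch asado.png saturday.txt sample.txt sample2.txt sample.png monday.txt

# The shell expands each pattern into the matching names before the command
# runs, so "set --" receives the expanded list and $# is the number of matches.
set -- sa*
starts_with_sa=$#

set -- *sa*
contains_sa=$#

set -- sample.???
three_char_ext=$#

echo "sa*: $starts_with_sa, *sa*: $contains_sa, sample.???: $three_char_ext"
# sa*: 4, *sa*: 5, sample.???: 2

# Go back to where we were and clean up.
cd - >/dev/null || exit 1
rm -r "$scratch"
```

Note that sample2.txt is not matched by sample.??? because the pattern requires the dot right after "sample".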
Back to regex
Now I'll explain some things about regular expressions, and I'll demonstrate some basic uses with grep. Here are some basic concepts
.
- The dot means a single character (any character). e.g. 'be.r' would match bear, beer, befr, etc.
*
- The preceding element matches 0 or more times. e.g. 'an*t' would match at, ant, annt, annnt, etc.
+
- The preceding element matches one or more times. e.g. 'an+t' would match ant, annt, annnt, etc.
?
- The preceding element matches 0 or one time. e.g. 'an?t' would match at and ant.
{n}
- The preceding element matches exactly n times.
{min,}
- The preceding element matches at least min times.
{min,max}
- The preceding element matches at least min times, and no more than max times.
|
- The pipe, logical OR operator. e.g. 'gray|grey' would match gray and grey.
()
- Parentheses group multiple characters as one element. e.g. 'gr(a|e)y' would match gray and grey.
[abc]
- Matches if the character is one of those inside the brackets.
[^abc]
- Matches if the character is not one of those inside the brackets.
[a-d]
- A range of characters, i.e. a, b, c, or d.
^
- Matches the beginning of the line.
$
- Matches the end of the line.
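A few of these operators can be tried out on a small word list. This is only a sketch: the word list and variable names are made up, -E turns on extended regular expressions, and -x makes grep accept a line only if the whole line matches.

```shell
# Made-up word list, one word per line.
words=$(printf '%s\n' at ant annt annnt gray grey bear beer)

star=$(printf '%s\n' "$words" | grep -E -x 'an*t')       # 0 or more "n"
plus=$(printf '%s\n' "$words" | grep -E -x 'an+t')       # 1 or more "n"
optional=$(printf '%s\n' "$words" | grep -E -x 'an?t')   # 0 or 1 "n"
exact=$(printf '%s\n' "$words" | grep -E -x 'an{2}t')    # exactly 2 "n"
group=$(printf '%s\n' "$words" | grep -E -x 'gr(a|e)y')  # gray or grey
dot=$(printf '%s\n' "$words" | grep -E -x 'be.r')        # any character after "be"

# The variables are left unquoted on purpose so the matches print on one line.
echo $star       # at ant annt annnt
echo $plus       # ant annt annnt
echo $optional   # at ant
echo $exact      # annt
echo $group      # gray grey
echo $dot        # bear beer
```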
So now, for a practical example with grep, let's suppose that we want to find all lines that have "cool" or "ok" in them. In this case we would use the "|" pipe symbol. However, if we use normal grep, we would have to escape the pipe symbol like this "\|". That's why it is better to use "grep -E" to enable extended regex, or its shorter alias "egrep" (newer GNU grep releases mark egrep as obsolescent in favor of "grep -E"). It would look something like this
user@host:~/Documents/notes$ egrep 'cool|ok' sample.txt
Pepe cool
Uganda Knuckles cool
Thanos cool
JPEG ok
Bowsette cool
Harold cool
Sans coolest
NPC cool
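If you would rather not depend on alternation at all, plain grep can also take several patterns, one per -e option. The sketch below compares the two approaches on a made-up subset of sample.txt; the variable names are only for illustration.

```shell
# A few lines in the style of sample.txt (made-up subset).
sample=$(printf '%s\n' 'Pepe cool' 'Tide Pods lame' 'JPEG ok' 'Sans coolest')

# Extended regex: the pipe means "or" without any escaping.
extended=$(printf '%s\n' "$sample" | grep -E 'cool|ok')

# Plain grep: each -e adds one more pattern, so no alternation is needed.
multiple=$(printf '%s\n' "$sample" | grep -e 'cool' -e 'ok')

if [ "$extended" = "$multiple" ]; then
    echo "both commands select the same lines"
fi
```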
Let's suppose, for another example, that we want to match those lines with a 't' as the last character
user@host:~/Documents/notes$ egrep 't$' sample.txt
Sans coolest
Minions lamest
I have already mentioned and shown you the use of regexes with grep (and/or egrep). Now I would like to show a more practical example with sed. Yes, sed uses its own script language to alter text input, however, it also makes use of regular expressions.
Let's suppose that we have a file that looks like this
user@host:~/Documents/notes$ cat shortcuts
# Some shortcuts
d ~/Documents
D ~/Downloads
m ~/Music
pp ~/Pictures
vv ~/Videos
s ~/.scripts # My scripts
cf ~/.config # My configs
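As we can see there is a lot of whitespace, and although comments might be of help to humans, they are of no use to machines. Let's begin by getting rid of the comments. For that, we first need to remember the search and replace command of sed, 's/pattern/replacement/'. Here the pattern is '#.*' (a hash sign followed by any number of any characters), and the replacement is empty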
user@host:~/Documents/notes$ sed 's/#.*//g' shortcuts
d ~/Documents
D ~/Downloads
m ~/Music
pp ~/Pictures
vv ~/Videos
s ~/.scripts
cf ~/.config
Bam, there it is. However, we still have the blank lines left, and, if you pay close attention, the comments have been deleted, but, the spaces that used to be before some of the comments are still there.
So first, let's improve our current sed command. If we want to match 0 or more spaces (zero because not every comment has a space before it) we would use the * symbol, but what would we use for the spaces themselves? Well, that's an easy one: in GNU sed, '\s' matches any whitespace character, so now our sed command looks like this: 's/\s*#.*//g'.
Let's take care of the last part, getting rid of blank lines. For this we need to issue a separate command, but fortunately we can stack commands in one line with a semicolon (;). We also need a way to match empty lines with a regex, and that's very easy: '^$' matches the beginning and the end of a line with nothing in between. After that, we add a sed command for deleting lines which I haven't mentioned yet (d), and our one-liner is ready...
user@host:~/Documents/notes$ sed 's/\s*#.*//g; /^$/d' shortcuts
d ~/Documents
D ~/Downloads
m ~/Music
pp ~/Pictures
vv ~/Videos
s ~/.scripts
cf ~/.config
Of course, issuing this command will not replace the original file, it will simply output the result to the terminal screen. If you want to overwrite the original file with the result of the sed command, you can pass sed the '-i' option.
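Since '-i' overwrites the file, it can be worth keeping a backup. The sketch below is one hedged way to do it: a suffix attached to '-i' (here .bak) makes sed save the original under that name. It also uses the POSIX class [[:space:]] instead of '\s', which is a GNU extension. The file contents, including the blank lines, are an assumption made for the example.

```shell
# Throwaway shortcuts-style file; the blank lines are an assumption about
# what the original file looked like.
file=$(mktemp)
printf '%s\n' '# Some shortcuts' '' 'd ~/Documents' '' 's ~/.scripts # My scripts' > "$file"

# -i edits the file in place; the attached suffix keeps the original as "$file.bak".
# [[:space:]] is the POSIX character class, used here instead of the GNU-only \s.
sed -i.bak 's/[[:space:]]*#.*//; /^$/d' "$file"

cat "$file"        # cleaned version: only the two shortcut lines remain
cat "$file.bak"    # untouched original
```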
Piping and redirecting output
This post is already getting too long; however, there's one more useful thing about *nix systems that I'd like to mention: the pipeline. A pipeline in Unix and Unix-like OSs is a chain of programs in which the output of each one is redirected to the input of the next. Along with that, there are operators to redirect standard output to files (and vice versa).
Redirecting to and from files
Let's suppose that we want to repeat the last example and clean the file of comments and blank lines. We already know how to overwrite that file, but what if we want to save the result to another file using common Unix operators in bash? For that we can use the '>' and '>>' operators. For example, let's say we want to save the result to a second file called "shortcuts_clean"
user@host:~/Documents/notes$ sed 's/\s*#.*//g; /^$/d' shortcuts > shortcuts_clean
Since there was no "shortcuts_clean" file, it has been created automatically. However, if the file had already existed, it would have been overwritten, unless we had used the '>>' operator. In that case, the output would have been appended to the existing file.
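The difference between the two operators is easy to see with a throwaway file. A minimal sketch, where the file comes from mktemp and stands in for "shortcuts_clean":

```shell
out=$(mktemp)   # throwaway file standing in for shortcuts_clean

echo "first"  > "$out"    # creates or overwrites the file
echo "second" > "$out"    # overwrites again: "first" is gone
echo "third"  >> "$out"   # appends: the file now has two lines

cat "$out"
# second
# third
```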
Just as there's '>' to redirect TO files, there's also '<' to redirect from files to a program's standard input. However, most of the time you would just pass the name/path of the file to the program as an argument.
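For programs that accept a file name, both forms give the same result. Here is a small sketch with sort, using a made-up throwaway file and illustrative variable names:

```shell
list=$(mktemp)   # throwaway input file
printf '%s\n' 'Pepe cool' 'Bowsette cool' 'Harold cool' > "$list"

# File name as an argument: sort opens the file itself.
by_argument=$(sort "$list")

# Input redirection: the shell opens the file and sort reads standard input.
by_redirect=$(sort < "$list")

if [ "$by_argument" = "$by_redirect" ]; then
    echo "same result either way"
fi
```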
Piping
Now that we know how to redirect from and to files, we can learn how to redirect from one program to another, with pipes. The pipe operator in *nix systems is the vertical bar symbol (|). Let's suppose that we want to see the first three files in our current directory, for that, we can pipe the output of ls into head, like this
user@host:~/Documents/notes$ ls | head -n 3
asado.png
monday.txt
sample.txt
Now let's get back to our sample.txt file. Let's imagine that we want to keep only those lines that contain "cool" or "lame", and that we want them sorted. Then let's suppose we want to modify them to contain legit terms, and not some antiquated boomer slang, so we want to replace cool with dank, and lame with normie. Finally, we want the result to be written to a file instead of the screen. Whew! Sounds like a lot of stuff to do, but it is quite simple, and it looks like this
user@host:~/Documents/notes$ egrep 'cool|lame' sample.txt | sort | sed 's/cool/dank/g;s/lame/normie/g' > memes.txt
So if we now take a look at the file...
user@host:~/Documents/notes$ cat memes.txt
Bowsette dank
Despacito normie
Harold dank
Minions normiest
NPC dank
Pepe dank
Sans dankest
Thanos dank
Tide Pods normie
Uganda Knuckles dank
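Because each stage only reads lines and writes lines, the filtering and sorting stages can be swapped without changing the result. The sketch below checks this on a made-up subset of sample.txt; it uses grep -E rather than egrep, and the variable names are only for the example.

```shell
sample=$(mktemp)   # throwaway file with a made-up subset of sample.txt
printf '%s\n' 'Pepe cool' 'Tide Pods lame' 'JPEG ok' 'Sans coolest' > "$sample"

# Filter first, then sort.
filter_first=$(grep -E 'cool|lame' "$sample" | sort | sed 's/cool/dank/g;s/lame/normie/g')

# Sort first, then filter.
sort_first=$(sort "$sample" | grep -E 'cool|lame' | sed 's/cool/dank/g;s/lame/normie/g')

if [ "$filter_first" = "$sort_first" ]; then
    echo "same lines either way"
fi

echo "$filter_first"
# Pepe dank
# Sans dankest
# Tide Pods normie
```

Filtering first is usually the cheaper order, since sort then has fewer lines to work on.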
And that's basically it.
Post scriptum
Before ending it for good, I want to show some other programs that might be of use in the Bash command line.
less
This command might come in handy when there's another command that outputs a lot of text that overfills the terminal screen. You can pipe (as we have just learned) the output of that command to less, so that you can navigate with your arrow keys, or better yet with vim keys (hjkl). You can also search for terms by typing slash (/), just like with man.
tar
This program is used in Linux to create and extract archives in the .tar format, usually also compressing them with gzip (.gz).
There are usually two ways you will be using the program. One is to extract files from a compressed archive
user@host:~/Documents/notes$ tar -xzvf oldnotes.tar.gz
And the other to archive and compress files
user@host:~/Documents/notes$ tar -czvf allnotes.tar.gz *
To learn more about the different options of this program, I recommend you check the man pages of tar ('man tar').
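To get a feel for the options without risking real files, the whole round trip can be done in a scratch directory. This is a sketch under assumptions: the directory comes from mktemp, the note files and their contents are made up, and the -v (verbose) option is left out to keep the output short.

```shell
# Work in a throwaway directory with two made-up note files.
work=$(mktemp -d)
mkdir "$work/notes" "$work/restored"
echo "buy milk"   > "$work/notes/monday.txt"
echo "sleep late" > "$work/notes/saturday.txt"

# c = create, z = compress with gzip, f = archive file name.
# -C changes into the directory first, so the archive stores relative names.
tar -czf "$work/allnotes.tar.gz" -C "$work/notes" monday.txt saturday.txt

# t = list the contents without extracting anything.
listing=$(tar -tzf "$work/allnotes.tar.gz")

# x = extract; here -C chooses where the files end up.
tar -xzf "$work/allnotes.tar.gz" -C "$work/restored"

echo "$listing"
cat "$work/restored/monday.txt"   # buy milk
```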
ssh and scp
You may have already heard about ssh, which stands for "secure shell", even if you are new to Linux or Unix/Unix-like systems. This program is used to connect to other computers over a network (the internet, for instance), especially to servers.
Let's suppose that you have a server with ip address 180.80.8.20 and your user is tux
user@host:~$ ssh tux@180.80.8.20
Of course, here we have assumed that the standard ssh port (22) is being used, otherwise you will have to specify it by passing -p followed by the port number.
Now let's talk about scp, which stands for "secure copy". This command uses the same protocol as ssh, and it's used to copy files from one computer to another over a network. Let's suppose that you want to copy a file from your current computer to the server we used in the previous example
user@host:~$ scp somefile tux@180.80.8.20:/home/tux/directory/
If we were trying to do it the other way around, that is, from the remote computer to your local computer, it would look like this
user@host:~$ scp tux@180.80.8.20:/home/tux/directory/somefile directory/
Just as with ssh, if you are not using the standard port 22, you need to tell the program which port you are trying to connect to, except that in the case of scp the flag is '-P' instead of '-p', and it goes right after "scp".
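Side by side, the two port flags look like this. The sketch only builds and prints the commands instead of running them, since the server is just the example one from above; port 2222 is a hypothetical value, and the variable names are made up.

```shell
# Hypothetical non-standard port; the address and user are the ones from the text.
port=2222
remote=tux@180.80.8.20

# Build the commands as text and print them instead of running them,
# since this server is only an example.
ssh_cmd="ssh -p $port $remote"
scp_cmd="scp -P $port somefile $remote:/home/tux/directory/"

echo "$ssh_cmd"   # lowercase -p for ssh
echo "$scp_cmd"   # uppercase -P for scp
```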
Well, that's it for this tutorial/guide series, I really hope it was of use to you.
https://www.yaroslavps.com/weblog/intro-to-linux-and-bash-pt3/
© 2018—2024 Yaroslav de la Peña Smirnov.