In this article we will examine yet another of the GNU Core Utilities that make up the foundation of the Linux operating system. The csplit command is a small, yet powerful text utility that allows you to split a file into two or more parts using context lines.
The csplit command should not be confused with the split command. Although both split large files into smaller pieces, the csplit command can work with context lines, and the split command splits on file sizes. With csplit you can split a file after x number of lines or even use regular expressions to split on a pettern match. Let's look at some examples.
We will use the following text file as an example (one of my favorite songs as a child):
Pepino, oh, you little mouse, oh, won't you go away
Find yourself another house to run around and play
You scare my girl, you eat my cheese, you even drink my wine
I try so hard to catch you but you trick me all the time
Table of Contents
Splitting a File After x Number of Lines
As we can see, our example text has four lines. We can split the file at the second line by using two as a command line argument.
[mcherisi@putor ~]$ csplit pepino.txt 2
This will output two numbers showing the sizes of the newly created chunks/files. In this case they are 52 bytes and 169 bytes respectively. You can suppress the file size output using the -s
option if desired.
[mcherisi@putor ~]$ csplit pepino.txt 2
52
169
[mcherisi@putor ~]$ ls -l
total 12
-rw-rw-r--. 1 mcherisi mcherisi 221 Sep 26 23:22 pepino.txt
-rw-rw-r--. 1 mcherisi mcherisi 52 Sep 26 23:23 xx00
-rw-rw-r--. 1 mcherisi mcherisi 169 Sep 26 23:23 xx01
The csplit command created two new files named xx00 and xx01. These files contain the source text which has been split at line 2 per our request. So "xx00" holds the first line, and "xx01" holds the rest of the text.
NOTE: Later in this article we will discuss formatting the name of the output files.
[mcherisi@putor ~]$ cat xx00
Pepino, oh, you little mouse, oh, won't you go away
[mcherisi@putor ~]$ cat xx01
Find yourself another house to run around and play
You scare my girl, you eat my cheese, you even drink my wine
I try so hard to catch you but you trick me all the time
Splitting a File After Matched String
Instead of splitting the file by line number, you can also split after a matched string. For example, we can split the file after matching the word "scare". We must wrap the string we are trying to match in '/'
slashes.
[mcherisi@putor ~]$ csplit pepino.txt /scare/
103
118
[mcherisi@putor ~]$ cat xx00
Pepino, oh, you little mouse, oh, won't you go away
Find yourself another house to run around and play
[mcherisi@putor ~]$ cat xx01
You scare my girl, you eat my cheese, you even drink my wine
I try so hard to catch you but you trick me all the time
As you can see above, csplit read the file until it found the matched pattern (scare) then split the file. You can use regular expressions here as well.
[mcherisi@putor ~]$ csplit pepino.txt /^[Ff]/
52
169
NOTE: Pattern matching with a /
delimiter will copy up to, but not including the matched line. This will cause the second line to be in the second file chunk.
If you use the %
delimiter, csplit will skip to the matched pattern and ignore the lines before the match.
[mcherisi@putor ~]$ csplit pepino.txt %cheese%
118
[mcherisi@putor ~]$ cat xx00
You scare my girl, you eat my cheese, you even drink my wine
I try so hard to catch you but you trick me all the time
Since we used the percent sign as the delimiter, we only created a single file with the lines AFTER the match. The lines before the match were ignored. We "SKIPPED" to the match.
Repeat Pattern x Number of Times
By default csplit will split the file at the first match only. You can easily instruct it to match the pattern more than once. Here we instruct it to find the pattern three times.
It is important to understand that the number you put here is the number of times it will repeat the pattern. Here we are using the number 2, which means it will stop after it finds the pattern 3 times. The first initial time, and the 2 we told it to repeat.
[mcherisi@putor ~]$ csplit pepino.txt /[Rr]/ {2}
52
51
61
57
Since all but the first line have the letter "r" in them, this creates four files. The second line was the initial match, then each line after it was the 2 repeating matches.
NOTE: You can use an asterisks {*}
to match the pattern as many times as possible.
Splitting the File with a Line Offset
Using line offsets allow you to split x number of lines after or before a match. An offset must have a +
or -
followed by a positive integer. Here we are telling csplit to use an offset of 2 additional lines.
[mcherisi@putor ~]$ csplit pepino.txt /Pepino/+2
103
118
[mcherisi@putor ~]$ cat xx00
Pepino, oh, you little mouse, oh, won't you go away
Find yourself another house to run around and play
Remember pattern matching copies up to, but not including the matched line. This is why using an offset of +2
includes the action matched line and one additional line.
Formatting the csplit Output File Names
There are two parts to the output file names, the prefix (xx) and the suffix (00). Both of these can be easily modified. Let's take a look at how to modify the prefix first.
Here we will use the -f
option followed by our desired prefix (the word putorius followed by a dash).
[mcherisi@putor ~]$ csplit pepino.txt /cheese/ -f putorius-
103
118
[mcherisi@putor ~]$ ls -l
total 12
-rw-rw-r--. 1 mcherisi mcherisi 221 Sep 26 23:22 pepino.txt
-rw-rw-r--. 1 mcherisi mcherisi 103 Sep 27 00:17 putorius-00
-rw-rw-r--. 1 mcherisi mcherisi 118 Sep 27 00:17 putorius-01
As for the suffix, you can use the -n
option to set the number of digits used. By default two digits are used (xx00), here we will set it to four.
[mcherisi@putor ~]$ csplit pepino.txt /cheese/ -f putorius- -n 4
103
118
[mcherisi@putor ~]$ ls -l
total 12
-rw-rw-r--. 1 mcherisi mcherisi 221 Sep 26 23:22 pepino.txt
-rw-rw-r--. 1 mcherisi mcherisi 103 Sep 27 00:24 putorius-0000
-rw-rw-r--. 1 mcherisi mcherisi 118 Sep 27 00:24 putorius-0001
You can also use sprintf formatting with the -b
option, although there are limited format specifiers.
[mcherisi@putor ~]$ csplit pepino.txt /cheese/ -f putorius- -b "%d"
103
118
[mcherisi@putor ~]$ ls -l
total 12
-rw-rw-r--. 1 mcherisi mcherisi 221 Sep 26 23:22 pepino.txt
-rw-rw-r--. 1 mcherisi mcherisi 103 Sep 27 00:26 putorius-0
-rw-rw-r--. 1 mcherisi mcherisi 118 Sep 27 00:26 putorius-1
Conclusion
Just like most of the text utilities included in the coreutils package, csplit is powerful and does it's one job very well. In this article we covered how all the basics of splitting a file on the Linux command line with csplit. If you have any questions or comments we would love to heard from you below.