How to split a very large text or CSV file by a specific number of lines/rows

29
8232

This posting will describe how to split a very large CSV or Text file into a number of smaller parts by specifying the number of desired lines within each of the resulting pieces (for example, by 65536 lines for use with Excel 2003 or 1,048,576 for Excel 2007). The method described will use free software to do this and will work for very large files (even files larger that 4 gigs in size), will avoid loading the entire file into memory when processing, will maintain the format/extension of the original file, and will not add any additional information to the resulting files.

This article was borne out of a need to split a large 600 meg CSV file into several pieces that were no longer than 100,000 rows each for uploading into a client’s website. The main problem we faced in doing so was that it was impossible to load the file into tools such as Excel (or course), text editors, or even Access because doing so would invariably cause any of those apps to crash. In these cases, it was possible to split the file using a freeware program called Gsplit, which processes the file sequentially without loading it into memory all at once. This posting will describe exactly how this is done.

A step by step guide:

  • Background: I am going to describe an example whereby I split a file called “chunk.csv” into several pieces that are a maximum of 10,000 rows each. This same process will work on any type of text file (for example TXT, INI, LOG, BAT, DIZ, BAK, and QUE) and you can customize this process to exactly the number of rows you require for your use.
  1. Download and install Gsplit. This is the program that will perform the operation and it is 100% freeware, even for commercial use.
  2. Run Gsplit.
  3. Click “Original file” in the left sidebar. Click on the “browse” button under “file to split”. Browse to and click on the file you want to split. GSplit Original File3
  4. Click “Destination folder” in the left sidebar. Click on the “browse” button under “destination path” on the right. Browse to the folder where you would like your output files to be saved. GSplit destination folder
  5. Click “Type and Size” in the left sidebar. This is where all the interesting stuff will take place. Click on “Blocked Pieces” icon. Next, from the dropdown select “I want to split after the nth occurrence of a specified pattern”; and from the dropdown under that “Split after the occurrence number”. Enter the number of lines you want to split underneath (10,000 in this case, but you can enter the value that you need). Lastly, make sure that the pattern to use for splitting is “0x0D0x0A” (without quotes). This value is the hex code for a carriage return, and it should be displayed by default. Gsplit type and size
  6. Click “Filenames” in the left sidebar. In the “piece name mask” field enter “{ofw}_{num}{ore}”(without quotes). This will generate pieces with the original file extension and the original file names that look like “filename_1.csv”. Alternately, you could leave this alone altogether, go with the default generated names, and simply rename the extension to “.csv” (or whatever your original extension is) in Windows explorer. GSplit Filenames
  7. Click “Other Properties” in the left sidebar. check “do not add tags to piece files”, This will ensure that no additional information will be added to your original data. GSplit other properties
  8. GSplit SplitYou’re now ready to split. Click on the “Split” button in the upper toolbar. Your file should be split within seconds. Once the splitting process is finished you will see the “Splitting log” screen. Click on the “Open the folder in Windows Explorer” link to instantly jump to the output folder.GSplit splitting log2
  9. GSplit save profileOptional: save your splitting profile. If you are going to repeat this splitting process in the future you might consider saving these settings as a “profile” that can be loaded when you need it so that you do not have to go through these process again. To do so select “Save a Profile As” from the “File” Menu.

That’s it. Happy Splitting 😉 .