How to split a very large text or CSV file by a specific number of lines/rows

This posting will describe how to split a very large CSV or Text file into a number of smaller parts by specifying the number of desired lines within each of the resulting pieces (for example, by 65536 lines for use with Excel 2003 or 1,048,576 for Excel 2007). The method described will use free software to do this and will work for very large files (even files larger that 4 gigs in size), will avoid loading the entire file into memory when processing, will maintain the format/extension of the original file, and will not add any additional information to the resulting files.

This article was borne out of a need to split a large 600 meg CSV file into several pieces that were no longer than 100,000 rows each for uploading into a client’s website. The main problem we faced in doing so was that it was impossible to load the file into tools such as Excel (or course), text editors, or even Access because doing so would invariably cause any of those apps to crash. In these cases, it was possible to split the file using a freeware program called Gsplit, which processes the file sequentially without loading it into memory all at once. This posting will describe exactly how this is done.

A step by step guide:

  • Background: I am going to describe an example whereby I split a file called “chunk.csv” into several pieces that are a maximum of 10,000 rows each. This same process will work on any type of text file (for example TXT, INI, LOG, BAT, DIZ, BAK, and QUE) and you can customize this process to exactly the number of rows you require for your use.
  1. Download and install Gsplit. This is the program that will perform the operation and it is 100% freeware, even for commercial use.
  2. Run Gsplit.
  3. Click “Original file” in the left sidebar. Click on the “browse” button under “file to split”. Browse to and click on the file you want to split. GSplit Original File3
  4. Click “Destination folder” in the left sidebar. Click on the “browse” button under “destination path” on the right. Browse to the folder where you would like your output files to be saved. GSplit destination folder
  5. Click “Type and Size” in the left sidebar. This is where all the interesting stuff will take place. Click on “Blocked Pieces” icon. Next, from the dropdown select “I want to split after the nth occurrence of a specified pattern”; and from the dropdown under that “Split after the occurrence number”. Enter the number of lines you want to split underneath (10,000 in this case, but you can enter the value that you need). Lastly, make sure that the pattern to use for splitting is “0x0D0x0A” (without quotes). This value is the hex code for a carriage return, and it should be displayed by default. Gsplit type and size
  6. Click “Filenames” in the left sidebar. In the “piece name mask” field enter “{ofw}_{num}{ore}”(without quotes). This will generate pieces with the original file extension and the original file names that look like “filename_1.csv”. Alternately, you could leave this alone altogether, go with the default generated names, and simply rename the extension to “.csv” (or whatever your original extension is) in Windows explorer. GSplit Filenames
  7. Click “Other Properties” in the left sidebar. check “do not add tags to piece files”, This will ensure that no additional information will be added to your original data. GSplit other properties
  8. GSplit SplitYou’re now ready to split. Click on the “Split” button in the upper toolbar. Your file should be split within seconds. Once the splitting process is finished you will see the “Splitting log” screen. Click on the “Open the folder in Windows Explorer” link to instantly jump to the output folder.GSplit splitting log2
  9. GSplit save profileOptional: save your splitting profile. If you are going to repeat this splitting process in the future you might consider saving these settings as a “profile” that can be loaded when you need it so that you do not have to go through these process again. To do so select “Save a Profile As” from the “File” Menu.

That’s it. Happy Splitting ;) .


 
 
 
Samer Kurdi

Samer Kurdi

Has been reviewing software since 2006 when he started Freewaregenius.com
flattr this!
  • http://www.Rarst.net Rarst

    Had similar issues recently with parsing proxy log. Not THAT large but still ~480000 lines. :)

    I ended up using Jet OLE connection and hooked it into AnySQL Maestro. Treating text file as SQL database felt weird at first but worked surprisingly well.

  • SOA

    Nice write up. Good utility.

  • John

    I don’t think whoever wrote this article used the application in question to split the 600 mb file using the “split occurance” function. Sure it might work ok for smaller files(like from the picture 17 MB) but when you start trying to use it to split files of a 600 MB nature in this way it just doesn’t work.

    • http://crediblefeet.com suren

      Hi,

      I split 25GB TXT using GSplit into 250 files of around 100MB each. It did the job in 12 minutes. GSplit is an excellent utility & yes i followed the steps mentioned here.

      Except that in type&size menu I selected –i want to set the size of all pieces–

      That worked very well. Thanks to the author of this page.

      rgds,
      Suren

  • Anonymous

    John, Samer wrote this article, and why wouldn’t it work just as well on larger files?

  • Anonymous

    it doesn’t work on large files, as claimed. i tried.

  • SRK

    I just used it to split a file that had just over 2 million lines. Got 20 files of 100,000 each. It worked great for me.

    Thanks!

  • Simz

    Worked like a charm for me, on a 800k lines CSV file.
    Thanks

  • http://a.com a

    Great post, thanks!

  • VCR

    This works wonderfully! I just completed splitting a number of huge text files (1 to 1.75 Gb).
    It took 2.5 mins to split the largest file into 4 files each with 2 million lines! 2 mil lines of my text file contents roughly translates to 500 Mb. It is still big for Notepad to open. But, MS Access can handle this easily.

    I like the UI too besides the speed of the Gplit.

    Thank you very much!

  • ronmagic

    Worked fine with a 700 MB csv file with over 3.5 million rows.
    Split into 5 parts, each with 800K lines.
    Thanks for the great article.

  • http://books-sapa.com Bassiebub

    Have to get used to it’s layout, but the first split was ok in 10 minutes. 12 files with a max of 20000 lines per file.

    Bas

  • Kev Taylor

    I had to split a 3.5 million line text file so I could open it in Excel. Could not get it to split with pattern function but used size split into 5 equal pices and it worked like a charm!

    • Samer

      @ Kev: glad to hear it!

  • Rene

    I needed to load a large – 36GB, 300 million rows – file into mysql. Tried many differnet ways but didn’t work. I came across your article, downloded GSplit followed the intstructions and have now got a fully populated table with no errors. Admittedly it took a while to load 30 files of 10 million rows each but I can’t complain at a load rate of around 8 mins per 100million rows.

    Thanks for the article. Saved a lot of headaches and what is left of my hair.

  • Rajbardhan

    Very Very Good Post.. I required this from Long Back

    Thanks a lot SAMER KURDI

  • William

    Thanks Samer Kurdi. Works like a charm for my files which go up to 600 mgb in size.

  • LuckyDj

    excelente ,Gsplit sirve con una perfeccion y velocidad inpresionante, hay que ir a donar dinero a estos muchachos , porque su software gratuito sirve mejor que muchos de paga

  • happy splitter

    works like charm

  • Bogdan Tkachyk

    You can also use this (small) CSV splitter. There is possibility to choose encoding of input file or do it automaticaly. This feature very useful if intput file have non-english language.
    http://dfiles.ru/files/mzxb7faqx

  • Bogdan Tkachyk

    You can also use this small CSVSplitter. It has nice feature to choose input file encoding, it’s very useful if file language is non-english
    (delete space before “.ru”)
    dfiles. ru/files/mzxb7faqx

  • Shobhit

    Thanks you so much you saved my nights

  • Economy

    For some txt files the pattern “0x0A” should be used instead of the pattern “0x0D0x0A” (without quotes)

  • Freesoul

    Dude
    I am speechless. I can’t thank you enough for this information. I was searching this for many days. I tried cmd and other things, which are not my cup of tea, but to no avail. I tried to use CSV splitter as mentioned in a comment, but it didn’t perform function what I required, and then comes Gsplit which save my day and night :)
    Moreover, I also like the caption at this website- Hope someday all software will be free.
    many many thanks again

  • Ralph

    Thanks!

  • http://www.basslinebargains.com Paula Mulvey

    Thank you so much for this.

  • Vignesh Kalai

    is there a way to automate file splitting i e. i get 5g.b file at 12 in the morning i want to split it to 100 mb files automatically without my instruction . is there anyway

    thank u

  • guest

    Thank you so much for this posting, really helpful