How to process hundreds of millions of records in MS SQL
General information
Forum: Microsoft SQL Server
Category: Database management, Miscellaneous
Thread ID: 01507091
Message ID: 01507356
Hi Sergey,

I DON'T think my article misleads anybody! Michel, in Message #1506501, thought it was a very good read, and so did other people who gave me positive feedback. The way you worded your reply is a bit strange.

When was the last time you tried to delete 200 million records? Or split an address into a StreetNumber and a StreetName in an 800-million-record table? Would you do that while other people are working on the database, or on a production database where people are constantly adding and editing records? Obviously not. Running such a job would tie up the server, so while it runs no other transactions are happening.
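For example, the address split is the kind of job I run as a batched UPDATE, something along these lines (the table and column names here are hypothetical, and it assumes the address starts with a numeric street number followed by a space):

    -- Hypothetical table/columns; run repeatedly so each batch commits on its own.
    UPDATE TOP (1000000) dbo.Addresses
    SET StreetNumber = LEFT(Address, CHARINDEX(' ', Address) - 1),
        StreetName   = LTRIM(SUBSTRING(Address, CHARINDEX(' ', Address) + 1, 8000))
    WHERE StreetNumber IS NULL
      AND CHARINDEX(' ', Address) > 1;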

Internally, the log file is divided into virtual log files (VLFs), and when the log needs to grow it creates additional VLFs. However, if you are using the FULL or BULK_LOGGED recovery model and running BACKUP LOG commands, SQL Server is able to reuse those virtual log files and therefore keep the physical size of the log from growing.
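In T-SQL terms, the between-batches log backup is just this (database name and path below are placeholders), and DBCC SQLPERF shows whether the physical log is actually growing:

    -- Placeholder database name and path; FULL or BULK_LOGGED recovery assumed.
    BACKUP LOG MyBigDb TO DISK = N'D:\Backups\MyBigDb_log.trn';
    -- Check the log file size and percent used after each batch:
    DBCC SQLPERF(LOGSPACE);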

It also depends on how large your batch is. With 1 or 2 million records per batch, the log file does not grow in my case, or in other people's experience. It is possible that during batch processing SQL Server needs to allocate more virtual logs, and as a result the log file grows. Before moving to the next range of the batch, you can back up the log so that the committed transactions are cleared from it and SQL Server can reuse the VLFs.
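Putting it together, this is the general shape of the batching loop (all object names and the path are placeholders, not the exact code from the article):

    -- Delete in 2-million-row batches; each DELETE commits as its own transaction,
    -- and the log backup afterwards lets SQL Server reuse the virtual log files.
    DECLARE @Rows int = 1;
    WHILE @Rows > 0
    BEGIN
        DELETE TOP (2000000) FROM dbo.BigTable
        WHERE ToDelete = 1;
        SET @Rows = @@ROWCOUNT;
        IF @Rows > 0
            BACKUP LOG MyBigDb TO DISK = N'D:\Backups\MyBigDb_log.trn';
    END;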

Still, before I do the batch processing I always make sure I have a good full backup, and then I let the batch run. For backups I use LightSpeed, the Enterprise version. I have been using this batch technique for the last 2 years and it works like a charm; I wish I had known it earlier. Not many people know how to work with very large tables, and this article has saved many people a lot of time.
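(For reference, the native T-SQL form of that safety backup is a one-liner; LightSpeed has its own commands, and the name and path below are just placeholders:)

    -- Full backup before the batch run; WITH INIT overwrites the backup device.
    BACKUP DATABASE MyBigDb TO DISK = N'D:\Backups\MyBigDb_full.bak' WITH INIT;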

Well, I work a lot with billions of records, and I also use SSIS for data cleansing, data migration, etc. With SSIS you also get amazing performance, but that requires learning a new technology. See some of the things I do: http://dfarber.com/microsoft-solutions/mssql-server.aspx

Regards,

Doron
http://www.dfarber.com


>Hi Doron,
>
>The following statement from your article is misleading: "This way the log file stays small and whenever a new process starts, the new batch will reuse the same log file space and it will not grow." It's only true if the database is in Simple recovery mode, or if it's in Full recovery mode and a log backup is done between batches. And even then, another transaction may prevent reusing the previous batch's log space.
>
>>
>>I added another important part to this article, which helps to debug bad records in MS SQL tables. It will show you how to process hundreds of GB of data in less than an hour while keeping the log file small and not choking your hard disk.
>>
>>See this article:
>>http://www.dfarber.com/computer-consulting-blog/2011/1/14/processing-hundreds-of-millions-records-got-much-easier.aspx
>>