What's the difference between find_each, find_in_batches, and in_batches in Rails?

Sometimes we use klass_name.all to load records from the database.

That's fine if there are only 100 or so records in the database.

However, using klass_name.all might not be the best way to get records, especially when we need to query large numbers of records.

For example, if we have 1 million records and run the query below, ActiveRecord instantiates all of them at once, so memory consumption grows quickly. In the worst case, the application runs out of memory.

Project.all.map { |p| p.do_something_great }

To work with records in batches and keep memory consumption down, Rails provides three public methods: find_each, find_in_batches, and in_batches.

What’s the difference between the three of them? Let’s see!


find_in_batches

If we don't specify a batch size (the batch_size: option), the default is 1,000.

For example, with 3,000 records ordered by primary key, records 1 to 1,000 form the first batch, records 1,001 to 2,000 the second, and so on.
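Rails does not page through the table with OFFSET. Roughly, it orders by primary key, takes batch_size rows, remembers the last id, and asks for the next rows with an id greater than that (something like WHERE id > last_id ORDER BY id LIMIT 1000). The plain-Ruby sketch below imitates that loop over an in-memory list of ids; fetch_batches and TABLE_IDS are hypothetical names, not ActiveRecord APIs.

```ruby
# Hypothetical stand-in for a table: primary keys 1..3000, stored unordered.
TABLE_IDS = (1..3_000).to_a.shuffle(random: Random.new(42)).freeze

# Imitates the batching query pattern, roughly:
#   SELECT ... WHERE id > last_id ORDER BY id LIMIT batch_size
def fetch_batches(ids, batch_size: 1_000)
  batches = []
  last_id = nil
  loop do
    batch = ids.select { |id| last_id.nil? || id > last_id }.sort.first(batch_size)
    break if batch.empty?

    batches << batch
    break if batch.size < batch_size # a short batch means nothing is left
    last_id = batch.last             # the next "query" starts after this id
  end
  batches
end

batches = fetch_batches(TABLE_IDS)
p batches.size                             # 3
p batches.map { |b| [b.first, b.last] }    # [[1, 1000], [1001, 2000], [2001, 3000]]
```

Because each query starts from the last seen id instead of skipping rows with OFFSET, later batches are not slower to fetch than earlier ones.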

If no block is given, find_in_batches returns an Enumerator, and each element it produces is an Array of records:

Project.find_in_batches.class
#=> Enumerator < Object

Project.find_in_batches.first.class
#=> Array < Object

If a block is given, find_in_batches yields each batch to it as an Array of Project objects, so we iterate over that array and call do_something_great! on each project:

Project.where(status: 'success').find_in_batches do |projects|
  projects.each { |project| project.do_something_great! }
end
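To make the block / no-block behavior concrete, here is a minimal plain-Ruby sketch that runs without Rails. FakeProject and FakeRelation are hypothetical stand-ins, and the batching uses each_slice over an in-memory array instead of SQL queries; only the return types are meant to mirror the real find_in_batches.

```ruby
# Hypothetical record type standing in for Project.
FakeProject = Struct.new(:id, :status)

# Hypothetical relation; it only imitates the return types of find_in_batches.
class FakeRelation
  def initialize(records)
    @records = records.sort_by(&:id) # batches follow primary-key order
  end

  def find_in_batches(batch_size: 1_000)
    # No block: return an Enumerator, as the real method does.
    return enum_for(:find_in_batches, batch_size: batch_size) unless block_given?

    # Block given: yield each batch as a plain Array of records.
    @records.each_slice(batch_size) { |batch| yield batch }
  end
end

relation = FakeRelation.new((1..2_500).map { |i| FakeProject.new(i, 'success') })

p relation.find_in_batches.class        # Enumerator
p relation.find_in_batches.first.class  # Array
p relation.find_in_batches.map(&:size)  # [1000, 1000, 500]
```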

find_each

As with find_in_batches, the default batch size is 1,000.

If no block is given, find_each also returns an Enumerator, but each element it produces is a single record rather than an Array:

Project.find_each.class
#=> Enumerator < Object

Project.find_each.first.class
#=> Project < ApplicationRecord

If a block is given, find_each calls find_in_batches internally and yields the records of each batch one by one:

# File activerecord/lib/active_record/relation/batches.rb, line 68

def find_each(start: nil, finish: nil, batch_size: 1000, error_on_ignore: nil, order: :asc)
  if block_given?
    find_in_batches(start: start, finish: finish, batch_size: batch_size, error_on_ignore: error_on_ignore, order: order) do |records|
      records.each { |record| yield record }
    end
  else
    #....
  end
end

According to the source code, the two snippets below produce the same result. So when we want to process records one at a time but still load them in batches, find_each is a convenient shortcut.

Project.where(status: 'success').find_in_batches do |projects|
  projects.each { |project| project.do_something_great! }
end

Project.where(status: 'success').find_each do |project|
  project.do_something_great!
end
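The delegation in the Rails source quoted above can be imitated in a few lines of plain Ruby, which also lets us check that the two snippets visit the same records in the same order. Record and FakeRelation are hypothetical, and this simplified find_in_batches slices an in-memory array rather than querying the database.

```ruby
# Hypothetical record type standing in for Project.
Record = Struct.new(:id)

# Hypothetical relation; NOT ActiveRecord, it only mirrors the call structure.
class FakeRelation
  def initialize(records)
    @records = records.sort_by(&:id) # batches follow primary-key order
  end

  # Simplified find_in_batches: yields Arrays of up to batch_size records.
  def find_in_batches(batch_size: 1_000, &block)
    return enum_for(:find_in_batches, batch_size: batch_size) unless block

    @records.each_slice(batch_size, &block)
  end

  # Same shape as the Rails source: delegate to find_in_batches,
  # then yield each record of each batch individually.
  def find_each(batch_size: 1_000)
    return enum_for(:find_each, batch_size: batch_size) unless block_given?

    find_in_batches(batch_size: batch_size) do |records|
      records.each { |record| yield record }
    end
  end
end

relation = FakeRelation.new((1..2_500).map { |i| Record.new(i) })

via_batches = []
relation.find_in_batches { |records| records.each { |r| via_batches << r.id } }

via_each = []
relation.find_each { |record| via_each << record.id }

p via_batches == via_each   # true
p relation.find_each.first  # #<struct Record id=1>
```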

in_batches

The default batch size is 1,000 here, too, but the option is named of: instead of batch_size:.

If no block is given, in_batches returns a BatchEnumerator. Unlike find_each and find_in_batches, which return a plain Enumerator, each element of a BatchEnumerator is an ActiveRecord::Relation covering one batch, not a single record or an Array.

Project.in_batches.class
#=> ActiveRecord::Batches::BatchEnumerator < Object

Project.in_batches.first.class
#=> Project::ActiveRecord_Relation < ActiveRecord::Relation

If a block is given, in_batches yields each batch as an ActiveRecord::Relation, so we can call relation methods such as update_all on the whole batch at once:

Project.where(status: 'success').in_batches do |projects|
  # projects is an ActiveRecord::Relation; update_all runs one UPDATE per batch
  projects.update_all(status: 'draft')
end
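To see why yielding a relation is handy, the plain-Ruby sketch below imitates the idea: each batch is a lightweight object that only knows its ids, and update_all changes every matching row without building a model object per row. Row, FakeBatchRelation, and FakeTable are hypothetical, the "table" is an in-memory Hash, and unlike Rails the sketch returns a plain Enumerator rather than a BatchEnumerator when no block is given.

```ruby
# Hypothetical row type standing in for a projects table row.
Row = Struct.new(:id, :status)

# Hypothetical relation covering one batch of ids; NOT ActiveRecord::Relation.
class FakeBatchRelation
  attr_reader :ids

  def initialize(rows_by_id, ids)
    @rows_by_id = rows_by_id
    @ids = ids
  end

  # Stand-in for update_all: sets the attributes on every row in the batch
  # and returns the number of affected rows.
  def update_all(attributes)
    @ids.each do |id|
      attributes.each { |name, value| @rows_by_id[id][name] = value }
    end
    @ids.size
  end
end

# Hypothetical table; the `of:` keyword mirrors the option name in_batches uses.
class FakeTable
  def initialize(rows)
    @rows_by_id = rows.to_h { |row| [row.id, row] }
  end

  def in_batches(of: 1_000)
    return enum_for(:in_batches, of: of) unless block_given?

    @rows_by_id.keys.sort.each_slice(of) do |ids|
      yield FakeBatchRelation.new(@rows_by_id, ids) # carries ids, not loaded records
    end
  end

  def statuses
    @rows_by_id.values.map(&:status).uniq
  end
end

table = FakeTable.new((1..2_500).map { |i| Row.new(i, 'success') })

affected = []
table.in_batches { |batch| affected << batch.update_all(status: 'draft') }

p affected        # [1000, 1000, 500]
p table.statuses  # ["draft"]
```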
