Let’s say you have an application where a User can subscribe to a Channel. With ActiveRecord associations, it would look something like this:
# app/models/subscription.rb class Subscription < ActiveRecord::Base belongs_to :channel belongs_to :user end # app/models/user.rb class User < ActiveRecord::Base has_many :subscriptions has_many :users, through: :subscriptions end # app/models/channel.rb class Channel < ActiveRecord::Base has_many :subscriptions has_many :users, through: :subscriptions end
Unfortunately, someone forgot to add
dependent: :destroy to the
has_many :subscriptions. When a user or channel was deleted, an orphaned subscription was left behind.
This issue was fixed by
dependent: :destroy, but there was still a large number of orphaned records lingering around.
There are three ways you can use to remove the orphaned records.
Approach #1 — Bad
Subscription.find_each do |subscription| if subscription.channel.nil? || subscription.user.nil? subscription.destroy end end
This executes a separate SQL query for each record, checks whether it is orphaned, and destroys it if it is.
Approach #2 — Better, but still pretty bad
Subscription.all.each do |subscription| if subscription.channel.nil? || subscription.user.nil? subscription.destroy end end
This loads all records into memory, and then iterates over them performing the same check as above.
Approach #3 — Good
Subscription.where([ "user_id NOT IN (?) OR channel_id NOT IN (?)", User.pluck("id"), Channel.pluck("id") ]).destroy_all
This approach first gets the IDs of all Users and Channels, and then executes one query to find all Subscriptions that don’t belong to either a User or a Query.
Let’s take a look at how long it takes in each case.
When you run it on 2596 subscriptions, out of which 1058 are orphaned, you get:
user system total real bad 3.020000 0.160000 3.180000 ( 4.058246) better 2.950000 0.170000 3.120000 ( 3.982329) good 0.010000 0.000000 0.010000 ( 0.030346)
Even as the number of subscriptions is reduced,
Subscription.all.each remains faster than
Subscriptions.find_each, because it performs less SQL queries.
With 10 subscriptions, the results were
user system total real bad 0.010000 0.000000 0.010000 ( 0.022374) beter 0.010000 0.010000 0.020000 ( 0.017584) good 0.010000 0.000000 0.010000 ( 0.014330)
Approach #3 is a whopping 134 times faster than approach #1.
While the time difference in our scenario was only 4 seconds, it could be hours for millions of records.
This demonstrates how important refactoring is to the development process. Refactoring your code furthers your understanding of the technologies you work with. It also illustrates how important it is to minimize the SQL queries performed within your web application. If this was an action initiated by the user, the difference in the request time would be 4.02 seeconds vs. 30 ms.