I want to add a column to my users table and generate slug for all of them. The problem is I have over 1 million users in my DB.
I have seen various blogs explaining different methods but I do not want to take risk by doing it in my production DB.
Methods that I found:
The below method suggests to add the code to generate slug in migration file itself.
class AddStatusToUser < ActiveRecord::Migration class User < ActiveRecord::Base end def up add_column :users, :status, :string User.find_each do |user| user.status = 'active' user.save! end end def down remove_column :users, :status end end
I have written this method which is run by a rake task: The problem with the below one is that it has been running 4 days in which only 400 000 slugs have been generated so far. I wanted to do it quickly, but do not know how.
Yields each batch of records that was found by the find options as an array. The size of each batch is set by the :batch_size option; the default is 1000.
You can control the starting point for the batch processing by supplying the :start option. This is especially useful if you want multiple workers dealing with the same processing queue. You can make worker 1 handle all the records between id 0 and 10,000 and worker 2 handle from 10,000 and beyond (by setting the :start option on that worker).
It’s not possible to set the order. That is automatically set to ascending on the primary key (“id ASC”) to make the batch ordering work. This also mean that this method only works with integer-based primary keys. You can’t set the limit either, that’s used to control the batch sizes.
In order to avoid DB performance issues I have given a sleep time of 2 seconds after every slug generation for 1000 users. Should I remove the sleep method? Should I just run User.find_each(&:save)
or method 1?
task :add_slug_to_all_users => :environment do
i=0
batchSize = 1000
puts "started at :#{Time.now}"
# find_in_batches method provides the users in batches of 1000
# so that the update is not triggered for all the rows at once which may lock the table completely.
User.where("slug is null and email is not null").find_in_batches(batch_size: batchSize) do |users|
sleep(2)
users.each {|u| u.save!; i+=1;}
puts "updated #{i} records at: #{Time.now}"
end
puts "Completed the Task at: #{Time.now}\n"
end
Update 1: I am using friendly_id gem to generate slugs.
Update 2: I have run SHOW CREATE TABLE users
and I got this:
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`first_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`last_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`p_views` int(11) DEFAULT '0',
`p_desc` text COLLATE utf8_unicode_ci,
`p_title` text COLLATE utf8_unicode_ci,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`t_zone` varchar(255) COLLATE utf8_unicode_ci DEFAULT 'UTC',
`college` varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL,
`degree` text COLLATE utf8_unicode_ci,
`p_no` varchar(15) COLLATE utf8_unicode_ci DEFAULT NULL,
`slug` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_phone_number` (`p_no`),
UNIQUE KEY `index_users_on_phone_no` (`p_no`),
UNIQUE KEY `index_users_on_slug` (`slug`),
KEY `use_index_on_college` (`college`(255))
) ENGINE=InnoDB AUTO_INCREMENT=2194 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
Please note that I have removed most of the fields from the above result. The column slug
stores the combination of first_name and last_name in a url friendly manner.
For e.g. if a user's name is:
id first_name last_name
1 Arun Kumar
2 Arun Kumar
The slug generated would look like this:
id slug
1 arun-kumar
2 arun-kumar1