Yes, they get converted. I will email you tomorrow!
You have to reach trust level 1 to unlock PMs, and you haven't spent enough time reading topics here to achieve that yet.
Hi, I'm on a Docker container (DigitalOcean) and I'm trying to import MyBB (I know this tutorial is for vBulletin, but the Gemfile setup is similar). I'm stuck at bundle install:
Hello,
We tried to import vBulletin 3.8 into Discourse. The script worked fine with a 300 MB database, around 40k users and 60k posts. But at the end of the import process we faced a charset problem:
- our vBulletin 3.8 database is encoded with charset latin1
- MySQL 5.6 in the Discourse docker is configured with charset UTF-8, so the import script forces the data to be converted to UTF-8
At the end of the import, the Discourse forum displays the data with encoding errors. It looks like the pictures below.
- before import, vB 3.8
- after import to Discourse
We tried:
- converting the charset on vB 3.8 to UTF-8 before running the import script
- testing the vB 3.8 database on a fresh MySQL server: the text displayed normally, with no encoding errors
So, do you have any advice for this case?
I'd appreciate any support (also, sorry if my English is hard to understand).
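The garbage in the screenshots is classic double-encoding. As a minimal standalone sketch (not from the import script itself): UTF-8 bytes reinterpreted as latin1 and re-encoded to UTF-8 produce exactly this kind of mojibake.

```ruby
# Standalone illustration: UTF-8 bytes misread as latin1 (ISO-8859-1)
# and re-encoded to UTF-8 produce the garbage shown in the screenshots.
utf8 = 'é'                                              # UTF-8 bytes C3 A9
mojibake = utf8.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts mojibake # => "Ã©"
```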
Here's part of how I fixed a similar problem:
### WIN1252 encoding
win_encoded = ''
begin
  win_encoded = raw.force_encoding('utf-8').encode("Windows-1252",
    invalid: :replace, undef: :replace, replace: ""
  ).force_encoding('utf-8').scrub
rescue => e
  puts "\n#{'-'*50}\nWin1252 failed for \n\n#{raw}\n\n"
  win_encoded = ''
end
raw = win_encoded
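Wrapped in a standalone helper for testing (the method name scrub_to_utf8 is mine, not the importer's), the pass drops any bytes that aren't valid UTF-8. Note that it is lossy for legitimate non-ASCII text too, so it is a last resort rather than a general fix:

```ruby
# Hypothetical helper wrapping the re-encoding pass above; it drops
# bytes that are invalid UTF-8 instead of crashing on them.
# Caveat: valid non-ASCII characters (e.g. 'é') are also destroyed,
# because their Windows-1252 bytes are invalid UTF-8 and get scrubbed.
def scrub_to_utf8(raw)
  raw.dup.force_encoding('utf-8')
     .encode('Windows-1252', invalid: :replace, undef: :replace, replace: '')
     .force_encoding('utf-8')
     .scrub
rescue StandardError
  ''
end

bad = "caf\xE9" # latin1 byte E9 is invalid UTF-8
puts scrub_to_utf8(bad)                 # => "caf"
puts scrub_to_utf8(bad).valid_encoding? # => true
```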
You saved my life.
For the easy way, I tried the conversion script from the Importing from phpBB3 post; it helped me fix my database charset problem quickly, and it works like a charm now.
Thank you so much for the advice.
Did anyone migrate using the vbulletin5 importer? I may use it in the future, and I'd like to know if it has already been used flawlessly.
I just did an import of vBulletin 5 and added some features (permalinks, some formatting, and maybe some other things I don't remember). I intend to submit a PR, but it hasn't happened yet.
I have a vB5 database dump that contains the attachments. Can I import them into Discourse, or do I need to have all the attachments as files?
Also confused about this. Where should I copy the attachment files in the discourse folder exactly?
Hi again,
From what I understand, attachments from the database will work as they seem to be handled the same way as avatars which are in the database as well.
My import is going well, but I ran into an error at 91% of the imported posts:
importing posts...
1425149 / 1564573 ( 91.1%) [1040 items/min] Traceback (most recent call last):
14: from script/import_scripts/vbulletin5.rb:726:in `<main>'
13: from /home/canapin/discourse/script/import_scripts/base.rb:47:in `perform'
12: from script/import_scripts/vbulletin5.rb:49:in `execute'
11: from script/import_scripts/vbulletin5.rb:300:in `import_posts'
10: from /home/canapin/discourse/script/import_scripts/base.rb:862:in `batches'
9: from /home/canapin/discourse/script/import_scripts/base.rb:862:in `loop'
8: from /home/canapin/discourse/script/import_scripts/base.rb:863:in `block in batches'
7: from script/import_scripts/vbulletin5.rb:320:in `block in import_posts'
6: from /home/canapin/discourse/script/import_scripts/base.rb:508:in `create_posts'
5: from /usr/local/rvm/gems/ruby-2.6.5/gems/rack-mini-profiler-2.0.4/lib/patches/db/mysql2.rb:8:in `each'
4: from /usr/local/rvm/gems/ruby-2.6.5/gems/rack-mini-profiler-2.0.4/lib/patches/db/mysql2.rb:8:in `each'
3: from /home/canapin/discourse/script/import_scripts/base.rb:509:in `block in create_posts'
2: from script/import_scripts/vbulletin5.rb:321:in `block (2 levels) in import_posts'
1: from script/import_scripts/vbulletin5.rb:450:in `preprocess_post_raw'
script/import_scripts/vbulletin5.rb:450:in `gsub': invalid byte sequence in UTF-8 (ArgumentError)
How can I properly identify the post to see what the content looks like in the vbulletin database?
Someone suggested ways to use rescue to solve those, so you might go back and find that (I can't remember if it was in this topic or another one). You could put a puts in the rescue to print out the id and/or the text that caused the problem.
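A minimal sketch of that suggestion (post_id and the gsub are illustrative stand-ins for the importer's actual variables and transformation):

```ruby
# Sketch: wrap the failing transformation in a rescue that logs which
# post broke; post_id and the gsub are illustrative stand-ins.
def preprocess_with_logging(post_id, raw)
  raw.gsub(/(\\r)?\\n/, "\n")
rescue ArgumentError => e
  puts "post #{post_id} failed: #{e.message}"
  puts raw.inspect # inspect renders the offending bytes safely
  ''
end

puts preprocess_with_logging(1, 'ok\nline')      # normal post passes through
bad = "x\xFF".dup.force_encoding('utf-8')        # invalid UTF-8 triggers rescue
puts preprocess_with_logging(2, bad).empty?      # => true
```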
You have an encoding problem.
I used this in a similar import (I think you'd put it in preprocess_post_raw):
begin
  win_encoded = raw.force_encoding('utf-8').encode("Windows-1252",
    invalid: :replace, undef: :replace, replace: ""
  ).force_encoding('utf-8').scrub
rescue => e
  puts "\n#{'-'*50}\nWin1252 failed for \n\n#{raw}\n\n"
  win_encoded = ''
end
Hi,
I modified the importer and added your script as follows:
def preprocess_post_raw(raw)
  return "" if raw.blank?

  begin
    win_encoded = raw.force_encoding('utf-8').encode("Windows-1252",
      invalid: :replace, undef: :replace, replace: ""
    ).force_encoding('utf-8').scrub
  rescue => e
    puts "\n#{'-'*50}\nWin1252 failed for \n\n#{raw}\n\n"
    win_encoded = ''
  end

  # decode HTML entities
  raw = @htmlentities.decode(raw)

  # fix whitespaces
  raw = raw.gsub(/(\\r)?\\n/, "\n")
           .gsub("\\t", "\t")
The invalid byte sequence in UTF-8 error happens on this part: raw = raw.gsub(/(\\r)?\\n/, "\n").gsub("\\t", "\t").
Then I started the importer again. Though it skips already-imported data, it took about 6 hours to get to the post that generates the error, and it didn't add the expected information to show the post content. Any idea why?
edit:
This is probably the post raw content that leads to the error:
I wonder if Billy is enjoying the parade.
Qwertyuiopasdfghjklzxcvbnm��
I'll try to modify the importer script to make it skip (for real) the previous 1.4M posts. Wish me luck.
I modified many other importers to include an import_after setting to allow importing only recent data. You can look at some others to see how I did that.
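The idea is just a timestamp cutoff applied to the importer's queries. A self-contained sketch (IMPORT_AFTER and the row shape are illustrative, not the importers' real names):

```ruby
# Illustrative cutoff: only rows created after this date get imported.
IMPORT_AFTER = Time.utc(2020, 1, 1).to_i

rows = [
  { id: 1, created_at: Time.utc(2019, 6, 1).to_i },
  { id: 2, created_at: Time.utc(2020, 6, 1).to_i },
]

# In a real importer this filter would live in the SQL WHERE clause
# (e.g. "WHERE created_at > #{IMPORT_AFTER}") rather than in Ruby.
recent = rows.select { |r| r[:created_at] > IMPORT_AFTER }
puts recent.map { |r| r[:id] }.inspect # => [2]
```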
Hi,
I've been able to import almost all my posts! I fixed a few dozen by hand and restarted the import each time it came across a new UTF-8 error…
Now I need to import the attachments (which are stored in the vBulletin database), but it doesn't work:
When it starts the process, my RAM consumption spikes within about 10 or 20 seconds and this error happens:
importing attachments...
Failed to create upload: Cannot allocate memory - grep
Fail
My RAM:
I use a Discourse development version on an Ubuntu 18 subsystem on Windows 10, and I have 16 GB of RAM.
The attachments take up 7 GB of the 13 GB vBulletin database.
Note that I use the vbulletin5 importer.
The issue comes from this query:
SELECT n.parentid nodeid, a.filename, fd.userid, LENGTH(fd.filedata) AS dbsize, filedata, fd.filedataid
FROM #{DB_PREFIX}attach a
LEFT JOIN #{DB_PREFIX}filedata fd ON fd.filedataid = a.filedataid
LEFT JOIN #{DB_PREFIX}node n on n.nodeid = a.nodeid
If I run this query in MySQL, my remaining RAM fills up within seconds.
(editing my post to remove useless info and questions since I'm figuring things out and providing a workaround)
Workaround:
I added a LIMIT and an OFFSET to the importer's SQL query and imported the attachments by selecting 20000 of them each time:
uploads = mysql_query <<-SQL
  SELECT n.parentid nodeid, a.filename, fd.userid, LENGTH(fd.filedata) AS dbsize, filedata, fd.filedataid
  FROM #{DB_PREFIX}attach a
  LEFT JOIN #{DB_PREFIX}filedata fd ON fd.filedataid = a.filedataid
  LEFT JOIN #{DB_PREFIX}node n on n.nodeid = a.nodeid
  LIMIT 20000 OFFSET 0
SQL
I also added an exit at the end of the uploads.each do |upload| loop to prevent the import script from continuing after importing my 20000 uploads.
When the 10000 uploads are imported, I edit the script (thanks to nano +353 ./scripts/import_scripts/vbulletin5.rb for opening the file at the right line) to increase the SQL query OFFSET by 10000, and start the importer again… and so on for my 65000 attachments.
During the attachment imports, I faced several errors and warnings, including:
- W, [2020-08-20T12:05:37.402860 #31042] WARN -- : Bad date/time value "0000:00:00 00:00:00": mon out of range (some EXIF data error, it seems)
- Post for 490451 not found (dangling old attachments, I guess?)
- Fail
The last one puzzled me and stopped the import script. I checked the first "Fail" I got and the vBulletin attachment was sort of broken (no filename), so I commented out the exit instruction to let the importer continue its job when it "fails", hoping that wouldn't break anything.
puts "Fail"
#exit
I also had a more annoying error that interrupted the import:
1: from /usr/local/rvm/gems/ruby-2.6.5/gems/activerecord-6.0.3.2/lib/active_record/validations.rb:53:in `save!'
/usr/local/rvm/gems/ruby-2.6.5/gems/activerecord-6.0.3.2/lib/active_record/validations.rb:80:in `raise_validation_error':
Validation failed: Body is limited to 32000 characters; you entered 32323. (ActiveRecord::RecordInvalid)
Fortunately, it was a rare error: it happened maybe a dozen times out of 65000 attachments. Each time, I just skipped the offending attachment by restarting the import script with a different SQL query offset.
Hi,
I noticed that the custom field import_pass was absent for about 400 of my remaining 27000 users (I cleaned up 154000 inactive users).
Any idea why?
The forum was migrated from phpBB to vBulletin in May. Could it have something to do with that?
I won't try to "fix" this and import passwords for these 400 users (unless there's an easy way to do it…?). It's not a big issue, so I'm just curious more than anything else.
Hey guys,
Imported images have the wrong width/height ratio unless I rebake the posts. I'd like to find a way to get the correct ratio (during the import, for example) without rebaking.
More verbose description of the issue:
From what I understand, imported posts aren't "baked" when Discourse creates the corresponding post (though the cooked field is generated somehow), which is why importing posts is much faster than baking existing Discourse posts.
My issue is that my imported images have the wrong width/height ratio.
Example of the raw Discourse text related to an imported image:
![SH-MUniFrame.JPG|600x800](upload://6Li1nnjbA8zDz6YJ3FqeYHV5zXK.jpeg)
The content of the "cooked" field:
<img src="https://d11a6trkgmumsb.cloudfront.net/original/3X/0/3/0379f53ed8221730ccb31807238e9c46e9fe1d37.jpeg" alt="SH-MUniFrame.JPG" data-base62-sha1="6Li1nnjbA8zDz6YJ3FqeYHV5zXK" width="517" height="500" class="d-lazyload">
How the image appears in the post:
Here is the original image: https://d11a6trkgmumsb.cloudfront.net/original/3X/f/7/f73a0ae8594219dd5a1620e59b3c17f9b02b1583.jpeg
The original image size from the vBulletin database is:
select width, height from filedata where filedataid = 76237
+-------+--------+
| width | height |
+-------+--------+
| 600 | 800 |
+-------+--------+
My understanding is that the height attribute is constrained by Discourse's setting which caps image height at 500px, hence that value in the <img> height attribute. The <img> width is somehow modified from 600 to 517, though I can't figure out how or why.
The issue is the same for older images that have 0 in both the width and height vBulletin attachment fields: they also have the wrong ratio. I don't know if these values are actually used during the import.
The issue is resolved by rebaking (rebuild HTML) the post: the image is then properly resized and the image viewer is added. But I have 1.6M posts and I'd prefer to avoid rebaking all of them.
A quick fix would be to use this CSS on my Discourse:
.cooked img:not(.emoji) {
height: auto;
width: auto;
}
But it implies that no one will be able to choose an arbitrary size when uploading an image, and there may be side effects I'm not aware of.
Any idea how I could get the proper image width/height ratio on imported attachments?
I suspect that's because you didn't let them cook after the import. I can't imagine a way to solve the problem without rebaking the posts. Perhaps you could rebake just the broken posts rather than all of them?
Are they supposed to be cooked automatically over time after the import? Starting from the last or the first created post?
That's not a big issue though, and if they aren't automatically cooked, I'd probably start a rebake of all posts and be patient, although I admit that I read this post a few days ago and it scared me a little bit: My journey into a massive posts rebake job. I also have questions about that, but I'll ask them in the proper topic.
Hmm, yes, that looks like it's my code. Sorry about that.
This should follow this pattern:
batches(BATCH_SIZE) do |offset|
  (SQL code)
    LIMIT #{BATCH_SIZE}
    OFFSET #{offset}
  (other code)
end
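A self-contained simulation of that pattern (fake_query stands in for the importer's real mysql_query; BATCH_SIZE and the data are made up for illustration):

```ruby
BATCH_SIZE = 3
DATA = (1..8).to_a # pretend these are attachment rows

# Stand-in for mysql_query with "LIMIT #{limit} OFFSET #{offset}".
def fake_query(limit, offset)
  DATA[offset, limit] || []
end

offset = 0
imported = []
loop do
  batch = fake_query(BATCH_SIZE, offset)
  break if batch.empty? # no more rows: stop, like the batches helper does
  imported.concat(batch)
  offset += batch.size
end
puts imported.inspect # => [1, 2, 3, 4, 5, 6, 7, 8]
```

Fetching a bounded batch per query keeps memory flat, instead of loading all 7 GB of filedata blobs in one result set.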
Just raise the max post length site setting prior to the import.
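For example, from the rails console inside the container (assuming the standard max_post_length site setting; 64_000 is an arbitrary value comfortably above the 32323-character post that failed):

```ruby
# Run in the Discourse rails console before starting the import
# (./launcher enter app, then rails c, on a standard install).
SiteSetting.max_post_length = 64_000
```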