There is no reason to try to limit the feature to a category…
Unless you have a lot of topics and are considering the cost of translating everything.
This is why we were thinking of sticking to one category.
This in itself kind of reminded me of this bit from the first post:
I think it would be a substantial amount for our Forum if we enable it for all categories.
Could you offer any advice on how to estimate that cost? I think it is based on the number of characters, but I don’t know how the plugin works on the backend. Does it maybe grab only the first X characters of each post to reduce the cost of language detection?
So our use case would be this: English speakers translate non-English posts in one specific category so that they can answer in English, and the topic authors then translate those responses into their own language (i.e. the language of the first post).
The cost is a good point that I hadn’t considered. We use Microsoft’s translation service and have never had to pay for it, but then the site I set this up for is fairly small. Maybe limiting translations by category is indeed a valid feature request.
Personally I’ve also never really fully understood how much is sent to the translator and how it works in practice. It just “works”.
Here’s how I did the cost estimate for my forum. All of the queries are for the Data Explorer.
Estimate the average number of characters per post
Last I checked, the plugin sends the cooked text (the rendered HTML of the post) to the translation service, so that is the length I measured.
SELECT AVG(LENGTH(p.cooked))
FROM posts AS p
JOIN topics AS t ON p.topic_id = t.id
WHERE t.archetype != 'private_message'
Estimate the number of posts read per user visit
I took the last 30 days to get a relatively recent estimate.
-- [params]
-- int :from_days_ago = 0
-- int :duration_days = 30
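-- Window: the :duration_days days ending :from_days_ago days ago.
-- Aliases avoid the SQL keywords START and END.
WITH t AS (
  SELECT CURRENT_TIMESTAMP - ((:from_days_ago + :duration_days) * (INTERVAL '1 days')) AS period_start,
         CURRENT_TIMESTAMP - (:from_days_ago * (INTERVAL '1 days')) AS period_end
)
SELECT AVG(posts_read)
FROM user_visits
JOIN t ON visited_at > t.period_start AND visited_at < t.period_end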
Number of user visits in the last 30 days
-- [params]
-- int :from_days_ago = 0
-- int :duration_days = 30
-- Same window as the previous query; counts visits instead of averaging posts read.
WITH t AS (
  SELECT CURRENT_TIMESTAMP - ((:from_days_ago + :duration_days) * (INTERVAL '1 days')) AS period_start,
         CURRENT_TIMESTAMP - (:from_days_ago * (INTERVAL '1 days')) AS period_end
)
SELECT COUNT(1)
FROM user_visits
JOIN t ON visited_at > t.period_start AND visited_at < t.period_end
Estimate of number of characters read in the last 30 days
Multiplying the three previous figures together gave me an estimate of the number of cooked characters of posts that were read in the last 30 days.
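To make the multiplication concrete, here is a minimal Python sketch. The function name, parameter names, and sample figures are placeholders made up for illustration; they are not numbers from my forum.

```python
def estimate_chars_read(avg_chars_per_post: float,
                        avg_posts_read_per_visit: float,
                        visits: int) -> float:
    """Estimate cooked characters read in the window.

    Each argument is the result of one of the three Data Explorer
    queries above (the parameter names here are hypothetical).
    """
    return avg_chars_per_post * avg_posts_read_per_visit * visits


# Placeholder inputs, not real forum data.
chars_read = estimate_chars_read(
    avg_chars_per_post=1200.0,
    avg_posts_read_per_visit=8.0,
    visits=50_000,
)
print(f"{chars_read:,.0f} cooked characters read in 30 days")
```

Keep in mind that this counts reads by every visitor, so it is an upper bound until you scale it down to the share of readers who would actually request a translation.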
Estimate of number of non-primary language users
Since English is the primary language for our forum, I used Google Analytics to determine the percentage of users that had their browsers configured for a non-English language.
Final estimate
Then I did a low/medium/high estimate: I took the current rate of non-English visitors as the “common case” (medium), halved it for the low estimate, and doubled it for the high estimate. Applying each of those rates to the characters-read figure gave me a low/medium/high number of characters translated in 30 days, which I then multiplied by the translation service’s rate per X characters.
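The low/medium/high calculation could be sketched in Python roughly like this. It assumes every post read by a non-primary-language visitor is sent for translation, which is probably pessimistic. The function name, the price per million characters, and the sample inputs are all hypothetical, and the cap at 100% is my own addition; check your provider's actual pricing.

```python
def translation_cost_estimates(chars_read: float,
                               non_primary_share: float,
                               price_per_million_chars: float) -> dict:
    """Return low/medium/high cost estimates for the window.

    chars_read: estimated cooked characters read by all visitors.
    non_primary_share: fraction (0..1) of visitors using a non-primary language.
    price_per_million_chars: hypothetical rate, not a real provider price.
    """
    # Medium = current share, low = half of it, high = double (capped at 100%).
    shares = {
        "low": non_primary_share / 2,
        "medium": non_primary_share,
        "high": min(non_primary_share * 2, 1.0),
    }
    return {
        label: chars_read * share / 1_000_000 * price_per_million_chars
        for label, share in shares.items()
    }


# Placeholder inputs, not real forum data or real pricing.
estimates = translation_cost_estimates(
    chars_read=480_000_000,
    non_primary_share=0.25,
    price_per_million_chars=10.0,
)
for label, cost in estimates.items():
    print(f"{label}: {cost:,.2f} per 30 days")
```

If your provider has a free monthly allowance, subtract it from the character count before applying the rate.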
It might still be nice to correlate this with the users.locale field and get the percentage of users who have it set to a non-English value (if your site doesn’t automatically adapt to the user’s environment, which I think is an option in the admin settings).
Did you notice a significant spike when you first launched the plugin based on this?
I believe something like this could still be added to complete the estimation:
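-- Total cooked characters across all non-PM posts.
-- Summing per-post lengths gives the same number as measuring the
-- concatenation, without building one huge string in memory.
SELECT COALESCE(SUM(LENGTH(posts.cooked)), 0)
FROM posts
JOIN topics ON posts.topic_id = topics.id
WHERE topics.archetype <> 'private_message'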
I didn’t do any tracking after we decided to launch, so no, I’m not sure whether it had any impact. But yes, including your query in the cost estimate would be good.