Importing mbox files fails at creating topics due to html_to_markdown.rb error


(Yaw Anokwa) #1

I am using Discourse v1.8.0.beta12 +34 and I’ve used Icy’s Google Group Crawler to generate mboxes of the various Google Groups I’m importing into Discourse. I’ve successfully imported a couple of groups, but I’ve run into an issue where the script errors with this message.

creating forum topics
	441 / 1429 ( 30.9%)  [104 items/min]  /var/www/discourse/lib/html_to_markdown.rb:117:in `visit_li': undefined method `name' for nil:NilClass (NoMethodError)
		from /var/www/discourse/lib/html_to_markdown.rb:63:in `visit'
		from /var/www/discourse/lib/html_to_markdown.rb:48:in `block in traverse'
		from /usr/local/lib/ruby/gems/2.3.0/gems/nokogiri-1.7.1/lib/nokogiri/xml/node_set.rb:187:in `block in each'
		from /usr/local/lib/ruby/gems/2.3.0/gems/nokogiri-1.7.1/lib/nokogiri/xml/node_set.rb:186:in `upto'
		from /usr/local/lib/ruby/gems/2.3.0/gems/nokogiri-1.7.1/lib/nokogiri/xml/node_set.rb:186:in `each'
		from /var/www/discourse/lib/html_to_markdown.rb:48:in `traverse'
		from (eval):3:in `visit_div'
		from /var/www/discourse/lib/html_to_markdown.rb:63:in `visit'
		from /var/www/discourse/lib/html_to_markdown.rb:48:in `block in traverse'
		from /usr/local/lib/ruby/gems/2.3.0/gems/nokogiri-1.7.1/lib/nokogiri/xml/node_set.rb:187:in `block in each'
		from /usr/local/lib/ruby/gems/2.3.0/gems/nokogiri-1.7.1/lib/nokogiri/xml/node_set.rb:186:in `upto'
		from /usr/local/lib/ruby/gems/2.3.0/gems/nokogiri-1.7.1/lib/nokogiri/xml/node_set.rb:186:in `each'
		from /var/www/discourse/lib/html_to_markdown.rb:48:in `traverse'
		from /var/www/discourse/lib/html_to_markdown.rb:63:in `visit'
		from /var/www/discourse/lib/html_to_markdown.rb:48:in `block in traverse'
		from /usr/local/lib/ruby/gems/2.3.0/gems/nokogiri-1.7.1/lib/nokogiri/xml/node_set.rb:187:in `block in each'
		from /usr/local/lib/ruby/gems/2.3.0/gems/nokogiri-1.7.1/lib/nokogiri/xml/node_set.rb:186:in `upto'
		from /usr/local/lib/ruby/gems/2.3.0/gems/nokogiri-1.7.1/lib/nokogiri/xml/node_set.rb:186:in `each'
		from /var/www/discourse/lib/html_to_markdown.rb:48:in `traverse'
		from /var/www/discourse/lib/html_to_markdown.rb:63:in `visit'
		from /var/www/discourse/lib/html_to_markdown.rb:48:in `block in traverse'
		from /usr/local/lib/ruby/gems/2.3.0/gems/nokogiri-1.7.1/lib/nokogiri/xml/node_set.rb:187:in `block in each'
		from /usr/local/lib/ruby/gems/2.3.0/gems/nokogiri-1.7.1/lib/nokogiri/xml/node_set.rb:186:in `upto'
		from /usr/local/lib/ruby/gems/2.3.0/gems/nokogiri-1.7.1/lib/nokogiri/xml/node_set.rb:186:in `each'
		from /var/www/discourse/lib/html_to_markdown.rb:48:in `traverse'
		from /var/www/discourse/lib/html_to_markdown.rb:42:in `to_markdown'
		from /var/www/discourse/lib/email/receiver.rb:198:in `select_body'
		from mbox.rb:426:in `block (2 levels) in create_forum_topics'
		from /var/www/discourse/script/import_scripts/base.rb:432:in `block in create_posts'
		from /var/www/discourse/script/import_scripts/base.rb:431:in `each'
		from /var/www/discourse/script/import_scripts/base.rb:431:in `create_posts'
		from mbox.rb:419:in `block in create_forum_topics'
		from /var/www/discourse/script/import_scripts/base.rb:784:in `block in batches'
		from /var/www/discourse/script/import_scripts/base.rb:783:in `loop'
		from /var/www/discourse/script/import_scripts/base.rb:783:in `batches'
		from mbox.rb:413:in `create_forum_topics'
		from mbox.rb:57:in `execute'
		from /var/www/discourse/script/import_scripts/base.rb:45:in `perform'
		from mbox.rb:555:in `<top (required)>'
		from googlegroup.rb:51:in `load'
		from googlegroup.rb:51:in `import_to_discourse'
		from googlegroup.rb:73:in `<main>'

Within the index.db that is generated by mbox.rb, I was able to use the following query to find the problematic email.

select msg_id from emails where reply_to is null order by date(email_date) limit 1 offset 441;

The text of the problematic email is:

X-Received: by 10.180.19.132 with SMTP id f4mr6705803wie.2.1361472079953;
        Thu, 21 Feb 2013 10:41:19 -0800 (PST)
X-BeenThere: opendatakit-developers@googlegroups.com
Received: by 10.180.74.115 with SMTP id s19ls244190wiv.28.gmail; Thu, 21 Feb
 2013 10:41:19 -0800 (PST)
X-Received: by 10.180.94.41 with SMTP id cz9mr4982820wib.1.1361472079334;
        Thu, 21 Feb 2013 10:41:19 -0800 (PST)
Received: by 10.194.136.148 with SMTP id qa20mswjb;
        Thu, 21 Feb 2013 09:58:15 -0800 (PST)
X-Received: by 10.180.93.230 with SMTP id cx6mr6427354wib.5.1361469494823;
        Thu, 21 Feb 2013 09:58:14 -0800 (PST)
X-Received: by 10.180.93.230 with SMTP id cx6mr6427353wib.5.1361469494759;
        Thu, 21 Feb 2013 09:58:14 -0800 (PST)
Return-Path: <msundt@cs.washington.edu>
Received: from mail-wg0-f49.google.com (mail-wg0-f49.google.com [74.125.82.49])
        by gmr-mx.google.com with ESMTPS id cv1si227271wib.0.2013.02.21.09.58.14
        (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
        Thu, 21 Feb 2013 09:58:14 -0800 (PST)
Received-SPF: neutral (google.com: 74.125.82.49 is neither permitted nor denied by domain of msundt@cs.washington.edu) client-ip=74.125.82.49;
Authentication-Results: gmr-mx.google.com;
       spf=neutral (google.com: 74.125.82.49 is neither permitted nor denied by domain of msundt@cs.washington.edu) smtp.mail=msundt@cs.washington.edu;
       dkim=pass header.i=@cs.washington.edu
Received: by mail-wg0-f49.google.com with SMTP id 15so7416651wgd.28
        for <opendatakit-developers@googlegroups.com>; Thu, 21 Feb 2013 09:58:14 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=cs.washington.edu; s=goo201206;
        h=mime-version:x-received:in-reply-to:references:date:message-id
         :subject:from:to:cc:content-type;
        bh=Buq7rT8CzH7De/JtrpoxUeJdzRrJ9ebIwPE9NSuOC64=;
        b=ScbOu/UEky1QHpZxiATAYDUKxqEiIACnA0CJwTfzo1no0xjqXuHeHXVUWDLsb4aAbk
         dwXTSK0l2dccz/fz+qb2mPFtPN5FSFHJ2T38YRE7Y/btJOinp7gRolGVnTuxJLggsZU2
         2wg+x3zRwil32GSYH4PEqygT1B4fF3JTzGIGM=
MIME-Version: 1.0
X-Received: by 10.194.238.226 with SMTP id vn2mr43228790wjc.23.1361469494368;
 Thu, 21 Feb 2013 09:58:14 -0800 (PST)
Received: by 10.217.104.197 with HTTP; Thu, 21 Feb 2013 09:58:14 -0800 (PST)
In-Reply-To: <117D9812611FDE4FA732C14A851163E543674E06@CH1PRD0102MB145.prod.exchangelabs.com>
References: <CAMPkn0C=Hm3n06Ok-u5Y7aBo-QTv0LoOb1dOUAcsevrAoNqqOQ@mail.gmail.com>
	<117D9812611FDE4FA732C14A851163E543674E06@CH1PRD0102MB145.prod.exchangelabs.com>
Date: Thu, 21 Feb 2013 09:58:14 -0800
Message-ID: <CAO69cUAShG6n51t=iHNrh2h0Yb-tShNPqEfAs_MS=smKst9GmQ@mail.gmail.com>
Subject: Re: Appspot quota
From: Mitchell Sundt <msundt@cs.washington.edu>
To: opendatakit-developers@googlegroups.com
Cc: Amarjeet Singh <amarjeet@iiitd.ac.in>
Content-Type: multipart/alternative; boundary=089e01493c9897e72904d63fd17d
X-Gm-Message-State: ALoCoQkaRp0jxqJY2XkY8WaEWSqgu9Mp735fLsw+Vf7hvP64h7s7H8G6tUk9kR9HMgxrV9nTvPhh

--089e01493c9897e72904d63fd17d
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Adding opendatakit-developers@ to this, as the answers could be of interest
to other developers...

The error page: 'you may have exceeded quota' is displayed whenever the
server has an uncaught exception or otherwise returns a 500 error (internal
server error). It could have several causes, but the most likely is an
over-quota error. There is no way to be more specific with this error, as
it is using the static error page to present the error and, by the time you
have an over-quota exception, you can't reliably execute any java code to
diagnose the true cause.

If the appengine dashboard indicates that the app is not over-quota (no red
bars), and that the app is not currently subject to resource throttling
(this will be a small triangular ! warning just below the Billing Status
heading), then you should examine the Logs for the application (from the
appengine Dashboard, click on the 'Logs' link on the left sidebar). Look
first for log entries with Critical or Error entries. If none show up, then
look at warnings.  The key warning to look for is the 'your request took
longer than 60 seconds, so AppEngine terminated it' warning.  Otherwise,
any error within the ODK Aggregate code should generate a Critical or Error
entry.

--------------
As for profiling, I'd be interested in hearing your findings. Profiling
AppEngine is problematic because you cannot glean any useful information
from running in the development environment.  ODK Aggregate is a bit odd in
that it does not use Memcache because configuring that on Tomcat would have
greatly complicated the installer. Instead, frequently-accessed values are
cached within the Java layer and refreshed about every 3-6 seconds.

AppEngine performance is definitely limited to what we can do with
BigTables datastores without using any indices beyond the default
one-dimensional indices (this is because our data tables are dynamically
constructed, and there is no Java API to dynamically construct composite
indices). There is very little optimization you can do for AppEngine
datastore interactions because of this -- the limitations are effectively
written into the code.

MySQL and PostgreSQL are much more flexible, and database profiling can be
used to create extra indices as needed to tailor and tune for performance,
and those extra indices will be used by the database to improve query
performance.

Additionally, MySQL and PostgreSQL databases both provide mechanisms for
tracking how many of each type of query or datastore action (insert/delete)
are performed (e.g., CRUD reports, query timing reports).  So no
instrumentation is provided on that activity within ODK Aggregate.

AppEngine, however, provides no visibility into things like CRUD reports or
query timing reports.

To remedy that, code within ODK Aggregate periodically emits the equivalent
of a CRUD report (independent of the datastore used).  This is a set of log
entries of the form:
2012-12-05 15:03:38.317
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: ---------- 2012-12-05T23:03:38.317+0000 @ 3892 ------------
 I 2012-12-05 15:03:38.318
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: TASK_LOCK,0,0,0,0,0
 I 2012-12-05 15:03:38.318
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: _COST_LOGGING_,1,1,0,0,0
 I 2012-12-05 15:03:38.318
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: _STRING_FIELD_LENGTHS_,0,0,0,0,0
 I 2012-12-05 15:03:38.318
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit.EXTREMELY_WIDE_FLAT_CORE,0,0,0,0,0
 I 2012-12-05 15:03:38.319
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._backend_actions,0,0,6,0,0
 I 2012-12-05 15:03:38.319
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._filter_group,0,0,0,0,0
 I 2012-12-05 15:03:38.319
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._form_data_model,0,0,0,0,0
 I 2012-12-05 15:03:38.319
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._form_info,4,52,0,0,0
 I 2012-12-05 15:03:38.320
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._form_info_fileset,0,0,0,0,0
 I 2012-12-05 15:03:38.320
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._form_info_manifest_bin,0,0,0,0,0
 I 2012-12-05 15:03:38.320
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._form_info_submission_association,0,0,0,0,0
 I 2012-12-05 15:03:38.320
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._form_info_xform_bin,22,22,0,0,0
 I 2012-12-05 15:03:38.320
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._form_info_xform_blb,0,0,1,0,0
 I 2012-12-05 15:03:38.321
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._form_info_xform_ref,1,1,0,0,0
 I 2012-12-05 15:03:38.321
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._form_service_cursor,0,0,0,0,0
 I 2012-12-05 15:03:38.321
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._fusion_table,0,0,0,0,0
 I 2012-12-05 15:03:38.321
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._fusion_table_repeat,0,0,0,0,0
 I 2012-12-05 15:03:38.322
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._granted_authority_hierarchy,1,13,0,0,0
 I 2012-12-05 15:03:38.322
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._misc_tasks,0,0,0,0,0
 I 2012-12-05 15:03:38.322
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._persistent_result_file_bin,0,0,0,0,0
 I 2012-12-05 15:03:38.322
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._persistent_results,0,0,0,0,0
 I 2012-12-05 15:03:38.322
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._registered_users,1,1,2,0,0
 I 2012-12-05 15:03:38.323
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._security_revisions,0,0,7,0,0
 I 2012-12-05 15:03:38.323
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._server_preferences,0,0,0,0,0
 I 2012-12-05 15:03:38.323
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: opendatakit._user_granted_authority,2,2,0,0,0
 I 2012-12-05 15:03:38.323
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics
logUsage: -----------------------------------------
The comma-separated numbers are reset to zero after this report is logged,
and the 5 values correspond to:


   1. queries issued,
   2. rows of query results returned,
   3. count of gets (query by-primary-key),
   4. count of puts (insert/update),
   5. counts of deletes


These are reported against each table in the datastore (the string after
logUsage: and before the string of numbers).
They are reported for all datastores, so you should not see any significant
variation in these numbers when migrating from AppEngine to MySQL or
PostgreSQL.

-----------
To get the AppEngine-equivalent functionality to a longest-query report on
MySQL or PostgreSQL, there is a cost logging feature within ODK Aggregate
(and only within the AppEngine datastore).
If any query takes longer than the value of the
COST_LOGGING_MEGACYCLE_THRESHOLD in the _COST_LOGGING_ table, then ODK
Aggregate will emit a log message with the query and the time the query
took. These are logged as Warnings to the log, and are the most common
warnings you'll see in the log.

An example of this is:
  W 2012-12-05 15:03:39.029
org.opendatakit.common.persistence.engine.gae.ExecutionTimeLogger
intermediateLogging: 000268 **intermediate** Form.getForms[_form_info]
SELECT * FROM opendatakit._form_info WHERE _URI >=3D
md5:04f2a292201cba8fac41e968e19636e9 ORDER BY _URI
 W 2012-12-05 15:03:39.029
org.opendatakit.common.persistence.engine.gae.ExecutionTimeLogger
wrapUp: 000316 **final** Form.getForms[_form_info]
The **intermediate** entry reports that the given query took 000268 units
to complete, and the **final** reports that the overall time was 000316
units.

By default, the unit threshold for logging long queries is 120 units. You
can use the Datastore Viewer via the appengine dashboard to change that
default up or down, as you see fit.

-----------------
Mitch


On Tue, Feb 19, 2013 at 9:30 PM, Waylon Brunette <wrb@cs.washington.edu>wro=
te:

>  Amarjeet,****
>
> ** **
>
> There is also a timeout if the request takes longer than 60 seconds which
> maybe showing up incorrectly as a quota issue. I assume you looked at you=
r
> appengine control panel when you got it. ****
>
> ** **
>
> Mitch has a bunch of cost logging built in and I don=92t know what is
> enabled or disabled in the current 1.3 release so I am going to let Mitch
> answer the question since he will know what was included in the 1.3 relea=
se.
> ****
>
> ** **
>
> Waylon****
>
> ** **
>
> *From:* Amarjeet Singh [mailto:amarjeet@iiitd.ac.in]
> *Sent:* Tuesday, February 19, 2013 8:21 PM
> *To:* Waylon Brunette
> *Subject:* Appspot quota****
>
> ** **
>
> Dear Waylon****
>
> ** **
>
> In one of our uses of ODK, we got quota error from appspot. We are sure
> that we did not exceed any of the normal quotas.****
>
> ** **
>
> I was wondering if you ever received any such error from your other field
> studies and what could be the suspected reason.****
>
> ** **
>
> We are also thinking of catching the quota exception in ODK and also do
> profiling of the application (I was wondering if you are already doing th=
e
> same)****
>
> ** **
>
> This is all with respect to the released version of ODK.****
>
> ** **
>
> Thank you****
>
> ** **
>
> --
> Best Regards****
>
> ** **
>
> Amarjeet Singh, ****
>
> Asst. Professor, IIIT Delhi****
>
> http://www.iiitd.edu.in/~amarjeet/ ****
>
> ****
>
> ** **
>



--=20
Mitch Sundt
Software Engineer
http://www.OpenDataKit.org <http://www.opendatakit.org/>
University of Washington
mitchellsundt@gmail.com

--089e01493c9897e72904d63fd17d
Content-Type: text/html; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

<div>Adding opendatakit-developers@ to this, as the answers could be of int=
erest to other developers...</div><div><br></div>The error page: &#39;you m=
ay have exceeded quota&#39; is displayed whenever the server has an uncaugh=
t exception or otherwise returns a 500 error (internal server error). It co=
uld have several causes, but the most likely is an over-quota error. There =
is no way to be more specific with this error, as it is using the static er=
ror page to present the error and, by the time you have an over-quota excep=
tion, you can&#39;t reliably execute any java code to diagnose the true cau=
se.<div>
<br></div><div>If the appengine dashboard indicates that the app is not ove=
r-quota (no red bars), and that the app is not currently subject to resourc=
e throttling (this will be a small triangular ! warning just below the Bill=
ing Status heading), then you should examine the Logs for the application (=
from the appengine Dashboard, click on the &#39;Logs&#39; link on the left =
sidebar). Look first for log entries with Critical or Error entries. If non=
e show up, then look at warnings. =A0The key warning to look for is the &#3=
9;your request took longer than 60 seconds, so AppEngine terminated it&#39;=
 warning. =A0Otherwise, any error within the ODK Aggregate code should gene=
rate a Critical or Error entry.</div>
<div><br></div><div>--------------</div><div>As for profiling, I&#39;d be i=
nterested in hearing your findings. Profiling AppEngine is problematic beca=
use you cannot glean any useful information from running in the development=
 environment. =A0ODK Aggregate is a bit odd in that it does not use Memcach=
e because configuring that on Tomcat would have greatly complicated the ins=
taller. Instead, frequently-accessed values are cached within the Java laye=
r and refreshed about every 3-6 seconds.</div>
<div><br></div><div>AppEngine performance is definitely limited to what we =
can do with BigTables datastores without using any indices beyond the defau=
lt one-dimensional indices (this is because our data tables are dynamically=
 constructed, and there is no Java API to dynamically construct composite i=
ndices). There is very little optimization you can do for AppEngine datasto=
re interactions because of this -- the limitations are effectively written =
into the code.</div>
<div><br></div><div>MySQL and PostgreSQL are much more flexible, and databa=
se profiling can be used to create extra indices as needed to tailor and tu=
ne for performance, and those extra indices will be used by the database to=
 improve query performance.</div>
<div><br></div><div>Additionally, MySQL and PostgreSQL databases both provi=
de mechanisms for tracking how many of each type of query or datastore acti=
on (insert/delete) are performed (e.g., CRUD reports, query timing reports)=
. =A0So no instrumentation is provided on that activity within ODK Aggregat=
e.=A0</div>
<div><br></div><div>AppEngine, however, provides no visibility into things =
like CRUD reports or query timing reports.=A0</div><div><br></div><div>To r=
emedy that, code within ODK Aggregate periodically emits the equivalent of =
a CRUD report (independent of the datastore used). =A0This is a set of log =
entries of the form:</div>
<div><li class=3D"ae-logs-applog"><h5>2012-12-05 15:03:38.317=A0<span style=
=3D"font-size:small;font-weight:normal">org.opendatakit.common.persistence.=
engine.DatastoreAccessMetrics logUsage: ---------- 2012-12-05T23:03:38.317+=
0000 @ 3892 ------------</span></h5>

</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.318 <span style=3D"font-size:small;font-weight:normal">=
org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage: =
TASK_LOCK,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.318=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: _COST_LOGGING_,1,1,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.318=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: _STRING_FIELD_LENGTHS_,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.318=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit.EXTREMELY_WIDE_FLAT_CORE,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.319=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._backend_actions,0,0,6,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.319=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._filter_group,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.319=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._form_data_model,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.319 o<span style=3D"font-size:small;font-weight:normal"=
>rg.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage: =
opendatakit._form_info,4,52,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.320=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._form_info_fileset,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.320=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._form_info_manifest_bin,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.320=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._form_info_submission_association,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.320=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._form_info_xform_bin,22,22,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.320=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._form_info_xform_blb,0,0,1,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.321=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._form_info_xform_ref,1,1,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.321=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._form_service_cursor,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.321=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._fusion_table,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.321=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._fusion_table_repeat,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.322=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._granted_authority_hierarchy,1,13,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.322=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._misc_tasks,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.322=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._persistent_result_file_bin,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.322=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._persistent_results,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.322=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._registered_users,1,1,2,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.323=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._security_revisions,0,0,7,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.323=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._server_preferences,0,0,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.323=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: opendatakit._user_granted_authority,2,2,0,0,0</span></h5>
</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-1" title=3D"Logging Severi=
ty: Info">
I
</span>
2012-12-05 15:03:38.323=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.DatastoreAccessMetrics logUsage=
: -----------------------------------------</span></h5></li></div><div>The =
comma-separated numbers are reset to zero after this report is logged, and =
the 5 values correspond to:</div>
<div><br></div><div><ol><li>queries issued,=A0</li><li>rows of query result=
s returned,=A0</li><li>count of gets (query by-primary-key),=A0</li><li>cou=
nt of puts (insert/update),=A0</li><li>counts of deletes</li></ol></div><di=
v><br>
</div><div>These are reported against each table in the datastore (the stri=
ng after logUsage: and before the string of numbers).</div><div>They are re=
ported for all datastores, so you should not see any significant variation =
in these numbers when migrating from AppEngine to MySQL or PostgreSQL.</div=
>
<div><br></div><div>-----------</div><div>To get the AppEngine-equivalent f=
unctionality to a longest-query report on MySQL or PostgreSQL, there is a c=
ost logging feature within ODK Aggregate (and only within the AppEngine dat=
astore).</div>
<div>If any query takes longer than the value of the COST_LOGGING_MEGACYCLE=
_THRESHOLD in the _COST_LOGGING_ table, then ODK Aggregate will emit a log =
message with the query and the time the query took. These are logged as War=
nings to the log, and are the most common warnings you&#39;ll see in the lo=
g.</div>
<div><br></div><div>An example of this is:</div><div>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-2" title=3D"Logging Severi=
ty: Warning">
W
</span>
2012-12-05 15:03:39.029=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.gae.ExecutionTimeLogger interme=
diateLogging: 000268 **intermediate** Form.getForms[_form_info] SELECT * FR=
OM opendatakit._form_info WHERE _URI &gt;=3D md5:04f2a292201cba8fac41e968e1=
9636e9 ORDER BY _URI</span></h5>

</li>
<li class=3D"ae-logs-applog">
<h5>
<span class=3D"ae-logs-severity ae-logs-severity-2" title=3D"Logging Severi=
ty: Warning">
W
</span>
2012-12-05 15:03:39.029=A0<span style=3D"font-size:small;font-weight:normal=
">org.opendatakit.common.persistence.engine.gae.ExecutionTimeLogger wrapUp:=
 000316 **final** Form.getForms[_form_info]</span></h5></li></div><div>The =
**intermediate** entry reports that the given query took 000268 units to co=
mplete, and the **final** reports that the overall time was 000316 units.</=
div>
<div><br></div><div>By default, the unit threshold for logging long queries=
 is 120 units. You can use the Datastore Viewer via the appengine dashboard=
 to change that default up or down, as you see fit.</div><div><br></div>
<div>-----------------</div><div>Mitch</div><div><br></div><div><br></div><=
div><div><div class=3D"gmail_quote">On Tue, Feb 19, 2013 at 9:30 PM, Waylon=
 Brunette <span dir=3D"ltr">&lt;<a href=3D"mailto:wrb@cs.washington.edu" ta=
rget=3D"_blank">wrb@cs.washington.edu</a>&gt;</span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">





<div lang=3D"EN-US" link=3D"blue" vlink=3D"purple">
<div>
<p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-family:&quot;Ca=
libri&quot;,&quot;sans-serif&quot;;color:#1f497d">Amarjeet,<u></u><u></u></=
span></p>
<p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-family:&quot;Ca=
libri&quot;,&quot;sans-serif&quot;;color:#1f497d"><u></u>=A0<u></u></span><=
/p>
<p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-family:&quot;Ca=
libri&quot;,&quot;sans-serif&quot;;color:#1f497d">There is also a timeout i=
f the request takes longer than 60 seconds which maybe showing up incorrect=
ly as a quota issue. I assume you looked at your appengine
 control panel when you got it. <u></u><u></u></span></p>
<p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-family:&quot;Ca=
libri&quot;,&quot;sans-serif&quot;;color:#1f497d"><u></u>=A0<u></u></span><=
/p>
<p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-family:&quot;Ca=
libri&quot;,&quot;sans-serif&quot;;color:#1f497d">Mitch has a bunch of cost=
 logging built in and I don=92t know what is enabled or disabled in the cur=
rent 1.3 release so I am going to let Mitch answer the question
 since he will know what was included in the 1.3 release.<u></u><u></u></sp=
an></p>
<p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-family:&quot;Ca=
libri&quot;,&quot;sans-serif&quot;;color:#1f497d"><u></u>=A0<u></u></span><=
/p>
<p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-family:&quot;Ca=
libri&quot;,&quot;sans-serif&quot;;color:#1f497d">Waylon<u></u><u></u></spa=
n></p>
<p class=3D"MsoNormal"><span style=3D"font-size:11.0pt;font-family:&quot;Ca=
libri&quot;,&quot;sans-serif&quot;;color:#1f497d"><u></u>=A0<u></u></span><=
/p>
<p class=3D"MsoNormal"><b><span style=3D"font-size:10.0pt;font-family:&quot=
;Tahoma&quot;,&quot;sans-serif&quot;">From:</span></b><span style=3D"font-s=
ize:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"> Amarjeet=
 Singh [mailto:<a href=3D"mailto:amarjeet@iiitd.ac.in" target=3D"_blank">am=
arjeet@iiitd.ac.in</a>]
<br>
<b>Sent:</b> Tuesday, February 19, 2013 8:21 PM<br>
<b>To:</b> Waylon Brunette<br>
<b>Subject:</b> Appspot quota<u></u><u></u></span></p>
<p class=3D"MsoNormal"><u></u>=A0<u></u></p>
<p class=3D"MsoNormal">Dear Waylon<u></u><u></u></p>
<div>
<p class=3D"MsoNormal"><u></u>=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">In one of our uses of ODK, we got quota error from a=
ppspot. We are sure that we did not exceed any of the normal quotas.<u></u>=
<u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">I was wondering if you ever received any such error =
from your other field studies and what could be the suspected reason.<u></u=
><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">We are also thinking of catching the quota exception=
 in ODK and also do profiling of the application (I was wondering if you ar=
e already doing the same)<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">This is all with respect to the released version of =
ODK.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><u></u>=A0<u></u></p>
</div>
<div>
<p class=3D"MsoNormal">Thank you<span class=3D"HOEnZb"><font color=3D"#8888=
88"><u></u><u></u></font></span></p><span class=3D"HOEnZb"><font color=3D"#=
888888">
<div>
<div>
<p class=3D"MsoNormal"><u></u>=A0<u></u></p>
</div>
<p class=3D"MsoNormal">-- <br>
Best Regards<u></u><u></u></p>
<div>
<p class=3D"MsoNormal"><u></u>=A0<u></u></p>
<div>
<p class=3D"MsoNormal">Amarjeet Singh,=A0<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">Asst. Professor, IIIT Delhi<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal"><a href=3D"http://www.iiitd.edu.in/~amarjeet/" targe=
t=3D"_blank">http://www.iiitd.edu.in/~amarjeet/</a>=A0<u></u><u></u></p>
<div>
<p class=3D"MsoNormal"><img border=3D"0" width=3D"200" height=3D"50" src=3D=
"http://www.iiitd.ac.in/sites/default/files/images/graphicidentity/logo/sty=
le3colorsmall.jpg"><u></u><u></u></p>
<div>
<p class=3D"MsoNormal"><u></u>=A0<u></u></p>
</div>
</div>
</div>
</div>
</div>
</font></span></div>
</div>
</div>

</blockquote></div><br><br clear=3D"all"><div><br></div>-- <br>Mitch Sundt<=
br>
Software Engineer<br>
<a href=3D"http://www.opendatakit.org/" target=3D"_blank">http://www.OpenDa=
taKit.org</a><br>
University of Washington<br>
<a href=3D"mailto:mitchellsundt@gmail.com" target=3D"_blank">mitchellsundt@=
gmail.com</a><br>
<br>

</div></div>

--089e01493c9897e72904d63fd17d--

(Jay Pfaffman) #2

First, I’d delete that message and make sure that it’s the right one (and you’re not the victim of some off by one error).

My guess is that it has to do with all of those stars that could be screwing with markdown.


(Yaw Anokwa) #3

I can confirm that it’s that message that is the problem. Removing it lets the import go through.


(Jay Pfaffman) #4

Maybe delete the HTML section and let the importer use the text version.

The other thing is to just do a binary search. Delete half the message, see if the the error is in the remaining half, split that In half and so on until you find it.