Replication: is it possible to prevent fetching of a remote design document in CouchDB?
Update
As @akshatjiwansharma suggested, I have tried a few things while replicating locally. It was instructive! I have renamed the question, since the problem is not that the design document gets replicated (in fact it isn't replicated), but that it is fetched via HTTP as part of the initial replication "negotiation" phase.
I've moved the original question to the bottom to make the new question clearer. The new question is:
It seems inefficient (particularly in the case of CouchApps) to fetch the entire design document, i.e. the entire remote app, when initiating replication from a remote source. Can this be avoided?
It is particularly problematic in our case, on high-latency links (less than 7.2kbps), with relatively large design documents (3MB).
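To put those two figures side by side, here is a rough back-of-the-envelope sketch. It assumes "3MB" means 3,000,000 bytes and "7.2kbps" means 7,200 bits per second, and it ignores HTTP overhead, retransmissions and disconnections, so the real figure would be worse. The function name `transferSeconds` is made up for illustration.

```javascript
// Rough estimate only: the interpretation of the units is an assumption.
function transferSeconds(sizeBytes, linkBitsPerSecond) {
  return (sizeBytes * 8) / linkBitsPerSecond;
}

const designDocBytes = 3 * 1000 * 1000; // "3MB", assumed to be decimal megabytes
const linkBitsPerSecond = 7200;         // "7.2kbps", assumed to be kilobits per second

const seconds = transferSeconds(designDocBytes, linkBitsPerSecond);
console.log(`${Math.round(seconds)} s (about ${Math.round(seconds / 60)} min)`);
```

Under these assumptions a single fetch of the design document takes the better part of an hour, which is why even a one-off fetch matters on this kind of link.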
Remote target
I first tried using a "remote" target, by setting the replication target to http://127.0.0.1:5984/emr_replica.
[fri, 08 aug 2014 08:36:20 gmt] [info] [<0.18947.7>] document `88fa1b1a1315d27ded663466c6003578` triggered replication `e8e66a554d198b88b6263a572a072fd3+continuous`
[fri, 08 aug 2014 08:36:20 gmt] [info] [<0.18946.7>] starting new replication `e8e66a554d198b88b6263a572a072fd3+continuous` @ <0.18947.7> (`emr_demo` -> `http://127.0.0.1:5984/emr_replica/`)
[fri, 08 aug 2014 08:36:20 gmt] [info] [<0.18928.7>] 127.0.0.1 - - post /emr_replica/_revs_diff 200
[fri, 08 aug 2014 08:36:20 gmt] [info] [<0.18915.7>] y.y.y.y - - /_utils/_sidebar.html 200
[fri, 08 aug 2014 08:36:20 gmt] [info] [<0.18916.7>] y.y.y.y - - /_replicator/88fa1b1a1315d27ded663466c6003578?revs_info=true 200
In this case the design document doesn't seem to be fetched.
Remote source
Then I set the source as "remote", like this:
{
  "_id": "88fa1b1a1315d27ded663466c6003a4a",
  "_rev": "3-b6408e98acafe729da0153c35d9df113",
  "source": "http://127.0.0.1:5984/emr_demo",
  "target": "emr_replica",
  "continuous": true,
  "filter": "emr/user_data",
  "owner": "jun"
}
In this case the server fetches the remote design document before starting replication (GET /emr_demo/_design/emr 200).
[fri, 08 aug 2014 08:42:17 gmt] [info] [<0.19687.7>] document `88fa1b1a1315d27ded663466c6003a4a` triggered replication `bd8f6288970bca974dba36dbc6e5353b+continuous`
[fri, 08 aug 2014 08:42:17 gmt] [info] [<0.19686.7>] starting new replication `bd8f6288970bca974dba36dbc6e5353b+continuous` @ <0.19687.7> (`http://127.0.0.1:5984/emr_demo/` -> `emr_replica`)
[fri, 08 aug 2014 08:42:17 gmt] [info] [<0.19648.7>] 127.0.0.1 - - head /emr_demo/ 200
[fri, 08 aug 2014 08:42:17 gmt] [info] [<0.19648.7>] 127.0.0.1 - - /emr_demo/_design/emr 200
[fri, 08 aug 2014 08:42:18 gmt] [info] [<0.19656.7>] 127.0.0.1 - - /emr_demo/5cc2db69a32a84091b96c244273fda0e?revs=true&open_revs=%5b%221-ef8967557f2e99eb137f963daccddb3f%22%5d&latest=true 200
Further testing shows that the design document is fetched only once. Further replications (including after restarting the server) only fetch the changes, using the appropriate filter:
[fri, 08 aug 2014 09:06:36 gmt] [info] [<0.520.0>] document `88fa1b1a1315d27ded663466c6003a4a` triggered replication `bd8f6288970bca974dba36dbc6e5353b+continuous`
[fri, 08 aug 2014 09:06:36 gmt] [info] [<0.519.0>] starting new replication `bd8f6288970bca974dba36dbc6e5353b+continuous` @ <0.520.0> (`http://127.0.0.1:5984/emr_demo/` -> `emr_replica`)
[fri, 08 aug 2014 09:06:36 gmt] [info] [<0.335.0>] 127.0.0.1 - - /emr_demo/_changes?filter=emr%2fuser_data&feed=continuous&style=all_docs&since=1607&heartbeat=1666 200
[fri, 08 aug 2014 09:06:36 gmt] [info] [<0.334.0>] 127.0.0.1 - - /emr_demo/5cc2db69a32a84091b96c24427560310?atts_since=%5b%2218-b613d3160bd09c45ac07a5485c9c7bce%22%5d&revs=true&open_revs=%5b%2219-d50438143337a3a0af5ed8ceb75b42f5%22%5d&latest=true 200
Former question
We're trying to use CouchDB replication on a high-latency link (slow, frequent disconnections, ...). We want to avoid replicating the design document, which is heavy. We have a filter in place, and when using the following curl command, the design document doesn't appear, as expected:
curl http://x.x.x.x:5984/emr/_changes?filter=emr/user_data
Our replication document is:
{
  "_id": "e0e38be8cc0b11356dfb03bc8400074d",
  "_rev": "1-d77117f03d63099e1e505b9f9de3371d",
  "source": "http://x.x.x.x:5984/emr",
  "target": "emr",
  "continuous": true,
  "filter": "emr/user_data",
  "create_target": true,
  "owner": "jun"
}
We have deactivated authentication while we're debugging. When using an existing database and removing create_target, the same problem occurs.
The source server outputs the following:
[mon, 10 mar 2014 21:22:03 gmt] [info] [<0.135.0>] retrying head request http://x.x.x.x:5984/emr/ in 0.25 seconds due error {conn_failed,{error,etimedout}}
[mon, 10 mar 2014 21:23:47 gmt] [info] [<0.135.0>] retrying request http://x.x.x.x:5984/emr/_design/emr in 0.25 seconds due error req_timedout
[mon, 10 mar 2014 21:24:14 gmt] [error] [<0.135.0>] replicator, request "http://x.x.x.x:5984/emr/_design/emr" failed due error {error,req_timedout}
[mon, 10 mar 2014 21:24:14 gmt] [error] [<0.135.0>] replication manager, error processing document `e0e38be8cc0b11356dfb03bc8400074d`: couldn't open document `_design/emr` source database `http://x.x.x.x:5984/emr/`: {'exit',{http_request_failed,"get","http://x.x.x.x:5984/emr/_design/emr", {error,{error,req_timedout}}}}
When using tcpdump, it's clear that the replication fails because the replication manager attempts to download the heavy design document (http://x.x.x.x:5984/emr/_design/emr).
FYI, the replicator's configuration is:
replicator
  connection_timeout            5000
  db                            _replicator
  http_connections              1
  max_replication_retry_count   3
  retries_per_request           1
  socket_options                [{keepalive, true}, {nodelay, true}]
  ssl_certificate_max_depth     3
  verify_ssl_certificates       false
  worker_batch_size             1
  worker_processes              1
Edit: the user_data function (which correctly hides the design document when run through curl above):
exports.user_data = function(doc, req) {
  // Only documents from these three collections pass the filter.
  if (doc.collection == "visits" || doc.collection == "patients" || doc.collection == "reports") {
    return true;
  }
  return false;
}
Hope this can help!
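As a quick sanity check of why the design document never shows up in the filtered changes feed, the filter can be exercised outside CouchDB. This is only a sketch: the sample documents below are invented for illustration, and the function is declared as a plain `user_data` rather than through `exports`.

```javascript
// Same logic as the filter above, declared as a plain function for local testing.
function user_data(doc, req) {
  if (doc.collection == "visits" || doc.collection == "patients" || doc.collection == "reports") {
    return true;
  }
  return false;
}

// Hypothetical sample documents (not taken from the real database).
const samples = [
  { _id: "_design/emr", filters: {} },       // design documents have no "collection" field
  { _id: "p1", collection: "patients" },
  { _id: "s1", collection: "settings" },
];

// Keep only the ids of the documents the filter lets through.
const passed = samples.filter((doc) => user_data(doc, {})).map((doc) => doc._id);
console.log(passed);
```

The design document is rejected simply because it has no `collection` field. That only governs which changes are sent, though; it has no bearing on the replicator's separate request for the design document that holds the filter.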
Suggestion
Try defining the filter function in another small, dedicated design document, and see if that fixes the problem.
// replicator document:
{
  "_id": "e0e38be8cc0b11356dfb03bc8400074d",
  "_rev": "1-d77117f03d63099e1e505b9f9de3371d",
  "source": "http://x.x.x.x:5984/emr",
  "target": "emr",
  "continuous": true,
  "filter": "small-design-doc/user_data",
  "create_target": true,
  "owner": "jun"
}
// _design/small-design-doc
// -- replicated, quite small:
{
  "_id": "_design/small-design-doc",
  "_rev": "1-...",
  "filters": {
    "user_data": "function(doc, req) { ... }"
  }
}
Explanation
According to the current snapshot of the source code, it seems the replicator is trying to fetch the design document (_design/emr) from the source database because the filter function is defined there (emr/user_data).
If you specify a filter function in a design document, the replicator should try to download that document before executing the replication. You cannot quite circumvent downloading any design document, but you are able to select which one.
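For illustration, here is one way the small design document and the matching replication document could be assembled in JavaScript. It is a sketch under assumptions: the name `small-design-doc` comes from the suggestion above, the variable names are made up, and `_rev` is left out because CouchDB assigns it when the document is saved.

```javascript
// The filter, written as an anonymous function so its source can be stored as a string.
const userData = function (doc, req) {
  return doc.collection == "visits" || doc.collection == "patients" || doc.collection == "reports";
};

// Dedicated design document that holds nothing but the filter.
const designDoc = {
  _id: "_design/small-design-doc",
  filters: { user_data: userData.toString() }, // CouchDB stores filter functions as source strings
};

// Replication document pointing at the filter in the small design document.
const replicationDoc = {
  source: "http://x.x.x.x:5984/emr",
  target: "emr",
  continuous: true,
  filter: designDoc._id.replace("_design/", "") + "/user_data",
};

// Size of what the replicator would have to fetch, in characters of JSON.
console.log(replicationDoc.filter, JSON.stringify(designDoc).length);
```

The serialized design document comes to a few hundred characters rather than megabytes, so the one-off fetch at the start of replication should be far cheaper on a slow link.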
Great question, by the way. And thoroughly investigated!