I am trying to insert 2 million nodes into Neo4j, and I am having trouble with performance.
I am using Neo4j Enterprise 2.2.0 with a server extension written in Java. My computer has an SSD, 32 GB RAM and an Intel Core i7 CPU, and it is running Windows 8. I run the standalone version of the server and start it by running Neo4j.bat in the bin folder.
It takes 25 seconds to insert 10,000 nodes with no relationships right now (I need to add relationships later, but one problem at a time).
I think it is a matter of configuration, and I have played around with the settings a bit, but with no change in performance. I find it weird that if I set the initmemory and maxmemory settings to 15000 in neo4j-wrapper.conf, the Java process still allocates 3 GB at maximum.
I have attached my code and configurations below. Does anybody have a clue what I am doing wrong? What performance should I expect when inserting a large graph?
Code for inserting:
    for (Thing t : things) {
        List<ValuePair> properties = parseThing(t);
        String uid = createUid(t);
        try (Transaction tx = graphDb.beginTx()) {
            Node node = graphDb.createNode();
            node.setProperty("uid", uid);
            for (ValuePair vp : properties) {
                node.setProperty(vp.getName(), vp.getValue());
            }
            tx.success();
        }
    }

(At first I was adding a DynamicLabel when creating the nodes, but that was slower. Is it possible to use labels if I want good performance when inserting nodes?)
Configurations

neo4j.properties

    ################################################################
    # Neo4j
    #
    # neo4j.properties - database tuning parameters
    #
    ################################################################

    # Enable this to be able to upgrade a store from an older version.
    #allow_store_upgrade=true

    # The amount of memory to use for mapping the store files, in bytes (or
    # kilobytes with the 'k' suffix, megabytes with 'm' and gigabytes with 'g').
    # If Neo4j is running on a dedicated server, then it is recommended
    # to leave 2-4 gigabytes for the operating system, give the JVM enough
    # heap to hold transaction state and query context, and leave the
    # rest for the page cache.
    # The default page cache memory assumes the machine is dedicated to running
    # Neo4j, and is heuristically set to 75% of RAM minus the max Java heap size.
    dbms.pagecache.memory=4g

    # Enable this to specify a parser other than the default one.
    #cypher_parser_version=2.0

    # Keep logical logs, helps debugging but uses more disk space, enabled for
    # legacy reasons. To limit the space needed to store historical logs use values such
    # as: "7 days" or "100M size" instead of "true".
    #keep_logical_logs=7 days

    # Autoindexing

    # Enable auto-indexing for nodes, default is false.
    #node_auto_indexing=true

    # The node property keys to be auto-indexed, if enabled.
    #node_keys_indexable=name,age

    # Enable auto-indexing for relationships, default is false.
    #relationship_auto_indexing=true

    # The relationship property keys to be auto-indexed, if enabled.
    #relationship_keys_indexable=name,age

    # Enable shell server so that remote clients can connect via Neo4j shell.
    #remote_shell_enabled=true
    # The network interface IP the shell will listen on (use 0.0.0 for all interfaces).
    #remote_shell_host=127.0.0.1
    # The port the shell will listen on, default is 1337.
    #remote_shell_port=1337

    # The type of cache to use for nodes and relationships.
    cache_type=hpc
    cache.memory_ratio=70

    # Maximum size of the heap memory to dedicate to the cached nodes.
    node_cache_size=2g
    #relationship_cache_size=6g
    # Maximum size of the heap memory to dedicate to the cached relationships.
    #relationship_cache_size=

    # Enable online backups to be taken from this database.
    online_backup_enabled=true

    # Port to listen to for incoming backup requests.
    online_backup_server=127.0.0.1:6362

    # Uncomment and specify these lines for running Neo4j in High Availability mode.
    # See the High Availability setup tutorial for more details on these settings:
    # http://neo4j.com/docs/2.2.0/ha-setup-tutorial.html

    # ha.server_id is the number of each instance in the HA cluster. It should be
    # an integer (e.g. 1), and should be unique for each cluster instance.
    #ha.server_id=

    # ha.initial_hosts is a comma-separated list (without spaces) of the host:port
    # where the ha.cluster_server of all instances will be listening. Typically
    # this will be the same for all cluster instances.
    #ha.initial_hosts=192.168.0.1:5001,192.168.0.2:5001,192.168.0.3:5001

    # IP and port for this instance to listen on, for communicating cluster status
    # information with other instances (also see ha.initial_hosts). The IP
    # must be the configured IP address for one of the local interfaces.
    #ha.cluster_server=192.168.0.1:5001

    # IP and port for this instance to listen on, for communicating transaction
    # data with other instances (also see ha.initial_hosts). The IP
    # must be the configured IP address for one of the local interfaces.
    #ha.server=192.168.0.1:6001

    # The interval at which slaves will pull updates from the master. Comment out
    # the option to disable periodic pulling of updates. Unit is seconds.
    ha.pull_interval=10

    # Amount of slaves the master will try to push a transaction to upon commit
    # (default is 1). The master will optimistically continue and not fail the
    # transaction if it fails to reach the push factor. Setting this to 0 will
    # increase write performance when writing through the master but could potentially
    # lead to branched data (or loss of transaction) if the master goes down.
    #ha.tx_push_factor=1

    # Strategy the master will use when pushing data to slaves (if the push factor
    # is greater than 0). There are two options available: "fixed" (default) or
    # "round_robin". Fixed will start by pushing to slaves ordered by server id
    # (highest first), improving performance since the slaves only have to cache
    # one transaction at a time.
    #ha.tx_push_strategy=fixed

    # Policy for how to handle branched data.
    #branched_data_policy=keep_all

    # Clustering timeouts
    # Default timeout.
    #ha.default_timeout=5s

    # How often heartbeat messages should be sent. Defaults to ha.default_timeout.
    #ha.heartbeat_interval=5s

    # Timeout for heartbeats between cluster members. Should be at least twice that of ha.heartbeat_interval.
    #heartbeat_timeout=11s

neo4j-server.properties

    ################################################################
    # Neo4j
    #
    # neo4j-server.properties - runtime operational settings
    #
    ################################################################

    #***************************************************************
    # Server configuration
    #***************************************************************

    # Location of the database directory
    org.neo4j.server.database.location=data/graph.db

    # Low-level graph engine tuning file
    org.neo4j.server.db.tuning.properties=conf/neo4j.properties

    # Database mode
    # Allowed values:
    # HA - High Availability
    # SINGLE - Single mode, default.
    # To run in High Availability mode, configure the neo4j.properties config file, then uncomment this line:
    #org.neo4j.server.database.mode=HA

    # Let the webserver only listen on the specified IP. Default is localhost (only
    # accept local connections). Uncomment to allow any connection. Please see the
    # security section in the Neo4j manual before modifying this.
    #org.neo4j.server.webserver.address=0.0.0.0

    # Require (or disable the requirement of) auth to access Neo4j
    dbms.security.auth_enabled=true

    #
    # HTTP Connector
    #

    # HTTP port (for all data, administrative, and UI access)
    org.neo4j.server.webserver.port=7474

    #
    # HTTPS Connector
    #

    # Turn HTTPS support on/off
    org.neo4j.server.webserver.https.enabled=true

    # HTTPS port (for all data, administrative, and UI access)
    org.neo4j.server.webserver.https.port=7473

    # Certificate location (auto generated if the file does not exist)
    org.neo4j.server.webserver.https.cert.location=conf/ssl/snakeoil.cert

    # Private key location (auto generated if the file does not exist)
    org.neo4j.server.webserver.https.key.location=conf/ssl/snakeoil.key

    # Internally generated keystore (don't try to put your own
    # keystore there, it will get deleted when the server starts)
    org.neo4j.server.webserver.https.keystore.location=data/keystore

    # Comma separated list of JAX-RS packages containing JAX-RS resources, one
    # package name for each mountpoint. The listed package names will be loaded
    # under the mountpoints specified. Uncomment this line to mount the
    # org.neo4j.examples.server.unmanaged.HelloWorldResource.java from
    # neo4j-server-examples under /examples/unmanaged, resulting in a final URL of
    # http://localhost:7474/examples/unmanaged/helloworld/{nodeId}
    #org.neo4j.server.thirdparty_jaxrs_classes=org.neo4j.examples.server.unmanaged=/examples/unmanaged
    org.neo4j.server.thirdparty_jaxrs_classes=my.project.package=/mypath

    #*****************************************************************
    # HTTP logging configuration
    #*****************************************************************

    # HTTP logging is disabled. HTTP logging can be enabled by setting this
    # property to 'true'.
    org.neo4j.server.http.log.enabled=false

    # Logging policy file that governs how HTTP log output is presented and
    # archived. Note: changing the rollover and retention policy is sensible, but
    # changing the output format is less so, since it is configured to use the
    # ubiquitous common log format
    org.neo4j.server.http.log.config=conf/neo4j-http-logging.xml

    #*****************************************************************
    # Administration client configuration
    #*****************************************************************

    # Location of the servers round-robin database directory. Possible values:
    # - absolute path like /var/rrd
    # - path relative to the server working directory like data/rrd
    # - commented out, will default to the database data directory.
    org.neo4j.server.webadmin.rrdb.location=data/rrd

neo4j-wrapper.conf

    #********************************************************************
    # Property file references
    #********************************************************************

    wrapper.java.additional=-Dorg.neo4j.server.properties=conf/neo4j-server.properties
    wrapper.java.additional=-Djava.util.logging.config.file=conf/logging.properties
    wrapper.java.additional=-Dlog4j.configuration=file:conf/log4j.properties

    #********************************************************************
    # JVM Parameters
    #********************************************************************

    wrapper.java.additional.1=-XX:+UseConcMarkSweepGC
    wrapper.java.additional.2=-XX:+CMSClassUnloadingEnabled
    wrapper.java.additional.3=-XX:-OmitStackTraceInFastThrow
    wrapper.java.additional.4=-XX:hashCode=5

    # Remote JMX monitoring, uncomment and adjust the following lines as needed.
    # Also make sure to update the jmx.access and jmx.password files with appropriate permission roles and passwords,
    # the shipped configuration contains only a read only role called 'monitor' with password 'neo4j'.
    # For more details, see: http://download.oracle.com/javase/7/docs/technotes/guides/management/agent.html
    # On Unix based systems the jmx.password file needs to be owned by the user that will run the server,
    # and have permissions set to 0600.
    # For details on setting these file permissions on Windows see:
    # http://docs.oracle.com/javase/7/docs/technotes/guides/management/security-windows.html
    #wrapper.java.additional=-Dcom.sun.management.jmxremote.port=3637
    #wrapper.java.additional=-Dcom.sun.management.jmxremote.authenticate=true
    #wrapper.java.additional=-Dcom.sun.management.jmxremote.ssl=false
    #wrapper.java.additional=-Dcom.sun.management.jmxremote.password.file=conf/jmx.password
    #wrapper.java.additional=-Dcom.sun.management.jmxremote.access.file=conf/jmx.access

    # Some systems cannot discover the host name automatically, and need this line configured:
    #wrapper.java.additional=-Djava.rmi.server.hostname=$THE_NEO4J_SERVER_HOSTNAME

    # Uncomment the following lines to enable garbage collection logging
    #wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
    #wrapper.java.additional=-XX:+PrintGCDetails
    #wrapper.java.additional=-XX:+PrintGCDateStamps
    #wrapper.java.additional=-XX:+PrintGCApplicationStoppedTime
    #wrapper.java.additional=-XX:+PrintPromotionFailure
    #wrapper.java.additional=-XX:+PrintTenuringDistribution

    # Java Heap Size: by default the Java heap size is dynamically
    # calculated based on available system resources.
    # Uncomment these lines to set a specific initial and maximum
    # heap size in MB.
    wrapper.java.initmemory=15000
    wrapper.java.maxmemory=15000

    #********************************************************************
    # Wrapper settings
    #********************************************************************
    # path is relative to the bin dir
    wrapper.pidfile=../data/neo4j-server.pid

    #********************************************************************
    # Wrapper Windows NT/2000/XP Service Properties
    #********************************************************************
    # WARNING - Do not modify any of these properties when an application
    # using this configuration file has been installed as a service.
    # Please uninstall the service before modifying this section. The
    # service can then be reinstalled.

    # Name of the service
    wrapper.name=neo4j

    # User account to be used for Linux installs. Will default to the current
    # user if not set.
    wrapper.user=

    #********************************************************************
    # Other Neo4j system properties
    #********************************************************************
    wrapper.java.additional=-Dneo4j.ext.udc.source=zip
    wrapper.java.additional=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Xdebug-Xnoagent-Djava.compiler=NONE-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005

You would make me very happy if you could help me solve this!
You need to create more than one node per transaction; otherwise the transaction overhead consumes most of the time.
Please try it this way:
    // One transaction around the whole loop instead of one per node
    try (Transaction tx = graphDb.beginTx()) {
        for (Thing t : things) {
            List<ValuePair> properties = parseThing(t);
            String uid = createUid(t);
            Node node = graphDb.createNode();
            node.setProperty("uid", uid);
            for (ValuePair vp : properties) {
                node.setProperty(vp.getName(), vp.getValue());
            }
        }
        tx.success();
    }
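
With 2 million nodes, a single transaction around the whole loop may become a problem of its own, because the uncommitted changes are held in memory until the commit. A common middle ground is to commit in batches. The sketch below shows only the batching logic in plain Java, without the Neo4j dependency. `BatchInsertSketch`, `forEachBatch` and the batch size of 10 are made-up names and values for illustration, and the commented-out part marks where the transaction from the snippet above would go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchInsertSketch {

    /**
     * Splits items into consecutive batches of at most batchSize elements
     * and hands each batch to the handler. Returns the number of batches.
     * (Hypothetical helper, not part of the Neo4j API.)
     */
    public static <T> int forEachBatch(List<T> items, int batchSize,
                                       Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            handler.accept(items.subList(start, end));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for the list of things to insert
        List<Integer> things = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            things.add(i);
        }

        int batches = forEachBatch(things, 10, batch -> {
            // In the server extension, one transaction per batch would go here:
            // try (Transaction tx = graphDb.beginTx()) {
            //     for (Thing t : batch) { /* createNode + setProperty */ }
            //     tx.success();
            // }
            System.out.println("would commit " + batch.size() + " nodes");
        });
        System.out.println("batches: " + batches);
    }
}
```

In a real import the batch size would be much larger than 10; something in the tens of thousands is a plausible starting point, but that is an assumption to measure on your own data rather than a documented recommendation.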