I am experimenting with Apache Tika: both the app and the server, through the GUI and the command line.
With the Tika app, I can run

    java -jar tika-app-1.7.jar --gui

and choose 'View' -> 'Main content', or run

    java -jar tika-app-1.7.jar --text-main http://www.cnn.com/2015/07/09/politics/russian-bombers-u-s-intercept-july-4/index.html

I only need the main content, but it seems that in server mode I can only get plain text. I was checking this guide.

    curl -s "http://amzn.com/b005iwm8pu" | curl -X PUT -T - http://<server_ip>:9998/meta
    curl -s "http://amzn.com/b005iwm8pu" | curl -X PUT -T - http://<server_ip>:9998/tika

Maybe whatever comes after http://<server_ip>:9998/ is the trick? Is there a way to get the main content in server mode?
In the end, the request has to be made in Ruby, against tika-server-1.3.jar. So far it looks like this:

    require "net/http"

    tika_prefix = URI('http://<server_ip>:9998/tika')
    url = 'http://www.cnn.com/2015/07/09/politics/russian-bombers-u-s-intercept-july-4/index.html'

    # Tika parses the request body, so send the page's HTML rather than the URL string.
    request = Net::HTTP::Put.new(tika_prefix)
    request.body = Net::HTTP.get(URI(url))
    request.content_type = 'text/html'

    # Open a connection to the Tika server and return the extracted text.
    http = Net::HTTP.start(tika_prefix.hostname, tika_prefix.port)
    http.request(request).body

This is possible as of today. Tika 1.15 implements the TIKA-2343 feature request, which adds an equivalent of --text-main to server mode.
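
For the Ruby side of the question, here is a minimal sketch of what such a request might look like. It assumes the new resource is exposed at /tika/main and returns text/plain; check that path against the documentation for your server version. The helper names build_main_content_request and main_content, and the localhost address in the usage note, are made up for illustration.

```ruby
require "net/http"
require "uri"

# Build (but do not send) a PUT request carrying the page HTML.
# ASSUMPTION: the main-content resource lives at /tika/main (Tika >= 1.15).
def build_main_content_request(server_base, html)
  endpoint = URI.join(server_base, "/tika/main")
  request = Net::HTTP::Put.new(endpoint)
  request.body = html
  request.content_type = "text/html"
  request["Accept"] = "text/plain"
  request
end

# Fetch the page, send it to the Tika server, and return the response body.
# Net::HTTP.get does not follow redirects, so pass the final page URL.
def main_content(server_base, page_url)
  html = Net::HTTP.get(URI(page_url))
  request = build_main_content_request(server_base, html)
  Net::HTTP.start(request.uri.hostname, request.uri.port) do |http|
    http.request(request).body
  end
end
```

With a server running locally, a call such as main_content("http://localhost:9998", url) should return the extracted text as a string. Building the request separately from sending it makes it easy to inspect the path and headers without a running server.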
vaites/php-apache-tika is the PHP binding for Tika that I use, and I've opened an issue about this, so you should see it implemented soon.
Edit: the PHP binding library now supports this feature.