From charlesreid1

[[Category:Programs]]
[[Category:Web]]

Revision as of 07:08, 8 June 2012

Solr is a search engine server: you send it requests over HTTP (with data in XML or JSON), and it returns results in JSON or XML.
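As a minimal sketch of what querying over HTTP looks like in practice, the Python below builds a request URL for Solr's /select handler and decodes the JSON reply. It assumes the Jetty example's base URL, http://localhost:8983/solr, and uses the standard q (query), wt (response format) and rows parameters; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_select_url(base_url, query, response_format="json", rows=10):
    """Build a /select URL; q, wt and rows are standard Solr query parameters."""
    params = {"q": query, "wt": response_format, "rows": rows}
    # urlencode percent-escapes special characters such as ':' in field queries.
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


def run_query(base_url, query, rows=10):
    """Send the query and return the decoded JSON response (needs a running Solr)."""
    with urlopen(solr_select_url(base_url, query, "json", rows)) as response:
        return json.load(response)
```

With the example server running and the sample documents indexed, `run_query("http://localhost:8983/solr", "monitor")["response"]["numFound"]` should report how many documents matched.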

I'm trying to use it to create a searchable database of text files.

Installation

Download the source and compile it with Ant, a Java-based build tool similar to make:

$ wget http://mirror.metrocast.net/apache/lucene/solr/3.6.0/apache-solr-3.6.0-src.tgz

$ tar xzf apache-solr-3.6.0-src.tgz

$ cd apache-solr-3.6.0

$ ant ivy-bootstrap # installs Ivy, the dependency manager the Ant build relies on

$ ant compile

It'll take a couple of minutes to finish.

Test

You can test everything by running

$ ant test

Making War

Make a .war file by doing this:

$ cd /path/to/apache-solr-3.6.0/solr

$ ant dist

Again, this will take a while.

Making Example

Build the Solr example by typing

$ cd /path/to/apache-solr-3.6.0/solr

$ ant example



Running Solr on a Web Server

Using Jetty (Default)

To run Solr, you need a web server running locally. The Solr example is distributed with Jetty, a lightweight Java web server. After you've run the commands above and built the Solr example, go to the example directory and start Jetty:

$ cd /path/to/apache-solr-3.6.0/solr/example

$ java -jar start.jar

This starts the Jetty server with Solr running inside it. Visiting http://localhost:8983/solr/admin should look something like this:

SolrExample1.png
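A quick way to confirm the server is up without opening a browser is to request the admin page and check for an HTTP 200 response. This is a minimal Python sketch; solr_is_up is a hypothetical helper, and the URL in the usage note is the Jetty example's admin page.

```python
from urllib.request import urlopen


def solr_is_up(admin_url, timeout=5.0):
    """Return True if admin_url answers with HTTP 200, False otherwise."""
    try:
        with urlopen(admin_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all derive from OSError.
        return False
```

For example, `solr_is_up("http://localhost:8983/solr/admin/")` while Jetty is running; for the Tomcat setup described below, the same check works against http://localhost:8080/solr-example/admin.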

Using Tomcat

You can also run Solr through Tomcat, a Java-based HTTP server from the Apache Software Foundation (not to be confused with the more common C-based Apache HTTP Server). See my Tomcat page for instructions on installing and running Tomcat.

Download Solr

See above.

Build Solr and Solr example

See above.

Create Solr user in Tomcat

In $CATALINA_HOME/conf/tomcat-users.xml, define a new admin user for Solr inside the <tomcat-users> element:

<role rolename="manager"/>
<role rolename="admin"/>
<user username="admin" password="password" roles="manager,admin"/>

Create standalone Solr example directory

You will want to create a standalone directory that holds your Solr example. Tomcat will run a particular instance of Solr out of this standalone directory. I used /opt/solr.

Now you'll copy the Solr example that you built above into /opt/solr:

$ cp -r /path/to/apache-solr-3.6.0/solr/example /opt/solr/.

Specify Solr Data Directory

To specify where Solr stores its index data, edit /opt/solr/example/solr/conf/solrconfig.xml and change the dataDir tag to point to the standalone Solr example's data directory:

<dataDir>${solr.data.dir:/opt/solr/example/solr/data}</dataDir>
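The ${solr.data.dir:...} syntax means "use the solr.data.dir system property if it is set, otherwise fall back to the path after the colon". The Python below is a simplified model of that rule, written for illustration (it is not Solr's own code); in practice the property would be set on the JVM command line, for example with -Dsolr.data.dir=/some/other/path.

```python
import re

# Matches ${name} or ${name:default}; the default may be empty.
_PROPERTY = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def substitute_properties(text, properties):
    """Replace ${name:default} placeholders using a dict of property values."""

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]  # an explicitly set property wins
        if default is not None:
            return default  # otherwise use the value after the colon
        raise KeyError(f"no value and no default for property {name!r}")

    return _PROPERTY.sub(replace, text)
```

With no properties set, the dataDir value above resolves to /opt/solr/example/solr/data.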

Tell Tomcat How To Run Solr

You can tell Tomcat how to run Solr by creating a context fragment whose docBase points to the Solr war file. Create this file:

$CATALINA_HOME/conf/Catalina/localhost/solr-example.xml

with the following contents:


<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/opt/solr/example/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/opt/solr/example/solr" override="true"/>
</Context>
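If you manage several Solr instances, generating the context fragment can be less error-prone than editing it by hand. The sketch below (hypothetical helper names, Python standard library only) produces the same Context element shown above. It also spells out the naming rule this setup relies on: Tomcat serves a fragment named solr-example.xml in conf/Catalina/localhost at the context path /solr-example, which is why the URLs that follow contain /solr-example.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def solr_context_fragment(war_path, solr_home):
    """Return the text of a Tomcat <Context> fragment for a Solr webapp."""
    context = ET.Element("Context", docBase=war_path, debug="0", crossContext="true")
    # JNDI environment entry that tells Solr where its home directory lives.
    ET.SubElement(
        context,
        "Environment",
        name="solr/home",
        type="java.lang.String",
        value=solr_home,
        override="true",
    )
    body = ET.tostring(context, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + body + "\n"


def context_path(fragment_file):
    """Context path for a simple fragment name: .../solr-example.xml -> /solr-example.

    Ignores Tomcat's special cases such as ROOT.xml and '#' in file names.
    """
    return "/" + PurePosixPath(fragment_file).stem
```

The returned string can be written to $CATALINA_HOME/conf/Catalina/localhost/solr-example.xml, for example with `pathlib.Path(...).write_text(...)`.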

Test It Out

Start up the Tomcat server:

$ $CATALINA_HOME/bin/startup.sh

and go to

http://localhost:8080/solr-example/admin

and you should see something like this:

SolrExample2.png


Using Solr

So you've got Solr up and running, but you don't have any data. Let's fix that.

This section follows Apache's Solr tutorial: http://lucene.apache.org/solr/api/doc-files/tutorial.html

The example that we built created some sample XML documents that can be used as inputs to Solr. Go to /opt/solr/example/exampledocs to have a look.

Indexing Data

Start by indexing some data. The exampledocs folder contains post.jar, a small tool that POSTs XML documents to Solr's update handler, which indexes them.

If you're running Solr on Tomcat, pass the Tomcat context's update URL to post.jar with -Durl:

$ java -Durl=http://localhost:8080/solr-example/update -jar post.jar solr.xml monitor.xml

If you're running Solr on Jetty, post.jar's default URL (http://localhost:8983/solr/update) already points at the example server, so no -Durl is needed:

$ java -jar post.jar solr.xml monitor.xml
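For scripting, the same job can be done without post.jar. The Python below is a rough stand-in, not a reimplementation of post.jar: it POSTs each XML file to the update URL with a text/xml content type, then POSTs <commit/> so the new documents become searchable (post.jar also commits by default). The helper names are hypothetical; the update URLs are the ones from the commands above.

```python
from urllib.request import Request, urlopen

XML_HEADERS = {"Content-Type": "text/xml; charset=utf-8"}


def build_update_request(update_url, xml_bytes):
    """Build a POST request carrying an XML update message."""
    return Request(update_url, data=xml_bytes, headers=XML_HEADERS, method="POST")


def post_files(update_url, paths, timeout=30.0):
    """POST each XML file to Solr's update handler, then commit."""
    for path in paths:
        with open(path, "rb") as f:
            request = build_update_request(update_url, f.read())
        with urlopen(request, timeout=timeout) as response:
            response.read()  # urlopen raises HTTPError on a non-2xx status
    # Make the posted documents visible to searches.
    with urlopen(build_update_request(update_url, b"<commit/>"), timeout=timeout) as response:
        response.read()
```

Run from the exampledocs folder, `post_files("http://localhost:8983/solr/update", ["solr.xml", "monitor.xml"])` mirrors the Jetty command.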

Indexing HTML/TXT Files

To index HTML and TXT files, you need to edit the search engine's schema configuration file. This is located in /opt/solr/example/solr/conf/schema.xml. I recommend making a backup copy before you touch it.
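For context, the point of the schema edits in this section is to be able to send each text file to Solr as an XML <add> message whose body field holds the file's contents, which post.jar can then index like the sample documents. The Python sketch below builds such a message. Using the file name as the id is a hypothetical choice (the example schema's unique key is the id field), and HTML files would be indexed with their markup unless you strip it first.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def make_add_xml(doc_id, body):
    """Build a Solr XML <add> message with an id field and a body field."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    # ElementTree escapes &, < and > in the text content.
    ET.SubElement(doc, "field", name="body").text = body
    return ET.tostring(add, encoding="unicode")


def text_file_to_add_xml(path):
    """Hypothetical id scheme: the file name is the id, the contents are the body."""
    p = Path(path)
    return make_add_xml(p.name, p.read_text(encoding="utf-8"))
```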

Add a data field for the text

Find the lines that declare the existing data fields:

   <field name="category" type="text_general" indexed="true" stored="true"/>
   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="last_modified" type="date" indexed="true" stored="true"/>
   <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>

and add a field of your own for the body of the HTML or text document, using the same text_general type as the category field:

   <field name="category" type="text_general" indexed="true" stored="true"/>
   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="last_modified" type="date" indexed="true" stored="true"/>
   <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="body" type="text_general" indexed="true" stored="true"/>
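Solr reports an error when it loads a schema in which a field names a type that has no <fieldType> declaration, so it can be worth checking a new field before restarting the server. The Python below is a small, hypothetical helper for that check; it matches both fieldType and fieldtype tags to be safe and only looks at plain <field> elements, not dynamicField.

```python
import xml.etree.ElementTree as ET


def undeclared_field_types(schema_root):
    """Return sorted (field name, type) pairs whose type is never declared."""
    declared = {
        field_type.get("name")
        for tag in ("fieldType", "fieldtype")  # check both spellings
        for field_type in schema_root.iter(tag)
    }
    return sorted(
        (field.get("name"), field.get("type"))
        for field in schema_root.iter("field")
        if field.get("type") not in declared
    )
```

Called as `undeclared_field_types(ET.parse("/opt/solr/example/solr/conf/schema.xml").getroot())`, it should return an empty list when every field's type is declared.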

Add a copy action for the text

Now that the "body" field exists, add a copy action so that anything put in "body" is also copied into the "text" field, which the example schema already defines and uses.

Find the lines that specify copy actions:

   <copyField source="includes" dest="text"/>
   <copyField source="manu" dest="manu_exact"/>

and add a new copyField action:

   <copyField source="includes" dest="text"/>
   <copyField source="manu" dest="manu_exact"/>
   <copyField source="body" dest="text"/>
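To make the effect of copyField concrete, here is a toy Python model of what Solr does with these rules when a document arrives (an illustration, not Solr's code): each rule appends the source field's raw input values to the destination field, and rules read only the original input, so copies do not chain. Because several sources feed "text", that destination has to be multiValued, as it is in the example schema. The helper names are hypothetical, and wildcard sources are not modeled.

```python
import xml.etree.ElementTree as ET


def copy_rules(schema_root):
    """Read (source, dest) pairs from the schema's <copyField> elements."""
    return [(rule.get("source"), rule.get("dest")) for rule in schema_root.iter("copyField")]


def apply_copy_fields(doc, rules):
    """Return a new document with copyField rules applied.

    doc maps field names to lists of raw input values; doc itself is not modified.
    """
    result = {name: list(values) for name, values in doc.items()}
    for source, dest in rules:
        # Read from the original document, so copied values never trigger further copies.
        if source in doc:
            result.setdefault(dest, []).extend(doc[source])
    return result
```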



(More later, tutorial still in progress..........)