mediawiki api

I'm sick and tired of copying and pasting the first paragraph of the wikipedia article into my pages. Wait, they have an API?

Wikipedia (MediaWiki) has a public API (i.e. no authentication required) which allows you to send well-formed requests (i.e. not wrong) to the Wikipedia website and get back data on the contents of pages and loads of other stuff that I don't pretend to know anything about.

Whilst creating articles on The Computing Café, I started by copying and pasting the summary paragraph (section 0) from various Wikipedia pages onto the site. This was super boring, I probably forgot to reference the text, and I ran the risk of my text getting out of date. Enter the Wikipedia public API.

The Wikipedia public API is available at https://en.wikipedia.org/w/api.php. If you visit this without telling the script what you'd like it to do, you'll get a helpful documentation page (I bet you tried it, didn't you?).

To give the API something to deal with (so it can give you cool things back), you pass parameters to the script through the URL. There are lots of actions that you can give to the API but the simplest one is the parse action.

The simplest of requests

Let's ask Wikipedia for its article on 'Dragon_Data' 🐉 (my favourite vintage computer manufacturer).

https://en.wikipedia.org/w/api.php?action=parse&page=Dragon_Data

You should see the MediaWiki API Results page with a heading which reads...

This is the HTML representation of the response to the API request. HTML is good for debugging, but is unsuitable for application use. Specify the format parameter to change the output format. To see the non-HTML representation of the JSON format, set format=json. See the complete documentation, or the API help for more information.

...followed by some formatted computer geekery.

So, whilst it's good for geeks to read Wikipedia articles, it's not very useful for us in this situation.

Return a machine readable format

So, let's tell the API to give us back the article in a more suitable format than HTML. We opt for JSON (JavaScript Object Notation), a strict text format based on the way JavaScript writes out objects, which makes it easy for a script to store and access the data.

https://en.wikipedia.org/w/api.php?action=parse&page=Dragon_Data&format=json

"Yuk! I can't read that properly - it's just a block of text." Sure, but our script will be able to read it. If you want to format the JSON in the browser so that you can see the structure more clearly, add the Chrome extension called JSON Formatter from the Chrome Web Store.

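If you'd rather not install an extension, JavaScript can indent JSON for you. Here's a minimal sketch; the response object is a made-up, heavily shortened stand-in for what the API sends back (the pageid and text values are placeholders, not real data).

```javascript
// A trimmed-down, hypothetical stand-in for the object the API returns.
const response = {
  parse: {
    title: 'Dragon Data',
    pageid: 1, // placeholder id, for illustration only
    text: { '*': '<p>Summary paragraph goes here.</p>' }
  }
};

// The third argument to JSON.stringify is the number of spaces
// to indent each level by, which makes the structure readable.
const pretty = JSON.stringify(response, null, 2);
console.log(pretty);
```

You can paste this into the browser console (or run it with Node) to see the nesting laid out one level per indent.
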
Give me 'section-0'!

The last API call gave us back the whole article (it's long) but we just want the first section called 'section-0' (sounds like something from a Sci-Fi movie).

https://en.wikipedia.org/w/api.php?action=parse&page=Dragon_Data&format=json&section=0

OK, cool enough, but still too much because it's got all the language options, categories etc. There are some cool things in here though, like links to the images and the external references. Anyhoo - let's stay focussed.

Filter out everything but the text of section-0

Now we'll use the prop parameter to strip down the JSON object.

https://en.wikipedia.org/w/api.php?action=parse&page=Dragon_Data&format=json&section=0&prop=text
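
Typing these URLs out by hand gets fiddly as the parameters pile up. As a rough sketch, the same URL can be put together from a plain object using the built-in URLSearchParams. The helper name buildApiUrl is my own invention for this example, not part of the MediaWiki API.

```javascript
const API_ENDPOINT = 'https://en.wikipedia.org/w/api.php';

// Build a request URL from a plain object of parameters.
// URLSearchParams handles the "&" joining and any escaping needed.
// (buildApiUrl is a hypothetical helper name used for illustration.)
function buildApiUrl(params) {
  return API_ENDPOINT + '?' + new URLSearchParams(params).toString();
}

const url = buildApiUrl({
  action: 'parse',
  page: 'Dragon_Data',
  format: 'json',
  section: '0',
  prop: 'text'
});

console.log(url);
// https://en.wikipedia.org/w/api.php?action=parse&page=Dragon_Data&format=json&section=0&prop=text
```

The parameters come out in the order they were written in the object, so this reproduces the URL above exactly.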

You'll notice that we get the text property like we asked but also the title and the pageid. You always get these no matter what prop value we use. If you look carefully, you will see that there is a 'child' in the 'text' node labelled '*'. This contains the summary we need!
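
To make that structure concrete, here is a small sketch of digging the summary markup out of the response. Both sample objects are hypothetical and heavily shortened (the pageid and the paragraph text are placeholders), and extractMarkup is just a name I've picked for the example. The error object is roughly what the API sends when the page title doesn't exist.

```javascript
// Hypothetical, shortened responses for illustration only.
const goodResponse = {
  parse: {
    title: 'Dragon Data',
    pageid: 1, // placeholder id
    text: { '*': '<div><p>Summary paragraph goes here.</p></div>' }
  }
};
const badResponse = {
  error: { code: 'missingtitle', info: "The page you specified doesn't exist." }
};

// Return the HTML held in the '*' child of the text node, or null
// if the API did not send back a 'parse' object at all.
function extractMarkup(response) {
  if (!response || response.parse === undefined) {
    return null;
  }
  // '*' is not a valid identifier, so bracket notation is required.
  return response.parse.text['*'];
}

console.log(extractMarkup(goodResponse));
console.log(extractMarkup(badResponse));
```

Note the square brackets: you can't write response.parse.text.* in JavaScript, so the '*' key has to be looked up as a string.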

Now comes the Javascript

So, it's all well and good looking at reams of data in a JSON file but it's not really much use. If we can use JavaScript to ask for the data and then parse it into the format we need, we can display it in a much more user-friendly way. Here we go...

<!-- jQuery must already be loaded on the page before this script runs -->
<div id="article"></div>
<script>
  $(document).ready(function(){
    $.ajax({
      url : 'https://en.wikipedia.org/w/api.php',
      method : 'GET',
      data : {
        action   : 'parse',
        format   : 'json',
        prop     : 'text',
        section  : '0',
        page     : 'Dragon_Data',
        origin   : '*'
      },
      success :
        function(response,status,xhr){
          if(response.parse !== undefined){
            var markup = response.parse.text["*"];
            var summary = $(markup)
                          .find('p')
                          .map(function(){return $(this).text()}).get().join('<br/><br/>')
                          .replace(/\[[^\]]+\]/gm,'')
                          .replace(/\n/g,'')
                          .replace(/^(<br(?:\/)?>)+/g,'')
                          .replace(/(<br(?:\/)?>)+$/g,'');
            $('#article').html(summary);
          } else {
            $('#article').html('Nothing to show - check console');
            console.log(response);
          }
        },
      error : // Ajax error
        function(xhr,status,error){
          // jQuery passes (jqXHR, textStatus, errorThrown) to this function
          $('#article').html('Error: '+status+' '+error);
        }
    })
  });
</script>

This is what the output would look like (I've styled the box a little to make it look nice)...

Let me explain the code for you.

An empty div to hold the summary text. The JavaScript will find this and fill in the content dynamically once it has received and cleaned up the Wikipedia article summary.

Start of the script. This should come after the div.

This jQuery function will be called only when the document is loaded. That way, we know that the div has been created in the page.

Call the jQuery ajax function (AJAX stands for Asynchronous JavaScript And XML). The function accepts a collection of settings, including the URL, and hands three objects to the success function - the response, the status of the response and a jqXHR object.

This is the base url of the API for Wikipedia.

This is the method we will use for the request.

This is a collection of parameters we will send with the request. Notice they look similar to the parameters we appended to the URLs earlier. You can read more about the parameters on the MediaWiki API main page.

This declares that we will use the parse action for our request.

This declares that we expect the data to be returned in JSON format.

This declares that we would only like the 'text' property (prop) to be returned from the parse action.

This declares that we would only like section 0 to be returned from the parse action.

This declares that we would like the data to be returned from this specific page. Note that this parameter has to be the exact page title or no data will be returned.

Setting origin to '*' tells the API that this is an unauthenticated (anonymous) cross-domain request, which prevents cross-domain errors in the browser. All user-specific data is excluded from these types of request, but no authentication is required, which is good.

End of the data object.

If the AJAX call is successful, the function contained in this setting is executed.

This function is executed on success. It receives three objects - the response, the status of the response and the jqXHR object.

If the request succeeds but the API has nothing to give us, the response has no 'parse' property (it is 'undefined') and carries an 'error' object instead. This will normally be because the page name was incorrect. So we check that 'parse' is NOT undefined (i.e. data has been returned) and then deal with a correct response.

Store the contents of the '*' child of the text node in the variable markup.

Now, use some funky jQuery methods on markup (that's why it's enclosed in $()).

Find all the paragraphs. This gives us a jQuery collection of them.

Use the jQuery map function to pull the plain text out of each paragraph, get to turn the result into a JavaScript array, and join to stick all the elements together with two line breaks between them.

Remove all the citations and references contained in square brackets to stop it looking like a Wikipedia article.

Remove any remaining newline characters.

Remove any HTML break tags at the beginning of the summary.

Remove any HTML break tags at the end of the summary.

Finally, insert the summary in the empty

div

that we created earlier.

If the initial response returned 'undefined'...

...display an error message in the div...

...and log the response in the console.

Finally, this setting holds the function to run if the AJAX request itself fails.

Run this function if there is an AJAX error.

Display the error message in the div.
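
If you want to poke at the clean-up steps on their own, here is a small sketch of just the string handling, without jQuery or a network request. It assumes you already have the paragraph text as an array of strings (which is what map and get produce in the code above). The function name cleanSummary and the sample strings are made up for the example.

```javascript
// Takes an array of paragraph strings and returns the cleaned-up summary.
// (cleanSummary is a hypothetical name; the regular expressions are the
// same ones used in the jQuery version.)
function cleanSummary(paragraphs) {
  return paragraphs
    .join('<br/><br/>')                 // two line breaks between paragraphs
    .replace(/\[[^\]]+\]/gm, '')        // drop bracketed citations like [1]
    .replace(/\n/g, '')                 // drop any newline characters
    .replace(/^(<br(?:\/)?>)+/g, '')    // drop break tags at the very start
    .replace(/(<br(?:\/)?>)+$/g, '');   // drop break tags at the very end
}

// Made-up input, including the kind of empty paragraphs that cause
// stray break tags at the start and end.
const sample = [
  '\n',
  'First paragraph.[1]\n',
  'Second paragraph.[citation needed]\n',
  ''
];

console.log(cleanSummary(sample));
// First paragraph.<br/><br/>Second paragraph.
```

The order matters: the newlines have to go before the break tags are trimmed, otherwise a leading newline would stop the "start of string" pattern from matching.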

That's it. Simple enough? One of the main issues is finding the exact page name. Obviously, you can search Wikipedia for it first, but I've extended this example to include a search function to help you explore other MediaWiki API functions.
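
As a taster, here is one way a title search could be sketched using the API's query action with list=search and srsearch. This is not necessarily how my demo does it; buildSearchUrl and titlesFromSearch are names invented for this example, and the sample response is made up and shortened.

```javascript
const API_ENDPOINT = 'https://en.wikipedia.org/w/api.php';

// Build a search request URL for a given search term.
// (Hypothetical helper; the parameters are standard MediaWiki ones.)
function buildSearchUrl(searchTerm) {
  const params = new URLSearchParams({
    action: 'query',
    list: 'search',
    srsearch: searchTerm,
    format: 'json',
    origin: '*'
  });
  return API_ENDPOINT + '?' + params.toString();
}

// Pull the page titles out of a search response, or return an
// empty array if the response doesn't have the expected shape.
function titlesFromSearch(response) {
  if (!response || !response.query || !Array.isArray(response.query.search)) {
    return [];
  }
  return response.query.search.map(function (hit) {
    return hit.title;
  });
}

// Made-up, shortened response for illustration only.
const sampleResponse = {
  query: { search: [{ title: 'Dragon Data' }, { title: 'Dragon 32/64' }] }
};

console.log(buildSearchUrl('Dragon Data'));
console.log(titlesFromSearch(sampleResponse));
```

Any title that comes back can then be fed into the page parameter of the parse request from earlier.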

Have a play with my MediaWiki API demo.

Fork your own version of my MediaWiki API demo.

Last modified: April 22nd, 2022
