Qualitative Methods in TV Research - Appreciation

Peter Diem

There are two basic approaches to practically-oriented qualitative television research. One is based on data delivered as the result of appreciation scaling. Such data can be collected by meter, diary, phone or with the help of online surveys. The second approach is based on the use of in-depth interviews, group discussions or auditorium tests for data collection. These studies can be conducted both offline and online.

Appreciation Measurement by Meter

ORF, the Austrian public service broadcaster, has made use of both methods. Austria is probably one of the last countries to run a meter system with integrated program appreciation. Years ago such systems were employed by Belgium, Denmark, Finland, the Netherlands, Norway, Portugal, Spain and Switzerland, while in the UK a supplementary appreciation diary was used which provided Appreciation Indices (AI) for all programs. It is interesting to note that at that time the three major European markets Germany, France and Italy did not employ such a system. Countries formerly relying on diary-collected data, such as the Czech Republic and Slovakia, also used appreciation. The University of Hong Kong uses the UK-type Appreciation Index (AI) in its reports on TV program quality.

In the meantime, practically all of the countries mentioned above have „lost“ appreciation by meter. As TAM contracts were renewed, important customers (chiefly media and advertising agencies) said they were not interested in appreciation. Public service stations, however, were interested, because their „public value“ obligation calls for proof of „quality“.

[Figures: former Telecontrol VIII People Meter; traditional Nielsen PeopleMeter]

So in the course of time, a number of nations ruefully returned to appreciation, albeit on the basis of a completely new system: the „TV Appreciation Panel“. This type of research was invented and developed by GfK Intomart in the Netherlands (whose TV system formerly relied heavily on appreciation by meter). The next to sign up was the UK (BBC and ITV). While Germany (ZDF) and half of the UK (ITV) dropped out after a few years, the following countries continue to run a TV Appreciation Panel:

For more information and a discussion about the use of TV Appreciation Panels, cf. Appendix 1 and 2.

Rating vs. Appreciation

Note: There is no fixed correlation between ratings and appreciation.

Large TV audiences are by no means always satisfied audiences. Large audiences are composite audiences, which means that a considerable number of viewers are sitting in front of the screen without having personally chosen to watch this particular program. And sometimes it even seems that viewers hate what they have just watched. This is not only the case when the national soccer team suffers a painful loss against an 'inferior' opponent or when ski champions miss a victory by milliseconds. There are also other instances in which rather large audiences may react with disgust. Current affairs programs with violent pictures from a theater of war or discussion programs about unpopular topics are of this type. Thus in many instances, high ratings may go along with low appreciation. Relatively low appreciation often occurs with typical Friday or Sunday night thrillers: one has the impression that viewers regret having „wasted“ their time again on some murder case.

The second category is low appreciation combined with low ratings. In such cases, a minority program has failed to draw enough satisfied viewers within the target audience. This occurs quite often with programs about modern art or contemporary music which do not appeal to many viewers. Such broadcasts tend to draw small audiences, who on top of that express their criticism of what they have been offered through low appreciation scores.

A third category comprises programs in which high appreciation is combined with low ratings. Good public service broadcasters produce quite a lot of such programs. These programs are aimed at small target groups (opera fans are a typical example) who take pleasure in special offerings. Another example is religious broadcasts, which tend to be highly appreciated by small audiences.

Program makers are naturally most satisfied with the fourth category: high ratings accompanied by high appreciation. This type is by no means an exception. There are some program categories which tend to score well in both respects. These include nature/animal programs or expeditions, top-flight cinema films, special folk music offerings etc. Here is a general overview:
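The four rating/appreciation combinations discussed in this section can also be expressed as a simple classification rule. The following Python sketch is only an illustration: the function name, the two cut-off values and the example figures are assumptions made here and do not come from ORF practice. In real use the cut-offs would have to be set per genre and market.

```python
def classify_program(rating, appreciation, rating_cut=10.0, appreciation_cut=7.0):
    """Assign a program to one of the four rating/appreciation categories.

    rating: audience rating in percent; appreciation: mean score on a 0-10 scale.
    Both cut-off values are hypothetical placeholders, not published benchmarks.
    """
    high_rating = rating >= rating_cut
    high_appreciation = appreciation >= appreciation_cut
    if high_rating and not high_appreciation:
        return "high rating, low appreciation"
    if not high_rating and not high_appreciation:
        return "low rating, low appreciation"
    if not high_rating and high_appreciation:
        return "low rating, high appreciation"
    return "high rating, high appreciation"


# Invented example values, for illustration only
examples = {
    "Sunday night thriller": (18.0, 6.2),
    "Contemporary music": (1.5, 5.8),
    "Opera broadcast": (2.0, 8.9),
    "Nature documentary": (15.0, 8.6),
}
labels = {title: classify_program(r, a) for title, (r, a) in examples.items()}
```

Because there is no fixed correlation between the two measures, such a rule only sorts programs into groups; interpreting the result still requires knowledge of the genre.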

What are the advantages of appreciation?

First of all, appreciation is used by public service broadcasters to document the public value of their program. Appreciation is a major performance indicator.
For details on the concept of „performance indicators“ cf. Appendix 3.

Besides helping public broadcasters to justify license fees, an important function of appreciation scores is their predictive value. Many cases have shown that a first episode which is well accepted (not only according to rating but also with respect to appreciation) will result in good ratings for subsequent episodes. Conversely, if a series starts with a good or satisfactory rating but with low appreciation, it can be assumed that the following episodes will not perform well unless substantial improvements are made to the program.

A third interesting feature of high appreciation scores is their value for the acceptance of TV advertising. Contrary to the opinion of most advertising and media agencies, a TV program with high appreciation has a positive influence on ads placed immediately before or after it. According to empirical studies performed in the Netherlands (Ster/Intomart/MarketResponse), ads after well-accepted programs produce a higher buying intention than ads placed in other program environments. According to these findings, even a slight increase in appreciation has a measurable effect on the propensity to buy.

Finally, it should be noted that appreciation is an „incentive“ for the members of a TAM panel. Respondents who are invited to judge the programs they watch by entering an appreciation score have more fun participating and are therefore more disciplined in operating the meter than panelists who have no chance to express their opinion in the form of appreciation.

Some methodological considerations

There are some methodological objections against the use of appreciation as a 'one-dimensional' measurement. It is sometimes contended that the „quality“ of a program can only be determined by means of a cluster of qualitative metrics (measuring a variety of program properties). Still, experience with more than 12,000 ORF TV programs per year over a period of almost 15 years has shown that the interpretation of appreciation scores in the context of the respective program genre by experienced researchers and program makers allows for unbiased insight into audience satisfaction.

Of course, additional qualitative questioning will lead to a still deeper understanding of program quality. Therefore, in all TV Appreciation Panels, questions are included to measure such factors as viewing attention, personal enrichment, willingness to recommend a program to others etc. In particular, answers to open questions (about „likes“ and „dislikes“) collected in online panels give insight into the different factors and motives constituting program „quality“.

The „viewing experience“ („Sendungserlebnis“)

Note: The general appreciation score measures the individual viewing experience, not the artistic value of a program or the journalistic quality of its content. The motives underlying program appreciation may differ from case to case. As we said, the transmission of a sports event may be of high technical quality – but if the national team is defeated, appreciation will be low.

Nevertheless, the average numerical appreciation score (which needs no further processing), together with the respective program category, is a most practical instrument for measuring program quality: simple, cheap and effective.

Other qualitative methods of TV research

In the sphere of radio research the in-depth interview and call-out testing are well established techniques. But also in the field of television, program pretesting by means of focus groups has proved to be a very good solution. In order to collect as much information as possible, a special setting for the use of a focus group was developed by the ORF media research department.

The Videotest

In cooperation with the market research institute Triconsult (http://triconsult.at), the so-called „Videotest“ was used until the year 2000. This was a special form of group discussion in which the respondents operated electro-mechanical „sliders“ while viewing a live broadcast or a video tape. The data recorded by means of the sliders were fed into a PC and evaluated on a scale from 0.0 („very bad“) to 10.0 („very good“). The results were available immediately after the viewing and could be displayed along a time axis, thus showing the acceptance of the program parts from beginning to end (see below).

[Figures: old-fashioned slider device; slider scale designed for use on a tablet; slider programmed as an „app“ for use on a smartphone]

Sample read-out of a number of sliders operated during the viewing of a TV program. In this case, the time-weighted average (or median) appreciation score of the program (without ad breaks) is 8.1. It shows that the first part of the program was better received than the second one.
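As a minimal sketch of how such a read-out might be summarized, the Python snippet below computes a time-weighted average from slider samples. The sample format (timestamp in seconds, score on the 0 to 10 scale), the function name and the rule that each reading holds until the next one are assumptions made for illustration; they are not the procedure of the original Videotest software, and the readings are invented.

```python
from typing import List, Tuple


def time_weighted_average(samples: List[Tuple[float, float]], end_time: float) -> float:
    """Average slider score, weighted by how long each reading was held.

    samples: (timestamp_seconds, score) pairs sorted by timestamp (assumed format).
    end_time: timestamp at which the program ended.
    """
    if not samples:
        raise ValueError("no samples given")
    total = 0.0
    duration = 0.0
    for i, (t, score) in enumerate(samples):
        # A reading is assumed to stay valid until the next one (or the program end).
        t_next = samples[i + 1][0] if i + 1 < len(samples) else end_time
        held = t_next - t
        if held < 0:
            raise ValueError("timestamps must be sorted and must not exceed end_time")
        total += score * held
        duration += held
    if duration == 0:
        raise ValueError("total duration is zero")
    return total / duration


# Invented readings: the slider is moved down twice during a 30-minute program
readings = [(0, 9.0), (600, 8.0), (1500, 7.0)]
average = time_weighted_average(readings, end_time=1800)
```

Ad breaks could be excluded by dropping the corresponding time spans from the samples before averaging.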

It should be mentioned in this connection that the Austrian sociologist Paul F. Lazarsfeld, while doing research in the New York-based Office of Radio Research in the early 1940s, already had test persons judge radio broadcasts by operating a red and a green pen on a moving paper band to record their appreciation of what they were listening to.

After viewing the program, the Videotest group was led into a one-hour session to discuss the pros and cons of the program. To produce a perfect videotape of this discussion, including body language and facial expression, the group was seated in a semicircle and filmed by a semi-professional cameraman. Emphasis was laid on good lighting and the use of a high-quality directional microphone in order to produce a videotape which professional program makers would be willing to view. Here is a sketch of the set-up.

If electronic sliders or other sentiment-recording equipment are not at hand, the Appreciation Index can also be produced by paper-and-pencil work. With regard to the scale from 0.0 to 10.0, long-term benchmarking has taught us that products, services or media content (such as TV programs) which receive a score lower than 6.5 are serious marketing risks. We have proposed to GEAR, the professional Group of European Audience Researchers, that this scale, which is widely used in the US, England and Germany, be accepted as a kind of 'harmonized' international measure. This would enable researchers to make simple comparisons in the field of globalized television.
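The paper-and-pencil variant and the 6.5 benchmark can be illustrated with a few lines of Python. This is a sketch under assumptions: the function names and the example scores are invented here, and only the 0 to 10 scale and the 6.5 threshold are taken from the text.

```python
RISK_THRESHOLD = 6.5  # benchmark quoted in the text for the 0.0-10.0 scale


def mean_appreciation(scores):
    """Mean of individual scores collected on paper (0-10 scale)."""
    if not scores:
        raise ValueError("no scores given")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must lie between 0 and 10")
    return sum(scores) / len(scores)


def is_marketing_risk(score):
    """True if the score falls below the 6.5 benchmark."""
    return score < RISK_THRESHOLD


# Invented scores from eight test viewers
sheet_scores = [7, 6, 5, 8, 6, 7, 5, 4]
program_score = mean_appreciation(sheet_scores)
at_risk = is_marketing_risk(program_score)
```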

For quite a number of years, the Internet has made it possible to do qualitative TV research online. One possibility is the Online Focus Group (OFG), a discussion held online among up to 8 participants. For practical reasons (formalities, limited time), simultaneous online discussions tend to produce rather „thin“ results.

Another method, the Online Bulletin Board (OBB), has established itself as the most practical and most effective method of qualitative TV research. Engaging 12 to 20 participants and lasting several days (up to a fortnight), the OBB produces very rich results. This is because the participants can follow up at their convenience at home and take their time to express themselves without stress.
There is also enough time to view sample footage shown online or delivered to the home on disk, as participants can enter their opinion at any time during the exercise.
There are a few other possibilities of qualitative online research for TV - such as diaries, web logs or the creation of „viewer communities“. The number of participants can be higher in these cases. For logistic reasons, these methods are rather complex and therefore also expensive.

Note: Due to the rapid spread of the internet also in developing markets, qualitative online TV research is no longer restricted to young and up-market segments of the population. The small numbers of participants required for a targeted Online Bulletin Board can be easily recruited by professional market research institutes.

The following table shows the penetration of Internet in different age groups in Austria and Russia.

Source: This presentation is based on a paper published in „Media-Perspektiven“ 2/94, an article published in „Communications“ - the „European Journal of Communication Research“ # 21 (1996), and a presentation given at the Higher School of Economics in Moscow (April 2013).

Appendix 4 is a German-language article on the same subject by the Dutch TV researchers Lex van Meurs and Erica Simpson

Appendix 1

TV Appreciation Panels

As can be seen in the graphs below, appreciation panels use different types of „additional“ questions after „general“ measurement by the „Appreciation Index“.

1. Selection of program(s) viewed „yesterday“

2. Adding appreciation index (1-10), effort to watch, amount of the program viewed

3. Respondents agree/disagree with descriptive statements, enter likes and dislikes.

Note: For expressing „likes“ and „dislikes“ (the so-called „verbatims“) it is advisable to use two separate text fields. This has already been realized by VRT/ Belgium.

Here are a few examples of general and genre-specific questions used in TV appreciation panels:

All genres except film & children

- This was a high quality programme.

- It is the kind of programme I would talk to other people about.

- Did you feel you learnt anything from watching this programme?

- This programme was thought-provoking.

- Would you recommend this programme to a friend? Please give a mark out of 10, where 10 indicates 'definitely yes', and 1 indicates 'definitely not'.

All genres except film, children & news

- This programme felt original and different from most other TV programmes I've seen.

- This programme was inspiring.

Drama (no film)

- What is your opinion of the storyline?

- What is your opinion of the characters in it?

Entertainment

- How entertaining did you think this programme was?

- And how much did you like the presenter(s), if there were any?

News

- How trustworthy did you think this programme was?

- How impartial did you think this programme was?

- What did you think about the way the programme was presented?

Discussion

The author contends that it would be more efficient to include appreciation in the people meter itself instead of collecting all sorts of „qualitative“ information by a special TV Appreciation Panel. Day-by-day „qualitative“ questionnaires are bound to produce lots of redundant data (which – by the way – become available only two days after transmission). It is much cheaper to conduct ad hoc qualitative online research to find out details about viewers‘ attitudes towards certain programs.

At the moment, there is a need for a completely new generation of TAM meters because of digital signal transmission plus internet-based and mobile TV reception. „Sound matching“ and „watermarking“ are the catchwords for the new technologies. Portable recording devices such as „media watches“ and „pagers“ are among the instruments required to meet the new challenges. Such devices are already being used in Kazakhstan, and a Moscow-based firm is also trying to produce such cutting-edge apparatus. In the view of the author it would be a real pity if these new meters did not provide for appreciation!

The need for a TV Appreciation Panel

If, however, a TAM system does not provide (or has de-activated) the feature of appreciation in the people meters, the TV industry should certainly decide to commission a TV Appreciation Panel. Such a system has several advantages.

First of all, a TV Appreciation Panel delivers quality data in the form of numerical and verbal TV metrics. Secondly, it enables researchers to include ad hoc questions from time to time - such as testing TV personalities. The third advantage of an online operated panel is the possibility to extend quality research also to radio and internet.

When negotiating an Appreciation Panel, a few modifications should be considered. As shown below, it is suggested to use simple „checkmark“ questions for determining program „properties“ („entertaining“, „informative“, „practical“, „suspenseful“, „well presented“, „high quality“, „none of these“ etc.). The proportion of such properties will explain why the respective Appreciation Index was given. In contrast to the evaluation of the „verbatims“, which requires additional effort (e.g. text analysis software), these data are delivered automatically by the system. Furthermore, if the same „properties“ are asked across all program types, this will facilitate the creation of a reception quality catalogue or „genre mapping“, which in turn could be used to optimize program schedules. Sometimes program makers are not fully aware of actual audience tastes. For example, TV programs about animals are not only considered „informative“, but also „entertaining“ and „useful“.

As shown above, three types of questions can be used to record appreciation. In spite of the possibility of „straight-lining“ (entering the same score in all lines), the matrix question with radio buttons is the most popular one.
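Two of the points made in this section lend themselves to a short illustration: the proportion of ticked „properties“ per program and the detection of „straight-lining“ in matrix questions. The Python sketch below is an assumption about how such data could be processed; the function names, the data layout and the example answers are invented and do not describe any existing panel software.

```python
from collections import Counter


def property_shares(responses):
    """Share of respondents who ticked each property.

    responses: one set of ticked property labels per respondent (assumed layout).
    """
    n = len(responses)
    if n == 0:
        return {}
    counts = Counter(prop for ticked in responses for prop in set(ticked))
    return {prop: count / n for prop, count in counts.items()}


def is_straight_lined(row_scores):
    """True if the same score was entered in every line of a matrix question."""
    return len(row_scores) > 1 and len(set(row_scores)) == 1


# Invented checkmark answers of four respondents for an animal program
ticks = [
    {"informative", "entertaining"},
    {"informative", "useful"},
    {"entertaining"},
    {"informative", "entertaining", "useful"},
]
shares = property_shares(ticks)

# Invented matrix answers (one score per program line) of two respondents
matrix_answers = {"resp_1": [8, 8, 8, 8], "resp_2": [7, 9, 6, 8]}
flagged = [rid for rid, scores in matrix_answers.items() if is_straight_lined(scores)]
```

A flagged respondent is not necessarily careless, since identical scores can be genuine; the check only marks cases that deserve a closer look.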

Appendix 2

BBC TV Quality Measures (2011)

Appreciation Index (AI)

The AI (or Appreciation Index) is the main currency inside the BBC for measuring how much the audience appreciates and enjoys BBC programmes. The Appreciation scores are collected via the 'Pulse' panel; an online panel of 19,000 adults aged 16+ (representative of the UK) run by an independent research agency GfK. The panel enables the BBC to collect over 5,000 responses a day on TV and Radio programmes.

AI Data

The BBC now holds over five years of AI data from this panel (and has experience of this currency going back to the early days of television) which has enabled the organisation to build up robust benchmarks along with a good understanding of what makes for a good or bad score. Following an extensive study undertaken by the BBC into how people score programmes and what variables help to make up an AI score, it was found that one of the leading factors in determining an AI score was whether a programme was deemed to be "high quality". Other factors such as the amount of effort people take to view the programme or whether viewers found it entertaining were also important. The AI score is now the lead metric in assessing programme quality within the BBC. This said, although it is the lead metric it is not the only benchmark for quality, the BBC uses a number of other tools alongside AI such as whether the programmes are different from other TV programmes in the market, whether they are fresh and new or, in the case of factual programmes, whether the viewers learnt anything from it. AI scores are also judged in the context of the type of programme it is and whether it is a new or established programme - in fact, there are many factors that can influence AI scores and it is important to understand the context of these scores in order to interpret them confidently.

The main factors that affect the AI score

- Type of programme - different types of programmes achieve different levels of scores. For example a score for a sport programme on TV can sometimes be dependent on whether or not you support the team that won or lost. In addition, scores can vary according to whether the programme is new or long running, for example, new comedy typically starts with a lower average score as viewers become accustomed to the characters/sketches and this then tends to increase over time.

- Changes in the TV market - the introduction of expanded choice via digital TV, introduction of HD and the ability to watch what you want when you want via PVRs such as Sky Plus and catch-up services such as BBC iPlayer, has been helping to drive up the quality of the experience and hence, AI scores. For example, people typically give programmes that they have recorded or have watched in HD a higher score than the same programme watched live or in standard definition. This explains why BBC HD has a higher average AI score than other channels.

- Size of Audience - niche, targeted programming, even though attracting a limited number of viewers, can often score very highly, as it is actively chosen and appreciated by the viewer; programming aimed at a wider audience, attracting incidental viewers as well as those keen to see the programme, may score lower as it is scored by a wider, more varied group of people. This explains why BBC Three and Four have higher average AI scores than BBC One.

The Method:

Daily Pulse (Pulse) is a continuous BBC survey designed to assess what audiences think of the programmes they have seen on TV and listened to on the radio on a daily basis. It is an online survey, recruited and administered online by GfK and has been running since 2005. Key Pulse measures include the Appreciation Index and programme ratings for quality and distinctiveness.

The Pulse is a panel of around 20,000 people who are invited to complete a survey every day to say what they have watched and listened to and what they thought of each programme. Pulse measures a wide range of BBC and competitor stations.

The daily reporting panel is weighted for age, social grade, sex, presence of children, region and the household digital type on a daily basis to ensure it is representative of the UK as a whole.

The main Pulse measure is the AI or Appreciation Index – respondents are asked to mark out of 10 each of the programmes they watched or listened to the previous day, where 10 is the highest score and 1 the lowest. The average of all these marks out of 10 for a programme is then multiplied by 10 to give an Appreciation score (AI). For example, when aggregating scores for all programmes together for BBC TV we get an average score of 8.2, which gives us an AI of 82.
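The calculation described above (average of the marks out of 10, multiplied by 10) can be written down in a few lines of Python. This is a sketch for illustration only: the function name and the example marks are invented, and the daily weighting of the panel mentioned above is deliberately left out.

```python
def appreciation_index(marks):
    """AI as described in the text: mean of the marks out of 10, multiplied by 10.

    marks: individual marks between 1 (lowest) and 10 (highest).
    The panel weighting is not applied in this simplified version.
    """
    if not marks:
        raise ValueError("no marks given")
    if any(not 1 <= m <= 10 for m in marks):
        raise ValueError("marks must lie between 1 and 10")
    return 10 * sum(marks) / len(marks)


# Invented marks from five respondents; their mean is 8.2
marks = [8, 9, 8, 7, 9]
ai = appreciation_index(marks)
```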

Additional measures include ratings for the quality and distinctiveness of the programmes.

Source: BBC Audience Information January – March 2011

Appendix 3

Performance Indicators for (Public Service) TV

In an article published in the German media research journal „Media Perspektiven“ 2(1994), the author suggested the combination of four dimensions of quality control and performance measurement for TV programs:

1. Rating (percentage of adult viewers)
2. Appreciation (measured in the form of an index and recorded by meter, diary, online panel, or ad-hoc research)
3. Cost per minute (preliminary internal and external expenses, indexed over the average program cost)
4. Cultural/intellectual level (from a purely commercial to a high cultural standard)

Without going into details, the idea of this system is the following: TV programs are classified according to their audience size (rating), their perceived quality (appreciation), their cost efficiency (total cost per minute) and their fulfilment of cultural standards. The latter is based on a collective assessment by program makers. (Other authors have suggested measuring this dimension by means of an analysis of program reviews by public critics.)

Each factor is expressed by a range of 5 classes (very high, high, medium, low, very low). Here is an example:

In spite of a satisfactory rating, the program in this example has a low overall performance due to its low appreciation, its low esteem and its relatively high cost per minute.

The larger the total area, the better the performance.
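The „total area“ idea can be sketched in Python under explicit assumptions, since the original chart is not reproduced here. It is assumed that the four factors are plotted on four axes at right angles, that the five classes are mapped to the values 1 to 5, and that cost enters as cost efficiency (low cost per minute gives a high class). The names `CLASS_SCORES` and `performance_area` are invented for this illustration.

```python
# Assumed mapping of the five classes to axis values
CLASS_SCORES = {"very low": 1, "low": 2, "medium": 3, "high": 4, "very high": 5}


def performance_area(rating, appreciation, cost_efficiency, cultural_level):
    """Area of the four-sided figure spanned on four axes at right angles.

    Each argument is one of the five class labels. The chart type and the
    order of the axes are assumptions; a different axis order can change the area.
    """
    r = [CLASS_SCORES[c] for c in (rating, appreciation, cost_efficiency, cultural_level)]
    # The figure consists of four right-angled triangles between neighbouring axes.
    return 0.5 * sum(r[i] * r[(i + 1) % 4] for i in range(4))


# Invented example: good rating, but low appreciation, cost efficiency and cultural level
area = performance_area("high", "low", "low", "low")
best_possible = performance_area("very high", "very high", "very high", "very high")
```

Comparing `area` with `best_possible` gives a rough share of the maximum performance for a program.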

Appendix 4