Thematic Analysis: A 6-Phase Guide


Thematic analysis is one of the most widely used analytic methods in qualitative research, yet it is also one of the most frequently misunderstood and poorly executed. At its best, thematic analysis provides a flexible, rigorous framework for identifying, organizing, and interpreting patterns of meaning across a qualitative data set. At its worst, it becomes a superficial exercise in topic summarization that fails to move beyond description to genuine interpretation. This guide walks you through Braun and Clarke's reflexive thematic analysis --- the most influential and methodologically developed version of the approach --- across all six phases.

Understanding Reflexive Thematic Analysis

Virginia Braun and Victoria Clarke first published their six-phase framework in 2006, and it has since become the most cited guide to thematic analysis in the social sciences. In their more recent work, they have clarified that their approach is specifically reflexive thematic analysis, distinguishing it from coding reliability approaches (which emphasize inter-coder agreement and treat themes as domain summaries) and codebook approaches (which use a structured codebook to guide coding).

In reflexive thematic analysis, the researcher's subjectivity is a resource, not a problem to be eliminated. Themes do not "emerge" passively from the data --- the researcher actively generates them through sustained engagement with the data, informed by their theoretical commitments and research questions. This does not mean analysis is arbitrary. It means the researcher must be transparent, reflexive, and systematic in their analytic choices.

Inductive vs. Deductive Approaches

Thematic analysis can be conducted inductively or deductively. Inductive analysis is data-driven: codes and themes are generated from the content of the data without trying to fit them into a preexisting theoretical framework. Deductive analysis is theory-driven: you approach the data with specific theoretical questions or concepts and code for those specifically. The two are better understood as ends of a continuum than as a strict either/or choice, and most studies fall somewhere between the extremes.

Semantic vs. Latent Themes

Semantic themes capture the explicit, surface-level meaning of the data --- what participants actually said. Latent themes go beneath the surface to identify underlying assumptions, conceptualizations, or ideologies that shape what participants say and how they say it. A semantic analysis might identify a theme like "feeling unprepared for graduate school." A latent analysis might identify the underlying theme of "meritocratic ideology as a barrier to seeking help" --- the assumption that if you earned your place, you should be able to manage without support.

Phase 1: Familiarization with the Data

Before coding begins, immerse yourself in your data. Read and re-read every transcript, listen to audio recordings, and review field notes. This is not passive reading --- it is active, analytic reading in which you begin to notice patterns, ask questions, and generate ideas.

During familiarization, keep a notebook or memo document where you record your initial observations. What topics recur? What surprises you? What contradictions do you notice? What is absent that you expected to find? These early notes will seed your coding in the next phase.

If you conducted the interviews yourself, you already have some familiarity with the data. But conducting an interview and analyzing an interview are different cognitive acts. Do not assume that your memory of the conversation is sufficient. Return to the transcripts and read them with analytic eyes.

"The hardest part wasn't the coursework. I could handle the reading, the papers. The hardest part was figuring out the unwritten rules. Nobody tells you how to talk to your advisor, or that you're supposed to go to department events, or that you need to start thinking about publications in your first year. I spent the whole first year feeling like everyone else got a handbook that I never received."
(Participant 7, Maria)

Phase 2: Generating Initial Codes

Systematic coding begins in this phase. Work through each data item, identifying segments of data that are relevant to your research questions and assigning concise code labels to each segment. Code for as many potential themes and patterns as possible --- you can always discard codes that prove unproductive later.

Code the entire data set, not just the parts that seem obviously relevant. Interesting findings often come from data you did not expect to be important. Pay attention to both the common and the unusual: deviant or contradictory data can be analytically powerful.

Data extract: "I remember sitting in my first seminar and the professor asked us to 'problematize the epistemological assumptions.' I had no idea what that meant. I just sat there nodding like I understood. Later I Googled every word in that sentence."

Codes:

  • Academic language barrier
  • Performative competence
  • Self-directed coping
  • Knowledge gap (epistemology)
  • First seminar experience

A single data segment can --- and usually should --- be assigned multiple codes. Coding is not about putting data into mutually exclusive boxes. It is about flagging the multiple layers of meaning within a single passage.
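If you keep your coding in software rather than on paper, the one-to-many relationship between a segment and its codes can be sketched in a few lines of Python. This is only a bookkeeping aid, not part of Braun and Clarke's method. The `CodedSegment` class, its field names, the `index_by_code` helper, and the participant ID "P01" are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CodedSegment:
    """One data extract together with every code assigned to it."""
    participant: str  # placeholder identifier, not taken from real data
    text: str
    codes: set[str] = field(default_factory=set)


def index_by_code(segments):
    """Map each code label to the list of segments that carry it."""
    index = defaultdict(list)
    for segment in segments:
        for code in segment.codes:
            index[code].append(segment)
    return dict(index)


seminar = CodedSegment(
    participant="P01",  # hypothetical ID for the seminar extract above
    text="I just sat there nodding like I understood.",
    codes={"Academic language barrier", "Performative competence"},
)
# Codes accumulate on a segment; adding one does not displace another.
seminar.codes.add("Self-directed coping")

index = index_by_code([seminar])
print(sorted(index))
```

Storing codes as a set on each segment, rather than storing one code per row, keeps the "multiple layers of meaning" visible whenever you return to the passage.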

Phase 3: Searching for Themes

Once you have coded the entire data set, step back from individual codes and begin to consider how they might combine into broader themes. A theme is not just a topic that recurs frequently. A theme captures something important about the data in relation to your research question and represents a patterned response or meaning within the data set.

Start by listing all your codes and examining them for overlap, similarity, and connection. You might use visual strategies: write each code on a sticky note and physically sort them into clusters, or create a thematic map using diagramming software. Group related codes together and consider what each cluster is really about at a deeper level.

At this stage, you will likely generate candidate themes that are too broad, too narrow, overlapping, or conceptually unclear. That is expected. This is a generative phase --- you are brainstorming possible thematic structures, not finalizing them.
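The sorting step can also be tracked digitally. The sketch below, assuming the five codes from the Phase 2 example, records which codes sit under which candidate theme and flags codes that are still unassigned or that appear under more than one theme. The candidate theme names and both helper functions are hypothetical; the code only reports where things are, and deciding what a cluster means remains your interpretive work.

```python
from collections import Counter

# The five codes from the Phase 2 example.
all_codes = {
    "Academic language barrier",
    "Performative competence",
    "Self-directed coping",
    "Knowledge gap (epistemology)",
    "First seminar experience",
}

# Hypothetical candidate themes: working name -> codes clustered under it.
candidate_themes = {
    "Decoding academic language": {
        "Academic language barrier",
        "Knowledge gap (epistemology)",
    },
    "Performing competence": {
        "Performative competence",
        "Self-directed coping",
    },
}


def unassigned_codes(codes, themes):
    """Codes that have not yet been placed in any candidate theme."""
    assigned = set().union(*themes.values())
    return codes - assigned


def shared_codes(themes):
    """Codes placed under more than one candidate theme (possible overlap)."""
    counts = Counter(code for codes in themes.values() for code in codes)
    return {code for code, n in counts.items() if n > 1}


unassigned = unassigned_codes(all_codes, candidate_themes)
overlapping = shared_codes(candidate_themes)
print(unassigned, overlapping)
```

An unassigned code is not an error. It may belong to a theme you have not yet articulated, or it may be a code you later set aside.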

Phase 4: Reviewing Themes

This phase involves two levels of review. First, review each candidate theme against the coded data extracts that support it. Do the extracts form a coherent pattern? Does the theme accurately reflect the data it encompasses? If not, the theme may need to be reworked, split into two themes, merged with another, or discarded.

Second, review the candidate themes against the entire data set. Re-read all your data with your thematic structure in mind. Does this set of themes tell a convincing, accurate, and complete story about the data? Are there important aspects of the data that your themes miss? This is your quality-control phase --- be willing to revise substantially if the thematic structure does not fit the data.
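The two levels of review above can be supported by two simple queries: one that gathers every extract behind a candidate theme so you can read them together, and one that lists coded extracts no theme accounts for. The following is a minimal sketch under stated assumptions: the code labels, the dictionary layout, and the ID "P03" are placeholders invented for illustration, and the functions only retrieve extracts. Judging whether they form a coherent pattern is still done by reading.

```python
# Illustrative coded extracts; code labels and the ID "P03" are placeholders.
segments = [
    {"participant": "P07",
     "text": "The hardest part was figuring out the unwritten rules.",
     "codes": {"Unwritten rules"}},
    {"participant": "P12",
     "text": "I figured it out, but it cost me.",
     "codes": {"Learning from mistakes", "Not belonging"}},
    {"participant": "P03",
     "text": "Later I Googled every word in that sentence.",
     "codes": {"Self-directed coping"}},
]

candidate_themes = {
    "Learning the hidden curriculum through costly mistakes": {
        "Unwritten rules",
        "Learning from mistakes",
    },
}


def extracts_for_theme(theme_codes, segments):
    """Level 1: every extract carrying at least one of the theme's codes."""
    return [s for s in segments if s["codes"] & theme_codes]


def uncovered_segments(themes, segments):
    """Level 2: coded extracts that no candidate theme accounts for."""
    covered = set().union(*themes.values())
    return [s for s in segments if not (s["codes"] & covered)]


theme = "Learning the hidden curriculum through costly mistakes"
supporting = extracts_for_theme(candidate_themes[theme], segments)
participants = sorted({s["participant"] for s in supporting})
leftover = uncovered_segments(candidate_themes, segments)
print(participants, [s["participant"] for s in leftover])
```

A long list of uncovered extracts suggests the thematic structure is missing something; a short list may simply reflect material outside your research question.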

A common mistake is generating themes that are really just code categories or topic summaries. "Challenges" is a topic, not a theme. "Navigating institutional gatekeeping through strategic self-silencing" is a theme --- it captures a specific, interpretive insight about what participants are doing and why.

Phase 5: Defining and Naming Themes

For each theme that survives the review phase, write a detailed definition. A theme definition answers several questions: What is the essence of this theme? What does it capture? What are its boundaries --- what does it include and exclude? How does it relate to the other themes in your analysis?

Each theme should have a clear, concise name that conveys its core meaning. Good theme names are informative and specific. Compare these examples:

  • Weak theme name: "Academic experiences"
  • Strong theme name: "Learning the hidden curriculum through costly mistakes"

The weak name describes a topic area. The strong name captures a specific pattern of meaning --- that participants learned implicit academic norms not through instruction but through the consequences of violating them.

Write a brief narrative for each theme, telling the "story" of what the theme captures and how it fits into the broader analytic narrative. This is preparation for the final phase.

Phase 6: Producing the Report

The final phase involves weaving together your analytic narrative, data extracts, and scholarly literature into a coherent written account. Your report should do more than summarize themes --- it should make an argument. What do your themes, taken together, tell us about the phenomenon you studied? How do they answer your research questions? What do they contribute to existing knowledge?

Each theme should be illustrated with vivid data extracts that are embedded within your analytic narrative, not simply listed as stand-alone quotes. The data extract should illustrate a specific analytic point, and your surrounding text should explain what the extract demonstrates and why it matters.

"I learned more about how academia works from getting things wrong than from any orientation session. Nobody told me I was supposed to read the syllabus before the first class. Nobody told me you don't call your committee members by their first name until they invite you to. I figured it out, but it cost me. Every mistake felt like proof that I didn't belong."
(Participant 12, James)

In a finished report, this extract would not stand alone. The researcher's interpretive commentary would contextualize it within the theme, connect it to other participants' experiences, and develop the analytic argument about hidden curriculum acquisition.

Quality Criteria and Common Mistakes

Braun and Clarke have identified several markers of good thematic analysis and common pitfalls to avoid:

Do not simply paraphrase data content without interpretation. Analysis requires that you go beyond what participants said to explore what it means.

Do not use your data collection questions as your themes. If your interview guide had five questions and you have five themes that map directly onto those questions, you have not conducted analysis --- you have organized your data by topic.

Do not treat theme frequency as the only indicator of importance. A theme present in only three participants' data can be analytically significant if it captures something conceptually important.

Do ensure each theme is distinct --- themes should not substantially overlap. If two themes are difficult to distinguish, they may need to be merged.

Do present both the pattern and the exceptions. Acknowledging data that does not fit your themes strengthens rather than weakens your analysis.

Do maintain an audit trail documenting your analytic decisions --- which codes were merged, which themes were discarded and why, how your thematic structure evolved. This supports the transparency and rigor of your analysis.
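An audit trail can be as simple as a dated, append-only list of decisions. The sketch below is one possible shape for it, not a prescribed format: the `Decision` fields, the `record` helper, the action label "discard_theme", and the example date are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class Decision:
    """One analytic decision; frozen so past entries are not edited later."""
    when: str       # ISO date string
    phase: int      # 1 to 6
    action: str     # hypothetical labels, e.g. "merge_codes", "discard_theme"
    detail: str     # what was changed
    rationale: str  # why it was changed


audit_trail = []


def record(phase, action, detail, rationale, when=None):
    """Append a decision to the trail, dating it today unless told otherwise."""
    entry = Decision(when or date.today().isoformat(),
                     phase, action, detail, rationale)
    audit_trail.append(entry)
    return entry


record(
    phase=4,
    action="discard_theme",
    detail="Challenges",
    rationale="Topic summary rather than an interpretive theme",
    when="2024-01-15",  # placeholder date
)

# Serialize to JSON so the trail can be archived alongside the data.
serialized = json.dumps([asdict(d) for d in audit_trail], indent=2)
print(serialized)
```

Recording the rationale at the time of the decision matters more than the format; a plain document or spreadsheet with the same columns would serve equally well.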

Reflexive thematic analysis is not a linear process that moves neatly from Phase 1 to Phase 6. It is recursive --- you will cycle back through earlier phases as your understanding deepens. The phases are guideposts, not a rigid sequence. Trust the process, remain reflexive about your own influence on the analysis, and prioritize depth of interpretation over breadth of coverage.
