mcoth / Symptom clustering analysis
Commit 6f8ffe4d authored 4 months ago by mcoth
parent ee497bbf
Pipeline #38564 passed with warnings 4 months ago (Stage: test)
Showing 1 changed file
Perform_pca.py (0 → 100644) with 78 additions and 0 deletions
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Sep 1 15:27:56 2024

@author: Maya Coulson Theodorsen (mcoth@dtu.dk)

This script performs Principal Component Analysis (PCA) on standardized data.
It includes:
    - Bartlett's test for sphericity to check PCA suitability.
    - Kaiser-Meyer-Olkin (KMO) measure for sampling adequacy.
    - PCA scree plot visualization and explained variance analysis.
    - Calculation of PCA loadings and transformed data.

Functions:
    - perform_pca: Performs PCA and diagnostic tests.

Returns:
    - pca (PCA fitted model)
    - loadings (PCA loadings for each feature and component)
    - principleComponents
"""

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo


def perform_pca(std_data, PCAcolumns, columnNames):

    # Bartlett sphericity to check if suitable for PCA
    chi_square_value, p_value = calculate_bartlett_sphericity(std_data)
    print("Bartlett's sphericity chi-square:", chi_square_value)
    print(f"p_value: {p_value:.30f}")

    # Kaiser-Meyer-Olkin (KMO) test for sampling adequacy (0.8-1.0 is excellent)
    # Test if data is appropriate for FA/PCA
    kmo_all, kmo_model = calculate_kmo(std_data)
    print("KMO model:", kmo_model)

    # PCA (note: variance in eigenvalues, not percentage)
    pca = PCA()
    principleComponents = pd.DataFrame(pca.fit_transform(std_data))

    # Bar plot of the variances of PCA features
    features = range(pca.n_components_)
    plt.subplots(figsize=(20, 15))
    plt.bar(features, pca.explained_variance_)
    plt.xticks(features)
    plt.ylabel('Eigenvalue', fontsize=16)
    plt.xlabel('PCA feature', fontsize=16)
    # Horizontal reference line at eigenvalue = 1
    plt.axhline(y=1, linewidth=2, color='r')
    plt.title('Scree plot', fontsize=30)
    plt.show()

    # Display the actual amount of explained variance per component
    print('PCA explained variance:')
    print(pca.explained_variance_)

    # Print the proportion of variance explained by each component
    print('PCA explained variance ratio:')
    print(pca.explained_variance_ratio_)

    # Cumulative sum of the explained variance ratios
    print('PCA explained variance ratio cumulative summation:')
    print(pca.explained_variance_ratio_.cumsum())

    # Loadings for each PC: eigenvectors scaled by the square root of the eigenvalues
    loadings = pd.DataFrame(pca.components_.T * np.sqrt(pca.explained_variance_))
    loadings.columns = PCAcolumns
    loadings.index = [columnNames]

    return pca, loadings, principleComponents
\ No newline at end of file
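
The loadings line in `perform_pca` scales each eigenvector by the square root of its eigenvalue. The sketch below, which is not part of this commit, reproduces that calculation with plain NumPy on made-up data to show what the scaling buys: for standardized variables, each loading equals the correlation between a variable and a component score. The names `pca_loadings` and `toy`, the random seed, and the choice of sample standard deviation (`ddof=1`) are all assumptions made for illustration. It uses an eigendecomposition of the correlation matrix rather than `sklearn.decomposition.PCA`.

```python
import numpy as np


def pca_loadings(data):
    """Hypothetical helper: eigenvalues, loadings and scores via NumPy only."""
    # Standardize each column with the sample standard deviation (assumption).
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Correlation matrix of the standardized columns.
    corr = z.T @ z / (z.shape[0] - 1)
    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Same scaling as in perform_pca: eigenvector * sqrt(eigenvalue).
    loadings = eigenvectors * np.sqrt(eigenvalues)
    # Component scores: projection of the standardized data on the eigenvectors.
    scores = z @ eigenvectors
    return eigenvalues, loadings, scores


# Made-up data: three columns share one latent factor, two are pure noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
toy = np.hstack([
    latent + 0.3 * rng.normal(size=(200, 3)),
    rng.normal(size=(200, 2)),
])

eigenvalues, loadings, scores = pca_loadings(toy)

# Number of components above the eigenvalue = 1 line drawn on the scree plot.
n_above_one = int((eigenvalues > 1).sum())
print("Eigenvalues:", np.round(eigenvalues, 3))
print("Components with eigenvalue > 1:", n_above_one)
print("Loadings:\n", np.round(loadings, 3))
```

With this scaling, the squared loadings in each row sum to the variance of that variable (1 for data standardized this way), which is why loadings are easier to read than raw eigenvectors. Note that `sklearn`'s `explained_variance_` uses an `n - 1` denominator, so data standardized with a population standard deviation will give eigenvalues that differ from these by a factor of `n / (n - 1)`.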