rkn2
diff --git a/FactorAnalysisPersonality.ipynb b/FactorAnalysisPersonality.ipynb
Lines changed: 64 additions & 34 deletions
@@ -5,6 +5,7 @@
     "colab": {
       "name": "FactorAnalysisPersonality.ipynb",
       "provenance": [],
+      "toc_visible": true,
       "include_colab_link": true
     },
     "kernelspec": {
@@ -24,17 +25,30 @@
       ]
     },
     {
-      "cell_type": "code",
+      "cell_type": "markdown",
       "metadata": {
-        "id": "_zn3Cg6ECVcW",
-        "colab_type": "code",
-        "colab": {}
+        "id": "qTHBjB_hPxZ1",
+        "colab_type": "text"
       },
       "source": [
-        ""
-      ],
-      "execution_count": 0,
-      "outputs": []
+        "# Background on the data"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "814cvzSKQBjL",
+        "colab_type": "text"
+      },
+      "source": [
+        "A personality test is given to a large group of people. It has 25 questions, 5 for each trait: agreeableness (A1-5), conscientiousness (C1-5), extraversion (E1-5), neuroticism (N1-5), and openness (O1-5).\n",
+        "\n",
+        "Answers to questions about the same trait should be related to one another.\n",
+        "\n",
+        "We want to see whether we can recover those categories from people's answers alone.\n",
+        "If we can, we will end up with 5 factors, each influenced mainly by the questions that correspond to it.\n",
+        "\n"
+      ]
     },
     {
       "cell_type": "markdown",
@@ -71,7 +85,7 @@
       "metadata": {
         "id": "tP4cR7ZrSuLc",
         "colab_type": "code",
-        "outputId": "d52f563c-c0d0-43ca-a322-8f1885cc7dc8",
+        "outputId": "611e1c92-4bc6-4193-d21b-5b717b8bf307",
         "colab": {
           "base_uri": "https://localhost:8080/",
           "height": 207
@@ -99,16 +113,16 @@
         "gauth.credentials = GoogleCredentials.get_application_default()\n",
         "drive = GoogleDrive(gauth)"
       ],
-      "execution_count": 0,
+      "execution_count": 2,
       "outputs": [
         {
           "output_type": "stream",
           "text": [
             "Collecting factor_analyzer==0.2.3\n",
             "  Downloading https://files.pythonhosted.org/packages/79/1b/84808bbeee0f3a8753c3d8034baf0aa0013cf08957eff750f366ce83f04a/factor_analyzer-0.2.3-py2.py3-none-any.whl\n",
+            "Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from factor_analyzer==0.2.3) (1.16.5)\n",
             "Requirement already satisfied: pandas in /usr/local/lib/python3.6/dist-packages (from factor_analyzer==0.2.3) (0.24.2)\n",
             "Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from factor_analyzer==0.2.3) (1.3.1)\n",
-            "Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from factor_analyzer==0.2.3) (1.16.5)\n",
             "Requirement already satisfied: pytz>=2011k in /usr/local/lib/python3.6/dist-packages (from pandas->factor_analyzer==0.2.3) (2018.9)\n",
             "Requirement already satisfied: python-dateutil>=2.5.0 in /usr/local/lib/python3.6/dist-packages (from pandas->factor_analyzer==0.2.3) (2.5.3)\n",
             "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.6/dist-packages (from python-dateutil>=2.5.0->pandas->factor_analyzer==0.2.3) (1.12.0)\n",
@@ -126,7 +140,7 @@
         "colab_type": "text"
       },
       "source": [
-        "Now we will load the data. This line reads in the comma separated value sheet that I made in excel.\n",
+        "Now we will load the data. This line reads in the comma-separated values (CSV) file.\n",
         "\n",
         "If you are given sensor data in excel and want to export it to csv in the future, see this link:\n",
         "https://www.ablebits.com/office-addins-blog/2014/04/24/convert-excel-csv/"
@@ -175,7 +189,7 @@
       "metadata": {
         "id": "cbE21AiDU_5L",
         "colab_type": "code",
-        "outputId": "0f51b05e-0763-4115-8322-0d1ff60199f6",
+        "outputId": "2a0c7e8c-e5f4-4a9e-f4ba-12ef316d9f9e",
         "colab": {
           "base_uri": "https://localhost:8080/",
           "height": 544
@@ -200,7 +214,7 @@
         "# calculate the number of variables\n",
         "numVars = df.shape[1]-len(unnecessaryColumns)\n"
       ],
-      "execution_count": 0,
+      "execution_count": 5,
       "outputs": [
         {
           "output_type": "stream",
@@ -288,7 +302,7 @@
       "metadata": {
         "id": "aXtTewA1WHwR",
         "colab_type": "code",
-        "outputId": "e579ef8c-4963-45a7-e4a2-2041627acdf1",
+        "outputId": "3b7b6a91-717d-4ad4-e23b-2ee1316d3bb4",
         "colab": {
           "base_uri": "https://localhost:8080/",
           "height": 34
@@ -300,7 +314,7 @@
         "chi_square_value,p_value=calculate_bartlett_sphericity(df)\n",
         "chi_square_value, p_value"
       ],
-      "execution_count": 0,
+      "execution_count": 6,
       "outputs": [
         {
           "output_type": "execute_result",
@@ -312,7 +326,7 @@
           "metadata": {
             "tags": []
           },
-          "execution_count": 4
+          "execution_count": 6
         }
       ]
     },
@@ -351,7 +365,7 @@
       "metadata": {
         "id": "gJbu2WwCXBa0",
         "colab_type": "code",
-        "outputId": "d27d5cbe-3eb2-4dec-f19d-10d268bd818c",
+        "outputId": "17522ec2-a2c3-4a8c-cb99-97e23c603420",
         "colab": {
           "base_uri": "https://localhost:8080/",
           "height": 34
@@ -364,7 +378,7 @@
         "\n",
         "kmo_model"
       ],
-      "execution_count": 0,
+      "execution_count": 7,
       "outputs": [
         {
           "output_type": "execute_result",
@@ -376,7 +390,7 @@
           "metadata": {
             "tags": []
           },
-          "execution_count": 5
+          "execution_count": 7
         }
       ]
     },
@@ -425,7 +439,7 @@
       "metadata": {
         "id": "Z6zPLCYTXNgL",
         "colab_type": "code",
-        "outputId": "49e91e74-94aa-4158-8d27-7437f971fac9",
+        "outputId": "37c538c0-50ca-4b5d-b43d-177486fd110f",
         "colab": {
           "base_uri": "https://localhost:8080/",
           "height": 855
@@ -439,7 +453,7 @@
         "ev, v = fa.get_eigenvalues()\n",
         "ev"
       ],
-      "execution_count": 0,
+      "execution_count": 8,
       "outputs": [
         {
           "output_type": "execute_result",
@@ -608,7 +622,7 @@
           "metadata": {
             "tags": []
           },
-          "execution_count": 6
+          "execution_count": 8
         }
       ]
     },
@@ -627,7 +641,7 @@
       "metadata": {
         "id": "8EcInPmvXS4E",
         "colab_type": "code",
-        "outputId": "6050dd3b-4043-4c62-9c88-bcb005c4d5e7",
+        "outputId": "6589d02c-5353-457b-a17b-c79dd90408d2",
         "colab": {
           "base_uri": "https://localhost:8080/",
           "height": 295
@@ -643,7 +657,7 @@
         "plt.grid()\n",
         "plt.show()"
       ],
-      "execution_count": 0,
+      "execution_count": 9,
       "outputs": [
         {
           "output_type": "display_data",
@@ -689,7 +703,7 @@
       "metadata": {
         "id": "es4vNDC1XYdW",
         "colab_type": "code",
-        "outputId": "399e5d49-3f56-4be5-e704-5c7c6f660c6e",
+        "outputId": "c5735c87-1ad2-4027-ead4-1dfeed33b5a4",
         "colab": {
           "base_uri": "https://localhost:8080/",
           "height": 855
@@ -702,7 +716,7 @@
         "fa.analyze(df, numFactors, rotation=\"varimax\")\n",
         "fa.loadings"
       ],
-      "execution_count": 0,
+      "execution_count": 10,
       "outputs": [
         {
           "output_type": "execute_result",
@@ -979,7 +993,7 @@
           "metadata": {
             "tags": []
           },
-          "execution_count": 8
+          "execution_count": 10
         }
       ]
     },
@@ -998,11 +1012,11 @@
       "metadata": {
         "id": "jXtE5F5VaLvW",
         "colab_type": "code",
-        "outputId": "ee31f049-282c-443d-9edc-4389d8ac7cf7",
+        "outputId": "5ebab41d-9386-42c6-de90-0ed765640d62",
         "cellView": "code",
         "colab": {
           "base_uri": "https://localhost:8080/",
-          "height": 102
+          "height": 122
         }
       },
       "source": [
@@ -1015,7 +1029,7 @@
         "  contributions = [(np.round(factor[x],2),headings[x]) for x in descending if np.abs(factor[x])>factor_threshold]\n",
         "  print('Factor %d:'%(i+1),contributions)"
       ],
-      "execution_count": 0,
+      "execution_count": 13,
       "outputs": [
         {
           "output_type": "stream",
@@ -1059,17 +1073,20 @@
         "colab_type": "text"
       },
       "source": [
-        "Factor 1 is mostly influenced by the questions about extrovertedness (E); some parts of agreeableness, openness, and nueroticism, influence this, but it is mostly E.\n",
+        "Factor 1 is mostly influenced by the questions about extrovertedness (E); some agreeableness, openness, and neuroticism questions also contribute, but it is mostly E. These were the original E questions.\n",
         "\n",
-        "Factor 2 is mostly influenced by neuroticism (N), but smoe parts of extrovertedness and conscientiousness play a role.\n",
+        "Factor 2 is mostly influenced by neuroticism (N), but some parts of extrovertedness and conscientiousness play a role. These were the original N questions.\n",
         "\n",
         "Factor 3 is mostly influenced by conscientiousness, but extrovertedness can play a role.\n",
+        "These were the original C questions.\n",
         "\n",
         "Factor 4 is mostly influenced by openness but extrovertedness plays a role.\n",
+        "These were the original O questions.\n",
         "\n",
         "Factor 5 is mostly influenced by agreeableness but extrovertedness plays a role.\n",
+        "These were the original A questions.\n",
         "\n",
-        "In summary, extrovertedness played a role in all of the factors so it is dominanting the system. However looking at individual factors, we can see that specific traits influence specific factors. "
+        "In summary, we were able to extract the original groupings. However, it is interesting to note that E plays a role in all of the factors. This could suggest that some bias or systematic error is at play."
       ]
     },
     {
@@ -1082,6 +1099,19 @@
         "# Computing the variance"
       ]
     },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "bxzTgYfzRvsp",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        ""
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
     {
       "cell_type": "code",
       "metadata": {